Dissertations and Theses

Date of Award


Document Type



Civil Engineering

First Advisor

Alison Conway

Second Advisor

Naresh Devineni

Third Advisor

Yunhai Zhang


transportation data, NPMRDS, clustering, congestion, performance measures


Traffic congestion is common in urban areas. It results in lost productivity, increased safety risks for passengers, increased fuel consumption, and environmental pollution. Better performance measurement is one tool for improving traffic flow and capacity planning.

The main data source in this research is the National Performance Management Research Data Set (NPMRDS) for 2018. NPMRDS is a form of commercial GPS probe data, obtained from vehicles with on-board probe technologies. The dataset is licensed by the Federal Highway Administration and is made available to State Departments of Transportation (DOTs) and Metropolitan Planning Organizations (MPOs) for federally mandated traffic performance measurement. It includes average travel times measured every 5 minutes on each highway segment in 10 counties of New York, and provides passenger data, freight data, and combined passenger and freight data. This study uses both the freight data and the combined data to analyze transportation performance measures and freight congestion.

The purpose of this research is to determine what can be learned from NPMRDS, how these findings can stimulate further research on the dataset, how performance measures can be used efficiently to evaluate congestion, and how to help government agencies make more data-driven decisions.

The research is organized into five phases. Phase I calculates classic performance measures, including the Travel Time Index (TTI), Planning Time Index (PTI), and Buffer Index (BI), and investigates how these measures change at the segment and county levels through exploratory data analysis and spatial analysis for both the combined and freight data sets. Understanding these basic performance measures provides a foundation for more advanced analysis. Phase II studies a critical performance measure, Peak Hour Excessive Delay (PHED), and analyzes its relationship with reliability, road type, and Truck Travel Time Reliability (TTTR). This phase investigates significant attributes that could affect PHED using General Linear Models (GLMs) and predicts Total Excessive Delay (TED) using an Artificial Neural Network (ANN). Phase III seeks congestion patterns by testing several classic clustering algorithms (K-means, K-medoids, hierarchical, and DBSCAN) and selecting the optimal method to apply in each county. Phase IV studies how travel speed is affected by different weather types; an unsupervised machine learning method, Self-Organizing Maps (SOMs), is applied to classify weather attributes. Phase V is a performance-based classification of segments using SOMs, based on the TTI, PTI, BI, and TED obtained in the earlier phases, followed by case studies of the labelled groups.
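As an illustration of the Phase I measures, the sketch below computes TTI, PTI, and BI for a single segment from a list of observed travel times. It assumes the commonly used definitions (TTI as mean travel time over free-flow travel time, PTI as 95th-percentile travel time over free-flow travel time, and BI as the gap between the 95th-percentile and mean travel times relative to the mean); the abstract does not state which definitions the thesis adopts. The function names, the sample values, and the free-flow time are hypothetical.

```python
from statistics import mean


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def performance_measures(travel_times, free_flow_time):
    """Return TTI, PTI and BI for one segment.

    travel_times   -- observed travel times (e.g. seconds per 5-minute epoch)
    free_flow_time -- reference free-flow travel time in the same unit
                      (assumed to be supplied by the analyst)
    """
    avg = mean(travel_times)
    p95 = percentile(travel_times, 95)
    return {
        "TTI": avg / free_flow_time,   # average vs. free-flow conditions
        "PTI": p95 / free_flow_time,   # near-worst case vs. free-flow conditions
        "BI": (p95 - avg) / avg,       # extra buffer relative to the average trip
    }


# Hypothetical travel times (seconds) for one segment; not NPMRDS data.
sample = [60, 62, 65, 70, 90, 120, 75, 66, 61, 63]
measures = performance_measures(sample, free_flow_time=60.0)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

In practice the same function would be applied per segment and per time period (for example, peak hours only), with the free-flow reference chosen according to the agency's convention.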

Available for download on Thursday, August 25, 2022