Graduation Year

2023

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Jarred Ligatti, Ph.D.

Committee Member

Sean Barbeau, Ph.D.

Committee Member

Paul Rosen, Ph.D.

Committee Member

Tempestt Neal, Ph.D.

Committee Member

Yueng DeLaHoz Isaza, Ph.D.

Keywords

data science, high-performance computing, visualization

Abstract

According to the Population Division of the United Nations, almost 90% of the United States population will live in urban areas by the year 2050. As the population of a given area increases, traffic congestion rises because more vehicles are on the road. Widespread use of public transit is one possible way to alleviate this congestion. However, according to the US Census Bureau, the percentage of individuals commuting by public transportation has been decreasing steadily over time, and the American Community Survey reports that during 2019, only around five percent of the US population reported relying on public transit on a daily or weekly basis.

To reverse the trend of decreasing ridership, cities need to change how they organize their communities and transit systems. Part of this effort involves improving the information systems operated by transit agencies, through artificial intelligence and other means, to aid operators in the decision-making process and to keep riders better informed of any changes to their commute.

This work concerns the implementation and improvement of techniques used when performing machine learning activities in the public transportation space. Standard machine learning pipelines are organized as four modules: data collection, feature extraction, model learning, and evaluation. Throughout this dissertation, three of the four modules are evaluated, their limitations are explored, and those limitations are addressed through novel techniques that target issues such as reproducibility across datasets, the transfer of techniques across transit operators, and lowering the barrier to entry into the field.

The data collection module is transformed from an active process of limited observations into a passive process capable of capturing all of the data a transit operator has available. The feature extraction module evolves from a sequential process involving the combination and preprocessing of tabular data into a module capable of generating datasets from hierarchical time-series data streams at a rate of thousands of records per second. This technique would allow feature generation to take part in the fine-tuning process. Finally, the evaluation module introduces new standards for evaluating public transit models by proposing visualizations and test sets based on routes and times rather than a single-value performance metric. The module also proposes the use of shifting binary classification techniques to provide easier-to-understand insights into model performance.
