Student Theses

Date of Award

Spring 5-28-2026

Document Type

Thesis

Language

English

First Advisor

Arthur J. O'Connor

Abstract

Forecasting ridership for new transit infrastructure is difficult in the absence of observed outcomes, particularly under domain shift between an existing system and a proposed corridor. This study develops a station-level direct demand modeling (DDM) framework to forecast average weekday ridership for the proposed Interborough Express (IBX) in New York City — a 14-mile circumferential rapid transit corridor connecting Brooklyn and Queens. The approach pairs unsupervised learning with supervised estimation in a common feature space defined by transit service, accessibility, and built-environment characteristics. K-means clustering identifies latent station typologies (node–place regimes), and IBX stations are projected into this topology to define a corridor-relevant training subset, a strategy termed cluster-based domain focusing. Forecasting models — ordinary least squares, regularized linear estimators (Ridge, Lasso, Elastic Net), and tree-based ensembles (Random Forest, Gradient Boosting, XGBoost) — are evaluated through holdout testing and cross-validation. Regularized linear models outperform nonlinear alternatives in both predictive accuracy and stability, with Ridge regression selected as the preferred specification (out-of-sample R² ≈ 0.72 in the IBX-restricted domain). The resulting baseline forecast is approximately 61,000 average weekday entries, substantially below the MTA’s official projection of 115,000. Cross-model agreement is high among regularized estimators, indicating that the dominant predictive signal is approximately linear in log-transformed space and that divergence from official projections arises primarily from scenario assumptions — induced demand, long-run growth, and full network reconfiguration — rather than model misspecification. The results demonstrate that feature-space–based domain restriction provides a principled, empirically grounded framework for forecasting transit ridership under missing outcomes.
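The cluster-based domain focusing step described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the data are synthetic, the three features, the choice of four clusters, the ten hypothetical proposed stations, and the ridge penalty are all assumptions made for illustration, and K-means and ridge regression are written out in plain NumPy rather than taken from any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): 300 existing stations with 3 features
# (e.g. service, accessibility, built environment) and log ridership.
X_existing = rng.normal(size=(300, 3))
log_ridership = 8.0 + X_existing @ np.array([0.6, 0.3, 0.4]) \
    + rng.normal(scale=0.3, size=300)
# Ten hypothetical proposed stations with no observed ridership.
X_proposed = rng.normal(loc=0.5, scale=0.5, size=(10, 3))

# Put both station sets in a common, standardized feature space.
mean, std = X_existing.mean(axis=0), X_existing.std(axis=0)
Z_existing = (X_existing - mean) / std
Z_proposed = (X_proposed - mean) / std


def assign(Z, centers):
    """Return the index of the nearest center for each row of Z."""
    dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns the final cluster centers."""
    local_rng = np.random.default_rng(seed)
    centers = Z[local_rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(Z, centers)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def fit_ridge(Z, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc = Z - z_mean
    coef = np.linalg.solve(Zc.T @ Zc + alpha * np.eye(Z.shape[1]),
                           Zc.T @ (y - y_mean))
    return coef, y_mean - z_mean @ coef


# 1. Learn station typologies from the existing system only.
centers = kmeans(Z_existing, k=4)

# 2. Project the proposed stations into that typology.
proposed_clusters = np.unique(assign(Z_proposed, centers))

# 3. Domain focusing: train only on existing stations that share a
#    cluster with at least one proposed station.
train_mask = np.isin(assign(Z_existing, centers), proposed_clusters)

# 4. Fit ridge in log space on the restricted subset, then back-transform.
#    (A naive exp() back-transform is used here for simplicity.)
coef, intercept = fit_ridge(Z_existing[train_mask], log_ridership[train_mask])
station_forecast = np.exp(Z_proposed @ coef + intercept)
corridor_total = station_forecast.sum()

print(f"training stations kept: {train_mask.sum()} of {len(train_mask)}")
print(f"corridor forecast (synthetic units): {corridor_total:.0f}")
```

The point of the sketch is the order of operations: clusters are learned from the existing system alone, the proposed stations are only assigned to them, and the regression sees only the existing stations that fall in the same clusters. The printed numbers come from synthetic data and have no bearing on the forecasts reported in the abstract.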

Comments

Master's Research Project (Capstone), M.S. Data Science Program, City University of New York.

Course: DATA 698 - Master's Research Project. 

Faculty Advisor: Professor Arthur J. O'Connor

April 2026

Included in

Data Science Commons
