Dissertations and Theses
Date of Award
2025
Document Type
Thesis
Department
Engineering
First Advisor
Hao Tang
Second Advisor
Zhigang Zhu
Keywords
Machine Learning, Audio Data Processing, Assistive Navigation, Blind or Low Vision
Abstract
Navigating urban environments poses significant challenges for blind and low vision (BLV) individuals, particularly at street intersections, where misjudging when it is safe to cross can be life-threatening. In New York City, where pedestrian fatalities are on the rise and only 2% of intersections are equipped with Accessible Pedestrian Signals (APS), alternative solutions are urgently needed. This thesis proposes an audio-based deep learning approach to support BLV individuals at crosswalks by detecting traffic movement direction and idling states from spatial sound. Leveraging the 4-channel audio capture capabilities of wearables such as the Meta Project Aria glasses, we explore state-of-the-art sound event localization and detection (SELD) models, specifically ResNet-based feature extractors and transformer-enhanced architectures such as EINV2. To train these models, a synthetic dataset of quadraphonic audio from simulated 4-way traffic scenes is generated using the Unity Engine, addressing both the scalability and privacy concerns of real-world data collection. This work is an initial exploration toward a hands-free, audio-centric crosswalk navigation aid for BLV individuals.
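
To make the pipeline described in the abstract concrete, the sketch below shows how 4-channel wearable audio might be converted into stacked log-mel spectrograms and classified by a small ResNet-style convolutional network. This is a minimal illustrative sketch, not the thesis's implementation: the class labels, sample rate, feature parameters, and network layout are placeholder assumptions, and a synthetic random waveform stands in for real or Unity-generated captures.

import torch
import torch.nn as nn
import torchaudio

# Hypothetical label set for the 4-way intersection task; the actual
# taxonomy used in the thesis may differ.
CLASSES = ["idle", "north_south_traffic", "east_west_traffic", "turning"]

SAMPLE_RATE = 24000  # assumed capture rate; placeholder value
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=1024, hop_length=480, n_mels=64
)

def features(waveform: torch.Tensor) -> torch.Tensor:
    """Map a (4, num_samples) quadraphonic clip to (4, n_mels, frames)
    log-mel spectrograms, one per microphone channel."""
    return torch.log(mel(waveform) + 1e-6)

class CrosswalkSELD(nn.Module):
    """Tiny ResNet-flavored CNN over the 4 stacked spectrogram channels,
    standing in for the ResNet/EINV2 models explored in the thesis."""
    def __init__(self, num_classes: int = len(CLASSES)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool over time and frequency
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.net(x).flatten(1))

# One second of synthetic 4-channel audio in place of a real recording.
wav = torch.randn(4, SAMPLE_RATE)
logits = CrosswalkSELD()(features(wav).unsqueeze(0))  # add batch dim
print(CLASSES[logits.argmax(dim=1).item()])

Because the four channels are stacked as input planes, the convolution can pick up inter-channel level differences, which is one simple way spatial cues enter a SELD model; the actual architectures referenced (EINV2 and ResNet-based extractors) use richer spatial features and deeper networks.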
Recommended Citation
Lam, Wayne, "Utilizing Deep Learning Audio Models for Blind and Low Vision Crosswalk Assistance" (2025). CUNY Academic Works.
https://academicworks.cuny.edu/cc_etds_theses/1273
