Dissertations and Theses

Date of Award

2025

Document Type

Thesis

Department

Engineering

First Advisor

Hao Tang

Second Advisor

Zhigang Zhu

Keywords

Machine Learning, Audio Data Processing, Assistive Navigation, Blind or Low Vision

Abstract

Navigating urban environments poses significant challenges for blind and low vision (BLV) individuals, particularly at street intersections, where misjudging when it is safe to cross can be life-threatening. In New York City, where pedestrian fatalities are on the rise and only 2% of intersections are equipped with Accessible Pedestrian Signals (APS), alternative solutions are urgently needed. This thesis proposes an audio-based deep learning approach to support BLV individuals at crosswalks by detecting traffic movement direction and idling states from spatial sound. Using the 4-channel audio capture capabilities of wearables such as the Meta Project Aria glasses, we explore state-of-the-art sound event localization and detection (SELD) models, specifically ResNet-based feature extractors and transformer-enhanced architectures such as EINV2. To train these models, a synthetic dataset of quadraphonic audio from simulated 4-way traffic scenes is generated using the Unity Engine, addressing both the scalability and privacy concerns of real-world data collection. This work is a first step toward a hands-free, audio-centric crosswalk navigation aid for BLV individuals.
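As a concrete illustration of the pipeline the abstract describes, the Python snippet below shows one plausible way to turn a quadraphonic recording into the per-channel log-mel features that SELD models such as EINV2 commonly consume. This is a minimal sketch, not code from the thesis: the filename, sample rate, and spectrogram parameters are all illustrative assumptions.

```python
import numpy as np
import librosa

# Load a quadraphonic clip (e.g., exported from a simulated Unity traffic
# scene). The filename is hypothetical; mono=False preserves all channels.
audio, sr = librosa.load("traffic_scene.wav", sr=24000, mono=False)
assert audio.shape[0] == 4, "expected 4-channel (quadraphonic) audio"

# Compute a log-mel spectrogram for each channel. Stacking the channels
# yields the (channels x mel_bins x frames) tensor commonly fed to SELD
# feature extractors; all parameter values here are assumptions.
log_mels = np.stack([
    librosa.power_to_db(
        librosa.feature.melspectrogram(
            y=channel, sr=sr, n_fft=1024, hop_length=480, n_mels=64
        )
    )
    for channel in audio
])

print(log_mels.shape)  # (4, 64, n_frames)
```

The directional information needed to localize traffic would come from inter-channel differences (e.g., intensity vectors or generalized cross-correlation features), which SELD models derive from the same multichannel input alongside the spectral features shown here.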

Included in

Data Science Commons
