Dissertations and Theses

Date of Award


Document Type



Civil Engineering

First Advisor

Reza Khanbilvardi

Second Advisor

Naresh Devineni

Third Advisor

Tarendra Lakhankar


Keywords

urban flooding, data science, machine learning, hydrology, statistical modeling


Street flooding is problematic in urban areas, where impervious surfaces such as concrete, brick, and asphalt prevail and impede the infiltration of water into the ground. During rain events, water ponds and rises to levels that cause considerable economic damage and physical harm. The main goal of this dissertation is to develop novel approaches to understanding urban flood risk by applying data science techniques to crowd-sourced data. This is accomplished by developing a series of data-driven models that identify significant flood factors and localized areas of flood vulnerability in New York City (NYC). First, the infrastructural (catch basin clogs, manhole issues, and sewer back-ups) and climatic (precipitation) contributions to street flooding are investigated using Stage IV radar precipitation data and crowd-sourced sewer reports (NYC 311 complaints) spanning a 10-year period. A Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis with an embedded Zero-Inflation (ZI) model detects the statistically significant predictors specific to each zip code. Second, to understand how these factors affect the spatial variability of street flooding, the Random Forest regression machine learning algorithm is employed, with the 311 street flooding reports as the response and explanatory variables covering topographic and land features, physical and population dynamics, and locational, infrastructural, and climatic influences. This model also includes socio-economic variables as predictors, so as to give better insight into potential reporting biases within the NYC 311 crowd-sourced platform. Third, the NYC zip codes are further analyzed for flood susceptibility using hierarchical clustering, a machine learning method. The three variables are street flooding reports, catch basin blockage reports, and radar precipitation data. Aggregated to the zip code level, the days of severe precipitation and street flood occurrence over a ten-year period are examined. The algorithm then clusters zip codes with similar joint behavior (rainfall, street flooding, and catch basin complaints). Thus, using crowd-sourced data, three data-driven models have been created, revealing the significant flood factors of NYC, the causes of variability among neighborhoods, and the areas prone to urban flooding.
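
As an illustration of the third step, the clustering of zip codes by joint behavior might be sketched as follows. This is a minimal sketch and not the dissertation's implementation: the abstract does not state the linkage criterion, distance metric, or scaling, so average linkage, Euclidean distance, and z-score standardization are assumptions here, and the zip code labels and counts are invented toy values.

```python
from itertools import combinations
from math import dist


def standardize(rows):
    """Z-score each column so that no single variable dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, stds)) for r in rows]


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering (assumed linkage).

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between members of a and b.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them (a < b always holds).
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical toy data, one row per zip code:
# (severe-rain days, street flooding reports, catch basin complaints)
toy = {
    "zip_A": (12, 40, 35),
    "zip_B": (11, 38, 33),
    "zip_C": (12, 5, 4),
    "zip_D": (13, 6, 5),
}
names = list(toy)
groups = agglomerative(standardize(list(toy.values())), n_clusters=2)
labelled = [sorted(names[i] for i in g) for g in groups]
print(labelled)
```

In this toy case the zip codes with similar rainfall but many flooding and catch basin reports fall into one group, and those with few reports into another, which is the kind of joint-behavior grouping the abstract describes.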

Localized urban flood forecasting is a difficult undertaking in major U.S. metropolitan areas. In these cities, drainage information may be incomplete, or access to the underground system may be restricted. Consequently, with the capacity of the urban system unknown, traditional rainfall-runoff calculations are unrealistic. This research advances our knowledge of the variables associated with urban flooding and, through various data analytic techniques, determines the extent of their effects within the NYC study area. It further builds on this understanding to develop an urban risk zone map that pinpoints the localized areas (zip codes) where street flooding is likely to occur when a rain event is forecast. By using regression and machine learning methodologies, together with a unique investigation of infrastructural elements drawn from crowd-sourced data, the research provides valuable information for advancing urban flood detection and prevention.


