Dissertations, Theses, and Capstone Projects

Date of Degree


Document Type


Degree Name



Earth & Environmental Sciences


Brett Branco

Committee Members

Matt Hipsey

Zhongqi Cheng

Hamidreza Norouzi

Subject Categories

Artificial Intelligence and Robotics | Biogeochemistry | Biological and Chemical Physics | Data Science | Environmental Chemistry | Environmental Monitoring | Numerical Analysis and Scientific Computing | Other Earth Sciences | Other Environmental Sciences | Water Resource Management


artificial urban shallow lakes, physical and chemical properties of water, primary producer dominance and regime shifts, HABs, duckweed, floating yellow hearts, floating primrose, internal loading, machine learning, self-organizing maps and predictions


Prospect Park Lake (PPL) is an artificial urban shallow lake situated on a terminal moraine in Brooklyn, New York, and supplied with municipal water treated with ortho-phosphates. This constant phosphate input is the primary source of eutrophication in the lake. The numerous pools along the watercourse house various aquatic phototrophs, which influence the water quality and the state of the system, driving conditions toward those that favor their own survival. The first half of the dissertation analyzes how the different primary producers in different regions of PPL affect the water's physical and biogeochemical processes. A previous study of the lake showed that internal loading of ortho-phosphates enables the lake to house large populations of aquatic species [1]. That study also concluded that the conditions enabling internal loading were unknown and remained an open question for further research. Another study, using a small dataset collected over a single summer in 2015, showed that different aquatic phototrophs in the different pools of the lake alter the physical and chemical characteristics of the water column [2]. Studying this particular artificial urban shallow lake is important not only to fill the gaps in knowledge about the conditions that favor dominance by certain primary producers and internal loading, but also to understand how such lakes differ from what is generally known about shallow lakes. The first part of the dissertation seeks to identify the differences in the chemical and physical properties of a water body inhabited by different primary producers and to determine how the water conditions lead to internal loading of phosphates.

The three locations for field sampling at PPL were chosen on the basis of the different primary producers that dominate each one: the Boathouse (BH), which houses duckweed and algal species; the Peninsula (P), which houses a mix of floating yellow hearts, primrose, and algal species; and Lake Proper (LP), which houses mostly algal species. A YSI multi-meter professional probe was used to measure dissolved oxygen, temperature, pH, and conductivity at 20 cm intervals from the surface of the water to the bottom of the column at each location. Samples were also collected at the surface, middle, and bottom layers of the column for chemical analysis to measure the concentrations of phosphate and chlorophyll a. Three distinct patterns were characterized in the analysis of the dataset. The BH, dominated by duckweed, showed lower temperatures, pH, conductivity, and dissolved oxygen, and implicitly higher concentrations of CO2, compared with the other locations. In the locations with mixed phototrophs (P and LP), the data showed the reverse: higher temperatures, pH, conductivity, and dissolved oxygen, and implicitly lower CO2, compared with the BH. In all locations the benthic conditions were similar, showing reduced dissolved oxygen and pH and increased CO2. A separate mesocosm experiment shows that these benthic conditions are what lead to internal loading of phosphate in each of the locations. The analysis also suggests that internal loading is under biological control. At the BH, the duckweed covers the surface of the water, preventing any growth within the column. The decay of the biomass (duckweed and algae) and the lack of dissolved oxygen in the water column induce the internal loading. In the P and LP locations, the mass decay that follows algal bloom crashes leads to the same effect.

The second part of the dissertation focuses on the use of machine learning both as a predictive tool for water quality parameters and as a means of finding patterns and physical/chemical characteristics in the dataset. Self-Organizing Maps (SOM) within the SuSi framework were used for this purpose. SOM is a machine learning technique commonly used in unsupervised training, where the goal is to find patterns and underlying structures in an unlabeled dataset. SuSi uses the output of the unsupervised training as input to a supervised training step, whose final output is used for predictions. As a proof of concept, the technique was applied to an EPA dataset and showed some promise in its predictive capabilities, evaluated mostly by the coefficient of determination (R2 score). The trained models explained about 40-80% of the variance in the data, with only nitrogen reaching R2 scores near 70-80%. In a separate set of runs the hyperparameters were tuned in an attempt to improve the model training, but this yielded similar results. The final proof-of-concept experiment tested whether a model can be trained using two parameters to predict all the others in the set, with the goal of creating a tool to reduce the time and cost of sample collection. This experiment yielded similar results with slightly lower R2 scores, indicating that this approach can provide a means of substituting for some field work.
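The two-step idea described above (unsupervised SOM training followed by a supervised step, scored with R2) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SuSi implementation: the 4x4 grid, the decay schedules, the synthetic data, and the simplified supervised step (each node predicts the mean target of the training samples mapped to it) are all choices made for this example only.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to sample x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))

def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Unsupervised step: fit a rows x cols SOM and return {node: weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood weight
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])
    return weights

def fit_node_targets(weights, data, targets):
    """Simplified supervised step (assumption): node estimate = mean mapped target."""
    sums = {}
    for x, y in zip(data, targets):
        node = best_matching_unit(weights, x)
        total, count = sums.get(node, (0.0, 0))
        sums[node] = (total + y, count + 1)
    fallback = sum(targets) / len(targets)  # used for nodes that received no samples
    return {n: (sums[n][0] / sums[n][1] if n in sums else fallback) for n in weights}

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data: two scaled input parameters and one target to predict.
data_rng = random.Random(1)
data = [[data_rng.random(), data_rng.random()] for _ in range(200)]
targets = [x[0] + x[1] for x in data]

weights = train_som(data)
node_targets = fit_node_targets(weights, data, targets)
predictions = [node_targets[best_matching_unit(weights, x)] for x in data]
r2 = r2_score(targets, predictions)  # scored on the training data for brevity
```

In this sketch the score is computed on the training samples only; a realistic evaluation would hold out a test set, and the library's own estimators would replace the hand-written functions.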

In the final part of the project, SOM was applied to the PPL dataset. To improve the training process, metadata from the sampling expeditions were included in the data matrix. Both spatial and temporal information were included, and every profile data point was used in the training process rather than being averaged into single values. After the hyperparameters were tuned, the R2 scores of the models rose above 80%, the best predictive performance obtained. Multiple linear regression and random forest regression were also used to compare the machine learning and statistical models.
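One way to picture a data matrix that keeps every profile point and carries spatial and temporal metadata is sketched below. The field names, the one-hot site encoding, the cyclic day-of-year encoding, and every numeric value are hypothetical placeholders for illustration; they are not taken from the PPL dataset or from the dissertation's actual preprocessing.

```python
import math
from datetime import date

SITES = ["BH", "P", "LP"]  # sampling locations named in the abstract

# Hypothetical profiles: placeholder dates and values, not real measurements.
profiles = [
    {"site": "BH", "date": date(2000, 7, 1), "depth_cm": [0, 20, 40],
     "temp_c": [24.0, 23.5, 23.0], "do_mg_l": [4.0, 3.0, 1.0]},
    {"site": "LP", "date": date(2000, 8, 15), "depth_cm": [0, 20],
     "temp_c": [27.0, 26.0], "do_mg_l": [9.0, 7.5]},
]

def build_rows(profiles):
    """Flatten profiles into one row per depth reading, with metadata columns.

    Row layout: [is_BH, is_P, is_LP, sin(day), cos(day), depth_cm, temp_c, do_mg_l]
    """
    rows = []
    for p in profiles:
        one_hot = [1.0 if p["site"] == s else 0.0 for s in SITES]  # spatial metadata
        day = p["date"].timetuple().tm_yday
        angle = 2.0 * math.pi * day / 365.0
        season = [math.sin(angle), math.cos(angle)]  # temporal metadata, cyclic
        for i, depth in enumerate(p["depth_cm"]):
            # Keep each depth reading as its own row instead of a profile average.
            rows.append(one_hot + season + [float(depth), p["temp_c"][i], p["do_mg_l"][i]])
    return rows

rows = build_rows(profiles)
```

Each row could then be scaled and passed to a SOM or to a regression model; keeping depth as a column is what lets individual profile points be used without averaging.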