Dissertations, Theses, and Capstone Projects

Date of Degree

9-2024

Document Type

Dissertation

Degree Name

Ph.D.

Program

Physics

Advisor

Ariyeh Maller

Advisor

Viviana Acquaviva

Committee Members

Rachel Somerville

Alyson Brooks

Charles Liu

Subject Categories

Artificial Intelligence and Robotics | Data Science | External Galaxies | Physical Processes

Keywords

galaxies, hydrodynamical simulations, semi-analytic models, machine learning, physical models, galaxy sizes

Abstract

Galaxies are the breathtakingly beautiful starry islands of the Universe. Galaxy formation transforms the simple initial conditions of the early Universe into the complex galaxy structures we observe today. Because it spans an immense range of spatial and temporal scales, from the vastness of the Universe down to individual stars, the physics of galaxy formation is both complex and crucial for understanding the Universe we live in. However, despite significant advances, our theoretical understanding of galaxy formation remains incomplete.

In the era of big data from hydrodynamical simulations and observations, Machine Learning algorithms are well equipped to identify complex relationships such as those in galaxy formation. In this thesis, I propose a workflow that uses Machine Learning first to learn the relationship between a set of input features and a target quantity, and then to uncover the underlying physics of the system by searching for the analytical equation that best describes this relationship. I apply this pipeline to the long-standing puzzle of galaxy sizes in galaxy formation theory, where the standard framework holds that the sizes of disk galaxies are set by the conservation of their parent halo’s angular momentum (Mo et al., 1998). First, I check that the problem is well posed by verifying that Machine Learning models can predict galaxy sizes given all the input features at our disposal. Then, I use a feature ranking analysis to extract the minimum set of variables needed to predict galaxy sizes. Finally, I use this minimum set of features to search for the physical model with Symbolic Regression, a Machine Learning algorithm that searches for the analytical equation that best describes the relationship between input and output.
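For reference, the angular-momentum framework cited above can be written in its simplest form as follows. This sketch assumes a singular isothermal halo and neglects the disk's self-gravity. The full Mo et al. (1998) treatment adds correction factors for halo concentration and for the contraction of the halo by the disk. Here $R_d$ is the exponential disk scale length, $\lambda$ the halo spin parameter, $r_{200}$ the halo virial radius, and $j_d$ and $m_d$ the fractions of the halo's angular momentum and mass that end up in the disk:

```latex
R_d = \frac{1}{\sqrt{2}} \left( \frac{j_d}{m_d} \right) \lambda \, r_{200}
```

A disk that retains the specific angular momentum of its halo ($j_d = m_d$) therefore has $R_d = \lambda \, r_{200} / \sqrt{2}$. In this simplified picture, the disk size scales linearly with both the spin parameter and the halo size.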

As a check of the viability of this pipeline, in the first part of this dissertation I use data from a semi-analytical model, where the physical model derived from the data can be compared to the ground truth. I find that the Machine Learning approach described above recovers the known physical model that sets galaxy sizes in the Santa Cruz semi-analytical model. With this successful proof of concept, in the second part of this thesis I search for a physical model for galaxy sizes in the state-of-the-art IllustrisTNG hydrodynamical simulations, which represent the real Universe and its complexities more realistically. In this case we do not know which physical processes set galaxy sizes, and therefore we have no ground truth to compare our physical model with. I find that both the main drivers of galaxy sizes and the ‘physical model’ extracted from the hydrodynamical simulations differ from the theoretical framework of Mo et al. (1998) and from those found in the semi-analytical model case.
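A toy sketch can illustrate the proof-of-concept logic: recover a known model before trusting the pipeline on data with no ground truth. Everything below is an illustrative assumption, not the thesis code:

- The halo properties are synthetic.
- The hidden "ground truth" is the simplified relation R_d = λ R_vir / √2.
- An ordinary least-squares fit in log space stands in for the Machine Learning model.
- Permutation importance provides the feature ranking.
- A power-law fit on the top-ranked features stands in for Symbolic Regression. A real symbolic regression searches over expression trees instead of fixing the functional form in advance.

All function and variable names (`run_pipeline`, `spin`, `r_vir`, and so on) are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit of y = X @ w + b; returns the array [w..., b]."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return X @ coef[:-1] + coef[-1]


def r2_score(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def permutation_importance(coef, X, y, rng):
    """Increase in mean squared error when each feature column is shuffled."""
    base = np.mean((y - predict(coef, X)) ** 2)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        scores[j] = np.mean((y - predict(coef, X_perm)) ** 2) - base
    return scores


def run_pipeline(n=2000, n_keep=2, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical synthetic halo catalogue (not simulation data).
    spin = rng.lognormal(np.log(0.035), 0.5, n)   # halo spin parameter lambda
    r_vir = rng.lognormal(np.log(200.0), 0.3, n)  # virial radius [kpc]
    conc = rng.lognormal(np.log(10.0), 0.2, n)    # irrelevant by construction
    junk = rng.normal(size=n)                     # pure noise feature
    # Assumed ground truth: R_d = lambda * R_vir / sqrt(2), ~5% log-normal scatter.
    r_disk = spin * r_vir / np.sqrt(2.0) * rng.lognormal(0.0, 0.05, n)

    names = ["log_spin", "log_rvir", "log_conc", "junk"]
    X = np.column_stack([np.log10(spin), np.log10(r_vir), np.log10(conc), junk])
    y = np.log10(r_disk)
    half = n // 2

    # Step 1: is the problem well posed? Train on one half, score on the other.
    coef = fit_linear(X[:half], y[:half])
    score = r2_score(y[half:], predict(coef, X[half:]))

    # Step 2: rank features on held-out data and keep the top n_keep.
    importance = permutation_importance(coef, X[half:], y[half:], rng)
    order = np.argsort(importance)[::-1]
    keep = np.sort(order[:n_keep])

    # Step 3 (stand-in for Symbolic Regression): fit
    # log R_d = sum_k a_k log x_k + c, i.e. R_d = 10**c * prod_k x_k**a_k.
    coef_min = fit_linear(X[:, keep], y)
    return {
        "r2": score,
        "ranking": [names[j] for j in order],
        "kept": [names[j] for j in keep],
        "exponents": coef_min[:-1],
        "prefactor": 10.0 ** coef_min[-1],
    }


if __name__ == "__main__":
    result = run_pipeline()
    print(result["kept"], np.round(result["exponents"], 3), round(result["prefactor"], 3))
```

On this synthetic data, the retained features should be the spin and virial radius, with exponents close to 1 and a prefactor close to 1/√2. This is the sense in which the toy pipeline "recovers" the hidden model. On real simulation output no such check is available, which is why the ground-truth test comes first.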

This work is embargoed and will be available for download on Monday, March 31, 2025.

