Date of Degree
software engineering, software evolution, source code analysis, automated refactoring, empirical studies
Recent years have witnessed an explosion of work on Big Data. Data-intensive applications analyze and produce large volumes of data typically terabyte and petabyte in size. Many techniques for facilitating data processing are integrated into data-intensive applications. API is a software interface that allows two applications to communicate with each other. Streaming APIs are widely used in today's Object-Oriented programming development that can support parallel processing. In this dissertation, an approach that automatically suggests stream code run in parallel or sequentially is proposed. However, using streams efficiently and properly needs many subtle considerations. The use and misuse patterns for stream codes are proposed in this dissertation. Modern software, especially for highly transactional software systems, generates vast logging information every day. The huge amount of information prevents developers from receiving useful information effectively. Log-level could be used to filter run-time information. This dissertation proposes an automated evolution approach for alleviating logging information overload by rejuvenating log levels according to developers' interests. Machine Learning (ML) systems are pervasive in today's software society. They are always complex and can process large volumes of data. Due to the complexity of ML systems, they are prone to classic technical debt issues, but how ML systems evolve is still a puzzling problem. This dissertation introduces ML-specific refactoring and technical debt for solving this problem.
Tang, Yiming, "Towards Automated Software Evolution of Data-Intensive Applications" (2021). CUNY Academic Works.