Dissertations, Theses, and Capstone Projects

Date of Degree

2-2023

Document Type

Dissertation

Degree Name

Ph.D.

Program

Computer Science

Advisor

Saptarshi Debroy

Committee Members

Raffi Khatchadourian

Kaliappa Ravindran

Prasad Calyam

Abstract

Data-intensive science applications in areas such as high-energy and particle physics, bioinformatics, genomics, and healthcare are increasingly using distributed resources for computation and storage. For distributed systems, such applications can be represented in terms of: i) the involved data (e.g., sensitive healthcare data requiring HIPAA compliance) and ii) the different lifecycle stages of processing, transfer, and storage that the data goes through (i.e., the application workflow, which is typically represented as a directed acyclic graph). In most cases, such applications have specific performance (i.e., in terms of compute/networking/storage resources) and security requirements for both the data and workflow stages. Hence, more often than not, applications have to rely on cloud, edge, and high-performance computing (HPC) resources residing outside the boundaries of their home network or data generation site. However, such remote network domains (enterprise or institutional) have their own resource usage, security, and cost policies/rules that these multi-domain applications must comply with. This research seeks to examine the interactions and inter-conflicts among application requirements (both performance and security) and domain policies. The dissertation also explores how distributed system management strategies should consider such interactions and inter-conflicts towards efficient and effective resource management. In particular, the contributions of this dissertation are: i) a resource broker framework for data-intensive workflow management in distributed data centers; ii) trustworthy resource management within volunteer edge computing (VEC) systems; iii) trustworthy healthcare data brokering between data users and data custodians; and iv) loss-of-availability risk mitigation for data-intensive applications within distributed environments.
This dissertation advances current knowledge of how to balance the performance and security requirements of data-intensive applications when allocating resources across distributed environments, so as to reduce turnaround times in a secure and policy-compliant manner. Data-intensive application communities can benefit from the proposed approaches and use them to augment the state of the art.

This work is embargoed and will be available for download on Saturday, February 01, 2025
