Dissertations, Theses, and Capstone Projects
Date of Degree
2-2026
Document Type
Doctoral Dissertation
Degree Name
Doctor of Philosophy
Program
Computer Science
Advisor
Jun Li
Committee Members
Ping Ji
Liang Zhao
Xian Teng
Subject Categories
Databases and Information Systems | Systems Architecture | Theory and Algorithms
Keywords
Interactive Visualization, Uncertainty Visualization, Approximate Algorithm, Data Structure
Abstract
Modern datasets continue to grow in size, dimensionality, and heterogeneity, creating increasing tension between the need for responsive, interactive analysis and the computational cost of accessing, aggregating, and visualizing large volumes of data. Traditional database engines and visualization tools often assume that full data retrieval is feasible or that exact computation is necessary for meaningful insight. In practice, however, analysts frequently benefit more from timely, uncertainty-aware approximations than from delayed, exact results. This thesis investigates how data summarization techniques, specifically mergeable sketches, can be combined with progressive, out-of-core visualization methods to support interactive exploration of datasets that exceed main memory.
The first contribution of this thesis is the design and theoretical analysis of a mergeable data sketch tailored for scalable visual analytics. Unlike sketches that are either not composable or lose accuracy when merged repeatedly, the proposed construction preserves formal approximation guarantees under merging while maintaining a compact memory footprint. This formulation allows sketch summaries to be built independently across data chunks and then combined efficiently without compromising error bounds. Together with an out-of-core indexing mechanism, the sketch supports multi-resolution summarization, enabling rapid coarse estimates that can be refined progressively.
Building on this foundation, the second contribution is an out-of-core visualization engine that uses the sketches within a two-level spatio-temporal indexing framework. The engine streams data from disk, incrementally constructs and merges summaries at multiple granularities, and serves the approximate results to the front end. As a result, users can interact with datasets far larger than memory, receiving immediate but approximate feedback that is refined over time. The system explicitly encodes approximation uncertainty and communicates it to users alongside the approximate results.
To examine the practical utility of this combined approach, we present a proof-of-concept application demonstrating sketch-based interaction for exploratory querying. The prototype integrates the multi-resolution index with a front end that supports real-time spatio-temporal querying, approximate matching, and real-time refinement. Through a staged evaluation, including performance measurements, qualitative analysis of system behavior, and scenario-based demonstrations, the proof-of-concept system illustrates how mergeability, progressive refinement, and uncertainty-aware visualization jointly support fluid exploration when the underlying dataset is very large.
Experiments show that the sketch construction scales nearly linearly with data size, preserves accuracy under deep merging, and enables query latencies compatible with interactive use. The indexing scheme further reduces disk access overhead by prioritizing partitions most relevant to the user’s current view. Collectively, these results demonstrate that principled sketch design and system-level engineering can substantially narrow the gap between large-scale data processing and interactive analytics.
The thesis concludes by discussing the remaining challenges, including adaptive parameter selection, refinement prioritization, support for heterogeneous data types, and the need for formal user studies on progressive and uncertainty-aware interfaces. The work establishes a foundation for future research at the intersection of approximate computation, out-of-core analytics, and human-centered visualization. By integrating mergeable sketches with a progressive visualization engine, this thesis provides both theoretical and practical contributions toward scalable, interpretable, and interactive data exploration.
Recommended Citation
Huang, Xueqi, "Online Visual Query System for Real-Time Large-Scale Spatio-Temporal Data Explorations with Error Bounds" (2026). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/6603
