Dissertations, Theses, and Capstone Projects
Date of Degree
6-2024
Document Type
Dissertation
Degree Name
Ph.D.
Program
Computer Science
Advisor
Zhigang Zhu
Committee Members
Hao Tang
Jie Gong
Jie Wei
William H. Seiple
Subject Categories
Artificial Intelligence and Robotics | Disability Studies | Other Computer Sciences
Keywords
Computer Vision, Context Understanding, Deep Learning, Knowledge Graph, Object Detection
Abstract
Contextual information has been widely used in many computer vision tasks, such as object detection, video action detection, and image classification. Recognizing a single object or action out of context can sometimes be very challenging, and contextual information can greatly improve the understanding of a scene or an event. However, existing approaches design task-specific contextual mechanisms for different detection tasks.
In this research, we first present a comprehensive survey of context understanding in computer vision, with a taxonomy that describes context in different types and at different levels. We then propose MultiCLU, a new multi-stage context learning and utilization framework, which is applied to storefront accessibility detection and evaluation. MultiCLU has four stages: Context in Labeling (CIL), Context in Training (CIT), Context in Detection (CID), and Context in Evaluation (CIE). Our experimental results show that the proposed framework achieves significantly better performance than the baseline detector. For the fourth stage, we further design a new evaluation criterion for the storefront accessibility dataset, which offers a new way to think about evaluation standards in real-world applications. For better data collection and model refinement, we also deploy the MultiCLU storefront detection engine in the smart DoorFront platform to collect new data and refine the deep learning models.
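To make the four-stage flow concrete, the following is a minimal, hypothetical Python skeleton of how the CIL, CIT, CID, and CIE stages could chain together; the data structures, function signatures, and placeholder bodies are illustrative assumptions, not the dissertation's implementation.

```python
# A hypothetical sketch of the four MultiCLU stages as a plain pipeline.
# Stage names follow the abstract; everything else (signatures, data
# structures, placeholder bodies) is assumed for illustration only.
from dataclasses import dataclass, field

@dataclass
class Sample:
    image_path: str
    boxes: list = field(default_factory=list)            # annotated object boxes
    context_labels: list = field(default_factory=list)   # e.g. spatial relations

def context_in_labeling(samples):
    """CIL: enrich raw annotations with contextual labels (e.g. object relations)."""
    for s in samples:
        s.context_labels = [f"relation({a}, {b})"
                            for a in s.boxes for b in s.boxes if a is not b]
    return samples

def context_in_training(samples):
    """CIT: train a detector whose objective also uses the contextual labels (placeholder)."""
    return {"detector": "trained-with-context", "num_samples": len(samples)}

def context_in_detection(model, image_path):
    """CID: run detection, then refine results using the learned context (placeholder)."""
    return [{"label": "door", "score": 0.9, "refined_by_context": True}]

def context_in_evaluation(detections, ground_truth):
    """CIE: score detections with a context-aware criterion instead of plain mAP (placeholder)."""
    return {"context_aware_score": min(1.0, len(detections) / max(1, len(ground_truth)))}

if __name__ == "__main__":
    data = [Sample("storefront_001.jpg", boxes=["door", "stairs"])]
    data = context_in_labeling(data)
    model = context_in_training(data)
    dets = context_in_detection(model, "storefront_002.jpg")
    print(context_in_evaluation(dets, ground_truth=["door"]))
```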
Furthermore, we generalize MultiCLU into GMC, a general framework for multi-stage context learning and utilization that can be applied to a variety of current deep learning models for different visual detection tasks. The GMC framework incorporates three major components (corresponding to the first three stages of MultiCLU for storefronts): local context representation, semantic context fusion, and spatial context reasoning. All three components can easily be added to or removed from a standard object detector, which we demonstrate on a number of object recognition tasks, including storefront accessibility detection and city pedestrian detection. The GMC framework is further extended to semantic segmentation tasks such as panoptic segmentation, where it proves both straightforward and effective. The outcomes of this research aim to provide a generalized approach to streamlining context learning in real-world applications, one that can be applied more flexibly at various stages of processing and adapted more efficiently to different tasks.
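As a rough illustration of this plug-in design, the sketch below wraps a generic detector head with the three GMC components as optional PyTorch modules; the module internals, tensor shapes, and the GMCHead class itself are assumptions made for this example and do not reproduce the dissertation's code.

```python
# A hypothetical sketch of the GMC plug-in idea: three optional context modules
# attached to a standard detector's features. All internals are illustrative.
import torch
import torch.nn as nn

class LocalContextRepresentation(nn.Module):
    """Mixes each location's features with its surroundings (toy 3x3 conv)."""
    def __init__(self, dim):
        super().__init__()
        self.mix = nn.Conv2d(dim, dim, kernel_size=3, padding=1)

    def forward(self, feat):                        # feat: (N, dim, H, W)
        return feat + self.mix(feat)

class SemanticContextFusion(nn.Module):
    """Fuses pooled class-embedding (semantic) context into the visual features."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.class_embed = nn.Embedding(num_classes, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, feat, class_ids):             # class_ids: (K,) detected classes
        sem = self.class_embed(class_ids).mean(dim=0)
        return feat + self.proj(sem).view(1, -1, 1, 1)

class SpatialContextReasoning(nn.Module):
    """Reasons over relations between candidate regions via toy self-attention."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, region_feats):                # (N, num_regions, dim)
        out, _ = self.attn(region_feats, region_feats, region_feats)
        return region_feats + out

class GMCHead(nn.Module):
    """Each component can be switched on or off around a standard detector head."""
    def __init__(self, dim, num_classes, use_local=True, use_semantic=True, use_spatial=True):
        super().__init__()
        self.local = LocalContextRepresentation(dim) if use_local else None
        self.semantic = SemanticContextFusion(dim, num_classes) if use_semantic else None
        self.spatial = SpatialContextReasoning(dim) if use_spatial else None

    def forward(self, feat, class_ids, region_feats):
        if self.local is not None:
            feat = self.local(feat)
        if self.semantic is not None:
            feat = self.semantic(feat, class_ids)
        if self.spatial is not None:
            region_feats = self.spatial(region_feats)
        return feat, region_feats

if __name__ == "__main__":
    head = GMCHead(dim=64, num_classes=5)
    feat = torch.randn(2, 64, 32, 32)               # backbone feature map
    class_ids = torch.tensor([0, 3])                # e.g. "door", "stairs"
    region_feats = torch.randn(2, 10, 64)           # pooled features of 10 regions
    out_feat, out_regions = head(feat, class_ids, region_feats)
    print(out_feat.shape, out_regions.shape)
```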
Recommended Citation
Wang, Xuan, "Context in Computer Vision: A Taxonomy, Multi-stage Integration, and a General Framework" (2024). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/5739