Dissertations, Theses, and Capstone Projects

Date of Degree

6-2024

Document Type

Dissertation

Degree Name

Ph.D.

Program

Computer Science

Advisor

Zhigang Zhu

Committee Members

Hao Tang

Jie Gong

Jie Wei

William H. Seiple

Subject Categories

Artificial Intelligence and Robotics | Disability Studies | Other Computer Sciences

Keywords

Computer Vision, Context Understanding, Deep Learning, Knowledge Graph, Object Detection

Abstract

Contextual information has been widely used in many computer vision tasks, such as object detection, video action detection, and image classification. Recognizing a single object or action out of context can be very challenging, and contextual information can greatly improve the understanding of a scene or an event. However, existing approaches design task-specific contextual mechanisms for different detection tasks.

In this research, we first present a comprehensive survey of context understanding in computer vision, with a taxonomy that describes context at different types and levels. We then propose MultiCLU, a new multi-stage context learning and utilization framework, which is applied to storefront accessibility detection and evaluation. MultiCLU has four stages: Context in Labeling (CIL), Context in Training (CIT), Context in Detection (CID), and Context in Evaluation (CIE). Our experimental results show that the proposed framework achieves significantly better performance than the baseline detector. As the fourth stage, we further design a new evaluation criterion for the storefront accessibility dataset, which offers a new way to think about evaluation standards in real-world applications. For better data collection and model refinement, we also deploy the MultiCLU storefront detection engine in a smart DoorFront platform to collect new data and refine the deep learning models.

Furthermore, we generalize MultiCLU into the GMC framework, a general framework for multi-stage context learning and utilization that can be applied to various current deep learning models for different visual detection tasks. The GMC framework incorporates three major components (corresponding to the first three stages of MultiCLU for storefronts): local context representation, semantic context fusion, and spatial context reasoning. All three components can be easily added to or removed from a standard object detector, which is demonstrated on a number of object recognition tasks, including storefront accessibility detection and City Pedestrian detection. The GMC framework is further extended to semantic segmentation tasks such as panoptic segmentation, where it proves to be both straightforward and effective. The outcomes of this research provide a generalized approach for streamlining context learning in real-world applications, applying it at various stages of processing more flexibly and adapting it to different tasks more efficiently.
