Dissertations, Theses, and Capstone Projects

Date of Degree

2-2024

Document Type

Dissertation

Degree Name

Ph.D.

Program

Computer Science

Advisor

Yingli Tian

Committee Members

Zhigang Zhu

Ioannis Stamos

Hassan Akbari

Subject Categories

Artificial Intelligence and Robotics | Other Computer Sciences

Keywords

Computer Vision, Pattern Recognition, Video Analysis, Scene Understanding, Action Detection, Action Recognition

Abstract

Understanding human actions in videos holds immense potential for technological advancement and societal benefit. This thesis explores fundamental aspects of this field, including action recognition in trimmed clips and action detection in untrimmed videos. Trimmed videos contain a single action instance, with the moments before and after the action removed. In contrast, most videos captured in unconstrained environments, often referred to as untrimmed videos, are naturally unsegmented: they are typically lengthy and may contain multiple action instances, the moments preceding and following each action, and transitions between actions. For action recognition in trimmed clips, the primary objective is to classify the action category. Action detection in untrimmed videos, by contrast, aims to accurately identify the start and end of each action while also assigning the corresponding action label.

Action understanding in videos has significant implications across many sectors. It is invaluable in surveillance for identifying potential threats and in healthcare for monitoring patient movements. Importantly, it also serves as an indispensable tool for interpreting sign language, facilitating communication with the deaf and hard-of-hearing community.

This research presents novel frameworks for video-based action recognition and detection. Because annotating temporal boundaries and action labels for every action instance in untrimmed videos is labor-intensive and expensive, this work introduces frameworks that rely on limited supervision, mitigating the need for exhaustive annotations. The proposed models demonstrate significant performance improvements over the current state of the art on benchmark datasets. Furthermore, the application of action understanding to sign language videos is explored by pioneering the automated detection of signing errors, and the effectiveness of the models is evaluated on the collected sign language datasets.