Dissertations, Theses, and Capstone Projects

Date of Degree

2-2025

Document Type

Dissertation

Degree Name

Ph.D.

Program

Computer Science

Advisor

Chia-Ling Tsai

Committee Members

Sos Agaian

Ioannis Stamos

Chao Chen

Keywords

medical image segmentation, deep learning, weakly supervised semantic segmentation, multimodal

Abstract

Accurate segmentation of medical images is essential for clinical diagnosis, enabling precise measurement, monitoring, and analysis of anatomical structures. While advanced models have achieved impressive per-pixel performance under fully supervised settings, ensuring fine-scale structural completeness remains a challenge, particularly for complex biomedical structures where coherence and integrity are critical. Fully supervised methods also face inherent limitations due to their reliance on extensive pixel-level annotations. These annotations are labor-intensive, costly, and often suffer from low inter-expert agreement, especially in cases such as optical coherence tomography (OCT) images, where small lesions and unclear boundaries make precise labeling difficult. This underscores the challenge of acquiring large, high-quality annotated datasets for medical image segmentation. This dissertation addresses these challenges through four progressive methods. The first two focus on fully supervised 3D segmentation, introducing topological supervision to enhance structural accuracy while keeping computational costs low. To reduce dependence on intensive annotations, the latter two methods transition to weakly supervised semantic segmentation (WSSS) using image-level labels only, enriching the training process with structural guidance and text-driven strategies.

In the initial phase, we focus on improving structural accuracy in fully supervised settings by introducing topological loss functions that leverage topological information across neighboring slices. To better capture complex medical structures in 3D anisotropic images, we propose the Topological Attention ConvLSTM Network (TACLNet), which facilitates the exchange of structural information across slices. Because detailed pixel-level annotations become a limiting factor, the research then extends to WSSS, which relies only on image-level labels. A novel anomaly-guided mechanism (AGM) is introduced for retinal OCT segmentation, integrating weak abnormal signals with global contextual information. To further enhance WSSS, we incorporate structural features to guide lesion localization and employ large vision-language models for text-driven strategies, including label-informed guidance and synthetic descriptive integration. This multimodal architecture significantly boosts segmentation performance while reducing reliance on labor-intensive pixel-level annotations.
