Date of Degree
Lisa M. Brown
image super-resolution, image inpainting, image captioning, machine learning, deep learning, self-guiding
Image enhancement has drawn increasingly attention in improving image quality or interpretability. It aims to modify images to achieve a better perception for human visual system or a more suitable representation for further analysis in a variety of applications such as medical imaging, remote sensing, and video surveillance. Based on different attributes of the given input images, enhancement tasks vary, e.g., noise removal, deblurring, resolution enhancement, prediction of missing pixels, etc. The latter two are usually referred to as image super-resolution and image inpainting (or completion).
Image super-resolution and completion are numerically ill-posed problems. Multi-frame-based approaches make use of the presence of aliasing in multiple frames of the same scene. For cases where only one input image is available, it is extremely challenging to estimate the unknown pixel values. In this dissertation, we target at single image super-resolution and completion by exploring the internal statistics within the input image and across scales. An internal gradient similarity-based single image super-resolution algorithm is first presented. Then we demonstrate that the proposed framework could be naturally extended to accomplish super-resolution and completion simultaneously. Afterwards, a hybrid learning-based single image super-resolution approach is proposed to benefit from both external and internal statistics. This framework hinges on image-level hallucination from externally learned regression models as well as gradient level pyramid self-awareness for edges and textures refinement. The framework is then employed to break the resolution limitation of the passive microwave imagery and to boost the tracking accuracy of the sea ice movements. To extend our research to the quality enhancement of the depth maps, a novel system is presented to handle circumstances where only one pair of registered low-resolution intensity and depth images are available. High quality RGB and depth images are generated after the system. Extensive experimental results have demonstrated the effectiveness of all the proposed frameworks both quantitatively and qualitatively.
Different from image super-resolution and completion which belong to low-level vision research, image captioning is a high-level vision task related to the semantic understanding of an input image. It is a natural task for human beings. However, image captioning remains challenging from a computer vision point of view especially due to the fact that the task itself is ambiguous. In principle, descriptions of an image can talk about any visual aspects in it varying from object attributes to scene features, or even refer to objects that are not depicted and the hidden interaction or connection that requires common sense knowledge to analyze. Therefore, learning-based image captioning is in general a data-driven task, which relies on the training dataset. Descriptions in the majority of the existing image-sentence datasets are generated by humans under specific instructions. Real-world sentence data is rarely directly utilized for training since it is sometimes noisy and unbalanced, which makes it ‘imperfect’ for the training of the image captioning task. In this dissertation, we present a novel image captioning framework to deal with the uncontrolled image-sentence dataset where descriptions could be strongly or weakly correlated to the image content and in arbitrary lengths. A self-guiding learning process is proposed to fully reveal the internal statistics of the training dataset and to look into the learning process in a global way and generate descriptions that are syntactically correct and semantically sound.
Xian, Yang, "Exploring the Internal Statistics: Single Image Super-Resolution, Completion and Captioning" (2017). CUNY Academic Works.