Date of Degree


Document Type


Degree Name



Computer Science


Ioannis Stamos

Committee Members

Ioannis Stamos

Zhigang Zhu

Lei Xie

Yin Cui

Subject Categories

Artificial Intelligence and Robotics


Keywords

3D Object Detection, Instance Segmentation, Frustum, VoxNet, 3D CNN


We address the problem of 3D object detection and instance segmentation by proposing a novel object segmentation and detection system. First, we detect 2D objects in RGB, depth-only, or RGB-D images. We then propose a 3D convolution-based system, named Frustum VoxNet. This system 1) generates frustums from the 2D detection results, 2) proposes 3D candidate voxelized images for each frustum, and 3) uses a 3D convolutional neural network (CNN) on these candidate voxelized images to perform 3D instance segmentation and object detection. Although the volumetric data representation is widely used for 3D object classification, fewer works address 3D object detection based on this representation. Volumetric representations have several advantages over raw point clouds. First, they naturally support convolution and deconvolution operations, which play essential roles in object classification and segmentation tasks. Second, the memory requirements of this representation do not increase when denser point clouds are collected, so the computational complexity of the system is unaffected. Third, inference results are stable because sub-sampling of the input data is unnecessary, whereas methods that rely on non-voxelized point clouds must sub-sample the input data due to complexity limitations. Results on the SUNRGB-D dataset show that our RGB-D based system achieves better detection results on several categories, and its inference speed is much faster than that of state-of-the-art methods. At the same time, our depth-only system achieves results comparable to those of RGB-D based systems. Our improved system achieves excellent 3D instance segmentation results, and performing 3D object detection on the instance segmentation results further improves detection performance.
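The fixed-memory property of the volumetric representation can be illustrated with a minimal sketch. This is not the Frustum VoxNet implementation: the function name `voxelize`, the binary occupancy encoding, the axis-aligned bounds, and the grid size are all assumptions chosen for illustration. The sketch only shows that the voxel grid has the same size regardless of how many points are collected.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def voxelize(points: Sequence[Point],
             lo: Point,
             hi: Point,
             dims: Tuple[int, int, int]) -> List[int]:
    """Map a point cloud to a flat binary occupancy grid (hypothetical helper).

    The grid always holds dims[0] * dims[1] * dims[2] cells, so its memory
    footprint does not depend on the number of input points.
    Points outside the half-open box [lo, hi) are ignored.
    """
    nx, ny, nz = dims
    grid = [0] * (nx * ny * nz)
    for p in points:
        idx = []
        for axis in range(3):
            # Normalized position of the point along this axis, in [0, 1).
            t = (p[axis] - lo[axis]) / (hi[axis] - lo[axis])
            if t < 0.0 or t >= 1.0:
                break  # point lies outside the volume
            idx.append(int(t * dims[axis]))
        else:
            i, j, k = idx
            grid[(i * ny + j) * nz + k] = 1  # mark the voxel as occupied
    return grid


# Assumed toy data: a sparse cloud and a much denser cloud of the same scene.
sparse_cloud = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)]
dense_cloud = sparse_cloud * 1000 + [(2.0, 0.0, 0.0)]  # last point is out of bounds

sparse_grid = voxelize(sparse_cloud, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (4, 4, 4))
dense_grid = voxelize(dense_cloud, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (4, 4, 4))
```

Both grids contain 64 cells even though the second cloud has roughly a thousand times more points, which is why a network operating on the grid needs no sub-sampling step.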