Keywords: Deep Neural Networks, Edge/Cloud Inference
The massive growth in the availability of real-world data from connected devices and the overwhelming success of Deep Neural Networks (DNNs) in many Artificial Intelligence (AI) tasks have enabled AI-based applications and services to become commonplace across the spectrum of computing devices, from edge/Internet-of-Things (IoT) devices to data centers and the cloud. However, DNNs incur high computational costs (compute operations, memory footprint, and bandwidth), which far outstrip the capabilities of modern computing platforms. Therefore, improving the computational efficiency of DNNs is critical to their widespread commercial deployment and success.

In this thesis, we address the computational efficiency challenge in the context of AI inference applications executing on edge/cloud systems, where conventionally data is sensed at the edge and then transmitted wirelessly to the cloud for inference. Edge devices are typically battery-driven and hence energy constrained. In always-on applications (e.g., remote surveillance), transmitting the large amount of sensed data to the cloud incurs a significant energy penalty on the edge device. Further, limited network availability may prohibit sustaining the high transmission rate required to meet real-time constraints. Embedding intelligence completely on the edge device itself to eliminate continuous data transmission is also not a viable solution, due to the high computational cost of DNNs.

A promising approach to addressing the aforementioned computational challenge is partitioned edge/cloud inference, wherein the inference application is split between the edge and the cloud.
A key intuition behind this idea is the observation that, in many applications, although a large amount of data is sensed and processed, only a small fraction of it is eventually important to the end application. The key is to embed limited intelligence (with low computational cost) on the edge device, such that uninteresting and potentially easy-to-classify instances are filtered at the edge, while only the interesting ones are transmitted to the cloud for more sophisticated processing.

To this end, the thesis presents the concept and design of Adaptive Effort Classifiers, a new approach to designing AI inference classifiers in the context of partitioned edge/cloud inference. Adaptive effort classifiers are designed with the ability to modulate the degree of effort they expend based on the inherent difficulty of the input. They comprise a chain of classifier stages that progressively grow in complexity and energy; based on a confidence threshold, each stage either classifies the input or passes it on to the next stage. Easy (or uninteresting) inputs are classified in the initial stages with very low effort, whereas harder inputs (which are of interest to the application) progress through the classifier chain and are classified by the more complex final stages. In our system design, the initial classifier stages are executed on the edge device, whereas the final stages are executed on the cloud, and the class probability is used as a confidence threshold to tune the aggressiveness of classification at each stage.
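As a rough illustration (not the thesis's actual implementation), the confidence-thresholded classifier chain can be sketched as follows; the toy linear stages, the 0.9 threshold, and the 10-class setup are all hypothetical stand-ins for an edge stage and a cloud stage.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cascade_predict(x, stages, thresholds):
    """Run an input through classifier stages of increasing complexity.

    If the top class probability at a stage meets that stage's confidence
    threshold, the input is classified there (early exit); otherwise it is
    passed on to the next, costlier stage. The final stage always classifies.
    Returns (predicted_label, index_of_stage_that_classified).
    """
    for i, stage in enumerate(stages[:-1]):
        probs = softmax(stage(x))
        if probs.max() >= thresholds[i]:
            return int(probs.argmax()), i
    probs = softmax(stages[-1](x))
    return int(probs.argmax()), len(stages) - 1

# Hypothetical stages: a cheap "edge" stage and a costlier "cloud" stage.
rng = np.random.default_rng(0)
W_edge = rng.normal(size=(10, 4))
W_cloud = rng.normal(size=(10, 4))
edge_stage = lambda x: 0.5 * (W_edge @ x)   # low effort, soft logits
cloud_stage = lambda x: 5.0 * (W_cloud @ x) # high effort, sharp logits

x = rng.normal(size=4)
label, stage_used = cascade_predict(x, [edge_stage, cloud_stage], [0.9])
```

In a partitioned deployment, the first stage would run on the edge device, and only inputs that fail the confidence check would be transmitted to the cloud for the later stages.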
The thesis proposes two strategies to design adaptive effort classifiers: (i) classifiers with a progressive feature set, wherein each stage in the classification chain uses a progressively larger number of features, and (ii) classifiers with progressive data bit-width, wherein the bit-width used for data representation is modulated across classifier stages to scale complexity and accuracy. We build adaptive effort versions of DNNs trained on two popular datasets, viz. the MNIST dataset for handwritten digit recognition and the CIFAR-10 dataset for object recognition, and demonstrate a 3.44×–11.29× improvement in compute operations with no loss in accuracy.
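To make the progressive bit-width idea concrete, here is a minimal sketch (an assumption of the general technique, not the thesis's specific scheme) of uniform quantization at two bit-widths, where early stages would use the coarser representation and later stages the finer one:

```python
import numpy as np

def quantize(x, bits, x_min=-1.0, x_max=1.0):
    """Uniformly quantize values in [x_min, x_max] to the given bit-width."""
    levels = 2 ** bits - 1
    xc = np.clip(x, x_min, x_max)
    codes = np.round((xc - x_min) / (x_max - x_min) * levels)
    return codes / levels * (x_max - x_min) + x_min

x = np.linspace(-1.0, 1.0, 9)
x4 = quantize(x, 4)              # coarse: early, low-effort stage
x8 = quantize(x, 8)              # fine: later, high-effort stage
err4 = np.abs(x - x4).max()      # larger quantization error
err8 = np.abs(x - x8).max()      # smaller error, but higher data cost
```

A wider bit-width shrinks the quantization error (and so supports higher accuracy) at the cost of more data movement and compute, which is exactly the complexity/accuracy knob the classifier chain modulates across stages.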
Sankar, Divya, "Adaptive Effort Classifiers: A System Design For Partitioned Edge/Cloud Inference" (2020). CUNY Academic Works.