Dissertations, Theses, and Capstone Projects

Date of Degree

9-2022

Document Type

Dissertation

Degree Name

Ph.D.

Program

Computer Science

Advisor

Jun Li

Committee Members

Ping Ji

Saptarshi Debroy

Amlan Nayak

Subject Categories

Digital Communications and Networking

Keywords

distributed computing, erasure coding, matrix multiplication

Abstract

Matrix multiplication is a fundamental building block of many machine learning models. As the input matrices may be too large to multiply on a single server, it is common to split them into sub-matrices and execute the multiplications on different servers. In a distributed infrastructure, however, it is common to observe stragglers, i.e., servers whose performance is at times significantly lower than that of the others. Compared to replicating each task on multiple servers, coded matrix multiplication, which combines coding-theoretic techniques with distributed matrix multiplication, can tolerate the same number of stragglers with far fewer servers. Recent years have witnessed rapid development of research in coded matrix multiplication: coding schemes have been proposed to achieve the optimal recovery threshold, communication cost, and computation cost; to leverage resources on stragglers; and to perform secure and private matrix multiplication. Coded matrix multiplication has also been applied to gradient descent and other machine learning models.
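To make the straggler-tolerance idea above concrete, the following is a minimal, illustrative sketch (not the dissertation's scheme) of coded matrix multiplication using a simple MDS code: A is split into k row blocks, encoded into n coded blocks with a Vandermonde matrix, and the product A @ B is decoded from the results of any k of the n workers, so up to n - k stragglers can be ignored. The block count k, worker count n, and matrix sizes here are arbitrary choices for the example.

    # Illustrative 1D MDS-coded matrix multiplication (assumed parameters).
    import numpy as np

    k, n = 3, 5                # k data blocks, n workers; tolerates n - k stragglers
    A = np.random.rand(6, 4)   # rows of A must split evenly into k blocks
    B = np.random.rand(4, 2)

    blocks = np.split(A, k)    # k row blocks of A
    # n x k Vandermonde encoding matrix; any k rows are invertible
    G = np.vander(np.arange(1, n + 1), k, increasing=True).astype(float)

    # Worker i computes (sum_j G[i, j] * A_j) @ B; any k results suffice.
    coded_results = [sum(G[i, j] * blocks[j] for j in range(k)) @ B
                     for i in range(n)]

    # Suppose workers 1 and 3 straggle; decode from the remaining k = 3 results.
    survivors = [0, 2, 4]
    coeffs = np.linalg.inv(G[survivors, :])   # invert the k x k sub-encoding matrix
    decoded = [sum(coeffs[j, m] * coded_results[survivors[m]] for m in range(k))
               for j in range(k)]
    C = np.vstack(decoded)

    assert np.allclose(C, A @ B)              # full product recovered from k workers

A replication-based scheme tolerating the same two stragglers would need each of the k tasks run on three servers (3k in total), whereas the coded scheme above uses only n = k + 2 servers.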

In this dissertation, we first propose a coding scheme for matrix chain multiplication with an arbitrary number of matrices, which completes the chain multiplication in a single round. We then present Spinner, a framework for coded distributed matrix multiplication that exploits the partial results of tasks on stragglers in a heterogeneous cluster. Finally, we introduce ε-local coding, which adjusts the encoding locality of the parity sub-tasks; it can leverage partial results returned by straggling nodes and achieves lower encoding complexity than existing coding schemes.
