Dissertations, Theses, and Capstone Projects

Date of Degree

6-2023

Document Type

Dissertation

Degree Name

Ph.D.

Program

Computer Science

Advisor

Sven Dietrich

Committee Members

Robert Haralick

Ping Ji

Patrick Amon

Subject Categories

Information Security | Other Computer Sciences | Programming Languages and Compilers | Software Engineering

Keywords

Binary analysis, Code Clones and Reuse, Static analysis, Intermediate Representation and Compilers, Binary Abstraction, Binary Transformation

Abstract

Binary analysis allows researchers to examine how programs are constructed and how they will impact an underlying system. The various analysis techniques allow the determination of code authorship, reuse, and similarity. Detecting code reuse is significant because reused code can carry vulnerabilities and security issues from one software project to another. In this work, we examine techniques that can aid in binary analysis, especially those that abstract and transform binaries, so that they can be compared across compilation environments, including possible changes in compiler version, hardware architecture, and compilation options. Historically, this has been difficult to accomplish since changing the compilation environment can often lead to changes in the binary’s structure and layout.

First, we examine the concept of the binary executable and how it can be represented. We explore the different levels of abstraction, the information available at each level, and how that information can be used. We also discuss the purposes and types of Intermediate Representations, how they can be used, and how they relate to each other. With this, we can define the possible relationships between binary executables.

Next, we discuss how transformation techniques and abstraction make it possible to compare binary executables compiled in different compilation environments. We do this by defining transformation functions for each stage of an executable’s life cycle, specifying the operations used to transform the binary into a different form. We then survey the tools that can be used to implement those functions.

Finally, we introduce a prototype abstraction language, μ-IR, designed to represent segments of binary executables in an environment-independent fashion so that they can be analyzed and compared. Through several case studies of model and real-world binary executables, we show that μ-IR can be used to represent and compare segments of binary executables compiled in different environments. We use multiple distance metrics to compare the μ-IR representations of binary executables from different compilation environments and compare them to the way that the state-of-the-art represents those same segments. Our method shows an improvement over the existing state-of-the-art when paired with the appropriate distance metric.

This work is embargoed and will be available for download on Thursday, January 02, 2025

Graduate Center users:
To read this work, log in to your GC ILL account and place a thesis request.

Non-GC Users:
See the GC’s lending policies to learn more.
