Date of Award

Spring 6-2026

Document Type

Thesis

Degree Name

Master of Science (MS)

Department/Program

Digital Forensics and Cybersecurity

Language

English

First Advisor or Mentor

Hunter Johnson

Second Reader

Fatma Najar

Third Advisor

Shweta Jain

Abstract

The rapid adoption of Large Language Models (LLMs) in software development has transformed coding practices by enabling automated code generation, completion, and optimization. Despite these advantages, concerns persist regarding the security and reliability of LLM-generated code. This study presents a comprehensive evaluation of both the functional correctness and security of code produced by three prominent LLMs as of early 2026. A total of 4,800 code snippets were generated using 100 security-focused programming prompts derived from the OWASP Top 10:2025, translated across eight natural languages and two phrasing styles (literal and natural developer-oriented prompts). To assess performance, a multi-stage experimental framework was developed, incorporating automated code generation, syntax auditing, and cross-model evaluation. Each code sample was scored for correctness and security using LLM-based evaluators, with additional classification of vulnerabilities based on OWASP risk categories and Common Weakness Enumeration (CWE) root causes. The findings reveal a consistent and significant “security-correctness gap,” where LLMs frequently produce functionally correct code that contains exploitable vulnerabilities. Statistical analysis indicates only a moderate correlation between correctness and security, reinforcing that functional success does not imply safety.

