Artificial Intelligence Ratings of Facial Feature Similarity as Predictors of Eyewitness Performance
Date of Award
1-2027
Document Type
Thesis
Degree Name
Master of Arts (MA)
Department/Program
Forensic Psychology
Language
English
First Advisor or Mentor
Steve Penrod
Second Reader
Dilhan Töredi
Third Advisor
Jennifer Dysart
Abstract
Eyewitness identification plays an important role in the criminal justice system, yet it is often prone to error. Diagnostic Feature-Detection (DFD) theory suggests that lineup fillers should match the eyewitness’s description of the suspect but differ in other ways (Wixted & Mickes, 2014). According to this theory, people remember faces using multiple features (e.g., age, race, face shape), some of which are useful for recognition and some of which are not. This approach has been shown to improve discrimination between guilty and innocent suspects (e.g., Colloff et al., 2021), but it also increases identification rates for both. Although the increase is usually larger for guilty suspects, the rise in innocent-suspect identifications is still a concern. Identifying which features are most useful could help improve lineup construction by allowing fillers to match the suspect on some features but differ on others, which may increase correct identifications without increasing false ones. We explored this by using human similarity judgements, Betaface (an AI tool that gives similarity judgements for face pairings), and ChatGPT (a widely used large language model) to produce similarity ratings between filler faces and suspect faces for each facial feature, and then assessed whether these ratings predict choosing behavior. Betaface provides perceptual similarity ratings similar to those of humans when developing fair lineups (Lee, Mansour, & Penrod, 2021), but comparing ChatGPT ratings with human perceptual ratings is a novel exploration. Thus, the study used a single-factor, within-participants design with three sources of perceptual similarity ratings (human, Betaface, ChatGPT). The findings provided limited and inconsistent support for the role of individual facial features.
In target-present lineups, featural similarity did not predict choosing behavior under any measurement method, with the exception of a single effect for human ratings, where greater forehead similarity was associated with increased choosing. In target-absent lineups, Betaface ratings showed that greater similarity to the target in the nose, ears, and mouth was associated with increased choosing, whereas no such effects were observed for human or ChatGPT ratings. Additionally, similarity to the innocent suspect predicted choosing in target-absent lineups, such that greater similarity in the eyes and chin (Betaface) and eyebrows (human ratings) was associated with increased filler selection, while ChatGPT ratings again showed no significant effects. Finally, there were no significant differences between human, Betaface, and ChatGPT ratings across the dependent measures, indicating that all three showed similar overall patterns, even though featural similarity did not strongly predict choosing behavior.
Recommended Citation
Bugajczyk, Agata K., "Artificial Intelligence Ratings of Facial Feature Similarity as Predictors of Eyewitness Performance" (2027). CUNY Academic Works.
https://academicworks.cuny.edu/jj_etds/383
