Dissertations, Theses, and Capstone Projects
Date of Degree
6-2026
Document Type
Master's Thesis
Degree Name
Master of Arts
Program
Linguistics
Advisor
Spencer Caplan
Keywords
Computational Linguistics, Classifiers, Language Models, NLP, Propaganda, Media
Abstract
Detecting propaganda is fundamentally difficult: effective propaganda must conceal persuasive intent while still achieving persuasive effect, so sophisticated propagandistic discourse necessarily occupies the ambiguous middle of a rhetorical spectrum rather than its obvious extremes. This thesis calls this property the dual constraint problem and argues that it poses two challenges for the binary classification framing that dominates propaganda detection in Natural Language Processing (NLP). First, propaganda by definition does not come with clear labels: no propaganda arm willingly accepts the designation, so reliable ground truth does not exist at the source. Second, because the producer is actively concealing intent, the surface features of any article, including the choice of topic itself, can be chosen adversarially to evade the very signals a binary classifier would rely on.
This thesis builds a framework for propaganda measurement that takes the estimated probabilities from a binary classification task and adjusts for topic, isolating propaganda-related components of text such as framing and narrative from subject matter. The framework is applied to the information environment surrounding the Islamic Republic of Iran, using two institutionally opposed outlets as training corpora: Press TV, the English-language channel of Islamic Republic of Iran Broadcasting, and Iran International, a Persian-language satellite channel positioned in opposition to the Iranian state.
A Latent Dirichlet Allocation topic model characterizes the topical structure of the two corpora, and inverse-probability weighting (IPW) debiases the resulting estimators. Eight classifiers, combining TF-IDF n-gram and sentence-transformer representations with and without IPW, are treated as alternative estimators of rhetorical alignment, yielding two measurement instruments: the Raw Propaganda Score (RPS) and the Topic-Adjusted Propaganda Score (TAPS).
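To make the role of inverse-probability weighting concrete, the following is a minimal sketch and not the thesis's implementation. It assumes each article is reduced to a single dominant topic and that weights take the stabilized form P(outlet) / P(outlet | topic); the abstract specifies neither choice. The function names `ipw_weights` and `weighted_topic_share`, the outlet and topic labels, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def ipw_weights(topics, outlets):
    """Stabilized IPW weights, P(outlet) / P(outlet | topic), one per article.

    Assumption: each article has a single hard topic label. A full LDA
    model yields a topic mixture per article, which this sketch ignores.
    """
    if len(topics) != len(outlets):
        raise ValueError("topics and outlets must have the same length")
    n = len(topics)
    outlet_counts = Counter(outlets)
    topic_counts = Counter(topics)
    joint_counts = Counter(zip(topics, outlets))
    weights = []
    for topic, outlet in zip(topics, outlets):
        p_outlet = outlet_counts[outlet] / n
        p_outlet_given_topic = joint_counts[(topic, outlet)] / topic_counts[topic]
        weights.append(p_outlet / p_outlet_given_topic)
    return weights


def weighted_topic_share(topics, outlets, weights, outlet):
    """Weighted topic distribution among the articles of one outlet."""
    mass = defaultdict(float)
    for topic, o, w in zip(topics, outlets, weights):
        if o == outlet:
            mass[topic] += w
    total = sum(mass.values())
    if total == 0:
        raise ValueError(f"no articles found for outlet {outlet!r}")
    return {topic: m / total for topic, m in mass.items()}


# Toy data (hypothetical): outlet_a over-covers "nuclear",
# outlet_b over-covers "protest".
outlets = ["outlet_a"] * 4 + ["outlet_b"] * 4
topics = ["nuclear", "nuclear", "nuclear", "protest",
          "nuclear", "protest", "protest", "protest"]

weights = ipw_weights(topics, outlets)
for name in ("outlet_a", "outlet_b"):
    # After weighting, each outlet's topic mix matches the pooled topic mix.
    print(name, weighted_topic_share(topics, outlets, weights, name))
```

Under these assumptions, articles on topics an outlet over-covers are down-weighted and articles on topics it under-covers are up-weighted, so a classifier trained with the weights has less opportunity to separate the outlets by subject matter alone. Weights of this kind could be supplied through the `sample_weight` argument that scikit-learn estimators such as `LogisticRegression` accept in `fit`; whether the thesis does so is not stated in the abstract.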
Applied to an external corpus of 1,414 articles by ten journalists covering Iranian affairs in Western media, TAPS diverges substantially from RPS and produces a structural reordering of journalist rankings. The reordered rankings align far more closely than the RPS rankings with independent external benchmarks, including institutional affiliation and an independently produced activist classification of regime-aligned media figures, which provides external validity for the framework and its measurements.
Recommended Citation
Habibi, Leila, "Measuring Islamic Republic Propaganda in Western Media Coverage of Iran: A Computational Framework" (2026). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/6675
