Dissertations, Theses, and Capstone Projects
Date of Degree
6-2024
Document Type
Capstone Project
Degree Name
M.S.
Program
Data Analysis & Visualization
Advisor
Howard T. Everson
Subject Categories
Databases and Information Systems | Data Science | Journalism Studies
Keywords
fake news, news classification, data diversity, algorithmic bias, misinformation analysis, text classification
Abstract
In today's digital world, detecting fake news has emerged as a critical challenge, one with significant effects on democracy and public discourse, both regionally and globally. This research studies how the diversity of news sources in a training dataset affects how well machine learning models classify fake versus true news. I used Linear Support Vector Classification (LinearSVC) to create and compare two classification models: one trained on a dataset whose real news came from a single source, Reuters (Dataset 1), and the other trained on a dataset whose real news came from Reuters, The New York Times, and NPR (Dataset 2). Both datasets contained fake news articles from diverse sources. The datasets were prepared by cleaning the data and applying Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. The models were then trained using LinearSVC, tested on a comparison dataset, and evaluated using accuracy, precision, recall, and F1-score. The results show that the model trained on Dataset 2 outperformed the model trained on Dataset 1 on all evaluation metrics. This improvement suggests that including different journalistic points of view in the training data helps the model learn more general patterns and perform the classification task better. The study adds to existing work on classifying and detecting fake news by showing that a variety of real news sources in training datasets can make classification models more accurate. Beyond fake news classification, it also underscores the broader implications of machine learning for media credibility and information consumption in the digital age.
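The workflow described in the abstract (TF-IDF vectorization, a LinearSVC model, and evaluation by accuracy, precision, recall, and F1-score) could be sketched roughly as follows with scikit-learn. This is only an illustration, not the author's code: the function names `train_classifier` and `evaluate`, the label convention (1 = fake, 0 = real), the vectorizer settings, and the tiny placeholder texts are all assumptions made for the example. The actual cleaning steps, hyperparameters, and data are in the deposited repository export.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def train_classifier(texts, labels):
    """Fit a TF-IDF + LinearSVC pipeline on raw article texts.

    Assumed label convention for this sketch: 1 = fake, 0 = real.
    """
    model = make_pipeline(
        # Vectorizer settings are illustrative, not taken from the study.
        TfidfVectorizer(lowercase=True, stop_words="english"),
        LinearSVC(random_state=0),
    )
    model.fit(texts, labels)
    return model


def evaluate(model, texts, labels):
    """Return the four metrics named in the abstract, treating 'fake' as positive."""
    predictions = model.predict(texts)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average="binary", pos_label=1, zero_division=0
    )
    return {
        "accuracy": float(accuracy_score(labels, predictions)),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
    }


if __name__ == "__main__":
    # Invented placeholder texts, NOT the study's data.
    fake = [
        "shocking secret cure doctors hide from public",
        "celebrity clone replaces senator in secret plot",
    ]
    single_source_real = [
        "central bank holds interest rates steady on inflation data",
        "parliament votes on trade agreement after long debate",
    ]
    extra_source_real = [
        "city council approves new budget for public schools",
        "researchers publish study on regional water quality",
    ]

    # "Dataset 1" analogue: real news from one source only.
    model_1 = train_classifier(fake + single_source_real, [1, 1, 0, 0])
    # "Dataset 2" analogue: real news from several sources.
    model_2 = train_classifier(
        fake + single_source_real + extra_source_real, [1, 1, 0, 0, 0, 0]
    )

    # Shared comparison set, again invented for illustration.
    comparison_texts = [
        "secret plot hides shocking cure from public",
        "council debate on school budget and water quality",
    ]
    comparison_labels = [1, 0]

    for name, model in [("Dataset 1", model_1), ("Dataset 2", model_2)]:
        print(name, evaluate(model, comparison_texts, comparison_labels))
```

Scoring both models on the same held-out comparison set mirrors the design in the abstract: only the mix of real news sources in the training data changes between the two models. The metrics printed for toy data of this size are not meaningful.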
Recommended Citation
Islam, Muhammad, "The Efficacy of Using Machine Learning Techniques for Identifying and Classifying “Fake News”" (2024). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/5905
Datasets
fakenewsclassification-main.zip (73187 kB)
Export of GitHub repo at time of deposit.
Included in
Databases and Information Systems Commons, Data Science Commons, Journalism Studies Commons