Date of Degree


Document Type


Degree Name





Rivka Levitan

Subject Categories



dementia, gender, linguistics, computational linguistics, sex, Alzheimer's disease


Typically, about 60% of dementia patients are women. Researchers have historically dismissed this imbalance as a consequence of women’s longer life expectancy: because age is the primary risk factor for dementia, a longer lifespan translates into a larger share of the dementia patient population (Mielke, 2018). While the exact cause of dementia is unknown, researchers and clinicians have historically treated male and female populations the same, asserting that there is no significant difference between the two sexes with regard to detecting dementia. The present study aims to address this potential gap in dementia research, which recent work (as recent as 2018) has also called on the field to address. The study uses the Pitt Corpus from DementiaBank to look for significant differences in how men and women with dementia use language. A statistical analysis using linear regression and ANOVA models found significant interactions between sex and linguistic features. The same data were used to train and test machine learning models in an attempt to accurately categorize utterances from both sexes. Logistic regression, Naive Bayes, and SVM models were applied to various forms of TF-IDF vectors, with logistic regression achieving the highest accuracy, 56%. These results are consistent with the hypothesis of this study, that there is a significant difference between the linguistic markers of the two sexes.

Included in

Linguistics Commons