Dissertations, Theses, and Capstone Projects

Date of Degree

9-2024

Document Type

Dissertation

Degree Name

Ph.D.

Program

Educational Psychology

Advisor

Jay Verkuilen

Committee Members

David Rindskopf

Wei Wang

Howard T. Everson

Zhongmin Cui

Subject Categories

Educational Assessment, Evaluation, and Research | Educational Psychology

Keywords

Response Styles, Multidimensional Item Response Theory Model, Noncognitive Assessment, Multiscale Measures, Simulation Study, Empirical Study

Abstract

Response style, a common form of aberrant responding in noncognitive assessments in psychology, distorts the estimation of item and person parameters, leading to serious reliability, validity, and fairness problems (Baumgartner & Steenkamp, 2001; Bolt & Johnson, 2009; Bolt & Newton, 2011). Response style refers to a systematic individual preference for, or avoidance of, certain response categories (Bolt & Johnson, 2009), especially on self-report Likert-type scales. When response styles occur, the selection of response options is influenced not only by the trait(s) of interest but also by the respondent's tendency to choose particular options stylistically.

Although response styles are sensitive to the content and format of a scale, they are usually found to be consistent across multiple (sub)scales within an instrument (Weijters, Geuens, et al., 2010a, 2010b; Wetzel et al., 2013). In a multiscale psychological assessment, which includes several subscales within a larger instrument, disentangling the effect of response styles from estimates of the latent traits is more complicated than in a unidimensional scale, but the additional scales also provide information that helps separate the two. To estimate multiple scales accurately when response styles are present, one needs to model item responses across scales simultaneously while controlling for the influence of response styles. Researchers have developed many traditional methods for detecting response styles, such as computing simple frequencies of the response options and their standard deviation (Greenleaf, 1992), constructing heterogeneous items that measure response styles directly (Baumgartner & Steenkamp, 2001; Greenleaf, 1992), and applying person-fit statistics (Albers et al., 2016; Conijn et al., 2014; Drasgow et al., 1991). These traditional methods have various problems, however (Dowling et al., 2016). To overcome them, several model-based approaches have been developed to measure, and possibly adjust for, response styles when estimating substantive traits, including multidimensional item response theory (MIRT; Bolt & Newton, 2011), mixture IRT (Eid & Zickar, 2007; Rost et al., 1997), and IRTree models (Böckenholt, 2012; De Boeck & Partchev, 2012).
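As a concrete illustration of the descriptive approach mentioned above (e.g., Greenleaf, 1992), the short Python sketch below computes per-respondent proportions of extreme and midpoint responses and the within-person standard deviation on a Likert scale. The function name, thresholds, and flagging rule are illustrative assumptions, not the specific indices used in this dissertation.

```python
import numpy as np

def response_style_indices(responses, n_categories=5):
    """Greenleaf-style descriptive indices of extreme (ERS) and midpoint (MRS) responding.

    responses : array of shape (n_persons, n_items) with Likert codes 1..n_categories.
    Returns per-person proportions of extreme and midpoint choices and the
    within-person standard deviation of the responses across items.
    """
    responses = np.asarray(responses)
    extreme = np.isin(responses, [1, n_categories])      # lowest or highest category chosen
    midpoint = responses == (n_categories + 1) // 2      # middle category (odd-length scales)
    ers_index = extreme.mean(axis=1)                     # proportion of extreme responses
    mrs_index = midpoint.mean(axis=1)                    # proportion of midpoint responses
    within_sd = responses.std(axis=1)                    # spread of each person's responses
    return ers_index, mrs_index, within_sd

# Hypothetical usage: flag respondents whose extreme-response rate is unusually high.
# rng = np.random.default_rng(0)
# data = rng.integers(1, 6, size=(300, 20))              # 300 persons, 20 five-point items
# ers, mrs, sd = response_style_indices(data)
# flagged = ers > ers.mean() + 2 * ers.std()
```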

This study compared descriptive statistics, nonparametric person-fit statistics, multiscale parametric person-fit statistics, and MIRT models for detecting and/or measuring response styles in psychological assessments with polytomous items. The two research questions were: (a) Compared with descriptive statistics and multiscale person-fit statistics, how well does the MIRT model identify response styles across multiple subscales under various conditions? and (b) Are there person covariates that explain variation in the latent traits as well as in the response styles? To address these questions, a simulation study first examined the effects of four design factors, namely (sub)scale length, sample size, the proportion of respondents with an extreme response style (ERS), and the proportion of respondents with a midpoint response style (MRS), on parameter recovery under the MIRT model, and on power, Type I error, and classification accuracy for the MIRT models relative to a descriptive statistic and a multiscale person-fit statistic. Second, one uniscale data set (the Quality of Life [QOL] scale) and two empirical multiscale data sets (the Big Five personality scale and the Depression, Anxiety, and Stress Scale [DASS]), which vary in the number of substantive traits, the number of items per subscale, and the number of response categories, were analyzed to investigate the efficacy of MIRT models and explanatory MIRT (EMIRT) models that incorporate response styles. The first two empirical data sets (QOL and Big Five) were analyzed to illustrate the use of MIRT models in measuring response styles, and the third data set (DASS), which includes person covariates, was analyzed to explore predictors of the latent traits and response styles under an EMIRT framework.
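The following Python sketch shows, under stated assumptions, how data resembling the simulation design above might be generated: ordinary responses come from a simple graded-response-type mechanism, and designated proportions of respondents are then distorted toward extreme (ERS) or midpoint (MRS) categories. The generating model, distortion rates, and parameter values are illustrative only and are not the dissertation's actual data-generating model.

```python
import numpy as np

def simulate_likert_with_styles(n_persons=300, n_items=20, n_categories=5,
                                p_ers=0.10, p_mrs=0.10, seed=0):
    """Toy generator loosely mirroring the simulation design factors described above.

    Ordinary responses follow a graded-response-type mechanism driven by one
    latent trait; designated proportions of respondents are then distorted
    toward extreme (ERS) or midpoint (MRS) categories. All settings here are
    illustrative assumptions, not the dissertation's generating model.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n_persons)                               # substantive trait
    thresholds = np.sort(rng.normal(size=(n_items, n_categories - 1)), axis=1)

    # Cumulative-logit probabilities of exceeding each threshold -> ordinal codes 1..K
    logits = theta[:, None, None] - thresholds[None, :, :]
    cum_p = 1.0 / (1.0 + np.exp(-logits))
    u = rng.random((n_persons, n_items, 1))
    responses = 1 + (u < cum_p).sum(axis=2)

    # Assign response-style groups and distort a share of their item responses
    labels = rng.choice(["none", "ERS", "MRS"], size=n_persons,
                        p=[1 - p_ers - p_mrs, p_ers, p_mrs])
    mid = (n_categories + 1) // 2
    for i in np.where(labels == "ERS")[0]:
        push = rng.random(n_items) < 0.7                             # push 70% of items to an extreme
        responses[i, push] = np.where(responses[i, push] >= mid, n_categories, 1)
    for i in np.where(labels == "MRS")[0]:
        push = rng.random(n_items) < 0.7                             # pull 70% of items to the midpoint
        responses[i, push] = mid
    return responses, labels

# Hypothetical usage: one simulation condition with 10% ERS and 10% MRS respondents.
# data, labels = simulate_likert_with_styles(n_persons=300, n_items=20,
#                                            p_ers=0.10, p_mrs=0.10)
```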

The simulation results show that the descriptive method has the highest power to detect respondents with response styles but also the highest Type I error rate and the lowest classification accuracy; the MIRT model with two response style factors performs best on Type I error and classification accuracy but worst on power; and the multiscale person-fit statistic falls between the descriptive method and the MIRT model. Regarding parameter recovery under the MIRT model, the proportions of ERS and MRS respondents are significant predictors of average bias, but none of the four design factors has a practically significant effect on average RMSE or correlation. The crossing of test length (10, 20 items), sample size (300, 3,000), and mixing proportions of ERS and MRS (0%, 10%, and 30%) yields conditions with large proportions of aberrant respondents (20%, 30%, 40%, and 60%), which makes good parameter recovery highly challenging. In this simulation, the MIRT model does not recover item parameters accurately when the proportion of aberrant respondents with response styles is large.

The empirical results show that MIRT models in which response style factors are explicitly modeled can measure latent response style factors and identify aberrant respondents, whereas most descriptive statistics and nonparametric and parametric person-fit statistics can only flag outlying response patterns and can hardly isolate the specific impact of response styles. The results also suggest that ignoring response style factors creates scoring bias, which can harm the validity of individuals' scores on the traits of interest. In general, MIRT models that control for response styles provide more accurate measurement for the uniscale and multiscale measures examined here (QOL, Big Five, and DASS). Joint estimation of the latent traits and response style factors in the MIRT model uses information from all the subscales, giving better control of response styles in the measurement of the substantive traits. Finally, the EMIRT results for the DASS data showed that education level, gender, and race are good predictors of the response styles and the DASS subscales. Compared with the reference groups, most levels of these person covariates have significant positive or negative effects on both the response style factors and the DASS subscales. In addition, once person covariates are included in the EMIRT model, the correlations among the DASS subscales increase substantially relative to the final MIRT model, probably because the covariates explain a substantial proportion of the variance and covariance of the response style factors and the DASS subscales.
