The PHQ-9 has become a reference measure in depression research and clinical practice. However, whether the PHQ-9 is unidimensional has not been fully established, and the usability of its total score requires clarification. In this study, we examined the dimensionality, scalability, and monotonicity properties of the PHQ-9 as well as the scale’s total-score reliability. We did so using exploratory structural equation modeling (ESEM) bifactor analysis and Mokken scale analysis (MSA). We relied on a total of 58,272 participants (63% female; mean age = 43, SD = 13) from 29 samples involving seven countries (e.g., Germany, the U.S.) and five languages (e.g., German, English). We found no concerning deviations from measurement invariance for our ESEM bifactor model, whether across samples or across sexes, age groups, and languages. The PHQ-9 met the requirements for essential unidimensionality in the pooled sample and across sex-, age-, and language-based subsamples. In each case, the general factor was strong (e.g., factor loadings ranged from 0.725 to 0.893 in the pooled sample) and omega hierarchical values exceeded 0.900. The correlations between the general factor and the observed total scores were large (≥ 0.952). Our MSA, including multilevel MSA, revealed that the PHQ-9’s scalability is satisfactory. No monotonicity violation was detected, suggesting that the scale’s total score accurately orders respondents on the latent Depression variable. Total-score reliability was good. This study provides robust evidence that researchers and practitioners can use the PHQ-9 as a unidimensional measure of depressive symptoms.
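As an illustration of how an omega hierarchical value can be obtained from a bifactor solution, the following minimal sketch applies the standard formula to standardized loadings, assuming orthogonal factors. The function name `omega_hierarchical` and the loading values in the usage lines are hypothetical; they are not taken from this study, and the sketch is not the authors' analysis code.

```python
def omega_hierarchical(general, specific):
    """Share of total-score variance attributable to the general factor.

    general:  standardized general-factor loadings, one per item.
    specific: list of specific factors, each a list with one standardized
              loading per item (0.0 where an item does not load).
    Assumes orthogonal factors, as in a standard bifactor model.
    """
    n_items = len(general)
    for loadings in specific:
        if len(loadings) != n_items:
            raise ValueError("each specific factor needs one loading per item")

    # Variance in the total score due to each factor: (sum of loadings)^2.
    general_var = sum(general) ** 2
    specific_var = sum(sum(loadings) ** 2 for loadings in specific)

    # Item uniqueness = 1 - communality for standardized loadings.
    unique_var = 0.0
    for i in range(n_items):
        communality = general[i] ** 2 + sum(s[i] ** 2 for s in specific)
        if communality > 1.0:
            raise ValueError(f"item {i} has communality above 1")
        unique_var += 1.0 - communality

    return general_var / (general_var + specific_var + unique_var)


# Hypothetical loadings for a nine-item scale (illustrative values only).
g = [0.80] * 9
s1 = [0.30, 0.30, 0.30, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(round(omega_hierarchical(g, [s1]), 3))
```

A value close to 1 indicates that the total score mostly reflects the general factor, which is the rationale for interpreting a single total score.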