Psychometric properties of the 18-item Indonesian Mental Toughness Questionnaire using the Rasch model and Machine Learning
DOI: https://doi.org/10.21580/pjpp.v10i1.23214
Keywords: differential item functioning, gradient boosting classification, Mental Toughness Questionnaire, rating scale model, Wright Map
Abstract
The psychometric properties of the Indonesian version of the 18-item Mental Toughness Questionnaire (MTQ-18) remain unclear. This study uses the Rasch model to examine these properties. In addition, boosting classification was used to assess the questionnaire's predictive validity for athletes’ achievement. The sample comprised 400 athletes. According to the Martin-Loef likelihood-ratio test (482, p = 1.0) and factor analysis of the Rasch residuals, the questionnaire tends to satisfy the unidimensionality assumption. The MADaQ3 of 0.074 indicates an overall tendency toward local independence across all items, with the majority of items clustered at moderate to low measure levels. Items Q11, Q15, and Q18 were clearly identified as showing gender bias, with significant effect sizes. According to the boosting classification, predictive performance was unsatisfactory both for national vs. no achievement (F1 = 0.7, AUC = 0.56) and for international vs. no achievement (F1 = 0.62, AUC = 0.58). In conclusion, the abridged questionnaire is not well suited to predicting an individual’s future performance or achievement. Future studies are needed to develop a better version that is less affected by gender bias, and to resolve the variability of the items.
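To make the reported metrics easier to read, the following minimal sketch shows how a classifier can reach a fairly high F1 score while its AUC stays at chance level (0.5). It is not the authors' analysis: the labels, scores, the 60/40 class split, and the helper names `f1_score` and `roc_auc` are all hypothetical and chosen only for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced sample: 60 athletes with an achievement, 40 without.
y_true = [1] * 60 + [0] * 40

# A model with no discriminating power: every athlete gets the same score.
scores = [0.9] * 100
y_pred = [1 if s >= 0.5 else 0 for s in scores]

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"F1 = {f1:.2f}, AUC = {auc:.2f}")
```

In this toy case the model labels everyone as an achiever, so F1 benefits from the majority class even though the scores cannot separate the two groups. This is one reason an F1 around 0.6 to 0.7 paired with an AUC close to 0.5 is read as weak predictive performance.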
Copyright (c) 2025 Psikohumaniora: Jurnal Penelitian Psikologi

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The copyright of the accepted article shall be assigned to the publisher of the journal. The intended copyright includes the right to publish the article in various forms (including reprints). The journal maintains the publishing rights to published articles.
In line with the license, authors and any users (readers and other researchers) are allowed to share and adapt the material only for non-commercial purposes. In addition, the material must be given appropriate credit, provided with a link to the license, and indicated if changes were made. If authors remix, transform, or build upon the material, authors must distribute their contributions under the same license as the original.