I recently received a reply to an old post of mine on the OPRA blog.
My response covers some of the nuances of ipsative testing that may be of interest to others.
While I now read less on measurement issues, I keep abreast of developments in ipsative testing, insofar as I can discuss them with clients if asked. I have not written on the topic in a while, so your short comment has spurred me on to write an update. Thank you.
I believe the critical paper on quasi-ipsative testing that you may be referring to is the 2014 paper by Salgado and Táuriz [i]. If I am correct, the first point I would note, which I believe is in line with your position, is that the paper distinguishes between ipsative and quasi-ipsative tests in terms of test scoring. I agree this is an important distinction, as it makes the difference between the two clear. The paper does not support the supposed increase in criterion validity of ipsative testing, a claim I have challenged previously. The issue here is quasi-ipsative tests.
Let me therefore offer my thoughts on the paper insofar as it relates to my critical argument about quasi-ipsativity in personality testing. I want to note from the outset that the paper's analysis does not appear to include the Wave. While I appreciate that you are not making this claim, I think it is an essential point of clarity, given the assumptions readers might make.
The number of studies in the paper related to quasi-ipsative tests is small, which is in no way a critique of the researchers, who have done a thorough review of the area. However, quasi-ipsative scoring methods in widely used commercial tests are relatively new. They have grown only in recent years, partly to address known weaknesses in earlier ipsative tests, such as previous versions of the OPQ.
The research examines whether quasi-ipsative tests are more predictive than traditional ipsative measures, and the findings do support this conjecture. The researchers identify six types of quasi-ipsativity, and it is unclear, at least on my reading, which change leads to the improved validity. Suffice it to say that the closer a test is to pure (fully) ipsative scoring, the lower its criterion validity.
Moreover, the criterion validities cited are so-called 'true' coefficients of the relationship between personality and outcome variables, that is, observed correlations that have been statistically corrected. While these techniques are accepted practice in our industry, this type of analysis is increasingly criticised because it inflates the effect and draws attention away from the inherently large variance between studies. Criterion validity is a necessary but not sufficient condition for the application of psychometric tests. Given these statistical manipulations, readers must take care in interpreting claims based on the established validities, especially claims that assessment approach X produces a stronger relationship with job performance than assessment Y. My issue is not with any test per se but with a methodology, and I try to stay clear of debates that I believe are grounded more in marketing than in science.
Most importantly, the paper does not adequately address the fundamental measurement issues at the base of my critique of ipsative and quasi-ipsative testing for selection, because its focus is criterion-related validity. The paper is a good read and the work is important to our field, but arguments based on adjusted criterion validity alone fail to address my concerns. Indeed, recent literature only confirms the problems of using ipsative testing to measure personality for screening (see below). I have made my views clear in previous blogs (1, 2, 3) and will not repeat the arguments here beyond giving the central gist.
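As a rough illustration of how such corrections raise the headline figure, the sketch below applies Spearman's classic correction for attenuation. It is a simplification under stated assumptions: the inputs are hypothetical, and it does not reproduce the specific corrections used in the paper (meta-analyses often also correct for range restriction, which is not shown here).

```python
import math

def correct_for_attenuation(r_xy, r_yy, r_xx=1.0):
    """Spearman's correction for attenuation.

    Divides the observed predictor-criterion correlation r_xy by the square
    root of the product of the predictor (r_xx) and criterion (r_yy)
    reliabilities. Leaving r_xx at 1.0 corrects for criterion unreliability
    only, a common choice when reporting 'operational' validity.
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical inputs, not taken from any study discussed here.
observed_r = 0.15      # observed personality-performance correlation
criterion_rel = 0.52   # assumed reliability of supervisor performance ratings

corrected_r = correct_for_attenuation(observed_r, criterion_rel)
print(round(corrected_r, 3))               # 0.208
print(round(corrected_r / observed_r, 2))  # 1.39
```

With these assumed inputs the corrected coefficient is roughly 39% larger than the observed one, and the lower the assumed criterion reliability, the larger the boost.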
- Ipsative testing for the measurement of personality is problematic on both logical and measurement grounds:
- Personality is not a within-person construct. We understand it as a measurement between people: to say someone is extraverted is to say they are more extraverted than others.
- Ipsative tests often have reliability problems that call into question the consistency of the underlying constructs. While I recognise claims to the contrary, such as those in this paper, the bulk of the evidence, not to mention recent studies (such as those below), indicates reliability issues. In the age of the replication crisis, I will put my confidence in the bulk of the work.
- A traditional selling point for publishers of ipsative tests is their capacity to get around the problem of faking. The problem, however, is that candidates putting their best foot forward in an assessment is, I would argue, a given. Hence, ipsative testing does not overcome faking; it merely makes it harder to detect. The issue seems more related to a person's ability to identify the criteria being selected for [ii].
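To make the within-person point concrete, here is a minimal sketch of fully ipsative (rank-order) scoring. The function names, the four scales and the scores are hypothetical, and real forced-choice instruments use more elaborate item blocks. The sketch only shows the constant-sum constraint: every profile totals the same, so two people who differ in absolute level can receive identical profiles. It also prints the average intercorrelation that the constraint forces on fully ipsative scales, assuming equal scale variances.

```python
def forced_choice_ranks(raw_levels):
    """Hypothetical fully ipsative scoring: rank the k scales within one
    person from lowest (1) to highest (k). Only the within-person order
    survives; the person's absolute level on each trait is discarded."""
    order = sorted(range(len(raw_levels)), key=lambda i: raw_levels[i])
    ranks = [0] * len(raw_levels)
    for rank, scale_index in enumerate(order, start=1):
        ranks[scale_index] = rank
    return ranks

def mean_ipsative_intercorrelation(k):
    """Average correlation among k fully ipsative scales, assuming equal
    scale variances: the constant-sum constraint forces it to -1/(k-1)."""
    return -1.0 / (k - 1)

# Two hypothetical respondents on four hypothetical scales.
# Person B is four points higher than Person A on every scale.
person_a = [2.0, 3.0, 4.0, 1.0]
person_b = [6.0, 7.0, 8.0, 5.0]

ranks_a = forced_choice_ranks(person_a)
ranks_b = forced_choice_ranks(person_b)
print(ranks_a, sum(ranks_a))  # [2, 3, 4, 1] 10
print(ranks_b, sum(ranks_b))  # [2, 3, 4, 1] 10

# The forced negative intercorrelation shrinks as scales are added,
# but every profile still sums to the same constant.
for k in (4, 10, 30):
    print(k, round(mean_ipsative_intercorrelation(k), 3))
# prints 4 -0.333, then 10 -0.111, then 30 -0.034
```

Whether Person B is more extraverted than Person A cannot be recovered from these profiles, yet that between-person comparison is what a selection decision usually needs.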
In the spirit of continuing the conversation, I note recent research in the area that supports my claims (while also challenging the claimed superiority in criterion validity):
- Not even high dimensionality resolves the problems with ipsative measurement:
Schulte, N., Holling, H., & Bürkner, P. (2020). Can high-dimensional questionnaires resolve the ipsativity issue of forced-choice response formats? Educational and Psychological Measurement, in press, 1-28. https://doi.org/10.1177/0013164420934861
- The criterion-related validity of ipsative tests is likely to be lower, not higher, than that of CTT-scored measures:
Fisher, P. A., Robie, C., Christiansen, N. D., Speer, A. B., & Schneider, L. (2019). Criterion-related validity of forced-choice personality measures: A cautionary note regarding Thurstonian IRT versus Classical Test Theory scoring. Personnel Assessment and Decisions, 5(1), 1-14. https://scholarworks.bgsu.edu/pad/vol5/iss1/3/
Moving on from ipsative tests, at a practical level I think the commercialisation of the industry often leads to variations designed to create unique selling points, with those differences marketed to the sky, rather than to a focus on the science of personality testing and an ethical approach to selection. The testing industry should instead promote the reality of occupational testing and its link with the science of personality:
- Acknowledging what underlies the testing, given the science of personality: there is a small set of robust traits that make sense to assess. No test publisher has discovered, let alone owns, the occupational model of personality, as if there were such a thing.
- That as the number of factors in a model increases, the separation between scales decreases. Essentially, you simply have rulers measuring the same thing and calling it something different.
- That the principles of measurement (see anything by Michell) are important and bear on what we can or cannot achieve by assessing psychological attributes and human behaviour in a selection environment. Human behaviour is variable, so we must be clear that there are significant constraints on the accuracy we can achieve.
- That testing is not there to catch candidates out. Instead, it is a straightforward methodology for getting someone to describe themselves, by responding to behavioural statements or an agreed taxonomy, which is then combined with other information to make an informed selection decision. Some people may overrepresent themselves, but this is natural, and claims that we have ways to stop it are exaggerated or may deliberately allow people to assume that a test is somehow cheat-proof.
For these reasons, not to mention potential issues of adverse impact [iii], I am not swayed by the arguments for introducing quasi-ipsative testing, let alone ipsative testing, for personality assessment, especially in selection.
[i] Salgado, J. F., & Táuriz, G. (2013). The Five-Factor Model, forced-choice personality inventories and performance: A comprehensive meta-analysis of academic and occupational validity studies. European Journal of Work and Organizational Psychology, early view, 1-29.
[ii] Kleinmann, M., Ingold, P. V., Lievens, F., Jansen, A., Melchers, K. G., & König, C. J. (2011). A different look at why selection procedures work: The role of candidates' ability to identify criteria. Organizational Psychology Review, 1(2), 128-146; and Klehe, U., Kleinmann, M., Hartstein, T., Melchers, K. G., König, C. J., Heslin, P. A., & Lievens, F. (2012). Responding to personality tests in a selection context: The role of the ability to identify criteria and the ideal-employee factor. Human Performance, 25(4), 273-302.
[iii] Anderson, N., & Sleap, S. (2004). An evaluation of gender differences on the Belbin team role self‐perception inventory. Journal of Occupational and Organizational Psychology, 77(3), 429-437.