Two recent papers have questioned the assumption that validity scales in personality testing, such as social desirability scales, address the inherent problems of self-report data. The argument is that the inclusion of a response bias indicator somehow provides a litmus test of the validity of a personality report. But, like many assumptions put to the test, this one appears less robust than supposed.

McGrath, R., Mitchell, M., Kim, B.H., & Hough, L. (2010). Evidence for Response Bias as a source of error variance in applied assessment. Psychological Bulletin, 136(3), 450-470.


“After 100 years of discussion, response bias remains a controversial topic in psychological measurement. The use of bias indicators in applied assessment is predicated on the assumptions that (a) response bias suppresses or moderates the criterion-related validity of substantive psychological indicators and (b) bias indicators are capable of detecting the presence of response bias. To test these assumptions, we reviewed literature comprising investigations in which bias indicators were evaluated as suppressors or moderators of the validity of other indicators. This review yielded only 41 studies across the contexts of personality assessment, workplace variables, emotional disorders, eligibility for disability, and forensic populations. In the first two contexts, there were enough studies to conclude that support for the use of bias indicators was weak. Evidence suggesting that random or careless responding may represent a biasing influence was noted, but this conclusion was based on a small set of studies. Several possible causes for failure to support the overall hypothesis were suggested, including poor validity of bias indicators, the extreme base rate of bias, and the adequacy of the criteria. In the other settings, the yield was too small to afford viable conclusions. Although the absence of a consensus could be used to justify continued use of bias indicators in such settings, false positives have their costs, including wasted effort and adverse impact. Despite many years of research, a sufficient justification for the use of bias indicators in applied settings remains elusive”.

This is not merely a theoretical question but one that has real practical implications, as captured in the conclusion of the paper:

“What is troubling about the failure to find consistent support for bias indicators is the extent to which they are regularly used in high-stakes circumstances, such as employee selection or hearings to evaluate competence to stand trial and sanity. If the identification of bias is considered essential, perhaps the best strategy would be to require convergence across multiple methods of assessment before it is appropriate to conclude that faking is occurring (Bender & Rogers, 2004; Franklin, Repasky, Thompson, Shelton, & Uddo, 2002).”

The failure of response style indicators has led researchers, such as Uziel (2010), to argue that their interpretation should be redefined. The argument is that response style indicators are not in themselves a measure of the validity of the assessment but have more to do with a person’s perceived impression management and self-control.

Uziel, L. (2010). Rethinking social desirability scales: From impression management to interpersonally oriented self-control. Perspectives on Psychological Science, 5(3), 243-262.


“Social desirability (specifically, impression management) scales are widely used by researchers and practitioners to screen individuals who bias self-reports in a self-favoring manner. These scales also serve to identify individuals at risk for psychological and health problems. The present review explores the evidence with regard to the ability of these scales to achieve these objectives. In the first part of the review, I present six criteria to evaluate impression management scales and conclude that they are unsatisfactory as measures of response style. Next, I explore what individual differences in impression management scores actually do measure. I compare two approaches: a defensiveness approach, which argues that these scales measure defensiveness that stems from vulnerable self-esteem; and an adjustment approach, which suggests that impression management is associated with personal well-being and interpersonal adjustment. Data from a wide variety of fields including social behavior, affect and wellbeing, health, and job performance tend to favor the adjustment approach. Finally, I argue that scales measuring impression management should be redefined as measures of interpersonally oriented self-control that identify individuals who demonstrate high levels of self-control, especially in social contexts”.

It is my belief that the solution to this problem for I/O psychologists lies in the application of the measures. Impression management scales tend to be used only in selection settings, where it is rare for a personality report to become the ‘decision-maker’. The report is merely part of a body of evidence describing an individual’s suitability for a role. Validity scales should simply provide a measure of how much weight one can put on the personality measure. Any interpretation over and above this is overstepping the mark. In the absence of a ‘valid’ personality report, one must rely more heavily on other sources of data, such as interviews and CVs. The concern is not that people necessarily misrepresent themselves intentionally, but that we cannot be confident that a personal report is an accurate portrayal of character.

By the same logic, covert measures of integrity (which at best are measures of conscientiousness) should not be used to screen individuals out of the selection process. With a covert test, assumptions are made as to what is being measured. The point-to-point correspondence (George and Smith) is one step removed. One must ‘infer’ that integrity is measured, that the measure is work-related, and then define the construct. Hogan’s theoretical work on whether constructs even exist is very applicable here, and the onus of proof is much more evident with a covert measure. I believe this is a problem given the high-stakes nature of integrity testing.

Overt measures, such as the Stanton Survey of Integrity (SSI), don’t make such claims. The questions are overt and the construct, such as rule-breaking, is defined exactly as measured. The report even highlights the answers that may be of concern, e.g. ‘this person has admitted to …’ It is a far smaller leap of logic to assume that those who admit to more of these behaviours are at higher risk than those who do not. This is supported by the differing distributions of SSI scores between law-abiding citizens and those who have broken the law.

Before looking at the SSI data, both in New Zealand and internationally, I was incredulous. However, the reality is that there is a good spread across the one scale measured and people do admit to a range of behaviours. Once that behaviour has been admitted, it is up to an organisation to decide what to do. There is little inference that needs to be made as the respondent has provided the information directly as to what they would or wouldn’t do, and what behaviour they see as acceptable. This latter point is key to how overt measures, such as the SSI, work – they examine what behaviours a person has normalised.

In conclusion, integrity measures and response style indicators share a common logic. Both are aimed at eliminating true negatives (not at identifying true positives, as many researchers have assumed). What can be drawn from either measure depends on how overt the questions are. The more overt the questions, the more justifiable the assumption that the respondent is presenting behaviours or a report that is undesirable. The more covert the questions, the more the scale or measure can be used only to determine the confidence level of the evidence; any further extrapolation from that point is unwarranted.