AI hiring favors women over equally qualified men, study finds

newsweek.com
As artificial intelligence takes on a bigger role in corporate hiring -- with many companies touting its impartiality -- one researcher's findings suggest the technology may be more biased than humans, and is already favoring women over equally qualified men.

David Rozado, an associate professor at the New Zealand Institute of Skills and Technology and a well-known AI researcher, tested 22 large language models (LLMs) -- including popular, consumer-facing apps like ChatGPT, Gemini, and Grok -- using pairs of identical résumés that differed only by gendered names. His findings revealed that every single LLM was more likely to select the female-named candidate over the equally qualified male candidate.

"This pattern may reflect complex interactions between model pre-training corpora, annotation processes during preference tuning, or even system-level guardrails for production deployments," Rozado told Newsweek.

"But the exact source of the behavior is currently unclear."

Rozado's findings reveal not just that AI models tend to favor women for jobs over men, but also how nuanced and pervasive those biases can be. Across more than 30,000 simulated hiring decisions, female-named candidates were chosen 56.9 percent of the time -- a statistically significant deviation from gender neutrality, which would have resulted in a 50-50 split.
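To see why a 56.9 percent selection rate counts as a statistically significant departure from a 50-50 split, here is a minimal sketch of a two-sided z-test for a single proportion. It is an illustration only, not necessarily the test Rozado used. The article says "more than 30,000" decisions, so the round figure of 30,000 trials is an assumption, and the function name `two_sided_z_test` is hypothetical.

```python
import math


def two_sided_z_test(successes: int, trials: int, p_null: float = 0.5):
    """Normal-approximation test of an observed proportion against p_null.

    Returns the z-score and the two-sided p-value.
    """
    p_hat = successes / trials
    # Standard error of the proportion under the null hypothesis.
    standard_error = math.sqrt(p_null * (1 - p_null) / trials)
    z = (p_hat - p_null) / standard_error
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumption: exactly 30,000 decisions (the article reports "more than 30,000").
trials = 30_000
# 56.9 percent of decisions went to the female-named candidate.
successes = round(0.569 * trials)

z, p_value = two_sided_z_test(successes, trials)
print(f"z = {z:.1f}, two-sided p = {p_value:.2e}")
```

Under these assumptions the z-score lands far above the conventional 1.96 cutoff, which is why a gap of about seven percentage points is hard to attribute to chance at this sample size. Statistical significance says nothing about the size of the effect, which the study itself describes as relatively modest.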

When an explicit gender field was added to a CV -- a practice common in countries like Germany and Japan -- the preference for women became even stronger. Rozado warned that although the disparities were relatively modest, they could accumulate over time and unfairly disadvantage male candidates.

"These tendencies persisted regardless of model size or the amount of compute leveraged," Rozado noted. "This strongly suggests that model bias in the context of hiring decisions is not determined by the size of the model or the amount of 'reasoning' employed. The problem is systemic."

The models also exhibited other quirks. Many showed a slight preference for candidates who included preferred pronouns. Adding terms such as "she/her" or "he/him" to a CV slightly increased a candidate's chances of being selected.

"My experimental design ensured that candidate qualifications were distributed equally across genders, so ideally, there would be no systematic difference in selection rates. However, the results indicate that LLMs may sometimes make hiring decisions based on factors unrelated to candidate qualifications, such as gender or the position of the candidates in the prompt," he said.
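One common way to separate a position effect from a candidate effect is to show each pair in both orders and tally the outcomes separately. The sketch below illustrates that idea under stated assumptions and is not a description of Rozado's actual code: `audit_pair`, the `select` callback, and the stand-in selector `always_first` are all hypothetical names, and a real audit would call an LLM where the stand-in sits.

```python
from collections import Counter
from typing import Callable


def audit_pair(select: Callable[[str, str], int], cv_a: str, cv_b: str) -> Counter:
    """Present two CVs in both orders and tally candidate wins and position wins.

    `select` returns 0 if it picks the first-listed CV and 1 if it picks the second.
    """
    tally: Counter = Counter()
    orderings = (
        (cv_a, cv_b, ("a", "b")),  # cv_a listed first
        (cv_b, cv_a, ("b", "a")),  # cv_b listed first
    )
    for first, second, labels in orderings:
        choice = select(first, second)
        tally[f"cv_{labels[choice]}"] += 1
        tally["first_position" if choice == 0 else "second_position"] += 1
    return tally


def always_first(first: str, second: str) -> int:
    """Hypothetical stand-in for a model that ignores content and picks slot one."""
    return 0


result = audit_pair(always_first, "CV with female name", "CV with male name")
print(dict(result))
```

With this stand-in, each CV wins once while the first position wins both times. That is the signature of a selector driven by ordering, not by the candidate, and it shows why the two tallies need to be kept apart.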

Rozado, who is also a regular collaborator with the Manhattan Institute, a conservative think tank, emphasized that the biggest takeaway is that LLMs, like human decision-makers, can sometimes rely on irrelevant features when the task is overdetermined and/or underdetermined.

"Over many decisions, even small disparities can accumulate and impact the overall fairness of a process," he said.

However, Rozado also acknowledged a key limitation of his study: it used synthetic CVs and job descriptions rather than real-world applications, which may not fully capture the complexity and nuance of authentic résumés. Additionally, because all CVs were closely matched in qualifications to isolate gender effects, the findings may not reflect how AI behaves when candidates' skills vary more widely.

"It is important to interpret these results carefully. The intention is not to overstate the magnitude of harm, but rather to highlight the need for careful evaluation and mitigation of any bias in automated decision tools," Rozado added.

Even as researchers debate the biases in AI systems, many employers have already embraced the technology to streamline hiring. A New York Times report this month described how AI-powered interviewer bots now speak directly with candidates, asking questions and even simulating human pauses and filler words.

Jennifer Dunn, a marketing professional in San Antonio, said her AI interview with a chatbot named Alex "felt hollow" and she ended it early. "It isn't something that feels real to me," she told the Times. Another applicant, Emily Robertson-Yeingst, wondered if her AI interview was just being used to train the underlying LLM: "It starts to make you wonder, was I just some sort of experiment?"

Still, some organizations defend the use of AI recruiters as both efficient and scalable, especially in a world where the ease of online job-searching means open positions often field hundreds if not thousands of applicants. Propel Impact told the Times their AI interviews enabled them to screen 500 applicants this year -- more than triple what they managed previously.

Rozado, however, warned that the very features companies find appealing -- speed and efficiency -- can mask underlying vulnerabilities. Small disparities add up over many decisions, he reiterated, and "the finding that being listed first in the prompt increases the likelihood of selection underscores the importance of not trusting AI blindly."

Not all research points to the same gender dynamic Rozado identified. A Brookings Institution study this year found that, in some tests, men were actually favored over women in 51.9 percent of cases, while racial bias strongly favored white-associated names over Black-associated names. Brookings' analysis stressed that intersectional identities, such as being both Black and male, often led to the greatest disadvantages.

Rozado and the Brookings team agree, however, that AI hiring systems are not ready to operate autonomously in high-stakes situations. Both recommend robust audits, transparency, and clear regulatory standards to minimize unintended discrimination.

"Given current evidence of bias and unpredictability, I believe LLMs should not be used in high-stakes contexts like hiring, unless their outputs have been rigorously evaluated for fairness and reliability," Rozado said.

"It is essential that organizations validate and audit AI tools carefully, particularly for applications with significant real-world impact."