AI in Peer Review

AI peer review needs to be peer-reviewed

IOP Publishing’s discovery that researchers are split down the middle on the merits of using AI in peer review is not surprising, given the complexity of the issue.

The publisher’s August poll of just under 350 physics researchers found that 41 per cent were positive about the use of AI in peer review and 37 per cent were negative.

The case for using AI is obvious. With more than 30,000 academic journals in existence, the peer review process is highly human-intensive. To illustrate the point: if the average journal published 50 papers per year, with each submitted paper reviewed by two independent referees and each review requiring four hours of effort, that would translate into 12 million hours of peer review work every year.
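That figure follows from simple multiplication. Here is a minimal back-of-envelope sketch, using only the illustrative assumptions above (journal count, papers per journal, referees per paper, hours per review), not measured data:

```python
# Back-of-envelope estimate of the annual peer review burden,
# built entirely from the illustrative assumptions in the text.
journals = 30_000          # academic journals in existence
papers_per_journal = 50    # papers published per journal per year
referees_per_paper = 2     # independent referees per submission
hours_per_review = 4       # reviewer effort per paper, in hours

annual_hours = journals * papers_per_journal * referees_per_paper * hours_per_review
print(f"{annual_hours:,} hours per year")  # 12,000,000 hours per year

# Rejections and resubmissions (discussed next) could multiply this
# by a factor of five or more.
print(f"{annual_hours * 5:,} hours with a 5x resubmission multiplier")
```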

Moreover, that figure does not include the work of the editorial boards that oversee the review process or the editorial staff who process the papers – not to mention the time editors take trying to find suitable reviewers willing to take on a manuscript. It also ignores the fact that many papers get rejected on first submission, creating a multiplier effect that may increase the annual reviewing burden by a factor of five or more.

Replacing human referees with generative AI would therefore ease a reviewing burden that is commonly agreed to be verging on unsustainable.

Then there is the issue of response time: the gap between submission and the reviews getting back to the authors. Many journals promise rapid review, yet speed must be balanced against the quality and depth of the reviews provided. With AI-generated reviews, turnaround time would no longer be an issue.

Furthermore, if AI peer reviewing did become widely available, researchers could use it to evaluate their manuscripts before submitting to a journal. Incorporating such a system into repositories such as arXiv would facilitate this, making peer review part of the research process itself by offering suggestions that shape the final product.

But, of course, there are also challenges to adopting AI. The purpose of peer review is to answer three questions about the manuscript. Is the research new? Are the research results correct? And do the results add intellectual value in a field or provide benefits in a discipline or beyond?

Most research builds incrementally on existing results, and AI systems are well suited to making such evaluations. They can respond to an informal checklist of measures that capture what a manuscript builds on and how well it has used the scientific method to achieve its objectives. This is no different from how a human peer reviewer would proceed.
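As a purely hypothetical illustration of what such a checklist-driven pass might look like, here is a minimal sketch; the `ask_model` helper stands in for whatever language model a publisher might use and is assumed, not a real API:

```python
# Hypothetical sketch of a checklist-driven AI review pass.
# `ask_model` is a placeholder, not a real service or library call.

CHECKLIST = [
    "What prior results does the manuscript build on, and are they cited?",
    "Is the research question stated clearly?",
    "Do the methods follow from the question (hypothesis, data, analysis)?",
    "Do the stated conclusions follow from the reported results?",
    "Are limitations and alternative explanations acknowledged?",
]

def ask_model(question: str, manuscript: str) -> str:
    """Placeholder: a real system would send the manuscript and the
    question to a language model and return its answer."""
    raise NotImplementedError

def checklist_review(manuscript: str) -> dict[str, str]:
    """Collect the model's answer to each checklist question."""
    return {question: ask_model(question, manuscript) for question in CHECKLIST}
```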

However, the answers to the second two of the three questions above are highly dependent on the field of study and the type of research conducted. Although the scientific method is likely at the foundation of most research and discovery in STEM disciplines and some of the social sciences, variations in theoretical, experimental and data analytics research make a one-size-fits-all approach to AI peer review problematic.

In addition, if the research breaks new ground, representing a quantum shift in thinking, such evaluations would be more difficult given that the existing literature would provide no foundation against which to evaluate the new ideas.

Perhaps the most difficult role for AI would be to assess the third question, pertaining to the value and benefits of the research. Although such evaluation is highly subjective, it is often what provides the insight that is at the core of peer review’s value.

Donald Trump’s recent executive order, “Restoring Gold Standard Science”, calls for the adoption of “unbiased peer review” to improve the research process, including how research is disseminated and evaluated. That could be read as an implicit call for the adoption of AI peer review. But, of course, bias is always in the eye of the beholder. While Trump and his MAGA allies might see research on gender or climate change as being of little value, others will disagree. AIs are no more “unbiased” than humans in that sense – as the repeated reprogramming of Elon Musk’s Grok AI aptly demonstrates.

Moreover, large language models must be trained with data, which itself may – depending on your opinion – be biased or contaminated with information that is demonstrably false. Although AI systems look smart, they are doing nothing more than regurgitating what they learned when trained. As the data modelling adage goes: “garbage in, garbage out”.

To return to the issue of recognising the value of groundbreaking research, it is possible that an AI’s training data could inadvertently create a “groupthink” assessment, which uprates research that methodologically builds on existing knowledge but fails to recognise the benefits of “out-of-the-box” ideas, potentially disincentivising research creativity.

In my view, AI systems are not yet on par with human judgment when it comes to assessing the value and significance of research. But we should not rely on hunches. To test this possible limitation, an AI peer-review process should be implemented in parallel with human peer review, with the human peer reviewers allowed to see the AI peer review only after they complete their own assessments. It may well turn out that humans and AIs agree with each other much more frequently in some fields than in others, with those fields being better suited to a switch to AI reviewing.
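To make that comparison concrete, here is a minimal sketch of how field-by-field agreement between human and AI recommendations might be tallied, using invented placeholder records rather than real review data:

```python
# Sketch of measuring human/AI reviewer agreement by field.
# The records below are invented placeholders, not real review outcomes.
from collections import defaultdict

reviews = [
    # (field, human recommendation, AI recommendation)
    ("condensed matter", "accept", "accept"),
    ("condensed matter", "revise", "accept"),
    ("astrophysics", "reject", "reject"),
    ("astrophysics", "accept", "accept"),
]

totals = defaultdict(int)
agreements = defaultdict(int)
for field, human, ai in reviews:
    totals[field] += 1
    agreements[field] += (human == ai)

for field, total in totals.items():
    print(f"{field}: {agreements[field] / total:.0%} agreement")
```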

Ultimately, AI’s most appropriate role might be to support human peer review, rather than replace it, picking up more perfunctory issues while the human reviewer connects the dots and has the final say. But we don’t know. And the bottom line is that we must proceed with caution until we do.
