Liner AI peer review
1 April 2026
I recently submitted a manuscript on DNA replication dynamics in bacteria to yet another AI-powered peer review tool (see previous review here). Not because I believed it would replace human reviewers, but because I was curious what it would produce.
Liner AI offers support that appears to be geared towards academics and is supposed to help with research as well as writing. If writing is selected, Peer Review can be found in the list of agents on the left.

The service offers some free credits, so I decided to see what happened when artificial intelligence evaluated real scientific work. What came back was fascinating – and revealing about both the capabilities and fundamental limitations of AI in scientific peer review.

First impressions: almost too good
At first glance, the AI review did not look bad at all.
It included:
- A structured meta-review summarizing strengths and weaknesses
- Five separate "reviewer" perspectives (Novelty, Rigor, Clarity, Impact, and Limitation)
- Specific recommendations with references to manuscript sections
- A balanced, measured tone throughout, of the kind we always see with AI these days
- Citations to relevant literature
This apparent quality is precisely what makes it interesting. And potentially dangerous.
The core problem: building a different paper
Once you read the AI review carefully, a pattern emerges. A large fraction of the recommendations fall into this category:
- You should measure mutation rates
- You should quantify DnaA protein levels
- You should perform transcriptomics
- You should test alternative antibiotics
- You should include proteomics
- You should determine MIC values
- You should test division mutants
- You should add time-resolved kinetics
The list goes on. And on. And ON.

Here is the fundamental issue: the AI assumes that if something is interesting, the paper must fully explain it mechanistically. That is not how real science publishing works – or at least not how it should work.
Our manuscript is a physiological observation combined with a methodological warning. It is cell biology and experimental methodology. It was never meant to be a complete mechanistic dissection, quite simply because we have neither the funds nor the manpower to develop it further than we already have.
A human reviewer typically asks: Is the conclusion supported? Is the scope reasonable? Is the claim appropriate given the evidence?
The AI instead asks: What would the perfect paper on this topic look like?
Those are very different questions. The first evaluates what you have done. The second imagines what you could have done if you had infinite time, infinite funding and infinite graduate students.
The constraint blindness
The AI review repeatedly suggests experiments that are scientifically reasonable but operationally unrealistic:
- Perform single-cell transcriptomics
- Conduct proteomics on drug-treated populations
- Measure genome-wide mutation frequencies
- Screen other drugs systematically
- Include comprehensive dose-response curves
- Add viability assays at multiple timepoints
None of these suggestions are necessarily wrong in a wider sense. But together, they describe a project that would take many additional years, require substantially more funding – and likely never get finished.

The AI never asks: Is this needed to support the claim? Is this within reasonable scope? Is this realistic for one paper? Would this become a different project entirely?
This reflects a real problem in modern scientific publishing: the creep toward demanding endless additional experiments before work is accepted. AI reviews may unintentionally accelerate this trend because they optimize for completeness rather than feasibility.
A human reviewer with experience often thinks: "This would be nice, but it is not necessary for the points the authors are trying to make." The AI thinks: "If it is possible, it should definitely be done."
When AI hallucinates criticism
More troubling, the AI sometimes criticized things that were already in the manuscript. This happens because AI review relies on pattern matching rather than genuine comprehension. It detects topics – replication, antibiotics, microscopy, statistics – and generates a generic checklist of expected issues. Some match real problems. Others are hallucinated relevance that sounds plausible but does not apply.
This makes the review look thorough while being partly disconnected from what is actually written.
Comparing to human reviewers
We have had this manuscript reviewed by human referees previously (an earlier version, admittedly). The contrast is instructive.
What human reviewers did:
Reviewer 1 made a specific technical disagreement. Right or wrong, this was an informed, contextual critique based on understanding how the drugs under investigation work. They questioned our experimental regime, asked for controls, and made specific procedural criticisms.
Reviewer 2 wanted mechanistic experiments, framing this as a "major concern" about understanding the molecular mechanism, a point we never intended to fully address – see funding, time and manpower constraints described above.
Both human reviewers also engaged in citation nitpicking, terminology arguments and extensive writing complaints (sentence structure, figure legends, terminology consistency).
What the AI did differently:
The AI provided more systematic coverage – every major experimental approach got evaluated against an apparent internal checklist. It identified some weaknesses we had possibly missed, such as unclear statistical reporting in some figures, insufficient imaging parameter details and other points.
However, here is the fly in the ointment: the AI showed no contextual understanding. It did not grasp that this was a physiological observation paper, not a mechanistic study. It could not distinguish "this would strengthen the claim" from "this would be a different paper." It never made specific technical disagreements based on an understanding of bacterial physiology. That is something I find more and more worrying these days.
Just one example, which always crops up in progression reviews for my PhD students. Let's assume we are measuring UV survival rates of very sensitive deletion mutants. Strains lacking the RecA recombinase, the main recombinase in E. coli, are exquisitely UV sensitive, several orders of magnitude below the wild type. Can I do a statistical evaluation of this difference? Sure! Do I actually have to? Well, my colleagues seem to think so, mainly because in their own work they often see differences that are not particularly large, sometimes with clearly overlapping error bars. There the question makes sense: are the observed differences likely to be significant? Statistics provides a useful frame of reference to answer it. But for a strain that is consistently and exquisitely sensitive to UV, with survival rates 1000-fold below wild type or more? It is telling that in the older papers that investigated survival rates for such mutants, p values are very rarely shown. It is a waste of time. The effect is so large that the answer is obvious. Asking for a statistical evaluation shows nothing but inexperience on the part of the reviewer/AI.
But I know from experience that these days authors can rarely get away with this approach. So I ran into the following almost comical situation. I had calculated p values for many data points, all far below 0.001. When I ran the manuscript through the tool again, the AI review complained, quite rightly, that I had used the wrong test for the data. So I ran another set of tests on the same datasets. It was clear before I even started that the results would be highly significant. The wrong tests had produced p values in the region of 10⁻⁵⁶, which is why I had only stated "< 0.001" in the figures. So I now ran test after test, without any impact on the work, except for the statement of which statistical test was actually used. But it made the AI review happier.
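To illustrate how little hinged on the choice of test, here is a minimal sketch in Python (using scipy; the survival fractions are invented purely for illustration and are not our actual data). For a difference of three orders of magnitude, a parametric and a non-parametric test both flag the difference as decisively significant:

```python
# Minimal sketch with invented numbers (not real data): survival fractions
# for a hypothetical wild type and a hypothetical recA mutant, differing
# by roughly three orders of magnitude.
from scipy import stats

wild_type = [0.85, 0.90, 0.88, 0.92, 0.87, 0.91]                 # ~90 % survival
rec_a_mutant = [0.0009, 0.0011, 0.0008, 0.0012, 0.0010, 0.0009]  # ~0.1 % survival

# A plain t-test (arguably the "wrong" test for ratio data like these)
# returns an astronomically small p value...
t_stat, p_t = stats.ttest_ind(wild_type, rec_a_mutant)

# ...while an exact Mann-Whitney U test bottoms out at the smallest p value
# possible for six replicates per group (about 0.002).
u_stat, p_u = stats.mannwhitneyu(wild_type, rec_a_mutant)

print(f"t-test:       p = {p_t:.1e}")
print(f"Mann-Whitney: p = {p_u:.1e}")
# Either way, the difference sits well below any conventional threshold;
# the choice of test changes the methods section, not the conclusion.
```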
What the AI actually got right
To be fair, portions of the AI review were genuinely useful.
It correctly identified vulnerable points: mechanistic links that were speculative, sections with low cell numbers due to lysis, places where heterogeneity needed more discussion. It pushed for more clearly defined limitations, more precise wording and improved framing of contributions.
This is exactly the kind of editorial feedback a competent reviewer provides. The AI is not just producing nonsense. It is producing a mixture of good editorial suggestions, generic checklist criticism and entirely unrealistic expansion requests.
The challenge is separating useful from impractical.
The journal scope problem
Perhaps most tellingly, the AI review reads like an evaluation for a Nature paper, not for a specialized microbiology journal. It does not understand journal scope, impact level, or realistic expectations for different publication venues.
It evaluates every manuscript as if it should be complete, mechanistic, quantitative, clinically relevant, translational and future-proof. Real science publishing does not work that way. Different journals serve different purposes. Not every paper needs to be definitive.
But there is also a broader issue. AI tools built to generate a review will generate a review. If you ask, you will get an answer. And especially if you pay money for the service, the one thing nobody really wants to see is: "Great paper. Nothing to complain about. Good to go." The tool has to find points to correct, and pushing for more is an easy way to achieve this.
Where this leaves us
My assessment: Technically impressive, editorially unrealistic, occasionally useful.
The AI review tool produces professional-looking output that would be genuinely helpful as a pre-submission checklist. "Have I addressed these standard concerns?" But it is not reliable as a decision-making review because it lacks:
- Contextual understanding of scope and feasibility
- Ability to distinguish core claims from tangential extensions
- Recognition of resource and timeline constraints
- Domain-specific expertise for technical disagreements
- Understanding of journal-appropriate standards
Human reviewers have their own problems – inconsistency, bias, occasional incompetence, citation territorialism, writing style obsessions. But they can usually distinguish "this claim needs support" from "this would be interesting to know" and "this is feasible" from "this is a different project."
The AI cannot make those distinctions. Yet. It optimizes for an idealised version of every paper that may not exist in reality.
The practical takeaway
Should you use AI peer review tools? Perhaps, with a clear-eyed understanding of what they provide.
Useful for:
- Identifying standard weaknesses you might have missed
- Checking statistical reporting completeness
- Flagging unclear sections
- Generating a pre-submission checklist
- Spotting missing methodological details
Not useful for:
- Deciding whether claims are supported
- Evaluating appropriate scope
- Understanding realistic resource constraints
- Making domain-specific technical judgments
- Replacing human editorial decisions
AI peer review is an interesting diagnostic tool. It is not a replacement for human expertise – and will not be until it can distinguish between "this would make a better paper" and "this would make a different paper."
In the meantime, we still need humans who understand that not every observation requires complete mechanistic dissection, not every interesting finding demands proteomics, and sometimes "we don't know why yet" is an acceptable answer if the observation itself is solid.
If you enjoyed this blog post, you might also enjoy this review.