AI and Hallucinations in the GMP Environment: What AI Has to Say about it
The use of AI is currently the subject of extensive debate within the GMP environment. Many companies are keen to carry out GMP tasks with AI support or even entirely using AI systems. But what is possible, and where are the limits?
Regulatory requirements for AI in the GMP environment
The EU has recognised this challenge and published a draft of EU GMP Annex 22 on 7 July 2025. The draft prompted a large number of comments from industry associations; see, for example, the comments on Annex 22 by the ECA and the European QP Association. Many comments focus on the wish to be able to use generative AI, including LLMs such as ChatGPT and Gemini. The current draft of EU GMP Annex 22 excludes such systems from GMP. In addition, the EMA has published the "Reflection paper on the use of artificial intelligence in the lifecycle of medicines", which provides further information.
Critics of generative AI highlight the risk of hallucinations in AI systems, which can result in incorrect decisions being made in a GMP-regulated environment. However, can hallucinations be entirely eliminated by implementing controlled regulation, validation and monitoring of generative AI systems?
What could be more obvious than asking the AI itself?
We asked ChatGPT 5.5 and have included its response below. We also consulted other AI systems, including Mistral Large 3, Gemini 2.5 Pro and Opus 4.7. Their assessments are very similar.
The question: Are there any settings, validation activities or other additional security measures I can take to completely avoid AI hallucinations when deploying AI systems in a GMP-regulated environment?
‘In a GMP-regulated environment, you cannot currently assume that a generative AI system will operate without error in the long term – particularly when it comes to autonomous decisions or reviews with regulatory implications.
In the pharmaceutical context, this is a key issue because GMP systems are based on the following fundamental principles:
• Determinism
• Reproducibility
• Traceability
• Validability
• Controlled change management
• Documented decision-making logic
Generative AI violates several of these properties, at least in part.
The core issue under GMP: Ideally, a standard validated system behaves as follows: same input -> same output
With LLMs, this is often not the case:
• Answers can vary,
• Weightings change,
• Model updates alter behaviour,
• Hallucinations remain possible,
• Justifications can seem contrived,
• Uncertainty is difficult to quantify.
This is critical for GMP.
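The "same input -> same output" expectation can be tested directly. The following is a minimal sketch, not a validated test procedure: `rule_based_check`, `sampled_check` and the batch record text are hypothetical stand-ins invented for this illustration, and the random choice only imitates the variability of a sampling-based model. Each check is run repeatedly on the same input and the number of distinct outputs is counted.

```python
import hashlib
import random


def rule_based_check(record: str) -> str:
    """Deterministic check: the result depends only on the input."""
    return "FLAG" if "signature: missing" in record.lower() else "OK"


def sampled_check(record: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a sampling-based model: the answer is drawn at random."""
    return rng.choice(["OK", "FLAG", "OK (see section 4)"])


def distinct_outputs(check, record: str, runs: int = 50) -> int:
    """Run the same input repeatedly and count the distinct output fingerprints."""
    digests = {
        hashlib.sha256(check(record).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }
    return len(digests)


# Hypothetical record text, used only for this demonstration.
record = "Batch 0001, step 3, Signature: MISSING"
rng = random.Random(42)  # fixed seed so that the demonstration itself is repeatable

# The deterministic check always yields exactly one distinct output.
print(distinct_outputs(rule_based_check, record))
# The sampled stand-in yields several distinct outputs for the same input.
print(distinct_outputs(lambda r: sampled_check(r, rng), record))
```

A result of 1 only shows that no variation occurred in the runs performed; it does not prove that the output is correct.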
Example: Batch Record Review
When an AI checks for:
• missing signatures,
• incorrect thresholds,
• deviations,
• inconsistencies,
• OOS/OOT indicators,
• process violations,
regulatory risks arise when the AI:
• overlooks something,
• misinterprets something,
• merges data incorrectly,
• or generates a plausible but incorrect justification.
The problem is not just the error itself, but:
• a lack of predictability,
• a lack of complete explainability,
• and the difficulty of formal validation.
What regulators currently tend to find more acceptable:
• Assistance systems
• Prioritisation
• Recommendations
• Semantic search
• Document summarisation
• Signal detection
• Risk indicators
Less commonly accepted:
• autonomous release decisions,
• final QA decisions,
• independent deviation assessment,
• fully automated batch release.
The human typically remains the ‘final decision-maker’.
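The 'final decision-maker' principle can be enforced in software rather than left to convention. The sketch below is an illustration under stated assumptions: the `Finding`, `Status`, `qa_decide` and `review_complete` names are hypothetical and do not come from any regulation or product. The AI can only create findings with the status "proposed"; only a named reviewer can change that status.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PROPOSED = "proposed"  # set by the AI, never final
    APPROVED = "approved"  # set only by a named QA reviewer
    REJECTED = "rejected"  # set only by a named QA reviewer


@dataclass
class Finding:
    description: str
    status: Status = Status.PROPOSED
    decided_by: Optional[str] = None
    decided_at: Optional[datetime] = None


def qa_decide(finding: Finding, reviewer: str, approve: bool) -> Finding:
    """Record the human decision; anonymous or repeated decisions are refused."""
    if not reviewer.strip():
        raise ValueError("A named QA reviewer is required for every decision")
    if finding.status is not Status.PROPOSED:
        raise ValueError("Finding has already been decided")
    finding.status = Status.APPROVED if approve else Status.REJECTED
    finding.decided_by = reviewer
    finding.decided_at = datetime.now(timezone.utc)
    return finding


def review_complete(findings: List[Finding]) -> bool:
    """The review counts as complete only when no AI proposal is left undecided."""
    return all(f.status is not Status.PROPOSED for f in findings)


# Hypothetical usage: the AI proposes, QA decides.
finding = Finding("Possible missing signature in step 3")
print(review_complete([finding]))  # False
qa_decide(finding, reviewer="qa.reviewer", approve=True)
print(review_complete([finding]))  # True
```

The design choice here is that no code path lets the AI set a final status, so responsibility stays traceable to a named person.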
Relevant to GMP validation: The following authorities, guidelines and regulations apply in GMP environments, amongst others:
• European Medicines Agency
• U.S. Food and Drug Administration
• GAMP 5
• CSV / CSA
• Annex 11
• 21 CFR Part 11
• Data integrity requirements
These require system behaviour that can be controlled. It is difficult to fully validate a probabilistic model because:
• the state spaces are enormous,
• outputs cannot be fully tested,
• and there is no complete specification of all possible responses.
That is why a new approach is currently emerging. The industry is moving more towards:
AI as ‘decision support’
rather than
AI as ‘decision authority’
In other words:
• AI identifies anomalies,
• generates draft reviews,
• flags risks,
• suggests CAPAs,
• prioritises documents,
but:
• QA checks,
• QA approves,
• QA remains responsible.
Technically, it is possible to reduce this risk significantly, for example by using:
• isolated models,
• no free generation,
• constrained generation,
• rule-based checks,
• fixed ontologies,
• retrieval from SOPs,
• audit trails,
• deterministic pipelines,
• human-in-the-loop,
• statistical performance measurement,
• continuous revalidation.
In that case, the AI would be more of a ‘rule-based analysis system’ than a ‘free-form language model’.
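Two of the measures listed above, rule-based checks and audit trails, can be combined in a few lines. This is a minimal sketch under stated assumptions: the limits, signature roles, record layout and the `check_record` and `audit_entry` functions are hypothetical and chosen only for illustration; real limits would come from the approved specification. Every finding is produced by a fixed rule, and every result is hashed together with its input and the previous entry so that later changes become detectable.

```python
import hashlib
import json

# Hypothetical rule set, for illustration only.
LIMITS = {"temperature_c": (20.0, 25.0), "ph": (6.8, 7.4)}
REQUIRED_SIGNATURES = ("operator", "verifier")


def check_record(record: dict) -> list:
    """Apply fixed rules; the same record always yields the same findings."""
    findings = []
    for role in REQUIRED_SIGNATURES:
        if not record.get("signatures", {}).get(role):
            findings.append(f"missing signature: {role}")
    for name, (low, high) in LIMITS.items():
        value = record.get("values", {}).get(name)
        if value is None:
            findings.append(f"missing value: {name}")
        elif not low <= value <= high:
            findings.append(f"out of limits: {name}={value} (allowed {low}-{high})")
    return findings


def audit_entry(record: dict, findings: list, previous_hash: str) -> dict:
    """Chain each result to the previous entry via a SHA-256 hash."""
    payload = json.dumps(
        {"record": record, "findings": findings, "previous": previous_hash},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"findings": findings, "previous": previous_hash, "hash": digest}


# Hypothetical record: the verifier signature is empty and the temperature is too high.
record = {
    "signatures": {"operator": "A. Example", "verifier": ""},
    "values": {"temperature_c": 26.1, "ph": 7.0},
}
findings = check_record(record)
print(findings)
# ['missing signature: verifier', 'out of limits: temperature_c=26.1 (allowed 20.0-25.0)']
entry = audit_entry(record, findings, previous_hash="0" * 64)
print(entry["hash"])
```

Such a pipeline covers only what its rules encode; it cannot detect the unanticipated deviations for which generative AI is usually proposed.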
Real-world practice: Many pharmaceutical companies are already experimenting with AI in:
• QA
• Regulatory Affairs
• Pharmacovigilance
• Manufacturing
• Technology Transfer
• Document review
However, this usually takes place within very narrow limits and under strict human supervision.
The most challenging regulatory issue is not so much ‘Can the AI usually get it right?’ but rather ‘Can I demonstrate to an auditor that the system operates in a controlled, reproducible and valid manner?’ And that is precisely where generative AI is still causing problems at the moment.’
The AI’s self-assessment provides some interesting insights here. There is great anticipation as to how the EU will address the issue of generative AI use in the GMP environment in the final Annex 22. The ECA Academy is keeping pace with developments in AI within the GMP environment through a range of training events, both live and on-demand.
How do you view the use of AI, particularly generative AI, in the GMP environment? What has been your experience? Do you expect staff numbers to be reduced and staff to be (partially) replaced by AI? Do you see more benefits or more risks? Please email us at info@concept-heidelberg.de with the subject line ‘Experiences with AI use’.

