
Data quality for unbiased results: Preventing AI-induced hallucinations


Artificial Intelligence (AI) has revolutionized and will continue to transform many customer-facing industries. AI-powered business applications offer tangible value to customers and business operations alike. However, there are substantial risks to AI adoption. Large Language Models (LLMs) built on partly biased data or modeling have shown how hallucinations can lead to negative business outcomes (A. Pequeno, Feb. 2024).

Effective AI outcomes require rich, accurate and unbiased data. Biased, incomplete, unlabeled and inaccurate training or business data will often generate AI-induced “hallucinations”: factual inconsistencies or subtly biased outcomes that may appear accurate and useful but can negatively affect business decision making. Even clean business data contributes to hallucinations if upstream training data is biased or incomplete, or if AI results supervision is unable to access high-quality reference data and related knowledge. It is therefore not surprising that organizations seeking to tap into AI consider data challenges a primary obstacle as they work to train, deploy, scale and determine the return on investment of their AI initiatives. This points to the increasing need for tools and methods to confirm the trustworthiness of AI results.

Walking through the fundamental steps of a real-world medical data example outlines the data quality practices needed to ensure accurate AI outcomes. Start by profiling, cleaning, and enriching the training and business data with automated rules and reasoning. Apply expert semantics and visually supported retrieval augmented generation in high data quality environments for informed and observable supervised QA and training (S. Hedden, Dec. 2024). Automate QC testing and correction of outcomes with curated content and expert-driven outcome augmentation supported by business rules and semantics.

Preventing AI hallucinations from hampering business operations requires a comprehensive data quality approach, featuring “gold standard” training data; actively cleansed and augmented business data; and supervised AI training supported by observable content, machine reasoning and business rules. These factors must be complemented by automated testing and correction of outcomes supported by high-quality reference data, business rules, machine reasoning and RAG.

Ensuring accuracy in AI applications can mean life or death to humans and businesses

Exploring a classic medical risk example will demonstrate the critical need for accurate AI output – supported by clean data, process and outcomes observability, and automated results supervision.

In this scenario, a specific drug is prescribed as a patch, and the common dose is 15 milligrams. The drug is also available as a pill that requires a lower 5 milligram dose. An AI-enabled application may incorrectly generate a statement that combines the two pieces of information by stating a patient may take “a common 15mg dose, available in pill form.” It’s easy to miss the error, even for a human, but this makes for a potentially dangerous AI hallucination. A human medical expert paying close attention would likely recognize the error – taking 15 milligrams of the medication in pill form would be three times the recommended dose, potentially causing an overdose. A layperson naively asking an AI application about medical dosing might decide to take three 5 milligram pills – a potentially deadly outcome.

Here, a patient’s health and safety are at stake and deeply reliant on clean, well-labeled data and accurate AI outcomes. These errors can be avoided by combining high-quality training and reference data with observable supervision and training of AI outcomes (supported by semantic machine reasoning and business rules) and with automated results checking that appeals to curated expert resources for validation or correction. Together, these practices contribute to a more functional AI system.

In addition to incorporating traditional data quality operations that cleanse, integrate, harmonize and enrich data, semantically informed rules supported by good data can ensure accurate business data and AI output. Comparing output to expected results supports empirical accuracy. An expert ontology combined with curated medical reference data, such as the Unified Medical Language System (UMLS), can automatically determine medication dosage based on its prescribed use or format. The system recognizes and corrects the error independently, asserting that “for this medication, pills are not prescribed or recommended above 5 milligrams.”
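The dose-validation rule described above can be sketched as a simple check against curated reference data. This is a minimal, hypothetical illustration: the drug name, the dose limits, and the rule itself are invented for the example and are not actual UMLS content.

```python
# Curated reference data (illustrative): (drug, form) -> maximum
# recommended dose in milligrams. A real system would draw these
# limits from a resource such as UMLS.
REFERENCE_DOSES = {
    ("drug_x", "patch"): 15,
    ("drug_x", "pill"): 5,
}

def validate_dose(drug: str, form: str, dose_mg: int) -> str:
    """Check an AI-generated dosing statement against the curated limit."""
    limit = REFERENCE_DOSES.get((drug, form))
    if limit is None:
        # No reference record: route to human review rather than guess.
        return "flag: unknown drug/form combination"
    if dose_mg > limit:
        # Assert the correction, as in the example in the text.
        return f"correct: for this medication, {form}s are not recommended above {limit} mg"
    return "ok"

# The hallucinated "15 mg pill" statement is caught and corrected:
print(validate_dose("drug_x", "pill", 15))
```

The same pattern generalizes: any field of a generated statement that has an authoritative reference value can be checked and corrected this way.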

How can we ensure clean, comprehensive training and business data and accurate outcomes from AI applications associated with medical data? It’s critical to recommend the correct dose and route of administration for the proper medication.

The following practices are developed around this potentially life-saving example. These processes can be implemented with low-code, no-code platforms that reduce the technical demands related to engineering critical data quality workflows.

Start with cleansed and augmented training and business data

To ensure high-quality training datasets, start by profiling, cleaning, and enriching training and business data as needed with automated rules and semantic inference. To avoid inaccurate AI outcomes (hallucinations), it is critical to use gold-standard reference datasets and clean, accurate business data. When training and business data are inaccurate, biased, or missing important metadata, AI applications will produce inaccurate or otherwise biased results.

Every AI project should start with active and core data quality management, including profiling, deduplication, cleansing, classification and enrichment. Think of it as ‘great data in – great business results out.’ Ideally, training data is curated and integrated from multiple sources to create high-quality demographic, customer, firmographic, geographic, or other relevant data resources. Further, data quality and data-driven processes are not static and must be handled in real time. For this reason, active data quality (data quality automation) as a routine business operation is essential to any AI-enabled business application. This supports generating and applying active rules to address issues emerging from data profiling to cleanse, integrate, harmonize and enrich data referenced by your AI application. All these factors point to the need to develop AI-enabled applications within active data quality environments, as a means to fuel better business insights and hallucination-free outcomes.
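The core data quality operations named above (profiling, deduplication, cleansing, harmonization) can be sketched in a few lines. The records and field names below are invented for illustration; a production workflow would run equivalent rules continuously inside an active data quality platform.

```python
# Illustrative medication records with typical quality issues:
records = [
    {"drug": "Drug X", "form": "patch", "dose_mg": 15},
    {"drug": "drug x", "form": "patch", "dose_mg": 15},   # casing duplicate
    {"drug": "Drug X", "form": "pill",  "dose_mg": None}, # missing dose
]

def profile(rows):
    """Profiling step: count missing values per field."""
    missing = {}
    for row in rows:
        for field, value in row.items():
            if value is None:
                missing[field] = missing.get(field, 0) + 1
    return missing

def cleanse(rows):
    """Harmonize casing, then deduplicate, preserving order."""
    seen, out = set(), []
    for row in rows:
        key = (row["drug"].lower(), row["form"], row["dose_mg"])
        if key not in seen:
            seen.add(key)
            out.append({**row, "drug": row["drug"].lower()})
    return out

print(profile(records))        # reveals the missing dose
print(len(cleanse(records)))   # duplicate removed: 2 records remain
```

Profiling surfaces the missing dose so an enrichment rule (or reference lookup) can fill it before the data ever reaches the AI application.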

In the medications example, accurate, metadata-rich medication data is required and is referenced by the system. Clean reference data can be applied in multiple steps within an AI workflow:

  1. First, upstream business data profiling, cleansing, and enrichment ensure the availability of accurate and consistent dosing and route of administration information.
  2. Next, this data may be applied as a complement to observable supervised or unsupervised training, as the AI model is informed by prompt and result engineering. Missing or incorrect dose or route of administration content will be added or corrected.
  3. Finally, AI outcomes can be informed and corrected by content retrieved from clean reference data in automated ways by applying retrieval augmented generation methods (RAG), or with observable supervision using knowledge graph-based GraphRAG methods.

These methods can identify and flag or correct any content or result that doesn’t meet expected contents or relationships – a record or recommendation referencing a 15-milligram pill would be flagged or corrected.
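The retrieval-and-flagging step in point 3 can be sketched as retrieval-augmented validation: before a generated recommendation is accepted, the matching reference record is retrieved and any disagreeing field is flagged. The reference store and record schema here are assumptions made for illustration.

```python
# Hypothetical clean reference store keyed by (drug, form):
REFERENCE_STORE = {
    ("drug_x", "pill"):  {"dose_mg": 5},
    ("drug_x", "patch"): {"dose_mg": 15},
}

def check_against_reference(generated: dict) -> list:
    """Retrieve the reference record and flag fields that disagree."""
    key = (generated["drug"], generated["form"])
    reference = REFERENCE_STORE.get(key)
    if reference is None:
        return ["flag: no reference record retrieved"]
    issues = []
    for field, expected in reference.items():
        if generated.get(field) != expected:
            issues.append(
                f"flag: {field}={generated.get(field)} "
                f"(reference says {expected})"
            )
    return issues

# The record referencing a 15-milligram pill is flagged:
print(check_against_reference({"drug": "drug_x", "form": "pill", "dose_mg": 15}))
```

In a full RAG pipeline the retrieval step would query a vector index or knowledge graph rather than a dictionary, but the validation logic is the same: ground the generated content in retrieved, trusted content.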

Train your AI application with observable, expert semantic supervision

Next, comparing outcomes with expected authoritative content and relationships (richly labeled reference and semantic data) is a critical workflow step. Observability and provenance are particularly important during the AI application development stage and remain critical for governance throughout the application lifespan.

By combining high-quality training and reference datasets with semantically aligned ontological graphs, application engineers and data scientists can effectively review identified issues. Machine reasoning (or semantic inference) can apply semantic content and related data quality rules informed by experts, such as those provided by the National Center for Biomedical Ontology (NCBO) in the medications example. These resources can facilitate supervised learning, for instance, through visually supported retrieval augmented generation (GraphRAG).

This creates an environment for informed and observable supervised training that supports the creation and application of existing or new business rules to ensure accurate outcomes. By training the AI application in real time, potential errors can be inferred, flagged, and corrected.
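A minimal sketch of the graph-based inference described above: a tiny knowledge graph of subject-predicate-object triples and a rule that infers a dosing violation from the graph's relationships, in the spirit of GraphRAG-style supervision. The triples and predicate names are invented for the example, not drawn from a real ontology.

```python
# Toy knowledge graph as a set of (subject, predicate, object) triples:
TRIPLES = {
    ("drug_x", "has_form", "pill"),
    ("drug_x", "has_form", "patch"),
    ("pill", "max_dose_mg", "5"),
    ("patch", "max_dose_mg", "15"),
}

def objects(subject, predicate):
    """All objects linked to a subject by a given predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}

def infer_violation(drug, form, dose_mg):
    """Infer whether an outcome violates graph-encoded constraints."""
    if form not in objects(drug, "has_form"):
        return f"{drug} is not available as a {form}"
    limits = {int(o) for o in objects(form, "max_dose_mg")}
    if limits and dose_mg > min(limits):
        return f"{dose_mg} mg exceeds the {form} limit of {min(limits)} mg"
    return None  # no violation inferred

# The 15 mg pill outcome is inferred to be a violation:
print(infer_violation("drug_x", "pill", 15))
```

Because the constraint lives in the graph rather than in application code, an expert can extend or correct it by editing the ontology, and the same inference applies across every outcome the system reviews.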

Automate supervision, retrieval and augmentation/correction to advance AI at scale

Most of today’s AI systems are developed with humans supervising the results. Business-scale applications must automate the ability to check outputs and verify that they meet expected data quality and semantic meaning. For production, well-labeled reference data and authoritative semantic resources are deployed to automate the application of semantic entailments (data enrichment or correction grounded in ontological reasoning). Based on authoritative sources for retrieval of reference data and logic, rules and reasoning can be employed at scale to augment, assess, and correct the generation of AI outcomes. While unknown issues may always be flagged for human supervision, most issues can be addressed in automated ways through the application of rules, expert ontologies, and high-quality data. The gold standard data referenced previously complements training and automated downstream supervision by comparing outcomes with expected reference data patterns.
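The triage logic described above (automate what the rules cover, escalate the rest) can be sketched as a small supervision pipeline. The dose limits, records, and queue names are illustrative assumptions.

```python
# Hypothetical reference limits: (drug, form) -> max dose in mg
DOSE_LIMITS = {("drug_x", "pill"): 5, ("drug_x", "patch"): 15}

def supervise(outcomes):
    """Route each AI outcome: accept, auto-correct, or escalate to humans."""
    accepted, corrected, review = [], [], []
    for o in outcomes:
        limit = DOSE_LIMITS.get((o["drug"], o["form"]))
        if limit is None:
            review.append(o)                          # unknown: human supervision
        elif o["dose_mg"] > limit:
            corrected.append({**o, "dose_mg": limit}) # rule-based correction
        else:
            accepted.append(o)
    return accepted, corrected, review

outcomes = [
    {"drug": "drug_x", "form": "pill",  "dose_mg": 15},  # hallucinated dose
    {"drug": "drug_x", "form": "patch", "dose_mg": 15},  # within limits
    {"drug": "drug_y", "form": "pill",  "dose_mg": 10},  # no reference data
]
accepted, corrected, review = supervise(outcomes)
print(len(accepted), len(corrected), len(review))  # 1 1 1
```

At scale, only the review queue requires human attention; everything the rules and reference data cover is handled automatically.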

While medical diagnoses and prescriptions may always need human supervision, we can ensure accuracy with all our mission-critical AI applications by applying clean, well-labeled data and meaningful augmentation.

Hallucination-proofing AI applications requires tools and resources that support empirical accuracy. To avoid getting it wrong, anchor your AI projects in gold standard reference data for training, clean and curated business data, and active data quality processes with observable and semantically informed results supervision. Together, these methods provide the required foundation for meaningful, observable, and automated creation, testing, and correction of AI outcomes.

References

Pequeno, Anthony. Google’s Gemini Controversy Explained: AI Model Criticized By Musk And Others Over Alleged Bias. Forbes. Feb 26, 2024.

Hedden, Steve. How to Build a Graph RAG App: Using knowledge graphs and AI to retrieve, filter, and summarize medical journal articles. Towards Data Science. Dec 30, 2024.
