
Can we boost the confidence scores of LLM answers with the help of knowledge graphs?

Image by Markus Christ from Pixabay

Irene Politkoff, Founder and Chief Product Evangelist at semantic modeling tools provider TopQuadrant, posted this description of the large language model (LLM) ChatGPT:

“ChatGPT doesn’t access a database of facts to answer your questions. Instead, its responses are based on patterns that it saw in the training data. So ChatGPT is not always trustworthy.”

Georgetown University computer science professor Cal Newport put his own assessment this way during an interview with David Epstein, author of Range:

“Unlike the human brain, these large language models don’t start with conceptual models that they then describe with language. They are instead autoregressive word guessers. You give it some text and it outputs guesses at what word comes next.”

Newport underscores the absence of conceptual models as one reason why LLMs don’t reliably provide good answers to questions. Politkoff points out, “Representing mental models and knowledge in general is what knowledge graphs (KGs) excel at.” She argues that KGs and LLMs can work well together. 

Not only can KGs provide the mental models and the facts, but LLMs, she points out, can also help generate KGs in a format you specify, based on the text you feed them. One example prompt she mentioned: "Generate an RDF rendition of the following using Turtle notation," followed by the text you want converted to .ttl (Turtle) format. RDF, the Resource Description Framework, is the W3C standard for expressing data as subject/predicate/object triples.
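For readers unfamiliar with Turtle, the short Python sketch below shows what a .ttl rendition of a few triples looks like. It is not Politkoff's workflow and makes no LLM call; it simply serializes hand-written triples. The `ex:` namespace (`http://example.org/`), the predicate names such as `ex:isA`, and the `to_turtle` helper are all hypothetical. In practice you would more likely use an RDF library such as rdflib to parse and validate whatever Turtle an LLM returns.

```python
from collections import defaultdict

# Hypothetical namespace used only for this illustration.
PREFIXES = {"ex": "http://example.org/"}


def to_turtle(triples, prefixes=PREFIXES):
    """Render (subject, predicate, object) triples as a Turtle string.

    Objects written as "prefix:name" with a known prefix become prefixed
    names; any other object is emitted as a quoted string literal.
    """
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items()]
    lines.append("")

    # Group predicate/object pairs by subject so Turtle's ";" shorthand applies.
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))

    for s, pairs in by_subject.items():
        parts = []
        for p, o in pairs:
            if isinstance(o, str) and ":" in o and o.split(":", 1)[0] in prefixes:
                obj = o
            else:
                # Escape backslashes and double quotes inside the literal.
                text = str(o).replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{text}"'
            parts.append(f"{p} {obj}")
        lines.append(f"{s} " + " ;\n    ".join(parts) + " .")
    return "\n".join(lines)


triples = [
    ("ex:ChatGPT", "ex:isA", "ex:LargeLanguageModel"),
    ("ex:ChatGPT", "ex:label", "ChatGPT"),
]
print(to_turtle(triples))
```

Run as written, this prints a prefix declaration followed by `ex:ChatGPT ex:isA ex:LargeLanguageModel ;` and `ex:label "ChatGPT" .`, with the semicolon letting both statements share one subject.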

Using knowledge graphs with LLMs: Some representative research findings

Researchers in recent years have been evaluating how knowledge graphs might work best with LLMs. Some projects have used knowledge graphs to inform or augment LLMs. Others have used LLMs to generate input for knowledge graphs.


Google is of course known for popularizing the term knowledge graph in 2012, two years after it acquired Metaweb, the company behind Freebase. Google has been a leader in encouraging the use of standard schemas and specific kinds of structured data on the web in order to facilitate data curation, reuse, and discoverability.

Research Tech Leader and Manager Enrique Alfonseca presented his team's findings, "Using Knowledge Graph Data in Large Language Models," at the Swiss Text Analytics Conference in 2022. The team evaluated two types of approaches to using knowledge graphs with LLMs: internal and external.

Alfonseca referred to the internal approach as "knowledge infusion." After trying several ways to infuse knowledge into the LLM during training, Alfonseca and his team decided to try simply stringing together and "dumping in" the same kind of RDF triples Irene Politkoff referred to above as training data. That simpler approach achieved infusion results just as good as those of the other approaches.
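The talk, as summarized here, doesn't specify the exact text format, so the following Python sketch only illustrates the "dumping in" idea: it strings RDF-style triples together into plain text that could be appended to a training corpus. The example triples, the `linearize_triples` and `build_training_examples` helpers, and the chunk size of two triples per example are all hypothetical choices.

```python
# Hypothetical example triples; not data from the Google study.
TRIPLES = [
    ("Zurich", "locatedIn", "Switzerland"),
    ("Switzerland", "hasCapital", "Bern"),
    ("Bern", "locatedIn", "Switzerland"),
]


def linearize_triples(triples):
    """Write each triple as 'subject predicate object .' and join with spaces.

    Assumption: this plain, N-Triples-like ordering is illustrative only;
    it is not necessarily the format used in the research.
    """
    return " ".join(f"{s} {p} {o} ." for s, p, o in triples)


def build_training_examples(triples, triples_per_example=2):
    """Chunk the triples so each chunk becomes one plain-text training example."""
    return [
        linearize_triples(triples[i:i + triples_per_example])
        for i in range(0, len(triples), triples_per_example)
    ]


for example in build_training_examples(TRIPLES):
    print(example)
```

The appeal of this approach is that no special model architecture is needed: the structured facts simply become more text for the model to learn from.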

An external approach the team tried was querying the knowledge graph directly and retrieving facts from it. "Structured representation works on par with natural language," Alfonseca said. He also mentioned the need for multi-hop, or chain, reasoning, in which an accurate answer hinges on bringing together facts from several different places and deriving meaning that's larger than the sum of those facts.
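To make multi-hop reasoning concrete, here is a minimal Python sketch of external retrieval over a knowledge graph. It assumes a small in-memory list of triples rather than a real graph database or a query language such as SPARQL. The question "What is the capital of the country Zurich is in?" can't be answered from any single triple; it requires chaining two. `TripleStore`, `multi_hop`, and the example data are hypothetical.

```python
from collections import defaultdict

# Hypothetical example triples; not data from the Google study.
TRIPLES = [
    ("Zurich", "locatedIn", "Switzerland"),
    ("Switzerland", "hasCapital", "Bern"),
    ("Bern", "locatedIn", "Switzerland"),
]


class TripleStore:
    """A tiny in-memory index from (subject, predicate) to objects."""

    def __init__(self, triples):
        self.index = defaultdict(list)
        for s, p, o in triples:
            self.index[(s, p)].append(o)

    def objects(self, subject, predicate):
        # .get avoids inserting empty entries into the defaultdict.
        return self.index.get((subject, predicate), [])


def multi_hop(store, start, path):
    """Follow a chain of predicates from a start entity.

    Returns the entities reached at the end of the path plus every fact
    traversed along the way.
    """
    frontier = [start]
    facts = []
    for predicate in path:
        next_frontier = []
        for entity in frontier:
            for obj in store.objects(entity, predicate):
                facts.append((entity, predicate, obj))
                next_frontier.append(obj)
        frontier = next_frontier
    return frontier, facts


store = TripleStore(TRIPLES)
# Two hops: Zurich -locatedIn-> Switzerland -hasCapital-> Bern.
answer, facts = multi_hop(store, "Zurich", ["locatedIn", "hasCapital"])
print(answer)  # ['Bern']
print(facts)
```

Returning the traversed facts alongside the answer is one possible design choice: those facts could be placed in an LLM prompt so the model answers from retrieved, checkable statements rather than from training-data patterns alone.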


Optum, a US subsidiary of UnitedHealth Group (UHG), is an integrated payer/provider healthcare company that generated over $182 billion in revenue in 2022. Optum accounted for 50 percent of UHG's 2022 earnings, up from 44 percent in 2017.

Kunal Suri and his team at Optum in India set out to demonstrate that large language models' "ability to learn relationships among different entities makes knowledge graphs redundant in many applications." Their 2023 paper, "Language Models Sound the Death Knell of Knowledge Graphs," describes using high-dimensional vector representations (BioBERT word embeddings) to identify and extract synonyms for terms in the SNOMED medical classification system.

After identification and synonym extraction, the team examined the cosine similarity of the word embeddings using KMeans clustering. Each resulting cluster centered effectively on a single core concept listed in SNOMED.
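The paper's full pipeline isn't reproduced here, but the clustering idea can be illustrated with a small, dependency-free Python sketch. It assumes hypothetical 3-dimensional vectors in place of real BioBERT embeddings and uses spherical k-means, a k-means variant that compares points by cosine similarity, with a deterministic farthest-point initialization for reproducibility. This is a swapped-in illustration, not the authors' method; they may, for example, have run a library KMeans implementation on the raw embeddings.

```python
import math

# Hypothetical 3-D vectors standing in for BioBERT embeddings.
EMBEDDINGS = {
    "myocardial infarction": [0.90, 0.10, 0.00],
    "heart attack": [0.85, 0.15, 0.05],
    "MI": [0.80, 0.20, 0.00],
    "hypertension": [0.10, 0.90, 0.10],
    "high blood pressure": [0.05, 0.95, 0.00],
    "HTN": [0.15, 0.85, 0.05],
}


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def init_centroids(points, k):
    """Deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point least similar to its closest existing centroid.
        centroids.append(
            min(points, key=lambda p: max(cosine_similarity(p, c) for c in centroids))
        )
    return centroids


def spherical_kmeans(vectors, k, iterations=20):
    """k-means variant that assigns points to centroids by cosine similarity."""
    points = [normalize(v) for v in vectors]
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine_similarity(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the normalized mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = normalize([sum(dim) / len(members) for dim in zip(*members)])
    return labels


terms = list(EMBEDDINGS)
labels = spherical_kmeans(list(EMBEDDINGS.values()), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
for cluster in sorted(set(labels)):
    print(cluster, [t for t, label in zip(terms, labels) if label == cluster])
```

With these well-separated toy vectors, the three cardiac terms land in one cluster and the three blood-pressure terms in another, a miniature version of the one-cluster-per-concept outcome described above.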

Thoughts on the research reviewed

It's not clear to me that the Optum research summarized above actually proves what its authors claimed. Specifically, knowledge graphs aren't just about standalone concepts and how they're associated with a domain. KGs enable articulation at various tiers of abstraction, not just within the domain of interest. And they keep a stateful, continual record of how connections originate, evolve, and proliferate.

Clearly, vector representations and databases have significant utility. But does that mean larger data management environments should be oriented around vector representations? Vectorization seems to me to have additive value. It’s not a replacement for semantic graph KGs.

As for the Google research, I felt it more or less confirmed what I'd understood intuitively before reading it. But I am surprised there isn't more research along these lines than what I was able to uncover in my spare time over a weekend.