
The relatively recent capacity for front-end users to interface with backend systems, documents, and other content via natural language prompts is producing several notable effects on enterprise content management.
Firstly, it reduces the skills needed to engage with such systems, democratizing their use and the advantages organizations derive from them. Natural language interfacing also enables knowledge workers to boost their productivity, shorten the time required to complete tasks, and increase the throughput of mission-critical workflows.
More importantly, the widespread incorporation of generative language models for ECM use cases engenders the critical byproduct of making enterprise content itself more meaningful—and potentially profitable—to the mission-critical applications that depend on it.
Models such as OpenAI’s GPT-4o are influencing everything from metadata extraction to semantic search, summarization, and synthesis of content. Their capabilities are redefining the way these processes work while supporting newfound possibilities that were virtually unthinkable a few short years ago.
Or, as Alex Wong, Senior Product Marketing at Laserfiche, put it, “It’s really revolutionary from what could previously be done.”
Automated metadata extraction
Prior to the influx of language models and other machine learning techniques, metadata extraction was a predominantly manual process in ECM workflows. For any given application (such as processing invoices), users had to capture metadata with templates keyed to each document’s layout. Thus, if there were invoices from 100 different vendors, organizations had to create approximately the same number of templates for obtaining their metadata because “each vendor’s invoice looks different,” Wong commented. “The date may be on the top left and not the top right. The address might be on the bottom right and not the top left. There’s so much variation, like snowflakes.”
However, by relying on language models to read through the content of invoices, contextualize it according to user-defined metadata (stipulated in natural language), and input that metadata into the correct fields, the extraction process is no longer predicated on per-vendor templates.
Instead, it’s based on the metadata itself, regardless of where it appears in the invoices or in any other type of content. Thus, instead of creating 100 templates, organizations now have to make only one.
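The shift from per-vendor templates to a single natural-language specification can be sketched as follows. This is an illustrative example, not Laserfiche's implementation: the field names, invoice text, and helper function are all invented, and the actual call to a language model is omitted — in practice the assembled prompt would be sent to a model such as GPT-4o.

```python
# Hypothetical sketch: one natural-language field specification replaces
# per-vendor extraction templates. The model call itself is omitted; the
# point is that the same prompt works regardless of document layout.

FIELD_SPEC = ["vendor name", "invoice date", "total amount due"]

def build_extraction_prompt(ocr_text: str, fields: list) -> str:
    """Combine OCR'd document text with a natural-language list of the
    metadata fields to extract. The prompt is layout-independent."""
    field_list = "\n".join(f"- {f}" for f in fields)
    return (
        "Extract the following metadata fields from the document below.\n"
        f"Fields:\n{field_list}\n\n"
        f"Document:\n{ocr_text}"
    )

# Two invoices with completely different layouts share one "template":
invoice_a = "ACME Corp\nInvoice date: 2024-03-01\nTotal due: $120.00"
invoice_b = "Total due: $85.50 | 2024-03-02 | Globex Inc."

prompt_a = build_extraction_prompt(invoice_a, FIELD_SPEC)
prompt_b = build_extraction_prompt(invoice_b, FIELD_SPEC)
```

Here the model, rather than a positional template, is responsible for locating each field — which is why the date can be top left on one invoice and bottom right on another without requiring a new template.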
Pairing OCR with GPT
The marked decrease in effort, time, and templates required to uniformly extract metadata based on natural language specifications is partly attributed to Optical Character Recognition (OCR). This utility extends to Intelligent Character Recognition (ICR), which operates like OCR for handwriting. The approach Wong described is based on organizations scanning their documents into an OCR or ICR engine that transcribes the content, which is then searchable. According to Wong, users “just say, in natural language, what metadata you want.”
For purchase orders, organizations might specify the name of the purchaser, to whom the order is shipping, the requested item and line item details, and other particulars. This information, along with the OCR or ICR transcriptions, is sent to the language model, which then extracts the metadata based on those specifications. “Our enterprise integration with OpenAI takes all that data, puts it together, and gives it back to us in a structured format in the template,” Wong remarked.
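The return leg of that pipeline — mapping the model's structured reply back into a fixed metadata template — might look something like the sketch below. The JSON response format, field names, and sample values are assumptions for illustration; the source does not describe the actual wire format.

```python
import json

# Hedged sketch of the downstream step: a model's structured reply
# (assumed here to be JSON) is mapped into a fixed metadata template.
# The response string below is a stand-in, not real model output.

TEMPLATE_FIELDS = ["purchaser", "ship_to", "line_items"]

def fill_template(model_response: str) -> dict:
    """Parse the model's JSON reply and keep only the template's fields,
    defaulting any missing ones to None so every record stays uniform."""
    data = json.loads(model_response)
    return {field: data.get(field) for field in TEMPLATE_FIELDS}

mock_response = json.dumps({
    "purchaser": "Jane Smith",
    "ship_to": "12 Main St",
    "line_items": [{"item": "Widget", "qty": 3}],
    "stray_field": "dropped by the template",
})

record = fill_template(mock_response)
```

Normalizing into a fixed template in this way is what makes the extracted metadata uniformly searchable and reportable downstream, regardless of which vendor's document it came from.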
There are other downstream benefits of this approach, too. According to Catie Disabato, Senior Communications Manager at Laserfiche, what the model does is “structure it further into the metadata template, which makes it more searchable, but also enables analysis, reporting, and you can do workloads off of it as well.”
Document intelligence
The document analysis capabilities of language models such as GPT-4o are equally viable for informing ECM use cases. In addition to facilitating natural language search, such models are adept at reading through the content of documents to perform a multiplicity of functions, such as “summarizations, answer questions, give insights, and synthesize between documents,” Wong indicated. Users can also compare the information between documents to understand points of distinction and similarity. These features are invaluable for expanding search to include capabilities that would previously have required substantial human effort and been difficult to scale.
For example, “If you’re looking at a folder of invoices, you can ask which ones are due on the first, and it will help you find what you’re looking for,” Wong mentioned. Moreover, users can prompt models to perform these tasks in natural language, making these capabilities accessible to those who might not otherwise be adept at writing code for traditional queries. Once users stipulate in natural language what information they’re looking for, “Laserfiche is taking the text and processing it in a way that is easy for the AI to understand, and it will get sent to OpenAI to complete the task,” Wong noted.
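Once metadata has been extracted into structured fields, a question like the one Wong describes reduces to a simple filter over those fields. The sketch below illustrates the idea with invented data; it is not how Laserfiche or OpenAI resolve such queries, merely what the structured metadata makes possible.

```python
from datetime import date

# Illustrative only: invoice records and vendor names are invented.
# The point is that structured metadata turns a natural-language
# question ("which invoices are due on the first?") into a trivial query.

invoices = [
    {"vendor": "ACME Corp",   "due": date(2024, 4, 1)},
    {"vendor": "Globex Inc.", "due": date(2024, 4, 15)},
    {"vendor": "Initech",     "due": date(2024, 5, 1)},
]

def due_on_first(records):
    """Return the invoices whose due date falls on the first of a month."""
    return [r for r in records if r["due"].day == 1]

matches = due_on_first(invoices)
```

In a natural-language workflow, the model's role is to translate the user's question into a lookup like this one, so the user never writes the query themselves.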
Positive feedback
The ability for users to interface with AI models and backend content management systems via natural language creates a pair of welcome outcomes. It lowers the technological barriers to engaging with these resources and expands what can be done with the underlying content. Organizations can achieve more with their enterprise content, which is arguably the point of storing, processing, and acting on it.