Deep learning can be used to automate nearly every repetitive task that humans perform or once performed. Factory robots, autonomous cars, and the Internet of Things are examples of this automation. Yet mentally demanding tasks, such as conducting research or strategic planning over natural-language documents, remain daunting to automate. We looked into the root cause of this challenge and implemented a solution that automates these tasks with a new breed of Natural Language Understanding (NLU) technology.
Over the past decades, Internet search engines have given people a fast and easy way to find information by keyword. Typing words into the search bar returns hundreds, if not thousands, of results. However, having access to the resulting pages is not the same as digesting them, because the user still has to read each result to determine whether its context is relevant. Worse, when the subject is abstract and the right search words are not obvious or not available, for example when searching for “people with a beautiful life”, word search is rendered useless. In situations like this, one can turn to Natural Language Understanding (NLU) to search for relevance.
Applying NLU to resolve and combine subjects with contexts
Natural Language Processing (NLP) has been around for as long as there has been text analytics. Common NLP techniques include word counts, weight assignment, vectorization, stemming, lemmatization, and aliasing. Before the inclusion of Artificial Intelligence (AI), NLP made little attempt to capture semantics or to understand the underlying context. Natural Language Understanding, a subtopic of NLP within AI, poses challenges that go beyond those of traditional NLP. Before we dive deep into NLU, we’d like to put its application into context.
Our approach to NLU takes natural-language text as input and processes it automatically based on the context and semantics that the text conveys. The process discovers subjects of interest in the input text, connects each subject with its corresponding contexts, and draws relationships between subjects using a context graph. A context graph consists of subjects as nodes and contexts as leaves. Context graphs let one connect instantly to the contexts of any subject discussed in a corpus without reading through every document. The following is a brief description of the procedure:
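To make the traditional techniques concrete, here is a minimal sketch of word counts and weight assignment using plain TF-IDF. The function names and the two sample sentences are illustrative assumptions, not part of any SiteFocus product.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # word counts for this document
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample sentences, invented for illustration only.
docs = [
    "The investment policy limits equity exposure.",
    "The investment instruction requests an equity purchase.",
]
weights = tf_idf(docs)
```

In this sketch, words shared by both sentences (such as "investment") receive a weight of zero, while words unique to one sentence receive a positive weight. The weights come purely from counting, which illustrates why such techniques say little about what the text means.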
- For the purpose of discussion, Figure 1 shows two documents. Each document is connected to subject-nodes (brown circles labeled with the subject name). Relevant subjects are shown as nodes. Relevant contexts are shown as leaves (yellow boxes labeled “Ctx”). Subject-nodes and context-leaves are artifacts that our AI algorithm, Context Discriminant, discovers automatically from the documents. “Document 1” is selected as the key document. “Document 2” is used as a reference document. In practice, we would be dealing with one or more reference documents.
- The NLU process begins by discovering relevant subjects and contexts in the key document using Context Discriminant, as noted above.
- Contexts referenced by relevant subjects are connected to form a context graph.
- A relevancy search is then conducted on the reference documents by comparing artifacts from the key document with artifacts discovered in each reference document.
- The key document is then combined with the relevant reference documents to form a new dataset.
- Context Discriminant is applied to the new dataset. The resulting artifacts form the basis of a new context graph.
- Finally, insights from documents are obtained by traversing subjects and contexts in the context graph.
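The steps above can be sketched in a few lines of Python. Context Discriminant itself is not described in this article, so the sketch swaps in a deliberately naive stand-in: subjects are the most frequent non-stopword terms, and contexts are the sentences that mention them. Relevancy is approximated by the Jaccard overlap of subject sets. All function names, the threshold, and the sample documents are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "with", "on"}


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def discover_artifacts(text, top_n=3):
    """Naive stand-in for Context Discriminant (NOT the real algorithm).

    Returns {subject: [context sentences]} where subjects are the
    top_n most frequent non-stopword terms.
    """
    sentences = split_sentences(text)
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    subjects = [w for w, _ in Counter(words).most_common(top_n)]
    graph = defaultdict(list)
    for subject in subjects:
        pattern = re.compile(rf"\b{re.escape(subject)}\b")
        for sentence in sentences:
            if pattern.search(sentence.lower()):
                graph[subject].append(sentence)
    return dict(graph)


def relevance(key_graph, ref_graph):
    """Jaccard overlap between the subject sets of two context graphs."""
    a, b = set(key_graph), set(ref_graph)
    return len(a & b) / len(a | b) if a | b else 0.0


def build_combined_graph(key_doc, reference_docs, threshold=0.2):
    """Run the pipeline: discover, compare, combine, rediscover."""
    key_graph = discover_artifacts(key_doc)
    relevant = [
        doc for doc in reference_docs
        if relevance(key_graph, discover_artifacts(doc)) >= threshold
    ]
    combined_text = " ".join([key_doc] + relevant)
    return discover_artifacts(combined_text), relevant


# Hypothetical documents, invented for illustration only.
key_doc = ("Investment instruction: buy bonds. The investment follows the policy. "
           "The policy allows bonds.")
ref_policy = ("The investment policy limits bonds. The policy covers every investment. "
              "Bonds need approval.")
ref_other = ("The cafeteria menu changes weekly. The menu lists soup. "
             "Soup and cafeteria hours vary.")

graph, relevant = build_combined_graph(key_doc, [ref_policy, ref_other])
for subject, contexts in graph.items():
    print(subject, "->", len(contexts), "contexts")
```

Here only the policy document passes the relevancy check, so the combined graph is built from the key document plus that one reference. A production system would replace the frequency heuristic with a semantic model, but the overall flow of discover, compare, combine, and rediscover stays the same.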
In our example, “Document 1” is an investment instruction and “Document 2” is an investment policy. We started with the investment instruction (Document 1) to find key subjects. We then used these key subjects to search the reference document (Document 2) and determine its similarity. Upon confirming the relevancy between the two documents, we combined them to create the resulting context graph. Traversing down either the “Investment” subject or the “Policy” subject gives us the contexts for both the investment instruction and the investment policy details. Being able to generate these context graphs automatically from textual data enables analysts to focus on the subjects of interest.
Figure 1 – Showing “Document 1” and “Document 2” with relevant subjects and contexts
Figure 2 – Showing the document set consisting of Documents 1 and 2 with relevant subjects and contexts
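The traversal described for the investment example can be sketched as a small data structure with subjects as nodes and contexts as leaves. The class name, its methods, and the placeholder context strings below are hypothetical. They illustrate the idea of a context graph and are not the CIF platform's actual representation.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class ContextGraph:
    """Subjects are nodes; contexts are leaves attached to a subject."""
    contexts: dict = field(default_factory=lambda: defaultdict(list))
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_context(self, subject, document, text):
        """Attach a context leaf (with its source document) to a subject."""
        self.contexts[subject].append((document, text))

    def relate(self, a, b):
        """Record an undirected relationship between two subjects."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def traverse(self, start):
        """Breadth-first walk yielding (subject, document, context) triples."""
        seen, queue = {start}, deque([start])
        while queue:
            subject = queue.popleft()
            for document, text in self.contexts.get(subject, []):
                yield subject, document, text
            for neighbor in sorted(self.edges.get(subject, ())):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)


# Placeholder contexts; the real context text would come from the documents.
graph = ContextGraph()
graph.add_context("Investment", "Document 1", "<instruction context>")
graph.add_context("Policy", "Document 2", "<policy context>")
graph.relate("Investment", "Policy")

result = list(graph.traverse("Investment"))
for subject, document, text in result:
    print(f"{subject} ({document}): {text}")
```

Starting from either subject reaches the contexts of both documents, which mirrors how an analyst can move from the instruction to the governing policy without reading either document in full.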
Natural Language Understanding is an important and logical next step after word search. It offers a more meaningful way to use AI in NLP to search for relevant contexts and subjects. At SiteFocus, we have successfully applied this technique to automated context discovery and subject discovery on large textual data repositories with the CIF platform.