
Smart tags process: an algorithm for efficiently extracting useful information from a piece of text and storing it in a retrieval system.

The knowledge is extracted by asking the reader to answer a number of questions. Every time the answer to a question is yes, specific tags are collected and stored. Every time the answer to a question is no, specific tags are also collected and stored. Some questions ask the user to select items from a list; in this case, all the elements selected in the list are stored. The reader does not have to answer every question one by one. Instead, a first set of questions is asked. Based on the answers to the first set of questions, a second set of questions is asked. Based on the answers to the first and second sets, a third set of questions is asked. The process continues until no more questions are triggered by previous answers. Filtering questions are used so that only relevant questions are asked. For example, a filtering question is:
Does the text talk about financing issues? If the answer is no, then all questions on financing issues are automatically skipped.
Whoever develops a set of questions with their tags and triggering conditions has developed a guideline for efficiently extracting useful information for a given domain from pieces of text. People with different expertise can develop question sequences independently, and these question sequences are then combined into one. We end up with an expert system that combines the expertise of different people about which information we should look for in a piece of text.

Knowledge retrieval:

We use the tags to retrieve coded knowledge. For example, if I want to retrieve all pieces of text talking about financing issues in country c, I just need to search for the tags corresponding to a positive answer to the question “Does the text talk about financing issues?” and the tags corresponding to the selection of country c for the question “Please check the countries the text is talking about.” I know that any text with these tags talks about country c and talks about financing issues.
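
A minimal sketch of this lookup in Python, assuming the tags collected for each text are kept as a set in an in-memory dictionary. The original does not specify a storage format, so the tag names (`financing:yes`, `country:c`), the text identifiers, and the helper `find_texts` are all hypothetical and only illustrate the idea.

```python
# Hypothetical store: text identifier -> set of tags collected for that text.
tag_store = {
    "text-1": {"financing:yes", "country:c"},
    "text-2": {"financing:no", "country:c"},
    "text-3": {"financing:yes", "country:d"},
}


def find_texts(store, required_tags):
    """Return the ids of texts whose tag set contains every required tag."""
    required = set(required_tags)
    return sorted(text_id for text_id, tags in store.items() if required <= tags)


# Texts that talk about financing issues AND about country c.
matches = find_texts(tag_store, ["financing:yes", "country:c"])
print(matches)
```

Requiring every tag to be present is what makes the search an "and" between the two questions; a real retrieval system would likely use an index rather than scanning every text.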

The questions are organized in batches, and the batches are ordered. Each question has:

  • The text of the question
  • The conditions under which the question should be asked. Example: if term 1 and term 2 are in the tags pool but not term 3, then ask question 1.
  • The tags to be added to the pool in case the answer to the question is yes.
  • The tags to be added to the pool in case the answer to the question is no.
  • Additional fields useful for maintenance of the question set.
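
The fields listed above could be represented as a small record. The following Python sketch is one possible layout, not the author's implementation: the class name `Question`, its field names, and the method `should_ask` are assumptions, and the triggering condition is simplified to "all required tags present, no forbidden tag present".

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """One question of the question set (hypothetical field names)."""
    text: str
    required_tags: frozenset = frozenset()   # must all be in the tags pool
    forbidden_tags: frozenset = frozenset()  # must all be absent from the pool
    yes_tags: frozenset = frozenset()        # added to the pool on a yes answer
    no_tags: frozenset = frozenset()         # added to the pool on a no answer
    notes: dict = field(default_factory=dict)  # maintenance fields (author, date, ...)

    def should_ask(self, pool):
        """Check the triggering condition against the current tags pool."""
        return self.required_tags <= pool and not (self.forbidden_tags & pool)


# The example from the list: ask question 1 if term 1 and term 2 are in
# the pool but term 3 is not.
q1 = Question(
    text="Question 1",
    required_tags=frozenset({"term 1", "term 2"}),
    forbidden_tags=frozenset({"term 3"}),
)
print(q1.should_ask({"term 1", "term 2"}))
print(q1.should_ask({"term 1", "term 2", "term 3"}))
```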

The process goes as follows:

1. Create an empty pool of tags.
2. Go through the next unprocessed batch of questions and select those that meet the conditions to be asked (questions are asked based on the presence or absence of terms in the tags pool). For each of those questions:
2.1. Ask the question.
2.1.1. If the answer is yes, add to the tags pool the tags corresponding to a positive answer to the question.
2.1.2. If the answer is no, add to the tags pool the tags corresponding to a negative answer to the question.
2.2. Have all the questions in the current batch that meet the conditions to be asked been asked?
2.2.1. If the answer is yes: if there are still unprocessed batches, move to the next batch; if it was the last batch, end the process.
2.2.2. If the answer is no, go back to step 2 for the same batch and ask all the remaining questions that meet the conditions for being asked.
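
The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: questions are plain dictionaries with the hypothetical keys `text`, `requires`, `excludes`, `yes`, and `no`; the function name `run_process` is made up; and `answer` is a stand-in callable for the reader that returns True for yes and False for no. List-selection questions are left out.

```python
def run_process(batches, answer):
    """Ask the questions batch by batch and return the collected tags pool."""
    pool = set()  # step 1: start with an empty pool of tags
    for batch in batches:  # batches are processed in order
        asked = set()  # positions of the questions already asked in this batch
        while True:
            # Select the unasked questions whose conditions hold for the pool.
            eligible = [
                i for i, q in enumerate(batch)
                if i not in asked
                and q["requires"] <= pool
                and not (q["excludes"] & pool)
            ]
            if not eligible:
                break  # nothing left to ask here: move on to the next batch
            for i in eligible:
                q = batch[i]
                asked.add(i)
                # Add the yes-tags or the no-tags depending on the answer.
                pool |= q["yes"] if answer(q["text"]) else q["no"]
    return pool


# Hypothetical two-batch question set with one filtering question.
batches = [
    [{"text": "Does the text talk about financing issues?",
      "requires": set(), "excludes": set(),
      "yes": {"financing:yes"}, "no": {"financing:no"}}],
    [{"text": "Does the text mention a loan?",
      "requires": {"financing:yes"}, "excludes": set(),
      "yes": {"loan:yes"}, "no": {"loan:no"}}],
]

# A reader who answers no to everything: the loan question is never asked.
print(sorted(run_process(batches, lambda text: False)))
```

The inner loop repeats because an answer can add tags that make another question in the same batch eligible; it stops once no unasked question in the batch meets its conditions.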

