
Have you ever tried a website’s keyword search and been unsatisfied with the accuracy of the results? Do you find yourself frustrated and leaving when the search doesn’t return what you’re looking for, or returns it so far down the list that you can’t find it? Even worse, do you assume that what you’re looking for must not exist on that site, only to find the item on that exact same site through other channels such as Google search or ads?

If so, you’ve just experienced bad search relevancy. It’s something we all experience daily — a frustration for users and lost opportunity for the sites attempting to serve us.

As you may have recognized by now, search relevancy is very important both for retaining customers and for giving them a good user experience.

A real-world problem

Suppose you have an e-commerce website with millions of products. More often than not you will use a text-based search engine like Solr or Elasticsearch (I will not get into the merits of Solr versus Elasticsearch in this article). For simplicity's sake, assume each product has four fields: name, category, brand and description, and that we search on the name, category and brand fields.

Suppose we have two products in the index:

| Name         | Category    | Brand     | Description    |
|--------------|-------------|-----------|----------------|
| IPhone       | Mobile      | Apple     | 16 GB etc      |
| Iphone Cover | Accessories | SomeBrand | 1 mm thickness |

Say our backend converts the search text into a Solr query as below.

Search text: IPhone

Solr query: name:iphone OR brand:iphone OR category:iphone

Common sense says that when a user searches for iphone, the first result should be the iPhone itself, from the Mobile category with brand Apple, followed by the iPhone cover from the Accessories category.

Search engines like Solr or Elasticsearch are simply sophisticated text-matching systems. These tools can tell you when a search word matches a word in a document, but they aren’t nearly as smart as a human. Once a match is found, the search engine uses statistics about the relative frequency of that word to give each result a relevancy score, and this score determines the order of the results. In the above case the keyword iphone does not match anything in the category or brand fields of either product, and description is not searched at all. In the name field each product has exactly one match. Hence it’s possible that Iphone Cover will come before the iPhone itself.
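To make the tie concrete, here is a minimal Python sketch that counts keyword matches per searched field for the two sample products. It is a toy illustration, not Solr's scoring: the `count_matches` helper and the `products` list are hypothetical, and real relevancy scores also depend on term statistics and field length, which are ignored here.

```python
# Toy illustration only: real Solr scoring also uses document frequency
# and field length, which this sketch deliberately ignores.
products = [
    {"name": "IPhone", "category": "Mobile", "brand": "Apple"},
    {"name": "Iphone Cover", "category": "Accessories", "brand": "SomeBrand"},
]

# Fields used for searching in the article (description is not searched).
SEARCH_FIELDS = ("name", "category", "brand")


def count_matches(product, keyword):
    """Count whole-word, case-insensitive keyword matches in each searched field."""
    keyword = keyword.lower()
    return {
        field: product[field].lower().split().count(keyword)
        for field in SEARCH_FIELDS
    }


match_counts = {p["name"]: count_matches(p, "iphone") for p in products}
for name, counts in match_counts.items():
    print(name, counts, "total:", sum(counts.values()))
```

Both products end up with a single match, in the name field, so matching alone gives the engine no reason to prefer the phone over the cover.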

Solution:

When a user searches for IPhone, they mean (more often than not) the iPhone mobile. The search should translate into:

         category: Mobile

         brand: Apple

         name: Iphone


We will use Solr's synonym filter to solve this relevancy problem.

In schema.xml we create a field called brand with the fieldType text_synonyms_brand:

<field name="brand" type="text_synonyms_brand" indexed="true" stored="true" default="" />

And we create a fieldType like this

<fieldType name="text_synonyms_brand" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" splitOnNumerics="0"
            catenateWords="0" catenateNumbers="0" catenateAll="1"
            stemEnglishPossessive="0" splitOnCaseChange="0"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_brand.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

Note the synonyms attribute, which points to synonyms_brand.txt. In that file, add the entry:

Iphone=>Apple

Similarly, create another field called category with the fieldType text_synonyms_category, and define a corresponding fieldType with that name. Point its synonyms attribute to synonyms_category.txt, and in synonyms_category.txt add the entry:

Iphone=>Mobile

How does it work?

When the user searches for Iphone, the backend converts it into the Solr query

name:iphone OR brand:iphone OR category:iphone

With our synonyms, it is rewritten at query time to

name:iphone OR brand:apple OR category:mobile

which gives more relevant results: the iPhone now matches on all three fields, while the cover matches only on name, so the iPhone ranks first.
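The rewrite can be sketched in a few lines of Python. This only imitates what the per-field query analyzers do. The `FIELD_SYNONYMS` dictionary is a hypothetical in-memory stand-in for the two synonym files, `rewrite_query` is an illustrative helper rather than a Solr API, and the category value is assumed to be the indexed value Mobile from the sample data.

```python
# Hypothetical stand-ins for synonyms_brand.txt and synonyms_category.txt.
# Keys are lowercased to mirror ignoreCase="true".
FIELD_SYNONYMS = {
    "brand": {"iphone": "apple"},
    "category": {"iphone": "mobile"},
}


def rewrite_query(search_text, fields=("name", "brand", "category")):
    """Build a per-field OR query for a single-word search term,
    replacing the term wherever a field has a synonym rule for it."""
    # Roughly what KeywordTokenizer + LowerCaseFilter do to a one-word input.
    term = search_text.strip().lower()
    clauses = []
    for field in fields:
        # Fields without a rule for this term keep the original term.
        mapped = FIELD_SYNONYMS.get(field, {}).get(term, term)
        clauses.append(f"{field}:{mapped}")
    return " OR ".join(clauses)


print(rewrite_query("Iphone"))
# prints: name:iphone OR brand:apple OR category:mobile
```

A term with no synonym rule passes through unchanged for every field, so ordinary searches are unaffected by the mapping.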

Other ways to improve relevancy in Solr:

Query-time boosts: A very useful way to give matches in one field more weight than matches in other fields. For example, a keyword match in brand can carry more weight in the score than a match in name. In Solr this is done with the ^ operator, for example brand:apple^2.

Phrase matching: Boosting (giving a higher score) when the entire search string appears as a phrase.
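Both techniques can be combined in one request through Solr's eDisMax query parser, whose qf parameter takes per-field boosts and whose pf parameter boosts whole-phrase matches. The sketch below only builds the request parameters in Python. The host, the collection name `products`, the helper `build_boosted_query` and the boost values are all assumptions for illustration and would need tuning against real queries.

```python
from urllib.parse import urlencode

# Assumptions for illustration: the host and the "products" collection name
# are hypothetical, and the boost values are arbitrary starting points.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def build_boosted_query(search_text):
    """Return eDisMax parameters that weight fields differently
    and boost documents where the whole query appears as a phrase."""
    return {
        "defType": "edismax",
        "q": search_text,
        # Query-time boosts: brand matches weigh most, then category, then name.
        "qf": "brand^3 category^2 name",
        # Phrase matching: extra score when the full query is a phrase in name.
        "pf": "name^5",
    }


params = build_boosted_query("iphone cover")
print(SOLR_SELECT_URL + "?" + urlencode(params))
```

Keeping the boosts in request parameters rather than in the schema makes it easy to experiment with different weights without reindexing.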

 


Comment by Chintan Donda on November 26, 2015 at 1:03am

Informative article !!

Thanks for sharing.

Comment by Andrew on November 25, 2015 at 11:11pm

have been working on solr for sometime now ... never thought of this way in increasing the relevancy of results...might have to look into performance aspect as you have described the synonym at query time ....anyway i liked this approach ..... 
