How to build a search engine: Part 1

In this multi-part series, we will explore how to build a full-text search engine that is powerful enough for industrial-strength use. This first part focuses on picking the right tools and getting the technology stack ready. We will build the search engine with an AngularJS front end and use elasticsearch as the back end that does the indexing and searching.

This post is the first part of a multi-part series on how to build a search engine.

Here is a sneak peek at what the final output will look like:

[Screenshot of the finished search engine]

If you are in a hurry, you can find the full code for the project on my GitHub page.

Most applications today are data driven. Single Page Applications (SPAs) are gaining a lot of traction because of their simplicity and their ability to act as a graceful front end to gigabytes of back-end data.

Features

Search engines, especially Google, have evolved to be extremely intuitive over the past two decades. What we will attempt to build is a full-text search engine that works as a really quick information retrieval system when you are sitting on lots of data. Some of the features we will look to incorporate are:

  • Fuzzy Search
  • Subset Pattern Matching
  • Auto-complete suggestions
  • Scoring Algorithm
  • Further sorting and filtering of the results
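To make the first feature a little more concrete: fuzzy search is commonly based on edit distance, that is, the number of single-character insertions, deletions and substitutions needed to turn one word into another. The snippet below is only a minimal sketch of that idea in plain Python. The function names `edit_distance` and `fuzzy_matches` and the `max_edits` parameter are made up for this illustration, and this is not how elasticsearch implements fuzzy matching internally.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (illustrative helper)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(query, candidates, max_edits=2):
    """Return the candidates within max_edits of the query, ignoring case."""
    query = query.lower()
    return [c for c in candidates if edit_distance(query, c.lower()) <= max_edits]
```

With this sketch, a misspelled query such as "serch" still matches "search", because the two words are one insertion apart.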

AngularJS

AngularJS is an open-source front-end JavaScript framework developed by Google. It lets you build lightweight, dynamic web apps. It ships as a single JavaScript library that you include in your page, and that is enough to get you started. You do not need to install anything separately for this.

Elasticsearch

Elasticsearch is a search server that exposes a REST interface on top of Apache Lucene. It is simple and blazingly fast. It offers all the CRUD operations, which makes indexing and retrieving data easy. More about elasticsearch later. Installing elasticsearch requires Java 8, and Oracle JDK version 1.8.0_73 is recommended. You can head over to Oracle’s website to install Java 8 for your operating system.

Once Java is set up, you are ready to install elasticsearch. Download elasticsearch binaries from here.

After downloading, just extract the contents in a folder of your choice.

Open a command prompt window and navigate to this folder. Go into the bin folder, type elasticsearch and hit enter. If there are no errors, head over to your browser and open the URL below:

http://localhost:9200

{
  "name" : "Hitman",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.3.2",
    "build_hash" : "b9e4a6acad4008027e4038f6abed7f7dba346f94",
    "build_timestamp" : "2016-04-21T16:03:47Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}

A response similar to this will be shown. Any errors are likely to be specific to your system or platform. Drop a note in the comments and I’d love to take a look.
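If you would rather run the same check from a script than from the browser, something along these lines should work. It is a minimal sketch that uses only the Python standard library and assumes the node is listening on the default http://localhost:9200. The helper names `parse_node_info` and `fetch_node_info` are made up for this example.

```python
import json
from urllib.request import urlopen


def parse_node_info(body):
    """Pull the interesting fields out of the JSON returned by the root URL."""
    info = json.loads(body)
    return {
        "name": info["name"],
        "cluster_name": info["cluster_name"],
        "version": info["version"]["number"],
        "lucene_version": info["version"]["lucene_version"],
    }


def fetch_node_info(url="http://localhost:9200", timeout=5):
    """Request the root URL of a running node and summarise the response."""
    # urlopen raises URLError if nothing is listening on that address.
    with urlopen(url, timeout=timeout) as response:
        return parse_node_info(response.read().decode("utf-8"))
```

Calling `fetch_node_info()` while elasticsearch is running should give you a small dictionary with the node name, cluster name and version numbers. A connection error usually means the node has not started yet.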

Python

I mostly use the Anaconda distribution of Python. Give it a chance if you haven’t already; it works wonders. Python will not be present anywhere in the live setup. We will only use it to make a few configurations on elasticsearch, as it has a very nice elasticsearch API. Once the distribution is installed, head over to the command prompt and type:

pip install elasticsearch
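Once the package is installed, a quick way to confirm that Python can talk to the node is to ping it through the client. The snippet below is a rough sketch, assuming the node from the previous section is running on http://localhost:9200. The function names `client_matches_server` and `check_connection` are invented for this example. One thing to watch out for: the client's major version should generally match the server's major version, so for the 2.3.2 server shown earlier you may need a 2.x release of the client rather than the latest one.

```python
def client_matches_server(client_version, server_version):
    """True when the client's major version equals the server's major version.

    client_version is a tuple such as (2, 3, 0); server_version is a string
    such as "2.3.2", as reported in the node's "version.number" field.
    """
    return int(client_version[0]) == int(server_version.split(".")[0])


def check_connection(url="http://localhost:9200"):
    """Ping the node and return its version number (assumes a running node)."""
    # Imported here so the helper above works even without the package installed.
    import elasticsearch
    from elasticsearch import Elasticsearch

    es = Elasticsearch([url])
    if not es.ping():
        raise RuntimeError("Could not reach elasticsearch at " + url)

    server_version = es.info()["version"]["number"]
    if not client_matches_server(elasticsearch.__version__, server_version):
        print("Warning: client and server major versions differ")
    return server_version
```

If `check_connection()` returns a version string, the Python side of the setup is ready.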

Apache Tomcat

Finally, we will need a web server. Any web server would do; we will use Apache Tomcat here. It is a web server that we can install on our own machine, and it lets us host the front end so that we can access it through a browser.

Download Tomcat and unzip it to any folder location.
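Since any web server would do, here is a rough sketch of a stand-in you could use for a quick local test before Tomcat is configured. It serves a folder of static files using only the Python standard library (Python 3.7 or later). The function name `start_static_server` and the default port 8000 are assumptions for this example, and it is not meant to replace Tomcat in the actual setup.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_static_server(directory, port=8000):
    """Serve the files in `directory` on 127.0.0.1 from a background thread."""
    # Bind the handler to the chosen folder instead of the current directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread, so the server does not keep the interpreter alive on exit.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Point it at the folder that holds your front-end files and open http://127.0.0.1:8000 in a browser. Call `shutdown()` and then `server_close()` on the returned server when you are done.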

That is all you need to do to complete the installation and setup.
Finally, congratulations! You have taken the first step towards knowing how to build a search engine. In the next parts, we will look directly at the elasticsearch configuration and the Angular UI to go with it.

Originally posted here