Simple News Search Engine

Yulia Nudelman
2 min read · Dec 18, 2020

How to build a simple search engine dashboard with Docker, Elasticsearch, and Plotly.

In this article, I build a full-text search feature that finds relevant articles by searching for a specific word or phrase across thousands of news articles.

Prerequisites

  • Docker
  • Elasticsearch
  • Plotly

Docker

Docker is a platform that packages an application and all its dependencies together in a container.

Docker is like magic ✨ in the box.

Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine, one of the open-source products from Elastic. It is a schema-free, document-oriented data store.

Elasticsearch is an awesome search engine for fast ⚡️ full-text search.

Install Docker Desktop on Windows

I downloaded Docker Desktop for Windows from here. Warning ⚠️: the system requirements are:

  • Windows 10 64-bit: Pro, Enterprise, or Education.
  • Hyper-V and Containers Windows Features must be enabled.

Install Elasticsearch with Docker

To get the Elasticsearch Docker image, open the command line and run the docker pull command:

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.10.1

To start a single-node Elasticsearch cluster for development or testing, run the docker run command:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.1

For more details, visit here.
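Once the container is up, a quick way to confirm the node is reachable is to request its root endpoint. The sketch below assumes the default address http://localhost:9200 published by the docker run command and no authentication; the helper name summarize_node is hypothetical.

```python
ES_URL = "http://localhost:9200"  # assumed: port published by the docker run command


def summarize_node(info):
    """Return a one-line summary of the JSON served at the root endpoint."""
    version = info.get("version", {}).get("number", "unknown")
    cluster = info.get("cluster_name", "unknown")
    return f"{cluster} running Elasticsearch {version}"


# Usage against the local node (requires the requests library):
#   import requests
#   print(summarize_node(requests.get(ES_URL).json()))
```

If the request fails with a connection error, the container is most likely still starting.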

Search Engine

I use the Python requests library to make Elasticsearch REST API requests.

Load Data

For one month, I scraped news articles in Hebrew every day from several popular Israeli news sites. For each article, I collected the following information:

  • url
  • title
  • description
  • author
  • publishedAt
  • source

You can download the dataset from here.

Creating an index

Before we can search, we need to load all the news articles into Elasticsearch. For this, we create a new index called news with a mapping.

An index is a container that stores and manages related documents. We can create an index with mapping definitions that define how Elasticsearch should process each field of a document.

Use the PUT method to create the news index:
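A minimal sketch of that request could look like the following. The field names come from the list of collected fields; the field types, the local URL, and the helper name create_index are my assumptions, not necessarily the exact mapping used in the project. The date type also assumes publishedAt is in a format Elasticsearch parses by default, such as ISO 8601.

```python
ES_URL = "http://localhost:9200"  # assumed local single-node cluster
INDEX = "news"

# Assumed mapping: field names follow the dataset, the types are illustrative guesses.
NEWS_MAPPING = {
    "mappings": {
        "properties": {
            "url": {"type": "keyword"},
            "title": {"type": "text"},
            "description": {"type": "text"},
            "author": {"type": "keyword"},
            "publishedAt": {"type": "date"},
            "source": {"type": "keyword"},
        }
    }
}


def create_index(http, base_url=ES_URL, index=INDEX, body=NEWS_MAPPING):
    """Send PUT /<index> with the mapping; `http` is the requests module or a Session."""
    response = http.put(f"{base_url}/{index}", json=body)
    response.raise_for_status()  # e.g. 400 if the index already exists
    return response.json()


# Usage (requires the requests library and a running node):
#   import requests
#   print(create_index(requests))
```

Text fields are analyzed for full-text search, while keyword fields are stored as exact values, which suits filtering and grouping.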

Indexing a document

By indexing a document, we insert data into the news index. Elasticsearch is a document-oriented database, which means it stores the entire JSON object as a document.

A document is a JSON object made up of key/value pairs.

Use the POST method to add articles to the news index:
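As a rough sketch, each article can be sent to the _doc endpoint of the index, which lets Elasticsearch generate the document id. The helper names to_document and index_articles are hypothetical, and the local URL is an assumption.

```python
ES_URL = "http://localhost:9200"  # assumed local single-node cluster
FIELDS = ("url", "title", "description", "author", "publishedAt", "source")


def to_document(article):
    """Keep only the six fields collected for each article (missing ones become None)."""
    return {field: article.get(field) for field in FIELDS}


def index_articles(http, articles, base_url=ES_URL, index="news"):
    """POST each article to /<index>/_doc and return the generated document ids."""
    ids = []
    for article in articles:
        response = http.post(f"{base_url}/{index}/_doc", json=to_document(article))
        response.raise_for_status()
        ids.append(response.json()["_id"])
    return ids


# Usage (requires the requests library; `articles` is a list of dicts from the dataset):
#   import requests
#   index_articles(requests, articles)
```

One request per article is fine for a small experiment. For thousands of articles, the _bulk endpoint would cut down on round trips.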

After creating the index with mapping and indexing documents, we are ready for search and query.

Search and Query

Use the GET method to search and query:

  • count matched results
  • query single term
  • query multiple terms
  • wildcard query
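The four cases above can be sketched roughly as follows. I assume here that the queries run against a text field such as title and that a match query is used for the single-term and multiple-term cases; the helper names and the local URL are hypothetical.

```python
ES_URL = "http://localhost:9200"  # assumed local single-node cluster


def match_query(field, text, operator="or"):
    """Full-text query; operator="and" requires every term to appear in the field."""
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


def wildcard_query(field, pattern):
    """Pattern query: * matches any run of characters, ? matches a single one."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


def count(http, body, base_url=ES_URL, index="news"):
    """GET /<index>/_count: number of documents matching the query."""
    response = http.get(f"{base_url}/{index}/_count", json=body)
    response.raise_for_status()
    return response.json()["count"]


def search(http, body, base_url=ES_URL, index="news"):
    """GET /<index>/_search: the matching documents (10 hits by default)."""
    response = http.get(f"{base_url}/{index}/_search", json=body)
    response.raise_for_status()
    return [hit["_source"] for hit in response.json()["hits"]["hits"]]


# Usage (requires the requests library; the search words are placeholders):
#   import requests
#   count(requests, match_query("title", "election"))                  # count matches
#   search(requests, match_query("title", "election"))                 # single term
#   search(requests, match_query("title", "election results", "and"))  # multiple terms
#   search(requests, wildcard_query("title", "elect*"))                # wildcard
```

Wildcard queries are matched against the indexed terms rather than the original text, so on an analyzed field the pattern should be written the way the analyzer stores terms (lowercase for the standard analyzer).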

Dashboard with Plotly

Plotly.py is a graphing library for interactive charts. With Plotly, we can easily add pie and bar charts to our dashboard.
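As an illustration, here is one way to turn search hits into a pie chart and a bar chart of results per news site. This is only a sketch: it assumes each hit stores source as a plain string, and the helper names are hypothetical rather than taken from the project.

```python
from collections import Counter


def count_by_source(hits):
    """Count search hits per news site, most frequent first.

    Assumes every hit has the Elasticsearch shape {"_source": {"source": "<site>"}}.
    """
    counts = Counter(hit["_source"]["source"] for hit in hits)
    return counts.most_common()


def build_figures(pairs):
    """Build a pie chart and a bar chart from (source, count) pairs."""
    # Imported here so the counting helper can be used without Plotly installed.
    import plotly.graph_objects as go

    labels = [source for source, _ in pairs]
    values = [n for _, n in pairs]
    pie = go.Figure(data=[go.Pie(labels=labels, values=values)])
    bar = go.Figure(data=[go.Bar(x=labels, y=values)])
    return pie, bar


# Usage (requires plotly; `hits` is the "hits" list of a _search response):
#   pie, bar = build_figures(count_by_source(hits))
#   pie.show()
#   bar.show()
```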

You can find the full project here.

Enjoy 😀.
