Simple News Search Engine
How to build a simple search engine dashboard with Docker, Elasticsearch, and Plotly.
In this article, I will build a full-text search that finds relevant articles by looking for a specific word or phrase across thousands of news articles.
Prerequisites
- Docker
- Elasticsearch
- Plotly
Docker
Docker is a platform that packages an application and all its dependencies together in a container.
Docker is like magic ✨ in the box.
Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine, one of the open-source products from Elastic. It is a schema-free, document-oriented data store.
Elasticsearch is an awesome search engine for fast ⚡️ full-text search.
Install Docker Desktop on Windows
I downloaded Docker Desktop for Windows from here. Warning ⚠️: the system requirements are:
- Windows 10 64-bit: Pro, Enterprise, or Education.
- Hyper-V and Containers Windows Features must be enabled.
Install Elasticsearch with Docker
To install Elasticsearch as a Docker image, open the command line and run the docker pull command:
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.10.1
To start a single-node Elasticsearch cluster for development or testing, run the docker run command:
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.1
For more details, visit here.
Search Engine
I use the Python requests library to make Elasticsearch REST API requests.
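As a quick first request, it helps to confirm that the container answers. The sketch below assumes Elasticsearch is reachable at http://localhost:9200 (the port published by the docker run command above); the helper names fetch_node_info and summarize_node are hypothetical and only for illustration.

```python
ES_URL = "http://localhost:9200"  # assumption: port 9200 published by docker run


def summarize_node(info: dict) -> str:
    """Build a one-line summary from the JSON document returned by GET /."""
    name = info.get("name", "unknown")
    cluster = info.get("cluster_name", "unknown")
    version = info.get("version", {}).get("number", "unknown")
    return f"{name} (cluster {cluster}, version {version})"


def fetch_node_info(base_url: str = ES_URL) -> dict:
    """Send GET / to the node and return the parsed JSON response."""
    # Third-party dependency (pip install requests), imported here so that
    # summarize_node can be used without it.
    import requests

    response = requests.get(base_url, timeout=10)
    response.raise_for_status()
    return response.json()
```

With the container running, print(summarize_node(fetch_node_info())) should show the node name and the 7.10.1 version pulled earlier.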
Load Data
Over one month, I scraped news articles in Hebrew every day from several popular Israeli news sites. For each article, I collected the following information:
- url
- title
- description
- author
- publishedAt
- source
You can download the dataset from here.
Creating an index
Before we can search, we need to load all the news articles into Elasticsearch. For this, we create a new index, news, with a mapping.
An index is a container that stores and manages related documents. We can create an index with mapping definitions that define how Elasticsearch should process a document.
Use the PUT method to create the news index:
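A minimal sketch of that PUT request could look like the following. The field types are my assumptions based on the field list above (text for the fields we search in, keyword for exact values, date for publishedAt, assuming ISO 8601 timestamps), and the helper names news_mapping and create_index are hypothetical.

```python
ES_URL = "http://localhost:9200"  # assumption: local single-node cluster
INDEX = "news"


def news_mapping() -> dict:
    """Request body for PUT /news: one property per scraped field."""
    return {
        "mappings": {
            "properties": {
                "url": {"type": "keyword"},        # exact value, not analyzed
                "title": {"type": "text"},         # analyzed for full-text search
                "description": {"type": "text"},
                "author": {"type": "keyword"},
                "publishedAt": {"type": "date"},   # assumes ISO 8601 strings
                "source": {"type": "keyword"},     # keyword so it can be aggregated
            }
        }
    }


def create_index(base_url: str = ES_URL, index: str = INDEX) -> dict:
    """Create the index with the mapping above and return the JSON reply."""
    import requests  # third-party dependency (pip install requests)

    response = requests.put(f"{base_url}/{index}", json=news_mapping(), timeout=10)
    response.raise_for_status()
    return response.json()
```

Calling create_index() a second time fails with an HTTP 400 error, because the index already exists.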
Indexing a document
By indexing a document, we insert data into the news index. Elasticsearch is a document-oriented database, which means it stores the entire JSON object as a document.
A document is a JSON object made up of key/value pairs.
Use a POST method to add articles to the “news” index:
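As a sketch, a single article can be posted to the _doc endpoint, and many articles can be sent in one request through the _bulk endpoint, which expects newline-delimited JSON. I assume here that each article is a Python dict with the fields listed earlier; the helper names index_article, to_bulk_ndjson, and index_articles are hypothetical.

```python
import json

ES_URL = "http://localhost:9200"  # assumption: local single-node cluster
INDEX = "news"


def to_bulk_ndjson(articles: list) -> str:
    """Serialize articles as an NDJSON body for POST /news/_bulk.

    Each document is preceded by an action line; every line,
    including the last one, ends with a newline.
    """
    lines = []
    for article in articles:
        lines.append(json.dumps({"index": {}}))  # let Elasticsearch pick the id
        lines.append(json.dumps(article, ensure_ascii=False))  # keep Hebrew readable
    return "".join(line + "\n" for line in lines)


def index_article(article: dict, base_url: str = ES_URL, index: str = INDEX) -> dict:
    """Index one article with POST /news/_doc."""
    import requests  # third-party dependency (pip install requests)

    response = requests.post(f"{base_url}/{index}/_doc", json=article, timeout=10)
    response.raise_for_status()
    return response.json()


def index_articles(articles: list, base_url: str = ES_URL, index: str = INDEX) -> dict:
    """Index many articles in one request with POST /news/_bulk."""
    import requests

    response = requests.post(
        f"{base_url}/{index}/_bulk",
        data=to_bulk_ndjson(articles).encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()
```

For a few thousand articles, the bulk variant avoids one HTTP round trip per document.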
After creating the index with mapping and indexing documents, we are ready for search and query.
Search and Query
Use a GET method to search and query:
- count matched results
- query single term
- query multiple terms
- range query
- wildcard query
- aggregate query
- highlight query
- suggest query
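The request bodies for the queries listed above could be sketched as below. This is one possible way to write them, not necessarily the exact queries used in the project: the field names come from the dataset, the default field title, the aggregation and suggestion names, and the helpers themselves are hypothetical, and the aggregation assumes that source is mapped as a keyword.

```python
ES_URL = "http://localhost:9200"  # assumption: local single-node cluster
INDEX = "news"


def match_query(text: str, field: str = "title", operator: str = "or") -> dict:
    """Single term: pass one word. Multiple terms: pass several words,
    with operator="and" to require all of them."""
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


def range_query(start: str, end: str, field: str = "publishedAt") -> dict:
    """Articles whose date falls between start and end (inclusive)."""
    return {"query": {"range": {field: {"gte": start, "lte": end}}}}


def wildcard_query(pattern: str, field: str = "title") -> dict:
    """Pattern with * (any characters) or ? (one character)."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


def aggregate_query(field: str = "source", name: str = "by_source") -> dict:
    """Count articles per distinct value; size 0 skips the hits themselves."""
    return {"size": 0, "aggs": {name: {"terms": {"field": field}}}}


def highlight_query(text: str, field: str = "title") -> dict:
    """Match query that also returns the matching fragments, wrapped in <em> tags."""
    body = match_query(text, field)
    body["highlight"] = {"fields": {field: {}}}
    return body


def suggest_query(text: str, field: str = "title", name: str = "title_suggestion") -> dict:
    """Term suggester: proposes similar indexed terms for each word in text."""
    return {"suggest": {name: {"text": text, "term": {"field": field}}}}


def run(body: dict, endpoint: str = "_search",
        base_url: str = ES_URL, index: str = INDEX) -> dict:
    """Send the body with GET /news/<endpoint> and return the parsed JSON."""
    import requests  # third-party dependency (pip install requests)

    response = requests.get(f"{base_url}/{index}/{endpoint}", json=body, timeout=10)
    response.raise_for_status()
    return response.json()
```

To count matched results, send a query-only body to the _count endpoint, for example run(match_query("word"), endpoint="_count")["count"]. For the other queries, the matched documents are under ["hits"]["hits"] in the response. With title mapped as text, the wildcard pattern is compared against individual analyzed words, not against the whole title.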
Dashboard with Plotly
Plotly.py is a Python graphing library for building interactive charts. With Plotly, we can easily add pie and bar charts to our dashboard.
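As an illustration, the buckets of a terms aggregation can be turned into a bar chart in a few lines. The sketch assumes an aggregation named by_source (a hypothetical name) whose response has the usual aggregations / buckets shape; the helper names buckets_to_series and source_bar_chart are also hypothetical.

```python
def buckets_to_series(response: dict, agg_name: str = "by_source"):
    """Split the buckets of a terms aggregation into labels and counts."""
    buckets = response["aggregations"][agg_name]["buckets"]
    labels = [bucket["key"] for bucket in buckets]
    counts = [bucket["doc_count"] for bucket in buckets]
    return labels, counts


def source_bar_chart(labels: list, counts: list):
    """Build an interactive bar chart of articles per source."""
    import plotly.graph_objects as go  # third-party dependency (pip install plotly)

    fig = go.Figure(data=[go.Bar(x=labels, y=counts)])
    fig.update_layout(title_text="Articles per source")
    return fig
```

Calling source_bar_chart(labels, counts).show() opens the chart; replacing go.Bar(x=labels, y=counts) with go.Pie(labels=labels, values=counts) gives a pie chart instead.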
You can find the full project here.
Enjoy 😀.