Log Analysis with Elasticsearch and Kibana

Overview

From time to time, I have to review and analyze various logs. Over time, a log file can grow beyond 100 MB, which makes it hard to analyze. Previously, I used Notepad++ and Autogrep for searching and querying logs. When a deeper analysis was required, I sometimes wrote custom parsers in Python and then used Excel to display the results. Creating custom solutions can take too much time and cause headaches.

Recently, I learned about the Elasticsearch platform and immediately recognized it as an excellent choice for this kind of analysis. In this blog, I will describe the minimum steps required to set up basic search and analysis for a log which contains JSON entries. I will not cover advanced concepts of Elasticsearch such as working with Aggregations and Analyzers. This article is an introduction to a great tool. So let’s start!

Goals

Set up and run Elasticsearch
Generate a log sample for analysis with Elasticsearch
Create an index and mapping in Elasticsearch
Upload data to Elasticsearch
Set up Kibana and use it for analysis

Prerequisites

The following apps need to be installed before starting:

Docker with Docker Engine version 18.09.1 or newer (this is the local version where I run all the apps).
Python3
curl (installation may be specific to your platform, check here). This is not required; you are free to use any application that generates REST API requests, such as Postman.
A terminal or console for running apps from the command line

What Is Elasticsearch?

Elasticsearch is a search engine with many capabilities for data analysis. It can also be used as a NoSQL database. Elasticsearch (ES) is written in Java and uses Apache Lucene for searching. You can find more details on the official ES web page here.

Set Up Elasticsearch with Docker

There is no need to set up ES from packages and then configure it by hand. The easiest and fastest way to set up Elasticsearch is to use an ES Docker image. It is available in the Docker hub here. Run the following commands to download and run Elasticsearch:

docker pull docker.elastic.co/elasticsearch/elasticsearch:6.7.1
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.7.1

Note: At the time of writing, the latest version of ES is 7.0.0, but for my examples in this blog, I’m using the previous version, 6.7.1, because it seems to be more stable. In my experience, a brand-new release often has some unpredictable problems.

To check if Elasticsearch has started, run the following command:


curl -X GET localhost:9200/_cat/health

You should see something very similar to the following:


docker-cluster yellow 1 1 7 7 0 0 5 0 - 58.3%
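
The columns of this line are easier to read once they are labeled. Below is a minimal Python sketch, not part of the original setup, that splits a `_cat/health` line into named fields. The column names are assumed to follow the default `_cat/health` layout of Elasticsearch 6.x; a live response also starts with epoch and timestamp columns, which the sketch skips. The helper names `parse_cat_health` and `fetch_cat_health` are hypothetical, and the second one assumes a node listening on localhost:9200.

```python
import urllib.request

# Assumed column order of `_cat/health` in Elasticsearch 6.x, without the
# leading epoch and timestamp columns.
HEALTH_COLUMNS = [
    "cluster", "status", "node.total", "node.data", "shards", "pri",
    "relo", "init", "unassign", "pending_tasks", "max_task_wait_time",
    "active_shards_percent",
]


def parse_cat_health(line):
    """Turn one line of `_cat/health` output into a dict keyed by column name."""
    fields = line.split()
    if len(fields) < len(HEALTH_COLUMNS):
        raise ValueError("unexpected _cat/health output: %r" % line)
    # Keep only the last 12 fields so that the optional epoch and timestamp
    # columns at the start of the line are ignored.
    return dict(zip(HEALTH_COLUMNS, fields[-len(HEALTH_COLUMNS):]))


def fetch_cat_health(base_url="http://localhost:9200"):
    """Hypothetical helper: request `_cat/health` from a running node."""
    with urllib.request.urlopen(base_url + "/_cat/health") as response:
        return parse_cat_health(response.read().decode("utf-8"))


# The sample line from this article; no running node is needed for this part.
sample = "docker-cluster yellow 1 1 7 7 0 0 5 0 - 58.3%"
health = parse_cat_health(sample)
print(health["status"], health["unassign"])  # prints: yellow 5
```

In the sample, 7 shards are active and 5 are unassigned, which is where the 58.3% in the last column comes from (7 of 12).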

You may ask, “What does the ‘yellow’ status mean?” The answer is very simple. Elasticsearch is a cluster platform; it is designed to run on several nodes. If it runs on one node, it cannot create backup copies (replicas) of existing indexes, and when these copies are absent, it reports a yellow status. If you are interested in learning more of the details, there is a good explanation in this article. Our Elasticsearch is running on a single node (check the Docker command where the container starts), so the “yellow” status is OK and it is ready.

Let’s define the basic terms used by ES: index, mapping, and document. To understand them better, we can compare them with terms from a relational database:

Index – This is the same thing as a table in the database. Regarding our case, this is the entire log we are going to save in ES.
Mapping – It looks like a schema definition, and it includes type definitions for all fields used by the index. It may also contain additional parameters which tell ES to store, ignore, or index a field of the index (a.k.a., table).
Document – This is a JSON object which is stored in ES. It looks like a row in the table. For our case, it will be just a single line from the log file.
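
To make the three terms concrete, here is a small Python sketch under stated assumptions. The index name `logs`, the field names (`timestamp`, `level`, `message`), and the sample log line are all hypothetical and are not taken from the log used in this article. The sketch only builds the request bodies and sends nothing to Elasticsearch. Version 6.x expects a mapping type name inside `mappings`; `_doc` is used here.

```python
import json

INDEX_NAME = "logs"  # hypothetical index name

# Mapping: a type definition for each field of the index.
# The field names below are illustrative assumptions.
create_index_body = {
    "mappings": {
        "_doc": {
            "properties": {
                "timestamp": {"type": "date"},
                "level": {"type": "keyword"},
                "message": {"type": "text"},
            }
        }
    }
}


def line_to_document(line):
    """Parse one JSON log line into a document (a plain dict)."""
    return json.loads(line)


# A made-up log line; each line of the log becomes one document.
log_line = '{"timestamp": "2019-04-15T10:00:00Z", "level": "ERROR", "message": "disk full"}'
document = line_to_document(log_line)

# Request bodies that would be sent to Elasticsearch:
#   PUT  /logs       with create_index_body  (creates the index and its mapping)
#   POST /logs/_doc  with document           (stores one document)
print(json.dumps(create_index_body, indent=2))
print(json.dumps(document))
```

In relational terms, `logs` plays the role of the table, `create_index_body` the schema, and `document` a single row.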

Next, we’ll discuss getting a log file for analysis, pushing data to Elasticsearch, and setting up Kibana. Download the rest of my article to learn more.

Alex Kondratov, Software Engineer, PrizmDoc Suite

Aleksey Kondratov joined Accusoft in 2008 as a QA team leader. He worked on many Accusoft products including Barcode Xpress, forms processing applications, and other SDKs. Now Aleksey is a software engineer for PrizmDoc Suite. He likes working on new programming technologies like ML and Computer Vision. In his spare time, Aleksey enjoys landscape design and spends free time in his garden where he grows roses and other flowers. In addition, he likes to spend time with his children and travel in his car.