# Elasticsearch Indexing Tutorial
## Part 1: Intro to Elasticsearch

Welcome to the Elasticsearch indexing tutorial!

By the end of this workshop, you will be able to:

- [understand the basics of Elasticsearch](#understanding-the-basics-of-elasticsearch)
- get a high-level understanding of the architecture of Elasticsearch
- perform basic CRUD (Create, Read, Update, and Delete) operations with Elasticsearch


### Understanding the basics of Elasticsearch

#### `What is Elasticsearch?`

When people ask, *“What is Elasticsearch?”*, some may answer that it is:
*  “an index”,
*  “a search engine”,
*  “an analytics database”,
*  “a big data solution”,
*  “fast and scalable”,
*  or “kind of like Google”.

> Depending on your level of familiarity with this technology, these answers may either bring you closer to an ah-ha moment or confuse you further. The truth is that all of these answers are correct, and that’s part of the appeal of Elasticsearch.

**Elasticsearch is simple to configure, highly flexible, and an excellent tool for complex searches.** Let's take a closer look.

![elk](https://github.com/exajobs/elasticsearch-collection/blob/main/img/ELK.png)

- [Elasticsearch](https://blog.avenuecode.com/elasticsearch) is a distributed, open-source search and analytics engine built on Apache Lucene and developed in Java. It is designed to operate in near real time and can index and search documents in diverse formats. Built for distributed environments, it offers flexibility and scalability, and it has become a widely used enterprise search engine. Elasticsearch lets you store, search, and analyze huge volumes of data quickly, typically returning answers in milliseconds.

##### `How does it work?`

To help understand how Elasticsearch handles data, we can make an analogy to a database.

- Elasticsearch stores data using a "schema-less" approach. This means that you do not have to define the structure of the data in advance, as you must with relational databases such as Oracle, MySQL, and SQL Server.
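
To make the "schema-less" idea concrete, here is a minimal sketch that indexes a document without declaring any structure first and then reads back the mapping Elasticsearch derived. It assumes the 8.x Python client, a cluster where automatic index creation is enabled (the default), and a hypothetical sample document; the helper name `index_and_read_mapping` is ours, not part of the library.

```python
from typing import Any, Dict


def index_and_read_mapping(es: Any, index_name: str, doc: Dict[str, Any]) -> Dict[str, Any]:
    """Index `doc` without creating a mapping first, then return the
    mapping that Elasticsearch derived for the index."""
    # No es.indices.create(...) call beforehand: the index and its field
    # mappings are created on the fly when the first document arrives.
    es.index(index=index_name, document=doc)
    response = es.indices.get_mapping(index=index_name)
    return response[index_name]["mappings"]


# Hypothetical sample document; the field names are illustrative only.
sample_product = {"name": "Trail running shoes", "price": 89.9, "in_stock": True}
```

Against a live cluster you could call `index_and_read_mapping(Elasticsearch("http://localhost:9200"), "products_demo", sample_product)`, where `products_demo` is a throwaway index name chosen for this example. The returned mapping would typically list each field with an inferred type, which shows that a schema still exists but is built for you.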

In our analogy with traditional relational databases, the data structures used by [Elasticsearch](https://logz.io/blog/10-elasticsearch-concepts/) map as follows:

![analogy](https://github.com/exajobs/elasticsearch-collection/blob/main/img/4.png)

- **Index:** - Indices, the largest unit of data in Elasticsearch, are logical partitions of documents and can be compared to a database in the world of relational databases. More on [indices](#indices)
![analogy](https://github.com/exajobs/elasticsearch-collection/blob/main/img/3.png)

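- **Type:** - A type in Elasticsearch represents a class of similar documents. A type consists of a name, such as user or blog post, and a mapping. Note that mapping types were deprecated in Elasticsearch 7.x and removed in 8.0, so in recent versions an index no longer holds several named types.
- **Documents:** - A document is comparable to a row in a relational table. In Lucene, a document consists of a simple list of field-value pairs. A field must have at least one value, but any field can contain multiple values.
- **Fields:** - Fields are comparable to columns in a relational table.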

![shards](https://github.com/exajobs/elasticsearch-collection/blob/main/img/2.png)

- **Cluster:** - A cluster is a collection of one or more servers (nodes) that together hold all of the data and provide federated indexing and search capabilities across all servers. Any number of nodes can share the same cluster name.
- **Node:** - A node is a single server, that is, one running Elasticsearch instance, that holds some of the data and participates in the cluster’s indexing and querying. In the relational analogy, a node corresponds to a database instance. A node joins a specific cluster by being configured with that cluster's name, and a cluster can have as many nodes as needed.
- **Shard:** - A shard is a subset of the documents of an index. An index can be divided into many shards.
- **Replica Shard:** - A replica shard is a copy of a primary shard. Its main purpose is failover: if the node holding a primary shard dies, a replica is promoted to the role of primary, which prevents data loss in case of hardware failure.
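
The relationship between documents, primary shards, and replicas can be sketched with a toy model. This is a simplification and not Elasticsearch's actual routing code: Elasticsearch hashes the routing value (by default the document ID) with Murmur3, whereas the sketch uses CRC32 from the standard library as a stand-in. The function names and document IDs are hypothetical.

```python
import zlib
from collections import Counter


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a primary shard.

    Simplified model: CRC32 stands in for the Murmur3 hash used by
    Elasticsearch, so the shard numbers will not match a real cluster.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def total_shard_copies(num_primary_shards: int, num_replicas: int) -> int:
    """Each primary shard has `num_replicas` copies in addition to itself."""
    return num_primary_shards * (1 + num_replicas)


# Distribute 1,000 hypothetical document IDs over 3 primary shards.
distribution = Counter(pick_shard(f"doc-{i}", 3) for i in range(1000))
print(dict(distribution))

# 3 primaries with 1 replica each means 6 shard copies in the cluster.
print(total_shard_copies(num_primary_shards=3, num_replicas=1))  # 6
```

In a real index these two numbers are controlled by the `number_of_shards` and `number_of_replicas` index settings. Because the target shard depends on the number of primary shards, that number is fixed when the index is created, while the replica count can be changed later.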


#### `Table of contents`
- [Elasticsearch Introduction](#what-is-elasticsearch)
- Elasticsearch Architecture
  - [Indices](#indices)
  - Types
  - Documents
  - Fields
  - Cluster
  - Shard
  - Replica Shards
- [Elasticsearch Queries](#elasticsearch-queries)
- APIs
- [Elastic Stack](#elastic-stack)  
  -  Kibana
  -  Beats
  -  Logstash
- Books
- Certifications
- Elasticsearch developer tools and utilities
- Elasticsearch Use cases


#### `Elastic Architecture`

##### `Indices`
Indices, the largest unit of data in Elasticsearch, are logical partitions of documents and can be compared to a database in the world of relational databases.

For example, in an e-commerce app you could have one index containing all of the data related to the products and another with all of the data related to the customers.
You can define as many indices in Elasticsearch as you want, and each one holds its own documents. Indices are identified by lowercase names, and you use these names to refer to an index when performing actions (such as searching and deleting) on the documents inside it.
For a list of best practices in handling indices, check out the blog post on managing Elasticsearch indices linked below. Another key to understanding how Elasticsearch’s indices work is to get a handle on shards.

 - [Best Practices for Managing Elasticsearch Indices](https://logz.io/blog/managing-elasticsearch-indices/) - Understanding indices
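
Since indices are identified by lowercase names, it can help to check a candidate name before creating an index. The sketch below encodes the naming restrictions as we understand them from the Elasticsearch documentation (lowercase only, a set of forbidden characters, forbidden leading characters, a 255-byte limit). Treat it as an approximation; the function name `is_valid_index_name` is hypothetical, and the server remains the final authority.

```python
# Characters that may not appear anywhere in an index name.
FORBIDDEN_CHARS = set('\\/*?"<>| ,#:')
# Characters that an index name may not start with.
FORBIDDEN_PREFIXES = ("-", "_", "+")


def is_valid_index_name(name: str) -> bool:
    """Approximate client-side check of Elasticsearch index naming rules."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():
        return False
    if name.startswith(FORBIDDEN_PREFIXES):
        return False
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    # The limit is expressed in bytes, not characters.
    return len(name.encode("utf-8")) <= 255


for candidate in ["products", "Customers", "_internal", "logs 2024",
                  "lettres__development__documents"]:
    print(candidate, is_valid_index_name(candidate))
```

With these rules, `products` and `lettres__development__documents` pass, while `Customers` (uppercase), `_internal` (leading underscore), and `logs 2024` (space) are rejected.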




#### APIs


#### Elasticsearch Queries
Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of clauses:

![queries](https://github.com/exajobs/elasticsearch-collection/blob/main/img/5-queries.png)
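
The two clause types, which the Elasticsearch documentation calls leaf clauses and compound clauses, can be illustrated with plain Python dictionaries. This is a minimal sketch: the field names `title` and `year` are hypothetical, and `run_search` is a small helper of ours that assumes the 8.x Python client's `search` method.

```python
import json
from typing import Any, Dict, List

# Leaf clause: looks for a particular value in a particular field.
leaf_query: Dict[str, Any] = {"match": {"title": "lettre"}}

# Compound clause: wraps other clauses and combines them logically.
compound_query: Dict[str, Any] = {
    "bool": {
        "must": [leaf_query],
        "filter": [{"range": {"year": {"gte": 1500, "lte": 1600}}}],
    }
}


def run_search(es: Any, index_name: str, query: Dict[str, Any], size: int = 5) -> List[Dict[str, Any]]:
    """Send `query` to the search API and return the list of matching hits."""
    response = es.search(index=index_name, query=query, size=size)
    return response["hits"]["hits"]


# The query is ordinary JSON once serialized.
print(json.dumps(compound_query, indent=2))
```

The nesting is what makes the AST comparison apt: the `bool` node is an inner node of the tree, and the `match` and `range` clauses are its leaves.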




```python
from elasticsearch import Elasticsearch

# Connect to a local Elasticsearch instance and print basic cluster information.
es = Elasticsearch("http://localhost:9200")
print(es.info().body)
```

```
{'name': 'PORT-RECH01', 'cluster_name': 'elasticsearch', 'cluster_uuid': 'vGCaiZbkSg2xEDTmljOIRQ', 'version': {'number': '8.13.0', 'build_flavor': 'default', 'build_type': 'deb', 'build_hash': '09df99393193b2c53d92899662a8b8b3c55b45cd', 'build_date': '2024-03-22T03:35:46.757803203Z', 'build_snapshot': False, 'lucene_version': '9.10.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}
```


```python
# List all indices, sorted by name, with column headers (v=True).
print(es.cat.indices(index="*", s='index', pri=True, v=True))
```

```
health status index                              uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
yellow open   dicotopo__development__places      iQFTZk_rQlGR07i_T3yABA   1   1    1207647        66305    247.8mb        247.8mb      247.8mb
yellow open   encpos_document                    M0U4tb5EQgOvTySt8YSo1w   1   1       2996            0     71.6mb         71.6mb       71.6mb
yellow open   lettres__development__collections  mpJH2xwtTjyHPghL8XgFog   1   1          4            0      7.3kb          7.3kb        7.3kb
yellow open   lettres__development__documents    ZYsbdt7CSM-B3DwgCh24VA   1   1      12820            4     43.5mb         43.5mb       43.5mb
yellow open   lettres__development__institutions Njmoh7qTTTSjxQsGf_0Uzw   1   1          3            0      6.1kb          6.1kb        6.1kb
yellow open   lettres__development__languages    0bhHX_gMSLmnolm1HiuCTA   1   1          3            0      6.5kb          6.5kb        6.5kb
```
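
Every index in this listing reports `yellow` health with one primary (`pri`) and one replica (`rep`). Yellow means that all primary shards are assigned but at least one replica is not. That is what you would expect if this is a single-node setup, because a replica is never placed on the same node as its primary. The output itself does not confirm the number of nodes, so treat that as an assumption. The sketch below filters such indices from rows shaped like the JSON form of the listing, which we assume is what `es.cat.indices(index="*", format="json")` returns. The helper name `yellow_indices` is hypothetical.

```python
from typing import Any, Dict, List


def yellow_indices(rows: List[Dict[str, Any]]) -> List[str]:
    """Return the names of indices whose health is reported as yellow."""
    return [row["index"] for row in rows if row.get("health") == "yellow"]


# Two rows taken from the listing in this tutorial, reduced to the
# columns used here and written in the assumed JSON shape.
sample_rows = [
    {"health": "yellow", "index": "encpos_document", "pri": "1", "rep": "1"},
    {"health": "yellow", "index": "lettres__development__documents", "pri": "1", "rep": "1"},
]

print(yellow_indices(sample_rows))
# ['encpos_document', 'lettres__development__documents']
```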

### Test the basics of Elasticsearch