# Indexing

## Table Creation with Different Index Configurations

In [1]:
import pandas as pd
import json
import requests
import copy
import time

To demonstrate the indexing options and mechanisms that Pinot offers, we create several tables with different index configurations, describe their key properties, and compare query performance.
The index configurations are mainly applied in the `tableIndexConfig` section of the table configuration (the text index additionally uses `fieldConfigList`).

All tables in the following examples are real-time tables that consume data from the same Kafka topic (`trips`), so they all contain the same records. To ensure that consumption has finished before we execute queries, we monitor when streaming data was last ingested into a table. The `minConsumingFreshnessTimeMs` field of a query response reports the time of the last ingestion. Our rule: if no new data has been consumed in the last 10 seconds, the table has caught up with the previously generated records in the Kafka topic and contains enough data to proceed.
(Remark: We only check the last-ingestion timestamp of one specific table and assume that the other tables do not deviate much from it.)
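
The freshness rule described above can be sketched as a small pure function. This is a minimal sketch and not part of the original notebook: the name `consumption_finished` and the parameter `quiet_period_ms` are hypothetical, and the large threshold is copied from the polling condition used later in this notebook. What such very large values mean on the Pinot side is an assumption here.

```python
import time

# Threshold copied from the polling loop in this notebook. Treating larger
# values as "nothing left to wait for" is an assumption about Pinot's behavior.
SENTINEL_THRESHOLD_MS = 9000000036854775807


def consumption_finished(freshness_ms, now_ms=None, quiet_period_ms=10_000):
    """Return True if the last ingestion is older than the quiet period."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if freshness_ms > SENTINEL_THRESHOLD_MS:
        return True
    # Mirrors the "> 0" check of the original condition: 0 means keep waiting.
    return 0 < freshness_ms < now_ms - quiet_period_ms


now = 1_618_138_770_000  # fixed "current time" so the example is reproducible
print(consumption_finished(1_618_138_752_061, now_ms=now))  # last ingestion about 18 s ago
print(consumption_finished(1_618_138_765_000, now_ms=now))  # last ingestion 5 s ago
```

Passing `now_ms` explicitly keeps the check testable; in the notebook it would simply be called with the value taken from the broker response.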

XXX use helper func and adapt description

In [2]:
# Array to collect details of created tables
table_list = []

json_tableConfig = {
  "tableName": "variable_tableName",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "trip_start_time_millis",
    "timeType": "MILLISECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "60",
    "schemaName": "trips",
    "replication": "1",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "simple",
      "stream.kafka.topic.name": "trips",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.zk.broker.url": "pinot-kafka-zookeeper:2181",
      "stream.kafka.broker.list": "pinot-kafka:9092",
      "realtime.segment.flush.threshold.time": "12h",
      "realtime.segment.flush.threshold.size": "20000",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
} 

# helper function
def createTable(newTable_name, index_text, tableconfig_json):
    # Input: Name of new table, index description, table configuration in json structure
    response = requests.post('http://pinot-controller.pinot:9000/tables', json=tableconfig_json)
    print(response)
    print(response.text)
    table_list.append([newTable_name, index_text])


execution_start_time = int(round(time.time() * 1000))

# Create a new table with default index for each column (no configuration required)
newTable_defaultIndex = copy.deepcopy(json_tableConfig)
newTable_defaultIndex["tableName"] = "trips_default_index"
createTable(newTable_defaultIndex["tableName"], 'Default Index (Dictionary-encoded forward index with bit compression) for each column', newTable_defaultIndex)

# Create a new table with raw value forward index
newTable_rawForwardIndex = copy.deepcopy(json_tableConfig)
newTable_rawForwardIndex["tableName"] = "trips_rawForwardIndex"
newTable_rawForwardIndex["tableIndexConfig"]["noDictionaryColumns"] = ["start_location"]
createTable(newTable_rawForwardIndex["tableName"], 'Raw value forward index on start_location', newTable_rawForwardIndex)

# Create a new table with sorted forward index with run-length encoding
newTable_sortedForwardIndex = copy.deepcopy(json_tableConfig)
newTable_sortedForwardIndex["tableName"] = "trips_sortedForwardIndex"
newTable_sortedForwardIndex["tableIndexConfig"]["sortedColumn"] = ["start_location"]
createTable(newTable_sortedForwardIndex["tableName"], 'Sorted forward index with run-length encoding on start location', newTable_sortedForwardIndex)

# Create a new table with bitmap inverted index
newTable_bitmapInvertedIndex = copy.deepcopy(json_tableConfig)
newTable_bitmapInvertedIndex["tableName"] = "trips_bitmapInvertedIndex_startLocation"
newTable_bitmapInvertedIndex["tableIndexConfig"]["invertedIndexColumns"] = ["start_location"]
createTable(newTable_bitmapInvertedIndex["tableName"], 'Bitmap inverted index on start_location', newTable_bitmapInvertedIndex)

# Create a new table with sorted inverted index
newTable_sortedInvertedIndex = copy.deepcopy(json_tableConfig)
newTable_sortedInvertedIndex["tableName"] = "trips_sortedInvertedIndex_startLocation"
newTable_sortedInvertedIndex["tableIndexConfig"]["invertedIndexColumns"] = ["start_location"]
newTable_sortedInvertedIndex["tableIndexConfig"]["sortedColumn"] = ["start_location"]
createTable(newTable_sortedInvertedIndex["tableName"], 'Sorted inverted index on start_location', newTable_sortedInvertedIndex)

# Create a new table with star tree index
newTable_starTree = copy.deepcopy(json_tableConfig)
newTable_starTree["tableName"] = "trips_starTreeIndex"
newTable_starTree["tableIndexConfig"]["starTreeIndexConfigs"] = [{
    "dimensionsSplitOrder": [
      "rider_is_premium",
      "start_location_state",
      "end_location"
    ],
    "functionColumnPairs": [
      "SUM__payment_amount",
    ],
    "maxLeafRecords": 1
  }]
createTable(newTable_starTree["tableName"], 'Star Tree', newTable_starTree)

# Create a new table with text index
newTable_textIndex = copy.deepcopy(json_tableConfig)
newTable_textIndex["tableName"] = "trips_textIndex"
newTable_textIndex["fieldConfigList"]= [
  {
     "name":"driver_name",
     "encodingType":"RAW",
     "indexType":"TEXT"
  },
  {
     "name":"rider_name",
     "encodingType":"RAW",
     "indexType":"TEXT"
  }
]
newTable_textIndex["tableIndexConfig"]["noDictionaryColumns"] = [
     "driver_name",
     "rider_name"
 ]
createTable(newTable_textIndex["tableName"], 'Text Index', newTable_textIndex)

<Response [200]>
{"status":"Table trips_default_index_REALTIME succesfully added"}
<Response [200]>
{"status":"Table trips_rawForwardIndex_REALTIME succesfully added"}
<Response [200]>
{"status":"Table trips_sortedForwardIndex_REALTIME succesfully added"}
<Response [200]>
{"status":"Table trips_bitmapInvertedIndex_startLocation_REALTIME succesfully added"}
<Response [200]>
{"status":"Table trips_sortedInvertedIndex_startLocation_REALTIME succesfully added"}
<Response [200]>
{"status":"Table trips_starTreeIndex_REALTIME succesfully added"}
<Response [200]>
{"status":"Table trips_textIndex_REALTIME succesfully added"}


Let's wait for all the tables to finish loading our sample data:

XXX use helper funcs

In [3]:
# Wait for tables to finish data consumption from Kafka Topic
tableConsuming = True
while tableConsuming:
    response = requests.post('http://pinot-broker.pinot:8099/query/sql', json={
            "sql" : "SELECT * FROM trips_starTreeIndex"
        })
    response_json = response.json()
    freshness_ms = response_json['minConsumingFreshnessTimeMs']
    now_ms = int(round(time.time() * 1000))
    # Finished if the last ingestion is older than 10 s or the reported value is extremely large
    if 0 < freshness_ms < now_ms - 10000 or freshness_ms > 9000000036854775807:
        tableConsuming = False
        print("")
        print("--Consumption of generated data for trips_starTreeIndex finished--")
    else:
        # Pause between polls instead of querying the broker in a tight loop
        time.sleep(1)


--Consumption of generated data for trips_starTreeIndex finished--


In [4]:
# Read first start location and first driver name to use it as variables in upcoming queries
response_json = requests.post('http://pinot-broker.pinot:8099/query/sql', json={
    "sql": "SELECT start_location, driver_name FROM " + newTable_defaultIndex["tableName"] 
}).json()

startLocation = response_json['resultTable']['rows'][0][0]
# Read the first name of the driver name
driverName = (response_json['resultTable']['rows'][0][1]).split()[0]       

print(f"Using '{startLocation}' as start location and '{driverName}' as driver name for upcoming queries.")

Using 'Pforzheim' as start location and 'Ellen' as driver name for upcoming queries.


### Table Size

Although all tables contain the same number of records with identical values, their sizes differ.
This is because of the different indexes used. For example, a text index on two columns takes more space than a raw value forward index. The Star-Tree index takes up the most disk space, because Pinot materializes pre-aggregations for the configured metric columns.

XXX use DataFrame?

In [5]:
for table in table_list:
    url_getsize = 'http://pinot-controller.pinot:9000/tables/' + table[0] + '_REALTIME/size?detailed=false'
    response = (requests.get(url_getsize)).json()
    print('tableName: ' + response['tableName'])
    print('Index Description: ' + table[1])
    v_size=response['reportedSizeInBytes']/1024/1024
    print('reportedSizeInMB: '+ str(v_size))
    print(" ")

tableName: trips_default_index_REALTIME
Index Description: Default Index (Dictionary-encoded forward index with bit compression) for each column
reportedSizeInMB: 22.090063095092773
 
tableName: trips_rawForwardIndex_REALTIME
Index Description: Raw value forward index on start_location
reportedSizeInMB: 24.462448120117188
 
tableName: trips_sortedForwardIndex_REALTIME
Index Description: Sorted forward index with run-length encoding on start location
reportedSizeInMB: 20.088777542114258
 
tableName: trips_bitmapInvertedIndex_startLocation_REALTIME
Index Description: Bitmap inverted index on start_location
reportedSizeInMB: 22.93588161468506
 
tableName: trips_sortedInvertedIndex_startLocation_REALTIME
Index Description: Sorted inverted index on start_location
reportedSizeInMB: 20.08920669555664
 
tableName: trips_starTreeIndex_REALTIME
Index Description: Star Tree
reportedSizeInMB: 39.382994651794434
 
tableName: trips_textIndex_REALTIME
Index Description: Text Index
reportedSizeInMB: 3
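
To make the size differences easier to compare, the reported sizes can be expressed relative to the default-index table. The following is a minimal sketch: the values are the sizes printed above, rounded to two decimals, and the helper `relative_sizes` is a hypothetical name. The text index table is left out because its printed value above appears incomplete.

```python
# Sizes in MB as printed above, rounded to two decimals.
sizes_mb = {
    "trips_default_index": 22.09,
    "trips_rawForwardIndex": 24.46,
    "trips_sortedForwardIndex": 20.09,
    "trips_bitmapInvertedIndex_startLocation": 22.94,
    "trips_sortedInvertedIndex_startLocation": 20.09,
    "trips_starTreeIndex": 39.38,
}


def relative_sizes(sizes, baseline):
    """Return each table size as a multiple of the baseline table's size."""
    base = sizes[baseline]
    return {name: round(size / base, 2) for name, size in sizes.items()}


ratios = relative_sizes(sizes_mb, "trips_default_index")
# Print the tables from smallest to largest relative size.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:45s} {ratio:.2f}x")
```

In this run, the Star-Tree table is a little under twice the size of the default-index table, while the two tables with a sorted column are slightly smaller than the default.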

## Index options in Apache Pinot

Pinot uses the MYSQL_ANSI dialect. Joins and nested subqueries are not supported.
To access multiple tables in one query, the Presto query engine is recommended. In this report, we focus on the functionality that Pinot itself offers.
Pinot doesn't support a data definition language (DDL). As shown above, tables are created through the REST API.

(XXX remove as already described earlier?)

The function `executeSQLStatement` takes a query string and an array of `[table name, index description]` pairs as input parameters. The placeholders `XX_TABLE`, `XX_STARTLOCATION` and `XX_DRIVERNAME` in the query string are replaced before execution. The query is executed on every table in `specific_tables_array`; if that array is empty, it is executed on all tables in `table_list`, i.e. all tables created for this chapter. The top two records of the first successful result are displayed once to give an impression of the result. Additionally, the function builds a `DataFrame` with query execution statistics for each table. The metrics of a query execution are only added to the `DataFrame` if no exception occurred.

In [8]:
def executeSQLStatement(sql_statement_with_variable, specific_tables_array):
    pd.set_option('display.max_colwidth', None)
    df_metrics = pd.DataFrame(columns=['indextype','table', 'numDocsScanned',
       'numEntriesScannedInFilter', 'numEntriesScannedPostFilter',
       'totalDocs', 'timeUsedMs',
       'minConsumingFreshnessTimeMs',
       'exceptions'])
    b_resultRecordsNotShown = True
    if not specific_tables_array:
        table_list_statement = table_list
    else:
        table_list_statement = specific_tables_array 
    for table in table_list_statement:
    
        sql_statement = sql_statement_with_variable.replace("XX_TABLE",table[0])
        sql_statement = sql_statement.replace("XX_STARTLOCATION","'"+startLocation+"'")
        sql_statement = sql_statement.replace("XX_DRIVERNAME","'"+driverName+"'") 
        response = requests.post('http://pinot-broker.pinot:8099/query/sql', json={
            "sql" : sql_statement
        })
        response_json=response.json()
        # Collect the execution statistics of this table as a one-row DataFrame
        metric_keys = ['numDocsScanned', 'numEntriesScannedInFilter', 'numEntriesScannedPostFilter',
                       'totalDocs', 'timeUsedMs', 'minConsumingFreshnessTimeMs', 'exceptions']
        d = {'indextype': table[1], 'table': table[0]}
        d.update({key: [response_json[key]] for key in metric_keys})
        df_metrics_new = pd.DataFrame(data=d)
        if not response_json['exceptions']:
            # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
            df_metrics = pd.concat([df_metrics, df_metrics_new], ignore_index=True)
       

        if b_resultRecordsNotShown:
            try:
                if not response_json['exceptions']:
                    columnNames = response_json['resultTable']['dataSchema']['columnNames']
                    rows = response_json['resultTable']['rows']

                    result_dataframe = pd.DataFrame(columns=columnNames,data=rows)
                    print("Top two result records of: " + sql_statement )
                    display(result_dataframe.head(2))
                    b_resultRecordsNotShown = False
            except:
                pass

    print("Metrics of execution of: " + sql_statement_with_variable)
    display(df_metrics)

### Metrics

The main metrics of a query execution we will check are:
- __timeUsedMs__: Total time between the broker receiving the query request and sending the response back to the client.
- __numDocsScanned__: Number of documents/records scanned while processing the query. (Includes records scanned in the filter phase as well as after applying the filter.)
- __numEntriesScannedInFilter__: It is an indicator of the latency contributed by the lookup phase. If this number is high, applying an index on the selection criteria might improve performance, especially if the selection criteria is highly selective.
- __numEntriesScannedPostFilter__: High number is an indicator for low selectivity. Instead of regular indices, a star-tree index could help.
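
To compare these metrics across tables, it can help to relate them to `totalDocs`. The following is a minimal sketch with a hypothetical helper `scan_ratios`; the ratio names are made up for illustration and are not Pinot terminology. The example numbers are taken from the default-index row of the first comparison further below.

```python
def scan_ratios(stats):
    """Relate the scan metrics of one query response to the table size."""
    total = stats["totalDocs"]
    docs = stats["numDocsScanned"]
    return {
        # Share of the table that had to be inspected in the filter phase.
        "filter_scan_ratio": stats["numEntriesScannedInFilter"] / total,
        # Share of the table that matched the filter.
        "selectivity": docs / total,
        # Entries read per matching document after filtering.
        "entries_per_doc": stats["numEntriesScannedPostFilter"] / max(docs, 1),
    }


default_index_stats = {
    "numDocsScanned": 310,
    "numEntriesScannedInFilter": 315170,
    "numEntriesScannedPostFilter": 1550,
    "totalDocs": 315170,
}
print(scan_ratios(default_index_stats))
```

A `filter_scan_ratio` of 1.0 combined with a very small `selectivity` is the situation described above in which an index on the filter column is worth considering. Here `entries_per_doc` is 5.0, which matches the five columns selected by that query.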

### Index Types

For the tables created above, we configured the following index types:

- Forward Index
    - __Default Index: Dictionary-encoded forward index with bit compression__: 
    Apache Pinot uses this index by default for each column if no other index is configured in the table configuration. An id is assigned to each distinct value of the column, and a dictionary is built that maps each id to its value. In the forward index, only the bit-compressed id is stored instead of the value. This compression improves storage efficiency if there are only a few distinct values.
    
    XXX replace image
    <img src="images/RawValueForwardIndex.png" width="30%" height="30%">

    
    - __Raw Value Forward Index__: A raw value forward index is configured via `noDictionaryColumns` in the table configuration. Instead of dictionary ids, the raw values are stored in the column. Because of that, no dictionary lookup is required, and due to the locality of values, scanning a large number of values is faster.
    - __Sorted forward index with run-length encoding__: The sorted forward index is applied on top of the dictionary-encoding. For each dictionary id, a start and end document id is stored. Only one sorted column can be configured per table.
   
- Inverted Index: Inverted indexes reduce the number of records that need to be processed by identifying the ones that contain the search term. The inverted index is created by selecting all distinct values of a given column. For each value, a list of the ids of the documents that contain the value is stored. If we search e.g. for "Hessen" as a state, we can look up "Hessen" in the inverted index and identify the documents in which that value appears.
    - __Bitmap inverted index__: A map from each value to a bitmap is maintained for the column which is enabled as bitmap inverted index (e.g. "Thüringen" -> `Doc5, Doc1`). If a column is used frequently for filtering, an inverted index will improve the performance. 
    - __Sorted inverted index__: A sorted index can benefit from data locality, but can only be applied to one column.
    
- __Star-Tree Index__: This index is built on multiple columns and pre-aggregates results per configured dimension hierarchy level, so that fewer values need to be processed. This can significantly improve query performance for hierarchical data (e.g. groups of users, workspaces, or states of locations in the `trips` example). On the other hand, pre-aggregation also requires more disk space (the table can easily grow to about twice the size of the other tables).
- __Text Index__: Text indexes in Pinot allow arbitrary search on `STRING` columns (full-text search).

#### Default Index (Dictionary-encoded forward index with bit compression) vs Raw value forward index

In [19]:
executeSQLStatement("select count, driver_name, driver_rating, end_location, end_location_state from XX_TABLE WHERE start_location=XX_STARTLOCATION LIMIT 10000",[['trips_default_index', 'Default Index (Dictionary-encoded forward index with bit compression) for each column'],['trips_rawForwardIndex', 'Raw value forward index on start_location']])

Top two result records of: select count, driver_name, driver_rating, end_location, end_location_state from trips_default_index WHERE start_location='Pforzheim' LIMIT 10000


Unnamed: 0,count,driver_name,driver_rating,end_location,end_location_state
0,1,Ellen Mars,0,Lauben,Bayern
1,1,Ronnie Renwick,5,Vitzenburg,Sachsen-Anhalt


Metrics of execution of: select count, driver_name, driver_rating, end_location, end_location_state from XX_TABLE WHERE start_location=XX_STARTLOCATION LIMIT 10000


Unnamed: 0,indextype,table,numDocsScanned,numEntriesScannedInFilter,numEntriesScannedPostFilter,totalDocs,timeUsedMs,minConsumingFreshnessTimeMs,exceptions
0,Default Index (Dictionary-encoded forward index with bit compression) for each column,trips_default_index,310,315170,1550,315170,17,1618138752061,[]
1,Raw value forward index on start_location,trips_rawForwardIndex,310,315170,1550,315170,57,1618138750380,[]


The query on table `trips_rawForwardIndex` takes longer (57 ms versus 17 ms in this run). The main difference between the two tables is that column `start_location` of `trips_default_index` is dictionary-encoded. The dictionary provides compression when column values occur repeatedly.
A dictionary cannot offer this advantage if the column has high cardinality.
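
The effect of cardinality on dictionary encoding can be illustrated with a toy example. This is a simplified sketch of the general technique, not Pinot's actual storage format; the function `dictionary_encode` is hypothetical, and sorting the dictionary is a choice made for this example.

```python
def dictionary_encode(values):
    """Replace each value by a small integer id and report the bits needed per id."""
    dictionary = sorted(set(values))                 # one entry per distinct value
    id_of = {value: i for i, value in enumerate(dictionary)}
    ids = [id_of[value] for value in values]         # toy forward index of ids
    # Number of bits needed to represent the largest id (at least 1).
    bits_per_value = max(1, (len(dictionary) - 1).bit_length())
    return dictionary, ids, bits_per_value


column = ["Hessen", "Bayern", "Hessen", "Thüringen", "Bayern", "Hessen"]
dictionary, ids, bits = dictionary_encode(column)
print(dictionary)  # ['Bayern', 'Hessen', 'Thüringen']
print(ids)         # [1, 0, 1, 2, 0, 1]
print(bits)        # 2
```

With three distinct values, each entry needs only two bits instead of a full string. If every value were distinct, the dictionary would be as large as the column itself and the ids would add overhead rather than save space.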

#### Default Index (Dictionary-encoded forward index with bit compression) vs Sorted forward index with run-length encoding

In [27]:
executeSQLStatement("select * from XX_TABLE WHERE start_location=XX_STARTLOCATION LIMIT 10000", [['trips_default_index', 'Default Index (Dictionary-encoded forward index with bit compression) for each column'],['trips_sortedForwardIndex', 'Sorted forward index with run-length encoding on start location']])

Top two result records of: select * from trips_default_index WHERE start_location='Pforzheim' LIMIT 10000


Unnamed: 0,count,driver_name,driver_rating,end_location,end_location_state,end_zip_code,license_plate,payment_amount,payment_tip_amount,request_time_millis,rider_is_premium,rider_name,rider_rating,start_location,start_location_state,start_zip_code,trip_end_time_millis,trip_start_time_millis,trip_wait_time_millis
0,1,Ellen Mars,0,Lauben,Bayern,87761,KL-KT-11,282.9,17.0,1618049533984,0,Carolyn Merritt,4,Pforzheim,Baden-Württemberg,75172,1618069278992,1618050878992,1345008
1,1,Ronnie Renwick,5,Vitzenburg,Sachsen-Anhalt,6268,YA-QE-61,123.22,19.0,1618050353979,0,Andrea Patterson,4,Pforzheim,Baden-Württemberg,75172,1618060313830,1618051350565,996586


Metrics of execution of: select * from XX_TABLE WHERE start_location=XX_STARTLOCATION LIMIT 10000


Unnamed: 0,indextype,table,numDocsScanned,numEntriesScannedInFilter,numEntriesScannedPostFilter,totalDocs,timeUsedMs,minConsumingFreshnessTimeMs,exceptions
0,Default Index (Dictionary-encoded forward index with bit compression) for each column,trips_default_index,310,315170,5890,315170,20,1618138752061,[]
1,Sorted forward index with run-length encoding on start location,trips_sortedForwardIndex,310,0,5890,315170,16,1618138746535,[]


The sorted forward index on column `start_location` of table `trips_sortedForwardIndex` benefits from data locality. As a result, `numEntriesScannedInFilter` is lower than for the column with the default index (0 instead of 315170 in this run).
Queries that filter on `start_location` can therefore be faster with the sorted forward index.
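
Why no entries need to be scanned in the filter phase can be shown with a toy version of a sorted index, which stores one document id range per value. This is a minimal sketch under the assumption that the column is already sorted; the helpers `build_sorted_index` and `matching_doc_ids` are hypothetical and do not reflect Pinot's internal classes.

```python
def build_sorted_index(sorted_values):
    """Map each value of a sorted column to its (start_doc_id, end_doc_id) range."""
    ranges = {}
    for doc_id, value in enumerate(sorted_values):
        start, _ = ranges.get(value, (doc_id, doc_id))
        ranges[value] = (start, doc_id)
    return ranges


def matching_doc_ids(ranges, value):
    """Return the contiguous doc id range for a value without scanning the column."""
    if value not in ranges:
        return range(0)
    start, end = ranges[value]
    return range(start, end + 1)


column = ["Berlin", "Berlin", "Hagen", "Pforzheim", "Pforzheim", "Pforzheim"]
index = build_sorted_index(column)
print(index["Pforzheim"])                          # (3, 5)
print(list(matching_doc_ids(index, "Pforzheim")))  # [3, 4, 5]
```

The filter becomes a single lookup that yields a contiguous block of documents, which is also where the data locality mentioned above comes from.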

#### Default Index (Dictionary-encoded forward index with bit compression) vs Inverted index (Bitmap + Sorted)

In [28]:
executeSQLStatement("select driver_name, rider_name from XX_TABLE WHERE start_location=XX_STARTLOCATION LIMIT 10000", [['trips_default_index', 'Default Index (Dictionary-encoded forward index with bit compression) for each column'],['trips_bitmapInvertedIndex_startLocation', 'Bitmap inverted index on start_location'],['trips_sortedInvertedIndex_startLocation','Sorted inverted index on start_location']])

Top two result records of: select driver_name, rider_name from trips_default_index WHERE start_location='Pforzheim' LIMIT 10000


Unnamed: 0,driver_name,rider_name
0,Ellen Mars,Carolyn Merritt
1,Ronnie Renwick,Andrea Patterson


Metrics of execution of: select driver_name, rider_name from XX_TABLE WHERE start_location=XX_STARTLOCATION LIMIT 10000


Unnamed: 0,indextype,table,numDocsScanned,numEntriesScannedInFilter,numEntriesScannedPostFilter,totalDocs,timeUsedMs,minConsumingFreshnessTimeMs,exceptions
0,Default Index (Dictionary-encoded forward index with bit compression) for each column,trips_default_index,310,315170,620,315170,31,1618138752061,[]
1,Bitmap inverted index on start_location,trips_bitmapInvertedIndex_startLocation,310,0,620,315170,5,1618138752594,[]
2,Sorted inverted index on start_location,trips_sortedInvertedIndex_startLocation,310,0,620,315170,13,1618138744079,[]


As we can see, an inverted index can improve the query performance. In this case, no entries have to be scanned in the filtering phase and the query execution time is faster compared to using the dictionary encoded index.
By using the sorted inverted index, the performance can benefit from data locality. 
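
The idea behind the inverted index can be sketched in a few lines. This is a toy illustration in which plain Python lists stand in for the bitmaps; the function `build_inverted_index` is hypothetical.

```python
from collections import defaultdict


def build_inverted_index(values):
    """Map each distinct value to the list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id, value in enumerate(values):
        index[value].append(doc_id)
    return dict(index)


states = ["Hessen", "Bayern", "Thüringen", "Hessen", "Bayern", "Thüringen"]
index = build_inverted_index(states)
print(index["Hessen"])     # [0, 3]
print(index["Thüringen"])  # [2, 5]
```

A filter such as `start_location_state = 'Hessen'` then only needs one dictionary lookup instead of a comparison against every row, which corresponds to `numEntriesScannedInFilter` dropping to 0 in the metrics above.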

#### Text Index

A query that selects drivers by first name can only be executed successfully on table `trips_textIndex`, as it has a text index defined on column `driver_name`. The same query fails on the other tables; the metrics table only shows executions that finished without an exception.

In [30]:
executeSQLStatement("select * from XX_TABLE WHERE TEXT_MATCH ('driver_name',XX_DRIVERNAME) LIMIT 10000", [['trips_textIndex','Text Index']])

Top two result records of: select * from trips_textIndex WHERE TEXT_MATCH ('driver_name','Ellen') LIMIT 10000


Unnamed: 0,count,driver_name,driver_rating,end_location,end_location_state,end_zip_code,license_plate,payment_amount,payment_tip_amount,request_time_millis,rider_is_premium,rider_name,rider_rating,start_location,start_location_state,start_zip_code,trip_end_time_millis,trip_start_time_millis,trip_wait_time_millis
0,1,Ellen Mcdaniels,4,Saffig,Rheinland-Pfalz,56648,AG-SM-55,392.8,5.0,1618330000991,0,Michelle Zymowski,1,Wertach,Bayern,87497,1618361499662,1618330489136,488145
1,1,Ellen Mars,5,Negernbötel,Schleswig-Holstein,23795,GJ-HE-79,26.24,24.0,1618330026999,0,Maurice Schultz,2,Hagen,Nordrhein-Westfalen,58135,1618333671378,1618331412554,1385555


Metrics of execution of: select * from XX_TABLE WHERE TEXT_MATCH ('driver_name',XX_DRIVERNAME) LIMIT 10000


Unnamed: 0,indextype,table,numDocsScanned,numEntriesScannedInFilter,numEntriesScannedPostFilter,totalDocs,timeUsedMs,minConsumingFreshnessTimeMs,exceptions
0,Text Index,trips_textIndex,905,0,17195,315170,32,1618138758025,[]


#### Star-Tree Index

The Star-Tree index pre-aggregates results and is built on multiple columns. It can improve performance for specific queries because pre-aggregation reduces the number of values to be processed. The price for the lower query runtime is a significantly larger table size on disk.
For table `trips_starTreeIndex`, a Star-Tree index is built on the dimensions `rider_is_premium`, `start_location_state` and `end_location`. The sum of `payment_amount` is pre-aggregated and materialized based on the configured dimensions.
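
The pre-aggregation idea can be illustrated with a toy example. This is a strongly simplified sketch: it materializes a flat lookup table for every combination of dimension values, whereas the actual Star-Tree is a tree whose shape is controlled by settings such as `dimensionsSplitOrder` and `maxLeafRecords`. The helper `pre_aggregate`, the `STAR` marker and the sample rows are hypothetical.

```python
from collections import defaultdict

STAR = "*"  # stands for "any value" of a dimension


def pre_aggregate(rows, dimensions, metric):
    """Sum `metric` for every combination of concrete dimension values and STAR."""
    sums = defaultdict(float)
    for row in rows:
        # Each dimension either keeps its value or is replaced by STAR.
        keys = [()]
        for dim in dimensions:
            keys = [key + (choice,) for key in keys for choice in (row[dim], STAR)]
        for key in keys:
            sums[key] += row[metric]
    return dict(sums)


rows = [
    {"rider_is_premium": 0, "start_location_state": "Bayern", "payment_amount": 10.0},
    {"rider_is_premium": 1, "start_location_state": "Bayern", "payment_amount": 20.0},
    {"rider_is_premium": 0, "start_location_state": "Hessen", "payment_amount": 5.0},
]
cube = pre_aggregate(rows, ["rider_is_premium", "start_location_state"], "payment_amount")
print(cube[(STAR, STAR)])  # SUM(payment_amount) over all rows: 35.0
print(cube[(0, STAR)])     # SUM(payment_amount) WHERE rider_is_premium = 0: 15.0
```

Both sums are answered by a single lookup instead of a scan over all rows. The extra entries stored for every combination also show why the table grows on disk.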

In [31]:
executeSQLStatement("SELECT SUM(payment_amount) FROM XX_TABLE",[["trips_default_index","Default Index (Dictionary-encoded forward index with bit compression) for each column"],["trips_starTreeIndex", "Start Tree 1"]] )

Top two result records of: SELECT SUM(payment_amount) FROM trips_default_index


Unnamed: 0,sum(payment_amount)
0,221695100.0


Metrics of execution of: SELECT SUM(payment_amount) FROM XX_TABLE


Unnamed: 0,indextype,table,numDocsScanned,numEntriesScannedInFilter,numEntriesScannedPostFilter,totalDocs,timeUsedMs,minConsumingFreshnessTimeMs,exceptions
0,Default Index (Dictionary-encoded forward index with bit compression) for each column,trips_default_index,315170,0,315170,315170,15,1618138752061,[]
1,Start Tree 1,trips_starTreeIndex,15185,0,15185,315170,10,1618138757485,[]


When querying the total without grouping by or filtering on any dimension, Pinot doesn't need to access all documents; only the Star-Node of each segment is required. `numDocsScanned` is nevertheless not equal to the number of segments, because there is always one segment that isn't completed yet: Pinot accesses each record of the consuming segment (status `IN_PROGRESS`).

In [34]:
print("The table with the Star Tree Index consists of " + str(len(requests.get('http://pinot-controller.pinot:9000/segments/trips_starTreeIndex').json()[0]['REALTIME'])) + " segments")

The table with the Star Tree Index consists of 16 segments


Filtering on the dimension `rider_is_premium`, which forms the first level of the Star-Tree index, roughly halves `numDocsScanned`. This is because `rider_is_premium` is assigned randomly in our data generation, so there is a fifty percent chance that a rider is not premium, and fewer documents of the consuming segment need to be scanned.

In [35]:
executeSQLStatement("SELECT SUM(payment_amount) FROM XX_TABLE WHERE rider_is_premium = 0",[["trips_default_index","Default Index (Dictionary-encoded forward index with bit compression) for each column"],["trips_starTreeIndex", "Start Tree 1"]])

Top two result records of: SELECT SUM(payment_amount) FROM trips_default_index WHERE rider_is_premium = 0


Unnamed: 0,sum(payment_amount)
0,110840300.0


Metrics of execution of: SELECT SUM(payment_amount) FROM XX_TABLE WHERE rider_is_premium = 0


Unnamed: 0,indextype,table,numDocsScanned,numEntriesScannedInFilter,numEntriesScannedPostFilter,totalDocs,timeUsedMs,minConsumingFreshnessTimeMs,exceptions
0,Default Index (Dictionary-encoded forward index with bit compression) for each column,trips_default_index,157713,315170,157713,315170,17,1618138752061,[]
1,Start Tree 1,trips_starTreeIndex,7642,15170,7642,315170,4,1618138757485,[]


##### Trace Details

The trace details of a query execution show how much time was spent in each operator. We extract the operator details for the following query. Executed on `trips_default_index`, the query requires many aggregation operators, because no data is pre-aggregated as it is for table `trips_starTreeIndex`.

XXX use helper funcs  
XXX use DataFrame?

In [40]:
def query_sql(query):
    print("query: " + query)
    return requests.get('http://pinot-broker.pinot:8099/query/sql', params={
        "sql" : query,
        "trace": "true"
    }).json()

def getTraceDetails(sql):
    response = query_sql(sql)
    test = (response['traceInfo']['pinot-server-0.pinot-server-headless.pinot.svc.cluster.local'])[7:-3]
    string_tracedetails = test.split(',')
    # display only trace details with time > 0
    tracedetails_time = []
    for x in string_tracedetails:
        if "Time" in x:
            if ":0" in x:
                pass
            else:
                tracedetails_time.append(x)
    return tracedetails_time

print("\nGet Trace Details for trips_starTreeIndex\n")
print(getTraceDetails("SELECT SUM(payment_amount) FROM trips_starTreeIndex WHERE rider_is_premium = 0"))

print("\nGet Trace Details for trips_default_index\n")
print(getTraceDetails("SELECT SUM(payment_amount) FROM trips_default_index WHERE rider_is_premium = 0"))


Get Trace Details for trips_starTreeIndex

query: SELECT SUM(payment_amount) FROM trips_starTreeIndex WHERE rider_is_premium = 0
['{"AggregationOnlyCombineOperator Time":1}', '{"InstanceResponseOperator Time":1}]}', '{"DocIdSetOperator Time":1}', '{"ProjectionOperator Time":1}', '{"TransformOperator Time":1}', '{"AggregationOperator Time":1}']

Get Trace Details for trips_default_index

query: SELECT SUM(payment_amount) FROM trips_default_index WHERE rider_is_premium = 0
['{"AggregationOnlyCombineOperator Time":19}', '{"InstanceResponseOperator Time":19}]}', '{"AggregationOperator Time":1}', '{"AggregationOperator Time":1}', '{"AggregationOperator Time":1}', '{"AggregationOperator Time":1}', '{"DocIdSetOperator Time":1}', '{"ProjectionOperator Time":1}', '{"TransformOperator Time":1}', '{"AggregationOperator Time":1}', '{"DocIdSetOperator Time":1}', '{"ProjectionOperator Time":1}', '{"TransformOperator Time":1}', '{"AggregationOperator Time":1}', '{"DocIdSetOperator Time":1}', '{"Proj
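
Long lists of trace fragments like the ones above are easier to compare when they are summarized per operator. The following is a minimal sketch with a hypothetical helper `summarize_trace`; it only assumes that each fragment has the `"<Operator> Time":<ms>` shape seen in the output above, and the sample input is the fragment list printed for `trips_starTreeIndex`.

```python
import re
from collections import Counter


def summarize_trace(fragments):
    """Return {operator: (number of calls, total time in ms)} for trace fragments."""
    calls = Counter()
    total_ms = Counter()
    for fragment in fragments:
        match = re.search(r'"(\w+) Time":(\d+)', fragment)
        if match:
            operator, millis = match.group(1), int(match.group(2))
            calls[operator] += 1
            total_ms[operator] += millis
    return {operator: (calls[operator], total_ms[operator]) for operator in calls}


star_tree_fragments = [
    '{"AggregationOnlyCombineOperator Time":1}',
    '{"InstanceResponseOperator Time":1}]}',
    '{"DocIdSetOperator Time":1}',
    '{"ProjectionOperator Time":1}',
    '{"TransformOperator Time":1}',
    '{"AggregationOperator Time":1}',
]
summary = summarize_trace(star_tree_fragments)
for operator, (count, millis) in summary.items():
    print(f"{operator}: {count} call(s), {millis} ms")
```

Applied to the fragments of `trips_default_index`, the same summary would show `AggregationOperator` several times, which reflects the missing pre-aggregation.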

### Indexes - Comparison with other database technologies

Two categories of the indexing options demonstrated above can also be found in traditional databases: forward indexes and inverted indexes.
Forward indexes are frequently used in traditional database technologies as well to improve storage efficiency.
Search engines such as Elasticsearch most often rely on an inverted index.
We saw that Pinot also offers a dedicated text index for full-text search, i.e. for finding records that contain a specific string.

We also demonstrated two other indexing techniques that are typically not offered by traditional database systems: the raw value forward index and the Star-Tree index.
The raw value forward index doesn't include dictionaries; when aggregating a large number of values, it can take advantage of data locality for scanning.

The Star-Tree index is an important and special concept in Pinot, because it uses pre-aggregation for group-by queries to achieve low query latencies. It is specifically designed for the analytical use cases that Pinot was built for, which makes it a key differentiator of Pinot.
For example, Star-Tree indexes can bring great benefits if data has to be returned per user, as is the case for the "Who viewed my profile" application at LinkedIn.

### Maintenance - Table Deletion

In [41]:
## Delete Tables
#for table in table_list:
#    string = "http://pinot-controller.pinot:9000/tables/" + table[0]
#    response = requests.delete(string)
#    print(response.json())
## Reset the list once, after all tables have been deleted
#table_list = []

# Closing Remarks
__TBD__:
Compared to well-known databases, setup and configuration are complex.
The separate components allow the cluster to scale flexibly and provide high availability, as the system continues to serve queries even if one node goes down. This is a big advantage, but the number of components also adds complexity to the overall landscape.

We experienced the advantages of Pinot, e.g. the possibility to dynamically change configurations and to quickly create new tables consuming data from the Kafka topic. 