## Speed Up Your Data Analysis!

Today we are going to talk about three Python libraries:
- DTale
- Modin
- Vaex

**DTale for fast exploratory data analysis**

**Installation**

```pip install dtale```<br>
```conda install -c conda-forge dtale```


**Import DTale**

In [1]:
import dtale
import pandas as pd

**Let's load a dataset**

In [2]:
df = pd.read_csv("https://raw.githubusercontent.com/sharmaroshan/FIFA-2019-Analysis/master/Footballer.csv",index_col = False)

In [3]:
df.shape

(18207, 89)

In [4]:
dtale.show(df)

2021-01-22 18:13:28,264 - INFO     - NumExpr defaulting to 8 threads.




**Disadvantages**
* Slow on large datasets
* Still a relatively new library, so you may run into some bugs
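One way to soften the first point is to explore a random sample instead of the full DataFrame. The sketch below is a workflow suggestion, not something DTale requires: `sample_rows`, `show_sample`, and the 50,000-row cap are hypothetical names and values picked for illustration. The only library calls it relies on are `DataFrame.sample` and `dtale.show`.

```python
def sample_rows(df, max_rows=50_000, seed=0):
    """Return df unchanged if it is small enough, else a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    # random_state makes the same rows come back on every run
    return df.sample(n=max_rows, random_state=seed)


def show_sample(df, max_rows=50_000):
    """Open a DTale session on at most max_rows rows of df."""
    import dtale  # imported lazily so sample_rows works without DTale installed

    return dtale.show(sample_rows(df, max_rows))
```

For the FIFA DataFrame loaded above (18,207 rows), `show_sample(df)` would pass the data through untouched, since it is already under the cap.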

### Modin: scale your pandas workflow with a single line of code

Modin can load medium-sized datasets (up to 500,000 records) very fast.
<br>
<br>
Unlike pandas, which processes a dataset on a single CPU core, Modin distributes the work across all the available cores.
<br>
Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical.
<br>
Modin uses Ray or Dask (parallel processing libraries) to provide an effortless way to speed up your pandas notebooks, scripts, and libraries.
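A minimal sketch of choosing the engine explicitly, assuming the `MODIN_ENGINE` environment variable described in Modin's documentation is set before Modin is imported. `load_dataframe_backend` is a hypothetical helper, not part of Modin; it simply falls back to plain pandas when Modin is not installed.

```python
import importlib
import os

# Pick the engine before Modin is imported. "ray" and "dask" match the
# pip extras; setdefault keeps any value already exported in the shell.
os.environ.setdefault("MODIN_ENGINE", "ray")


def load_dataframe_backend(candidates=("modin.pandas", "pandas")):
    """Import and return the first available module from candidates, else None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


pd = load_dataframe_backend()
print("backend:", getattr(pd, "__name__", "none installed"))
print("engine :", os.environ["MODIN_ENGINE"])
print("cores  :", os.cpu_count())
```

Because both modules expose the same API, the rest of a notebook can keep using `pd` without caring which one was loaded.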

### Installation

`pip install "modin[ray]"` - Install Modin dependencies and Ray to run on Ray<br>
`pip install "modin[dask]"` - Install Modin dependencies and Dask to run on Dask<br>
`pip install "modin[all]"` - Install all of the above<br>

### Claim By Modin!
Modin is a DataFrame for datasets from 1MB to 1TB+

**Loading a dataset in Pandas vs. Modin**

In [6]:
import pandas as pd

Link to the dataset : https://www.kaggle.com/skihikingkevin/csgo-matchmaking-damage?select=esea_master_dmg_demos.part1.csv

In [7]:
df = pd.read_csv('demo.csv')
df.head()

Unnamed: 0,file,round,tick,seconds,att_team,vic_team,att_side,vic_side,hp_dmg,arm_dmg,...,wp,wp_type,att_id,att_rank,vic_id,vic_rank,att_pos_x,att_pos_y,vic_pos_x,vic_pos_y
0,esea_match_13818366.dem,1,21257,165.8779,Team 2,Team 1,Terrorist,CounterTerrorist,20,0,...,Glock,Pistol,76561198242409332,0,76561198323654039,0,695.9623,1459.579,-299.9824,2220.148
1,esea_match_13818366.dem,1,21653,168.9752,Team 1,Team 2,CounterTerrorist,Terrorist,100,0,...,USP,Pistol,76561198013283167,0,76561198355844440,0,-236.3433,923.7209,271.5201,2097.479
2,esea_match_13818366.dem,1,21717,169.4758,Team 1,Team 2,CounterTerrorist,Terrorist,100,0,...,USP,Pistol,76561198323654039,0,76561198305399476,0,-422.4744,2051.092,909.0534,1543.72
3,esea_match_13818366.dem,1,21937,171.1965,Team 1,Team 2,CounterTerrorist,Terrorist,14,6,...,USP,Pistol,76561198323654039,0,76561197962070685,0,-263.6333,2248.544,704.7518,1579.347
4,esea_match_13818366.dem,1,22229,173.4804,Team 1,Team 2,CounterTerrorist,Terrorist,100,0,...,USP,Pistol,76561198853893462,0,76561198397252030,0,-558.0109,-874.1711,254.492,-691.1022


In [8]:
%%time
df1 = pd.read_csv('demo.csv')

Wall time: 28.7 s


In [10]:
df1.shape

(4546085, 23)

**Magic of Modin!**

In [12]:
import modin.pandas as pd

In [13]:
%%time
df1 = pd.read_csv("demo.csv",index_col=False)

Wall time: 6.91 s
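`%%time` only works inside a notebook. For a plain script, the same comparison can be sketched with `time.perf_counter`. Everything below is illustrative: `time_call` and `write_sample_csv` are hypothetical helpers, and the tiny synthetic CSV is a stand-in for `demo.csv`, so the timings will not match the numbers above. Each library is timed only if it is installed.

```python
import csv
import os
import tempfile
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def write_sample_csv(path, n_rows=10_000):
    """Write a small synthetic CSV (stand-in for the real dataset)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["round", "hp_dmg", "arm_dmg"])
        for i in range(n_rows):
            writer.writerow([i % 30, i % 100, i % 7])


# Collect whichever readers are installed; both imports are optional.
readers = {}
try:
    import pandas
    readers["pandas"] = pandas.read_csv
except ImportError:
    pass
try:
    import modin.pandas as mpd
    readers["modin"] = mpd.read_csv
except ImportError:
    pass

timings = {}
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sample.csv")
    write_sample_csv(csv_path)
    for name, read_csv in readers.items():
        _, seconds = time_call(read_csv, csv_path)
        timings[name] = seconds
        print(f"{name}: {seconds:.3f} s")
```

Keep in mind that the first Modin call typically also pays for starting the engine, so on a file this small pandas may well come out ahead; the gap shown above appears on multi-million-row files.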


**Another Example with a smaller dataset**

In [26]:
import numpy as np
import modin.pandas as pd

In [27]:
frame_data = np.random.randint(0, 100, size=(2**10, 2**8))

In [28]:
%%time
df = pd.DataFrame(frame_data)

Wall time: 152 ms




In [29]:
df.size

262144
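Note that `df.size` counts cells, not rows, so the value above is simply the product of the two dimensions passed to `np.random.randint`. A quick check in plain Python:

```python
rows, cols = 2**10, 2**8  # shape used for frame_data above
cells = rows * cols
print(rows, cols, cells)  # 1024 256 262144
```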

**Vaex - faster than Modin and built for very large datasets (terabytes of data and millions of records!)**

Vaex is a high-performance Python library for lazy, out-of-core DataFrames (similar to pandas), used to visualize and explore big tabular datasets.
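To make "lazy" a bit more concrete, here is a small sketch using a virtual column. It assumes Vaex and NumPy are installed and skips the demo otherwise; the column names `x`, `y`, `z` and the five-row frame are made up for illustration and are far smaller than anything Vaex is meant for.

```python
try:
    import numpy as np
    import vaex
except ImportError:  # Vaex (or NumPy) is not installed; the demo is skipped
    vaex = None

total = None
if vaex is not None:
    # Tiny in-memory frame; real use cases open files on disk instead.
    df = vaex.from_arrays(x=np.arange(5.0), y=np.arange(5.0) ** 2)

    # Virtual column: Vaex records the expression, no new array is allocated here.
    df["z"] = df.x + df.y

    # The expression is evaluated only when a result is requested.
    total = float(df.sum(df.z))
    print("sum of z:", total)
```

With data on disk, the usual entry point is `vaex.open(...)` on a memory-mappable file such as HDF5, which is what lets Vaex work with datasets larger than RAM.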

**Installation**

```git clone https://github.com/vaexio/vaex.git``` <br>
```python setup.py install```

Since Vaex takes time to set up, we will proceed with a pre-executed notebook.

**Shoutout to Chanin Nantasenamat (Data Professor on YouTube)** for the notebook!<br>
https://github.com/dataprofessor/python/blob/main/vaex.ipynb<br>
http://youtube.com/dataprofessor

### Additional Resources

*DTale* : https://pypi.org/project/dtale/<br>
*Modin Pandas* : https://modin.readthedocs.io/en/latest/<br>
*Vaex* : https://vaex.io/docs/<br>
*Github Repo* : 