# XenonPy datatools

**XenonPy** comes with a dataset system that helps users load data.
In this sample, we explain the concepts and demonstrate how to use them to work with data in **XenonPy**.

## Concepts

### Loader

Use the `xenonpy.datatools.Loader` class to load a preset dataset that comes with XenonPy, or to load a `LocalStorage` object.

When initialized with no parameters, it loads data from the ``~/.xenonpy/cached`` or ``~/.xenonpy/dataset`` directory, like this:

In [1]:
from xenonpy.datatools import preset

ele = preset('elements_completed')
ele.info()

<class 'pandas.core.frame.DataFrame'>
Index: 94 entries, H to Pu
Data columns (total 58 columns):
atomic_number                    94 non-null float64
atomic_radius                    94 non-null float64
atomic_radius_rahm               94 non-null float64
atomic_volume                    94 non-null float64
atomic_weight                    94 non-null float64
boiling_point                    94 non-null float64
bulk_modulus                     94 non-null float64
c6_gb                            94 non-null float64
covalent_radius_cordero          94 non-null float64
covalent_radius_pyykko           94 non-null float64
covalent_radius_pyykko_double    94 non-null float64
covalent_radius_pyykko_triple    94 non-null float64
covalent_radius_slater           94 non-null float64
density                          94 non-null float64
dipole_polarizability            94 non-null float64
electron_negativity              94 non-null float64
electron_affinity                94 non-null float64
...

### For embedded datasets, the `Loader` class has corresponding properties to retrieve data directly

In [2]:
preset.elements_completed.info()

<class 'pandas.core.frame.DataFrame'>
Index: 94 entries, H to Pu
Data columns (total 58 columns):
atomic_number                    94 non-null float64
atomic_radius                    94 non-null float64
atomic_radius_rahm               94 non-null float64
atomic_volume                    94 non-null float64
atomic_weight                    94 non-null float64
boiling_point                    94 non-null float64
bulk_modulus                     94 non-null float64
c6_gb                            94 non-null float64
covalent_radius_cordero          94 non-null float64
covalent_radius_pyykko           94 non-null float64
covalent_radius_pyykko_double    94 non-null float64
covalent_radius_pyykko_triple    94 non-null float64
covalent_radius_slater           94 non-null float64
density                          94 non-null float64
dipole_polarizability            94 non-null float64
electron_negativity              94 non-null float64
electron_affinity                94 non-null float64
...

For general purposes, you can load any resource via an HTTP request.

Assume we want to fetch https://raw.githubusercontent.com/yoshida-lab/XenonPy/master/requirements.txt
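
One way to fetch such a resource is sketched below with Python's standard library only, not with XenonPy's own loader. The helper name `fetch_text` and its `timeout` and `encoding` parameters are hypothetical choices made for this illustration; they are not part of the XenonPy API.

```python
from urllib.request import urlopen

URL = 'https://raw.githubusercontent.com/yoshida-lab/XenonPy/master/requirements.txt'


def fetch_text(url, timeout=10.0, encoding='utf-8'):
    """Return the body of ``url`` decoded as text (hypothetical helper)."""
    # urlopen raises URLError (or its subclass HTTPError) on failure
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode(encoding)


# usage (needs network access):
# text = fetch_text(URL)
# print(text)
```

The call is left commented out so that the snippet can be imported or pasted without triggering a network request.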

### Save

Use the `xenonpy.datatools.LocalStorage` class to save data.
You can save any Python object and retrieve it at any time.

In [3]:
from xenonpy.datatools import LocalStorage
import numpy as np
np.random.seed(0)

# name our data
test_data = LocalStorage('test_data')

# generate some data
data1 = np.random.randint(0, 10, 5)
test_data(data1)  # save it
loaded1 = test_data.last()  # load it back

print('data: ', data1, 'retrieved data: ', loaded1)
print(test_data, '\n')

# update: saving again adds a new version
data2 = np.random.randint(5, 10, 5)
test_data(data2)
loaded2 = test_data.last()

print('new data: ', data2, 'retrieved new data: ', loaded2)
print(test_data, '\n')

# get old data by index
oldest = test_data[0]
print('retrieved oldest data: ', oldest)

# delete all
test_data.clean()
print(test_data)

data:  [5 0 3 3 7] retrieved data:  [5 0 3 3 7]
<test_data> include:
"unnamed": 1 

new data:  [6 8 7 9 5] retrieved new data:  [6 8 7 9 5]
<test_data> include:
"unnamed": 2 

retrieved oldest data:  [5 0 3 3 7]
<test_data> include:


### You can name your data

In [4]:
test_data(data1=data1, data2=data2)
print(test_data, '\n')

# retrieve by name
loaded1 = test_data.last('data1')
print('data1: ', data1, '; retrieved data1: ', loaded1, '\n')

# by name and index or slice
first1 = test_data['data1', 0]
print('data1 at index 0: ', first1)

<test_data> include:
"data1": 1
"data2": 1 

data1:  [5 0 3 3 7] ; retrieved data1:  [5 0 3 3 7] 

data1 at index 0:  [5 0 3 3 7]
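
To make the versioning behaviour in the cells above easier to follow, here is a toy in-memory model of it. `TinyStorage` is a hypothetical class written only for this illustration. It is not how `LocalStorage` is implemented: it only mirrors the call patterns shown above (positional saves go under `"unnamed"`, keyword saves go under their own name, and each save appends a new version), and it keeps everything in memory.

```python
class TinyStorage:
    """Toy model: every name maps to a list of saved versions."""

    def __init__(self, name):
        self.name = name
        self._items = {}

    def __call__(self, *unnamed, **named):
        # positional arguments are appended under the key "unnamed"
        for obj in unnamed:
            self._items.setdefault('unnamed', []).append(obj)
        # keyword arguments are appended under their own key
        for key, obj in named.items():
            self._items.setdefault(key, []).append(obj)

    def last(self, name='unnamed'):
        # most recently saved version under ``name``
        return self._items[name][-1]

    def __getitem__(self, item):
        # store[0] -> unnamed version 0; store['data1', 0] -> named version 0
        if isinstance(item, tuple):
            name, index = item
        else:
            name, index = 'unnamed', item
        return self._items[name][index]

    def clean(self):
        # drop every saved version
        self._items.clear()

    def __repr__(self):
        lines = ['<{}> include:'.format(self.name)]
        lines += ['"{}": {}'.format(k, len(v)) for k, v in self._items.items()]
        return '\n'.join(lines)


store = TinyStorage('demo')
store([5, 0, 3])             # unnamed, version 0
store([6, 8, 7])             # unnamed, version 1
store(data1=[1, 2])          # named, version 0
print(store.last())          # [6, 8, 7]
print(store[0])              # [5, 0, 3]
print(store['data1', 0])     # [1, 2]
```

Keeping a list per name is what lets `last()` and index access coexist: the newest version is always at position `-1`, while older versions stay reachable by position.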


Combine them together

In [5]:
test_data(data1, data2)
print(test_data)

# dump the last saved items to a file
test_data.dump('~/test')

from pathlib import Path
files = Path('~/test').expanduser().iterdir()
for f in files:
    print(f)

<test_data> include:
"unnamed": 2
"data1": 1
"data2": 1
/Users/liuchang/test/test_data-2018-06-15_21-05-07_462302.pkl.z
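
The dump file name above embeds a timestamp, which can be used to find the most recent dump in a directory. The sketch below assumes that every dump follows the pattern `<name>-YYYY-MM-DD_HH-MM-SS_ffffff.pkl.z`, inferred from the single file name shown above. `dump_time` and `latest_dump` are hypothetical helpers and not part of XenonPy.

```python
from datetime import datetime
from pathlib import Path

STAMP_FORMAT = '%Y-%m-%d_%H-%M-%S_%f'
STAMP_LENGTH = len('2018-06-15_21-05-07_462302')  # fixed-width timestamp
SUFFIX = '.pkl.z'


def dump_time(path):
    """Parse the timestamp at the end of a dump file name (assumed pattern)."""
    file_name = Path(path).name
    if not file_name.endswith(SUFFIX):
        raise ValueError('not a dump file: {}'.format(file_name))
    stem = file_name[:-len(SUFFIX)]
    # take the timestamp from the end, so names containing '-' still work
    return datetime.strptime(stem[-STAMP_LENGTH:], STAMP_FORMAT)


def latest_dump(directory):
    """Return the newest dump file in ``directory``, or None if there is none."""
    files = Path(directory).expanduser().glob('*' + SUFFIX)
    return max(files, key=dump_time, default=None)


# usage:
# print(latest_dump('~/test'))
```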
