## Load and save data

XenonPy comes with some data tools for loading and saving data efficiently.
These samples show how to work with data in XenonPy.


### Load

Use the `xenonpy.utils.Loader` class to load data from the XenonPy preset datasets or from user-created cached data under `~/.xenonpy/cached`.

Initializing `Loader` with no parameters loads data from the `~/.xenonpy/cached` or `~/.xenonpy/dataset` directory, like this:

In [1]:
from xenonpy.utils import Loader

load = Loader()  # init with no parameters
ele = load('elements_completed')
ele.info()

<class 'pandas.core.frame.DataFrame'>
Index: 94 entries, H to Pu
Data columns (total 58 columns):
atomic_number                    94 non-null float64
atomic_radius                    94 non-null float64
atomic_radius_rahm               94 non-null float64
atomic_volume                    94 non-null float64
atomic_weight                    94 non-null float64
boiling_point                    94 non-null float64
bulk_modulus                     94 non-null float64
c6_gb                            94 non-null float64
covalent_radius_cordero          94 non-null float64
covalent_radius_pyykko           94 non-null float64
covalent_radius_pyykko_double    94 non-null float64
covalent_radius_pyykko_triple    94 non-null float64
covalent_radius_slater           94 non-null float64
density                          94 non-null float64
dipole_polarizability            94 non-null float64
electron_negativity              94 non-null float64
electron_affinity                94 non-null float64
e

For the embedded datasets, the `Loader` class has corresponding properties that retrieve the data directly:

In [2]:
load.elements_completed.info()

<class 'pandas.core.frame.DataFrame'>
Index: 94 entries, H to Pu
Data columns (total 58 columns):
atomic_number                    94 non-null float64
atomic_radius                    94 non-null float64
atomic_radius_rahm               94 non-null float64
atomic_volume                    94 non-null float64
atomic_weight                    94 non-null float64
boiling_point                    94 non-null float64
bulk_modulus                     94 non-null float64
c6_gb                            94 non-null float64
covalent_radius_cordero          94 non-null float64
covalent_radius_pyykko           94 non-null float64
covalent_radius_pyykko_double    94 non-null float64
covalent_radius_pyykko_triple    94 non-null float64
covalent_radius_slater           94 non-null float64
density                          94 non-null float64
dipole_polarizability            94 non-null float64
electron_negativity              94 non-null float64
electron_affinity                94 non-null float64
e

For general purposes, you can load any resource via an HTTP request.

Assume we want to fetch https://raw.githubusercontent.com/yoshida-lab/XenonPy/master/requirements.txt

In [3]:
from xenonpy.utils.datatools import Loader
load = Loader('https://raw.githubusercontent.com/yoshida-lab/XenonPy/master/')
file_path = load('requirements.txt')
with open(str(file_path), 'r') as f:
    print('requirements are:\n')
    print(f.read())

requirements are:

numpy
pandas
pymatgen
matminer
tqdm
seaborn
PyYAML
scikit-learn
scipy
plotly
requests
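
The pattern above (a base URL plus a file name, returning a local path) can be sketched with the standard library alone. This is only an illustration of the idea, not XenonPy's implementation: the helper `fetch_cached`, the cache directory, and the skip-if-already-downloaded rule are all assumptions made for this sketch. A local `file://` URL stands in for the remote server so the example runs offline.

```python
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urljoin


def fetch_cached(base_url: str, name: str, cache_dir: Path) -> Path:
    """Hypothetical helper: download base_url + name into cache_dir once."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():  # assumed rule: reuse a file that is already cached
        with urllib.request.urlopen(urljoin(base_url, name)) as response:
            target.write_bytes(response.read())
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Fake "remote" directory served through a file:// URL.
    remote = Path(tmp) / "remote"
    remote.mkdir()
    (remote / "requirements.txt").write_text("numpy\npandas\n")

    base = remote.as_uri() + "/"
    path = fetch_cached(base, "requirements.txt", Path(tmp) / "cache")
    content = path.read_text()
    print(content)
```

Returning a `Path` rather than the file contents leaves the caller free to decide how to open the file, which matches how `file_path` is used with `open` in the cell above.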



### Save

Use the `xenonpy.utils.Saver` class to save data.
You can save any Python object and retrieve it at any time.

In [4]:
from xenonpy.utils import Saver
import numpy as np
np.random.seed(0)

# name our data
saver = Saver('test_data')

# generate some data
data1 = np.random.randint(0, 10, 5)
saver(data1)  # save it
retrieved1 = saver.last()  # load it back

# compare the original with the retrieved copy
print('data: ', data1, 'retriever data: ', retrieved1)
print(saver, '\n')



# update
data2 = np.random.randint(5, 10, 5)
saver(data2)
retrieved2 = saver.last()

print('new data: ', data2, 'retriever new data: ', retrieved2)
print(saver, '\n')

# get old data
oldest = saver[0]
print('retriever oldest data: ', oldest)

# delete all
saver.clean()
print(saver)

data:  [5 0 3 3 7] retriever data:  [5 0 3 3 7]
"test_data" include:
"data1": 1
"data2": 1
"unnamed": 3 

new data:  [6 8 7 9 5] retriever new data:  [6 8 7 9 5]
"test_data" include:
"data1": 1
"data2": 1
"unnamed": 4 

retriever oldest data:  [5 0 3 3 7]
"test_data" include:
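
The save, `last()`, index, and `clean()` behaviour shown above can be pictured as an append-only list of snapshots on disk. The following is a rough sketch of that idea using `pickle`; the class `TinySaver`, its file naming, and its storage layout are hypothetical and are not taken from XenonPy.

```python
import pickle
import tempfile
from pathlib import Path


class TinySaver:
    """Hypothetical stand-in: append-only store of pickled snapshots."""

    def __init__(self, directory: Path):
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def _files(self):
        # Zero-padded counters keep lexical order equal to insertion order.
        return sorted(self.directory.glob("*.pkl"))

    def __call__(self, obj):
        # Each call adds a new snapshot instead of overwriting the old one.
        path = self.directory / f"{len(self._files()):06d}.pkl"
        path.write_bytes(pickle.dumps(obj))

    def __getitem__(self, index):
        return pickle.loads(self._files()[index].read_bytes())

    def last(self):
        return self[-1]

    def clean(self):
        for path in self._files():
            path.unlink()


with tempfile.TemporaryDirectory() as tmp:
    tiny = TinySaver(Path(tmp) / "test_data")
    tiny([5, 0, 3, 3, 7])
    tiny([6, 8, 7, 9, 5])
    newest = tiny.last()   # most recent snapshot
    oldest = tiny[0]       # first snapshot is still available
    tiny.clean()
    remaining = len(tiny._files())
```

Keeping every snapshot is what makes "update" non-destructive: the newest entry is the default, while older ones stay reachable by index until they are cleaned.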


You can also name your data:

In [5]:
saver(data1=data1, data2=data2)
print(saver, '\n')

# retrieve by name
data1 = saver.last('data1')
print('data1: ', data1, '; retriever data1: ', data1, '\n')

# by name and index or slice
data1 = saver['data1', 0]
print('data1 at index 0: ', data1)

"test_data" include:
"data1": 1
"data2": 1 

data1:  [5 0 3 3 7] ; retriever data1:  [5 0 3 3 7] 

data1 at index 0:  [5 0 3 3 7]
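
An expression such as `saver['data1', 0]` relies on a plain Python rule: comma-separated subscripts reach `__getitem__` as a single tuple. The sketch below shows how a store could dispatch on that tuple; `NamedStore` is a hypothetical in-memory class written for illustration and says nothing about how `Saver` is implemented internally.

```python
from collections import defaultdict


class NamedStore:
    """Hypothetical in-memory store: each name maps to a list of versions."""

    def __init__(self):
        self._items = defaultdict(list)

    def __call__(self, *unnamed, **named):
        # Positional arguments go to a shared "unnamed" bucket.
        for obj in unnamed:
            self._items["unnamed"].append(obj)
        for name, obj in named.items():
            self._items[name].append(obj)

    def __getitem__(self, key):
        # store['data1', 0] arrives here as the tuple ('data1', 0).
        if isinstance(key, tuple):
            name, index = key
            return self._items[name][index]
        # A bare index or slice addresses the unnamed bucket.
        return self._items["unnamed"][key]


store = NamedStore()
store(data1=[5, 0, 3, 3, 7])
store(data1=[1, 2, 3])
store("anonymous")

first = store["data1", 0]       # name plus index
both = store["data1", 0:2]      # name plus slice
unnamed_first = store[0]        # index only
```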


Combine them together:

In [6]:
saver(data1, data2)
print(saver)

# dump the latest data to file
saver.dump('~/test')

from pathlib import Path
files = Path('~/test').expanduser().iterdir()
for f in files:
    print(f)

"test_data" include:
"data1": 1
"data2": 1
"unnamed": 2
/Users/liuchang/test/test_data-2018-03-01_15-26-01_575440.pkl.z
/Users/liuchang/test/test_data-2018-03-01_15-23-13_137844.pkl.z
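
The dumped file names listed above embed a zero-padded timestamp, so the most recent dump can be found by sorting the names as plain strings. Here is a small sketch of that, using empty placeholder files in a temporary directory instead of real dumps; the glob pattern is an assumption based only on the two file names shown.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    dump_dir = Path(tmp)

    # Placeholder files named like the dumps listed above.
    for name in (
        "test_data-2018-03-01_15-26-01_575440.pkl.z",
        "test_data-2018-03-01_15-23-13_137844.pkl.z",
    ):
        (dump_dir / name).touch()

    # Zero-padded timestamps sort chronologically as strings.
    dumps = sorted(p.name for p in dump_dir.glob("test_data-*.pkl.z"))
    latest = dumps[-1]
    print(latest)
```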
