Data access

Dataset is an abstraction of the local file system. Users register local paths with it to access the data files they contain. The basic idea is to treat each data file as a property of a Dataset object. This section shows how to interact with data through this system.

Dataset

Assume that you have the data files data1.csv, data1.pd.xz, and data1.pkl.z under the directory /set1, and data2.csv, data2.pd.xz, and data2.pkl.z under the directory /set2.

The following code creates a Dataset object that contains all available files under /set1 and /set2.

>>> from xenonpy.datatools import Dataset
>>> dataset = Dataset('/set1', '/set2')
>>> dataset
<Dataset> includes:
"data1": /set1/data1.pd.xz
"data2": /set2/data2.pd.xz

Now you can retrieve data by name:

>>> dataset.dataframe.data1

Here, dataset looked for a file named data1.pd.xz under /set1 and /set2. In this case, /set1/data1.pd.xz was loaded.

Note that the dataframe property is accessed before data1. This tells dataset that the file holds a pandas.DataFrame object and should be loaded with the pd.read_pickle function.

Currently, four loaders are available out of the box. The built-in loaders are summarised below.
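The lookup by name described above can be pictured with a small stand-alone sketch. It is not XenonPy's implementation: index_files is a hypothetical helper, and the rule that the first directory wins when two directories hold the same name is an assumption made only for illustration. The sketch uses throwaway directories in place of /set1 and /set2 and resolves names to paths without loading anything.

```python
import tempfile
from pathlib import Path


def index_files(*dirs, suffix=".pd.xz"):
    """Map each data name to the path of its file with the given suffix.

    Hypothetical helper. If a name occurs in several directories,
    the first directory listed wins (an assumption of this sketch).
    """
    index = {}
    for d in dirs:
        for path in sorted(Path(d).iterdir()):
            if path.is_file() and path.name.endswith(suffix):
                name = path.name[: -len(suffix)]
                index.setdefault(name, path)
    return index


# Build two throwaway directories that mirror /set1 and /set2.
root = Path(tempfile.mkdtemp())
for folder, stem in (("set1", "data1"), ("set2", "data2")):
    (root / folder).mkdir()
    for ext in (".csv", ".pd.xz", ".pkl.z"):
        (root / folder / (stem + ext)).touch()

index = index_files(root / "set1", root / "set2")
print(sorted(index))         # ['data1', 'data2']
print(index["data1"].name)   # data1.pd.xz
```

Passing suffix=".csv" to the same helper would resolve data1 to data1.csv instead, which is the role the loader property plays in the real Dataset.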

built-in loaders

file extension	loader	description
`pd(.*)`	pd.read_pickle	pandas.DataFrame object file
`csv`	pd.read_csv	csv file
`xlsx|xls`	pd.read_excel	excel file
`pkl(.*)`	joblib.load	common pickled files
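To make the table concrete, the sketch below matches file names against the four extension patterns and reports the loader function listed for each. The helper pick_loader and the rule that the extension is everything after the first dot are assumptions of this sketch, not a description of how XenonPy matches files internally.

```python
import re

# Extension patterns copied from the table above, paired with the
# name of the function each loader calls.
LOADER_PATTERNS = [
    (r"pd(.*)", "pd.read_pickle"),
    (r"csv", "pd.read_csv"),
    (r"xlsx|xls", "pd.read_excel"),
    (r"pkl(.*)", "joblib.load"),
]


def pick_loader(filename):
    """Return the loader name whose pattern matches the file extension.

    The extension is taken to be everything after the first dot,
    which is an assumption of this sketch.
    """
    _, _, extension = filename.partition(".")
    for pattern, loader in LOADER_PATTERNS:
        if re.fullmatch(pattern, extension):
            return loader
    return None


for name in ("data1.pd.xz", "data1.csv", "data1.pkl.z", "book.xls"):
    print(name, "->", pick_loader(name))
```

The trailing `(.*)` in the pd and pkl patterns is what lets compressed variants such as .pd.xz and .pkl.z match the same loader as their uncompressed forms.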

The default loader is dataframe, so you can omit dataframe when loading a pandas.DataFrame object file. The following code does exactly the same as the example above:

>>> dataset.data1

You can also specify the default loader by setting the backend parameter:

>>> dataset = Dataset('/set1', '/set2', backend='csv')
>>> dataset.data1  # this will load '/set1/data1.csv'
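The interplay between an explicit loader property and the default backend can be sketched with Python's `__getattr__` hook. MiniDataset below is a hypothetical toy class, not XenonPy's Dataset: it only returns file paths instead of loading data, and the paths in the dictionary are the example paths from this section.

```python
from types import SimpleNamespace


class MiniDataset:
    """Toy stand-in for Dataset: file paths grouped by loader name."""

    def __init__(self, files, backend="dataframe"):
        # files has the shape {loader name: {data name: path}}
        self._files = files
        self._backend = backend

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._files:
            # Explicit loader, e.g. ds.csv.data1
            return SimpleNamespace(**self._files[name])
        try:
            # Bare name, e.g. ds.data1: fall back to the default backend.
            return self._files[self._backend][name]
        except KeyError:
            raise AttributeError(name) from None


files = {
    "dataframe": {"data1": "/set1/data1.pd.xz"},
    "csv": {"data1": "/set1/data1.csv"},
}

default = MiniDataset(files)
print(default.data1)      # /set1/data1.pd.xz
print(default.csv.data1)  # /set1/data1.csv

csv_first = MiniDataset(files, backend="csv")
print(csv_first.data1)    # /set1/data1.csv
```

Changing backend only changes which group a bare name falls back to; the explicit form with a loader property keeps working either way.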

preset

XenonPy also uses this system to provide some built-in data. Currently, two sets of element-level property data are available out of the box: elements and elements_completed (an imputed version of elements). These data were collected from mendeleev, pymatgen, the CRC Handbook, and Magpie. For details on elements_completed, see features:Data access.

Use the following code to load elements and elements_completed.

>>> from xenonpy.datatools import preset
>>> preset.elements
>>> preset.elements_completed

If you get a file-does-not-exist error, run the following code to sync your local dataset.

>>> from xenonpy.datatools import preset
>>> preset.sync('elements')
>>> preset.sync('elements_completed')

There are also some advanced uses of Dataset and preset. For more details, see tutorials/1-dataset:Advance.

See also the Jupyter notebook at:

https://github.com/yoshida-lab/XenonPy/tree/master/samples/dataset_and_preset.ipynb

Storage

For implementation details, see our sample code:

https://github.com/yoshida-lab/XenonPy/tree/master/samples/storage.ipynb

Advance

Coming soon!
