# Quickstart

## `Atoms` Data
NFFLr makes it easy to load data and transform it into various formats.

The primary ways of interacting with data are `Atoms` and `AtomsDataset`; the latter is a [PyTorch `Dataset`](https://pytorch.org/docs/stable/data.html) that returns `Atoms` instances.
The most convenient way to get started is with a [named JARVIS dataset](https://jarvis-tools.readthedocs.io/en/master/databases.html):

```python
from nfflr.data.dataset import AtomsDataset
dataset = AtomsDataset("dft_3d", target="formation_energy_peratom")
```

```
dataset_name='dft_3d'
Obtaining 3D dataset 76k ...
Reference:https://www.nature.com/articles/s41524-020-00440-1
Other versions:https://doi.org/10.6084/m9.figshare.6815699
Loading the zipfile...
Loading completed.
```

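Conceptually, `AtomsDataset` follows PyTorch's map-style dataset protocol: an object with `__len__` and `__getitem__` that returns one sample per index. The sketch below is a hypothetical, dependency-free stand-in (the `ToyAtomsDataset` class and its record layout are illustrative assumptions, not NFFLr's implementation) that mimics the `(structure, target)` indexing pattern and a swappable `target` attribute, using two rows of `dft_3d` values:

```python
from typing import Any, Dict, List, Tuple


class ToyAtomsDataset:
    """Hypothetical map-style dataset: one record (dict) per structure.

    Not NFFLr's AtomsDataset; it only mimics the (structure, target)
    indexing pattern. Any object with __len__ and __getitem__ can be
    consumed by torch.utils.data.DataLoader as a map-style dataset.
    """

    def __init__(self, records: List[Dict[str, Any]], target: str):
        self.records = records
        self.target = target  # name of the column returned as the label

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> Tuple[str, Any]:
        row = self.records[idx]
        # A formula string stands in for the Atoms object here.
        return row["formula"], row[self.target]


records = [
    {"jid": "JVASP-90856", "formula": "TiCuSiAs",
     "formation_energy_peratom": -0.42762, "optb88vdw_bandgap": 0.0},
    {"jid": "JVASP-98225", "formula": "KBi",
     "formation_energy_peratom": -0.4414, "optb88vdw_bandgap": 0.472},
]

ds = ToyAtomsDataset(records, target="formation_energy_peratom")
print(ds[0])  # ('TiCuSiAs', -0.42762)

ds.target = "optb88vdw_bandgap"  # switch the label column in place
print(ds[1])  # ('KBi', 0.472)
```

Keeping the label choice in a single attribute is what makes switching targets a one-line change in this toy version.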

Indexing the dataset returns a tuple of an `Atoms` instance and the value of the selected target (here, `target="formation_energy_peratom"`):

```python
atoms, target = dataset[0]
print(f"{atoms.lattice=}")
print(f"{atoms.positions=}")
print(f"{atoms.numbers=}")
print(f"{target=}")
```

```
atoms.lattice=tensor([[3.5669, 0.0000, -0.0000],
        [0.0000, 3.5669, -0.0000],
        [-0.0000, -0.0000, 9.3971]])
atoms.positions=tensor([[0.7500, 0.7500, 0.7849],
        [0.2500, 0.2500, 0.2151],
        [0.2500, 0.7500, 0.5000],
        [0.7500, 0.2500, 0.5000],
        [0.2500, 0.7500, 0.0000],
        [0.7500, 0.2500, 0.0000],
        [0.7500, 0.7500, 0.3075],
        [0.2500, 0.2500, 0.6925]])
atoms.numbers=tensor([22, 22, 29, 29, 14, 14, 33, 33], dtype=torch.int32)
target=tensor(-0.4276)
```

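The printed `positions` all lie in [0, 1), which suggests they are fractional (crystal) coordinates. Assuming that convention, and assuming the rows of `lattice` are the lattice vectors, Cartesian positions are the matrix product of the fractional coordinates with the lattice; for the tensors above that would be `atoms.positions @ atoms.lattice`. A plain-Python sketch using the lattice and first two sites from the output (the `frac_to_cart` helper is hypothetical):

```python
from typing import List

Matrix = List[List[float]]


def frac_to_cart(frac: Matrix, lattice: Matrix) -> Matrix:
    """Cartesian coordinates r = f @ L, assuming the rows of L are lattice vectors."""
    return [
        [sum(f[k] * lattice[k][j] for k in range(3)) for j in range(3)]
        for f in frac
    ]


# Lattice and first two fractional sites copied from the output above.
lattice = [
    [3.5669, 0.0, 0.0],
    [0.0, 3.5669, 0.0],
    [0.0, 0.0, 9.3971],
]
frac = [
    [0.7500, 0.7500, 0.7849],
    [0.2500, 0.2500, 0.2151],
]

for row in frac_to_cart(frac, lattice):
    print([round(x, 4) for x in row])
```

Because this lattice is orthorhombic, each Cartesian component is simply the fractional coordinate scaled by the matching cell length; the general matrix product also covers skewed cells.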

Internally, `AtomsDataset` stores the dataset in a [pandas DataFrame](https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe), so any column of the JARVIS dataset is a valid `target`.
For example, `dft_3d` contains a large number of target properties, including some non-scalar quantities:

```python
# A list of column labels is the idiomatic way to select several columns with .loc
selected_cols = ["jid", "formula", "formation_energy_peratom", "optb88vdw_bandgap", "elastic_tensor"]
dataset.df.loc[:, selected_cols].head()
```

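|   | jid | formula | formation_energy_peratom | optb88vdw_bandgap | elastic_tensor |
|---|-----|---------|--------------------------|-------------------|----------------|
| 0 | JVASP-90856 | TiCuSiAs | -0.42762 | 0.0 | na |
| 1 | JVASP-86097 | DyB6 | -0.41596 | 0.0 | na |
| 2 | JVASP-64906 | Be2OsRu | 0.04847 | 0.0 | na |
| 3 | JVASP-98225 | KBi | -0.4414 | 0.472 | na |
| 4 | JVASP-10 | VSe2 | -0.71026 | 0.0 | `[[136.4, 27.8, 17.5, 0.0, -5.5, 0.0], [27.8, 1...` |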


We can change the target column after the dataset is constructed, but missing values (such as the `na` entries above) currently need to be handled manually.

```python
dataset.target = "elastic_tensor"
atoms, elastic_tensor = dataset[4]
elastic_tensor
```

```
tensor([[136.4000,  27.8000,  17.5000,   0.0000,  -5.5000,   0.0000],
        [ 27.8000, 136.4000,  17.5000,   0.0000,   5.5000,   0.0000],
        [ 17.5000,  17.5000,  40.7000,   0.0000,   0.0000,   0.0000],
        [  0.0000,   0.0000,   0.0000,  54.3000,   0.0000,  -5.5000],
        [ -5.5000,   5.5000,   0.0000,   0.0000,  13.7000,   0.0000],
        [  0.0000,   0.0000,   0.0000,  -5.5000,   0.0000,  13.7000]])
```
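
Since missing entries show up as the string `na` in the dataframe (see the table above), one manual approach is to keep only the rows whose target is present before training. The sketch below is a hypothetical plain-Python filter over records shaped like the table rows (the `drop_missing` helper and the placeholder tensor value are illustrative assumptions). On `dataset.df` itself, the analogous pandas selection is a boolean mask such as `dataset.df[dataset.df["elastic_tensor"] != "na"]`; this quickstart does not say whether `AtomsDataset` accepts a filtered dataframe in place of its own.

```python
from typing import Any, Dict, List

MISSING = "na"  # marker used for missing values in the table above


def drop_missing(records: List[Dict[str, Any]], target: str) -> List[Dict[str, Any]]:
    """Keep only records whose `target` value is present (not the 'na' marker)."""
    return [r for r in records if r.get(target, MISSING) != MISSING]


# Abbreviated rows from the table above; the placeholder string stands in
# for the 6x6 elastic tensor of JVASP-10 shown in the output above.
records = [
    {"jid": "JVASP-90856", "elastic_tensor": "na"},
    {"jid": "JVASP-86097", "elastic_tensor": "na"},
    {"jid": "JVASP-10", "elastic_tensor": "<6x6 tensor>"},
]

kept = drop_missing(records, "elastic_tensor")
print([r["jid"] for r in kept])  # ['JVASP-10']
```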

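The `elastic_tensor` target is a symmetric 6x6 stiffness matrix in Voigt notation (units assumed to be GPa here). As a worked example of consuming this non-scalar target, the Voigt estimates of the bulk and shear moduli are `K_V = [(C11 + C22 + C33) + 2(C12 + C13 + C23)] / 9` and `G_V = [(C11 + C22 + C33) - (C12 + C13 + C23) + 3(C44 + C55 + C66)] / 15`. A plain-Python sketch applying them to the tensor printed above (the `voigt_moduli` helper is hypothetical):

```python
from typing import List, Tuple

Matrix = List[List[float]]

# 6x6 elastic tensor for JVASP-10 (VSe2), copied from the output above
# (Voigt notation; units assumed to be GPa).
C: Matrix = [
    [136.4, 27.8, 17.5, 0.0, -5.5, 0.0],
    [27.8, 136.4, 17.5, 0.0, 5.5, 0.0],
    [17.5, 17.5, 40.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 54.3, 0.0, -5.5],
    [-5.5, 5.5, 0.0, 0.0, 13.7, 0.0],
    [0.0, 0.0, 0.0, -5.5, 0.0, 13.7],
]


def voigt_moduli(c: Matrix) -> Tuple[float, float]:
    """Voigt-average bulk (K_V) and shear (G_V) moduli from a 6x6 stiffness matrix."""
    a = c[0][0] + c[1][1] + c[2][2]  # C11 + C22 + C33
    b = c[0][1] + c[0][2] + c[1][2]  # C12 + C13 + C23
    s = c[3][3] + c[4][4] + c[5][5]  # C44 + C55 + C66
    k_v = (a + 2 * b) / 9
    g_v = (a - b + 3 * s) / 15
    return k_v, g_v


# A stiffness matrix should be symmetric; check before using it.
assert all(C[i][j] == C[j][i] for i in range(6) for j in range(6))

k_v, g_v = voigt_moduli(C)
print(f"K_V = {k_v:.2f}, G_V = {g_v:.2f}")  # K_V = 48.79, G_V = 33.05
```
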
## Force field datasets

Force field datasets such as `mlearn`, `alignn_ff_db`, and `m3gnet` support a special target key, `target="energy_and_forces"`, which configures `AtomsDataset` to return a dictionary of target values: the total energy of the atomic configuration, the atomic forces, and the stresses if they are available.

```python
dataset = AtomsDataset("mlearn", target="energy_and_forces")
atoms, target = dataset[0]
target
```

```
dataset_name='mlearn'
Obtaining mlearn dataset 1730...
Reference:https://github.com/materialsvirtuallab/mlearn
Loading the zipfile...
Loading completed.
```


```
{'energy': tensor(-64656.0625),
 'forces': tensor([[-1.9282e-01, -1.8793e+00, -6.6374e-01],
         [-8.2543e-03, -2.0313e-01,  3.6808e-01],
         [-5.5372e-01, -1.4736e+00,  1.2997e+00],
         [ 4.5678e-01,  5.1175e-01, -1.0934e+00],
         [-1.6499e+00, -1.6259e+00,  4.5255e-01],
         [-1.6698e-01,  6.8080e-01,  6.7749e-01],
         [ 3.6802e-02, -3.1423e+00, -2.0166e+00],
         [-1.0730e-01, -3.5780e-01,  1.1357e+00],
         [-1.9132e-01,  5.1381e-01,  3.4296e-01],
         [ 2.0090e+00,  1.5143e+00, -3.5578e-01],
         [-1.7128e-01, -2.7808e+00, -1.4215e+00],
         [-9.3987e-01, -1.6757e-02,  7.9322e-01],
         [ 3.7190e-01, -9.0627e-01, -5.2933e-01],
         [ 5.6458e-01, -9.6833e-01, -7.0043e-01],
         [-4.5756e-01, -6.5868e-02, -3.7038e-01],
         [-1.2044e+00,  6.3979e-01,  7.5036e-01],
         [-1.5743e+00,  6.4479e-02, -6.7272e-01],
         [-9.8223e-01, -9.5903e-02, -8.7198e-01],
         [ 4.9518e-01, -2.7982e-01, -4.6208e-01],
```

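The returned dictionary can be consumed like any other Python mapping. As a hypothetical example, the sketch below computes per-atom force magnitudes and the largest one, a common summary for force field data. It uses plain lists holding the first three force rows from the output above in place of the `forces` tensor; with the real tensor, `torch.linalg.vector_norm(target["forces"], dim=1)` would give the per-atom norms directly. The `force_magnitudes` helper is illustrative, not part of NFFLr.

```python
import math
from typing import Any, Dict, List

# Mirrors the layout of the dictionary above, with plain floats and lists
# instead of tensors; only the first three force rows (x, y, z per atom)
# are included.
target: Dict[str, Any] = {
    "energy": -64656.0625,
    "forces": [
        [-1.9282e-01, -1.8793e00, -6.6374e-01],
        [-8.2543e-03, -2.0313e-01, 3.6808e-01],
        [-5.5372e-01, -1.4736e00, 1.2997e00],
    ],
}


def force_magnitudes(forces: List[List[float]]) -> List[float]:
    """Euclidean norm of each atom's force vector."""
    return [math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces]


mags = force_magnitudes(target["forces"])
print([round(m, 3) for m in mags])
print(f"max |F| = {max(mags):.3f} (atom {mags.index(max(mags))})")
```
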
```python
from nfflr.data.graph import periodic_adaptive_radius_graph
```

```python
d[0]
```

(Atoms(lattice=tensor([[10.5241,  0.0000,  0.0000],
         [ 0.0000, 10.5241,  0.0000],
         [ 0.0000,  0.0000, 10.5241]]), positions=tensor([[8.8610e-03, 1.2966e-02, 3.4177e-01],
         [1.1860e-03, 9.9954e-01, 6.5278e-01],
         [1.5720e-03, 3.4240e-01, 9.8957e-01],
         [9.9629e-01, 3.3764e-01, 3.4630e-01],
         [9.3410e-03, 3.4059e-01, 6.6076e-01],
         [9.9993e-01, 6.5754e-01, 1.9600e-04],
         [4.7140e-03, 6.9225e-01, 3.4508e-01],
         [9.8893e-01, 6.6955e-01, 6.6091e-01],
         [3.3666e-01, 9.9502e-01, 1.4700e-04],
         [3.1948e-01, 9.9440e-01, 3.3301e-01],
         [3.4375e-01, 2.9278e-02, 6.6325e-01],
         [3.3291e-01, 3.2844e-01, 9.8932e-01],
         [3.2360e-01, 3.5171e-01, 3.3943e-01],
         [3.2243e-01, 3.3939e-01, 6.6961e-01],
         [3.3457e-01, 6.7072e-01, 9.9313e-01],
         [3.4607e-01, 6.5817e-01, 3.3440e-01],
         [3.4433e-01, 6.6961e-01, 6.7335e-01],
         [6.8091e-01, 9.9228e-01, 2.0310e-03],
         [6.693