# Development notes: reading and writing LAMMPS data files
`2025Sep24`: current ASE does not store `atoms.arrays['type']` explicitly, so reading and writing LAMMPS data/dump files cannot recover the atom types.

Development goal: read and write LAMMPS data/dump files while preserving atom types.

## Read DUMP file
When reading a DUMP file, ASE uses the `mass`/`type` columns to determine the chemical symbols.
- With [MR 3529](https://gitlab.com/ase/ase/-/merge_requests/3529/diffs), ASE can determine the chemical symbols from the `mass` column, so `atoms.arrays['type']` can be saved as well.
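
The idea of inferring symbols from the `mass` column can be sketched as a nearest-mass lookup. This is only an illustration, not the implementation in the merge request: `REFERENCE_MASSES`, `symbol_from_mass`, the tolerance, and the sample masses are all hypothetical (ASE ships a full mass table as `ase.data.atomic_masses`).

```python
# Illustrative subset of standard atomic masses in amu (assumption: a real
# implementation would use a full table such as `ase.data.atomic_masses`).
REFERENCE_MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def symbol_from_mass(mass, tolerance=0.1):
    """Return the symbol whose reference mass is closest to `mass`.

    Raises ValueError if no reference mass lies within `tolerance` amu.
    """
    symbol, ref = min(REFERENCE_MASSES.items(), key=lambda kv: abs(kv[1] - mass))
    if abs(ref - mass) > tolerance:
        raise ValueError(f"no element within {tolerance} amu of mass {mass}")
    return symbol


# Hypothetical per-atom values, as they might appear in a dump `mass` column
dump_masses = [1.008, 1.008, 1.008, 1.008, 12.011]
symbols = [symbol_from_mass(m) for m in dump_masses]
print(symbols)
```

Because the symbols come from the masses alone, the `type` column is left free to be stored unchanged.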

In [None]:
from ase.io.lammpsrun import read_lammps_dump

dump_file = "testfile/ch4.dump"
struct = read_lammps_dump(dump_file)

In [10]:
struct.calc.results

{'energy': 0.0,
 'forces': array([[-0.144013  , -0.264858  , -0.0293463 ],
        [ 0.115898  ,  0.0821619 , -0.0790613 ],
        [-0.158412  , -0.109536  , -0.0274297 ],
        [ 0.190668  ,  0.027508  , -0.157418  ],
        [-0.00414137,  0.264724  ,  0.293255  ]])}

### Read atom `type` from a DUMP file
Using `asext.io.lmpdata.read_lammps_dump_text()`
- This function is a modified version of `ase.io.lammpsrun.read_lammps_dump_text` that stores the atom types when a `type` column is present.

NOTE: ASE does not support writing the `lammps-dump-text` format. [See format](https://ase-lib.org/ase/io/io.html#)

In [None]:
from asext.io.lmpdata import read_lammps_dump_text
import numpy as np

dump_file = "testfile/ch4.dump"
struct2 = read_lammps_dump_text(dump_file)
struct2.arrays

{'numbers': array([1, 1, 1, 1, 6]),
 'positions': array([[4.65255, 3.49679, 4.42777],
        [3.92359, 4.7445 , 4.96772],
        [5.10644, 5.63801, 4.43427],
        [5.71649, 4.38867, 3.76125],
        [4.84659, 4.569  , 4.39505]]),
 'type': array([1, 1, 3, 1, 2])}

In [17]:
struct2.get_chemical_symbols()

['H', 'H', 'H', 'H', 'C']

In [4]:
# get the sorted unique types and the index of each type's first occurrence
unique_types, first_idx = np.unique(struct2.arrays["type"], return_index=True)
# map them to symbols
symbols_by_type = [struct2.symbols[i] for i in first_idx]

print(unique_types)  # [1 2 3]
print(symbols_by_type)  # ['H', 'C', 'H']

[1 2 3]
['H', 'C', 'H']
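
As a variation on the lookup above, a plain dictionary makes the type-to-symbol relation explicit and can also check that no type is assigned to two different elements. This is a minimal sketch using the values printed above; `map_types_to_symbols` is a hypothetical helper, not part of ASE or `asext`.

```python
def map_types_to_symbols(types, symbols):
    """Build {type: symbol}, raising if one type maps to two symbols."""
    mapping = {}
    for atom_type, symbol in zip(types, symbols):
        atom_type = int(atom_type)
        # setdefault stores the first symbol seen for a type and returns it
        if mapping.setdefault(atom_type, symbol) != symbol:
            raise ValueError(
                f"type {atom_type} maps to both {mapping[atom_type]} and {symbol}"
            )
    return mapping


# Values copied from the `struct2` output shown earlier
types = [1, 1, 3, 1, 2]
symbols = ["H", "H", "H", "H", "C"]

type_to_symbol = map_types_to_symbols(types, symbols)
print(dict(sorted(type_to_symbol.items())))
```

Two types sharing one symbol (here types 1 and 3 are both `H`) is allowed; only one type pointing at two symbols is treated as an error.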


In [3]:
### Write extxyz file
from ase.io import write

write("testfile/ch4_from_dump.extxyz", struct2, format="extxyz")

## Write atom `type` to a DATA file
The ASE function `write_lammps_data` does not take the atom types from `atoms.arrays['type']`.

In [4]:
### ASE function
from ase.io.lammpsdata import write_lammps_data

write_lammps_data("testfile/ch4_ase_write.data", struct2, masses=True)
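
To see why the original types are lost, the sketch below mimics how ASE appears to assign types when writing a data file: by the position of each symbol in the alphabetically sorted species list, or in `specorder` if that argument is given. This describes my reading of ASE's behavior rather than its actual code, and `types_from_symbols` is a hypothetical helper.

```python
def types_from_symbols(symbols, specorder=None):
    """Assign 1-based types from symbols alone (assumed ASE-like behavior)."""
    # Without an explicit order, species are sorted alphabetically
    species = sorted(set(symbols)) if specorder is None else list(specorder)
    return [species.index(s) + 1 for s in symbols]


# Values copied from the `struct2` output shown earlier
symbols = ["H", "H", "H", "H", "C"]
original_types = [1, 1, 3, 1, 2]

rewritten_types = types_from_symbols(symbols)
print(rewritten_types)
print(rewritten_types == original_types)
```

Since the assignment depends only on the symbols, two types that share a symbol (types 1 and 3 here) collapse into one, and no choice of `specorder` can restore them.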

In [None]:
### `asext` function
from asext.io.lmpdata import write_lammmps_data

write_lammmps_data("testfile/ch4_thang_write.data", struct2, masses=True)
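
One way to check whether a written data file kept the types is to read the `type` column back from its `Atoms` section. The parser below is a rough sketch that assumes `atom_style atomic` (columns `id type x y z`); `types_from_lammps_data` is a hypothetical helper, and the sample text is hand-written for illustration, not real output from either writer.

```python
def types_from_lammps_data(text):
    """Return atom types ordered by atom id from the `Atoms` section.

    Assumes atom_style atomic, i.e. each atom line is `id type x y z`.
    """
    types_by_id = {}
    in_atoms = False
    for line in text.splitlines():
        stripped = line.split("#")[0].strip()  # drop trailing comments
        if not in_atoms:
            in_atoms = stripped == "Atoms"
            continue
        if not stripped:
            if types_by_id:
                break  # blank line after the atom lines ends the section
            continue  # blank line right after the section header
        fields = stripped.split()
        if not fields[0].isdigit():
            break  # reached the next section header
        types_by_id[int(fields[0])] = int(fields[1])
    return [types_by_id[i] for i in sorted(types_by_id)]


# Hand-written sample in the assumed layout (not actual writer output)
sample = """\
sample data file

5 atoms
3 atom types

Atoms # atomic

1 1 4.65255 3.49679 4.42777
2 1 3.92359 4.74450 4.96772
3 3 5.10644 5.63801 4.43427
4 1 5.71649 4.38867 3.76125
5 2 4.84659 4.56900 4.39505
"""

print(types_from_lammps_data(sample))
```

For the files written above, the text could be loaded with `pathlib.Path("testfile/ch4_thang_write.data").read_text()` and the result compared against `struct2.arrays["type"]`.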