# Scientific data formats

In this tutorial, we will learn to open datasets that are stored in different scientific data formats:

- Binary format
- NetCDF
- HDF5
- VTK
- FITS

Please download the files from here:

https://drive.google.com/file/d/1q1VQUgiXCOcUcT0EnVp0nMbt-gQqRQVM/view?usp=drive_link


# 1. Binary data format

Binary data refers to raw numbers written directly to disk.

- It can be written in single precision (4 bytes = 32 bits) or double precision (8 bytes = 64 bits).

- This means that no headers, no metadata, and no information on the structure of the file are stored with the data.

- Therefore, we need to know beforehand how the data are structured (array shape, precision and byte order); otherwise we can only guess.

- Sometimes guessing is possible, because we can infer the shape of the arrays from the size of the file.

- In this case I generated the file beta_temp.dbl, so I know it contains a 2D array of 160 x 160 grid cells in double precision.
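The reading step can be sketched with NumPy. This is a minimal sketch, assuming the file holds a single array of native-endian (usually little-endian) 64-bit floats stored in C (row-major) order; the helper names `guess_square_shape` and `read_binary` are hypothetical. A 160 x 160 double-precision array occupies 160 x 160 x 8 = 204,800 bytes, which is how the shape can be recovered from the file size alone.

```python
import math
import os

import numpy as np


def guess_square_shape(path, itemsize=8):
    """Guess the (n, n) shape of a file holding one square 2D array.

    Assumes the file contains nothing but values of `itemsize` bytes each
    (8 for double precision, 4 for single precision).
    """
    size = os.path.getsize(path)
    n_values, remainder = divmod(size, itemsize)
    if remainder:
        raise ValueError(f"{size} bytes is not a multiple of {itemsize}")
    n = math.isqrt(n_values)
    if n * n != n_values:
        raise ValueError(f"{n_values} values do not form a square array")
    return n, n


def read_binary(path, shape, dtype=np.float64):
    """Read raw values and reshape them; there is no header to skip."""
    data = np.fromfile(path, dtype=dtype)
    expected = math.prod(shape)
    if data.size != expected:
        raise ValueError(f"expected {expected} values, found {data.size}")
    return data.reshape(shape)


# Usage with the tutorial file (assumed to be in the working directory):
# shape = guess_square_shape("beta_temp.dbl")   # a 204,800-byte file gives (160, 160)
# beta_temp = read_binary("beta_temp.dbl", shape)
```

If the file was written in column-major order (e.g. from Fortran), `data.reshape(shape, order="F")` or a transpose is needed, and big-endian data can be read with `dtype=">f8"`.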


# 2. NetCDF

This is the favourite format of Earth scientists, particularly climate/weather researchers.

- This is a self-describing data format. This means it does not need a separate descriptor file to be read: all the metadata needed to understand the file's structure and content are embedded in the file itself.


- It contains three sections:


    - DIMENSIONS, which indicate how the data are organised in the file (the names and lengths of the array axes).

    - VARIABLES, which contain both the data values and their metadata (attributes such as units); each variable can have a different shape (e.g. 3D or 2D).

    - GENERAL INFORMATION (the global attributes), which describes the file as a whole, with properties such as the library version, date, and copyright information.


Let us see the metadata of this file:

# 3. Hierarchical Data Format v5 (HDF5)

This is also a self-describing format, which is widely used in many physics sub-fields because of its versatility.

- It can store very large datasets, and parts of a dataset can be read without loading the whole file into memory.


- Data don't have to be of the same type: you can save numbers, units, strings, images, etc. all in the same file.


- It is organised like a file system, with GROUPS playing the role of directories.


- GROUPS contain DATASETS (the data arrays), and both groups and datasets can carry their own metadata, called ATTRIBUTES.
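A minimal sketch with the `h5py` package, assuming it is installed. It writes a synthetic file with two groups, lists every group and dataset, reads a single slice and retrieves an attribute; the group, dataset and attribute names are hypothetical.

```python
import h5py
import numpy as np


def write_example_hdf5(path):
    """Write a small synthetic HDF5 file (hypothetical names and values)."""
    with h5py.File(path, "w") as f:
        f.attrs["description"] = "synthetic example"   # file-level attribute
        fluid = f.create_group("fluid")                 # acts like a directory
        rho = fluid.create_dataset("rho", data=np.ones((4, 4, 4)))
        rho.attrs["units"] = "g/cm^3"                   # dataset metadata
        grid = f.create_group("grid")
        grid.create_dataset("x", data=np.linspace(0.0, 1.0, 4))


def list_contents(path):
    """Return (path, kind) for every group and dataset in the file."""
    entries = []

    def visitor(name, obj):
        kind = "group" if isinstance(obj, h5py.Group) else "dataset"
        entries.append((name, kind))   # returning None keeps the walk going

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return entries


def read_slice(path, dataset, index):
    """Read only the requested part of a dataset, not the whole array."""
    with h5py.File(path, "r") as f:
        return f[dataset][index]


def read_attribute(path, obj_path, key):
    """Read one attribute; older h5py versions may return bytes for strings."""
    with h5py.File(path, "r") as f:
        value = f[obj_path].attrs[key]
    return value.decode() if isinstance(value, bytes) else value


# Usage: first 2D plane of the 3D dataset
# plane = read_slice("example.h5", "fluid/rho", np.s_[0, :, :])
```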

# 4. Visualization ToolKit (VTK) format

https://vtk.org/

VTK is an open-source visualisation toolkit developed by Kitware, and its file format is widely used in computational fluid dynamics and computer graphics applications. There are two sub-formats: legacy and XML.

### VTK file structure (legacy format)

1. File version and identifier.

2. Header: a single line of comments with information on the dataset.

3. File format, which can be ASCII or BINARY.

4. Dataset structure, i.e. mesh/grid information:

    - DATASET type, e.g.
        - STRUCTURED_GRID
        - UNSTRUCTURED_GRID
        - RECTILINEAR_GRID
        - POLYDATA
        - FIELD

    - Coordinates, dimensions, grid spacing.

5. Dataset attributes:

    - Values stored at grid points or cells, e.g. scalars, vectors, tensors.

Image taken from: http://victorsndvg.github.io/FEconv/formats/vtk.xhtml

![](https://drive.google.com/uc?id=1SZR76Q19ixrM5olFiYhw-w17oaAccqQ3)
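To make the five parts of a legacy file concrete, here is a minimal sketch in plain Python that builds the text of an ASCII legacy VTK file for a RECTILINEAR_GRID with cell-centred scalars. The function name is hypothetical, only a small subset of the legacy format is covered (no BINARY output, no vectors or tensors), and `3.0` is just one of several version numbers found in practice.

```python
def legacy_vtk_rectilinear(x, y, z, cell_scalars, title="Example dataset"):
    """Build the text of an ASCII legacy VTK file for a rectilinear grid.

    x, y, z are the point coordinates along each axis; cell_scalars maps an
    array name to a flat list of cell values ordered with x varying fastest.
    """
    if "\n" in title or len(title) > 256:
        raise ValueError("the header must be a single line of at most 256 characters")
    nx, ny, nz = len(x), len(y), len(z)
    n_cells = (nx - 1) * (ny - 1) * (nz - 1)

    def fmt(values):
        return " ".join(str(v) for v in values)

    lines = [
        "# vtk DataFile Version 3.0",   # 1. file version and identifier
        title,                          # 2. header: one line of free text
        "ASCII",                        # 3. file format: ASCII or BINARY
        "DATASET RECTILINEAR_GRID",     # 4. dataset structure ...
        f"DIMENSIONS {nx} {ny} {nz}",   #    ... number of points per axis
        f"X_COORDINATES {nx} float", fmt(x),
        f"Y_COORDINATES {ny} float", fmt(y),
        f"Z_COORDINATES {nz} float", fmt(z),
        f"CELL_DATA {n_cells}",         # 5. dataset attributes (per cell)
    ]
    for name, values in cell_scalars.items():
        if len(values) != n_cells:
            raise ValueError(f"{name}: expected {n_cells} values, got {len(values)}")
        lines += [f"SCALARS {name} float 1", "LOOKUP_TABLE default", fmt(values)]
    return "\n".join(lines) + "\n"


# Usage: a 2 x 1 x 1 cell grid with one scalar array
# print(legacy_vtk_rectilinear([0, 1, 2], [0, 1], [0, 1], {"rho": [1.0, 2.0]}))
```

Note that DIMENSIONS counts grid points, so a grid with N points along an axis has N - 1 cells along it.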

We define the input directory and file name:

As we can see, the data consist of a 3D grid of 100 x 100 x 100 cells/pixels (101 grid points along each axis), i.e. one million grid cells/pixels in total.

The data are organised in 6 arrays:

    Number Of Arrays: 6
    Array 0 name = rho
    Array 1 name = vx1
    Array 2 name = vx2
    Array 3 name = vx3
    Array 4 name = prs
    Array 5 name = tr1

    Bounds:
      Xmin,Xmax: (-5, 15)
      Ymin,Ymax: (-10, 10)
      Zmin,Zmax: (-10, 10)
    Compute Time: 0
    Dimensions: (101, 101, 101)
    X Coordinates: 0x7fbefd475a70
    Y Coordinates: 0x7fbefd476cf0
    Z Coordinates: 0x7fbefd476fc0
    Extent: 0, 100, 0, 100, 0, 100

We can now read these arrays with:

### 3D array manipulation
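Once a field is a NumPy array of shape `(nx, ny, nz)`, the usual operations are slicing, projecting and combining arrays. A minimal sketch on a synthetic array: the helper names are hypothetical, the projection is a plain Riemann sum that assumes uniform spacing, and the names `vx1`, `vx2`, `vx3` simply mirror the array names listed above.

```python
import numpy as np


def midplane(field, axis=2):
    """2D slice through the middle index of a 3D array along `axis`."""
    return np.take(field, field.shape[axis] // 2, axis=axis)


def project(field, spacing, axis=2):
    """Column integral along `axis`, assuming uniform cell spacing."""
    return field.sum(axis=axis) * spacing


def speed(vx1, vx2, vx3):
    """Velocity magnitude from its three components."""
    return np.sqrt(vx1**2 + vx2**2 + vx3**2)


def argmax_3d(field):
    """(i, j, k) index of the largest value."""
    return tuple(int(i) for i in np.unravel_index(np.argmax(field), field.shape))


# Usage on a synthetic 2 x 3 x 4 array
rho = np.arange(24.0).reshape(2, 3, 4)
print(midplane(rho).shape)   # (2, 3)
print(argmax_3d(rho))        # (1, 2, 3)
```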

# 5. Flexible Image Transport System (FITS) format

This is the preferred data format used in observational astronomy.

- It is mainly used as a standard format to share astronomical images (see Helga's talk).


- But it can also contain tables or cubes, e.g. position-position-velocity diagrams.


- It has a human-readable header of `KEYWORD = value` entries with metadata related to the image (e.g. array dimensions and units).


- It is much simpler than the previous formats we checked above, but very practical.
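A minimal sketch with `astropy.io.fits`, assuming astropy is installed. It writes a synthetic image with two header keywords (the values are hypothetical) and reads back the header and data of one header/data unit (HDU). In FITS, NAXIS1 is the fastest-varying axis, so NumPy reports the image shape as `(NAXIS2, NAXIS1)`.

```python
import numpy as np
from astropy.io import fits


def write_example_fits(path):
    """Write a synthetic 3 x 4 image with a few header keywords."""
    image = np.arange(12, dtype=np.float32).reshape(3, 4)
    hdu = fits.PrimaryHDU(data=image)
    hdu.header["OBJECT"] = ("Synthetic", "hypothetical target name")
    hdu.header["BUNIT"] = "Jy/beam"
    hdu.writeto(path, overwrite=True)


def read_fits(path, ext=0):
    """Return (header, data) of one HDU, copied so they outlive the file."""
    with fits.open(path) as hdul:
        hdul.info()                           # prints one line per HDU
        header = hdul[ext].header.copy()
        data = np.array(hdul[ext].data)       # copy out of the memory map
    return header, data


# Usage:
# write_example_fits("example.fits")
# header, data = read_fits("example.fits")
# print(repr(header))
```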