# Querying Materials Data from the Materials Project using Pymatgen

In this tutorial, we will learn how to **query material data** from the [Materials Project](https://next-gen.materialsproject.org/) database using **pymatgen**, a powerful Python library for materials science.


By the end of this tutorial, you will be able to:
- Set up your **Materials Project API key**
- Query materials by **formula**, **elements**, or **material ID**
- Retrieve **structural**, **thermodynamic**, and **electronic** properties
- Download `.cif` files for visualization and simulations
- Take a **sneak peek** at structure manipulation with pymatgen

In the previous tutorials, we became familiar with the Materials Project and pymatgen. Now we are going to use them together.

## Step 1 - Import Necessary Libraries

```python
# Import main tools
from mp_api.client import MPRester
from pymatgen.core import Structure

# We will also use pandas to make the data more readable
import pandas as pd
```


## Step 2 - Setting Up the API Key

To query the Materials Project database, you need an **API key**. Every request to the Materials Project API must include one, and each Materials Project account has its own unique key.

```{important}
**You should not share your API key with anyone.**
```

**Steps to get your API key:**
1. Go to [Materials Project](https://next-gen.materialsproject.org/)

2. Log in or create a free account. We already did this when we introduced the Materials Project. If you need more guidance, please revisit the tutorial: [Exploring Materials Project](../materials_project/exploring_materials_project.ipynb)

3. After logging in, you will land on the Materials Project main page, which should look like this:

    <img src="../_static/images/querying/1_mp_mainpage.png" width=700 style=margin:auto/>

4. Click the **API** button at the top right. This opens the following page:

    <img src="../_static/images/querying/2_api.jpg" width=700 style=margin:auto/>

Your personal API key appears where the red line is in the image. Select and copy it, then paste it into the following cell as indicated:

```{note}
To run this notebook, you **must** provide your own Materials Project API key.
Log in as indicated in [Materials Project](https://next-gen.materialsproject.org/) → Dashboard → API Key.
Replace `"YOUR_API_KEY_HERE"` in the code cell below.
```


```python
# Replace with your own API key! (keep the quotation marks)
API_KEY = "YOUR_API_KEY_HERE"
mpr = MPRester(API_KEY)
```
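Hard-coding the key in a notebook makes it easy to share by accident. A common alternative is to keep it in an environment variable and read it at runtime. The sketch below assumes the variable is called `MP_API_KEY`. As far as we know, recent `mp_api` versions also look for that name when no key is passed, but check your installed version. `load_api_key` is a hypothetical helper written for this tutorial, not part of `mp_api`.

```python
import os


def load_api_key(var_name="MP_API_KEY"):
    """Return the API key stored in the environment variable `var_name`.

    Hypothetical helper: fails with a clear message instead of sending an
    empty key to the Materials Project API.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set. "
            "Set it to your Materials Project API key before running the notebook."
        )
    return key


# Usage in the notebook (requires mp_api and a valid key):
# from mp_api.client import MPRester
# mpr = MPRester(load_api_key())
```

With this approach, the notebook file itself never contains the key, so it can be shared safely.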

## Step 3 - Querying by Chemical Formula

Let us start with a simple query: searching for all materials with the formula $\text{CaTiO}_3$.

`mpr.materials.summary.search()` lets us specify:
- The **criteria** (what we are looking for), passed as keyword arguments such as `formula`
- The **fields** (what we want returned), passed as a list to `fields`

For now, we will extract:
- Material ID (`material_id`) $\rightarrow$ Unique identifier used by Materials Project
- Formula (`formula_pretty`) $\rightarrow$ Simplified chemical formula
- Formation energy per atom (`formation_energy_per_atom`) $\rightarrow$ Energy change when the compound forms from its constituent elements (eV/atom); negative values mean formation is energetically favorable
- Band gap (`band_gap`) $\rightarrow$ Band gap energy in electronvolts (eV)
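For reference, the textbook definition of the formation energy per atom is given below. It applies to a compound with $n_i$ atoms of element $i$ and $N = \sum_i n_i$ atoms in total. Materials Project additionally applies empirical energy corrections, so treat this as the underlying idea rather than their exact procedure:

$$
\Delta E_f = \frac{1}{N}\left( E_{\text{compound}} - \sum_i n_i \, \mu_i \right)
$$

Here $\mu_i$ is the energy per atom of element $i$ in its elemental reference phase. For $\text{CaTiO}_3$, one formula unit has $N = 5$ atoms, so $\Delta E_f = \frac{1}{5}\left(E_{\text{CaTiO}_3} - \mu_{\text{Ca}} - \mu_{\text{Ti}} - 3\mu_{\text{O}}\right)$.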

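The summary endpoint offers many more fields. The list below shows the field names available at the time of writing. Recent `mp_api` versions can also print the current list with `mpr.materials.summary.available_fields`: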

['builder_meta', 'nsites', 'elements', 'nelements', 'composition', 'composition_reduced', 'formula_pretty', 'formula_anonymous', 'chemsys', 'volume', 'density', 'density_atomic', 'symmetry', 'property_name', 'material_id', 'deprecated', 'deprecation_reasons', 'last_updated', 'origins', 'warnings', 'structure', 'task_ids', 'uncorrected_energy_per_atom', 'energy_per_atom', 'formation_energy_per_atom', 'energy_above_hull', 'is_stable', 'equilibrium_reaction_energy_per_atom', 'decomposes_to', 'xas', 'grain_boundaries', 'band_gap', 'cbm', 'vbm', 'efermi', 'is_gap_direct', 'is_metal', 'es_source_calc_id', 'bandstructure', 'dos', 'dos_energy_up', 'dos_energy_down', 'is_magnetic', 'ordering', 'total_magnetization', 'total_magnetization_normalized_vol', 'total_magnetization_normalized_formula_units', 'num_magnetic_sites', 'num_unique_magnetic_sites', 'types_of_magnetic_species', 'bulk_modulus', 'shear_modulus', 'universal_anisotropy', 'homogeneous_poisson', 'e_total', 'e_ionic', 'e_electronic', 'n', 'e_ij_max', 'weighted_surface_energy_EV_PER_ANG2', 'weighted_surface_energy', 'weighted_work_function', 'surface_anisotropy', 'shape_factor', 'has_reconstructed', 'possible_species', 'has_props', 'theoretical', 'database_Ids']


```python
# Query CaTiO3 structures
results = mpr.materials.summary.search(
    formula="CaTiO3",
    fields=[
        "material_id",
        "formula_pretty",
        "formation_energy_per_atom",
        "band_gap"
    ]
)

# Convert results to DataFrame
df = pd.DataFrame([
    {
        "material_id": r.material_id,
        "formula": r.formula_pretty,
        "formation_energy": r.formation_energy_per_atom,
        "band_gap": r.band_gap
    }
    for r in results
])

df
```

|   | material_id | formula | formation_energy | band_gap |
|---|---|---|---|---|
| 0 | mp-3442 | CaTiO3 | -3.538691 | 2.2434 |
| 1 | mp-754701 | CaTiO3 | -3.527734 | 3.5706 |
| 2 | mp-5827 | CaTiO3 | -3.49162 | 1.8285 |
| 3 | mp-4019 | CaTiO3 | -3.556087 | 2.3053 |
| 4 | mp-1205364 | CaTiO3 | -3.544638 | 2.1378 |
| 5 | mp-556003 | CaTiO3 | -3.544675 | 2.1586 |
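The introduction also mentions querying by **elements**. The sketch below assumes the `elements` keyword of `mpr.materials.summary.search()`. We understand it to return materials that contain *all* listed elements, possibly alongside others. To restrict results to exactly one chemical system, the `chemsys` keyword (e.g. `chemsys="Ca-Ti-O"`) is the usual choice. The helper names `search_by_elements` and `docs_to_dataframe` are hypothetical. The DataFrame conversion is factored out so it can tabulate any list of returned documents.

```python
import pandas as pd


def search_by_elements(mpr, elements, fields):
    """Query the summary endpoint for materials that contain all of `elements`.

    `mpr` is the MPRester created in Step 2; `fields` limits what is downloaded.
    """
    return mpr.materials.summary.search(elements=elements, fields=fields)


def docs_to_dataframe(docs, fields):
    """Build a DataFrame with one row per document and one column per field."""
    return pd.DataFrame(
        [{field: getattr(doc, field, None) for field in fields} for doc in docs],
        columns=fields,  # keeps column order, even when `docs` is empty
    )


# Example usage (needs a network connection and a valid API key):
# fields = ["material_id", "formula_pretty", "energy_above_hull", "band_gap"]
# docs = search_by_elements(mpr, ["Ca", "Ti", "O"], fields)
# df_elements = docs_to_dataframe(docs, fields)
```

Requesting only the fields you need keeps downloads small, which matters once queries return thousands of materials.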


## Step 4 - Querying by Material ID

Suppose we want detailed information about one specific material. We will use its **Materials Project ID**.

```python
# Pick the first material ID from our query
material_id = df["material_id"].iloc[0]
print("Using material ID:", material_id)

# Fetch the structure directly
structure = mpr.get_structure_by_material_id(material_id)
structure
```

```text
Using material ID: mp-3442

Structure Summary
Lattice
    abc : 5.441143344248945 5.441143344248945 5.441143344248945
 angles : 120.84549073210634 120.84549073210634 88.54170096725916
 volume : 112.41312515722095
      A : np.float64(-2.68573015) np.float64(2.68573015) np.float64(3.89611961)
      B : np.float64(2.68573015) np.float64(-2.68573015) np.float64(3.89611961)
      C : np.float64(2.68573015) np.float64(2.68573015) np.float64(-3.89611961)
    pbc : True True True
PeriodicSite: Ca (2.686, 0.0, 1.948) [0.25, 0.75, 0.5]
PeriodicSite: Ca (-2.22e-16, 2.686, 1.948) [0.75, 0.25, 0.5]
PeriodicSite: Ti (0.0, 0.0, 3.896) [0.5, 0.5, -0.0]
PeriodicSite: Ti (0.0, 0.0, 0.0) [0.0, -0.0, -0.0]
PeriodicSite: O (-1.028, 1.658, 3.896) [0.8087, 0.3087, 0.1174]
PeriodicSite: O (1.028, 1.658, 2.22e-16) [0.3087, 0.1913, 0.5]
PeriodicSite: O (1.658, 1.028, 3.896) [0.6913, 0.8087, 0.5]
PeriodicSite: O (3.713, 1.028, 7.015e-17) [0.1913, 0.6913, 0.8826]
PeriodicSite: O (-1.11e-16, 1.11e-16, 5.844) [0.75, 0.75, -0.0]
PeriodicSite:
```
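Taking the first row is arbitrary, because the order of search results is not guaranteed to mean anything. All rows share the same composition, so a lower formation energy per atom also means (up to composition-dependent corrections) a lower energy per atom. One reasonable choice is therefore the polymorph with the most negative `formation_energy`. The sketch below uses the values from the Step 3 table, stored under the new name `df_step3` so it does not overwrite `df`; in the notebook you would use `df` directly. For a proper stability assessment, `energy_above_hull` (listed among the available fields) is the more standard quantity.

```python
import pandas as pd

# Values copied from the Step 3 output table (formation energy in eV/atom, band gap in eV).
df_step3 = pd.DataFrame(
    {
        "material_id": ["mp-3442", "mp-754701", "mp-5827", "mp-4019", "mp-1205364", "mp-556003"],
        "formula": ["CaTiO3"] * 6,
        "formation_energy": [-3.538691, -3.527734, -3.49162, -3.556087, -3.544638, -3.544675],
        "band_gap": [2.2434, 3.5706, 1.8285, 2.3053, 2.1378, 2.1586],
    }
)

# Sort so the most negative (most favorable) formation energy comes first.
ranked = df_step3.sort_values("formation_energy").reset_index(drop=True)

# idxmin gives the row label of the minimum; use it to look up the material ID.
lowest_id = df_step3.loc[df_step3["formation_energy"].idxmin(), "material_id"]
print("Lowest formation energy:", lowest_id)
print(ranked[["material_id", "formation_energy"]])

# In the notebook, fetch its structure as before:
# structure = mpr.get_structure_by_material_id(lowest_id)
```

With the values above, this selects `mp-4019` rather than `mp-3442`.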

## Step 5 - Downloading the Structure as a `.cif` File

Once we have the structure, we can save it locally in `.cif` format, which can be used for visualization (as in the previous tutorial) or as input for simulations.

```python
# Save CIF file
structure.to(filename="CaTiO3_from_MP.cif")
print("CIF file saved as 'CaTiO3_from_MP.cif'")
```

```text
CIF file saved as 'CaTiO3_from_MP.cif'
```


## Step 6 - Exploring Structure Properties

As in the previous tutorial, we can inspect the basic properties of the structure:

- **Lattice parameters** (a, b, c)
- **Angles** (α, β, γ)
- **Number of sites**
- **Atomic species**

```python
# Lattice parameters
lattice = structure.lattice
print("Lattice parameters (Å):", lattice.abc)
print("Lattice angles (°):", lattice.angles)
print("Number of sites:", len(structure))
print("Atomic species:", structure.species)
```

```text
Lattice parameters (Å): (5.441143344248945, 5.441143344248945, 5.441143344248945)
Lattice angles (°): (120.84549073210634, 120.84549073210634, 88.54170096725916)
Number of sites: 10
Atomic species: [Element Ca, Element Ca, Element Ti, Element Ti, Element O, Element O, Element O, Element O, Element O, Element O]
```


## Step 7 (optional) - Teaser: Structure Manipulation

In materials science, it's often necessary to modify atomic structures to simulate real-world phenomena or design new materials. Two common manipulations are:

- **Vacancy creation** → Removing an atom from the structure
    - Useful for studying defects, diffusion, and electronic properties.
- **Substitution** → Replacing one atom with a different element
    - Used to model doping, alloying, and tuning material properties like band gaps or magnetism.

These techniques allow researchers to predict how structural modifications influence material behavior, helping design more efficient semiconductors, batteries, catalysts, and more.

### Code Example: Creating Vacancies and Substitutions

Pymatgen lets us manipulate structures in powerful ways.  
Here’s a **quick teaser**:
- Create a **vacancy** by removing an atom
- Substitute one element for another (here, every Ti site is replaced with Zr)

```python
print("Original structure:")
print(structure)

# --- 1. Creating a vacancy ---
# Let's remove the first calcium atom
vacancy_structure = structure.copy()
vacancy_structure.remove_sites([0])  # Index of the atom to remove

print("\nStructure after creating a Ca vacancy:")
print(vacancy_structure)
```

```text
Original structure:
Full Formula (Ca2 Ti2 O6)
Reduced Formula: CaTiO3
abc   :   5.441143   5.441143   5.441143
angles: 120.845491 120.845491  88.541701
pbc   :       True       True       True
Sites (10)
  #  SP           a          b          c    magmom
---  ----  --------  ---------  ---------  --------
  0  Ca    0.25       0.75       0.5             -0
  1  Ca    0.75       0.25       0.5             -0
  2  Ti    0.5        0.5       -0               -0
  3  Ti    0         -0         -0               -0
  4  O     0.808697   0.308697   0.117394         0
  5  O     0.308697   0.191303   0.5              0
  6  O     0.691303   0.808697   0.5              0
  7  O     0.191303   0.691303   0.882606         0
  8  O     0.75       0.75      -0               -0
  9  O     0.25       0.25      -0               -0

Structure after creating a Ca vacancy:
Full Formula (Ca1 Ti2 O6)
Reduced Formula: CaTi2O6
abc   :   5.441143   5.441143   5.441143
angles: 120.845491 120.845491  88.541701
```

```python
from pymatgen.transformations.standard_transformations import SubstitutionTransformation

# Substitute Ti with Zr (every Ti site is replaced)
transformation = SubstitutionTransformation({"Ti": "Zr"})
substituted_structure = transformation.apply_transformation(structure)

print("Original species:", structure.species)
print("After substitution:", substituted_structure.species)
```

```text
Original species: [Element Ca, Element Ca, Element Ti, Element Ti, Element O, Element O, Element O, Element O, Element O, Element O]
After substitution: [Element Ca, Element Ca, Element Zr, Element Zr, Element O, Element O, Element O, Element O, Element O, Element O]
```


## Step 8 - Summary

In this tutorial, we learned how to:
- Set up and authenticate with the **Materials Project API**
- Query materials by formula, elements, or material ID
- Retrieve properties like formation energy, band gap, and crystal system
- Save structures as `.cif` files
- Peek into **pymatgen’s structure manipulation capabilities**

---

### Next Steps

In the next tutorial, we’ll:
- Perform **data mining** using Materials Project queries
- Analyze **large sets of materials** to extract trends and correlations  
- Use plots and statistics to gain insights into **materials properties**

---
