# Overview of Tools and Resources 

In this tutorial, students are introduced to a set of powerful yet accessible computational tools for materials data mining and analysis. Each tool plays a specific role in the overall workflow, and together they provide a seamless and reproducible environment for exploring real materials data. The activity is designed so that instructors and students with limited programming experience can follow along successfully. 

## Getting Started: Cloning and Setting up the Environment 

All of the materials for this tutorial are hosted in a public repository on GitHub, an online platform for sharing and collaborating on code. GitHub lets users access all files, track changes, and contribute to open-source projects. Users can clone the repository, that is, copy it to their local machine, and then install all required packages with a single command. This step installs the exact package versions used in our tested environment, which greatly reduces the likelihood of dependency conflicts caused by version mismatches. Alternatively, the entire tutorial can be run in Google Colab, a browser-based platform that requires no local installation. This option is especially helpful for students or instructors working on shared or restricted machines, and for remote collaborative work.
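After installation, it can be useful to confirm which versions of the main libraries are actually present. The snippet below is a minimal sketch using only the Python standard library; the package list is an assumption based on the tools named in this overview, and the helper name `installed_versions` is hypothetical rather than part of the tutorial repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed package names, taken from the tools described in this overview.
REQUIRED = ("pymatgen", "matminer", "ase")


def installed_versions(packages=REQUIRED):
    """Return a dict mapping each package name to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Comparing this output with the versions pinned in the repository is a quick way to spot a mismatch before running the notebooks.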

## Jupyter Notebooks: Our Computational Laboratory 

The tutorial is delivered as Jupyter Notebooks, a format that combines executable code, text, equations, and visualizations in a single, interactive document. Students can read explanations and run code cells, all within the same interface. This format supports exploratory learning and encourages active engagement with the material. Whether run locally or on Google Colab, Jupyter Notebooks serve as an accessible entry point for hands-on scientific computing.

## Python: The Language That Glues Everything Together 

Python is the programming language used throughout the tutorial. It is widely used in science and technology and is valued for its simplicity, readability, efficiency, and large ecosystem of scientific libraries, supported by a community of thousands of users worldwide [18–20]. Python connects all the other tools in this workflow. Neither students nor instructors need prior programming experience to follow this tutorial: all code is explained step by step and annotated to encourage understanding. By the end of this activity, students will have gained familiarity with Python syntax and basic data manipulation techniques, which are valuable skills in scientific research, academia, and industry.

## Materials Project: A Database of Materials

The Materials Project is an open-access online database. It is an international initiative that offers free access to data on thousands of materials, including electronic structure, geometry, thermodynamics, and spectral information, together with compositional analysis, with the broader aim of calculating the properties of all inorganic materials [38]. Students can use this resource to explore real materials data and retrieve relevant information for their own analyses. In the tutorial, the Materials Project is accessed both through its website and through an Application Programming Interface (API), which is a tool that allows programs to communicate with external services. In this context, the API allows users to retrieve materials data directly from the Materials Project using Python commands, without having to download files manually from the website. This lets students automate data retrieval and perform custom searches remotely.

## Pymatgen: A Python Interface for Materials Data 

Pymatgen (Python Materials Genomics) is a Python library that acts as a bridge between the Materials Project and a Jupyter Notebook, whether run locally or on Google Colab. With Pymatgen, students can query the Materials Project using an API key, which is a personal access code provided for free by the Materials Project website. This key identifies the user and allows them to make a certain number of automated data requests per day, ensuring responsible use of the database. Students can then retrieve detailed structural information and perform basic analyses. Pymatgen provides simple functions for working with structures, converting file formats, and visualizing data, making it an essential component of this tutorial.

## Matminer: Data Mining for Materials Science 

Matminer is a Python library developed to support data mining and machine learning in materials science. It provides access to curated datasets and to data sources such as the widely known Materials Project, as well as Citrine and MDF (the Materials Data Facility). It also provides tools for converting material structures and compositions into numerical descriptors that can be used in statistical analysis or modeling. In this tutorial, students use Matminer to explore material properties and identify trends across different compounds. This exposes them to data-driven inquiry methods and the basic concepts of materials informatics.

## ASE (Atomic Simulation Environment): Structure Manipulation and Visualization 

ASE is a Python library for creating, modifying, and visualizing atomic structures. It also implements minimization algorithms, among many other practical applications. In this tutorial, students and instructors use ASE to manipulate structures retrieved from the Materials Project, for example by introducing point defects. ASE also includes built-in visualization capabilities that allow students to see the atomic arrangements they are working with. These visual insights help reinforce the connection between data and physical structure and make abstract concepts more tangible.