# YData Quality - Data Errors Tutorial
Time-to-Value: 4 minutes

This notebook is a tutorial on the data errors functionality of the ydata_quality package.

**Structure:**

1. Load dataset
2. Distort dataset
3. Instantiate the Data Quality engine
4. Run the quality checks
5. Assess the warnings
6. (Extra) Detailed overview

In [None]:
import statsmodels.api as sm
from ydata_quality.data_errors import DataErrorSearcher

## Load the example dataset
We will use a dataset available from the statsmodels package.

In [None]:
df = sm.datasets.get_rdataset('Guerry', 'HistData').data

## Distort the original dataset
We apply transformations that introduce known issues into the data, so that the quality checks have something to detect.

In [None]:
# Duplicate the first 20 rows (DataFrame.append was removed in pandas 2.0, so use concat)
import pandas as pd
df = pd.concat([df, df[:20]], ignore_index=True)

In [None]:
# Duplicate the dept column
df["dept2"] = df["dept"]
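As a sanity check, the two distortions above can be confirmed with plain pandas before any quality engine is involved. The sketch below is a minimal illustration on a small toy frame; the `toy` DataFrame and its values are made up for this example and are not part of the Guerry dataset.

```python
import pandas as pd

# Toy stand-in for the tutorial dataset (hypothetical values)
toy = pd.DataFrame({"dept": [1, 2, 3], "region": ["E", "N", "C"]})

# Apply the same two distortions as above, on a smaller scale
toy = pd.concat([toy, toy[:2]], ignore_index=True)  # duplicate the first 2 rows
toy["dept2"] = toy["dept"]                          # duplicate the dept column

# Rows that repeat an earlier row exactly
n_duplicate_rows = int(toy.duplicated().sum())

# Columns whose values repeat an earlier column exactly
duplicate_columns = toy.columns[toy.T.duplicated()].tolist()

print(n_duplicate_rows, duplicate_columns)
```

Transposing the frame to find duplicated columns is convenient for small data but can be slow on wide or large frames.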

## Create the engine
Each engine contains the checks and tests for each suite. To create a DataErrorSearcher, you provide:
- df: target DataFrame, for which we will run the test suite

In [None]:
des = DataErrorSearcher(df=df)

### Full Evaluation
The easiest way to run the data quality analysis is to call `.evaluate()`, which runs every quality check and returns the results keyed by check.

In [None]:
results = des.evaluate()
results.keys()

## Check the status
After running the data quality checks, you can inspect the warnings raised by each individual test. The warnings are sorted by priority and carry additional details that can give Data Scientists better insight into each issue.

In [None]:
des.report()

### Quality Warning

In [None]:
# Get a sample warning
sample_warning = des.warnings[1]

In [None]:
# Check the details
sample_warning.test, sample_warning.description, sample_warning.priority

In [None]:
# Retrieve the relevant data from the warning
sample_warning_data = sample_warning.data
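When a run produces many warnings, it can help to group them by priority before reading the details. The sketch below shows one way to do that. It uses a hypothetical `StubWarning` class that only mirrors the `test`, `description` and `priority` attributes inspected above; it is not the ydata_quality warning class. The check names, and the use of integer priorities where a lower number is more urgent, are also assumptions made for illustration.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a warning object, NOT the ydata_quality class.
@dataclass
class StubWarning:
    test: str
    description: str
    priority: int  # assumption: lower number means more urgent


def by_priority(warnings):
    """Group 'test: description' strings by priority, most urgent first."""
    grouped = {}
    for warning in sorted(warnings, key=lambda w: w.priority):
        summary = f"{warning.test}: {warning.description}"
        grouped.setdefault(warning.priority, []).append(summary)
    return grouped


# Made-up warnings, used only to exercise the helper
stub_warnings = [
    StubWarning("check_b", "second issue", 2),
    StubWarning("check_a", "first issue", 1),
    StubWarning("check_c", "third issue", 2),
]

grouped = by_priority(stub_warnings)
for priority, summaries in grouped.items():
    print(priority, summaries)
```

The same helper should work on any objects that expose these three attributes, provided their priority values can be compared with each other.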

## Full Test Suite
In this section, you will find a detailed overview of the available tests in the data errors module of ydata_quality.