# TDDA: Test-Driven Data Analysis

In this notebook we take a closer look at [TDDA](https://github.com/tdda/tdda), a Python library that takes data inputs (such as NumPy arrays or Pandas DataFrames) and builds a set of _constraints_ around them. You can then save your _constraints_ (as JSON output) and test new data against the observed _constraints_.

## 1. Imports

In [1]:
import pandas as pd
import numpy as np
from tdda.constraints.pdconstraints import discover_constraints, \
    verify_df

In [2]:
df = pd.read_csv('https://raw.githubusercontent.com/kjam/data-cleaning-101/master/data/iot_example.csv')

## 2. Inspect the data

In [3]:
df.sample(10)

| | timestamp | username | temperature | heartrate | build | latest | note |
|---|---|---|---|---|---|---|---|
| 81618 | 2017-02-03T03:25:53 | bjohnson | 10 | 70 | 81a841f7-4129-76b9-6473-0bf73d9c1b8f | 0 | interval |
| 109648 | 2017-02-14T07:41:47 | morgan82 | 9 | 83 | 9302b31f-c6f4-44f8-a8c2-6ec921f913ed | 0 | wake |
| 35197 | 2017-01-15T13:23:44 | bethhobbs | 18 | 71 | fe10410c-ffc1-a542-1433-c4c5eb79561e | 0 | sleep |
| 24643 | 2017-01-11T08:08:32 | mhernandez | 23 | 67 | 57c09f66-3ab7-d64d-42ea-d49a8be683f5 | 0 | wake |
| 39545 | 2017-01-17T07:09:46 | caitlynsandoval | 26 | 72 | 11868075-6e6e-d1e1-127d-b7c771021db5 | 1 | interval |
| 1672 | 2017-01-02T04:08:54 | kingchristine | 15 | 79 | 42f12825-7ee8-4499-f823-5b634fdf19c2 | 1 | update |
| 83012 | 2017-02-03T16:48:46 | rachel73 | 5 | 84 | 90353288-66fe-f809-726d-28f44263d0ab | 0 | test |
| 75779 | 2017-01-31T19:13:50 | ryangutierrez | 19 | 81 | b4d7f7b8-ceab-4417-5e9e-0545fc9d742d | 0 | wake |
| 29645 | 2017-01-13T08:12:48 | michael02 | 8 | 83 | 4cd4563f-d8b9-6d7d-482b-715bbfbd9754 | 1 | test |
| 11272 | 2017-01-06T00:08:46 | kempandrea | 16 | 69 | dc10a912-4d52-4aef-74b3-c8873ec801c2 | 1 | interval |


In [4]:
df.dtypes

timestamp      object
username       object
temperature     int64
heartrate       int64
build          object
latest          int64
note           object
dtype: object
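
The `df.dtypes` output above shows that `timestamp` is stored as `object`, that is, as plain strings. Before relying on constraints for that column, it can help to check that the strings really are valid timestamps. The following is a minimal sketch using only the standard library; `parse_timestamp` is a hypothetical helper introduced for illustration, and the two sample values are copied from the sample output above.

```python
from datetime import datetime

# Sample values copied from the df.sample(10) output above.
samples = ["2017-02-03T03:25:53", "2017-01-02T04:08:54"]


def parse_timestamp(value):
    """Return a datetime for an ISO 8601 string, or None if it does not parse."""
    try:
        return datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None


parsed = [parse_timestamp(v) for v in samples]

# 30 February does not exist, so this string is rejected.
bad = parse_timestamp("2017-02-30T00:00:00")

print(parsed[0], bad)
```

All valid values have the same length of 19 characters, which is the kind of regularity a string-length constraint can capture.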

## 3. Create a _constraints_ object with `discover_constraints`

In [5]:
constraints = discover_constraints(df)

In [6]:
constraints

<tdda.constraints.base.DatasetConstraints at 0x11de13240>

In [7]:
constraints.fields

Fields([('timestamp', <tdda.constraints.base.FieldConstraints at 0x11de134a8>),
        ('username', <tdda.constraints.base.FieldConstraints at 0x11de13780>),
        ('temperature',
         <tdda.constraints.base.FieldConstraints at 0x11de13908>),
        ('heartrate', <tdda.constraints.base.FieldConstraints at 0x11de136d8>),
        ('build', <tdda.constraints.base.FieldConstraints at 0x11de13b70>),
        ('latest', <tdda.constraints.base.FieldConstraints at 0x11de13da0>),
        ('note', <tdda.constraints.base.FieldConstraints at 0x11de13ef0>)])
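
To give a feeling for what such per-field constraints might contain, here is a hand-rolled sketch in plain Python. It is not TDDA's implementation and is far simpler than the library: `discover_field` is a hypothetical function that derives a few properties (type, minimum and maximum, sign, string lengths, null count, uniqueness) from one column. The three rows are copied from the sample output above.

```python
# Hand-rolled illustration of constraint discovery; NOT TDDA's implementation.
def discover_field(values):
    """Derive simple constraints for one column given as a list of values."""
    present = [v for v in values if v is not None]
    found = {"max_nulls": len(values) - len(present)}
    if present and all(isinstance(v, int) for v in present):
        found["type"] = "int"
        found["min"] = min(present)
        found["max"] = max(present)
        if found["min"] > 0:
            found["sign"] = "positive"
    elif present and all(isinstance(v, str) for v in present):
        found["type"] = "string"
        found["min_length"] = min(len(v) for v in present)
        found["max_length"] = max(len(v) for v in present)
        if len(set(present)) == len(present):
            found["no_duplicates"] = True
    return found


# Three rows copied from the sample output above.
rows = [
    {"username": "bjohnson", "temperature": 10, "heartrate": 70},
    {"username": "morgan82", "temperature": 9, "heartrate": 83},
    {"username": "rachel73", "temperature": 5, "heartrate": 84},
]

# Turn the row-oriented records into one list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}
discovered = {name: discover_field(vals) for name, vals in columns.items()}

print(discovered["temperature"])
```

Because the constraints are derived from the observed values only, a small sample yields tighter bounds than the full dataset would.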

## 4. Write the _constraints_ to a file

In [8]:
with open('../../data/ignore-iot_constraints.tdda', 'w') as f:
    f.write(constraints.to_json())
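
Since the file is plain JSON, it can be read back with the standard `json` module and applied to new records. The sketch below illustrates that idea by hand and is not a substitute for TDDA's own verification (`verify_df`): `check_record` and `example_constraints` are hypothetical names, the temporary file stands in for the real `.tdda` file, and the bounds mirror the `temperature` and `heartrate` entries of this notebook's constraints file.

```python
import json
import os
import tempfile

# Fragment mirroring the temperature/heartrate entries of this notebook's file.
example_constraints = {
    "fields": {
        "temperature": {"type": "int", "min": 5, "max": 29, "max_nulls": 0},
        "heartrate": {"type": "int", "min": 60, "max": 89, "max_nulls": 0},
    }
}


def check_record(record, fields):
    """Return a list of (field, constraint) pairs that the record violates."""
    failures = []
    for name, rules in fields.items():
        value = record.get(name)
        if value is None:
            # A missing value only fails if the field allows no nulls.
            if rules.get("max_nulls") == 0:
                failures.append((name, "max_nulls"))
            continue
        if "min" in rules and value < rules["min"]:
            failures.append((name, "min"))
        if "max" in rules and value > rules["max"]:
            failures.append((name, "max"))
    return failures


# Round-trip through a temporary file (a stand-in for the real .tdda file).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_constraints.tdda")
    with open(path, "w") as f:
        json.dump(example_constraints, f, indent=4)
    with open(path) as f:
        loaded = json.load(f)

ok = check_record({"temperature": 18, "heartrate": 71}, loaded["fields"])
bad = check_record({"temperature": 40, "heartrate": None}, loaded["fields"])

print(ok)
print(bad)
```

An empty list means the record satisfies every constraint; each entry in a non-empty list names the field and the constraint it violates.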

In [9]:
cat ../../data/ignore-iot_constraints.tdda

{
    "creation_metadata": {
        "local_time": "2020-07-06 14:14:33",
        "utc_time": "2020-07-06 12:12:33",
        "creator": "TDDA 1.0.31",
        "host": "eve.local",
        "user": "veit",
        "n_records": 146397,
        "n_selected": 146397
    },
    "fields": {
        "timestamp": {
            "type": "string",
            "min_length": 19,
            "max_length": 19,
            "max_nulls": 0,
            "no_duplicates": true
        },
        "username": {
            "type": "string",
            "min_length": 3,
            "max_length": 21,
            "max_nulls": 0
        },
        "temperature": {
            "type": "int",
            "min": 5,
            "max": 29,
            "sign": "positive",
            "max_nulls": 0
        },
        "heartrate": {
            "type": "int",
            "min": 60,
            "max": 89,
            "sign": "positive",
            "max_nulls": 0
        },
        "