# Philippines
Dataset from the Republic of the Philippines (PH) Department of Health (DOH).

In [302]:
import import_ipynb, patient

## Reading the Data
We will read this CSV dataset into a pandas `DataFrame`.

In [303]:
import pandas as pd

data = pd.read_csv(filepath_or_buffer="datasets/DOH COVID Data Drop_ 20200521 - 04 Case Information.csv")

In [304]:
def apply_date(data, column):
    # Parse the column in place as datetimes in YYYY-MM-DD format.
    data[column] = pd.to_datetime(data[column], format='%Y-%m-%d')

apply_date(data, "DateRepConf")
apply_date(data, "DateDied")
apply_date(data, "DateRecover")
apply_date(data, "DateRepRem")
apply_date(data, "DateOnset")
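
The same conversion can be written as a loop over the date columns. The sketch below is only an illustration: the `sample` frame and its values are hypothetical stand-ins for the DOH CSV, and `errors="coerce"` is an optional choice (not used in the cell above) that turns unparseable entries into `NaT` instead of raising.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the DOH CSV.
sample = pd.DataFrame({
    "DateRepConf": ["2020-05-11", "2020-04-12"],
    "DateOnset": [None, "not a date"],
})

date_columns = ["DateRepConf", "DateOnset"]
for column in date_columns:
    # errors="coerce" maps missing or unparseable entries to NaT.
    sample[column] = pd.to_datetime(
        sample[column], format="%Y-%m-%d", errors="coerce"
    )

print(sample.dtypes)
```

Coercing silently hides malformed dates, so it is worth counting the `NaT` values afterwards when data quality matters.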

Next we print the distinct values of a few categorical columns, drop the columns we do not need, and display a sample of the data:

In [305]:
print(data["Admitted"].unique())
print(data["Pregnanttab"].unique())
print(data["HealthStatus"].unique())

rejects = [
    "CaseCode",
    "AgeGroup",
    "Pregnanttab",
]
data = data.drop(columns=rejects)
display(data)

[nan 'Yes' 'No']
[nan 'No' 'Yes']
['Mild' 'Recovered' 'Asymptomatic' 'Died' 'Severe' 'Critical']


Unnamed: 0,Age,Sex,DateRepConf,DateDied,DateRecover,RemovalType,DateRepRem,Admitted,RegionRes,ProvRes,CityMunRes,CityMuniPSGC,HealthStatus,Quarantined,DateOnset
0,53.0,Female,2020-05-11,NaT,NaT,,NaT,,Region IV-A: CALABARZON,LAGUNA,CITY OF SAN PEDRO,PH043425000,Mild,,NaT
1,31.0,Male,2020-04-12,NaT,NaT,Recovered,2020-05-11,Yes,NCR,METRO MANILA,CITY OF PARAÑAQUE,PH137604000,Recovered,No,2020-03-23
2,33.0,Female,2020-05-19,NaT,NaT,,NaT,,Repatriate,,,,Mild,,NaT
3,58.0,Male,2020-03-29,NaT,NaT,,NaT,No,NCR,METRO MANILA,CITY OF MANDALUYONG,PH137401000,Mild,,2020-03-14
4,35.0,Male,2020-05-21,NaT,NaT,,NaT,,Region VII: Central Visayas,CEBU PROVINCE,,,Mild,,NaT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13429,54.0,Female,2020-04-27,NaT,NaT,,NaT,,NCR,METRO MANILA,CITY OF MANDALUYONG,PH137401000,Mild,,NaT
13430,38.0,Male,2020-04-29,NaT,NaT,,NaT,,Repatriate,,,,Mild,,NaT
13431,50.0,Female,2020-04-13,NaT,NaT,,NaT,,NCR,METRO MANILA,CITY OF MANDALUYONG,PH137401000,Mild,,2020-03-29
13432,87.0,Male,2020-05-12,NaT,NaT,,NaT,,NCR,METRO MANILA,,,Mild,,NaT


## Checking the Data

We need a JSON schema to validate our data.

The properties that interest us most are:
 - `Age`
 - `Sex` (gender)
 - `DateRepConf` (date the case was reported as confirmed)
 - `DateDied` (date of death)
 - `DateRecover` (date of recovery)

Fortunately, the source of this dataset provided descriptions of each property.

Unfortunately, we consider the `Pregnanttab` (pregnant) property unreliable: besides `"Yes"` and `"No"` it contains missing values, and a missing value does not tell us whether the patient was not pregnant or whether the status was simply not recorded.
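
One way to judge how informative a column like `Pregnanttab` is would be to count its values, including the missing ones. The sketch below uses a hypothetical series rather than the real column, and on the real data it would have to run before the column is dropped.

```python
import pandas as pd

# Hypothetical values; the real column comes from the DOH CSV.
pregnant = pd.Series([None, "No", None, "Yes", None], name="Pregnanttab")

# dropna=False keeps the missing entries in the tally.
counts = pregnant.value_counts(dropna=False)
missing_share = pregnant.isna().mean()

print(counts)
print(f"missing share: {missing_share:.0%}")
```

A high missing share suggests the column cannot separate "not pregnant" from "not recorded".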

In [306]:
schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "Age": {
                "anyOf": [
                    {
                        "type": "number",
                        "minimum": 0
                    },
                    {
                        "type": "null"
                    }
                ]
            },
            "Sex": {
                "enum": [
                    "Male",
                    "Female"
                ]
            },
            "RemovalType": {
            },
            "Admitted": {
            },
            "CityMuniPSGC": {
            },
            "HealthStatus": {
            },
            "Quarantined": {
            },
            "DateOnset": {
                "type": "integer"
            }
        }
    }
}

Now that we have the schema, we will convert our `DataFrame` to JSON data so that we can perform validation.

Note that we could have used the `pandas_schema` package (https://pypi.org/project/pandas-schema/).

In [307]:
import json

json_data = json.loads(data.to_json(orient="records"))

Note that the orientation must be set to `"records"`, because our schema describes an array of row objects and that is the structure this orientation produces.
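
To make the shape of the converted data concrete, here is a small sketch on a hypothetical two-row frame. It passes `date_format="epoch"` and `date_unit="ms"` explicitly, which we assume matches the default behaviour of the pandas version used in this notebook: datetimes become integer timestamps and `NaT` becomes `null`.

```python
import json
import pandas as pd

# Hypothetical two-row frame, not the real dataset.
sample = pd.DataFrame({
    "Age": [31.0, 53.0],
    "DateOnset": pd.to_datetime(["2020-03-23", None]),
})

# One JSON object per row; dates as epoch milliseconds, NaT as null.
records = json.loads(
    sample.to_json(orient="records", date_format="epoch", date_unit="ms")
)
print(records)
```

This is why a date property in the schema is described as an integer rather than a string, and why it can also be `None` after `json.loads`.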

Now that we have the JSON data and schema, we simply check if our data is valid.
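
`jsonschema.validate` reports a single `ValidationError`. As an alternative sketch, a validator object can list every failing record through `iter_errors`. The schema and records below are hypothetical and much smaller than ours; the `["integer", "null"]` type is an assumption about how a date field with missing values could be described, since missing dates serialize to `null`.

```python
import jsonschema

# Hypothetical reduced schema: a date field that may be an integer or null.
nullable_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "DateOnset": {"type": ["integer", "null"]},
        },
    },
}

# Hypothetical records shaped like the output of to_json(orient="records").
records = [
    {"DateOnset": None},
    {"DateOnset": 1584921600000},
    {"DateOnset": "2020-03-23"},  # a string, so this record should fail
]

validator = jsonschema.Draft7Validator(nullable_schema)
errors = list(validator.iter_errors(records))
for error in errors:
    # absolute_path gives the row index and property name of the failure.
    print(list(error.absolute_path), error.message)
```

Collecting all errors first makes it easier to tell a schema mistake (every row fails) from a handful of bad records.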

In [308]:
import jsonschema

jsonschema.validate(json_data, schema)

ValidationError: None is not of type 'integer'

Failed validating 'type' in schema['items']['properties']['DateOnset']:
    {'type': 'integer'}

On instance[0]['DateOnset']:
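    None

> No errors should be raised here if the data is valid. The error above shows that the first record has a null `DateOnset`, which the schema's `"type": "integer"` rule rejects.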
