# External Factors Analysis

Copyright ©2021-2022. Stephen Rigden. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

'External factors' are any data that is not captured by the Apple Watch or the iPhone. It is user-created tabular data, such as data produced by a spreadsheet. The use case is the creation of date-stamped categories that can be used to categorize health data from the phone.
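As a rough illustration of that use case, the sketch below attaches date-stamped factors to a per-day health metric with `DataFrame.merge` and then groups the metric by the factor. The `resting_hr` column and every value in it are hypothetical stand-ins, not data from this project.

```python
import pandas as pd

# Hypothetical per-day health metric (stand-in for data exported from the phone).
health = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-01", "2021-12-02", "2021-12-03"]),
    "resting_hr": [58, 61, 57],  # hypothetical column name and values
})

# Date-stamped external factors of the kind this notebook produces
# (illustrative values only).
factors = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-01", "2021-12-02", "2021-12-03"]),
    "metoprolol": pd.array([2, 2, 1], dtype="Int64"),
})

# A left join keeps every health record and attaches the factor values
# recorded for the same calendar day.
combined = health.merge(factors, on="date", how="left")

# Use the external factor as a category for the health metric.
means = combined.groupby("metoprolol")["resting_hr"].mean()
print(means)
```

A left join is used so that days without a recorded factor still keep their health data; they simply get a missing factor value.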

This notebook demonstrates how to handle this kind of data. It assumes an input data set with
these columns:
- Date
- Medication A dose (integer)
- Medication B dose (integer)

The example file used in this notebook has the columns `date`, `metoprolol`, `alcohol`, and `notes`. The file must be in CSV format and located in the `<project>/data/raw` directory, matching the lowercase path used in the code.
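A minimal sketch of an input file in this shape, read from an in-memory string so it runs without the real file. The column names follow the example file. The values are made up except where they echo the rows shown below. It shows why integer-looking columns with blank cells arrive as `float64`, which the conversion steps later in the notebook deal with.

```python
import io

import pandas as pd

# Illustrative contents of a CSV in the expected layout (made-up values).
sample_csv = """date,metoprolol,alcohol,notes
12/1/2021,2,,
12/2/2021,2,177,
12/3/2021,,,example note
"""

# StringIO stands in for the file on disk.
sample = pd.read_csv(io.StringIO(sample_csv))

# Blank cells become NaN, and NaN forces the numeric columns to float64.
print(sample.dtypes)
```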

In [250]:
# Change import_file_name to match the name of the current import file
# (without the .csv extension).
from pathlib import Path

import pandas

import_file_name = 'External Factors'

In [251]:
# Set file paths. project_path assumes this notebook sits two directory
# levels below the project root.
project_path = Path.cwd().parent.parent
extra_data_file = project_path / 'data' / 'raw' / f"{import_file_name}.csv"
extra_data_pickle = project_path / 'data' / 'processed' / 'extra_data_preprocessed.pickle'

# Get the raw data
eds = pandas.read_csv(extra_data_file)
eds.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28 entries, 0 to 27
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   date        28 non-null     object 
 1   metoprolol  27 non-null     float64
 2   alcohol     16 non-null     float64
 3   notes       1 non-null      object 
dtypes: float64(2), object(2)
memory usage: 1.0+ KB
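`Path.cwd().parent.parent` only works if the notebook runs exactly two levels below the project root. If that layout ever changes, one option is to search upward for the `data` directory instead. `find_project_root` is a hypothetical helper, not part of this project; the sketch exercises it on a throwaway directory tree.

```python
import tempfile
from pathlib import Path


def find_project_root(start: Path, marker: str = "data") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper: an alternative to Path.cwd().parent.parent that does
    not depend on how deeply the notebook is nested.
    """
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}' directory above {start}")


# Build <root>/data/raw and <root>/notebooks/exploratory, then search upward
# from the nested notebook directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data" / "raw").mkdir(parents=True)
    nested = root / "notebooks" / "exploratory"
    nested.mkdir(parents=True)
    result = find_project_root(nested) == root
    print(result)
```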


In [252]:
eds.head()

|   | date | metoprolol | alcohol | notes |
|---|---|---|---|---|
| 0 | 12/1/2021 | 2.0 | | |
| 1 | 12/2/2021 | 2.0 | 177.0 | |
| 2 | 12/3/2021 | 2.0 | | |
| 3 | 12/4/2021 | 2.0 | | |
| 4 | 12/5/2021 | 2.0 | 30.0 | |


In [253]:
eds.tail()

|   | date | metoprolol | alcohol | notes |
|---|---|---|---|---|
| 23 | 12/24/2021 | 1.0 | 112.0 | |
| 24 | 12/25/2021 | 1.0 | 28.0 | |
| 25 | 12/26/2021 | 1.0 | | |
| 26 | 12/27/2021 | 2.0 | 25.0 | |
| 27 | 12/28/2021 | | | |


### Convert float columns to integer and fill gaps with zero

In [254]:
# 'Int64' (capital I) is pandas' nullable integer dtype, so the cast succeeds
# even though the columns contain NaN; the remaining gaps are then filled with 0.
for column in ['alcohol', 'metoprolol']:
    eds[column] = eds[column].astype('Int64').fillna(0)
eds.head()

|   | date | metoprolol | alcohol | notes |
|---|---|---|---|---|
| 0 | 12/1/2021 | 2 | 0 | |
| 1 | 12/2/2021 | 2 | 177 | |
| 2 | 12/3/2021 | 2 | 0 | |
| 3 | 12/4/2021 | 2 | 0 | |
| 4 | 12/5/2021 | 2 | 30 | |


In [255]:
eds.tail()

|   | date | metoprolol | alcohol | notes |
|---|---|---|---|---|
| 23 | 12/24/2021 | 1 | 112 | |
| 24 | 12/25/2021 | 1 | 28 | |
| 25 | 12/26/2021 | 1 | 0 | |
| 26 | 12/27/2021 | 2 | 25 | |
| 27 | 12/28/2021 | 0 | 0 | |
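The conversion relies on pandas' nullable `Int64` dtype. A small sketch with made-up doses shows why a plain NumPy `int64` cast fails on a column with gaps, and what `fillna(0)` does afterward. Filling with zero records a missing entry as a zero dose (as happens for 12/28/2021 in the table above), which is a modelling choice rather than something the data itself states.

```python
import pandas as pd

doses = pd.Series([2.0, 1.0, None])  # float64 because of the missing value

# A plain NumPy integer dtype cannot represent a missing value, so this cast fails.
try:
    doses.astype("int64")
    plain_cast_failed = False
except ValueError:
    plain_cast_failed = True

# The nullable "Int64" extension dtype keeps the gap as <NA> ...
nullable = doses.astype("Int64")
# ... and fillna(0) then turns that gap into a zero.
filled = nullable.fillna(0)

print(plain_cast_failed)      # True
print(nullable.isna().sum())  # 1
print(filled.tolist())        # [2, 1, 0]
```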


### Convert date column to datetime

In [256]:
# Parse the month-first date strings (e.g. 12/1/2021) with an explicit format.
eds['date'] = pandas.to_datetime(eds['date'], format='%m/%d/%Y')
eds.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28 entries, 0 to 27
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   date        28 non-null     datetime64[ns]
 1   metoprolol  28 non-null     Int64         
 2   alcohol     28 non-null     Int64         
 3   notes       1 non-null      object        
dtypes: Int64(2), datetime64[ns](1), object(1)
memory usage: 1.1+ KB
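Dates such as `12/1/2021` are ambiguous: the example file uses month-first order, but the same strings read day-first give different dates. A short sketch with `pandas.to_datetime` and two explicit `format` strings shows the difference; stating the format explicitly is one way to make the intended order unambiguous.

```python
import pandas as pd

raw_dates = pd.Series(["12/1/2021", "12/2/2021"])

# Month-first, matching the US-style dates in the example file.
us = pd.to_datetime(raw_dates, format="%m/%d/%Y")
# The same strings read day-first land in January and February instead.
eu = pd.to_datetime(raw_dates, format="%d/%m/%Y")

print(us.dt.strftime("%Y-%m-%d").tolist())  # ['2021-12-01', '2021-12-02']
print(eu.dt.strftime("%Y-%m-%d").tolist())  # ['2021-01-12', '2021-02-12']
```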


### Convert notes column NaNs to empty strings

In [257]:
eds['notes'] = eds['notes'].fillna('')
eds.head()

|   | date | metoprolol | alcohol | notes |
|---|---|---|---|---|
| 0 | 2021-12-01 | 2 | 0 | |
| 1 | 2021-12-02 | 2 | 177 | |
| 2 | 2021-12-03 | 2 | 0 | |
| 3 | 2021-12-04 | 2 | 0 | |
| 4 | 2021-12-05 | 2 | 30 | |


In [258]:
eds.tail()

|   | date | metoprolol | alcohol | notes |
|---|---|---|---|---|
| 23 | 2021-12-24 | 1 | 112 | |
| 24 | 2021-12-25 | 1 | 28 | |
| 25 | 2021-12-26 | 1 | 0 | |
| 26 | 2021-12-27 | 2 | 25 | |
| 27 | 2021-12-28 | 0 | 0 | |


In [259]:
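# Save the cleaned data; create data/processed first in case it does not exist.
extra_data_pickle.parent.mkdir(parents=True, exist_ok=True)
eds.to_pickle(extra_data_pickle)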
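The cleaned frame is saved as a pickle rather than a CSV, presumably because a pickle preserves the cleaned dtypes. A minimal round-trip sketch on a small made-up frame with the same columns, written to a temporary directory, checks that the `Int64` and datetime columns survive `to_pickle` and `read_pickle` unchanged.

```python
import tempfile
from pathlib import Path

import pandas as pd

# A tiny frame with the same column types the notebook ends up with (made-up values).
frame = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-01", "2021-12-02"]),
    "metoprolol": pd.array([2, 2], dtype="Int64"),
    "alcohol": pd.array([0, 177], dtype="Int64"),
    "notes": ["", ""],
})

with tempfile.TemporaryDirectory() as tmp:
    pickle_path = Path(tmp) / "processed" / "extra_data_preprocessed.pickle"
    # to_pickle does not create missing parent directories, so make them first.
    pickle_path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_pickle(pickle_path)
    restored = pd.read_pickle(pickle_path)

# Unlike re-reading a CSV, the pickle brings back the Int64 and datetime dtypes.
round_trip_ok = restored.equals(frame)
print(restored.dtypes)
print(round_trip_ok)
```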