## DETERMINANTS OF ANTENATAL CARE VISITS IN KENYA

### Introduction
This report assesses the determinants of antenatal care (ANC) visits using Poisson and negative binomial regression models. The dependent variable is the number of ANC visits, which is a count variable. Whereas the WHO recommends a minimum of 8 visits, Kenya continues to lag behind with an average of just 4 visits. A statistical analysis of these determinants will highlight which are most impactful, laying a foundation for accurate, evidence-based recommendations for policy formulation and for improving the country's health sector, specifically maternal health.
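
Poisson regression assumes that the variance of the count equals its mean, while the negative binomial model relaxes this by allowing overdispersion. A quick, informal way to see which assumption looks more plausible is to compare the sample variance of the counts with their mean. The sketch below uses only the standard library and a made-up list of visit counts (`example_visits` and `dispersion_ratio` are hypothetical names, and the numbers are not taken from the survey data); it is a rough unconditional diagnostic, not a formal test.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return the sample variance divided by the sample mean of the counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean of counts is zero; the ratio is undefined")
    return variance(counts) / m


# Hypothetical ANC visit counts, for illustration only
example_visits = [0, 1, 2, 3, 4, 4, 4, 5, 6, 8]
ratio = dispersion_ratio(example_visits)
print(f"variance / mean = {ratio:.2f}")
```

A ratio close to 1 is consistent with the Poisson assumption, whereas a ratio well above 1 points to overdispersion, in which case the negative binomial model is usually the safer choice.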

In [2]:
#Import necessary packages
import pandas as pd
# import pyreadstat
import numpy as np
import statsmodels.api as sm
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns 

In [3]:
# Parse the SAS layout file to recover the fixed-width column positions
# (ChatGPT-assisted; the logic is not yet fully understood)
import re
import pandas as pd

# === Step 1: Parse the .SAS file ===
sas_file = "KENR8CFL.SAS"
with open(sas_file, "r", encoding="latin-1") as f:
    lines = f.readlines()

colspecs = []
names = []
inside_input = False

# Extract variable definitions from the INPUT block
for line in lines:
    if line.strip().upper().startswith("INPUT"):
        inside_input = True
        continue
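    if inside_input:
        # Expected form: "@<start column> <name> [$]<width>.0"
        match = re.search(r"@(\d+)\s+(\w+)\s+([$]?)(\d+)\.0", line)
        if match:
            start = int(match.group(1)) - 1  # SAS columns are 1-based; pandas expects 0-based
            name = match.group(2)
            width = int(match.group(4))
            end = start + width  # read_fwf column ends are exclusive
            colspecs.append((start, end))
            names.append(name)
        if ";" in line:
            break  # End of INPUT block (checked last so a variable on the closing line is kept)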

# Fallback if the first variable (CASEID) was not captured by the regex;
# assumes CASEID occupies the first 15 columns of each record
if names and names[0] != "CASEID":
    colspecs.insert(0, (0, 15))
    names.insert(0, "CASEID")

# === Step 2: Read the .DAT file using colspecs ===
dat_file = "KENR8CFL.DAT"
df = pd.read_fwf(dat_file, colspecs=colspecs, names=names, encoding="latin-1")

# === Step 3: Preview or use the DataFrame ===
print(df.head())


     CASEID  PIDX  V001  V002  V003  V004     V005  V006  V007  V008  ...  \
0  1   4  2     1     1     4     2     1  1296049     4  2022  1468  ...   
1  1   7  2     1     1     7     2     1  1296049     4  2022  1468  ...   
2  1  55  2     1     1    55     2     1  1296049     4  2022  1468  ...   
3  1  55  2     2     1    55     2     1  1296049     4  2022  1468  ...   
4  1  65  2     1     1    65     2     1  1296049     4  2022  1468  ...   

   SDV30BN  SDV30BX  SDV30BZ  SDV35A  SREDUC  SPEDUC  IDX94  IDX94P  S446  \
0      NaN      NaN      NaN     0.0       0     0.0      1       1   2.0   
1      NaN      NaN      NaN     NaN       2     3.0      0       1   NaN   
2      NaN      NaN      NaN     0.0       2     4.0      1       1   NaN   
3      NaN      NaN      NaN     0.0       2     4.0      2       2   NaN   
4      NaN      NaN      NaN     NaN       2     2.0      1       1   1.0   

   S454A  
0  203.0  
1    NaN  
2  206.0  
3    NaN  
4  307.0  

[5 rows
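
To make the parsing step easier to follow, the sketch below applies the same regular expression to a few layout lines and then slices one fixed-width record with the recovered positions. The sample lines and the record are hypothetical and written only to illustrate the pattern; the actual contents of `KENR8CFL.SAS` may differ, and `parse_input_line` is a helper name introduced here. It also shows one plausible reason the CASEID fallback is needed: a character format written without a trailing `0` does not match the pattern.

```python
import re

# Same pattern as in the parsing cell above
PATTERN = re.compile(r"@(\d+)\s+(\w+)\s+([$]?)(\d+)\.0")


def parse_input_line(line):
    """Return (name, (start, end)) for one INPUT line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    start = int(match.group(1)) - 1  # SAS columns are 1-based
    width = int(match.group(4))
    return match.group(2), (start, start + width)


# Hypothetical layout lines, for illustration only
sample_lines = [
    "    @1    CASEID   $15.",  # character format without a trailing 0
    "    @16   V001     6.0",
    "    @22   V002     4.0",
]
parsed = [parse_input_line(line) for line in sample_lines]

# Hypothetical record: 15 characters for CASEID, 6 for V001, 4 for V002
record = f"{'1   4  2':>15}{1:>6}{4:>4}"

# Slice the record with the half-open (start, end) positions, as read_fwf does
fields = {
    name: record[start:end].strip()
    for name, (start, end) in (p for p in parsed if p is not None)
}
print(parsed)
print(fields)
```

Under these assumptions the first line is skipped and only `V001` and `V002` are recovered, which is the situation the manual CASEID insertion in the parsing cell is meant to handle.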

In [4]:
# Save the DataFrame as a CSV file so it is easier to work with later
df.to_csv("KENR8CFL.csv", index=False, encoding="utf-8")

In [5]:
# Reload the data from the CSV file with pd.read_csv()
projectdf = pd.read_csv("KENR8CFL.csv")
projectdf

Unnamed: 0,CASEID,PIDX,V001,V002,V003,V004,V005,V006,V007,V008,...,SDV30BN,SDV30BX,SDV30BZ,SDV35A,SREDUC,SPEDUC,IDX94,IDX94P,S446,S454A
0,1 4 2,1,1,4,2,1,1296049,4,2022,1468,...,,,,0.0,0,0.0,1,1,2.0,203.0
1,1 7 2,1,1,7,2,1,1296049,4,2022,1468,...,,,,,2,3.0,0,1,,
2,1 55 2,1,1,55,2,1,1296049,4,2022,1468,...,,,,0.0,2,4.0,1,1,,206.0
3,1 55 2,2,1,55,2,1,1296049,4,2022,1468,...,,,,0.0,2,4.0,2,2,,
4,1 65 2,1,1,65,2,1,1296049,4,2022,1468,...,,,,,2,2.0,1,1,1.0,307.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13179,1691 104 2,1,1691,104,2,1691,3269876,5,2022,1469,...,,,,0.0,4,4.0,1,1,,203.0
13180,1691 104 2,2,1691,104,2,1691,3269876,5,2022,1469,...,,,,0.0,4,4.0,2,2,,
13181,1692 18 2,1,1692,18,2,1692,8074191,5,2022,1469,...,,,,,4,3.0,1,1,,101.0
13182,1692 18 2,2,1692,18,2,1692,8074191,5,2022,1469,...,,,,,4,3.0,2,2,,
