# Social Network Analysis Of Births In The United States

This Notebook Processes Data On Births In The US With The Goal Of Locating The Importers And Exporters Of Childbirth.

### Imports

This Cell Contains All Imports Needed To Run This Analysis.

In [1]:
import numpy as np
import networkx as nx
import pandas as pd
from tqdm import tqdm

### Creating Mock File

This Cell Can Create A Mock File. For Confidentiality Reasons, The Raw Data Cannot Be Uploaded, But It Can Be Requested Here: https://www.cdc.gov/nchs/nvss/index.htm

*Note: A Mock File Has Already Been Created And Uploaded For Use*

In [2]:
def generateMockData():
    """
    Creates Mock File For Analysis

    Returns:
    None: Saves The Result To A CSV File.
    """

    # Load The FIPS codes
    dfDummy = pd.read_csv('./county_codes.csv')

    fips_codes = dfDummy['FIPS'].values

    # Generate Random FIPS Codes For 'OC_County_FIPS' And 'MR_County_FIPS'
    mock_data = pd.DataFrame({
        'OC_County_FIPS': np.random.choice(fips_codes, size=100),
        'MR_County_FIPS': np.random.choice(fips_codes, size=100),
    })

    # Save The Mock Data To A CSV File
    mock_data.to_csv('mock_data.csv', index=False)

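`generateMockData` Draws From NumPy's Global Random State, So Each Run Produces A Different File. The Sketch Below Shows One Way To Make The Draw Repeatable With A Seeded Generator. The Name `generate_seeded_mock`, Its Parameters, And The Placeholder Codes Are Assumptions For Illustration, Not Part Of The Notebook.

```python
import numpy as np
import pandas as pd


def generate_seeded_mock(fips_codes, size=100, seed=0):
    """Hypothetical Seeded Variant: Same Columns As generateMockData, But Repeatable."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "OC_County_FIPS": rng.choice(fips_codes, size=size),
        "MR_County_FIPS": rng.choice(fips_codes, size=size),
    })


# Placeholder Codes For Illustration; The Notebook Reads Them From county_codes.csv
example_codes = [1001, 1003, 1005]
mock_a = generate_seeded_mock(example_codes, size=5, seed=42)
mock_b = generate_seeded_mock(example_codes, size=5, seed=42)
```

Two Calls With The Same Seed Return Identical Tables, Which Makes Later Score Comparisons Easier To Reproduce.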

### Building Graph From Input Data

This Cell Creates A Graph To Use For The Analysis.
- `buildGraph` Builds A Directed Graph Using The Distinct Counties As Nodes And The Source - Destination Pairs As Edges (Repeated Pairs Collapse Into A Single Edge).


In [3]:
def buildGraph(dataframe):
    """
    Creates Graph From Source - Destination Pairs

    Args:
    dataframe (pd.DataFrame): Table Of Source - Destination Pairs
    
    Returns:
    nx.DiGraph: A Directed Graph With One Node Per County And One Edge Per Distinct Pair.
    """

    nodes = dataframe[["OC_County_FIPS", "MR_County_FIPS"]].drop_duplicates().values.flatten()
    edges = list(zip(dataframe["OC_County_FIPS"], dataframe["MR_County_FIPS"]))

    G = nx.DiGraph()
    G.add_nodes_from(nodes)
    G.add_edges_from(edges)
    return G

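`nx.DiGraph` Stores At Most One Edge Per Ordered Pair, So Repeated Rows Do Not Add Up In `buildGraph`. If The Number Of Rows Per Pair Should Count, One Option Is To Store It As An Edge Weight. The Sketch Below Assumes That Reading; `build_weighted_graph` And The Toy Table Are Hypothetical And Not Part Of The Notebook.

```python
import networkx as nx
import pandas as pd


def build_weighted_graph(dataframe):
    """Hypothetical Variant Of buildGraph That Keeps Pair Counts As Edge Weights."""
    # Count How Many Rows Share Each (Source, Destination) Pair
    counts = (
        dataframe.groupby(["OC_County_FIPS", "MR_County_FIPS"])
        .size()
        .reset_index(name="weight")
    )
    G = nx.DiGraph()
    # add_weighted_edges_from Also Creates Any Missing Nodes
    G.add_weighted_edges_from(counts.itertuples(index=False, name=None))
    return G


# Toy Table With One Repeated Pair (Values Are Made Up)
toy = pd.DataFrame({
    "OC_County_FIPS": [1001, 1001, 1003],
    "MR_County_FIPS": [1003, 1003, 1001],
})
G_toy = build_weighted_graph(toy)
```

The `salsa_algorithm` Function Ignores Edge Attributes, So It Would Need Its Own Change To Make Use Of These Weights.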
### Running The Social Network Analysis

This Cell Will Run The Analysis With The Created Graph.

- `salsa_algorithm` Runs SALSA With Convergence Criteria

- `process_data_and_run_algorithms` Runs HITS And Calls The SALSA Function.

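The `salsa_algorithm` Function In The Next Cell Sums Neighbor Scores And Then Normalizes The Totals. SALSA As Usually Described Additionally Divides Each Contribution By The Degree Of The Node It Comes From, Which Turns The Update Into A Random Walk. The Sketch Below Shows That Degree-Normalized Form For Comparison; It Is Not The Notebook's Method. The Name `salsa_degree_normalized`, The Edge-List Input, And The Toy Edges Are Assumptions For Illustration.

```python
from collections import defaultdict


def salsa_degree_normalized(edges, max_iter=100, tol=1.0e-6):
    """Sketch Of Degree-Normalized SALSA On An Iterable Of (Source, Target) Pairs."""
    edges = set(edges)  # Ignore Repeated Pairs, Like An Unweighted DiGraph
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = set(out_deg) | set(in_deg)

    # Start Uniform Over Nodes That Have At Least One Incoming Edge
    authorities = {n: (1.0 / len(in_deg) if n in in_deg else 0.0) for n in nodes}
    hubs = {n: 0.0 for n in nodes}

    for _ in range(max_iter):
        # Hub Step: Each Authority Splits Its Score Evenly Over Its In-Links
        new_hubs = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hubs[src] += authorities[dst] / in_deg[dst]

        # Authority Step: Each Hub Splits Its Score Evenly Over Its Out-Links
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += new_hubs[src] / out_deg[src]

        err = sum(abs(new_auth[n] - authorities[n]) for n in nodes)
        hubs, authorities = new_hubs, new_auth
        if err < tol:
            break

    return hubs, authorities


# Toy Edges (Made Up): A And B Point At C, And A Also Points At D
toy_hubs, toy_auth = salsa_degree_normalized([("A", "C"), ("B", "C"), ("A", "D")])
```

On The Toy Example, Whose Edges Form A Single Connected Piece, The Scores Settle Proportional To Degree: Authority Scores Follow In-Degree And Hub Scores Follow Out-Degree. Passing `G.edges()` From `buildGraph` Should Work As Input.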

In [6]:
# SALSA Algorithm (Convergence Criteria Can Be Changed)
def salsa_algorithm(G, max_iter=100, tol=1.0e-6):
    """
    Runs SALSA Social Network Analysis

    Args:
    G (nx.DiGraph): Directed Graph Of Pairs
    max_iter (int): Maximum Number Of Iterations
    tol (float): Tolerance Level For Convergence
    
    Returns:
    tuple: Two Dicts (Hubs, Authorities) Mapping Each Node To Its Score.
    """



    nodes = G.nodes()

    hubs = {node: 1 for node in nodes}
    authorities = {node: 1 for node in nodes}

    for _ in tqdm(range(max_iter), desc="Running SALSA"):  
        # Authority Update Step
        new_authorities = {}
        for node in nodes:
            new_authorities[node] = sum(
                hubs[neighbor] for neighbor in G.predecessors(node)
            )

        # Hub Update Step
        new_hubs = {}
        for node in nodes:
            new_hubs[node] = sum(
                authorities[neighbor] for neighbor in G.successors(node)
            )

        # Normalize (Fall Back To 1 When A Sum Is Zero, E.g. A Graph Without Edges)
        auth_sum = sum(new_authorities.values()) or 1
        hub_sum = sum(new_hubs.values()) or 1

        for node in nodes:
            new_authorities[node] = new_authorities[node] / auth_sum
            new_hubs[node] = new_hubs[node] / hub_sum

        # Check For Convergence
        err = sum(
            abs(new_authorities[node] - authorities[node]) for node in nodes
        ) + sum(abs(new_hubs[node] - hubs[node]) for node in nodes)

        # Keep The Latest Scores Before Testing, So The Converged Values Are Returned
        hubs, authorities = new_hubs, new_authorities
        if err < tol:
            break

    return hubs, authorities

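def process_data_and_run_algorithms(dataframe):
    """
    Builds The Graph, Runs SALSA And HITS, And Collects The Scores

    Args:
    dataframe (pd.DataFrame): Table Of Source - Destination Pairs

    Returns:
    pd.DataFrame: One Row Per County With Its SALSA And HITS Hub And Authority Scores.
    """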
    G = buildGraph(dataframe)
    salsa_hubs, salsa_authorities = salsa_algorithm(G)
    hits_hubs, hits_authorities = nx.hits(G, max_iter=100, tol=1.0e-6)
    nodes = list(G.nodes())
    salsa_hub_scores = [salsa_hubs[node] for node in nodes]
    salsa_authority_scores = [salsa_authorities[node] for node in nodes]
    hits_hub_scores = [hits_hubs[node] for node in nodes]
    hits_authority_scores = [hits_authorities[node] for node in nodes]

    dfScores = pd.DataFrame({
        "County": nodes,
        "SALSA_Hub_Score": salsa_hub_scores,
        "SALSA_Authority_Score": salsa_authority_scores,
        "HITS_Hub_Score": hits_hub_scores,
        "HITS_Authority_Score": hits_authority_scores,
    })
    return dfScores

# Example Usage With A Mock Dataframe:
df = pd.read_csv("mock_data.csv")
scores = process_data_and_run_algorithms(df)
scores.to_csv("scores.csv", index=False)


Running SALSA: 100%|██████████| 100/100 [00:00<00:00, 5384.22it/s]

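Once `scores.csv` Exists, The Counties With The Largest Hub And Authority Scores Can Be Listed Directly. The Sketch Below Shows One Way To Do That On A Toy Table With Made-Up Values; `top_counties` Is A Hypothetical Helper, Not Part Of The Notebook. Since Edges Run From `OC_County_FIPS` To `MR_County_FIPS`, A High Hub Score Marks A County That Often Appears As A Source; Which Of The Two Roles Counts As Importer Or Exporter Depends On What The Columns Mean, Which The Notebook Does Not State.

```python
import pandas as pd

# Toy Stand-In For The `scores` Table (Values Are Made Up For Illustration)
toy_scores = pd.DataFrame({
    "County": [1001, 1003, 1005],
    "SALSA_Hub_Score": [0.6, 0.3, 0.1],
    "SALSA_Authority_Score": [0.2, 0.1, 0.7],
})


def top_counties(score_table, column, n=10):
    """Hypothetical Helper: Return The n Rows With The Largest Value In `column`."""
    return score_table.nlargest(n, column)[["County", column]]


top_hubs = top_counties(toy_scores, "SALSA_Hub_Score", n=2)
top_authorities = top_counties(toy_scores, "SALSA_Authority_Score", n=2)
```

The Same Call Should Work On The Real `scores` Table, For Example With `"HITS_Hub_Score"` As The Column.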