# 🇺🇸 Senate Elections Forecast by ZIP Codes
______________________________
#### This is a demo notebook that shows how to use **[Upgini](https://github.com/upgini)** for a real use case
______________________________
## Train dataset

https://github.com/upgini/upgini/raw/main/notebooks/senate_elections/us_zip_elections.parquet



**Description**:
1. Senate election results for 2016, 2018, 2020 and 2022, by county
2. The target variable is which party gets more votes in the county: 1 - GOP, 0 - DEM
3. We use the results of the previous Senate election as features:
   * *last_result* - difference between the number of votes for the GOP and DEM candidates in the previous Senate election in the county
   * *last_result_party* - which party got more votes in the county in the previous election: 1 - GOP, 0 - DEM
   * *last_result_share* - difference between the number of votes for the GOP and DEM candidates in the previous Senate election in the county, divided by the county population
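
The three features above can be derived from raw county vote counts. Below is a minimal pandas sketch on toy numbers, assuming hypothetical `gop_votes` and `dem_votes` columns that are not part of the published dataset; the target variable would be built the same way from the current election's counts.

```python
import pandas as pd

# Toy previous-election results per county; gop_votes/dem_votes are hypothetical columns
prev = pd.DataFrame({
    "county": ["A", "B", "C"],
    "gop_votes": [12000, 8000, 5000],
    "dem_votes": [9000, 11000, 5500],
    "population_county": [60000, 50000, 20000],
})

# last_result: GOP minus DEM votes in the previous Senate election
prev["last_result"] = prev["gop_votes"] - prev["dem_votes"]
# last_result_party: 1 if GOP got more votes, else 0 (ties map to 0 here, an assumption)
prev["last_result_party"] = (prev["last_result"] > 0).astype(int)
# last_result_share: the vote margin divided by the county population
prev["last_result_share"] = prev["last_result"] / prev["population_county"]

print(prev[["county", "last_result", "last_result_party", "last_result_share"]])
```

Dividing by population makes margins comparable between large and small counties, which is presumably why `last_result_share` ranks above the raw margin in the SHAP table later on.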


In [None]:
%pip install -Uq upgini
import pandas as pd


In [None]:
df_path = "https://github.com/upgini/upgini/raw/main/notebooks/senate_elections/us_zip_elections.parquet"
df = pd.read_parquet(df_path)
# Keep the search keys (country, postal_code, election_date), the baseline features and the target
df = df[['country', 'postal_code', 'population', 'density', 'last_result', 'population_county', 'last_result_share', 'election_date', 'last_result_party', 'target_result_party']].copy()

In [None]:
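# Train on the results of earlier elections (2016-2020)
# and check prediction quality on the results of the 2022 elections
train = df.loc[df['election_date'] < '2022-11-08']
# Keep only the most recent pre-2022 election for each ZIP code (sort first, since keep='last' follows row order)
train = train.sort_values('election_date').drop_duplicates(subset=['postal_code'], keep='last')
test = df.loc[df['election_date'] == '2022-11-08']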
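
The split keeps every election before 2022-11-08 for training and then one row per ZIP code. Note that `drop_duplicates(keep='last')` keeps the last row in the current row order, not necessarily the latest election. The toy sketch below (made-up rows, not the real dataset) shows why sorting by date first matters.

```python
import pandas as pd

# Made-up rows (toy data, not from the real dataset): two ZIP codes across several elections
toy = pd.DataFrame({
    "postal_code": ["10001", "10001", "99501", "10001", "99501"],
    "election_date": ["2020-11-03", "2016-11-08", "2018-11-06", "2022-11-08", "2022-11-08"],
    "target_result_party": [0, 0, 1, 0, 1],
})

# Chronological split: ISO date strings compare correctly as text
toy_train = toy.loc[toy["election_date"] < "2022-11-08"]
toy_test = toy.loc[toy["election_date"] == "2022-11-08"]

# keep='last' follows row order, so sort by date to keep each ZIP code's latest pre-2022 election
latest = toy_train.sort_values("election_date").drop_duplicates(subset=["postal_code"], keep="last")
# Without sorting, 10001 keeps its 2016 row because that row comes last in toy_train
unsorted = toy_train.drop_duplicates(subset=["postal_code"], keep="last")

print(latest)
print(unsorted)
```

If the real parquet file is already ordered by date, both variants give the same result; sorting simply makes the intent independent of the file's row order.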

In [None]:
from upgini import FeaturesEnricher, SearchKey
from upgini.metadata import CVType

# Map dataset columns to Upgini search keys and use time-series cross-validation,
# since the rows come from elections in different years
enricher = FeaturesEnricher(
    search_keys={
        'country': SearchKey.COUNTRY,
        'postal_code': SearchKey.POSTAL_CODE,
        'election_date': SearchKey.DATE,
    },
    cv=CVType.time_series,
)
# Search for relevant external features and report metrics on train and on the 2022 eval set
enricher.fit(train.drop(['target_result_party'], axis=1), train.target_result_party,
             eval_set=[(test.drop(['target_result_party'], axis=1), test.target_result_party)],
             calculate_metrics=True)


Detected task type: ModelTaskType.BINARY



| Column name | Status | Errors |
|---|---|---|
| country | All valid | - |
| postal_code | All valid | - |
| target | All valid | - |
| election_date | All valid | - |



Running search request, search_id=7d134c87-22c5-4dba-b4dc-906ad7b9e0f1
We'll send email notification once it's completed, just use your personal api_key from profile.upgini.com
Done

[92m[1m
42 relevant feature(s) found with the search keys: ['country', 'postal_code', 'election_date'][0m


| Provider | Source | Feature name | SHAP value | Coverage % | Type | Feature type |
|---|---|---|---|---|---|---|
| | | last_result_share | 0.424737 | 100.0 | numerical | |
| | | last_result | 0.147414 | 100.0 | numerical | |
| | | population_county | 0.095969 | 100.0 | numerical | |
| Upgini | Public/Comm. shared | f_location_country_postal_latitude_2e1eae46 | 0.041621 | 100.0 | numerical | Free |
| Upgini | Public/Comm. shared | f_weather_country_date_postal_daylight_time_bea3cf0a | 0.0376 | 21.44474 | numerical | Free |
| Upgini | Public/Comm. shared | f_location_country_postal_longitude_585c92dc | 0.03256 | 100.0 | numerical | Free |
| Upgini | Public/Comm. shared | f_weather_country_date_postal_prcp_4d9ed1e1 | 0.020857 | 99.771653 | numerical | Free |
| Upgini | Public/Comm. shared | f_weather_country_date_postal_delta_to_avg_snow_8bd64d2e | 0.016698 | 97.970692 | numerical | Free |
| Upgini | Public/Comm. shared | f_weather_country_date_postal_tobs_210f1c58 | 0.012317 | 91.723919 | numerical | Free |
| Upgini | Public/Comm. shared | f_weather_country_date_postal_daylight_time_a79bd1f1 | 0.011364 | 78.55526 | numerical | Free |
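
As a rough gauge of how much the external data contributes, the SHAP values in the table can be summed by origin. This sketch uses only the ten rows displayed, so it leaves out the other relevant features that the table does not show; summing SHAP values this way is an illustrative shortcut, not an Upgini report.

```python
# SHAP values copied from the ten displayed rows of the feature table
own_features = {
    "last_result_share": 0.424737,
    "last_result": 0.147414,
    "population_county": 0.095969,
}
upgini_features = {
    "f_location_country_postal_latitude_2e1eae46": 0.041621,
    "f_weather_country_date_postal_daylight_time_bea3cf0a": 0.0376,
    "f_location_country_postal_longitude_585c92dc": 0.03256,
    "f_weather_country_date_postal_prcp_4d9ed1e1": 0.020857,
    "f_weather_country_date_postal_delta_to_avg_snow_8bd64d2e": 0.016698,
    "f_weather_country_date_postal_tobs_210f1c58": 0.012317,
    "f_weather_country_date_postal_daylight_time_a79bd1f1": 0.011364,
}

# Sum SHAP values by origin and compute the external share of the displayed total
own_total = sum(own_features.values())
upgini_total = sum(upgini_features.values())
share = upgini_total / (own_total + upgini_total)
print(f"own: {own_total:.6f}, Upgini: {upgini_total:.6f}, Upgini share: {share:.1%}")
```

On the displayed rows, the seven Upgini features account for roughly a fifth of the summed SHAP values.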


Calculating metrics...
Done
[92m[1m
Quality metrics[0m


| | Match rate | Baseline roc_auc | Enriched roc_auc | Uplift |
|---|---|---|---|---|
| Train | 100.0 | 0.915854 | 0.922462 | 0.006608 |
| Eval 1 | 100.0 | 0.950074 | 0.956784 | 0.00671 |
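
Uplift in the table is the enriched ROC AUC minus the baseline ROC AUC. ROC AUC is the probability that a randomly chosen positive example (GOP, 1) gets a higher score than a randomly chosen negative one (DEM, 0), with ties counted as half. A dependency-free sketch on made-up scores; this is an illustration of the metric, not how Upgini computes it internally.

```python
from itertools import product


def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Made-up labels and scores from a hypothetical baseline model and a hypothetical enriched model
y = [1, 0, 1, 1, 0, 0]
baseline_scores = [0.9, 0.6, 0.5, 0.7, 0.4, 0.55]
enriched_scores = [0.9, 0.5, 0.6, 0.7, 0.4, 0.55]

baseline = roc_auc(y, baseline_scores)
enriched = roc_auc(y, enriched_scores)
print(f"baseline={baseline:.4f}, enriched={enriched:.4f}, uplift={enriched - baseline:.4f}")

# The same subtraction reproduces the Uplift column of the metrics table
print(round(0.922462 - 0.915854, 6), round(0.956784 - 0.950074, 6))
```

The pairwise loop is O(positives × negatives), which is fine for a toy example; for the full dataset a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.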


In [None]:
# Enrich the 2022 test set with the external features selected during fit
df2 = enricher.transform(test)

| Column name | Status | Errors |
|---|---|---|
| country | All valid | - |
| postal_code | All valid | - |
| election_date | All valid | - |



Running search request, search_id=f18ec22f-77bf-4213-ab01-7b124972823d
We'll send email notification once it's completed, just use your personal api_key from profile.upgini.com
Done

Retrieving selected features from data sources...
Done


In [None]:
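# Compare the mean of an Upgini weather feature (judging by its name, precipitation relative to its average) between DEM (0) and GOP (1) ZIP codes
df2.groupby(['target_result_party'])['f_weather_country_date_postal_delta_to_avg_prcp_69cb1eca'].mean()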

target_result_party
0     9.072786
1    12.687661
Name: f_weather_country_date_postal_delta_to_avg_prcp_69cb1eca, dtype: float64
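
The single-feature check above can be extended to all external features at once. A sketch on a made-up frame, assuming (based on the feature names in this notebook) that Upgini feature columns start with `f_` and are numeric; the `f_*` names below are hypothetical, and the standardized gap is an illustrative choice, not an Upgini metric.

```python
import pandas as pd

# Made-up enriched frame; the f_* column names are hypothetical stand-ins for Upgini features
toy = pd.DataFrame({
    "target_result_party": [0, 0, 1, 1, 1],
    "last_result_share": [-0.05, -0.02, 0.04, 0.06, 0.01],
    "f_weather_prcp": [8.0, 10.0, 12.0, 14.0, 13.0],
    "f_location_latitude": [40.0, 42.0, 35.0, 33.0, 37.0],
})

# Pick the external feature columns by their "f_" prefix
feature_cols = [c for c in toy.columns if c.startswith("f_")]

# Mean of each feature per class (0 = DEM, 1 = GOP), as in the single-feature cell
class_means = toy.groupby("target_result_party")[feature_cols].mean()

# Scale-free gap: difference of class means divided by the feature's overall standard deviation
gap = (class_means.loc[1] - class_means.loc[0]) / toy[feature_cols].std()
gap = gap.sort_values(key=abs, ascending=False)

print(class_means)
print(gap)
```

Running the same lines on the real `df2` instead of `toy` would rank the enriched features by how differently they are distributed across DEM and GOP ZIP codes. Dividing by the standard deviation keeps features with different units (millimetres of rain, degrees of latitude) comparable.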

## Conclusion

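1. The results of the previous Senate election are the most powerful features
2. County population also has strong predictive power
3. Dozens of external features from Upgini, such as ZIP code coordinates and local weather, have notable SHAP values and lift ROC AUC on the 2022 elections from 0.950 to 0.957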
______________________________
Thanks for reading! If you found this useful or interesting, please share with a friend.
______________________________
## 🔗 Useful links
* Upgini Library [Documentation](https://github.com/upgini/upgini#readme)
* More [Notebooks and Guides](https://github.com/upgini/upgini#briefcase-use-cases)
* [Feature importance](https://github.com/upgini/upgini#5--evaluate-feature-importances-shap-values-from-the-search-result) in Upgini
* Kaggle public [Notebooks](https://www.kaggle.com/romaupgini/code)


<sup>😔 Found a typo or a bug in a code snippet? Our bad! <a href="https://github.com/upgini/upgini/issues/new?assignees=&title=readme%2Fbug">
Please report it here.</a></sup>