In [1]:
import pandas as pd 
import numpy as np 
import zebra 
import os





## Partner and solver base data tables 

We need the following base tables:
- a separate table for the Solve partners
- a separate table for the individual solver teams
- a dict that maps countries to regions

Combining these into full tables lets us generate a dataset similar to the one the MIT Solve team built.
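The country-to-region dict could look something like the sketch below. The region names are the ones that appear in the tables in this notebook; the dict name, the helper `region_for`, and the country assignments are hypothetical and only illustrate the shape of the lookup.

```python
# Hypothetical country-to-region lookup (illustrative entries only).
REGION_BY_COUNTRY = {
    "India": "South Asia",
    "Canada": "US and Canada",
    "Kenya": "Sub-Saharan Africa",
    "Brazil": "Latin America and the Caribbean",
}


def region_for(country, default="Unknown"):
    """Return the region for a country, or `default` if it is not listed."""
    return REGION_BY_COUNTRY.get(country, default)


print(region_for("India"))
print(region_for("Atlantis"))
```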



## Partner data 

Partner data is one of the core sheets of the dataset and contains the partner preferences. We load the raw sheet into a dataframe, fill the NaN values with 0 using fillna, and then inspect the result. Only the last expression in a cell is displayed, so the output below is the type of the Org column rather than the head of the dataframe.

In [2]:
partners_df = zebra.csv_to_df("partner_data.csv")

# filled nan with 0 
partners_df = partners_df.fillna(0)
partners_df.head()
type(partners_df["Org"])

pandas.core.series.Series

## Solver team data
We apply the same procedure to the solver team data sheet: load it into a dataframe and fill the NaN values with 0.

In [3]:
solver_df = zebra.csv_to_df("solver_team_data.csv")

# filled nan with 0 
solver_df = solver_df.fillna(0)
solver_df.head()

Unnamed: 0,Org,Geo 1,Geo 2,Geo 3,Geo 4,Geo 5,Geo 6,Geo 7,Stage,Key Need 1,Key Need 2,Key Need 3,Key Need 4,Key Need 5,Key Need 6,Key Need 7,Challenge,Technology
0,AIR-INK: Air-Pollution to ink,South Asia,US and Canada,0,0,0.0,0.0,0.0,Growth,Technology,0,0,0,0,0,0,Circular Economy,Biomimicry; Internet of Things
1,Algramo-Catalyzing Reusable Packaging,Latin America and the Caribbean,0,0,0,0.0,0.0,0.0,Growth,Other,Business model,0,0,0,0,0,Circular Economy,Behavioral Technology; Big Data; Internet of T...
2,BioCellection,US and Canada,0,0,0,0.0,0.0,0.0,Pilot,Technology,Distribution,Legal,"Marketing, Media, and Exposure",0,0,0,Circular Economy,Biotechnology / Bioengineering
3,Mycotech,East and Southeast Asia,0,0,0,0.0,0.0,0.0,Pilot,Distribution,Business model,"Marketing, Media, and Exposure",0,0,0,0,Circular Economy,Materials Science
4,Queen of Raw,US and Canada,Europe and Central Asia,South Asia,East and Southeast Asia,0.0,0.0,0.0,Pilot,Distribution,Technology,Financial,0,0,0,0,Circular Economy,Artificial Intelligence / Machine Learning; Bi...


# Generate geo sheet using partner and solver data 


### Get geo choices for solvers


In [4]:
solver_geo = zebra.solver_geo_df(solver_df)
solver_geo.head()

Unnamed: 0,AIR-INK: Air-Pollution to ink,Algramo-Catalyzing Reusable Packaging,BioCellection,Mycotech,Queen of Raw,Renewal Workshop,Rheaply,Xilinat,Aira,Elpis Solar,...,OneSky Caregiver Training,Tabshoura Tiny Thinkers,Blue Sky Analytics,CareMother,change:WATER Labs' iThrone: a waste-shrinking toilet,Faircap Clean Water,OmniVis,RAAJI,Salauno: Eye care for all,Shape-Up
Geo 1,South Asia,Latin America and the Caribbean,US and Canada,East and Southeast Asia,US and Canada,US and Canada,US and Canada,Latin America and the Caribbean,US and Canada,Sub-Saharan Africa,...,East and Southeast Asia,Middle East and North Africa,South Asia,South Asia,Sub-Saharan Africa,Latin America and the Caribbean,South Asia,South Asia,Latin America and the Caribbean,US and Canada
Geo 2,US and Canada,0,0,0,Europe and Central Asia,0,0,0,0,Europe and Central Asia,...,0,0,0,0,0,Sub-Saharan Africa,0,0,0,0
Geo 3,0,0,0,0,South Asia,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Geo 4,0,0,0,0,East and Southeast Asia,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Geo 5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Geo choice for partners

Get geo choices from the partners dataframe. 


In [5]:
partners_geo = zebra.partner_geo_df(partners_df)
partners_geo.head()

Unnamed: 0,Org,Geo Interests
0,Aditya Birla Group,US and Canada
1,Aditya Birla Group,Europe and Central Asia
2,Aditya Birla Group,East and Southeast Asia
3,BMW Foundation Herbert Quandt,US and Canada
4,BMW Foundation Herbert Quandt,Latin America and the Caribbean


## Pivot table for geo choices

Now that we have the geo choices for both the partners and the solvers, we need to generate a contingency table. We will use a pivot table to combine the two sets of geo choices.

The first step is to convert the solver geo choices table to list form (a long table, not a Python list). As shown above, the partner geo choices are already in list form: the first column is Org and the second column is Geo Interests.

We need to convert the solver geo choices to a similar dataframe. This is done using the ```solver_regions_listform()``` function, which uses the pandas function melt to convert a contingency table to list form. This is the equivalent of unpivoting a pivot table.
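The unpivoting step can be sketched on a toy table with `DataFrame.melt`. This is only an illustration under assumed names (`wide`, `long_form`, `Solver A`, `Solver B`); it is not the code inside `solver_regions_listform()`, which also adds the extra columns shown in the output below.

```python
import pandas as pd

# Hypothetical wide table shaped like solver_geo: one column per solver,
# one row per geo slot. 0 marks an unused slot, as in the notebook.
wide = pd.DataFrame(
    {
        "Solver A": ["South Asia", "US and Canada"],
        "Solver B": ["Latin America and the Caribbean", 0],
    },
    index=["Geo 1", "Geo 2"],
)

# melt works on columns, so turn the slot labels into a regular column first.
long_form = (
    wide.rename_axis("geo")
    .reset_index()
    .melt(id_vars="geo", var_name="Org", value_name="match")
)

print(long_form)
```

Each (slot, solver) cell of the wide table becomes one row of the long table, so a 2 x 2 input yields four rows.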


In [6]:
unpivoted_solver_geo = zebra.solver_regions_listform(solver_geo, solver_df)
unpivoted_solver_geo.head(20)

Unnamed: 0,geo,Org,match,geo_match
0,1,AIR-INK: Air-Pollution to ink,South Asia,"1,South Asia"
1,2,AIR-INK: Air-Pollution to ink,US and Canada,"2,US and Canada"
2,3,AIR-INK: Air-Pollution to ink,0,30
3,4,AIR-INK: Air-Pollution to ink,0,40
4,5,AIR-INK: Air-Pollution to ink,0,50.0
5,6,AIR-INK: Air-Pollution to ink,0,60.0
6,7,AIR-INK: Air-Pollution to ink,0,70.0
7,1,Algramo-Catalyzing Reusable Packaging,Latin America and the Caribbean,"1,Latin America and the Caribbean"
8,2,Algramo-Catalyzing Reusable Packaging,0,20
9,3,Algramo-Catalyzing Reusable Packaging,0,30


Once we have the geo preferences for both solvers and partners, we generate a pivot table using the ```  pivot_table_geo()``` function. This function uses an outer join to merge the solver geo and partner geo tables into a single table. This table should be similar to the geo_match table from the excel_to_csv folder.

We also have the option of exporting the pivot table to a csv file. We set it to False for now since we already exported the table the first time around. The head of the dataframe is also printed so you can look at some of the elements and compare them.
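As a rough sketch of the join-then-pivot idea, the toy example below merges two long tables on the region and counts shared regions per pair. The frames, organisation names, and the final reindex are assumptions made for illustration; the actual internals of `pivot_table_geo()` may differ.

```python
import pandas as pd

# Hypothetical long-form inputs shaped like the two tables above.
solvers = pd.DataFrame(
    {
        "Org": ["Solver A", "Solver A", "Solver B"],
        "match": ["South Asia", "US and Canada", "Sub-Saharan Africa"],
    }
)
partners = pd.DataFrame(
    {
        "Org": ["Partner X", "Partner X", "Partner Y"],
        "Geo Interests": ["South Asia", "US and Canada", "Europe and Central Asia"],
    }
)

# Outer join on region. Both frames have an "Org" column, so pandas adds
# suffixes: Org_x is the solver (left), Org_y is the partner (right).
merged = solvers.merge(
    partners, left_on="match", right_on="Geo Interests", how="outer"
)

# Count shared regions per (partner, solver) pair. Rows without a counterpart
# have NaN in Org_x or Org_y and do not contribute to the counts.
counts = pd.crosstab(merged["Org_y"], merged["Org_x"])

# Reindex so organisations without any match still show up as zeros.
counts = counts.reindex(
    index=partners["Org"].unique(),
    columns=solvers["Org"].unique(),
    fill_value=0,
)

print(counts)
```

A cell value above 1 means the pair shares more than one region, which is how a value such as 2 can appear in the pivot table below.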


In [7]:
_,geo_pivot_copy = zebra.pivot_table_geo(unpivoted_solver_geo,
                                         partners_geo,
                                         export=False)
geo_pivot_copy.head()

Org_x,AIR-INK: Air-Pollution to ink,Aira,Algramo-Catalyzing Reusable Packaging,BioCellection,Blue Sky Analytics,CareMother,Dost Education,EarlyBird,Elpis Solar,Faircap Clean Water,...,RevelaGov,Rheaply,Salauno: Eye care for all,Shape-Up,Supercivicos app,Tabshoura Tiny Thinkers,The Future is Offline,WheeLog!,Xilinat,change:WATER Labs' iThrone: a waste-shrinking toilet
Org_y,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Aditya Birla Group,1,1,0,1,0,0,0,1,1,0,...,0,1,0,1,0,0,0,1,0,0
BMW Foundation Herbert Quandt,1,1,1,1,0,0,0,1,2,2,...,1,1,1,1,1,1,1,1,1,1
Buenos Aires Innovation Park (City of Buenos Aires Government),0,0,1,0,0,0,0,0,0,1,...,1,0,1,0,1,0,0,0,1,0
C L Sandberg & Associates,1,1,0,1,0,0,0,1,0,0,...,0,1,0,1,0,0,0,0,0,0
Capital One,1,1,0,1,0,0,0,1,0,0,...,0,1,0,1,0,0,0,0,0,0


The geographical preference feature takes a little more work because each solver can list multiple regions and so can each partner. The needs match is a similar feature: it also allows multiple options for a single solver or partner.


# Needs match sheet

We follow the same process for generating the needs_match sheet as we did for geo preferences:

1) Read in the solver pivot table <br>
2) Melt it to get the list form <br>
3) Read in the partner pivot table <br>
4) Merge both tables into another pivot table <br>


It's exactly the same game we played above, so we can write it down concisely.
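Because the steps repeat for every feature, they could be wrapped in one reusable function. The helper below, `match_counts`, is hypothetical and not part of zebra. It also swaps in an inner join followed by a reindex, which is a different route to the same kind of count table than the outer join used in this notebook.

```python
import pandas as pd


def match_counts(solver_long, partner_long, solver_col, partner_col):
    """Count shared values per (partner, solver) pair.

    Hypothetical helper for illustration; both inputs need an "Org" column.
    """
    merged = solver_long.merge(
        partner_long,
        left_on=solver_col,
        right_on=partner_col,
        suffixes=("_solver", "_partner"),
    )
    counts = pd.crosstab(merged["Org_partner"], merged["Org_solver"])
    # Bring back organisations that had no match at all.
    return counts.reindex(
        index=partner_long["Org"].unique(),
        columns=solver_long["Org"].unique(),
        fill_value=0,
    )


# Toy inputs with made-up organisations.
solver_needs = pd.DataFrame(
    {
        "Org": ["Solver A", "Solver A", "Solver B"],
        "match": ["Technology", "Legal", "Financial"],
    }
)
partner_needs = pd.DataFrame(
    {
        "Org": ["Partner X", "Partner X", "Partner Y"],
        "Needs": ["Technology", "Legal", "Distribution"],
    }
)

needs_counts = match_counts(solver_needs, partner_needs, "match", "Needs")
print(needs_counts)
```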


In [8]:
partners_needs = zebra.get_partners_needs(partners_df)
partners_needs.head()

Unnamed: 0,Org,Needs
0,Aditya Birla Group,Business model
1,Aditya Birla Group,Distribution
2,Aditya Birla Group,Financial
3,Aditya Birla Group,Legal or Regulatory Matters
4,BMW Foundation Herbert Quandt,Other


In [9]:
unpivoted_solver_needs = zebra.get_solver_needs(solver_df)
unpivoted_solver_needs.head()

Unnamed: 0,key_needs,Org,match,needs_match
0,1,AIR-INK: Air-Pollution to ink,Technology,"1,Technology"
1,2,AIR-INK: Air-Pollution to ink,,20
2,3,AIR-INK: Air-Pollution to ink,,30
3,4,AIR-INK: Air-Pollution to ink,,40
4,5,AIR-INK: Air-Pollution to ink,,50


In [18]:
needs_values, needs_pivot_copy = zebra.pivot_table_needs(unpivoted_solver_needs,
                                                         partners_needs,
                                                         export=False)
needs_pivot_copy.head()

Org_x,AIR-INK: Air-Pollution to ink,Aira,Algramo-Catalyzing Reusable Packaging,BioCellection,Blue Sky Analytics,CareMother,Dost Education,EarlyBird,Elpis Solar,Faircap Clean Water,...,RevelaGov,Rheaply,Salauno: Eye care for all,Shape-Up,Supercivicos app,Tabshoura Tiny Thinkers,The Future is Offline,WheeLog!,Xilinat,change:WATER Labs' iThrone: a waste-shrinking toilet
Org_y,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Aditya Birla Group,0,1,1,1,1,2,1,0,2,2,...,2,1,2,1,2,1,2,3,2,2
BMW Foundation Herbert Quandt,0,0,1,0,0,1,0,0,1,0,...,0,1,0,0,0,0,0,0,1,0
Buenos Aires Innovation Park (City of Buenos Aires Government),0,0,0,1,1,1,0,1,0,0,...,1,0,1,0,0,0,0,0,0,0
C L Sandberg & Associates,1,0,1,2,2,2,2,0,2,3,...,3,1,2,1,3,1,2,3,2,2
Capital One,1,0,1,3,3,3,4,2,2,3,...,4,1,4,1,3,1,3,3,3,2


# Generate challenges match sheet data

Matching the challenges column is a bit easier since every solver comes through a single challenge, even if partners are willing to judge or help solvers from multiple challenges. Here too we follow the same process:

1) Get partner challenges data <br>
2) Get solvers challenges data <br>
3) Generate a pivot table combining the partner and solvers data <br>


In [10]:
ch_partners_challenges = zebra.get_ch_partners(partners_df)
ch_partners_challenges.head()

Unnamed: 0,Org,Challenge
0,Aditya Birla Group,Community-Driven Innovation
1,Aditya Birla Group,Circular Economy
2,BMW Foundation Herbert Quandt,Community-Driven Innovation
3,BMW Foundation Herbert Quandt,Healthy Cities
4,BMW Foundation Herbert Quandt,Circular Economy


In [11]:
ch_solver = zebra.get_ch_solvers(solver_df)
ch_solver.head()

Unnamed: 0,Org,Challenge
0,AIR-INK: Air-Pollution to ink,Circular Economy
1,Algramo-Catalyzing Reusable Packaging,Circular Economy
2,BioCellection,Circular Economy
3,Mycotech,Circular Economy
4,Queen of Raw,Circular Economy


In [12]:
challenges_pivot, challenges_pivot_copy = zebra.pivot_table_challenges(ch_solver, ch_partners_challenges)
challenges_pivot_copy.head()

Org_x,AIR-INK: Air-Pollution to ink,Aira,Algramo-Catalyzing Reusable Packaging,BioCellection,Blue Sky Analytics,CareMother,Dost Education,EarlyBird,Elpis Solar,Faircap Clean Water,...,RevelaGov,Rheaply,Salauno: Eye care for all,Shape-Up,Supercivicos app,Tabshoura Tiny Thinkers,The Future is Offline,WheeLog!,Xilinat,change:WATER Labs' iThrone: a waste-shrinking toilet
Org_y,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Aditya Birla Group,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
BMW Foundation Herbert Quandt,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
Buenos Aires Innovation Park (City of Buenos Aires Government),1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
C L Sandberg & Associates,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
Capital One,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1


# Generate stage match sheet 

This is the same game as the challenges sheet. Again, each solver has only one option for stage:

1) Get partner stage data <br>
2) Get solvers stage data <br>
3) Generate a pivot table combining the partner and solvers data <br>

In [13]:
st_partners = zebra.get_st_partners(partners_df)
st_partners.head()

Unnamed: 0,Org,Stage
0,Aditya Birla Group,Growth
1,Aditya Birla Group,Scale
2,BMW Foundation Herbert Quandt,Scale
3,Buenos Aires Innovation Park (City of Buenos A...,Prototype
4,C L Sandberg & Associates,Prototype


In [14]:
st_solver = zebra.get_st_solver(solver_df)
st_solver.head()

Unnamed: 0,Org,Stage
0,AIR-INK: Air-Pollution to ink,Growth
1,Algramo-Catalyzing Reusable Packaging,Growth
2,BioCellection,Pilot
3,Mycotech,Pilot
4,Queen of Raw,Pilot


In [15]:
_,stage_pivot_copy = zebra.pivot_table_stage(st_solver, st_partners)
stage_pivot_copy.head()

Org_x,AIR-INK: Air-Pollution to ink,Aira,Algramo-Catalyzing Reusable Packaging,BioCellection,Blue Sky Analytics,CareMother,Dost Education,EarlyBird,Elpis Solar,Faircap Clean Water,...,RevelaGov,Rheaply,Salauno: Eye care for all,Shape-Up,Supercivicos app,Tabshoura Tiny Thinkers,The Future is Offline,WheeLog!,Xilinat,change:WATER Labs' iThrone: a waste-shrinking toilet
Org_y,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Aditya Birla Group,1,1,1,0,0,1,0,0,0,0,...,0,1,1,0,0,1,1,0,0,0
BMW Foundation Herbert Quandt,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Buenos Aires Innovation Park (City of Buenos Aires Government),0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,1
C L Sandberg & Associates,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,1
Capital One,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1


# Combining all the answers 

Once we have all the sheets, we combine them using the same heuristic calculation MIT did. The result is stored in the variable ``` total_score ```. Finally, we write the result to an Excel file.


In [19]:
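# Heuristic: geo * stage in the hundreds, challenges in the tens, needs in the ones
geo_stage = geo_pivot_copy.astype(int) * stage_pivot_copy.astype(int)
total_score = geo_stage * 100 + challenges_pivot_copy.astype(int) * 10 + needs_pivot_copy
total_score.to_excel("total_score.xlsx")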
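The weighting can be checked by hand on toy tables. The sketch below applies the same formula to made-up 2 x 2 match tables; the organisation names, the values, and the `table` helper are assumptions for illustration only.

```python
import pandas as pd

solvers = ["Solver A", "Solver B"]
partners = ["Partner X", "Partner Y"]


def table(values):
    # Helper for this sketch only: partners as rows, solvers as columns.
    return pd.DataFrame(values, index=partners, columns=solvers)


geo = table([[2, 0], [1, 1]])
stage = table([[1, 1], [0, 1]])
challenges = table([[1, 1], [1, 0]])
needs = table([[3, 2], [0, 1]])

# Same formula as total_score: geo * stage * 100 + challenges * 10 + needs.
score = geo * stage * 100 + challenges * 10 + needs
print(score)
```

For Partner X and Solver A this gives 2 * 1 * 100 + 1 * 10 + 3 = 213. Because geo and stage are multiplied, a pair only reaches the hundreds when both match. Provided each count stays below ten, the three parts of the score can be read off as separate digits.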