# Starbucks locations
### Cross-referencing with company names

In this notebook we tackle the following criterion:
- Executives like Starbucks A LOT. Ensure there's a Starbucks not too far.

To do this, we downloaded a dataset of Starbucks locations from Kaggle (https://www.kaggle.com/starbucks/store-locations). We are going to cross-reference the Starbucks locations with our shortlist of companies.

We can import the sb dataframe first, and try a bit of visualization.



In [2]:
import pandas as pd
sb = pd.read_csv("../input/starbuckslocations.csv")

In [3]:
sb.head()

Unnamed: 0,Brand,Store Number,Store Name,Ownership Type,Street Address,City,State/Province,Country,Postcode,Phone Number,Timezone,Longitude,Latitude
0,Starbucks,47370-257954,"Meritxell, 96",Licensed,"Av. Meritxell, 96",Andorra la Vella,7,AD,AD500,376818720.0,GMT+1:00 Europe/Andorra,1.53,42.51
1,Starbucks,22331-212325,Ajman Drive Thru,Licensed,"1 Street 69, Al Jarf",Ajman,AJ,AE,,,GMT+04:00 Asia/Dubai,55.47,25.42
2,Starbucks,47089-256771,Dana Mall,Licensed,Sheikh Khalifa Bin Zayed St.,Ajman,AJ,AE,,,GMT+04:00 Asia/Dubai,55.47,25.39
3,Starbucks,22126-218024,Twofour 54,Licensed,Al Salam Street,Abu Dhabi,AZ,AE,,,GMT+04:00 Asia/Dubai,54.38,24.48
4,Starbucks,17127-178586,Al Ain Tower,Licensed,"Khaldiya Area, Abu Dhabi Island",Abu Dhabi,AZ,AE,,,GMT+04:00 Asia/Dubai,54.54,24.51


In [4]:
import geopandas as gpd
import pandas as pd
from cartoframes.viz import Map, Layer
from cartoframes.viz.helpers import size_continuous_layer
from cartoframes.viz.widgets import histogram_widget
df = pd.read_csv('../output/gaming-designcompanies1M.csv')
comps = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude))

In [5]:
sbs = gpd.GeoDataFrame(sb, geometry=gpd.points_from_xy(sb['Longitude'], sb['Latitude']))
sbs.dropna(subset=['geometry'], inplace=True)
sbs.head()

Unnamed: 0,Brand,Store Number,Store Name,Ownership Type,Street Address,City,State/Province,Country,Postcode,Phone Number,Timezone,Longitude,Latitude,geometry
0,Starbucks,47370-257954,"Meritxell, 96",Licensed,"Av. Meritxell, 96",Andorra la Vella,7,AD,AD500,376818720.0,GMT+1:00 Europe/Andorra,1.53,42.51,POINT (1.53000 42.51000)
1,Starbucks,22331-212325,Ajman Drive Thru,Licensed,"1 Street 69, Al Jarf",Ajman,AJ,AE,,,GMT+04:00 Asia/Dubai,55.47,25.42,POINT (55.47000 25.42000)
2,Starbucks,47089-256771,Dana Mall,Licensed,Sheikh Khalifa Bin Zayed St.,Ajman,AJ,AE,,,GMT+04:00 Asia/Dubai,55.47,25.39,POINT (55.47000 25.39000)
3,Starbucks,22126-218024,Twofour 54,Licensed,Al Salam Street,Abu Dhabi,AZ,AE,,,GMT+04:00 Asia/Dubai,54.38,24.48,POINT (54.38000 24.48000)
4,Starbucks,17127-178586,Al Ain Tower,Licensed,"Khaldiya Area, Abu Dhabi Island",Abu Dhabi,AZ,AE,,,GMT+04:00 Asia/Dubai,54.54,24.51,POINT (54.54000 24.51000)


In [103]:
Map(Layer(sbs[['geometry']][:1000])) # Plot only the first 1000 stores; the error seems to be related to too many Starbucks being listed

In [7]:
# Map(Layer(comps,'color:red'))

Unfortunately, the visualization throws errors, so we will instead convert the Starbucks coordinates into GeoJSON points that we can use to query the MongoDB database.

#### We use the same function as before, modified slightly.

In [22]:
import math

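def asGeoJSON(lat,lng):
    """Return a GeoJSON Point for (lat, lng), or None if the values are missing or invalid."""
    try:
        lat = float(lat)
        lng = float(lng)
    except (TypeError, ValueError):
        print("Invalid data")
        return None
    if math.isnan(lat) or math.isnan(lng):
        return None
    # GeoJSON expects longitude first, then latitude
    return {
        "type":"Point",
        "coordinates":[lng,lat]
    }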

sbs["location"] = sbs[["Latitude","Longitude"]].apply(lambda x:asGeoJSON(x.Latitude,x.Longitude), axis=1)
# dropna returns a new dataframe, so assign the result back to keep the change
sbs = sbs.dropna(subset=['location'])
sbs

Unnamed: 0,Brand,Store Number,Store Name,Ownership Type,Street Address,City,State/Province,Country,Postcode,Phone Number,Timezone,Longitude,Latitude,geometry,location
0,Starbucks,47370-257954,"Meritxell, 96",Licensed,"Av. Meritxell, 96",Andorra la Vella,7,AD,AD500,376818720,GMT+1:00 Europe/Andorra,1.53,42.51,POINT (1.53000 42.51000),"{'type': 'Point', 'coordinates': [1.53, 42.51]}"
1,Starbucks,22331-212325,Ajman Drive Thru,Licensed,"1 Street 69, Al Jarf",Ajman,AJ,AE,,,GMT+04:00 Asia/Dubai,55.47,25.42,POINT (55.47000 25.42000),"{'type': 'Point', 'coordinates': [55.47, 25.42]}"
2,Starbucks,47089-256771,Dana Mall,Licensed,Sheikh Khalifa Bin Zayed St.,Ajman,AJ,AE,,,GMT+04:00 Asia/Dubai,55.47,25.39,POINT (55.47000 25.39000),"{'type': 'Point', 'coordinates': [55.47, 25.39]}"
3,Starbucks,22126-218024,Twofour 54,Licensed,Al Salam Street,Abu Dhabi,AZ,AE,,,GMT+04:00 Asia/Dubai,54.38,24.48,POINT (54.38000 24.48000),"{'type': 'Point', 'coordinates': [54.38, 24.48]}"
4,Starbucks,17127-178586,Al Ain Tower,Licensed,"Khaldiya Area, Abu Dhabi Island",Abu Dhabi,AZ,AE,,,GMT+04:00 Asia/Dubai,54.54,24.51,POINT (54.54000 24.51000),"{'type': 'Point', 'coordinates': [54.54, 24.51]}"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25595,Starbucks,21401-212072,Rex,Licensed,"141 Nguyễn Huệ, Quận 1, Góc đường Pasteur và L...",Thành Phố Hồ Chí Minh,SG,VN,70000,08 3824 4668,GMT+000000 Asia/Saigon,106.70,10.78,POINT (106.70000 10.78000),"{'type': 'Point', 'coordinates': [106.7, 10.78]}"
25596,Starbucks,24010-226985,Panorama,Licensed,"SN-44, Tòa Nhà Panorama, 208 Trần Văn Trà, Quận 7",Thành Phố Hồ Chí Minh,SG,VN,70000,08 5413 8292,GMT+000000 Asia/Saigon,106.71,10.72,POINT (106.71000 10.72000),"{'type': 'Point', 'coordinates': [106.71, 10.72]}"
25597,Starbucks,47608-253804,Rosebank Mall,Licensed,"Cnr Tyrwhitt and Cradock Avenue, Rosebank",Johannesburg,GT,ZA,2194,27873500159,GMT+000000 Africa/Johannesburg,28.04,-26.15,POINT (28.04000 -26.15000),"{'type': 'Point', 'coordinates': [28.04, -26.15]}"
25598,Starbucks,47640-253809,Menlyn Maine,Licensed,"Shop 61B, Central Square, Cnr Aramist & Coroba...",Menlyn,GT,ZA,181,,GMT+000000 Africa/Johannesburg,28.28,-25.79,POINT (28.28000 -25.79000),"{'type': 'Point', 'coordinates': [28.28, -25.79]}"


In [23]:
sbs.head() # The location column is now part of the dataframe, ready to be used in the queries below

Unnamed: 0,Brand,Store Number,Store Name,Ownership Type,Street Address,City,State/Province,Country,Postcode,Phone Number,Timezone,Longitude,Latitude,geometry,location
0,Starbucks,47370-257954,"Meritxell, 96",Licensed,"Av. Meritxell, 96",Andorra la Vella,7,AD,AD500,376818720.0,GMT+1:00 Europe/Andorra,1.53,42.51,POINT (1.53000 42.51000),"{'type': 'Point', 'coordinates': [1.53, 42.51]}"
1,Starbucks,22331-212325,Ajman Drive Thru,Licensed,"1 Street 69, Al Jarf",Ajman,AJ,AE,,,GMT+04:00 Asia/Dubai,55.47,25.42,POINT (55.47000 25.42000),"{'type': 'Point', 'coordinates': [55.47, 25.42]}"
2,Starbucks,47089-256771,Dana Mall,Licensed,Sheikh Khalifa Bin Zayed St.,Ajman,AJ,AE,,,GMT+04:00 Asia/Dubai,55.47,25.39,POINT (55.47000 25.39000),"{'type': 'Point', 'coordinates': [55.47, 25.39]}"
3,Starbucks,22126-218024,Twofour 54,Licensed,Al Salam Street,Abu Dhabi,AZ,AE,,,GMT+04:00 Asia/Dubai,54.38,24.48,POINT (54.38000 24.48000),"{'type': 'Point', 'coordinates': [54.38, 24.48]}"
4,Starbucks,17127-178586,Al Ain Tower,Licensed,"Khaldiya Area, Abu Dhabi Island",Abu Dhabi,AZ,AE,,,GMT+04:00 Asia/Dubai,54.54,24.51,POINT (54.54000 24.51000),"{'type': 'Point', 'coordinates': [54.54, 24.51]}"
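
GeoJSON stores positions as `[longitude, latitude]`, the reverse of the `(lat, lng)` argument order of `asGeoJSON`, so a swapped pair is an easy mistake to miss. A minimal sketch of a sanity check, assuming a hypothetical helper `is_valid_point` that is not part of the notebook:

```python
def is_valid_point(point):
    """Return True if point looks like a GeoJSON Point with [lng, lat] in range."""
    if not isinstance(point, dict) or point.get("type") != "Point":
        return False
    coords = point.get("coordinates")
    if not isinstance(coords, (list, tuple)) or len(coords) != 2:
        return False
    lng, lat = coords
    # Longitude spans [-180, 180]; latitude spans [-90, 90]
    return -180 <= lng <= 180 and -90 <= lat <= 90


# First row of the dataframe (Andorra la Vella)
print(is_valid_point({"type": "Point", "coordinates": [1.53, 42.51]}))
# Swapped order for a Ho Chi Minh City store: 106.70 is not a valid latitude
print(is_valid_point({"type": "Point", "coordinates": [10.78, 106.70]}))
```

The check only catches swaps that push the latitude out of range; a swapped pair where both values stay within [-90, 90] would still pass.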


We need to create a function that takes the location field and queries the companies database in the following format:
```json
{
   <location field>: {
     $near: {
       $geometry: {
          type: "Point" ,
          coordinates: [ <longitude> , <latitude> ]
       },
       $maxDistance: <distance in meters>,
       $minDistance: <distance in meters>
     }
   }
}
```

In [24]:
from pymongo import MongoClient
client = MongoClient("mongodb://localhost/companies")
db = client.get_database()
c = "companies_wlocation" # to be able to call the collection as db[c]

In [67]:
over1m = db[c].find({"$and": [{"description_1": {"$regex": "game*|design"}}, {"total_money_raised": {"$regex": "M" }}]})
len(list(over1m))


After some debugging, it turns out it is not that easy to make a new collection from the find query. Instead we will save the search query, and use it together with the $near operator.

In [69]:
search = {"$and": [{"description_1": {"$regex": "game*|design"}}, {"total_money_raised": {"$regex": "M" }}]}
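
The pattern `game*|design` is worth a second look: `e*` means zero or more `e` characters, so the first alternative matches any text containing `gam`, which is why it also picks up "gaming". A small sketch with Python's `re` module, assuming it treats this simple pattern the same way as MongoDB's `$regex`; the sample strings are made up for illustration:

```python
import re

pattern = re.compile("game*|design")   # same pattern as in the search filter
money = re.compile("M")                # matches any string containing a capital M

samples = ["online games", "social gaming", "web design", "gamma ray sensors", "fintech"]
for text in samples:
    print(text, bool(pattern.search(text)))

# Hypothetical funding strings
for raised in ["$39.8M", "$500k", "$1.2B"]:
    print(raised, bool(money.search(raised)))
```

If the funding field uses suffixes like these, the `M` filter keeps amounts written in millions and drops those written with `k` or `B`.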

We create a function that builds the $near query for a single point, and then iterate through the location column of the dataframe, as follows.

In [108]:
def queryNear(point):
    return {"location": {"$near": {"$geometry": point, "$maxDistance": 2000, "$minDistance": 0} } }
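
To see what is actually sent to MongoDB, the sketch below rebuilds `queryNear` and combines it with the saved search for one store. It only builds and prints the filter, so it runs without a database; note that `$near` with a `$geometry` point relies on a `2dsphere` index on the `location` field. The `max_distance` parameter is an addition for illustration and is not in the notebook function.

```python
import json

def queryNear(point, max_distance=2000):
    # Same shape as the notebook function; max_distance (in meters) is a hypothetical parameter
    return {"location": {"$near": {"$geometry": point, "$maxDistance": max_distance, "$minDistance": 0}}}

search = {"$and": [{"description_1": {"$regex": "game*|design"}},
                   {"total_money_raised": {"$regex": "M"}}]}

# First store in the dataframe (Andorra la Vella)
point = {"type": "Point", "coordinates": [1.53, 42.51]}
full_filter = {"$and": [queryNear(point), search]}
print(json.dumps(full_filter, indent=2))
```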

In [109]:
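compsnearSB = []
for point in sbs.location:
    try:
        # Combine the geo filter with the saved search and return only the company names
        compsnearSB.append(list(db[c].find({"$and": [queryNear(point), search]}, {"name":1, "_id":0})))
    except Exception:
        # Skip stores whose query fails (for example, a missing location)
        continue

len(compsnearSB)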

25599

In [110]:
compsnearSB

# Next we flatten the list, and remove duplicates

flatcompsnearSB = [val for sublist in compsnearSB for val in sublist]
names = [list(doc.values()) for doc in flatcompsnearSB] # avoid shadowing the built-in dict
flatnames = [val for sublist in names for val in sublist]
unique = set(flatnames)
# namescomps = [e for e in flatcompsnearSB]
# uniquecomps = set(namescomps)
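
The flattening step is easier to follow on a toy input. A minimal sketch, assuming result lists shaped like the output of the `find` calls (one list of `{"name": ...}` documents per store); the sample data is made up:

```python
# One inner list per Starbucks; stores with no nearby company yield an empty list
results = [
    [{"name": "Zynga"}, {"name": "Smule"}],
    [],
    [{"name": "Zynga"}],
]

# Flatten and deduplicate in a single set comprehension
unique_names = {doc["name"] for per_store in results for doc in per_store}
print(sorted(unique_names))
```

A company that sits near several stores appears once per store in the raw results, which is why the set is needed.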

#### And here we have the list of companies that are within 2 km of a Starbucks

In [111]:
compsnexttoSB = list(unique)
len(compsnexttoSB)

41

In [114]:
unique = list(unique)
unique

['Curse',
 'PlaySpan',
 'NetLogic Microsystems',
 '99designs',
 'Pikum',
 'Realtime Worlds',
 'Geewa',
 'Playfish',
 'Social Gaming Network',
 'PlayFirst',
 'Smule',
 'Outspark',
 'Zynga',
 'Bigpoint',
 'Unkasoft Advergaming',
 'crowdSPRING',
 'Challenge Games',
 'TwoFish',
 'Owlient',
 'Cellufun',
 'Altobeam',
 'GotGame',
 'GamerDNA',
 'OMGPOP',
 'Crispy Gamer',
 'Turbine',
 'WeeWorld',
 'Grockit',
 'Minted',
 'FlowPlay',
 'Akoha',
 'Riot Games',
 'RocketOn',
 'Virgin Play',
 'eRepublik',
 'SCVNGR',
 'Sometrics',
 'Three Rings',
 'Double Fusion',
 'MocoSpace',
 'PurePlay']

In [117]:
dfcompsnearSB = df[df['name'].isin(unique)]

In [119]:
dfcompsnearSB.to_csv("../output/companies-gaming-highlyfunded-starbucks.csv", index=False)

And with that we have saved our dataset of companies that meet three of the criteria: near a Starbucks, gaming/design, and over 1 million dollars raised.