<a href="https://colab.research.google.com/github/victorianieto/iniciacion-git-y-github/blob/master/02_full_usage_arlington.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Quickstart: Full end-to-end use of CARTOframes v1.0b3

Hi! Glad to see you made it to the Quickstart guide! This guide introduces how data scientists can use CARTOframes in spatial analysis workflows. Using bike share data, it walks through some common steps a data scientist takes to answer the following question: are the company's bike share stations placed in optimal locations?

Before you get started, we encourage you to have your environment ready so you can get a feel for the library by using it. If you don't have your environment set up yet, check out this guide first. You will need:

- A Python notebook environment
- The CARTOframes, geopandas and PySAL libraries installed

### Spatial analysis scenario

Let's say you work for a bike share company in Arlington, Virginia and you want to better understand how your stations around the city are being used, and if these stations are placed in optimal locations.

To begin, let's outline a workflow: 

- Get and explore your company's data
- Discover and enrich data thanks to the CARTO catalog
- Analyse if the current bike stations are placed in optimal locations
- And finally, share the results of your analysis with your team

Let's get started!

### Get and explore your company's data

[This](./arlington_bikeshare_july_agg.csv) is the dataset you have to start your exploration. It contains information about the bike stations around the city of Arlington. As a first exploratory step, you read it into a Jupyter Notebook using a [pandas dataframe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).

In [2]:
%pip install cartoframes==1.0b3

Collecting cartoframes==1.0b3
  Downloading https://files.pythonhosted.org/packages/20/be/814b7e27b774c57234dbe08cd4bea3d2dc1ebe748bd9d602c3b0c2f76e17/cartoframes-1.0b3-py2.py3-none-any.whl (189kB)
Collecting google-cloud-bigquery<2.0,>=1.19.0
  Downloading https://files.pythonhosted.org/packages/b6/c6/bcfcb6c25e49d8ce10ccdf4358b7efef0fd729e0941e2d1966499a2fae2f/google_cloud_bigquery-1.21.0-py2.py3-none-any.whl (158kB)
Collecting unidecode<2.0,>=1.1.0
  Downloading https://files.pythonhosted.org/packages/d0/42/d9edfed04228bacea2d824904cae367ee9efd05e6cce7ceaaedd0b0ad964/Unidecode-1.1.1-py2.py3-none-any.whl (238kB)
Collecting tqdm<5.0,>=4.32.1
  Downloading https://files.pythonhosted.org/packages/e1/c1/bc1dba38b48f4ae3c4428aea669c5e27bd5a7642a74c8348451e0bd8ff86/tqdm-4.36.1-py2.

In [2]:
import pandas as pd

arlington_file = 'arlington_bikeshare_july_agg.csv'
bikeshare_df = pd.read_csv(arlington_file)
bikeshare_df.head(3)

Unnamed: 0,num_bike_dropoffs,num_bike_pickups,total_events,station_id,intersection
0,178,204,382,31000,"Eads St & 15th St S, Arlington, VA"
1,222,276,498,31001,"18th & Eads St, Arlington, VA"
2,839,710,1549,31002,"20th & Crystal Dr, Arlington, VA"
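A quick consistency check can catch aggregation problems before any mapping. The sketch below inlines the three preview rows shown above (an assumption made only so the example runs on its own; in the notebook you would work with `bikeshare_df` directly) and verifies that `total_events` equals drop-offs plus pickups.

```python
import csv
import io

# The three preview rows from the output above, inlined so the sketch is self-contained.
preview_csv = """num_bike_dropoffs,num_bike_pickups,total_events,station_id,intersection
178,204,382,31000,"Eads St & 15th St S, Arlington, VA"
222,276,498,31001,"18th & Eads St, Arlington, VA"
839,710,1549,31002,"20th & Crystal Dr, Arlington, VA"
"""

rows = list(csv.DictReader(io.StringIO(preview_csv)))

# Collect the stations whose total does not match dropoffs + pickups.
mismatches = [
    row["station_id"]
    for row in rows
    if int(row["num_bike_dropoffs"]) + int(row["num_bike_pickups"]) != int(row["total_events"])
]
print(mismatches)  # → []
```

An empty list means the totals in the preview are consistent with their components.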


Reading the data into a dataframe doesn't show you at a glance where the stations are. So let's visualize it on a map!

The first thing you have to do is to transform the `intersection` column into geometries. This process is called geocoding and CARTO provides an easy way to do it (you can learn more about it in the [geocoding guide](https://cartoframes.readthedocs.io/en/v1.0b3/geocode.html)).

Before being able to do it, you have to log in to CARTO. You will need to create an API key and use the method `set_default_credentials` to create a session. If you haven't created an API key yet, check the [authentication guide](https://cartoframes.readthedocs.io/en/v1.0b3/credentials.html) to learn how to get it. In case your data is already geocoded, you can visualize it without a CARTO account.

Note: If you don't have an account yet, you can get a free account if you are a student or get a trial if you aren't.

In [3]:
import cartoframes
cartoframes.__version__

'1.0b3'

In [0]:
from cartoframes.auth import set_default_credentials

set_default_credentials(username='victoria-nieto', api_key='c37a5a52ba9326bd1116862682a574123f2b0c66')
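Hard-coding an API key in a shared notebook exposes it to anyone who opens the file. One possible alternative, sketched below, reads the credentials from environment variables. The variable names `CARTO_USERNAME` and `CARTO_API_KEY` and the helper `load_carto_credentials` are hypothetical and not part of CARTOframes.

```python
import os


def load_carto_credentials(environ=os.environ):
    """Read CARTO credentials from environment variables (variable names are hypothetical)."""
    username = environ.get("CARTO_USERNAME")
    api_key = environ.get("CARTO_API_KEY")
    if not username or not api_key:
        raise RuntimeError("Set CARTO_USERNAME and CARTO_API_KEY before running the notebook")
    return {"username": username, "api_key": api_key}


# Demo with a stand-in mapping; in a notebook you would rely on os.environ instead.
creds = load_carto_credentials({"CARTO_USERNAME": "your-user", "CARTO_API_KEY": "your-api-key"})

# The resulting dict matches the keyword arguments used in the cell above:
# set_default_credentials(**creds)
```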

Now you are ready to geocode the dataframe:

In [9]:
from cartoframes.data.services import Geocode

gc = Geocode()
bikeshare_df, info = gc.geocode(bikeshare_df,
                                street='intersection',
                                state={'value': 'Virginia'},
                                city={'value': 'Arlington'},
                                country={'value': 'US'})
bikeshare_df.head(3)

Unnamed: 0,the_geom,num_bike_dropoffs,num_bike_pickups,total_events,station_id,intersection,carto_geocode_hash
0,0101000020E6100000EC17EC866D4353C0D9429083126E...,178.0,204.0,382.0,31000.0,"Eads St & 15th St S, Arlington, VA",736de2a5c001973775c36d3f9951b7bf
1,0101000020E6100000CC457C27664353C099F04BFDBC6D...,222.0,276.0,498.0,31001.0,"18th & Eads St, Arlington, VA",ad16ba659920f1cf520e4d13210a18c4
2,0101000020E61000008C2D0439284353C01A8BA6B3936D...,839.0,710.0,1549.0,31002.0,"20th & Crystal Dr, Arlington, VA",117c538b730ae4b9c2d839175131945a


Done! Now that the bike stations are geocoded, you will notice a new column called `the_geom` has been added. This column stores the geographic location of each bike station and it's used to plot each station's location on the map.
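The hexadecimal strings in `the_geom` appear to be PostGIS-style extended well-known binary (EWKB): the shared prefix `0101000020E6100000` reads as little-endian byte order, a point type with the SRID flag set, and SRID 4326. The sketch below decodes such a point with the standard library, assuming that encoding. Because the values in the output above are truncated, it uses a synthetic point with made-up coordinates rather than a real station.

```python
import struct

SRID_FLAG = 0x20000000  # EWKB flag bit signalling that an SRID follows the type
POINT_TYPE = 1


def decode_ewkb_point(hex_str):
    """Return (srid, x, y) for a 2D EWKB point given as a hex string."""
    raw = bytes.fromhex(hex_str)
    byte_order = "<" if raw[0] == 1 else ">"  # 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack_from(byte_order + "I", raw, 1)
    offset = 5
    srid = None
    if geom_type & SRID_FLAG:
        (srid,) = struct.unpack_from(byte_order + "I", raw, offset)
        offset += 4
    if (geom_type & ~SRID_FLAG) != POINT_TYPE:
        raise ValueError("only 2D points are handled in this sketch")
    x, y = struct.unpack_from(byte_order + "dd", raw, offset)
    return srid, x, y


# Synthetic point (made-up coordinates), encoded the same way as the column values.
sample_hex = struct.pack("<BIIdd", 1, SRID_FLAG | POINT_TYPE, 4326, -77.05, 38.86).hex().upper()
srid, lon, lat = decode_ewkb_point(sample_hex)
print(sample_hex[:18], srid, lon, lat)
```

In practice a geometry library such as shapely would do this decoding for you; the manual version only shows what the column contains.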

You can quickly visualize your geocoded dataframes using the [Map]() and [Layer]() classes. Check the [visualization guide]() to learn about all the visualization options, and the [sources guide]() to see which data sources are supported.

In [8]:
from cartoframes.viz import Map, Layer

Map([Layer(bikeshare_df,'width:12')])

AttributeError: ignored

Great! We have a map!

Now you have a better sense of where the stations are. To continue your exploration, you want to know which stations have the most activity. To do so, you can use the `size_continuous_layer` visualization helper, with the `total_events` column as the measure of activity:

In [10]:
from cartoframes.viz.helpers import size_continuous_layer

Map(size_continuous_layer(bikeshare_df, 'total_events', 'Pickups + dropoffs'))

AttributeError: ignored

Good job! Now you can see at a glance which stations have the most activity. Also, because you used a helper, you get a legend for free.
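To make the idea of a continuous size ramp concrete, the sketch below ranks the three preview stations by `total_events` and maps each value linearly onto a marker size. This only illustrates the concept and is not how `size_continuous_layer` is implemented; the size range of 4 to 40 is a hypothetical choice.

```python
# (intersection, total_events) for the three preview rows shown earlier.
stations = [
    ("Eads St & 15th St S", 382),
    ("18th & Eads St", 498),
    ("20th & Crystal Dr", 1549),
]

MIN_SIZE, MAX_SIZE = 4.0, 40.0  # hypothetical marker size range

# Busiest station first.
ranked = sorted(stations, key=lambda station: station[1], reverse=True)

low = min(events for _, events in stations)
high = max(events for _, events in stations)


def marker_size(events):
    """Linearly interpolate a marker size between MIN_SIZE and MAX_SIZE."""
    return MIN_SIZE + (events - low) / (high - low) * (MAX_SIZE - MIN_SIZE)


sizes = {name: marker_size(events) for name, events in ranked}
for name, size in sizes.items():
    print(f"{name}: {size:.1f}")
```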

To learn more about visualizing your data, adding legends, pop-ups and widgets, and doing it faster with helpers, check the [documentation on interactive maps](https://cartoframes.readthedocs.io/en/v1.0b3/maps.html#interactive-carto-vl-maps) as well as the [CARTO VL guide](https://carto.com/developers/carto-vl/guides/introduction/) for styling expressions. CARTO VL is a JavaScript library that interacts with different CARTO APIs to build custom apps leveraging vector rendering.


### Discover and augment with external sources

You already know where your company's stations are and how active they are. Now you want to know whether they are placed in optimal locations. You start thinking about which data could be valuable and decide to check whether there is any correlation with data about households with no car. Let's see how CARTOframes can help you find that data.

In [11]:
from cartoframes.data.clients import DataObsClient
from cartoframes.data import Dataset
from cartoframes.viz.helpers import color_continuous_layer

# Keep only the census tracts in Arlington County (FIPS prefix 51013)
do = DataObsClient()
arlington_ct = do.boundaries(region=[-77.17232, 38.827447, -77.032086, 38.93428],
                             boundary='us.census.tiger.census_tract', decode_geom=True)
arlington_df = arlington_ct.dataframe[arlington_ct.dataframe.geom_refs.str.startswith('51013')]

Dataset(arlington_df).upload(table_name='arlington_ct', if_exists='replace')

# Augmenting it with % of no car households
do.augment('arlington_ct',
           [{"numer_id": "us.census.acs.B08201002", "denom_id": "us.census.acs.B11001001",
           "normalization": "denominated", "geom_id": 'us.census.tiger.census_tract' }],
           how='the_geom', persist_as='arlington_ct_no_cars')
Map(color_continuous_layer('arlington_ct_no_cars', 'no_cars_2011_2015_by_households', 'No Car Households'))

Nice! Thanks to our visualization helper, we can already see which areas have the highest percentage of households with no cars. You can learn more about discovering and enriching your data in the [data guide]().
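Two small steps in the cell above are worth spelling out: the tracts are kept when their `geom_refs` identifier starts with `51013` (the FIPS code for Arlington County, Virginia), and the `denominated` normalization is understood here to mean dividing the numerator measure by the denominator measure. The sketch below reproduces both steps in plain Python. The tract identifiers and counts are invented for illustration and are not real census values.

```python
# Hypothetical tract records: (geom_ref, no_car_households, total_households).
tracts = [
    ("51013100100", 120, 800),
    ("51013100200", 45, 900),
    ("51510200100", 300, 1200),  # different county prefix, should be dropped
    ("11001000100", 500, 1000),  # different state prefix, should be dropped
]

ARLINGTON_PREFIX = "51013"  # state 51 (Virginia) + county 013 (Arlington)

# Same idea as geom_refs.str.startswith('51013') in the notebook cell.
arlington = [tract for tract in tracts if tract[0].startswith(ARLINGTON_PREFIX)]

# Numerator divided by denominator gives the share of households with no car.
no_car_share = {
    geom_ref: no_car / households
    for geom_ref, no_car, households in arlington
}
print(no_car_share)
```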

### Analyse if the current bike stations are placed in optimal locations

Looking at the areas with more no-car households, we can already suggest where a station would make the most sense. In this step, let's go a bit further and calculate which areas have significantly high or low numbers of them.

You decide to use a common statistic for this, [Moran's I](https://en.wikipedia.org/wiki/Moran%27s_I). In statistics, Moran's I is a measure of spatial autocorrelation, that is, correlation in a signal among nearby locations in space. Check [CARTO's Moran's I API](https://github.com/CartoDB/crankshaft/blob/develop/doc/02_moran.md) if you are interested. You also use [Queen]() contiguity as the criterion that decides which areas count as neighbours. Once you are done with your analysis (here we are just showing a simplified version of it), you assign a label with the significance level to each area.
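To build some intuition for these two ingredients, here is a minimal pure-Python sketch of Queen contiguity on a regular grid and of the global Moran's I statistic with row-standardized weights. It is a toy illustration on made-up grids. The notebook itself uses the local variant (`Moran_Local`) with permutation-based p-values, which this sketch does not reproduce.

```python
def queen_neighbors(n_rows, n_cols):
    """Map each grid cell index to the cells that share an edge or a corner with it."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors[r * n_cols + c] = [
                rr * n_cols + cc
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            ]
    return neighbors


def morans_i(values, neighbors):
    """Global Moran's I with row-standardized weights (values must not all be equal)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Row standardization makes the spatial lag of cell i the average of its neighbors.
    cross = sum(
        z[i] * sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        for i in range(n)
    )
    # With row-standardized weights the weights sum to n, so the n / S0 factor is 1.
    return cross / sum(d * d for d in z)


w = queen_neighbors(4, 4)
clustered = [0, 0, 1, 1] * 4  # low values on the left, high on the right
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

print(round(morans_i(clustered, w), 4))     # positive: similar values sit together
print(round(morans_i(checkerboard, w), 4))  # negative: neighbors tend to differ
```

A positive value indicates that similar values cluster in space, while a negative value indicates that neighbouring areas tend to differ.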

In [12]:
import geopandas as gpd
from libpysal.weights import Queen
from pysal.explore.esda.moran import Moran_Local

# Download the enriched census tracts as a GeoDataFrame
arlington_no_car_df = gpd.GeoDataFrame(Dataset('arlington_ct_no_cars').download(decode_geom=True))
# Queen contiguity weights, row-standardized
wq = Queen.from_dataframe(arlington_no_car_df)
wq.transform = 'r'
# Local Moran's I on the no-car households measure
li = Moran_Local(arlington_no_car_df['no_cars_2011_2015_by_households'], wq)
# 1 where the pseudo p-value is below 0.05, 0 otherwise
sig = 1 * (li.p_sim < 0.05)
spot_qs = [1, 3, 2, 4] # HH(hotspot), LL(coldspot), LH(doughnut), HL(diamond)
# Quadrant code for significant tracts, 0 for the rest
spots = sum([i * (sig * li.q == i) for i in spot_qs])
spot_labels = ['Not significant', 'Hot spot', 'Low outlier', 'Cold spot', 'Hot outlier']
labels = [spot_labels[i] for i in spots]
arlington_no_car_df = arlington_no_car_df.assign(cl=labels)

ModuleNotFoundError: ignored

In [13]:
arlington_no_car_df.head(3)

NameError: ignored
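The labelling step in the analysis cell is compact, so here is the same logic written out with plain lists. Quadrant codes follow the comment in that cell (1 = HH, 2 = LH, 3 = LL, 4 = HL). The p-values and quadrants below are made up for illustration, and `label_areas` is a hypothetical helper name.

```python
SPOT_LABELS = ['Not significant', 'Hot spot', 'Low outlier', 'Cold spot', 'Hot outlier']


def label_areas(p_values, quadrants, alpha=0.05):
    """Label each area by its quadrant if significant, else 'Not significant'."""
    labels = []
    for p_value, quadrant in zip(p_values, quadrants):
        # Non-significant areas collapse to code 0, whatever their quadrant is.
        code = quadrant if p_value < alpha else 0
        labels.append(SPOT_LABELS[code])
    return labels


# Made-up p-values and quadrant codes for five areas.
example_labels = label_areas([0.01, 0.20, 0.03, 0.04, 0.60], [1, 1, 3, 4, 2])
print(example_labels)
# → ['Hot spot', 'Not significant', 'Cold spot', 'Hot outlier', 'Not significant']
```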

You have finished your analysis and now you want to see the results on a map, plotting the significance of each area. Let's do it!

In [0]:
from cartoframes.viz.helpers import color_category_layer

hmap = ['#E4E4E4','#1785FB', '#F24440', '#12A2B8']
Map([
    color_category_layer(arlington_no_car_df, 'cl', title='Significance', palette=hmap),
    size_continuous_layer(bikeshare_df, 'total_events', 'Pickups + Dropoffs')
])

Awesome! You have finished with your analysis and see that your company has done a good job.

### Publish and share your results

To finish your work, you want to share the results with some teammates. Also, it would be great if you could allow them to play with the information. Let's do it!

First, you have to upload the data used in your map to CARTO using the [Dataset]() class:

In [14]:
Dataset(arlington_no_car_df).upload(table_name='arlington_ct_no_cars', if_exists='replace')


NameError: ignored

In [15]:
Dataset(bikeshare_df).upload(table_name='bikeshare_july_agg', if_exists='replace')

<cartoframes.data.dataset.dataset.Dataset at 0x7fb71fa1b780>

Now, let's add widgets so people are able to see some graphs of the information and filter it. To do this, we only have to add `widget=True` to the helpers. Remember to check the [visualization guide](https://cartoframes.readthedocs.io/en/v1.0b3/maps.html#widget-functions) to learn more.

In [16]:
final_map = Map([
    color_category_layer('arlington_ct_no_cars', 'cl', title='Significance', palette=hmap, widget=True),
    size_continuous_layer('bikeshare_july_agg', 'total_events', 'Pickups + Dropoffs', widget=True)
])
final_map

NameError: ignored

Cool! Now that you have a small dashboard to play with, let's publish it on CARTO so you are able to share it with anyone. To do this, you just need to call the [publish]() method from the [Map]() class. Note: if the map is private, you need to generate a `maps_api_key` with Maps API access for those specific datasets. You can do this in your dashboard under Profile -> Developer Settings -> API Keys.

In [17]:
kuviz = final_map.publish('bikeshare', maps_api_key='womLQ5RYS1D8ZlCrs5gFIg')

NameError: ignored

In [0]:
print(kuviz['url'])

https://axa-group.carto.com/u/axagroup-admin/kuviz/8305ca2a-93e6-4fb4-a66a-8cd7ecc84ead


Congratulations! You have finished this guide and have a sense of how CARTOframes can speed up your workflow. To continue learning, you can check the specific guides, the [reference]() for details on every class and method, or the notebook [examples]().