# Share the Insight

There are two main insights we want to communicate:
- Bangalore is the largest market for onion arrivals.
- Onion price variation has increased in recent years.

Let us explore how we can communicate these insights visually.

## Preprocessing to get the data

In [1]:
# Import the libraries we need: dplyr and ggplot2
library(dplyr)
library(ggplot2)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union



In [3]:
# Read the CSV file of month-wise quantity and price
df <- read.csv('MonthWiseMarketArrivals_clean.csv')

In [5]:
# Fix the date: prepend a day ("01-") so the month-year strings parse as Date
df$date <- as.Date(paste("01-", df$date, sep = ""), "%d-%B-%Y")
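
The date fix works because `as.Date` needs a complete day-month-year string before it can parse anything. Here is a minimal sketch, assuming the raw values look like "January-2015" (the sample strings below are made up for illustration) and that R is running in an English locale, since `%B` matches locale-specific month names:

```r
# Hypothetical sample values in the assumed "Month-Year" layout
month_year <- c("January-2015", "March-2014")

# Prepend "01-" so every value becomes a full day-month-year string
with_day <- paste("01-", month_year, sep = "")

# %d = day of month, %B = full month name (locale dependent), %Y = four-digit year
parsed <- as.Date(with_day, "%d-%B-%Y")
print(parsed)
```

Every parsed date falls on the first of its month, which is good enough for month-wise data.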

In [13]:
# Total the 2015 arrivals for each city and sort in descending order
df2015 <- df %>% 
          filter(year == 2015) %>%
          group_by(city) %>%
          summarize(quantity_year = sum(quantity)) %>%
          arrange(desc(quantity_year))

In [14]:
head(df2015)

| | city | quantity_year |
|---|---|---|
| 1 | BANGALORE | 8267060 |
| 2 | MAHUVA | 5113510 |
| 3 | SOLAPUR | 4162041 |
| 4 | PUNE | 3591209 |
| 5 | LASALGAON | 3581359 |
| 6 | PIMPALGAON | 3455265 |
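
To put the first insight into a single number, one option is to compare the top market with the runner-up. This is a small sketch that uses only the 2015 totals printed above; the variable names are illustrative:

```r
# Yearly totals copied from the head(df2015) output above
top_quantity <- c(BANGALORE = 8267060, MAHUVA = 5113510, SOLAPUR = 4162041)

# Which city has the largest total?
largest <- names(which.max(top_quantity))

# How many times larger is the top market than the runner-up?
ratio_to_second <- top_quantity[["BANGALORE"]] / top_quantity[["MAHUVA"]]

print(largest)
print(round(ratio_to_second, 2))
```

A ratio like this is easy to state in a caption next to the chart.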


## Let us plot the Cities on a Geographic Map

In [None]:
# Load the geocode file
dfGeo <- read.csv('city_geocode.csv')

In [None]:
head(dfGeo)

### PRINCIPLE: Joining two data frames

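There will be many cases in which your data is split across two data frames and you would like to merge them into one. Let us look at one example of this, called a left join: every row of the left data frame is kept, and columns from the right data frame are filled in wherever the key matches.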

![](../img/left_merge.png)
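
A left join is easiest to see on a tiny example. The sketch below uses dplyr's `left_join` on two made-up data frames; the city labels, quantities, and coordinates are all hypothetical and chosen only to show what happens to unmatched rows:

```r
library(dplyr)

# Hypothetical toy data: CITY_C has no entry in the geocode table
quantities <- data.frame(city = c("CITY_A", "CITY_B", "CITY_C"),
                         quantity_year = c(300, 200, 100),
                         stringsAsFactors = FALSE)
geocodes <- data.frame(city = c("CITY_A", "CITY_B", "CITY_D"),
                       lon = c(10.0, 20.0, 40.0),
                       lat = c(1.0, 2.0, 4.0),
                       stringsAsFactors = FALSE)

# Left join: keep every row of `quantities`, fill in lon/lat where `city` matches
joined <- left_join(quantities, geocodes, by = "city")
print(joined)
```

CITY_C stays in the result with `NA` coordinates, while CITY_D, which appears only in the right-hand table, is dropped.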

In [None]:
# Left join: keep every city in df2015 and attach its coordinates
dfCityGeo <- left_join(df2015, dfGeo, by = 'city')

In [None]:
head(dfCityGeo)

In [None]:
ggplot(dfCityGeo, aes(x = lon, y = lat)) + geom_point(size = 4)

We can do a crude aspect ratio adjustment to make the Cartesian coordinate system look like a Mercator map.
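
The reasoning behind the adjustment: away from the equator, one degree of longitude covers less ground than one degree of latitude, by a factor of roughly cos(latitude). The sketch below estimates the stretch factor, assuming a mid-latitude of 20 degrees north for the plotted cities (an assumed value, not taken from the data). It is a local approximation, not a true Mercator projection.

```r
# Assumed mid-latitude of the plotted cities, in degrees (hypothetical value)
mid_lat <- 20

# One degree of longitude is about cos(latitude) times one degree of latitude,
# so the y axis should be stretched by 1 / cos(latitude) relative to x
ratio <- 1 / cos(mid_lat * pi / 180)
print(round(ratio, 3))
```

In ggplot2, a value like this can be passed to `coord_fixed(ratio = ratio)`, which fixes the length of one y unit relative to one x unit.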

In [None]:
options(repr.plot.width = 10, repr.plot.height = 11)  # figure size in the notebook
ggplot(dfCityGeo, aes(x = lon, y = lat)) + geom_point(size = 4)

In [None]:
# Let us add quantity as the size of the bubble
ggplot(dfCityGeo, aes(x = lon, y = lat, size = quantity_year)) + geom_point()

In [None]:
# Let us scale down the quantity variable (the legend then reads in thousands)
ggplot(dfCityGeo, aes(x = lon, y = lat, size = quantity_year / 1000)) + geom_point()

In [None]:
# Reduce the opacity of the color, so that we can see overlapping values
ggplot(dfCityGeo, aes(x = lon, y = lat, size = quantity_year / 1000)) + geom_point(alpha = 0.5)

### Exercise - Can you plot all the states by quantity on a (pseudo) geographic map?