## Bubble Chart 
https://plot.ly/python/bubble-charts/

A bubble chart is a variation of a scatter chart in which the data points are replaced with bubbles, and additional dimensions of the data are represented by the size and color of the bubbles.

### Importing packages

In [1]:
!pip install plotly

Collecting plotly
  Downloading https://files.pythonhosted.org/packages/3c/7e/bafb51ecd654a16f593beceb8e4e4069e33fccf47a3d071e0ef9b821a694/plotly-3.9.0-py2.py3-none-any.whl (41.2MB)
Collecting retrying>=1.3.3 (from plotly)
  Downloading https://files.pythonhosted.org/packages/44/ef/beae4b4ef80902f22e3af073397f079c96969c69b2c7d52a57ea9ae61c9d/retrying-1.3.3.tar.gz
Building wheels for collected packages: retrying
  Building wheel for retrying (setup.py) ... done
  Stored in directory: /Users/jananiravi/Library/Caches/pip/wheels/d7/a9/33/acc7b709e2a35caa7d4cae442f6fe6fbf2c43f80823d46460c
Successfully built retrying
Installing collected packages: retrying, plotly
Successfully installed plotly-3.9.0 retrying-1.3.3
You should consider upgrading via the 'pip install --upgrade pip' command.


In [2]:
import pandas as pd
import plotly
import plotly.graph_objs as go
import plotly.offline as offline

offline.init_notebook_mode(connected=True)

In [3]:
plotly.__version__

'3.9.0'

#### Regular 2D scatter plot

In [4]:
# Plain scatter plot: markers only, no connecting lines
trace = go.Scatter(x = [15, 18, 21, 25],
                   y = [100, 400, 300, 200],
                   mode = 'markers')

data = [trace]

offline.iplot(data)

#### Representing a 3rd dimension of data in the same scatter plot

In [5]:
z = [25, 100, 75, 50]

In [6]:
# Same points as before; z sets the size of each marker
trace = go.Scatter(x = [15, 18, 21, 25],
                   y = [100, 400, 300, 200],
                   mode = 'markers',
                   marker = dict(size = z))

data = [trace]

offline.iplot(data)
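
By default Plotly treats the `marker.size` values as diameters in pixels, so a value four times larger yields a bubble with sixteen times the area. A common alternative, described in Plotly's bubble chart documentation, is to set `sizemode = 'area'` and derive `sizeref` from the largest value. The sketch below computes that `sizeref` for the `z` values used above. The helper name `area_sizeref` and the 40-pixel target are assumptions chosen for illustration, not part of the original notebook.

```python
def area_sizeref(sizes, max_marker_px):
    """Return a sizeref so the largest value renders about max_marker_px wide.

    Rule of thumb from the Plotly bubble chart docs for sizemode='area':
    sizeref = 2 * max(size) / max_marker_px ** 2
    """
    return 2.0 * max(sizes) / (max_marker_px ** 2)


z = [25, 100, 75, 50]
sizeref = area_sizeref(z, max_marker_px=40)  # 40 px is an arbitrary target

# Marker settings that could be passed as go.Scatter(..., marker=marker)
marker = dict(size=z, sizemode='area', sizeref=sizeref)
print(marker)
```

With these settings the bubble areas, rather than the diameters, are proportional to `z`, which tends to make size comparisons easier to read.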

#### Representing a 4th dimension in the same plot, using color

In [7]:
i = [5, 6, 8, 4]

In [8]:
# z sets the marker size; i sets the marker color via the colorscale
trace = go.Scatter(x = [15, 18, 21, 25],
                   y = [100, 400, 300, 200],
                   mode = 'markers',
                   marker = dict(size = z,
                                 color = i,
                                 colorscale = 'Portland',
                                 showscale = True))

In [9]:
data = [trace]
offline.iplot(data)
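
With `showscale = True`, the color bar shows how the values in `i` map onto the `Portland` colorscale. When `cmin` and `cmax` are not set, Plotly takes the range from the data, so each value is effectively placed at a position between 0 and 1 along the colorscale. A minimal sketch of that normalization, assuming the default behavior, follows. `colorscale_position` is a hypothetical helper written for this illustration and is not a Plotly function.

```python
def colorscale_position(values, cmin=None, cmax=None):
    """Map each value to a 0..1 position along a colorscale.

    cmin/cmax default to the data range, mirroring what Plotly does
    when those marker properties are left unset (assumed here).
    """
    lo = min(values) if cmin is None else cmin
    hi = max(values) if cmax is None else cmax
    if hi == lo:
        # Degenerate range: an arbitrary choice made for this sketch only
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


i = [5, 6, 8, 4]
positions = colorscale_position(i)
print(positions)
```

Passing explicit `cmin` and `cmax` values in the `marker` dict keeps the color mapping fixed when several charts need to be compared.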

#### Loading the dataset
Here we load the CSV data that we saved in the previous demo.

In [63]:
housing_data = pd.read_csv('datasets/housing.csv')

housing_data.sample(10)

|       | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|-------|-----------|----------|--------------------|-------------|----------------|------------|------------|---------------|--------------------|-----------------|
| 9140  | -118.27   | 34.46    | 10.0               | 2184.0      | 405.0          | 1119.0     | 370.0      | 4.7437        | 294000.0           | INLAND          |
| 2264  | -119.81   | 36.78    | 36.0               | 1650.0      | 313.0          | 660.0      | 298.0      | 3.0           | 79700.0            | INLAND          |
| 13143 | -121.54   | 38.29    | 47.0               | 1396.0      | 254.0          | 630.0      | 218.0      | 2.8616        | 92500.0            | INLAND          |
| 5643  | -118.31   | 33.74    | 22.0               | 5042.0      | 974.0          | 2260.0     | 935.0      | 4.3472        | 351200.0           | NEAR OCEAN      |
| 17022 | -122.31   | 37.52    | 24.0               | 2328.0      | 335.0          | 969.0      | 354.0      | 7.7364        | 435800.0           | NEAR OCEAN      |
| 127   | -122.21   | 37.84    | 44.0               | 3424.0      | 597.0          | 1358.0     | 597.0      | 6.0194        | 292300.0           | NEAR BAY        |
| 12327 | -116.51   | 33.96    | 16.0               | 4913.0      | 1395.0         | 2518.0     | 1132.0     | 1.4665        | 61100.0            | INLAND          |
| 5734  | -118.23   | 34.16    | 31.0               | 3105.0      | 582.0          | 1359.0     | 547.0      | 5.1718        | 429100.0           | <1H OCEAN       |
| 5907  | -118.44   | 34.29    | 35.0               | 2606.0      | 447.0          | 1555.0     | 404.0      | 4.6864        | 193800.0           | <1H OCEAN       |
| 383   | -122.16   | 37.74    | 47.0               | 824.0       | 223.0          | 533.0      | 166.0      | 2.625         | 98200.0            | NEAR BAY        |


In [64]:
housing_data.shape

(20640, 10)

##### The dataset is large, so we keep only a random 7% sample of its rows

In [65]:
housing_data = housing_data.sample(frac=0.07).reset_index(drop=True)

housing_data.shape

(1445, 10)
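
The row count follows from the fraction: `sample(frac=0.07)` on 20640 rows keeps the rounded product, which is 1445 rows. Because the rows are drawn at random, each run selects a different subset unless a seed is supplied through the `random_state` parameter of `DataFrame.sample`. The arithmetic can be checked with a short sketch. `expected_sample_rows` is a hypothetical helper, and the seed value shown in the comment is arbitrary.

```python
def expected_sample_rows(n_rows, frac):
    """Number of rows a fractional sample is expected to keep (rounded)."""
    return round(n_rows * frac)


print(expected_sample_rows(20640, 0.07))

# Reproducible variant of the notebook call (the seed 42 is arbitrary):
# housing_data = housing_data.sample(frac=0.07, random_state=42).reset_index(drop=True)
```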

##### Showing the unique values of ocean_proximity

In [66]:
housing_data['ocean_proximity'].unique()

array(['NEAR BAY', 'NEAR OCEAN', '<1H OCEAN', 'INLAND'], dtype=object)

##### Plotting the bubble chart, where bubble size represents total_rooms and color represents housing_median_age

In [67]:
trace = go.Scatter(x = housing_data['median_income'],
                   y = housing_data['median_house_value'],
                   
                   mode = 'markers',
                   
                   marker = dict(
                                 size = housing_data['total_rooms'],
                                 sizeref = 500,
                                
                                 color = housing_data['housing_median_age'],
                                 colorscale = 'Jet',
                                 showscale = True))
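
In the trace above, `sizeref = 500` keeps the bubbles readable. With the default `sizemode` of `'diameter'`, Plotly divides each `size` value by `sizeref`, so a district with 5042 total rooms is drawn roughly 10 pixels wide rather than 5042. A rough sketch of that scaling, using a few `total_rooms` values from the sample output shown earlier, follows. `marker_diameter_px` is a hypothetical helper for illustration, not a Plotly function.

```python
def marker_diameter_px(size, sizeref=1.0):
    """Approximate rendered marker diameter for sizemode='diameter'."""
    return size / sizeref


# total_rooms values taken from the housing_data.sample(10) output
total_rooms = [2184.0, 5042.0, 824.0]

for rooms in total_rooms:
    diameter = marker_diameter_px(rooms, sizeref=500)
    print(rooms, '->', round(diameter, 2), 'px')
```

A smaller `sizeref` produces larger bubbles, so the value is usually tuned until the largest districts are visible without hiding their neighbors.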

In [68]:
data = [trace]

layout = go.Layout(height = 600,
                   width = 900,
                  
                   title = 'Housing Data',
                   hovermode = 'closest')

In [69]:
fig = go.Figure(data = data,
                layout = layout)

offline.iplot(fig)