<a href="https://colab.research.google.com/github/waltz2u/bd/blob/master/Iowa_Liquor_Retail_Sales.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook explores the [Iowa Liquor Retail Sales](https://console.cloud.google.com/marketplace/details/iowa-department-of-commerce/iowa-liquor-sales) public dataset in BigQuery.

# Before you begin


1.   Use the [Cloud Resource Manager](https://console.cloud.google.com/cloud-resource-manager) to create a Cloud Platform project if you do not already have one.
2.   [Enable billing](https://support.google.com/cloud/answer/6293499#enable-billing) for the project.
3.   [Enable the BigQuery API](https://console.cloud.google.com/flows/enableapi?apiid=bigquery) for the project.


### Provide your credentials to the runtime

In [1]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


## Optional: Enable data table display

Colab includes the `google.colab.data_table` package, which can display large pandas DataFrames as an interactive data table.
It can be enabled with:

In [0]:
%load_ext google.colab.data_table

If you would prefer to return to the classic pandas DataFrame display, disable it by running:
```python
%unload_ext google.colab.data_table
```

# Use BigQuery via magics

The `google.cloud.bigquery` library includes a `%%bigquery` cell magic that runs a query and either displays the result or saves it to a variable as a pandas `DataFrame`. Like any cell magic, `%%bigquery` must be the first line of its cell.

In [14]:
%%bigquery --project bigquery-207917
-- Display query output immediately
SELECT * FROM `bigquery-public-data.iowa_liquor_sales.sales` LIMIT 100;

| | invoice_and_item_number | date | store_number | store_name | address | city | zip_code | store_location | county_number | county | category | category_name | vendor_number | vendor_name | item_number | item_description | pack | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | volume_sold_gallons |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INV-05069500006 | 2017-05-22 | 2604 | Hy-Vee Wine and Spirits / Lemars | 1201 12th Ave SW | Lemars | 51031 | POINT (-96.18335000000002 42.778257) | 75 | Plymouth | 1701100 | Temporary & Specialty Packages | 260 | DIAGEO AMERICAS | 13062 | Crown Royal w/Alternate Bag | 12 | 750 | 15.07 | 22.61 | 36 | 813.78 | 27.00 | 7.13 |
| 1 | S14187900008 | 2013-10-04 | 3859 | Wal-Mart 0750 / Independence | 302 ENTERPRISE DR SW | INDEPENDENCE | 50644 | POINT (-91.892924 42.450709) | 10 | Buchanan | 1701100 | DECANTERS & SPECIALTY PACKAGES | 260 | Diageo Americas | 2870 | Bailey's Original Irish Cream w/2 Glasses | 6 | 750 | 13.00 | 19.50 | 30 | 585.00 | 22.50 | 5.94 |
| 2 | S27423900062 | 2015-08-19 | 2512 | Hy-Vee Wine and Spirits / Iowa City | 1720 WATERFRONT DR | IOWA CITY | 52240 | POINT (-91.53046300000001 41.642764) | 52 | Johnson | 1081330 | PEACH SCHNAPPS | 434 | Luxco-St Louis | 84457 | Paramount Peach Schnapps | 12 | 1000 | 5.42 | 8.13 | 48 | 390.24 | 48.00 | 12.68 |
| 3 | S32937300026 | 2016-06-21 | 3420 | Sam's Club 6344 / Windsor Heights | 1101 73rd Street | Windsor Heights | 50311 | POINT (-93.718027 41.599172) | 77 | Polk | 1081317 | GRAPE SCHNAPPS | 65 | Jim Beam Brands | 82637 | Dekuyper Grape Pucker | 12 | 1000 | 7.87 | 11.81 | 132 | 1558.92 | 132.00 | 34.87 |
| 4 | S10452500062 | 2013-02-06 | 2648 | Hy-Vee #4 / WDM | 555 S 51ST ST | WEST DES MOINES | 50265 | POINT (-93.773557 41.561197) | 77 | Polk | 1081030 | COFFEE LIQUEURS | 370 | Pernod Ricard USA/Austin Nichols | 67526 | Kahlua Coffee Liqueur | 12 | 750 | 10.50 | 16.49 | 60 | 989.40 | 45.00 | 11.89 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | S21781600001 | 2014-10-15 | 3660 | Wal-Mart 2935 / Knoxville | 814 W BELL AVE | KNOXVILLE | 50138 | POINT (-93.106728 41.306176) | 63 | Marion | 1701100 | DECANTERS & SPECIALTY PACKAGES | 65 | Jim Beam Brands | 788 | Red Stag w/4-50mls | 6 | 950 | 11.03 | 16.55 | 6 | 99.30 | 5.70 | 1.51 |
| 96 | INV-24068600002 | 2019-12-20 | 9041 | S&B Farmstead Distillery | 212 E Ramsey St. | Bancroft | 50517 | POINT (-94.215446 43.292947) | 55 | KOSSUTH | 1091100 | American Distilled Spirit Specialty | 578 | S&B Farmstead Distillery | 77054 | Private First Class | 12 | 750 | 12.96 | 19.44 | 36 | 699.84 | 27.00 | 7.13 |
| 97 | INV-24152600164 | 2019-12-24 | 2623 | Hy-Vee Food Store #4 / Sioux City | 2827 Hamilton Blvd | Sioux City | 51104 | POINT (-96.417783 42.519886) | 97 | WOODBURY | 1081100 | Coffee Liqueurs | 370 | PERNOD RICARD USA | 67522 | Kahlua Coffee Mini | 12 | 50 | 6.60 | 9.90 | 1 | 9.90 | 0.05 | 0.01 |
| 98 | INV-24187800005 | 2019-12-27 | 2588 | Hy-Vee Food and Drug #6 / Cedar Rapids | 4035 Mt Vernon Rd SE | Cedar Rapids | 52403 | POINT (-91.60978 41.976835) | 57 | LINN | 1062100 | Gold Rum | 035 | BACARDI USA INC | 43034 | Bacardi Gold | 24 | 375 | 4.50 | 6.75 | 3 | 20.25 | 1.12 | 0.29 |
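
`LIMIT 100` caps the rows returned, but per BigQuery's cost-control guidance it typically does not reduce the bytes read by a `SELECT *`, so the query above can be billed for scanning the whole table. A dry run reports the bytes a query would process without running it. The cell below is a minimal sketch, assuming the `google-cloud-bigquery` client's `QueryJobConfig(dry_run=True)` option; `estimate_query_bytes` and `format_bytes` are hypothetical helper names, not library functions.

```python
def estimate_query_bytes(sql, project):
    """Return the bytes BigQuery reports it would process for `sql` (dry run)."""
    # Imported lazily so the helpers can be defined without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    # dry_run=True validates the query and reports its size without executing it;
    # disabling the cache makes the estimate reflect a fresh execution.
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


def format_bytes(n):
    """Render a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    units = ['B', 'KiB', 'MiB', 'GiB', 'TiB']
    size = float(n)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024 (or at TiB).
        if size < 1024 or unit == units[-1]:
            return f'{size:.2f} {unit}'
        size /= 1024
```

For example, `print(format_bytes(estimate_query_bytes(sql, 'bigquery-207917')))` with `sql` set to the query above shows how much data it would scan before you pay for running it.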


In [0]:
%%bigquery --project bigquery-207917 df
-- Save the output in a DataFrame variable named df
SELECT
  COUNT(*) AS total_rows
FROM `bigquery-public-data.iowa_liquor_sales.sales`

In [16]:
df

| | total_rows |
|---|---|
| 0 | 17926603 |


# Use BigQuery through google-cloud-bigquery

See the [BigQuery documentation](https://cloud.google.com/bigquery/docs) and the [library reference documentation](https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html).



### Declare the Cloud project ID used throughout this notebook

In [0]:
# Replace with the ID of your own Cloud project.
project_id = 'bigquery-207917'
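
Hard-coding the project ID ties the notebook to one account. A minimal sketch, assuming you export the ID in the `GOOGLE_CLOUD_PROJECT` environment variable (a name the Google Cloud client libraries commonly read for this purpose); `resolve_project_id` is a hypothetical helper that keeps the notebook's ID only as a fallback.

```python
import os


def resolve_project_id(default):
    """Return the project ID from GOOGLE_CLOUD_PROJECT, falling back to `default`."""
    # An unset or empty variable falls back to the hard-coded default.
    return os.environ.get('GOOGLE_CLOUD_PROJECT') or default


project_id = resolve_project_id('bigquery-207917')
```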

### Sample approximately 2000 random rows

In [17]:
from google.cloud import bigquery

client = bigquery.Client(project=project_id)

sample_count = 2000

# Count the rows in the full table.
row_count = client.query('''
  SELECT
    COUNT(*) AS total
  FROM `bigquery-public-data.iowa_liquor_sales.sales`''').to_dataframe().total[0]

# Keep each row with probability sample_count / row_count, so the
# sample holds roughly (not exactly) sample_count rows.
df = client.query('''
  SELECT
    *
  FROM
    `bigquery-public-data.iowa_liquor_sales.sales`
  WHERE RAND() < %d/%d
''' % (sample_count, row_count)).to_dataframe()

print('Full dataset has %d rows' % row_count)

Full dataset has 17926603 rows
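
The `WHERE RAND() < p` filter is Bernoulli sampling: each row is kept independently with probability p = 2000 / 17926603, so the sample size is itself random, with mean n·p and standard deviation sqrt(n·p·(1 − p)). That is why the sample described next has 2,129 rows rather than exactly 2,000. The filter is applied per row, so the query still scans the full table. A stdlib sketch of the arithmetic and a small local simulation; `bernoulli_sample_stats` and `bernoulli_sample` are hypothetical names.

```python
import math
import random


def bernoulli_sample_stats(sample_count, row_count):
    """Mean and standard deviation of the sample size when each row is kept
    independently with probability p = sample_count / row_count."""
    p = sample_count / row_count
    mean = row_count * p
    std = math.sqrt(row_count * p * (1 - p))
    return mean, std


def bernoulli_sample(rows, p, seed=0):
    """Keep each row with probability p, mimicking `WHERE RAND() < p`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < p]


# Numbers from the cell above: 2000 requested out of 17,926,603 rows.
mean, std = bernoulli_sample_stats(2000, 17926603)
print(f'expected size {mean:.0f} +/- {std:.1f}')  # expected size 2000 +/- 44.7

# Local simulation: sample sizes vary from seed to seed around n * p = 1000.
sizes = [len(bernoulli_sample(range(10000), 0.1, seed=s)) for s in range(5)]
print(sizes)
```

If an exact count matters more than cost, `ORDER BY RAND() LIMIT 2000` returns exactly 2,000 rows, at the price of sorting the whole table.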


### Describe the sampled data

In [18]:
df.describe()

| | pack | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | volume_sold_gallons |
|---|---|---|---|---|---|---|---|---|
| count | 2129.0 | 2129.0 | 2129.0 | 2129.0 | 2129.0 | 2129.0 | 2129.0 | 2129.0 |
| mean | 12.298262 | 908.07891 | 9.765078 | 14.662823 | 10.27008 | 128.07302 | 9.674382 | 2.554782 |
| std | 7.424988 | 487.297429 | 7.923287 | 11.886057 | 22.923538 | 315.178876 | 34.889472 | 9.216977 |
| min | 1.0 | 50.0 | 0.89 | 1.34 | 1.0 | 1.34 | 0.05 | 0.01 |
| 25% | 6.0 | 750.0 | 5.25 | 7.88 | 3.0 | 31.26 | 1.75 | 0.46 |
| 50% | 12.0 | 750.0 | 7.92 | 11.88 | 6.0 | 72.0 | 5.25 | 1.39 |
| 75% | 12.0 | 1000.0 | 11.99 | 17.99 | 12.0 | 135.66 | 10.5 | 2.77 |
| max | 48.0 | 3000.0 | 130.0 | 195.0 | 600.0 | 8952.0 | 1050.0 | 277.38 |
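
`describe()` reports, per numeric column, the count, mean, sample standard deviation (divisor n − 1), minimum, quartiles and maximum. pandas computes quartiles by linear interpolation between sorted values, which `statistics.quantiles(..., method='inclusive')` reproduces. A minimal stdlib sketch, where `describe` is a hypothetical helper, applied to the ten `bottles_sold` values printed by `df.head(10)` in the next cell. The single 600-bottle order pulls the mean (77.4) far above the median (21.0), the same right skew the table above shows for `sale_dollars` (mean 128.07302, median 72.0).

```python
import statistics


def describe(values):
    """Rough stdlib counterpart of pandas Series.describe() for a list of numbers."""
    # 'inclusive' interpolates linearly between sorted values, like pandas' default.
    q1, median, q3 = statistics.quantiles(values, n=4, method='inclusive')
    return {
        'count': len(values),
        'mean': statistics.mean(values),
        'std': statistics.stdev(values),  # sample standard deviation (n - 1)
        'min': min(values),
        '25%': q1,
        '50%': median,
        '75%': q3,
        'max': max(values),
    }


# bottles_sold for the ten rows shown by df.head(10)
bottles_sold = [18, 2, 48, 42, 12, 24, 2, 24, 600, 2]
for name, value in describe(bottles_sold).items():
    print(f'{name:>5}: {value:.2f}')
```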


### View the first 10 rows

In [19]:
df.head(10)

| | invoice_and_item_number | date | store_number | store_name | address | city | zip_code | store_location | county_number | county | category | category_name | vendor_number | vendor_name | item_number | item_description | pack | bottle_volume_ml | state_bottle_cost | state_bottle_retail | bottles_sold | sale_dollars | volume_sold_liters | volume_sold_gallons |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INV-15356800007 | 2018-10-30 | 3651 | Wal-Mart 1491 / Indianola | 1500 North Jefferson | Indianola | 50125 | | 91 | WARREN | 1701100 | Temporary & Specialty Packages | 65 | Jim Beam Brands | 522 | Makers Mark w/Holiday Ornament | 6 | 750 | 17.5 | 26.25 | 18 | 472.5 | 13.5 | 3.56 |
| 1 | INV-15152100002 | 2018-10-19 | 4306 | Northside One Stop / Hampton | 1208 4th St NE | Hampton | 50441 | POINT (-93.202452 42.753045) | 35 | FRANKLIN | 1062100 | Gold Rum | 35 | BACARDI USA INC | 43037 | Bacardi Gold | 12 | 1000 | 9.5 | 14.25 | 2 | 28.5 | 2.0 | 0.52 |
| 2 | INV-07697800001 | 2017-10-04 | 3773 | Benz Distributing | 501 7th Ave SE | Cedar Rapids | 52401 | POINT (-91.659875 41.97574) | 57 | LINN | 1062400 | Spiced Rum | 260 | DIAGEO AMERICAS | 43337 | Captain Morgan Spiced Rum | 12 | 1000 | 11.75 | 17.63 | 48 | 846.24 | 48.0 | 12.68 |
| 3 | S12670100015 | 2013-06-10 | 3524 | Sam's Club 6568 / Ames | 305 AIRPORT RD | AMES | 50010 | POINT (-93.613648 42.001123) | 85 | Story | 1031080 | VODKA 80 PROOF | 55 | Sazerac North America | 35318 | Barton Vodka | 6 | 1750 | 6.92 | 10.38 | 42 | 435.96 | 73.5 | 19.42 |
| 4 | INV-20060600034 | 2019-06-18 | 3722 | Wal-Mart 1361 / Sioux City | 3400 Singing Hills Blvd | Sioux City | 51106 | POINT (-96.36432 42.43609000000001) | 97 | WOODBURY | 1031100 | American Vodkas | 301 | FIFTH GENERATION INC | 38178 | Titos Handmade Vodka | 6 | 1750 | 19.0 | 28.5 | 12 | 342.0 | 21.0 | 5.54 |
| 5 | INV-23231500007 | 2019-11-14 | 2622 | Hy-Vee Food Store / Iowa City | 1125 N Dodge St | Iowa City | 52240 | POINT (-91.518868 41.676095) | 52 | JOHNSON | 1031100 | American Vodkas | 55 | SAZERAC NORTH AMERICA | 35318 | Barton Vodka | 6 | 1750 | 6.92 | 10.38 | 24 | 249.12 | 42.0 | 11.09 |
| 6 | INV-23646800060 | 2019-12-03 | 5113 | Ray's Supermarket, Inc. | 1975 Franklin St | Waterloo | 50703 | POINT (-92.315215 42.495394) | 7 | BLACK HAWK | 1012200 | Scotch Whiskies | 260 | DIAGEO AMERICAS | 5347 | Johnnie Walker Red | 12 | 1000 | 17.1 | 25.65 | 2 | 51.3 | 2.0 | 0.52 |
| 7 | INV-24860900009 | 2020-01-28 | 4434 | Todd's | 209 S Union St | Rock Rapids | 51246 | POINT (-96.175502 43.430056) | 60 | LYON | 1081600 | Whiskey Liqueur | 421 | SAZERAC COMPANY INC | 100423 | Fireball Cinnamon Whiskey 100ml Carrier | 8 | 100 | 6.0 | 9.0 | 24 | 216.0 | 2.4 | 0.63 |
| 8 | S12343700001 | 2013-05-21 | 3744 | Payless Foods / Dyersville | 733 16TH AVE SE | DYERSVILLE | 52040 | POINT (-91.115769 42.470204) | 31 | Dubuque | 1012100 | CANADIAN WHISKIES | 115 | Constellation Wine Company, Inc. | 11788 | Black Velvet | 6 | 1750 | 9.7 | 14.92 | 600 | 8952.0 | 1050.0 | 277.38 |
| 9 | S31522300001 | 2016-03-30 | 5199 | Super Mart / Oelwein | 701, S FREDERICK AVE | OELWEIN | 50662 | POINT (-91.913481 42.669128) | 33 | Fayette | 1052010 | IMPORTED GRAPE BRANDIES | 389 | REMY COINTREAU USA . | 66295 | Remy Martin V | 24 | 200 | 5.42 | 8.13 | 2 | 16.26 | 0.4 | 0.11 |


In [20]:
# 10 rows whose city names sort last alphabetically (case-sensitive)
df.sort_values('city', ascending=False).head(10)[['store_name', 'city', 'vendor_name', 'item_number', 'item_description']]

| | store_name | city | vendor_name | item_number | item_description |
|---|---|---|---|---|---|
| 1981 | Casey's General Store #2551 / Woodward | Woodward | Laird & Company | 35916 | Five O'clock Vodka |
| 849 | Casey's General Store #2551 / Woodward | Woodward | Laird & Company | 35918 | Five O'clock Vodka |
| 1164 | Rodgers Spirits and More | Winterset | DIAGEO AMERICAS | 68037 | Baileys Original Irish Cream |
| 1977 | Hy-Vee / Winterset | Winterset | Gemini Spirits | 80456 | Ryan's Cream Liqueur |
| 1796 | Fareway Stores #683 / Winterset | Winterset | LUXCO INC | 89386 | Juarez Tequila Gold |
| 47 | Kum & Go #246 / Winterset | Winterset | DIAGEO AMERICAS | 37996 | Smirnoff 80prf |
| 905 | Fareway Stores #683 / Winterset | Winterset | Phillips Beverage | 41681 | UV Red Cherry |
| 1480 | Wine and Spirits Gallery | Windsor Heights | Campari(skyy) | 67192 | X Rated Fusion Liqueur |
| 1989 | Sam's Club 6344 / Windsor Heights | Windsor Heights | DIAGEO AMERICAS | 64512 | Ciroc Apple |
| 2050 | Wal-Mart 1764 / Windsor Heights | Windsor Heights | BACARDI USA INC | 43036 | Bacardi Gold |
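
The sort above is case-sensitive: Python (and pandas, for string columns) compares strings by code point, so all-uppercase spellings such as `WEST DES MOINES` and `IOWA CITY`, both present in this dataset, order differently from title-case ones, and `'E'` (69) sorts before `'a'` (97). A small stdlib illustration using city names from rows shown earlier in this notebook:

```python
# City spellings taken from rows shown earlier in this notebook.
cities = ['Waterloo', 'WEST DES MOINES', 'Woodward', 'Winterset']

# Default order compares code points, so 'WEST...' lands after 'Waterloo'.
print(sorted(cities, reverse=True))
# ['Woodward', 'Winterset', 'Waterloo', 'WEST DES MOINES']

# Case-insensitive order compares lowercased copies instead.
print(sorted(cities, key=str.lower, reverse=True))
# ['Woodward', 'Winterset', 'WEST DES MOINES', 'Waterloo']

# The same city can also appear under two spellings.
spellings = ['IOWA CITY', 'Iowa City']
print(len(set(spellings)), len({c.upper() for c in spellings}))  # 2 1
```

In pandas 1.1 or later, `df.sort_values('city', ascending=False, key=lambda s: s.str.lower())` should apply the same case-insensitive ordering, and normalizing with `df['city'].str.upper()` before grouping merges spellings that differ only in case.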


# Use BigQuery through pandas-gbq

The `pandas-gbq` library is a community-led project maintained by the pandas community. It covers basic functionality, such as writing a DataFrame to BigQuery and running a query, but as a third-party library it may not handle all BigQuery features or use cases.

[Pandas GBQ Documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_gbq.html)

In [23]:
# pandas.read_gbq is deprecated as of pandas 2.2; call pandas-gbq directly.
import pandas_gbq

# Total sales per store; return only the 100 stores with the highest totals.
df = pandas_gbq.read_gbq('''
  SELECT store_name, SUM(sale_dollars) AS sales
  FROM `bigquery-public-data.iowa_liquor_sales.sales`
  GROUP BY store_name
  ORDER BY sales DESC
  LIMIT 100
''', project_id=project_id, dialect='standard')

df.head()

| | store_name | sales |
|---|---|---|
| 0 | Hy-Vee #3 / BDI / Des Moines | 80701450.0 |
| 1 | Central City 2 | 67332560.0 |
| 2 | Hy-Vee Wine and Spirits / Iowa City | 33902440.0 |
| 3 | Sam's Club 8162 / Cedar Rapids | 29964920.0 |
| 4 | Sam's Club 6344 / Windsor Heights | 28574220.0 |
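
This query aggregates on the server: BigQuery groups rows by `store_name`, sums `sale_dollars` per group, sorts the totals in descending order and returns only the top 100, so just 100 rows reach the notebook instead of the 17,926,603 underlying rows. The same logic on an in-memory list, as a sketch; `top_stores_by_sales` is a hypothetical helper and the rows are made-up illustrative values, not data from the table.

```python
from collections import defaultdict
import heapq


def top_stores_by_sales(rows, limit):
    """Mimic SUM(sale_dollars) ... GROUP BY store_name ORDER BY sales DESC LIMIT n."""
    totals = defaultdict(float)
    for store_name, sale_dollars in rows:
        totals[store_name] += sale_dollars  # SUM per GROUP BY key
    # nlargest returns (store_name, total) pairs, largest total first.
    return heapq.nlargest(limit, totals.items(), key=lambda item: item[1])


# Hypothetical rows: store names and amounts are illustrative only.
rows = [
    ('Store A', 100.0),
    ('Store B', 250.0),
    ('Store A', 200.0),
    ('Store C', 50.0),
]
print(top_stores_by_sales(rows, limit=2))  # [('Store A', 300.0), ('Store B', 250.0)]
```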
