# Pandas Dataframe to Google Firestore

On my recent Hacker News project I ended up building a few nested data views in my pandas dataframe. These views are for plotting the user's comment sentiment over time, and for their top 50 saltiest comments.

I needed to find somewhere to host this data for the front end app and after looking over every offering across AWS/Heroku/GoogleCloud I settled on Google Cloud's Cloud Firestore. 

The price and usability were hard to beat, but the system lacks a simple `csv` upload. Since my dataframe held nested values stored as `str(dict)` strings, I wanted to un-nest them and take advantage of the document-based structure of Google Firestore.

The same approach to un-nesting the strings would work for MongoDB as well.

The primary advantage of using Firestore is that I won't have to worry about building a scalable API, and the cost of storage/querying is very low. 

An added advantage is that my front-end dev partners can write all the code they'd like in JS, and I can write and interact with the database using Python. That's a major win.

## Set up the Google Cloud Environment

I recommend using the [Firestore Quickstart Tutorials](https://cloud.google.com/firestore/docs/quickstart-servers) to get started. You will need to get registered on Google Cloud Platform, and create a Google Cloud Platform project.

Follow the instructions to create your database and download the authentication credential (`.json`).

Then you will need to set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable as well. If you're using Jupyter like me, you can put the key in the same folder as your notebook (**ADD THE FILE NAME TO YOUR GITIGNORE**) and then follow the steps below to add the variable to your Anaconda env.

Via Bash:
```
cd $CONDA_PREFIX
ls
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh
nano ./etc/conda/activate.d/env_vars.sh
```
Add these lines:
```
#!/bin/sh
export GOOGLE_APPLICATION_CREDENTIALS="yourkey.json"
```
Then press ctrl+x, then y, to save and exit. Next, open the deactivate script:
```
nano ./etc/conda/deactivate.d/env_vars.sh
```
Add these lines:
```
#!/bin/sh
unset GOOGLE_APPLICATION_CREDENTIALS
```

*Then you're ready to launch your Jupyter notebook.*

## Load dependencies

In [1]:
#!pip install --upgrade google-cloud-firestore
import pandas as pd
import numpy as np
import json
import datetime
from google.cloud import firestore
from tqdm import tqdm_pandas
from tqdm import tqdm_notebook as tqdm
# Load TQDM
tqdm_pandas(tqdm())





## Check for Env variable

In [2]:
!if [ -z ${GOOGLE_APPLICATION_CREDENTIALS+x} ]; then echo "GOOGLE_APPLICATION_CREDENTIALS is unset"; else echo "GOOGLE_APPLICATION_CREDENTIALS is set to '$GOOGLE_APPLICATION_CREDENTIALS'";fi

GOOGLE_APPLICATION_CREDENTIALS is set to 'toxic-hackers-firebase-adminsdk-ss814-89fb3a76b5.json'
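The same check can be done from Python, which also lets you confirm that the path points at a real file. This is only a minimal sketch: `credentials_status` is a hypothetical helper name, not part of any Google library.

```python
import os

def credentials_status(env=None):
    """Describe whether GOOGLE_APPLICATION_CREDENTIALS is set and points at a file."""
    # `env` is injectable so the helper can be tried without touching os.environ.
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is unset"
    if not os.path.isfile(path):
        return f"GOOGLE_APPLICATION_CREDENTIALS is set to '{path}', but no file exists there"
    return f"GOOGLE_APPLICATION_CREDENTIALS is set to '{path}'"

print(credentials_status())
```

A relative path such as `yourkey.json` only resolves when the notebook's working directory is the folder that holds the key, so the file check catches that case early.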


## Initialize Cloud Firestore

In [3]:
db = firestore.Client()

## I created the collection `commentor_stats` manually using the firestore dashboard.

In [4]:
users_ref = db.collection('commentor_stats')
users_ref

<google.cloud.firestore_v1.collection.CollectionReference at 0x7f7f576820f0>

## A few notes about creating new documents on Firestore

 [The Google Cloud Tutorial for Uploading Data](https://cloud.google.com/firestore/docs/manage-data/add-data)

#### Add a document with a specified document id using set  
```db.collection(u'cities').document(u'new-city-id').set(data)```

#### Let firestore create the id using the .add method. 
```db.collection(u'cities').add(city.to_dict())```

#### The data structure of a document to upload. 
```
data = {
  u'stringExample': u'Hello, World!',
  u'booleanExample': True,
  u'numberExample': 3.14159265,
  u'dateExample': datetime.datetime.now(), # pd.Timestamp works too.
  u'arrayExample': [5, True, u'hello'],
  u'nullExample': None,
  u'objectExample': {
    u'a': 5,
    u'b': True
  }
}
```


#### Timestamps need to conform to RFC 3339; `pd.Timestamp` works.
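As a small illustration of what a conforming timestamp looks like, here is a sketch using only the standard library. `epoch_to_utc` is a hypothetical helper name, and the epoch value is just an example.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

example = epoch_to_utc(1394798525)
# isoformat() on an aware datetime produces an RFC 3339 style string.
print(example.isoformat())  # 2014-03-14T12:02:05+00:00
```

Keeping the datetime timezone-aware avoids any ambiguity about which zone the stored instant refers to.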

## Now it's time to upload all my data. First I need to import it into the notebook.

In [5]:
df = pd.read_csv("Final_Data2/hn_commentor_summary.csv")

In [6]:
df.sort_values(by = ["sum_slt_s"], ascending = True).head()

Unnamed: 0,commentor,count_comments,time_of_last_comment,time_of_first_comment,monthly_plot,top_cmnts_s,top_salty_comment,sum_slt_oall,average_slt_oall,min_slt_oall,max_slt_oall,cnt_slt_h,cnt_slt_s,sum_slt_h,sum_slt_s,avg_slt_h,avg_slt_s,rank_lt_amt_slt,rank_lt_qty_sc,rank_oall_slt
13791,DanBC,20405,1548967920,1316712464,"[{'y_m': '11_09', 't_s': 0.0, 't_h': 18.99, 'c...","[{'commentor': 'DanBC', 'comment_time': 144447...","[{'commentor': 'DanBC', 'comment_time': 144447...",18309.592278,0.897309,-2.51619,0.999436,19516.0,889.0,19395.0234,-1085.431121,0.993801,-1.220957,1.0,2.0,
357216,tptacek,47283,1548999272,1193928765,"[{'y_m': '07_11', 't_s': 0.0, 't_h': 7.0, 'c_s...","[{'commentor': 'tptacek', 'comment_time': 1393...","[{'commentor': 'tptacek', 'comment_time': 1393...",45022.323581,0.952188,-2.967871,0.999432,46298.0,985.0,46098.805516,-1076.481936,0.995698,-1.092875,2.0,1.0,
279573,pg_is_a_butt,857,1510157764,1386703885,"[{'y_m': '13_12', 't_s': -5.53, 't_h': 4.0, 'c...","[{'commentor': 'pg_is_a_butt', 'comment_time':...","[{'commentor': 'pg_is_a_butt', 'comment_time':...",-408.516125,-0.476682,-3.099355,0.999432,367.0,490.0,359.658917,-768.175042,0.979997,-1.567704,3.0,6.0,1.0
45281,TeMPOraL,18572,1549012070,1280659106,"[{'y_m': '10_08', 't_s': 0.0, 't_h': 2.0, 'c_s...","[{'commentor': 'TeMPOraL', 'comment_time': 147...","[{'commentor': 'TeMPOraL', 'comment_time': 147...",17212.187029,0.926782,-2.803237,0.999434,17985.0,587.0,17837.573015,-625.385987,0.991803,-1.065394,4.0,3.0,
109540,coldtea,20654,1549011451,1360252206,"[{'y_m': '13_02', 't_s': -9.9, 't_h': 196.41, ...","[{'commentor': 'coldtea', 'comment_time': 1502...","[{'commentor': 'coldtea', 'comment_time': 1502...",19432.280883,0.940848,-1.988798,0.999435,20112.0,542.0,20003.497742,-571.216859,0.994605,-1.053906,5.0,4.0,


In [7]:
df.columns

Index(['commentor', 'count_comments', 'time_of_last_comment',
       'time_of_first_comment', 'monthly_plot', 'top_cmnts_s',
       'top_salty_comment', 'sum_slt_oall', 'average_slt_oall', 'min_slt_oall',
       'max_slt_oall', 'cnt_slt_h', 'cnt_slt_s', 'sum_slt_h', 'sum_slt_s',
       'avg_slt_h', 'avg_slt_s', 'rank_lt_amt_slt', 'rank_lt_qty_sc',
       'rank_oall_slt'],
      dtype='object')

In [8]:
# Rename a few columns for consistency. 
df = df.rename(columns={'count_comments': "cnt_cmnts_oall",
                        'time_of_last_comment': "time_cmnt_lst",
                        'time_of_first_comment': "time_cmnt_fst",
                        'average_slt_oall': "avg_slt_oall"})

In [9]:
### These are the columns I'll keep. I'm going to drop a few. 
df = df[["commentor", # Name of commentor.
  "time_cmnt_lst", # Most recent comment time in Dataset.
  "time_cmnt_fst", # First HN comment time.
  "cnt_cmnts_oall", # Count of total number of comments. 
  "sum_slt_oall", # Total Salt Score Overall. (All Salty + NonSalty Scores added up.)
  "avg_slt_oall", # Average Comment Salt Score across all comments. 
  "cnt_slt_s", # Count of JUST salty comments. 
  "sum_slt_s", # Total Salt Score of Salty Comments
  "avg_slt_s", # Average Salt Score of Salty Comments
  "rank_lt_amt_slt", # Rank: Lifetime Salt Scores Total of Salty comments only.
  "rank_lt_qty_sc", # Rank: Lifetime quantity of "Salty Comments" contributed.
  "rank_oall_slt", # Rank: Lifetime overall "Salt Score" total of All Salty + NonSalty comments.
  "top_cmnts_s", # List of 50 Top Salty Comments. 
  "monthly_plot"]];# List of every month of activity for plotting.

### I'll add a lowercase search column, then make a copy in case something happens. 

In [54]:
df["commentor_search"] = df["commentor"].str.lower()
df.head()

Unnamed: 0,commentor,time_cmnt_lst,time_cmnt_fst,cnt_cmnts_oall,sum_slt_oall,avg_slt_oall,cnt_slt_s,sum_slt_s,avg_slt_s,rank_lt_amt_slt,rank_lt_qty_sc,rank_oall_slt,top_cmnts_s,monthly_plot,commentor_search
0,0-,1394798525,1394798525,1,0.999353,0.999353,,,,,,,,"[{'y_m': '14_03', 't_s': 0.0, 't_h': 1.0, 'c_s...",0-
1,0--__-_-__--0,1541102045,1541102045,1,0.999393,0.999393,,,,,,,,"[{'y_m': '18_11', 't_s': 0.0, 't_h': 1.0, 'c_s...",0--__-_-__--0
2,0-0,1259867712,1259867712,1,0.999354,0.999354,,,,,,,,"[{'y_m': '09_12', 't_s': 0.0, 't_h': 1.0, 'c_s...",0-0
3,0-4,1288646410,1288394371,12,11.99109,0.999258,,,,,,,,"[{'y_m': '10_10', 't_s': 0.0, 't_h': 10.99, 'c...",0-4
4,0-9,1524324694,1477669562,2,1.997974,0.998987,,,,,,,,"[{'y_m': '16_10', 't_s': 0.0, 't_h': 1.0, 'c_s...",0-9


In [55]:
test = df.copy()
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 388118 entries, 0 to 388117
Data columns (total 15 columns):
commentor           388118 non-null object
time_cmnt_lst       388118 non-null int64
time_cmnt_fst       388118 non-null int64
cnt_cmnts_oall      388118 non-null int64
sum_slt_oall        388118 non-null float64
avg_slt_oall        388118 non-null float64
cnt_slt_s           51914 non-null float64
sum_slt_s           51914 non-null float64
avg_slt_s           51914 non-null float64
rank_lt_amt_slt     51914 non-null float64
rank_lt_qty_sc      51914 non-null float64
rank_oall_slt       6168 non-null float64
top_cmnts_s         51914 non-null object
monthly_plot        388118 non-null object
commentor_search    388118 non-null object
dtypes: float64(8), int64(3), object(4)
memory usage: 44.4+ MB


### Convert Unix-Epoch times (currently `int`) to Pandas Timestamps. 

In [56]:
# Convert Unix time to Timestamps
test["time_cmnt_lst"] = test["time_cmnt_lst"].apply(lambda x: pd.Timestamp(x, unit='s').tz_localize('UTC'))
test["time_cmnt_fst"] = test["time_cmnt_fst"].apply(lambda x: pd.Timestamp(x, unit='s').tz_localize('UTC'))
test.head(5)

Unnamed: 0,commentor,time_cmnt_lst,time_cmnt_fst,cnt_cmnts_oall,sum_slt_oall,avg_slt_oall,cnt_slt_s,sum_slt_s,avg_slt_s,rank_lt_amt_slt,rank_lt_qty_sc,rank_oall_slt,top_cmnts_s,monthly_plot,commentor_search
0,0-,2014-03-14 12:02:05+00:00,2014-03-14 12:02:05+00:00,1,0.999353,0.999353,,,,,,,,"[{'y_m': '14_03', 't_s': 0.0, 't_h': 1.0, 'c_s...",0-
1,0--__-_-__--0,2018-11-01 19:54:05+00:00,2018-11-01 19:54:05+00:00,1,0.999393,0.999393,,,,,,,,"[{'y_m': '18_11', 't_s': 0.0, 't_h': 1.0, 'c_s...",0--__-_-__--0
2,0-0,2009-12-03 19:15:12+00:00,2009-12-03 19:15:12+00:00,1,0.999354,0.999354,,,,,,,,"[{'y_m': '09_12', 't_s': 0.0, 't_h': 1.0, 'c_s...",0-0
3,0-4,2010-11-01 21:20:10+00:00,2010-10-29 23:19:31+00:00,12,11.99109,0.999258,,,,,,,,"[{'y_m': '10_10', 't_s': 0.0, 't_h': 10.99, 'c...",0-4
4,0-9,2018-04-21 15:31:34+00:00,2016-10-28 15:46:02+00:00,2,1.997974,0.998987,,,,,,,,"[{'y_m': '16_10', 't_s': 0.0, 't_h': 1.0, 'c_s...",0-9
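Calling `pd.Timestamp` row by row through `apply` works, but on 388,118 rows a vectorized conversion is usually faster. The sketch below is an alternative, not what this post ran: it uses `pd.to_datetime` with `unit="s"` and `utc=True` on two of the epoch values shown earlier.

```python
import pandas as pd

# Two epoch values taken from the time_cmnt_lst column above.
epochs = pd.Series([1394798525, 1541102045], name="time_cmnt_lst")

# unit="s" reads the integers as seconds; utc=True returns tz-aware UTC values.
converted = pd.to_datetime(epochs, unit="s", utc=True)
print(converted)
```

On the real dataframe this would look like `test["time_cmnt_lst"] = pd.to_datetime(test["time_cmnt_lst"], unit="s", utc=True)`.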


In [57]:
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 388118 entries, 0 to 388117
Data columns (total 15 columns):
commentor           388118 non-null object
time_cmnt_lst       388118 non-null datetime64[ns, UTC]
time_cmnt_fst       388118 non-null datetime64[ns, UTC]
cnt_cmnts_oall      388118 non-null int64
sum_slt_oall        388118 non-null float64
avg_slt_oall        388118 non-null float64
cnt_slt_s           51914 non-null float64
sum_slt_s           51914 non-null float64
avg_slt_s           51914 non-null float64
rank_lt_amt_slt     51914 non-null float64
rank_lt_qty_sc      51914 non-null float64
rank_oall_slt       6168 non-null float64
top_cmnts_s         51914 non-null object
monthly_plot        388118 non-null object
commentor_search    388118 non-null object
dtypes: datetime64[ns, UTC](2), float64(8), int64(1), object(4)
memory usage: 44.4+ MB


## I'll use this function to unpack the messy list of comments from the `top_cmnts_s` field. 

In [58]:
test = test.sort_values(by = ["sum_slt_s"], ascending = True)
test.shape

(388118, 15)

In [59]:
# Here's what it looks like before unpacking/repacking:
test_cmnts = test.iloc[1]["top_cmnts_s"]
test_cmnts[0:1000]

'[{\'commentor\': \'tptacek\', \'comment_time\': 1393815150, \'comment_saltiness\': -2.967871080227196, \'is_salty\': True, \'is_severe_toxicity\': True, \'is_obscene\': True, \'is_identity_attack\': False, \'is_insult\': True, \'is_threat\': False, \'parent_type\': \'comment\', \'parent_author\': 7331783, \'parent_title\': \'Another Comment\', \'cleaned_comment\': \'Fuck L0ck. Seriously fuck those guys.\', \'comment_id\': 7331783, \'parent_id\': 7331604}, {\'commentor\': \'tptacek\', \'comment_time\': 1324960550, \'comment_saltiness\': -2.398391169384122, \'is_salty\': True, \'is_severe_toxicity\': False, \'is_obscene\': True, \'is_identity_attack\': False, \'is_insult\': True, \'is_threat\': False, \'parent_type\': \'comment\', \'parent_author\': 3394830, \'parent_title\': \'Another Comment\', \'cleaned_comment\': \'So great. "MOTHERFUCKERS!"\', \'comment_id\': 3394830, \'parent_id\': 3394742}, {\'commentor\': \'tptacek\', \'comment_time\': 1505572578, \'comment_saltiness\': -2.23519

In [60]:
t_df = pd.read_json(json.dumps(eval(test_cmnts)))
display(t_df.columns)
display(t_df.head())

Index(['cleaned_comment', 'comment_id', 'comment_saltiness', 'comment_time',
       'commentor', 'is_identity_attack', 'is_insult', 'is_obscene',
       'is_salty', 'is_severe_toxicity', 'is_threat', 'parent_author',
       'parent_id', 'parent_title', 'parent_type'],
      dtype='object')

Unnamed: 0,cleaned_comment,comment_id,comment_saltiness,comment_time,commentor,is_identity_attack,is_insult,is_obscene,is_salty,is_severe_toxicity,is_threat,parent_author,parent_id,parent_title,parent_type
0,Fuck L0ck. Seriously fuck those guys.,7331783,-2.967871,2014-03-03 02:52:30,tptacek,False,True,True,True,True,False,7331783,7331604,Another Comment,comment
1,"So great. ""MOTHERFUCKERS!""",3394830,-2.398391,2011-12-27 04:35:50,tptacek,False,True,True,True,False,False,3394830,3394742,Another Comment,comment
2,"It's also fucking stupid. Peiter ""mudge"" Zatko...",15264806,-2.235195,2017-09-16 14:36:18,tptacek,False,True,True,True,False,False,15264806,15263080,Another Comment,comment
3,Could you be more of a tool? Literally: the on...,5883501,-2.095188,2013-06-15 01:42:45,tptacek,False,True,True,True,False,False,5883501,5883491,Another Comment,comment
4,Kindly go fuck yourself.,6098474,-1.963429,2013-07-24 19:26:11,tptacek,False,True,True,True,False,False,6098474,6098462,Another Comment,comment
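`eval` executes whatever the string contains, which is fine for data you generated yourself but risky for anything else. The standard library's `ast.literal_eval` parses only Python literals (lists, dicts, strings, numbers, booleans, `None`) and is a common safer substitute. A minimal sketch, using a shortened sample string built from the values above; `parse_literal` is a hypothetical helper name.

```python
import ast

def parse_literal(text):
    """Parse a str(list-of-dicts) without executing arbitrary code."""
    parsed = ast.literal_eval(text)
    if not isinstance(parsed, list):
        raise ValueError("expected a list of dicts")
    return parsed

# Shortened sample in the same shape as the top_cmnts_s strings.
sample = (
    "[{'comment_id': 7331783, 'comment_saltiness': -2.967871080227196, "
    "'is_insult': True, 'parent_type': 'comment'}]"
)
rows = parse_literal(sample)
print(rows[0]["comment_id"], rows[0]["is_insult"])
```

Anything that is not a plain literal, such as a function call, makes `ast.literal_eval` raise `ValueError` instead of running it.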


In [61]:
t_df = t_df.drop(columns=['commentor','is_salty']).reset_index()

In [62]:
def dict_top_comments(top_cmnt_obj):
    """Unpack the top comments and turn them into a nested dict.
    
    Args:
        top_cmnt_obj, a str of a list of dicts, from df.top_cmnts_s.
    
    Evaluates the string, json.dumps it, and reads it back in to a dataframe.
    Drops all unnecessary columns. 
    Creates a named index for each comment. 
    Turns df into an indexed dict. 
    
    Returns: 
        temp, a properly formed nested dict, or NaN if the input can't be parsed. 
    """
    try:
        # NOTE: eval() executes the string; only use it on data you trust.
        temp = pd.read_json(json.dumps(eval(top_cmnt_obj)))
        temp = temp.drop(columns=['commentor','is_salty']).reset_index()
        temp["c_id"] = temp["index"].apply(lambda x: "c_" + str(x))
        temp["comment_time"] = temp["comment_time"].apply(lambda x: pd.Timestamp(x, unit='s').tz_localize('UTC'))
        temp.drop(columns = ["index"], inplace = True)
        temp = temp.set_index("c_id").to_dict("index")
    except Exception:
        # Rows without salty comments hold NaN here, so eval() fails; keep them as NaN.
        temp = np.nan
    return temp


# Preview the dict after unpacking & cleaning. 
unpacked_repacked = dict_top_comments(test_cmnts)
# After unpacking/repacking
unpacked_repacked["c_0"]

{'cleaned_comment': 'Fuck L0ck. Seriously fuck those guys.',
 'comment_id': 7331783,
 'comment_saltiness': -2.967871080227196,
 'comment_time': Timestamp('2014-03-03 02:52:30+0000', tz='UTC'),
 'is_identity_attack': False,
 'is_insult': True,
 'is_obscene': True,
 'is_severe_toxicity': True,
 'is_threat': False,
 'parent_author': 7331783,
 'parent_id': 7331604,
 'parent_title': 'Another Comment',
 'parent_type': 'comment'}

## I'll use this function to unpack / repack my `monthly_plot`

In [63]:
# Before unpack/repack
test_plts = test.iloc[1]["monthly_plot"]
test_plts[0:1000]

"[{'y_m': '07_11', 't_s': 0.0, 't_h': 7.0, 'c_s': 0.0, 'c_h': 7.0}, {'y_m': '07_12', 't_s': -1.01, 't_h': 25.98, 'c_s': 1.0, 'c_h': 26.0}, {'y_m': '08_01', 't_s': -5.22, 't_h': 27.8, 'c_s': 4.0, 'c_h': 28.0}, {'y_m': '08_02', 't_s': -1.74, 't_h': 22.78, 'c_s': 1.0, 'c_h': 23.0}, {'y_m': '08_03', 't_s': -6.04, 't_h': 108.94, 'c_s': 6.0, 'c_h': 110.0}, {'y_m': '08_04', 't_s': -7.55, 't_h': 181.45, 'c_s': 6.0, 'c_h': 182.0}, {'y_m': '08_05', 't_s': -4.98, 't_h': 148.18, 'c_s': 5.0, 'c_h': 149.0}, {'y_m': '08_06', 't_s': -16.8, 't_h': 244.51, 'c_s': 13.0, 'c_h': 246.0}, {'y_m': '08_07', 't_s': -5.27, 't_h': 162.67, 'c_s': 5.0, 'c_h': 163.0}, {'y_m': '08_08', 't_s': -15.76, 't_h': 254.06, 'c_s': 14.0, 'c_h': 257.0}, {'y_m': '08_09', 't_s': -9.04, 't_h': 150.21, 'c_s': 8.0, 'c_h': 151.0}, {'y_m': '08_10', 't_s': -7.15, 't_h': 118.94, 'c_s': 6.0, 'c_h': 120.0}, {'y_m': '08_11', 't_s': -7.28, 't_h': 228.52, 'c_s': 6.0, 'c_h': 230.0}, {'y_m': '08_12', 't_s': -10.1, 't_h': 252.08, 'c_s': 8.0, 'c

In [64]:
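def dict_monthly_plot(monthly_plot_obj):
    """Turns the list of dicts into a nested dict keyed by month.
    
    Args:
        monthly_plot_obj, a str of a list of dicts, from df.monthly_plot.
    
    Returns: 
        temp, a dict of dicts keyed by 'y_m', or NaN if the input can't be parsed. 
    """
    try:
        # NOTE: eval() executes the string; only use it on data you trust.
        temp = pd.DataFrame.from_dict(eval(monthly_plot_obj)).set_index("y_m").to_dict("index")
    except Exception:
        temp = np.nan
    return temp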

unpacked_repacked = dict_monthly_plot(test_plts)
# After unpacking/repacking
unpacked_repacked['07_11']

{'c_h': 7.0, 'c_s': 0.0, 't_h': 7.0, 't_s': 0.0}
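Both helpers go through a DataFrame to re-key a list of dicts. The same re-keying can be done with plain dict comprehensions, sketched below under the assumption that the string has already been parsed into a list. `index_comments` and `index_by_month` are hypothetical names, and this version does not convert `comment_time` to a timestamp.

```python
def index_comments(comments, drop=("commentor", "is_salty")):
    """Key a list of comment dicts as c_0, c_1, ... and drop redundant fields."""
    return {
        f"c_{i}": {k: v for k, v in comment.items() if k not in drop}
        for i, comment in enumerate(comments)
    }

def index_by_month(months, key="y_m"):
    """Key a list of monthly dicts by their 'y_m' value, removing that field."""
    return {
        month[key]: {k: v for k, v in month.items() if k != key}
        for month in months
    }

months = [{'y_m': '07_11', 't_s': 0.0, 't_h': 7.0, 'c_s': 0.0, 'c_h': 7.0}]
print(index_by_month(months)['07_11'])
```

Skipping the DataFrame round trip may matter when the function runs hundreds of thousands of times, though I have not benchmarked it here.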

## Finally, I'll use this function to process all the data for upload. 
This may take a while. :)

I'll turn my df into a list of dicts (one per row), then apply the functions to each record. 

In [65]:
upload = test.sort_values(by=['commentor']).to_dict("records")

In [66]:
len(upload)

388118
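Only 51,914 of the 388,118 rows have salty-comment stats, so most records carry NaN in those fields. One option, which this post does not use, is to strip missing values from each record so the uploaded document simply omits them. A minimal sketch; `drop_missing` is a hypothetical helper name.

```python
import math

def drop_missing(record):
    """Return a copy of `record` without None or float NaN values."""
    return {
        key: value
        for key, value in record.items()
        if value is not None
        and not (isinstance(value, float) and math.isnan(value))
    }

record = {"commentor": "0-", "cnt_cmnts_oall": 1, "cnt_slt_s": float("nan")}
print(drop_missing(record))
```

Whether to omit the fields or keep them as nulls depends on how the front end wants to query them, so treat this as a design choice rather than a requirement.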

In [67]:
#!pip install joblib
from joblib import Parallel, delayed
import multiprocessing
num_cores = multiprocessing.cpu_count()
num_cores

12

In [68]:
x_dict = upload
def prep_for_upload(x):
    """Unpack the nested string fields of one record into dicts for upload."""
    x["monthly_plot"] = dict_monthly_plot(x["monthly_plot"]) # Processes the monthly plots 
    x["top_cmnts_s"] = dict_top_comments(x["top_cmnts_s"]) # Processes the top comments
    #print("uploaded ", x["commentor"])
    return x
results =[]
results.append(Parallel(n_jobs=11)(delayed(prep_for_upload)(x) for x in tqdm(x_dict)))
print("done")


done


## This lets you set the start point in case you get interrupted and have to restart. 

In [69]:
final = results[0]

In [70]:
len(final)

388118
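Because the upload lets Firestore generate document ids, re-sending records that already went through would create duplicates. A hedged sketch of picking a start point from the number of batches that completed; `remaining` is a hypothetical helper and the figure of 71 batches is only an example.

```python
def remaining(records, start=0):
    """Return the records still to upload, skipping the first `start`."""
    if start < 0 or start > len(records):
        raise ValueError("start must be between 0 and len(records)")
    return records[start:]

# Example: 71 batches of 500 were committed before an interruption.
batches_done = 71
start = batches_done * 500
print(start)  # 35500
```

With the real data this would be `final = remaining(results[0], start)` before running the upload loop again.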

## And here's the upload function. 

Notice how it batches the records into groups of 500, then submits them. The submission step hit the occasional timeout, but wrapping the commit in a `try`/`except` that waits five seconds and retries once worked great. 
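The batching idea on its own, separate from Firestore, can be sketched with a small generator. This is an illustration rather than the code used in this post; `chunked` is a hypothetical helper name.

```python
from itertools import islice

def chunked(items, size=500):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

sizes = [len(chunk) for chunk in chunked(range(1234), size=500)]
print(sizes)  # [500, 500, 234]
```

The final, smaller chunk comes out of the generator like any other, so no separate "one last commit" step is needed when looping over `chunked(final)`.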

In [72]:
from time import sleep
x = 1
batch_no = 1
for entry in tqdm(final):
    if x == 1:
        # Do this part the first time.
        batch = db.batch()
        print("set batch")
        
    # Do this part for every single one. 
    #print ("added %s to batch" % x)
    batch.set(db.collection(u'commentor_stats').document(), entry)
    
    if x % 500 == 0:
        #Do this part every 500th time.
        #Had to add a try/except for this pesky submission error.
        try:
            batch.commit()
        except Exception:
            print("Commit of batch %s failed... reattempting." % batch_no)
            sleep(5) # Wait 5 seconds, then retry. 
            batch.commit()
        print("sent batch %s" % batch_no)
        batch = db.batch()
        batch_no += 1
    x += 1

# One last batch commit to send the last non-500 docsize batch.
batch.commit()


set batch
sent batch 1
sent batch 2
sent batch 3
sent batch 4
...
sent batch 618
sent batch 619
...

[update_time {
   seconds: 1558029224
   nanos: 125285000
 }, ...

In [48]:
batch.commit()

[update_time {
   seconds: 1557775181
   nanos: 939366000
 }, ...

In [None]:
# The basic outline for batch uploading. 
#batch = db.batch()
#batch.set(db.collection(u'commentor_stats').document(),{u'commentor': u'ZTESTZZZZZ'})
#batch.commit()
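The upload loop retries a failed commit once after a fixed five-second wait. A slightly more general pattern is to retry a few times with a growing delay. The sketch below is an assumption-laden illustration, not what this post ran: `commit_with_retry` is a hypothetical helper, and it takes the commit as a plain callable so it can be tried without a Firestore client.

```python
import time

def commit_with_retry(commit, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call commit(); on failure wait and retry, doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return commit()
        except Exception as exc:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Commit failed ({exc!r}); waiting {delay:.0f}s before retrying.")
            sleep(delay)

# Stand-in for batch.commit that always succeeds, to show the call shape.
print(commit_with_retry(lambda: "committed"))
```

In the loop above, the call would be `commit_with_retry(batch.commit)` in place of the `try`/`except` block.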