# Introduction
NoSQL databases are increasingly popular. It is important to know how to interact with them to fetch and store data.

In this lab you will interact with BigQuery, a managed data warehouse service offered by Google. BigQuery also accepts SQL, but in this lab all data is fetched through its Python client library.

## Google Patents Research Data

Google Patents Research Data contains the output of much of the data analysis work used in Google Patents (patents.google.com).

Please refer to the following URL for more information:
https://console.cloud.google.com/marketplace/product/google_patents_public_datasets/google-patents-research-data


### 1. Cloud database services require authentication before any data transfer

In [1]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated
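
The `google.colab` module is only available inside Colab notebooks, so the cell above fails anywhere else. A minimal sketch of a guard follows. It assumes (this is not part of the original lab) that outside Colab you rely on Application Default Credentials, which `bigquery.Client()` looks up on its own; the helper name `running_in_colab` is hypothetical.

```python
import sys


def running_in_colab() -> bool:
    """Return True when the google.colab module has been loaded by the kernel."""
    # Colab kernels typically import google.colab at startup,
    # so the module shows up in sys.modules.
    return "google.colab" in sys.modules


if running_in_colab():
    # Same calls as in the lab cell; only reachable inside Colab.
    from google.colab import auth

    auth.authenticate_user()
    print("Authenticated")
else:
    # Assumption: credentials were set up beforehand, for example with
    # `gcloud auth application-default login`.
    print("Not in Colab: relying on Application Default Credentials")
```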


### 2. We load Google's client library to interact with the database

In [2]:
from google.cloud import bigquery

### 3. We instantiate the client that handles the connection to the database

In [3]:
# Create a "Client" object
client = bigquery.Client()

### 4. We build a reference to the "google_patents_research" dataset and fetch it

In [4]:
# Construct a reference to the "google_patents_research" dataset
bigquery_connection = client.dataset("google_patents_research", project="bigquery-public-data")

In [5]:
# API request - fetch the dataset
google_patents_dataset = client.get_dataset(bigquery_connection)
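
BigQuery identifies datasets and tables by dotted names of the form `project.dataset` and `project.dataset.table`, and `client.get_dataset` and `client.get_table` accept such strings as well as reference objects. Recent releases of the client library also appear to mark `client.dataset(...)` as deprecated in favor of these strings. The sketch below only builds the string; `qualified_id` is a hypothetical helper, not part of the library.

```python
def qualified_id(project: str, dataset: str, table: str = "") -> str:
    """Join project, dataset and (optionally) table into a dotted identifier."""
    parts = [project, dataset] + ([table] if table else [])
    if not all(parts):
        raise ValueError("project and dataset must be non-empty strings")
    return ".".join(parts)


dataset_id = qualified_id("bigquery-public-data", "google_patents_research")
print(dataset_id)

# With an authenticated client, the string can stand in for the reference object:
# google_patents_dataset = client.get_dataset(dataset_id)
```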


### 5. We ask the database to list all the tables contained in the dataset

In [6]:
# List all the tables in the "google_patents_research" dataset
google_patents_tables = list(client.list_tables(google_patents_dataset))

In [7]:
# Print names of all tables in the dataset
for table in google_patents_tables:  
    print(table.table_id)

annotations
annotations_202007
annotations_202101
annotations_202105
annotations_202111
annotations_202204
annotations_grouped
publications
publications_201710
publications_201802
publications_201809
publications_201903
publications_201909
publications_201912
publications_202004
publications_202007
publications_202101
publications_202105
publications_202111
publications_202204
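
The listing shows that most tables are dated snapshots named `<prefix>_YYYYMM`. Instead of hard-coding a date, the most recent snapshot can be selected from the names. This is a minimal sketch that assumes the naming pattern seen above holds for every snapshot; `latest_snapshot` is a hypothetical helper.

```python
import re
from typing import Iterable, Optional


def latest_snapshot(table_ids: Iterable[str], prefix: str) -> Optional[str]:
    """Return the newest '<prefix>_YYYYMM' table ID, or None if there is none."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d{{6}})$")
    dated = []
    for table_id in table_ids:
        match = pattern.match(table_id)
        if match:
            dated.append((match.group(1), table_id))
    # YYYYMM strings sort chronologically, so max() picks the latest snapshot.
    return max(dated)[1] if dated else None


# A few of the table names printed above.
table_ids = [
    "annotations", "annotations_202204", "annotations_grouped",
    "publications", "publications_201710", "publications_202111",
    "publications_202204",
]
print(latest_snapshot(table_ids, "publications"))
```

In the lab, the names would come from the listing itself, for example `latest_snapshot((t.table_id for t in google_patents_tables), "publications")`.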


### 6. We ask the database to retrieve the table "publications_202204"

In [8]:
# Construct a reference to the "publications_202204" table
table_connection = google_patents_dataset.table("publications_202204")

# API request - fetch the table
table_publications_202204 = client.get_table(table_connection)

In [9]:
# Preview the first five rows of the "publications_202204" table
client.list_rows(table_publications_202204, max_results=5).to_dataframe()

,publication_number,title,title_translated,abstract,abstract_translated,cpc,cpc_low,cpc_inventive_low,top_terms,similar,url,country,publication_description,cited_by,embedding_v1
0,WO-2009067033-A2,A crystallographic model of the binding site a...,False,The subject matters of the invention are: a cr...,False,"[{'code': 'G16B15/30', 'inventive': True, 'fir...","[G16B15/30, G16B15/00, G16B, G16, G, G16C20/50...","[G16B15/30, G16B15/00, G16B, G16, G, G16C20/50...","[pfk, modulator, coordinates, tables, binding ...",[],https://patents.google.com/patent/WO2009067033A2,WIPO (PCT),International application publshed with declar...,[],"[0.13142297, -0.22401766, -0.051257387, -0.145..."
1,WO-2015166045-A2,The application of rescap to attenuate and pre...,False,The present invention relates to the treatment...,False,"[{'code': 'A61P25/00', 'inventive': True, 'fir...","[A61P25/00, A61P, A61, A, C12Y301/03001, C12Y3...","[A61P25/00, A61P, A61, A, C12Y301/03001, C12Y3...","[rescap, disease, brain, neurodegenerative, pr...",[],https://patents.google.com/patent/WO2015166045A2,WIPO (PCT),International application publshed with declar...,[],"[-0.17739123, -0.15348703, -0.0836166, -0.0850..."
2,WO-2021181100-A1,Compositions and methods for inducing an immun...,False,The invention relates to a composition compris...,False,"[{'code': 'C12N2710/10343', 'inventive': False...","[C12N2710/10343, C12N2710/10341, C12N2710/1031...","[C12N2710/10343, C12N2710/10341, C12N2710/1031...","[sars, composition, dose, vaccine, chadox1, su...",[],https://patents.google.com/patent/WO2021181100A1,WIPO (PCT),International application published with inter...,[],"[0.053326584, 0.030189348, 0.10093831, -0.1294..."
3,ZA-202006479-B,Improvements in or relating to beam alignment ...,False,,False,"[{'code': 'H04B7/0897', 'inventive': True, 'fi...","[H04B7/0897, H04B7/0891, H04B7/08, H04B7/04, H...","[H04B7/0897, H04B7/0891, H04B7/08, H04B7/04, H...","[relating, alignment, electronically steered, ...",[],https://patents.google.com/patent/ZA202006479B,South Africa,Granted patent,[],"[0.015821163, 0.0035525584, -0.27819073, -0.05..."
4,ZA-202005398-B,Managing extended 5g-s-tmsi in lte connected t...,False,,False,"[{'code': 'H04W76/27', 'inventive': True, 'fir...","[H04W76/27, H04W76/20, H04W76/00, H04W, H04, H...","[H04W76/27, H04W76/20, H04W76/00, H04W, H04, H...","[tmsi, lte, managing extended, managing, exten...",[],https://patents.google.com/patent/ZA202005398B,South Africa,Granted patent,[],"[0.052048348, -0.094614774, -0.16915298, -0.11..."
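
The preview downloads every column, including wide ones such as `embedding_v1`. `client.list_rows` takes a `selected_fields` argument (a sequence of schema fields) that restricts the download to the listed columns. The sketch below shows one way to pick those fields by name. `pick_fields` is a hypothetical helper, and the `Field` tuple is only a stand-in for the library's schema field objects so that the example runs without credentials.

```python
from collections import namedtuple
from typing import Iterable, List


def pick_fields(schema: Iterable, names: List[str]) -> list:
    """Return the schema entries whose .name is in names, in the order of names."""
    by_name = {field.name: field for field in schema}
    missing = [name for name in names if name not in by_name]
    if missing:
        raise KeyError(f"columns not in schema: {missing}")
    return [by_name[name] for name in names]


# Stand-in for the entries of table.schema (illustration only).
Field = namedtuple("Field", ["name"])
demo_schema = [Field("publication_number"), Field("title"), Field("embedding_v1")]
print([f.name for f in pick_fields(demo_schema, ["title", "publication_number"])])

# In the lab (requires the authenticated client and the fetched table):
# wanted = pick_fields(table_publications_202204.schema,
#                      ["publication_number", "title", "country"])
# client.list_rows(table_publications_202204, selected_fields=wanted,
#                  max_results=5).to_dataframe()
```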


### 7. We ask the database to retrieve the table "publications_202204" again, this time storing the result in a pandas DataFrame

In [10]:
publications_202204DataFrame = client.list_rows(table_publications_202204, max_results=5).to_dataframe()

### 8. We are ready to process the DataFrame as usual

In [11]:
type(publications_202204DataFrame)

pandas.core.frame.DataFrame

In [12]:
publications_202204DataFrame.head(5)

,publication_number,title,title_translated,abstract,abstract_translated,cpc,cpc_low,cpc_inventive_low,top_terms,similar,url,country,publication_description,cited_by,embedding_v1
0,WO-2009067033-A2,A crystallographic model of the binding site a...,False,The subject matters of the invention are: a cr...,False,"[{'code': 'G16B15/30', 'inventive': True, 'fir...","[G16B15/30, G16B15/00, G16B, G16, G, G16C20/50...","[G16B15/30, G16B15/00, G16B, G16, G, G16C20/50...","[pfk, modulator, coordinates, tables, binding ...",[],https://patents.google.com/patent/WO2009067033A2,WIPO (PCT),International application publshed with declar...,[],"[0.13142297, -0.22401766, -0.051257387, -0.145..."
1,WO-2015166045-A2,The application of rescap to attenuate and pre...,False,The present invention relates to the treatment...,False,"[{'code': 'A61P25/00', 'inventive': True, 'fir...","[A61P25/00, A61P, A61, A, C12Y301/03001, C12Y3...","[A61P25/00, A61P, A61, A, C12Y301/03001, C12Y3...","[rescap, disease, brain, neurodegenerative, pr...",[],https://patents.google.com/patent/WO2015166045A2,WIPO (PCT),International application publshed with declar...,[],"[-0.17739123, -0.15348703, -0.0836166, -0.0850..."
2,WO-2021181100-A1,Compositions and methods for inducing an immun...,False,The invention relates to a composition compris...,False,"[{'code': 'C12N2710/10343', 'inventive': False...","[C12N2710/10343, C12N2710/10341, C12N2710/1031...","[C12N2710/10343, C12N2710/10341, C12N2710/1031...","[sars, composition, dose, vaccine, chadox1, su...",[],https://patents.google.com/patent/WO2021181100A1,WIPO (PCT),International application published with inter...,[],"[0.053326584, 0.030189348, 0.10093831, -0.1294..."
3,ZA-202006479-B,Improvements in or relating to beam alignment ...,False,,False,"[{'code': 'H04B7/0897', 'inventive': True, 'fi...","[H04B7/0897, H04B7/0891, H04B7/08, H04B7/04, H...","[H04B7/0897, H04B7/0891, H04B7/08, H04B7/04, H...","[relating, alignment, electronically steered, ...",[],https://patents.google.com/patent/ZA202006479B,South Africa,Granted patent,[],"[0.015821163, 0.0035525584, -0.27819073, -0.05..."
4,ZA-202005398-B,Managing extended 5g-s-tmsi in lte connected t...,False,,False,"[{'code': 'H04W76/27', 'inventive': True, 'fir...","[H04W76/27, H04W76/20, H04W76/00, H04W, H04, H...","[H04W76/27, H04W76/20, H04W76/00, H04W, H04, H...","[tmsi, lte, managing extended, managing, exten...",[],https://patents.google.com/patent/ZA202005398B,South Africa,Granted patent,[],"[0.052048348, -0.094614774, -0.16915298, -0.11..."


In [13]:
publications_202204DataFrame.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   publication_number       5 non-null      object 
 1   title                    5 non-null      object 
 2   title_translated         5 non-null      boolean
 3   abstract                 5 non-null      object 
 4   abstract_translated      5 non-null      boolean
 5   cpc                      5 non-null      object 
 6   cpc_low                  5 non-null      object 
 7   cpc_inventive_low        5 non-null      object 
 8   top_terms                5 non-null      object 
 9   similar                  5 non-null      object 
 10  url                      5 non-null      object 
 11  country                  5 non-null      object 
 12  publication_description  5 non-null      object 
 13  cited_by                 5 non-null      object 
 14  embedding_v1             5 non
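
Several columns, such as `top_terms`, hold a list per row rather than a single value, so processing them usually starts by flattening the lists. The sketch below counts how often each term occurs across rows. It works on plain records of the kind returned by `DataFrame.to_dict("records")`; the rows are made up for illustration and `most_common_terms` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def most_common_terms(records: Iterable[Mapping], n: int = 3) -> List[Tuple[str, int]]:
    """Count every entry of each record's 'top_terms' list and return the top n."""
    counts = Counter()
    for record in records:
        counts.update(record["top_terms"])
    return counts.most_common(n)


# Made-up rows for illustration; in the lab use
# publications_202204DataFrame.to_dict("records") instead.
demo_records = [
    {"top_terms": ["protein", "resin", "purified"]},
    {"top_terms": ["protein", "dna", "sequence"]},
    {"top_terms": ["dna", "protein"]},
]
print(most_common_terms(demo_records, n=2))
```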

## Let's embed all the code in a try statement

### In real settings, it is always advisable to wrap calls to external services in try statements to manage potential errors.


In [16]:
try:
    client = bigquery.Client()
    bigquery_connection = client.dataset("google_patents_research", project="bigquery-public-data")
    google_patents_dataset = client.get_dataset(bigquery_connection)
    table_connection = google_patents_dataset.table("publications_202204")
    table_publications_202204 = client.get_table(table_connection)
    publications_202204DataFrame = client.list_rows(table_publications_202204, max_results=500).to_dataframe()
except Exception as error:
    # Report the cause instead of hiding it behind a bare except
    print("unable to perform the operation:", error)
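
One thing to watch with this pattern: if the call fails, `publications_202204DataFrame` still holds the 5-row result from step 7, so later cells would silently work on stale data. A minimal sketch of an alternative is to wrap the load in a function that returns `None` on failure, so the caller has to check the result. `load_or_none` and the logger name are hypothetical, and the broad `except Exception` is a placeholder: in real code you would probably catch narrower types, such as the client library's error classes in `google.api_core.exceptions` (for example `NotFound`).

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("patents_lab")  # hypothetical logger name


def load_or_none(loader: Callable[[], T]) -> Optional[T]:
    """Run loader() and return its result, or None if it raises."""
    try:
        return loader()
    except Exception as error:
        # Assumption: broad catch for illustration; narrow it in real code.
        logger.warning("unable to perform the operation: %s", error)
        return None


def failing_loader():
    """Simulate an external service that cannot be reached."""
    raise ConnectionError("simulated network failure")


print(load_or_none(lambda: [1, 2, 3]))  # loader succeeds, result is returned
print(load_or_none(failing_loader))     # loader fails, None is returned
```

In the lab, the loader would be a small function containing the BigQuery calls from the cell above, and the following cell would check for `None` before calling `.sample(10)`.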


In [17]:
publications_202204DataFrame.sample(10)

,publication_number,title,title_translated,abstract,abstract_translated,cpc,cpc_low,cpc_inventive_low,top_terms,similar,url,country,publication_description,cited_by,embedding_v1
486,WO-2017009303-A1,Biomarkers for hbv treatment response,False,The present invention relates to methods that ...,False,"[{'code': 'C12Q1/706', 'inventive': True, 'fir...","[C12Q1/706, C12Q1/701, C12Q1/70, C12Q1/00, C12...","[C12Q1/706, C12Q1/701, C12Q1/70, C12Q1/00, C12...","[treatment, hbv, patient, pgx, interferon, fin...","[{'publication_number': 'AU-2010249379-B2', 'a...",https://patents.google.com/patent/WO2017009303A1,WIPO (PCT),International application published with inter...,[],"[0.074304834, -0.051979333, -0.014596166, -0.1..."
216,WO-2008098167-A2,Robot and web-based method for affiliation ver...,False,A robotic tool confirms information on organiz...,False,"[{'code': 'G06F16/951', 'inventive': True, 'fi...","[G06F16/951, G06F16/95, G06F16/90, G06F16/00, ...","[G06F16/951, G06F16/95, G06F16/90, G06F16/00, ...","[database, search, affiliations, affiliation, ...","[{'publication_number': 'US-7739128-B2', 'appl...",https://patents.google.com/patent/WO2008098167A2,WIPO (PCT),International application published without in...,[],"[0.20887198, -0.21605313, -0.034984674, -0.060..."
201,WO-2014004472-A1,Soybean event pdab9582.816.15.1 detection method,False,Soybean Event pDAB9582.816.15.1 comprises gene...,False,"[{'code': 'C12Q1/6895', 'inventive': True, 'fi...","[C12Q1/6895, C12Q1/6888, C12Q1/6876, C12Q1/68,...","[C12Q1/6895, C12Q1/6888, C12Q1/6876, C12Q1/68,...","[soybean, event, dna, sequence, seq, soybean e...","[{'publication_number': 'US-8632978-B2', 'appl...",https://patents.google.com/patent/WO2014004472A1,WIPO (PCT),International application published with inter...,[],"[0.070827894, -0.028539212, 0.0028801898, -0.0..."
352,WO-2008045807-A2,Meniscus prosthetic device,False,A prosthetic device that may be utilized as an...,False,"[{'code': 'A61F2002/30069', 'inventive': False...","[A61F2002/30069, A61F2002/3006, A61F2002/30003...","[A61F2002/30069, A61F2002/3006, A61F2002/30003...","[prosthetic, central, fixation, ellipsoidal, s...","[{'publication_number': 'US-9320606-B2', 'appl...",https://patents.google.com/patent/WO2008045807A2,WIPO (PCT),International application published without in...,"[{'publication_number': 'US-10376370-B2', 'app...","[0.0040098396, -0.082375646, -0.05119519, -0.1..."
147,WO-2011155859-A2,"Polysaccharide and derivatives thereof, showin...",False,The invention relates to bacterial lipopolysac...,False,"[{'code': 'G01N2333/4724', 'inventive': False,...","[G01N2333/4724, G01N2333/4701, G01N2333/47, G0...","[G01N2333/4724, G01N2333/4701, G01N2333/47, G0...","[ficolin, alvei, polysaccharide, lps, kdo, oac...","[{'publication_number': 'EP-0126043-B1', 'appl...",https://patents.google.com/patent/WO2011155859A2,WIPO (PCT),International application publshed with declar...,"[{'publication_number': 'CN-111378054-A', 'app...","[0.15280677, -0.07099319, -0.036224972, -0.002..."
88,WO-2021224236-A1,Method for detecting inflammation-related plat...,False,The present invention relates to a method for ...,False,"[{'code': 'G01N2333/523', 'inventive': False, ...","[G01N2333/523, G01N2333/521, G01N2333/52, G01N...","[G01N2333/523, G01N2333/521, G01N2333/52, G01N...","[inflammation, biomarkers, subject, platelet a...","[{'publication_number': 'WO-2021224236-A1', 'a...",https://patents.google.com/patent/WO2021224236A1,WIPO (PCT),International application published with inter...,[],"[-0.085301094, -0.21947666, 0.01912451, -0.098..."
312,WO-2021089770-A2,Protein purification,False,Described herein is a process for protein puri...,False,"[{'code': 'C07K1/18', 'inventive': True, 'firs...","[C07K1/18, C07K1/16, C07K1/14, C07K1/00, C07K,...","[C07K1/18, C07K1/16, C07K1/14, C07K1/00, C07K,...","[protein, hiv, resin, purified, proteins, chro...","[{'publication_number': 'US-10160788-B2', 'app...",https://patents.google.com/patent/WO2021089770A2,WIPO (PCT),International application publshed with declar...,[],"[0.17575336, -0.04315363, 0.12253466, -0.11823..."
467,WO-9837390-A1,Thermometer,False,The disclosed thermometer is intended to perfo...,False,"[{'code': 'G01J5/041', 'inventive': True, 'fir...","[G01J5/041, G01J5/04, G01J5/02, G01J5/00, G01J...","[G01J5/041, G01J5/04, G01J5/02, G01J5/00, G01J...","[signal, antenna, temperature, receiver, input...","[{'publication_number': 'US-3493949-A', 'appli...",https://patents.google.com/patent/WO1998037390A1,WIPO (PCT),International application published with inter...,[],"[0.019019177, 0.097334675, 0.05046171, 0.03013..."
44,WO-0179993-A2,Method and apparatus for method and apparatus ...,False,A computer-implemented method and system for a...,False,"[{'code': 'G06F8/10', 'inventive': True, 'firs...","[G06F8/10, G06F8/00, G06F, G06, G, G06F8/30, G...","[G06F8/10, G06F8/00, G06F, G06, G, G06F8/30, G...","[seq, bag, spec, software program, software, s...","[{'publication_number': 'EP-1330708-A2', 'appl...",https://patents.google.com/patent/WO2001079993A2,WIPO (PCT),International application published without in...,"[{'publication_number': 'US-7523445-B1', 'appl...","[0.07051393, -0.28615355, 0.031477123, -0.0993..."
257,WO-2011081010-A1,Liquid crystal display device and electronic d...,False,To provide a liquid crystal display device whi...,False,"[{'code': 'G02F1/133621', 'inventive': True, '...","[G02F1/133621, G02F1/1336, G02F1/1335, G02F1/1...","[G02F1/133621, G02F1/1336, G02F1/1335, G02F1/1...","[liquid crystal, light, emitting, image, signa...","[{'publication_number': 'US-10861401-B2', 'app...",https://patents.google.com/patent/WO2011081010A1,WIPO (PCT),International application published with inter...,"[{'publication_number': 'US-9448433-B2', 'appl...","[-0.104191884, -0.043901265, -0.20733292, -0.2..."
