# Introduction
NoSQL databases are increasingly popular. It is important to know how to interact with them to fetch and store data.

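In this lab you will interact with BigQuery, a serverless data warehouse offered by Google. Strictly speaking, BigQuery is queried with SQL rather than being a NoSQL store, but the workflow shown here (authenticate, connect, fetch) carries over to other cloud data services.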

## Google Patents Research Data

Google Patents Research Data contains the output of much of the data analysis work used in Google Patents (patents.google.com).

Please refer to the following URL for more information:
https://console.cloud.google.com/marketplace/product/google_patents_public_datasets/google-patents-research-data


### 1. NoSQL database services require authentication prior to any data transfer 

In [1]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


### 2. We need to load Google's libraries to interact with the Database

In [2]:
from google.cloud import bigquery

### 3. We instantiate the Client in charge of handling the connection to the Database

In [3]:
# Create a "Client" object
client = bigquery.Client()

### 4. We open a connection to the "google_patents_research" dataset

In [4]:
# Construct a reference to the "google_patents_research" dataset
bigquery_connection = client.dataset("google_patents_research", project="bigquery-public-data")

In [5]:
# API request - fetch the dataset
google_patents_dataset = client.get_dataset(bigquery_connection)
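The reference built above can also be written as a single string. The sketch below is a hedged illustration, not part of the original lab: `full_table_id` is a hypothetical helper that joins project, dataset and table names into the `project.dataset.table` form, which `client.get_dataset` and `client.get_table` accept in place of a reference object. Recent releases of `google-cloud-bigquery` mark `client.dataset()` as deprecated, so the string form may be the safer choice depending on the installed version.

```python
def full_table_id(project: str, dataset: str, table: str) -> str:
    """Join the parts of a BigQuery table name into 'project.dataset.table'.

    Hypothetical helper for illustration; it only builds a string and
    does not contact BigQuery.
    """
    parts = (project, dataset, table)
    if any(not part for part in parts):
        raise ValueError("project, dataset and table must all be non-empty")
    return ".".join(parts)


table_id = full_table_id(
    "bigquery-public-data", "google_patents_research", "publications_202204"
)
print(table_id)

# With an authenticated client (assumed to exist, as in the cells above),
# the string could be passed directly:
# table = client.get_table(table_id)
```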


### 5. We ask the database to list all the tables contained in the dataset

In [6]:
# List all the tables in the "google_patents_research" dataset
google_patents_tables = list(client.list_tables(google_patents_dataset))

In [7]:
# Print names of all tables in the dataset
for table in google_patents_tables:  
    print(table.table_id)

annotations
annotations_202007
annotations_202101
annotations_202105
annotations_202111
annotations_202204
annotations_grouped
publications
publications_201710
publications_201802
publications_201809
publications_201903
publications_201909
publications_201912
publications_202004
publications_202007
publications_202101
publications_202105
publications_202111
publications_202204
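Many of the table names listed above end in a six-digit suffix that looks like a year and month (for example `202204`). Assuming that reading is correct, the following sketch picks the most recent snapshot for a given base name instead of hard-coding it. The helper `latest_snapshot` is hypothetical, and the sample list is a subset copied from the output above.

```python
import re
from typing import Iterable, Optional


def latest_snapshot(table_ids: Iterable[str], base_name: str) -> Optional[str]:
    """Return the table named '<base_name>_YYYYMM' with the largest suffix.

    Assumes the six-digit suffix is a year-month stamp, so that sorting
    the suffixes as strings also sorts them chronologically.
    """
    pattern = re.compile(rf"^{re.escape(base_name)}_(\d{{6}})$")
    dated = []
    for table_id in table_ids:
        match = pattern.match(table_id)
        if match:
            dated.append((match.group(1), table_id))
    if not dated:
        return None
    return max(dated)[1]


# Subset of the table names printed above
table_names = [
    "annotations",
    "annotations_202111",
    "annotations_202204",
    "annotations_grouped",
    "publications",
    "publications_201710",
    "publications_202111",
    "publications_202204",
]

print(latest_snapshot(table_names, "publications"))
```

In the notebook, the list could be built from the real listing with `[table.table_id for table in google_patents_tables]`.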


### 6. We ask the database to retrieve the table "publications_202204"

In [8]:
# Construct a reference to the "publications_202204" table
table_connection = google_patents_dataset.table("publications_202204")

# API request - fetch the table
table_publications_202204 = client.get_table(table_connection)

In [9]:
# Preview the first five rows of the "publications_202204" table
client.list_rows(table_publications_202204, max_results=5).to_dataframe()

Unnamed: 0,publication_number,title,title_translated,abstract,abstract_translated,cpc,cpc_low,cpc_inventive_low,top_terms,similar,url,country,publication_description,cited_by,embedding_v1
0,WO-2009067033-A2,A crystallographic model of the binding site a...,False,The subject matters of the invention are: a cr...,False,"[{'code': 'G16B15/30', 'inventive': True, 'fir...","[G16B15/30, G16B15/00, G16B, G16, G, G16C20/50...","[G16B15/30, G16B15/00, G16B, G16, G, G16C20/50...","[pfk, modulator, coordinates, tables, binding ...",[],https://patents.google.com/patent/WO2009067033A2,WIPO (PCT),International application publshed with declar...,[],"[0.13142297, -0.22401766, -0.051257387, -0.145..."
1,WO-2015166045-A2,The application of rescap to attenuate and pre...,False,The present invention relates to the treatment...,False,"[{'code': 'A61P25/00', 'inventive': True, 'fir...","[A61P25/00, A61P, A61, A, C12Y301/03001, C12Y3...","[A61P25/00, A61P, A61, A, C12Y301/03001, C12Y3...","[rescap, disease, brain, neurodegenerative, pr...",[],https://patents.google.com/patent/WO2015166045A2,WIPO (PCT),International application publshed with declar...,[],"[-0.17739123, -0.15348703, -0.0836166, -0.0850..."
2,WO-2021181100-A1,Compositions and methods for inducing an immun...,False,The invention relates to a composition compris...,False,"[{'code': 'C12N2710/10343', 'inventive': False...","[C12N2710/10343, C12N2710/10341, C12N2710/1031...","[C12N2710/10343, C12N2710/10341, C12N2710/1031...","[sars, composition, dose, vaccine, chadox1, su...",[],https://patents.google.com/patent/WO2021181100A1,WIPO (PCT),International application published with inter...,[],"[0.053326584, 0.030189348, 0.10093831, -0.1294..."
3,ZA-202006479-B,Improvements in or relating to beam alignment ...,False,,False,"[{'code': 'H04B7/0897', 'inventive': True, 'fi...","[H04B7/0897, H04B7/0891, H04B7/08, H04B7/04, H...","[H04B7/0897, H04B7/0891, H04B7/08, H04B7/04, H...","[relating, alignment, electronically steered, ...",[],https://patents.google.com/patent/ZA202006479B,South Africa,Granted patent,[],"[0.015821163, 0.0035525584, -0.27819073, -0.05..."
4,ZA-202005398-B,Managing extended 5g-s-tmsi in lte connected t...,False,,False,"[{'code': 'H04W76/27', 'inventive': True, 'fir...","[H04W76/27, H04W76/20, H04W76/00, H04W, H04, H...","[H04W76/27, H04W76/20, H04W76/00, H04W, H04, H...","[tmsi, lte, managing extended, managing, exten...",[],https://patents.google.com/patent/ZA202005398B,South Africa,Granted patent,[],"[0.052048348, -0.094614774, -0.16915298, -0.11..."
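The preview above returns every column, including long ones such as `embedding_v1`. BigQuery also accepts SQL, so an alternative is to select only the columns of interest. The sketch below is an illustration under assumptions: `build_preview_query` is a hypothetical helper that only builds the query text, and running it requires the authenticated `client` from the earlier cells. Note that queries are billed to your own project by the amount of data scanned, and that selecting fewer columns is what reduces the scan; a `LIMIT` clause on its own generally does not.

```python
from typing import Sequence


def build_preview_query(table_id: str, columns: Sequence[str], limit: int = 5) -> str:
    """Build a SELECT statement for a few columns of one table.

    Hypothetical helper: it returns SQL text and does not contact BigQuery.
    Identifiers are wrapped in backticks, as GoogleSQL expects for names
    that contain characters such as hyphens.
    """
    if not columns:
        raise ValueError("select at least one column")
    if limit <= 0:
        raise ValueError("limit must be positive")
    column_list = ", ".join(f"`{column}`" for column in columns)
    return f"SELECT {column_list} FROM `{table_id}` LIMIT {int(limit)}"


sql = build_preview_query(
    "bigquery-public-data.google_patents_research.publications_202204",
    ["publication_number", "title", "country"],
)
print(sql)

# With an authenticated client (assumed), the query could be run as:
# preview = client.query(sql).to_dataframe()
```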


### 7. We ask the database to retrieve the table "publications_202204" again; this time we store the result in a Pandas DataFrame

In [10]:
publications_202204DataFrame = client.list_rows(table_publications_202204, max_results=5).to_dataframe()

### 8. We are ready to process the DataFrame as usual

In [11]:
type(publications_202204DataFrame)

pandas.core.frame.DataFrame

In [12]:
publications_202204DataFrame.head(5)

Unnamed: 0,publication_number,title,title_translated,abstract,abstract_translated,cpc,cpc_low,cpc_inventive_low,top_terms,similar,url,country,publication_description,cited_by,embedding_v1
0,WO-2009067033-A2,A crystallographic model of the binding site a...,False,The subject matters of the invention are: a cr...,False,"[{'code': 'G16B15/30', 'inventive': True, 'fir...","[G16B15/30, G16B15/00, G16B, G16, G, G16C20/50...","[G16B15/30, G16B15/00, G16B, G16, G, G16C20/50...","[pfk, modulator, coordinates, tables, binding ...",[],https://patents.google.com/patent/WO2009067033A2,WIPO (PCT),International application publshed with declar...,[],"[0.13142297, -0.22401766, -0.051257387, -0.145..."
1,WO-2015166045-A2,The application of rescap to attenuate and pre...,False,The present invention relates to the treatment...,False,"[{'code': 'A61P25/00', 'inventive': True, 'fir...","[A61P25/00, A61P, A61, A, C12Y301/03001, C12Y3...","[A61P25/00, A61P, A61, A, C12Y301/03001, C12Y3...","[rescap, disease, brain, neurodegenerative, pr...",[],https://patents.google.com/patent/WO2015166045A2,WIPO (PCT),International application publshed with declar...,[],"[-0.17739123, -0.15348703, -0.0836166, -0.0850..."
2,WO-2021181100-A1,Compositions and methods for inducing an immun...,False,The invention relates to a composition compris...,False,"[{'code': 'C12N2710/10343', 'inventive': False...","[C12N2710/10343, C12N2710/10341, C12N2710/1031...","[C12N2710/10343, C12N2710/10341, C12N2710/1031...","[sars, composition, dose, vaccine, chadox1, su...",[],https://patents.google.com/patent/WO2021181100A1,WIPO (PCT),International application published with inter...,[],"[0.053326584, 0.030189348, 0.10093831, -0.1294..."
3,ZA-202006479-B,Improvements in or relating to beam alignment ...,False,,False,"[{'code': 'H04B7/0897', 'inventive': True, 'fi...","[H04B7/0897, H04B7/0891, H04B7/08, H04B7/04, H...","[H04B7/0897, H04B7/0891, H04B7/08, H04B7/04, H...","[relating, alignment, electronically steered, ...",[],https://patents.google.com/patent/ZA202006479B,South Africa,Granted patent,[],"[0.015821163, 0.0035525584, -0.27819073, -0.05..."
4,ZA-202005398-B,Managing extended 5g-s-tmsi in lte connected t...,False,,False,"[{'code': 'H04W76/27', 'inventive': True, 'fir...","[H04W76/27, H04W76/20, H04W76/00, H04W, H04, H...","[H04W76/27, H04W76/20, H04W76/00, H04W, H04, H...","[tmsi, lte, managing extended, managing, exten...",[],https://patents.google.com/patent/ZA202005398B,South Africa,Granted patent,[],"[0.052048348, -0.094614774, -0.16915298, -0.11..."


In [13]:
publications_202204DataFrame.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   publication_number       5 non-null      object 
 1   title                    5 non-null      object 
 2   title_translated         5 non-null      boolean
 3   abstract                 5 non-null      object 
 4   abstract_translated      5 non-null      boolean
 5   cpc                      5 non-null      object 
 6   cpc_low                  5 non-null      object 
 7   cpc_inventive_low        5 non-null      object 
 8   top_terms                5 non-null      object 
 9   similar                  5 non-null      object 
 10  url                      5 non-null      object 
 11  country                  5 non-null      object 
 12  publication_description  5 non-null      object 
 13  cited_by                 5 non-null      object 
 14  embedding_v1             5 non
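Several columns, such as `cpc_low`, hold a list of values per row rather than a single value, so "processing as usual" often starts by unpacking them. As a minimal sketch, the code below counts how many rows fall into each top-level CPC section, taking the section to be the first letter of the first code in each list. The function `count_cpc_sections` is hypothetical, and the sample data holds only the first `cpc_low` code of each of the five rows shown above.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_cpc_sections(code_lists: Iterable[Sequence[str]]) -> Counter:
    """Count rows per CPC section, using the first code of each row.

    A CPC code such as 'H04B7/0897' starts with its section letter ('H').
    len() is used instead of truthiness so that list-like cells such as
    NumPy arrays are handled as well.
    """
    sections = Counter()
    for codes in code_lists:
        if len(codes) == 0:
            continue  # row without any CPC code
        sections[codes[0][0]] += 1
    return sections


# First cpc_low code of each of the five preview rows shown above
sample_cpc_low = [
    ["G16B15/30"],
    ["A61P25/00"],
    ["C12N2710/10343"],
    ["H04B7/0897"],
    ["H04W76/27"],
]

section_counts = count_cpc_sections(sample_cpc_low)
print(section_counts.most_common(1))
```

On the real data, the same function could be called with the DataFrame column, for example `count_cpc_sections(publications_202204DataFrame["cpc_low"])`.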

## Let's embed all the code in a try statement

### In real settings, it is advisable to wrap queries to external services in try statements to manage potential errors.


In [14]:
try:
    client = bigquery.Client()
    bigquery_connection = client.dataset("google_patents_research", project="bigquery-public-data")
    google_patents_dataset = client.get_dataset(bigquery_connection)
    table_connection = google_patents_dataset.table("publications_202204")
    table_publications_202204 = client.get_table(table_connection)
    # Download up to 50,000 rows into a Pandas DataFrame
    publications_202204DataFrame = client.list_rows(table_publications_202204, max_results=50000).to_dataframe()
except Exception as error:
    # Catch Exception rather than using a bare except, and report the cause
    print(f"unable to perform the operation: {error}")
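One caveat with printing a message and moving on: if the fetch fails, the DataFrame is never assigned, and the next cell fails with an unrelated-looking `NameError`. A possible alternative, sketched below under assumptions, is to return the result and the error together so that later code can check which one it received. `run_safely` and `failing_fetch` are hypothetical names, and the failure is simulated rather than coming from BigQuery. In real code it may be preferable to catch narrower exception classes, such as `NotFound` from `google.api_core.exceptions`.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def run_safely(operation: Callable[[], T]) -> Tuple[Optional[T], Optional[Exception]]:
    """Run an operation and return (result, None) or (None, error).

    Hypothetical helper: it catches Exception broadly for illustration only.
    """
    try:
        return operation(), None
    except Exception as error:
        return None, error


def failing_fetch():
    """Stand-in for a remote call that fails (simulated, no network used)."""
    raise ConnectionError("simulated network failure")


result, error = run_safely(failing_fetch)
if error is not None:
    print(f"unable to perform the operation: {error}")
else:
    print("fetched", result)
```

In the notebook, the whole fetch sequence could be placed in a function and passed to `run_safely`, with the sampling cell run only when `error` is `None`.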


In [15]:
publications_202204DataFrame.sample(10)

Unnamed: 0,publication_number,title,title_translated,abstract,abstract_translated,cpc,cpc_low,cpc_inventive_low,top_terms,similar,url,country,publication_description,cited_by,embedding_v1
38481,WO-2005077549-A8,"Thin film transistor and display device, and m...",False,The present invention discloses a display devi...,False,"[{'code': 'G02F1/136295', 'inventive': True, '...","[G02F1/136295, G02F1/136286, G02F1/1362, G02F1...","[G02F1/136295, G02F1/136286, G02F1/1362, G02F1...","[region, display, thin film, film transistor, ...","[{'publication_number': 'WO-2005077549-A8', 'a...",https://patents.google.com/patent/WO2005077549A8,WIPO (PCT),Modified first page,[],"[-0.074693814, -0.09298959, -0.096753806, 0.01..."
13995,WO-2005114862-A3,System and method for frequency burst detectio...,False,"A system, method and computer program product ...",False,"[{'code': 'H04L2007/047', 'inventive': False, ...","[H04L2007/047, H04L7/041, H04L7/04, H04L7/00, ...","[H04L2007/047, H04L7/041, H04L7/04, H04L7/00, ...","[frequency burst, prediction error, current sa...","[{'publication_number': 'WO-2005114862-A3', 'a...",https://patents.google.com/patent/WO2005114862A3,WIPO (PCT),Later publication of ISR with revised front page,[],"[-0.11058439, 0.0023948378, -0.2558889, -0.055..."
40507,WO-2019227787-A1,Coaxial loudspeaker,False,"Disclosed is a coaxial loudspeaker, having bet...",False,"[{'code': 'H04R9/063', 'inventive': True, 'fir...","[H04R9/063, H04R9/06, H04R9/00, H04R, H04, H, ...","[H04R9/063, H04R9/06, H04R9/00, H04R, H04, H, ...","[cabin, voice coil, coaxial speaker, iron, dus...","[{'publication_number': 'US-4554414-A', 'appli...",https://patents.google.com/patent/WO2019227787A1,WIPO (PCT),International application published with inter...,[],"[-0.043565758, -0.2737519, -0.08199306, 0.0981..."
14173,WO-2007055227-A1,"Preference information providing unit, content...",False,[PROBLEMS] To customize a product without trou...,False,"[{'code': 'G06F16/217', 'inventive': True, 'fi...","[G06F16/217, G06F16/21, G06F16/20, G06F16/00, ...","[G06F16/217, G06F16/21, G06F16/20, G06F16/00, ...","[information, preference, content, user, prefe...","[{'publication_number': 'US-9247301-B2', 'appl...",https://patents.google.com/patent/WO2007055227A1,WIPO (PCT),International application published with inter...,[],"[0.048365004, -0.15274896, -0.04066298, 0.0711..."
31239,WO-2017177957-A1,Non-local adaptive loop filter,False,Aspects of the disclosure provide a method for...,False,"[{'code': 'H04N19/186', 'inventive': True, 'fi...","[H04N19/186, H04N19/169, H04N19/10, H04N19/00,...","[H04N19/186, H04N19/169, H04N19/10, H04N19/00,...","[patch, patches, search, current, similar, pic...","[{'publication_number': 'US-10462459-B2', 'app...",https://patents.google.com/patent/WO2017177957A1,WIPO (PCT),International application published with inter...,"[{'publication_number': 'WO-2018166513-A1', 'a...","[0.006206031, -0.077904694, -0.27887297, 0.048..."
35948,WO-2015038927-A3,Waveguide superlattices for high density photo...,False,An apparatus and method for transmitting a plu...,False,"[{'code': 'G02B6/122', 'inventive': True, 'fir...","[G02B6/122, G02B6/12, G02B6/10, G02B6/00, G02B...","[G02B6/122, G02B6/12, G02B6/10, G02B6/00, G02B...","[waveguides, waveguide, light, superlattices, ...","[{'publication_number': 'WO-2015038927-A3', 'a...",https://patents.google.com/patent/WO2015038927A3,WIPO (PCT),Later publication of ISR with revised front page,[],"[-0.05519917, -0.1179679, -0.21365029, 0.02690..."
43011,WO-2010078263-A3,Methods and systems for observing sensor param...,False,The invention disclosed herein provides method...,False,"[{'code': 'A61B5/14532', 'inventive': True, 'f...","[A61B5/14532, A61B5/145, A61B5/00, A61B, A61, ...","[A61B5/14532, A61B5/145, A61B5/00, A61B, A61, ...","[sensor, methods, state, systems, sensor param...","[{'publication_number': 'WO-2010078263-A3', 'a...",https://patents.google.com/patent/WO2010078263A3,WIPO (PCT),Later publication of ISR with revised front page,[],"[0.060153298, -0.18704945, -0.13873196, -0.158..."
32135,WO-2010072418-A1,A process for purifying a polymer mixture,False,A process for purifying copolymer peptides suc...,False,"[{'code': 'C07K14/00', 'inventive': True, 'fir...","[C07K14/00, C07K, C07, C, C07K14/001, C08G69/1...","[C07K14/00, C07K, C07, C, C07K14/001, C08G69/1...","[cop, reaction mixture, polypeptides, acid, ul...","[{'publication_number': 'AU-2005302500-B2', 'a...",https://patents.google.com/patent/WO2010072418A1,WIPO (PCT),International application published with inter...,"[{'publication_number': 'US-8399600-B2', 'appl...","[-0.043658167, -0.025953328, 0.0062218388, -0...."
30255,WO-2007042708-A1,Method and apparatus for characterizing skin b...,False,The invention concerns a method and an apparat...,False,"[{'code': 'G06K9/00', 'inventive': True, 'firs...","[G06K9/00, G06K, G06, G, G06T7/00, G06T, G06T7...","[G06K9/00, G06K, G06, G, G06T7/00, G06T, G06T7...","[image, skin, area, imperfections, zone, digit...","[{'publication_number': 'EP-1932118-B2', 'appl...",https://patents.google.com/patent/WO2007042708A1,WIPO (PCT),International application published with inter...,"[{'publication_number': 'DE-102010016631-A1', ...","[-0.13018943, -0.14916857, -0.04442721, -0.094..."
25038,WO-2018111468-A1,Front end motor-generator system and hybrid el...,False,A system and method are provided for hybrid el...,False,"[{'code': 'B60L58/15', 'inventive': True, 'fir...","[B60L58/15, B60L58/12, B60L58/10, B60L58/00, B...","[B60L58/15, B60L58/12, B60L58/10, B60L58/00, B...","[engine, motor, generator, operating, clutch, ...","[{'publication_number': 'US-10543833-B2', 'app...",https://patents.google.com/patent/WO2018111468A1,WIPO (PCT),International application published with inter...,[],"[0.00063194835, -0.20225392, 0.11793445, -0.22..."
