# Obtaining priority data from WIPO PatentScope

**Version**: Dec 16 2020

Reference: [Web Scraping using Selenium and Python](https://www.scrapingbee.com/blog/selenium-python/)

## Import the package.

In [1]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException

## Set options for Google Chrome and create a Chrome WebDriver.

In [2]:
# set chrome options
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

# create a chrome webdriver
driver = webdriver.Chrome('/usr/bin/chromedriver', options=chrome_options)

# optionally wait up to 10 seconds for a requested element before raising an error
# (tried while debugging the NoSuchElementException on the priority data; see below)
#driver.implicitly_wait(10)
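The commented-out implicit wait is one way to give the page time to render. The same idea can be sketched without any library: poll a lookup function until it stops raising or a time budget runs out. `retry_until` and its parameters are hypothetical names for illustration and are not part of Selenium.

```python
import time


def retry_until(lookup, exceptions=(Exception,), timeout=10.0, poll=0.5):
    """Call lookup() repeatedly until it returns without raising.

    Hypothetical helper for illustration; not a Selenium function.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return lookup()
        except exceptions:
            # Once the time budget is spent, re-raise the last error.
            if time.monotonic() >= deadline:
                raise
            # Otherwise pause briefly and try again.
            time.sleep(poll)
```

With the driver above, a call might look like `retry_until(lambda: driver.find_element_by_id("some-id"), exceptions=(NoSuchElementException,))`, where `"some-id"` is a placeholder. Selenium also ships its own explicit-wait mechanism, `WebDriverWait`, which is probably the better choice in real code.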

## Navigate to the target webpage.

In [3]:
driver.get('https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2001029057')

## Search for the priority data based on the HTML identifier.
You can locate an element by various HTML identifiers, such as tag name, class name, ID, or XPath. To find the identifier for the data you want:
1. Open the target page in a browser,
2. Inspect the webpage elements (on Windows, press Ctrl+Shift+C), and
3. Locate in the HTML code the class name (`class="X"`), ID name (`id="X"`), or other identifier corresponding to the data of interest.
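The manual inspection steps above can also be approximated in code. The sketch below lists every element that carries an `id` or `class` attribute, using only the standard library. `IdentifierCollector` is a hypothetical name, and `sample_html` is a made-up snippet for illustration, not the real PatentScope markup.

```python
from html.parser import HTMLParser


class IdentifierCollector(HTMLParser):
    """Collect (tag, id, class) for every element that has an id or a class."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "id" in attributes or "class" in attributes:
            self.found.append((tag, attributes.get("id"), attributes.get("class")))


# Made-up markup for illustration only; not the real PatentScope page.
sample_html = """
<div id="biblio-wrapper">
  <div class="biblio-card">
    <span class="label">Priority Data</span>
    <span id="priority-value">09/418,640 15.10.1999 US</span>
  </div>
</div>
"""

collector = IdentifierCollector()
collector.feed(sample_html)
for tag, element_id, class_name in collector.found:
    print(tag, element_id, class_name)
```

To run this against a live page, one could feed it `driver.page_source` instead of `sample_html` once the page has been loaded.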

### Here's a search by id. 

In [4]:
search_id = "detailMainForm:PCTBIBLIO_content"

try:
    mydata = driver.find_element_by_id(search_id)
    print(mydata.text)
except NoSuchElementException as e:
    print(e)
    print("The request is invalid, or there is no biblio data")

Publication Number
WO/2001/029057
Publication Date
26.04.2001
International Application No.
PCT/US2000/027963
International Filing Date
11.10.2000
Chapter 2 Demand Filed
10.05.2001
IPC
A61K 38/00 2006.01 C07H 21/00 2006.01 C12N 15/11 2006.01
CPC
A61K 38/00 A61P 35/00 C07H 21/00 C12N 15/113 C12N 2310/315 C12N 2310/321
View more classifications
Applicants
ISIS PHARMACEUTICALS, INC. [US]/[US] (AllExceptUS)
TAYLOR, Jennifer, K. [US]/[US] (UsOnly)
COWSERT, Lex, M. [US]/[US] (UsOnly)
Inventors
TAYLOR, Jennifer, K.
COWSERT, Lex, M.
Agents
LICATA, Jane, Massey
Priority Data
09/418,640 15.10.1999 US
Publication Language
English (EN)
Filing Language
English (EN)
Designated States
View all




Title
(EN) ANTISENSE MODULATION OF BCL-6 EXPRESSION
(FR) MODULATION ANTISENS DE L'EXPRESSION DE BCL-6
Abstract
(EN)
Antisense compounds, compositions and methods are provided for modulating the expression of bcl-6. The compositions comprise antisense coumpounds, particularly antisense oligonucleotides, targ

### Here's a search by class name. The `div` tag of this class is within the `div` tag of the above id search.
The output is nearly identical. The only thing this output does NOT have is the last line: "Latest bibliographic data on file with the International Bureau".

In [5]:
search_class = "ps-biblio-data"
try:
    mydata = driver.find_element_by_class_name(search_class)
    print(mydata.text)
except NoSuchElementException as e:
    print(e)
    print("The request is invalid, or there is no biblio data")

Publication Number
WO/2001/029057
Publication Date
26.04.2001
International Application No.
PCT/US2000/027963
International Filing Date
11.10.2000
Chapter 2 Demand Filed
10.05.2001
IPC
A61K 38/00 2006.01 C07H 21/00 2006.01 C12N 15/11 2006.01
CPC
A61K 38/00 A61P 35/00 C07H 21/00 C12N 15/113 C12N 2310/315 C12N 2310/321
View more classifications
Applicants
ISIS PHARMACEUTICALS, INC. [US]/[US] (AllExceptUS)
TAYLOR, Jennifer, K. [US]/[US] (UsOnly)
COWSERT, Lex, M. [US]/[US] (UsOnly)
Inventors
TAYLOR, Jennifer, K.
COWSERT, Lex, M.
Agents
LICATA, Jane, Massey
Priority Data
09/418,640 15.10.1999 US
Publication Language
English (EN)
Filing Language
English (EN)
Designated States
View all




Title
(EN) ANTISENSE MODULATION OF BCL-6 EXPRESSION
(FR) MODULATION ANTISENS DE L'EXPRESSION DE BCL-6
Abstract
(EN)
Antisense compounds, compositions and methods are provided for modulating the expression of bcl-6. The compositions comprise antisense coumpounds, particularly antisense oligonucleotides, targ

### Here's a further subset of the data.

In [6]:
search_class = "ps-biblio-data--biblio-card"
try:
    mydata = driver.find_element_by_class_name(search_class)
    print(mydata.text)
except NoSuchElementException as e:
    print(e)
    print("The request is invalid, or there is no biblio data")

Publication Number
WO/2001/029057
Publication Date
26.04.2001
International Application No.
PCT/US2000/027963
International Filing Date
11.10.2000
Chapter 2 Demand Filed
10.05.2001
IPC
A61K 38/00 2006.01 C07H 21/00 2006.01 C12N 15/11 2006.01
CPC
A61K 38/00 A61P 35/00 C07H 21/00 C12N 15/113 C12N 2310/315 C12N 2310/321
View more classifications
Applicants
ISIS PHARMACEUTICALS, INC. [US]/[US] (AllExceptUS)
TAYLOR, Jennifer, K. [US]/[US] (UsOnly)
COWSERT, Lex, M. [US]/[US] (UsOnly)
Inventors
TAYLOR, Jennifer, K.
COWSERT, Lex, M.
Agents
LICATA, Jane, Massey
Priority Data
09/418,640 15.10.1999 US
Publication Language
English (EN)
Filing Language
English (EN)
Designated States
View all


### We then home in on the "Priority Data" and search by its corresponding id.
This lookup does not work consistently. Sometimes it returns the expected output:
```
09/418,640 15.10.1999 US
```
Most of the time, however, Selenium raises a `NoSuchElementException`.

In [7]:
search_id = "detailMainForm:pctBiblio:j_idt3405"

try:
    mydata = driver.find_element_by_id(search_id)
    print(mydata.text)
except NoSuchElementException as e:
    print(e)
    print("The request is invalid, or there is no biblio data")

Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="detailMainForm:pctBiblio:j_idt3405"]"}
  (Session info: headless chrome=87.0.4280.88)

The request is invalid, or there is no biblio data
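When the id lookup fails as it does here, the priority line is still available as plain text in the card output printed earlier. Once that line is in hand, it can be split into its parts. The sketch below assumes every priority entry has the shape `number date country`, with the date as `DD.MM.YYYY` and a two-letter country code, as in the single example seen in this notebook. `parse_priority` and `PRIORITY_PATTERN` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed layout: "<number> <DD.MM.YYYY> <two-letter country code>".
PRIORITY_PATTERN = re.compile(
    r"^(?P<number>.+?)\s+(?P<date>\d{2}\.\d{2}\.\d{4})\s+(?P<country>[A-Z]{2})$"
)


def parse_priority(line):
    """Split one priority line into number, date, and country, or return None."""
    match = PRIORITY_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "number": match.group("number"),
        "date": datetime.strptime(match.group("date"), "%d.%m.%Y").date(),
        "country": match.group("country"),
    }


print(parse_priority("09/418,640 15.10.1999 US"))
```

Returning `None` for lines that do not match keeps the caller in control of how to handle records with an unexpected layout.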


## When finished, exit the browser session.

In [8]:
driver.quit()