# Obtaining application / publication numbers from the Japan Patent Office

**Version**: Dec 17 2020

Data acquisition using Selenium and Python.

In [1]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException

In [2]:
# set chrome options
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

# create a chrome webdriver
driver = webdriver.Chrome('/usr/bin/chromedriver', options=chrome_options)

## Navigate to the Japan Patent Office search page.

In [3]:
driver.get("https://www.j-platpat.inpit.go.jp/p0000")

## Change the language from Japanese to English.

In [4]:
# click on the English link and make sure language changed
driver.find_element_by_link_text("English").click()
language_elem = driver.find_element_by_id("cfc001_header_lnkLangChange")
print(language_elem.text)
if language_elem.text != "Japanese":
    print("error in changing language from Japanese to English")

Japanese
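The cell above always clicks the "English" link, which raises `NoSuchElementException` if the page is already in English. A minimal sketch of a repeatable version follows. `ensure_english` and `LANG_LINK_ID` are names made up for this example, and it assumes that the same header link reads "English" before the switch (only the "Japanese" state is confirmed by the output above). The locator string `"id"` is the value of Selenium's `By.ID`, written out so the sketch needs no imports.

```python
LANG_LINK_ID = "cfc001_header_lnkLangChange"  # id taken from the cell above


def ensure_english(driver):
    """Switch the page to English unless it already is; return the link text."""
    link = driver.find_element("id", LANG_LINK_ID)
    # Assumption: the link names the language it would switch TO.
    if link.text == "English":
        link.click()
        # Look the link up again, since the header is re-rendered after the switch.
        link = driver.find_element("id", LANG_LINK_ID)
    if link.text != "Japanese":
        raise RuntimeError("language link reads %r, expected 'Japanese'" % link.text)
    return link.text
```

Calling `ensure_english(driver)` a second time should then be a no-op instead of an error.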


## Try to select on the "Number type" drop-down menu.

I first tried this using the Selenium `Select` object, but this page uses Angular Material `<mat-select>` tags, which `Select` does not support.

In [5]:
from selenium.webdriver.support.ui import Select

select = Select(driver.find_element_by_id("p00_srchCondtn_selDocNoInputType0"))

# select by visible text
select.select_by_visible_text('Patent application number')

# select by value 
select.select_by_value('1')

UnexpectedTagNameException: Message: Select only works on <select> elements, not on <mat-select>


## Click on the drop-down arrow on the "Number type" menu.

In [6]:
driver.find_element_by_xpath('//*[@id="p00_srchCondtn_selDocNoInputType0"]/div/div[2]').click()

## Check what the "Number type" value is.
If the number type we want is already selected, we don't need to change it.

In [7]:
inputElement = driver.find_element_by_id("p00_srchCondtn_selDocNoInputType0")
inputElement.text

'Patent application number'

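## Let's try to click on another option for publication number.
This approach, locating the option through an XPath built on its `id`, fails. The `mat-option-N` ids appear to be generated by the page, so they are probably not stable between sessions.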

In [8]:
driver.find_element_by_xpath('//*[@id="mat-option-13"]/span').click()

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="mat-option-13"]/span"}
  (Session info: headless chrome=87.0.4280.88)


## Instead of using the xpath identifier, search for the specific `mat-option` text to find the selection to click.

In [9]:
# https://stackoverflow.com/a/54117155
driver.find_element_by_xpath("//mat-option/span[contains(.,'Publication number of Japanese translation of PCT international application (A)')]").click()
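The XPath above embeds the label inside single quotes, which breaks if a label ever contains an apostrophe. The following sketch builds the same kind of expression for any label. `xpath_literal` and `mat_option_xpath` are hypothetical helper names, not part of Selenium.

```python
def xpath_literal(text):
    """Quote `text` as an XPath 1.0 string literal."""
    if "'" not in text:
        return "'%s'" % text
    if '"' not in text:
        return '"%s"' % text
    # XPath 1.0 has no escape character, so stitch the pieces with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join("'%s'" % p for p in parts) + ")"


def mat_option_xpath(label):
    """XPath for the <mat-option> span whose visible text contains `label`."""
    return "//mat-option/span[contains(., %s)]" % xpath_literal(label)
```

It would be used as `driver.find_element_by_xpath(mat_option_xpath('Patent application number')).click()`, with the drop-down already open.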

## Check this by re-calling the value of the "Number type" menu as done earlier.

In [10]:
inputElement = driver.find_element_by_id("p00_srchCondtn_selDocNoInputType0")
inputElement.text

'Publication number, Publication number of Japanese translation of PCT international application (A)'

## Alternatively, take a screenshot of the element to see that the selection was updated.

You can also take a screenshot of the full page with:
```
driver.get_screenshot_as_file("screenshot.png")
```

Note: You may need to change the window size of the driver to capture the full page. Do this in the browser options before creating the driver.
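As a sketch of that setup, the helper below collects the Chrome switches from the setup cell and adds `--window-size`. The function name `chrome_arguments` and the 1920x1080 default are assumptions chosen for illustration. Pick whatever size shows the part of the page you need.

```python
def chrome_arguments(width=1920, height=1080, headless=True):
    """Build the Chrome command-line switches used when creating the driver."""
    args = ["--no-sandbox", "--disable-dev-shm-usage"]
    if headless:
        args.insert(0, "--headless")
    # --window-size sets the initial browser window, and so the screenshot area.
    args.append("--window-size=%d,%d" % (width, height))
    return args


# Usage with the Options object from the setup cell:
#   chrome_options = Options()
#   for arg in chrome_arguments():
#       chrome_options.add_argument(arg)
```

If the driver already exists, `driver.set_window_size(width, height)` is another way to resize it.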

In [11]:
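# screenshot only the drop-down element; WebElement.screenshot() must be called with a file name
driver.find_element_by_xpath('//*[@id="p00_srchCondtn_selDocNoInputType0"]/div/div[2]').screenshot("element.png")
# screenshot the whole page
driver.save_screenshot("test.png")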

True

## Change the "Number type" back to "Patent application number".

In [12]:
driver.find_element_by_xpath('//*[@id="p00_srchCondtn_selDocNoInputType0"]/div/div[2]').click()
driver.find_element_by_xpath("//mat-option/span[contains(.,'Patent application number')]").click()
inputElement = driver.find_element_by_id("p00_srchCondtn_selDocNoInputType0")
inputElement.text

'Patent application number'
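The steps used so far (read the current value, open the menu, click the option by its text) could be bundled into one helper. This is a minimal sketch: `set_number_type` and `SELECT_ID` are names made up here, and the locator strings `"id"` and `"xpath"` are the values of Selenium's `By.ID` and `By.XPATH`, written out so the sketch needs no imports. It assumes the label contains no single quote.

```python
SELECT_ID = "p00_srchCondtn_selDocNoInputType0"  # id used in the cells above


def set_number_type(driver, wanted):
    """Select `wanted` in the "Number type" menu; return True if a click was needed."""
    menu = driver.find_element("id", SELECT_ID)
    if menu.text.strip() == wanted:
        return False  # already selected, leave the menu alone
    # Open the drop-down, then click the first option whose text contains `wanted`.
    driver.find_element("xpath", '//*[@id="%s"]/div/div[2]' % SELECT_ID).click()
    driver.find_element(
        "xpath", "//mat-option/span[contains(.,'%s')]" % wanted
    ).click()
    return True
```

Because the match uses `contains`, a short label such as "Publication number" may match several options, and `find_element` returns only the first one. Pass the most specific label available.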

## Enter the application number to search.

In [13]:
inputElement = driver.find_element_by_id("p00_srchCondtn_txtDocNoInputNo0")
inputElement.send_keys('2001-531855')

In [14]:
driver.get_screenshot_as_file("test.png")

True

## Click the search button.
Note that on this page, pressing the Enter key does not trigger the search. If it did, one could try:
```
from selenium.webdriver.common.keys import Keys
inputElement.send_keys(Keys.ENTER)
```

In [15]:
driver.find_element_by_xpath('//*[@id="p00_searchBtn_btnDocInquiry"]').click()

In [16]:
driver.get_screenshot_as_file("test.png")

True

## Now we're on the results page. Pull out the application and publication numbers.
The page has updated, but the URL is the same. Before moving on, we should check that a result actually shows up. Assuming it does, however:

In [17]:
appno = driver.find_element_by_xpath('//*[@id="patentUtltyIntnlNumOnlyLst_tableView_appNum0"]/label').text
appno

'JP,2001-531855'
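The lookup above raises `NoSuchElementException` when the search returns nothing. The check mentioned at the start of this section could be sketched as a small polling loop. `wait_for_result` and `RESULT_XPATH` are hypothetical names, the timeout values are arbitrary, and `"xpath"` is the value of Selenium's `By.XPATH`. Selenium's own `WebDriverWait` with `expected_conditions` offers the same idea ready-made. The loop is spelled out here only to keep the example free of imports beyond the standard library.

```python
import time

RESULT_XPATH = '//*[@id="patentUtltyIntnlNumOnlyLst_tableView_appNum0"]/label'


def wait_for_result(driver, timeout=10.0, poll=0.5):
    """Return the first result's label element, or None if none appears in time."""
    deadline = time.monotonic() + timeout
    while True:
        # find_elements (plural) returns an empty list instead of raising.
        found = driver.find_elements("xpath", RESULT_XPATH)
        if found:
            return found[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

A `None` return can then be treated as "no hit for this number" before any of the result fields are read.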

In [18]:
pubno = driver.find_element_by_xpath('//*[@id="patentUtltyIntnlNumOnlyLst_tableView_publicNumArea0"]/a').text
pubno

'JP,2003-512037,A'
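Both strings are comma-separated: a country code, a number, and for the publication a kind code. A small parser makes the parts easier to store or compare. This is only a sketch based on the two values shown above. `DocNumber` and `parse_doc_number` are names invented for it, and other result formats may exist.

```python
from typing import NamedTuple, Optional


class DocNumber(NamedTuple):
    country: str
    number: str
    kind: Optional[str]  # e.g. "A"; None when the string has no third field


def parse_doc_number(text):
    """Split 'JP,2001-531855' or 'JP,2003-512037,A' into named parts."""
    fields = [f.strip() for f in text.split(",")]
    if len(fields) not in (2, 3) or not all(fields):
        raise ValueError("unexpected document number: %r" % text)
    kind = fields[2] if len(fields) == 3 else None
    return DocNumber(fields[0], fields[1], kind)
```

For example, `parse_doc_number(pubno).number` yields the bare publication number without the country and kind codes.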

## That gives us the desired information. We could go even further and click the publication number link.

It opens up a new window: `https://www.j-platpat.inpit.go.jp/p0200`. There, we could extract more information about this patent application. Opening the "Bibliography" section gives the following:

```
(19) [Publication country] Japan Patent Office (JP)
(12) [Kind of official gazette] Published Japanese translations of PCT international publication for patent applications (A)
(11) [Publication number of Japanese translation of PCT international application] JP 2003 - 512037A (P2003-512037A)
(43) [Publication date of Japanese translation of PCT international application] Heisei 15(2003) April 2 (2003.4.2)
(54) [Title of the invention] Antisense modulation of Bcl - 6 expression
(51) [International Patent Classification 7th Edition]
C12N 15/09    ZNA
A61K 31/7088
48/00
A61P 35/00
C12N  5/06
[FI]
A61K 31/7088               
48/00                 
A61P 35/00                 
C12N 15/00    ZNA A        
5/00        E        
[Request for examination] Y
[Request for preliminary examination] Y
[Total number of pages] 104
(21) [Application number] Japanese Patent Application No. 2001-531855 (P2001-531855)
(86)(22)[Filing date]Heisei 12(2000) October 11 (2000.10.11)   
(85) [Submission date of translated text] Heisei 14(2002) April 11 (2002.4.11)
(86) [International application number] PCT/US00/27963
(87) [International publication number] WO01/029057
(87) [International publication date] Heisei 13(2001) April 26 (2001.4.26)
(31) [Application number of the priority] 09/418,640
(32) [Priority date] Heisei 11(1999) October 15 (1999.10.15)
(33) [Priority claim country] U.S. (US)
(81) [Designated country/region] EP (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OA (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG), AP (GH, GM, KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), EA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CR, CU, CZ, DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, TZ, UA, UG, US, UZ, VN, YU,ZA,ZW
(71) [Applicant]
[Name]The Isis Farr Matthew Thika Luce Inc.
[Name (in original language)]Isis  PharmaceuticalsInc
(72) [Inventor]
[Name]Taylor, Jennifer Kay
(72) [Inventor]
[Name]Cow ***, REXX em
(74) [Representative]
[Patent attorney]
[Name] SHAMOTO, Ichio    (and 5 others)
[Theme code (reference)]
4B024
4B065
4C084
4C086
[F-term (reference) ]
4B024 AA01 CA05 HA12
4B065 AA93X BB14 BD35 CA46
4C084 AA13 MA01 NA14 ZB212 ZB262
4C086 AA01 AA02 AA03 EA16 MA01 MA04 NA14 ZB21 ZB26
```