# Brief walkthrough of the Python EPO OPS Client

**Version**: Dec 14 2020

Reference: [python-epo-ops-client](https://github.com/gsong/python-epo-ops-client)

Reference: [EPO OPS Reference Guide](http://documents.epo.org/projects/babylon/eponet.nsf/0/F3ECDCC915C9BCD8C1258060003AA712/$File/ops_v3.2_documentation_-_version_1.3.16_en.pdf)

## Import the package and create a client.

NOTE: Before you do this, you will need to request access credentials from the European Patent Office (EPO)'s Open Patent Services (OPS). You can find more in their [developer's hub](https://developers.epo.org/).

```python
import epo_ops

client = epo_ops.Client(
    key='ENTER_YOUR_KEY_HERE',
    secret='ENTER_YOUR_SECRET_KEY_HERE',
    accept_type='json',  # request JSON responses instead of XML
)
```
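
Hardcoding credentials in a notebook makes them easy to leak when the notebook is shared. Here is a minimal sketch that reads them from environment variables instead. The variable names `OPS_KEY` and `OPS_SECRET` and the helper `load_ops_credentials` are our own hypothetical choices, not part of `epo_ops`:

```python
import os


def load_ops_credentials(key_var='OPS_KEY', secret_var='OPS_SECRET'):
    """Return (key, secret) read from environment variables.

    The default variable names are hypothetical; pass your own if they differ.
    """
    key = os.environ.get(key_var)
    secret = os.environ.get(secret_var)
    if not key or not secret:
        raise RuntimeError(f'Set {key_var} and {secret_var} before creating the client.')
    return key, secret


# Usage (needs epo_ops installed and real credentials):
# key, secret = load_ops_credentials()
# client = epo_ops.Client(key=key, secret=secret, accept_type='json')
```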

## Let's see what the documentation says for requesting published data.

```python
help(client.published_data)
```

```
Help on method published_data in module epo_ops.api:

published_data(reference_type, input, endpoint='biblio', constituents=None) method of epo_ops.api.Client instance
```

## Consider the different input types for a search request.

According to `help(epo_ops.models)` (long output not shown), there are three possible input types:
* Docdb
* Epodoc
* Original

According to the [EPO website](https://www.epo.org/service-support/faq/online-services/ops.html):

> In the EPODOC format, only the country code (CC) and document number are mandatory and must be given in one string. Leading zeros can generally be ignored, e.g.:
> 
> ```
> <doc-number>EP453930</doc-number>
> ```
> 
> In the DOCDB format, the country code (CC), document number and kind code (KC) are all mandatory. If you don't know the kind code, you can replace it in full or in part with the wildcard '%':
> 
> ```
> <country>EP</country>
> <doc-number>453930</doc-number>
> <kind>A2</kind>
> ```

We'll use Epodoc, since the number we're searching for (EP00970724) is a country code followed by a document number, with no kind code.
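
To make the difference between the two formats concrete, here is a small sketch that splits an EPODOC-style number into the separate country code and document number that DOCDB expects. The helper `split_epodoc` is hypothetical, and it assumes the number starts with a two-letter country code; DOCDB's kind code still has to come from elsewhere (or from the `%` wildcard mentioned in the EPO FAQ).

```python
import re

# Assumption: an EPODOC number is a two-letter country code followed by the rest.
EPODOC_PATTERN = re.compile(r'^([A-Z]{2})(\w+)$')


def split_epodoc(number):
    """Split e.g. 'EP453930' into ('EP', '453930')."""
    match = EPODOC_PATTERN.match(number.strip().upper())
    if match is None:
        raise ValueError(f'Not an EPODOC-style number: {number!r}')
    return match.group(1), match.group(2)


country, doc_number = split_epodoc('EP453930')
print(country, doc_number)  # EP 453930
```

In `epo_ops` terms, these parts would correspond to `epo_ops.models.Epodoc('EP453930')` on one side and, following the argument order used in the library's README, `epo_ops.models.Docdb('453930', 'EP', 'A2')` on the other.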


```python
help(epo_ops.models.Epodoc)
```

```
Help on class Epodoc in module epo_ops.models:

class Epodoc(BaseInput)
 |  Method resolution order:
 |      Epodoc
 |      BaseInput
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, number, kind_code=None, date=None)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from BaseInput:
 |  
 |  as_api_input(self)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from BaseInput:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
```

## Let's request information about application no. EP00970724.

```python
response = client.published_data(                 # Retrieve bibliographic data
    reference_type='application',                 # publication, application, priority
    input=epo_ops.models.Epodoc('EP00970724'),    # original, docdb, epodoc
    endpoint='biblio',                            # optional, defaults to biblio in case of published_data
    constituents=['biblio'],                      # optional, e.g., full-cycle, images, biblio, abstract
)
```

```python
response
```

```
<Response [200]>
```

## A status code of 200 (OK) indicates a successful HTTP request.

From [Mozilla Developer Network](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status):

> HTTP response status codes indicate whether a specific HTTP request has been successfully completed. Responses are grouped in five classes:
> 1. Informational responses (100–199)
> 2. Successful responses (200–299)
> 3. Redirects (300–399)
> 4. Client errors (400–499)
> 5. Server errors (500–599)
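
The five classes quoted above map directly onto integer ranges. A minimal sketch, where `status_class` is a hypothetical helper of ours rather than part of `requests` or `epo_ops`:

```python
def status_class(code):
    """Map an HTTP status code to one of the five MDN response classes."""
    classes = [
        (100, 199, 'Informational response'),
        (200, 299, 'Successful response'),
        (300, 399, 'Redirect'),
        (400, 499, 'Client error'),
        (500, 599, 'Server error'),
    ]
    for low, high, label in classes:
        if low <= code <= high:
            return label
    raise ValueError(f'Not a valid HTTP status code: {code}')


print(status_class(200))  # Successful response
```

For everyday checks, the `requests` `Response` object already provides `response.ok` (True when the status code is below 400) and `response.raise_for_status()`, which raises an exception for 4xx and 5xx codes.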


## Now let's extract the data out of this response.

Let's check the docstring of the response object, which is a `requests` `Response` (output truncated). Among its attributes is a promising one called `content`.

```python
help(response)
```

```
Help on Response in module requests.models object:

class Response(builtins.object)
 |  The :class:`Response <Response>` object, which contains a
 |  server's response to an HTTP request.
 |  
 |  Methods defined here:
 |  
 |  __bool__(self)
 |      Returns True if :attr:`status_code` is less than 400.
 |      
 |      This attribute checks if the status code of the response is between
 |      400 and 600 to see if there was a client error or a server error. If
 |      the status code, is between 200 and 400, this will return True. This
 |      is **not** a check to see if the response code is ``200 OK``.
 |  
 |  __enter__(self)
 |  
 |  __exit__(self, *args)
 |  
 |  __getstate__(self)
 |  
 |  __init__(self)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self)
 |      Allows you to use a response as an iterator.
 |  
 |  __nonzero__(self)
 |      Returns True if :attr:`status_code` is less than 400.
 |      
 |      This attribute checks if
...
```

## We can also directly see the attributes of `response` by calling `dir()`.

```python
dir(response)
```

```
['__attrs__',
 '__bool__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__nonzero__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_content',
 '_content_consumed',
 '_next',
 'apparent_encoding',
 'close',
 'connection',
 'content',
 'cookies',
 'elapsed',
 'encoding',
 'headers',
 'history',
 'is_permanent_redirect',
 'is_redirect',
 'iter_content',
 'iter_lines',
 'json',
 'links',
 'next',
 'ok',
 'raise_for_status',
 'raw',
 'reason',
 'request',
 'status_code',
 'text',
 'url']
```
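
Most of that list is dunder and private names. A short sketch that keeps only the public ones; `public_attributes` is a hypothetical helper, and `FakeResponse` is a stand-in class so the example runs offline. With a real request you would call `public_attributes(response)`.

```python
def public_attributes(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]


# Stand-in class so the example runs without a network call.
class FakeResponse:
    status_code = 200
    _content = b'{}'

    def json(self):
        return {}


print(public_attributes(FakeResponse()))  # ['json', 'status_code']
```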

## The `content` attribute holds the JSON response body as raw bytes.

NOTE: The output is in JSON format because we requested it when creating the `Client` object above (parameter `accept_type='json'`). Otherwise, the output would be in XML format.

```python
response.content
```

```
b'{"ops:world-patent-data":{"@xmlns":{"ops":"http://ops.epo.org","$":"http://www.epo.org/exchange","xlink":"http://www.w3.org/1999/xlink"},"exchange-documents":{"exchange-document":[{"@system":"ops.epo.org","@family-id":"23658965","@country":"EP","@doc-number":"1220866","@kind":"A1","bibliographic-data":{"publication-reference":{"document-id":[{"@document-id-type":"docdb","country":{"$":"EP"},"doc-number":{"$":"1220866"},"kind":{"$":"A1"},"date":{"$":"20020710"}},{"@document-id-type":"epodoc","doc-number":{"$":"EP1220866"},"date":{"$":"20020710"}}]},"classification-ipc":{"text":[{"$":"C07H21/04"},{"$":"A61K48/00"},{"$":"C07H21/02"},{"$":"C12N15/85"},{"$":"C12N15/86"},{"$":"C12Q1/68"}]},"classifications-ipcr":{"classification-ipcr":[{"@sequence":"1","text":{"$":"C12N  15/    09            A I"}},{"@sequence":"2","text":{"$":"A61K  31/  7088            A I"}},{"@sequence":"3","text":{"$":"A61K  48/    00            A I"}},{"@sequence":"4","text":{"$":"A61P  35/    00            A I"}},{"
...
```

## We then parse the bytes into a Python dictionary with `response.json()`.

```python
mydata = response.json()
mydata
```

```
{'ops:world-patent-data': {'@xmlns': {'ops': 'http://ops.epo.org',
   '$': 'http://www.epo.org/exchange',
   'xlink': 'http://www.w3.org/1999/xlink'},
  'exchange-documents': {'exchange-document': [{'@system': 'ops.epo.org',
     '@family-id': '23658965',
     '@country': 'EP',
     '@doc-number': '1220866',
     '@kind': 'A1',
     'bibliographic-data': {'publication-reference': {'document-id': [{'@document-id-type': 'docdb',
         'country': {'$': 'EP'},
         'doc-number': {'$': '1220866'},
         'kind': {'$': 'A1'},
         'date': {'$': '20020710'}},
        {'@document-id-type': 'epodoc',
         'doc-number': {'$': 'EP1220866'},
         'date': {'$': '20020710'}}]},
      'classification-ipc': {'text': [{'$': 'C07H21/04'},
        {'$': 'A61K48/00'},
        {'$': 'C07H21/02'},
        {'$': 'C12N15/85'},
        {'$': 'C12N15/86'},
        {'$': 'C12Q1/68'}]},
      'classifications-ipcr': {'classification-ipcr': [{'@sequence': '1',
         'text': {'$': 'C12N  15/
...
```
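
`response.json()` is a convenience provided by `requests`; decoding the bytes yourself with the standard `json` module gives the same kind of dictionary. A sketch using a shortened stand-in for `response.content` (the real body is much longer):

```python
import json

# Shortened stand-in for response.content (a bytes object).
raw = b'{"ops:world-patent-data": {"@xmlns": {"ops": "http://ops.epo.org"}}}'

# json.loads accepts bytes directly (Python 3.6+) and detects UTF-8/16/32.
data = json.loads(raw)
print(data['ops:world-patent-data']['@xmlns']['ops'])  # http://ops.epo.org
```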

## Gotta dig deep in order to find the priority data.

A helpful step here is to call `dict_name.keys()` to traverse the dictionary, rather than trying to parse the curly braces by eye.
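
Calling `.keys()` level by level works, but a small recursive generator can list every path down to a chosen depth in one go. A sketch; `key_paths` is a hypothetical helper that descends into dicts by key and into lists by index. With the real data you might call `key_paths(mydata, max_depth=4)`.

```python
def key_paths(obj, max_depth=3, prefix=()):
    """Yield the path (a tuple of keys/indices) to every nested dict or list element.

    Descends into dicts by key and into lists by index, stopping at max_depth.
    """
    if max_depth == 0:
        return
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return  # scalars have no children
    for key, value in items:
        path = prefix + (key,)
        yield path
        yield from key_paths(value, max_depth - 1, path)


sample = {'a': {'b': [{'c': 1}, {'c': 2}]}}
for path in key_paths(sample, max_depth=3):
    print(path)
```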

## The path inward is straightforward, but three levels in we find a _list_ of two objects.

It turns out they are two publications of the same application, with different **kind codes**. The kind code specifies the "document type," such as an application publication, a re-publication, a corrected publication, or a granted patent. You can find kind code summary tables online.

We get every related kind code in our output because our search request used the Epodoc format without specifying a kind code.

```python
len(mydata['ops:world-patent-data']['exchange-documents']['exchange-document'])
```

```
2
```

```python
mydata['ops:world-patent-data']['exchange-documents']['exchange-document'][-1].keys()
```

```
dict_keys(['@system', '@family-id', '@country', '@doc-number', '@kind', 'bibliographic-data'])
```

```python
mydata['ops:world-patent-data']['exchange-documents']['exchange-document'][0]['@kind']
```

```
'A1'
```

```python
mydata['ops:world-patent-data']['exchange-documents']['exchange-document'][1]['@kind']
```

```
'A4'
```
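
Rather than indexing each document separately, a list comprehension can collect all the kind codes at once. A sketch; `kind_codes` is a hypothetical helper, and the dict-versus-list check is an assumption about how OPS JSON represents a single result, not something shown in this walkthrough. With the real data you would call `kind_codes(mydata)`.

```python
def kind_codes(data):
    """Return the @kind of every exchange-document in an OPS biblio JSON response."""
    docs = data['ops:world-patent-data']['exchange-documents']['exchange-document']
    # Assumption: a single document may come back as a dict rather than a list.
    if isinstance(docs, dict):
        docs = [docs]
    return [doc['@kind'] for doc in docs]


# Minimal stand-in with the same nesting as `mydata`.
sample = {'ops:world-patent-data': {'exchange-documents': {'exchange-document': [
    {'@kind': 'A1'}, {'@kind': 'A4'}]}}}
print(kind_codes(sample))  # ['A1', 'A4']
```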

## Let's take the last document in the list, which is also the most recent one (A4), and keep probing.

```python
mydata['ops:world-patent-data']['exchange-documents']['exchange-document'][-1]
```

```
{'@system': 'ops.epo.org',
 '@family-id': '23658965',
 '@country': 'EP',
 '@doc-number': '1220866',
 '@kind': 'A4',
 'bibliographic-data': {'publication-reference': {'document-id': [{'@document-id-type': 'docdb',
     'country': {'$': 'EP'},
     'doc-number': {'$': '1220866'},
     'kind': {'$': 'A4'},
     'date': {'$': '20050119'}},
    {'@document-id-type': 'epodoc',
     'doc-number': {'$': 'EP1220866'},
     'date': {'$': '20050119'}}]},
  'classification-ipc': {'text': [{'$': 'C07H21/04'},
    {'$': 'A61K48/00'},
    {'$': 'C07H21/02'},
    {'$': 'C12N15/85'},
    {'$': 'C12N15/86'},
    {'$': 'C12Q1/68'}]},
  'classifications-ipcr': {'classification-ipcr': [{'@sequence': '1',
     'text': {'$': 'C12N  15/    09            A I'}},
    {'@sequence': '2', 'text': {'$': 'A61K  31/  7088            A I'}},
    {'@sequence': '3', 'text': {'$': 'A61K  48/    00            A I'}},
    {'@sequence': '4', 'text': {'$': 'A61P  35/    00            A I'}},
    {'@sequence': '5', 'text': {'
...
```
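
Indexing with `[-1]` assumes the list is ordered by date. If you would rather not rely on that, you can pick the document with the latest docdb publication date instead. A sketch; `latest_document` and `make_doc` are hypothetical helpers, and the sample documents are cut down to the fields used, with dates taken from the outputs in this walkthrough.

```python
def latest_document(docs):
    """Return the exchange-document with the most recent docdb publication date."""

    def publication_date(doc):
        ids = doc['bibliographic-data']['publication-reference']['document-id']
        for doc_id in ids:
            if doc_id['@document-id-type'] == 'docdb':
                return doc_id['date']['$']  # YYYYMMDD strings sort chronologically
        return ''  # no docdb date: treat as oldest

    return max(docs, key=publication_date)


def make_doc(kind, date):
    """Build a stripped-down exchange-document for the example."""
    return {'@kind': kind, 'bibliographic-data': {'publication-reference': {'document-id': [
        {'@document-id-type': 'docdb', 'kind': {'$': kind}, 'date': {'$': date}}]}}}


docs = [make_doc('A1', '20020710'), make_doc('A4', '20050119')]
print(latest_document(docs)['@kind'])  # A4
```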

```python
mydata['ops:world-patent-data']['exchange-documents']['exchange-document'][-1][
    'bibliographic-data']['priority-claims']['priority-claim']
```

```
[{'@sequence': '1',
  '@kind': 'national',
  'document-id': [{'@document-id-type': 'epodoc',
    'doc-number': {'$': 'WO2000US27963'},
    'date': {'$': '20001011'}},
   {'@document-id-type': 'original', 'doc-number': {'$': 'US0027963'}},
   {'@document-id-type': 'original', 'doc-number': {'$': 'US2000027963'}}]},
 {'@sequence': '2',
  '@kind': 'national',
  'document-id': [{'@document-id-type': 'epodoc',
    'doc-number': {'$': 'US19990418640'},
    'date': {'$': '19991015'}},
   {'@document-id-type': 'original', 'doc-number': {'$': '418640'}}]}]
```

## We're almost there! The value inside the `priority-claim` key is a list of two objects.

The current application claims priority from two earlier applications. Let's loop through the list and print out their document numbers.

```python
pc_list = mydata['ops:world-patent-data']['exchange-documents']['exchange-document'][-1][
    'bibliographic-data']['priority-claims']['priority-claim']

for pc in pc_list:
    # Each 'document-id' list may hold two, three, or possibly more entries.
    # The first entry has document-id-type epodoc; the following ones are original.
    print(pc['document-id'][0]['doc-number']['$'])
```

```
WO2000US27963
US19990418640
```
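
The loop above relies on the epodoc entry always coming first. A slightly sturdier sketch selects entries by `@document-id-type` instead. `priority_numbers` is a hypothetical helper, and treating a lone claim or id as a dict is an assumption about OPS JSON, not something this response shows. With the real data you would call `priority_numbers(mydata['ops:world-patent-data']['exchange-documents']['exchange-document'][-1])`.

```python
def priority_numbers(exchange_document, id_type='epodoc'):
    """Return the priority-claim numbers of the given document-id type."""
    claims = exchange_document['bibliographic-data']['priority-claims']['priority-claim']
    # Assumption: a single claim may be a dict instead of a one-item list.
    if isinstance(claims, dict):
        claims = [claims]
    numbers = []
    for claim in claims:
        ids = claim['document-id']
        if isinstance(ids, dict):
            ids = [ids]
        # Select by @document-id-type instead of relying on list position.
        for doc_id in ids:
            if doc_id['@document-id-type'] == id_type:
                numbers.append(doc_id['doc-number']['$'])
    return numbers


# Stand-in built from the priority-claim output above.
doc = {'bibliographic-data': {'priority-claims': {'priority-claim': [
    {'document-id': [
        {'@document-id-type': 'epodoc', 'doc-number': {'$': 'WO2000US27963'}},
        {'@document-id-type': 'original', 'doc-number': {'$': 'US0027963'}},
        {'@document-id-type': 'original', 'doc-number': {'$': 'US2000027963'}}]},
    {'document-id': [
        {'@document-id-type': 'epodoc', 'doc-number': {'$': 'US19990418640'}},
        {'@document-id-type': 'original', 'doc-number': {'$': '418640'}}]},
]}}}
print(priority_numbers(doc))  # ['WO2000US27963', 'US19990418640']
```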


## Comparing these numbers with an [Espacenet search](https://worldwide.espacenet.com/patent/search/family/023658965/publication/EP1220866A1?q=EP00970724), we see that they match.

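Espacenet lists two priority documents for this application. They are written in a different number format than the Epodoc numbers printed above, but the serial numbers and dates correspond:
1. US0027963W·2000-10-11
2. US41864099A·1999-10-15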
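
OPS returns dates as `YYYYMMDD` strings, while Espacenet displays them as `YYYY-MM-DD`. A small conversion makes the comparison direct; `ops_date_to_iso` is a hypothetical helper built on the standard `datetime` module:

```python
from datetime import datetime


def ops_date_to_iso(value):
    """Convert an OPS date string such as '20001011' to ISO format '2000-10-11'."""
    return datetime.strptime(value, '%Y%m%d').date().isoformat()


for ops_date in ['20001011', '19991015']:
    print(ops_date_to_iso(ops_date))
```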

Note that this request returned only the two EP publications (A1 and A4), **not** the full "Published as" list that Espacenet shows:
* AU8005900A
* EP1220866A1
* EP1220866A4
* JP2003512037A
* US6140125A
* WO0129057A1
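
One route to the other family members, which we have not tried here, is the client's `family` method (the family service listed in the `python-epo-ops-client` README), e.g. something like `client.family('publication', epo_ops.models.Epodoc('EP1220866'))`. Since that response's layout is not shown in this walkthrough, the sketch below avoids hard-coding a path: it searches any OPS JSON for `publication-reference` blocks and formats their docdb entries as country + number + kind, the same style as the list above. `docdb_publications` and `make_doc` are hypothetical helpers, it assumes the family response uses the same `publication-reference` structure, and the sample is cut down from the biblio output in this walkthrough.

```python
def docdb_publications(obj):
    """Collect 'country + doc-number + kind' for every docdb publication-reference in obj."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == 'publication-reference':
                ids = value['document-id']
                # Assumption: a lone document-id may be a dict rather than a list.
                if isinstance(ids, dict):
                    ids = [ids]
                for doc_id in ids:
                    if doc_id.get('@document-id-type') == 'docdb':
                        found.append(
                            doc_id['country']['$'] + doc_id['doc-number']['$'] + doc_id['kind']['$']
                        )
            else:
                found.extend(docdb_publications(value))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(docdb_publications(item))
    return found


def make_doc(kind):
    """Stripped-down exchange-document modeled on the biblio output above."""
    return {'@kind': kind, 'bibliographic-data': {'publication-reference': {'document-id': [
        {'@document-id-type': 'docdb', 'country': {'$': 'EP'},
         'doc-number': {'$': '1220866'}, 'kind': {'$': kind}},
        {'@document-id-type': 'epodoc', 'doc-number': {'$': 'EP1220866'}},
    ]}}}


sample = {'exchange-documents': {'exchange-document': [make_doc('A1'), make_doc('A4')]}}
print(docdb_publications(sample))  # ['EP1220866A1', 'EP1220866A4']
```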
