# Using the OpenAlex API

##  🎓 About OpenAlex

[OpenAlex](https://docs.openalex.org/) is a comprehensive resource for searching and accessing **scholarly publication data**. It also holds data about **authors, institutions, sources** and other [entities](https://docs.openalex.org/api-entities/entities-overview) related to those publications. To do so, it brings together data from a series of well-known [data sources](https://docs.openalex.org/additional-help/faq#where-does-your-data-come-from).

Perhaps best of all, OpenAlex data is published under the Creative Commons Public Domain Dedication ([CC0](https://creativecommons.org/publicdomain/zero/1.0/)), which waives copyright altogether!

**💻 Snapshots and API**

OpenAlex offers data [snapshots](https://docs.openalex.org/download-snapshot) on a regular basis, but the data can be accessed in a far more targeted way via the [**REST API**](https://docs.openalex.org/how-to-use-the-api/api-overview). This notebook gives you a brief overview of its basic functionality and the data that can be accessed there.

One thing helps in understanding and using the OpenAlex API: it consists of different endpoints, one for each entity type OpenAlex provides data on:
* `/works` - scholarly publications like journal articles, book chapters etc.
* `/authors` - the contributors of these publications
* `/topics` - the subjects that works deal with
* `/institutions` - the institutions, like universities, that contributors of publications are affiliated with
* `/sources` - the sources that hold the individual works, like journals, series, conferences, and repositories
* `/publishers` - companies and organizations that distribute works
* `/funders` - organizations that fund research


This notebook will show these endpoints, but not every possible action. In addition, the OpenAlex API is constantly growing. For the full picture, consult the [documentation](https://docs.openalex.org/how-to-use-the-api/api-overview).

**☝️ Caveat**

Gathering and integrating data on scholarly publications is hard. Data sources are manifold, and even the original data is often incomplete. So the data of aggregators like OpenAlex will not be 100% complete or free from errors and duplicates either. **Handle all derived numbers with care** and draw conclusions with this imperfect database in mind. If possible, use other data sources to cross-check your findings.

**😇 Be polite and show who you are**

One more essential hint about using the OpenAlex API: you can access it freely, without an API key or any other authentication. However, it is recommended to be "polite" and show who you are (and that you are human) by **specifying your e-mail address**. This will also give you faster and more consistent **response times**!

In [1]:
# Check Python version
!python --version

Python 3.11.5


In [2]:
# Install necessary libraries if not already available
#!pip install requests pandas

In [3]:
# Import the necessary libraries
import json, requests
import pandas as pd

In [4]:
# IMPORTANT: Fill in your e-mail address
email = '...'
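
The e-mail address ends up as a `mailto` query parameter in every request URL. As a minimal sketch of what that looks like, the following builds such a URL with Python's standard library only. The helper `build_openalex_url` and the address `you@example.org` are placeholders introduced here for illustration; they are not part of the OpenAlex API.

```python
from urllib.parse import urlencode

def build_openalex_url(path, email, **query):
    """Return an OpenAlex API URL that carries the mailto parameter.

    Hypothetical helper for illustration only.
    """
    query_params = {'mailto': email, **query}
    # urlencode percent-encodes special characters such as "@" and spaces
    return 'https://api.openalex.org/' + path.lstrip('/') + '?' + urlencode(query_params)

# Placeholder address; use your own e-mail address instead
example_url = build_openalex_url('works/W3093430496', 'you@example.org')
print(example_url)
```

In the rest of this notebook the same effect is achieved by passing `params={'mailto': email}` to `requests.get()`, which takes care of the encoding.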

## 📚 Works endpoint

The core of OpenAlex data are **scholarly publications**. These are represented as ***Work* entities**, which can be retrieved via the `/works` endpoint. Single *Work* entities can be accessed directly by their OpenAlex identifier (starting with a "W", like `W3093430496`) or by their Digital Object Identifier (DOI). You can access a random publication with `https://api.openalex.org/works/random`.

To learn about the attributes a *Work* entity can have, take a look into the [Work](https://docs.openalex.org/about-the-data/work) object documentation.

We'll start by accessing a single *Work* entity, that is, a publication record. The code cell below queries the OpenAlex API via its `/works` endpoint. The base URL is completed with an identifier, and the e-mail address is passed as a parameter.

There are several examples you can try by commenting identifiers in or out and adapting the request accordingly.

In [5]:
base_url = 'https://api.openalex.org/works/'
params = {'mailto': email}

# Using different work identifiers
openalex_id = 'W3093430496'                               # doctoral thesis
#openalex_id = 'W2489129678'                              # monograph
#openalex_id = 'W4200635144'                             # conference publication
#doi = 'doi:10.1007/978-3-319-20319-5_4'                  # book part
#doi = 'doi:10.22617/tim210529'                           # report

doi =  'doi:https://doi.org/10.1007/s10479-016-2314-1'    # green OA article
#doi = 'doi:https://doi.org/10.1371/journal.pone.0121874' # gold OA article
#doi = 'doi:https://doi.org/10.1007/s10489-020-02029-z'   # bronze OA article
#doi = 'https://doi.org/10.1134/s2079086415010053'         # closed article

    
# Adapt identifier if necessary: doi, openalex_id
r = requests.get(base_url + doi, params=params)
data = r.json()
type(data)

dict

The OpenAlex API responds with [JSON](https://en.wikipedia.org/wiki/JSON)-formatted data. It is therefore decoded with the `json()` method and stored in a Python object named `data`. Mirroring the JSON structure, this is a Python [dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries).

This request and decoding method will be used throughout the whole notebook.
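
The identifiers in the cell above come in several spellings (`W…`, `doi:…`, `https://doi.org/…`). As a rough sketch, they could be normalized into a single form before being appended to the base URL. The function `work_identifier` and the constant `DOI_PREFIXES` are hypothetical names introduced here, and the sketch assumes that the `doi:` prefix followed by the bare DOI is accepted, as in the book part example above.

```python
# Prefixes that may precede a bare DOI (assumption: only these three occur)
DOI_PREFIXES = ('doi:', 'https://doi.org/', 'http://doi.org/')

def work_identifier(value):
    """Return the path segment to append to .../works/ (illustrative helper)."""
    value = value.strip()
    # OpenAlex work IDs: a "W" followed by digits
    if value[:1].upper() == 'W' and value[1:].isdigit():
        return value.upper()
    # Otherwise treat the value as a DOI and strip known prefixes repeatedly
    stripped = True
    while stripped:
        stripped = False
        for prefix in DOI_PREFIXES:
            if value.lower().startswith(prefix):
                value = value[len(prefix):]
                stripped = True
    return 'doi:' + value

print(work_identifier('W3093430496'))
print(work_identifier('doi:https://doi.org/10.1007/s10479-016-2314-1'))
```

With such a helper, the request could be written as `requests.get(base_url + work_identifier(doi), params=params)`.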

The `list()` function gives you an overview of the available data fields (which are the *Work*'s attributes) in the dictionary by listing all its *keys*. Calling a certain attribute *value* can easily be done with its key, for instance `data['id']`.

In [6]:
# Overview of available data fields
list(data)

['id',
 'doi',
 'title',
 'display_name',
 'publication_year',
 'publication_date',
 'ids',
 'language',
 'primary_location',
 'type',
 'type_crossref',
 'indexed_in',
 'open_access',
 'authorships',
 'countries_distinct_count',
 'institutions_distinct_count',
 'corresponding_author_ids',
 'corresponding_institution_ids',
 'apc_list',
 'apc_paid',
 'has_fulltext',
 'fulltext_origin',
 'cited_by_count',
 'cited_by_percentile_year',
 'biblio',
 'is_retracted',
 'is_paratext',
 'primary_topic',
 'topics',
 'keywords',
 'concepts',
 'mesh',
 'locations_count',
 'locations',
 'best_oa_location',
 'sustainable_development_goals',
 'grants',
 'datasets',
 'versions',
 'referenced_works_count',
 'referenced_works',
 'related_works',
 'ngrams_url',
 'abstract_inverted_index',
 'cited_by_api_url',
 'counts_by_year',
 'updated_date',
 'created_date']

In [7]:
# Calling the OpenAlex work ID field by its key ("id")
data['id']

'https://openalex.org/W2518998880'

In [8]:
# Publication title
data['title']

'A framework for investigating optimization of service parts performance with big data'

In [9]:
# All available identifiers, given as a separate Python dictionary
data['ids']

{'openalex': 'https://openalex.org/W2518998880',
 'doi': 'https://doi.org/10.1007/s10479-016-2314-1',
 'mag': '2518998880'}

In [10]:
# Publication type
data['type']

'article'

In [11]:
# Bibliographic information
data['biblio']

{'volume': '270', 'issue': '1-2', 'first_page': '65', 'last_page': '74'}

In [12]:
# Does OpenAlex have a fulltext available (for indexing only)?
data['has_fulltext']

True

In [13]:
# Fulltext source information
data['fulltext_origin']

'ngrams'

In [14]:
# See ngrams made from fulltext
data['ngrams_url']

'https://api.openalex.org/works/W2518998880/ngrams'

In [15]:
# Bibliographic information of primary location (= publication medium/infrastructure)
# "issn_l" is the preferred ISSN when querying the OpenAlex API by ISSN
data['primary_location']

{'is_oa': False,
 'landing_page_url': 'https://doi.org/10.1007/s10479-016-2314-1',
 'pdf_url': None,
 'source': {'id': 'https://openalex.org/S57667410',
  'display_name': 'Annals of operation research/Annals of operations research',
  'issn_l': '0254-5330',
  'issn': ['0254-5330', '1572-9338'],
  'is_oa': False,
  'is_in_doaj': False,
  'host_organization': 'https://openalex.org/P4310319900',
  'host_organization_name': 'Springer Science+Business Media',
  'host_organization_lineage': ['https://openalex.org/P4310319965',
   'https://openalex.org/P4310319900'],
  'host_organization_lineage_names': ['Springer Nature',
   'Springer Science+Business Media'],
  'type': 'journal'},
 'license': None,
 'version': None,
 'is_accepted': False,
 'is_published': False}
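
Nested fields are not always filled: the `source` of a `primary_location` can be `None`, for example, and chained lookups like `data['primary_location']['source']['issn_l']` then raise an error. The following is a minimal sketch of a defensive lookup; `dig` is a hypothetical helper name, and the two sample records are shortened stand-ins for real API responses.

```python
def dig(record, *keys, default=None):
    """Follow a chain of dictionary keys and return default if a step is missing or None."""
    current = record
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
        if current is None:
            return default
    return current

# Shortened stand-ins for API responses (illustration only)
sample_with_source = {'primary_location': {'source': {'issn_l': '0254-5330'}}}
sample_without_source = {'primary_location': {'source': None}}

print(dig(sample_with_source, 'primary_location', 'source', 'issn_l'))
print(dig(sample_without_source, 'primary_location', 'source', 'issn_l', default='n/a'))
```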

In [16]:
# Alternative locations, especially to provide Open Access versions of closed publications
data['locations']

[{'is_oa': False,
  'landing_page_url': 'https://doi.org/10.1007/s10479-016-2314-1',
  'pdf_url': None,
  'source': {'id': 'https://openalex.org/S57667410',
   'display_name': 'Annals of operation research/Annals of operations research',
   'issn_l': '0254-5330',
   'issn': ['0254-5330', '1572-9338'],
   'is_oa': False,
   'is_in_doaj': False,
   'host_organization': 'https://openalex.org/P4310319900',
   'host_organization_name': 'Springer Science+Business Media',
   'host_organization_lineage': ['https://openalex.org/P4310319965',
    'https://openalex.org/P4310319900'],
   'host_organization_lineage_names': ['Springer Nature',
    'Springer Science+Business Media'],
   'type': 'journal'},
  'license': None,
  'version': None,
  'is_accepted': False,
  'is_published': False},
 {'is_oa': True,
  'landing_page_url': 'https://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=1058&context=scm_pubs',
  'pdf_url': 'https://dr.lib.iastate.edu/bitstreams/be13ecc3-ad5c-4d7e-8a41-75e62733974c/dow

In [17]:
# Open Access status
data['open_access']

{'is_oa': True,
 'oa_status': 'green',
 'oa_url': 'https://dr.lib.iastate.edu/bitstreams/be13ecc3-ad5c-4d7e-8a41-75e62733974c/download',
 'any_repository_has_fulltext': True}

In [18]:
# The first 5 cited works by their OpenAlex links
data['referenced_works'][:5]

['https://openalex.org/W1558765224',
 'https://openalex.org/W1570319579',
 'https://openalex.org/W1595530445',
 'https://openalex.org/W1605623810',
 'https://openalex.org/W1608367906']

In [19]:
# Times cited
data['cited_by_count']

17

In [20]:
# Citations by year (since 2012)
data['counts_by_year']

[{'year': 2023, 'cited_by_count': 2},
 {'year': 2022, 'cited_by_count': 3},
 {'year': 2021, 'cited_by_count': 2},
 {'year': 2020, 'cited_by_count': 3},
 {'year': 2019, 'cited_by_count': 2},
 {'year': 2018, 'cited_by_count': 1},
 {'year': 2017, 'cited_by_count': 4}]

In [21]:
# API link to citing publications
data['cited_by_api_url']

'https://api.openalex.org/works?filter=cites:W2518998880'

In [22]:
# Status information: paratext, retraction
print(data['is_paratext'])
data['is_retracted']

False


False

A special feature of OpenAlex *Work* data is the attribute `abstract_inverted_index`. Instead of delivering the plain-text **abstract of a publication**, OpenAlex provides an inverted index: every word/token is listed together with its positions in the abstract. This is done for copyright reasons. However, this information can be used to search *Works* (see 2.2) and to build other applications on.
Note that punctuation is treated as part of the tokens.

In [23]:
# Inverted index with positions of words
data['abstract_inverted_index']

{'As': [0],
 'national': [1],
 'economies': [2],
 'continue': [3],
 'to': [4, 14, 23, 27, 52, 78, 92, 120, 133, 149, 177],
 'evolve': [5],
 'across': [6],
 'the': [7, 126, 156],
 'globe,': [8],
 'businesses': [9, 46, 116, 122],
 'are': [10, 47, 57, 69, 75, 108, 117],
 'increasing': [11],
 'their': [12],
 'capacity': [13],
 'not': [15, 58],
 'only': [16],
 'generate': [17],
 'new': [18],
 'products': [19],
 'and': [20, 56, 162, 169],
 'deliver': [21],
 'them': [22],
 'customers,': [24],
 'but': [25],
 'also': [26],
 'increase': [28],
 'levels': [29],
 'of': [30, 36, 97, 128, 158],
 'after-sales': [31],
 'service.': [32],
 'One': [33],
 'major': [34],
 'component': [35],
 'after-sale': [37],
 'service': [38, 40, 44, 66, 98, 105, 114, 151, 179],
 'involves': [39],
 'parts': [41, 45, 67, 99, 106, 115, 152, 180],
 'management.': [42, 100],
 'However,': [43, 101],
 'typically': [48],
 'seen': [49],
 'as': [50],
 'add-ons': [51],
 'existing': [53],
 'business': [54],
 'models,': [55],
 'well'
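
Since every token comes with its positions, the word order of the abstract can be recovered by sorting the tokens by position. The sketch below shows one way to do this; `rebuild_abstract` is a hypothetical helper, and `sample_index` is a shortened excerpt of the index above (only the first position of `'to'` is kept).

```python
def rebuild_abstract(inverted_index):
    """Rebuild the token sequence from an abstract_inverted_index dictionary."""
    if not inverted_index:
        # The attribute may be missing or None for some works
        return ''
    positions = {}
    for token, indexes in inverted_index.items():
        for index in indexes:
            positions[index] = token
    return ' '.join(positions[index] for index in sorted(positions))

# Shortened excerpt of the inverted index shown above
sample_index = {'As': [0], 'national': [1], 'economies': [2],
                'continue': [3], 'to': [4], 'evolve': [5]}
print(rebuild_abstract(sample_index))
# prints: As national economies continue to evolve
```

Applied to the full record, this would be `rebuild_abstract(data['abstract_inverted_index'])`.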

In [24]:
# First 5 related works by their OpenAlex links
data['related_works'][:5]

['https://openalex.org/W1969008554',
 'https://openalex.org/W2283214536',
 'https://openalex.org/W2549510740',
 'https://openalex.org/W2808736867',
 'https://openalex.org/W2154017244']

In [25]:
# How up to date is this information?
data['updated_date']

'2024-04-11T21:41:48.457905'

In [26]:
# Attached concepts (subjects) - with a more sophisticated data structure: a list of dictionaries
data['concepts'][:3]

[{'id': 'https://openalex.org/C2780378061',
  'wikidata': 'https://www.wikidata.org/wiki/Q25351891',
  'display_name': 'Service (business)',
  'level': 2,
  'score': 0.66861594},
 {'id': 'https://openalex.org/C2775899829',
  'wikidata': 'https://www.wikidata.org/wiki/Q3109007',
  'display_name': 'Globe',
  'level': 2,
  'score': 0.5246128},
 {'id': 'https://openalex.org/C61063171',
  'wikidata': 'https://www.wikidata.org/wiki/Q532781',
  'display_name': 'Service design',
  'level': 4,
  'score': 0.4948439}]

In [27]:
# Use Python list comprehension to access the display names of subjects
[i['display_name'] for i in data['concepts']]

['Service (business)',
 'Globe',
 'Service design',
 'Business',
 'Service delivery framework',
 'Process management',
 'Computer science',
 'Big data',
 'Service provider',
 'Marketing',
 'Knowledge management',
 'Medicine',
 'Ophthalmology',
 'Operating system']

In [28]:
# How many MESH descriptors are there?
len(data['mesh'])

0

In [29]:
# Show the first 5 MESH descriptors with qualifier ("None" = no qualifier)
[[i['descriptor_name'], i['qualifier_name']] for i in data['mesh'][:5]]

[]

In [30]:
# Showing the APC of the source journal
data['apc_list']

{'value': 2390, 'currency': 'EUR', 'value_usd': 2990, 'provenance': 'doaj'}

### Author and affiliation data

A *Work* object can hold attributes that carry more detailed information in sub-attributes, like the attribute `authorships`.
Let's take a look at some `authorships` data: it holds all authors of the publication, each entry starting with the `author_position` statement. The individual author's data follows, resembling the *Author* entity, which we will get to know later with its own endpoint.

Note that authors can have multiple `institutions` (which are their affiliations), each in turn described in detail, too.

In [31]:
# Showing the first 2 authors, assembled in "authorships" attribute
data['authorships'][:2]

[{'author_position': 'first',
  'author': {'id': 'https://openalex.org/A5080205434',
   'display_name': 'Christopher A. Boone',
   'orcid': 'https://orcid.org/0000-0001-9654-9062'},
  'institutions': [{'id': 'https://openalex.org/I128956969',
    'display_name': 'Texas Christian University',
    'ror': 'https://ror.org/054b0b564',
    'country_code': 'US',
    'type': 'education',
    'lineage': ['https://openalex.org/I128956969']}],
  'countries': ['US'],
  'is_corresponding': False,
  'raw_author_name': 'Christopher A. Boone',
  'raw_affiliation_string': 'Neeley School of Business, Texas Christian University, TX, USA',
  'raw_affiliation_strings': ['Neeley School of Business, Texas Christian University, TX, USA']},
 {'author_position': 'middle',
  'author': {'id': 'https://openalex.org/A5020324702',
   'display_name': 'Benjamin T. Hazen',
   'orcid': None},
  'institutions': [{'id': 'https://openalex.org/I55061410',
    'display_name': 'Air Force Institute of Technology',
    'ror': 

In [32]:
# Print out the names, ORCIDs, raw affiliation strings and positions of all authors with a loop
for i in data['authorships']:
    print(i['author']['display_name'])
    print(i['author']['orcid'])
    print(i['raw_affiliation_string'])
    print(i['author_position'])
    print('---')

Christopher A. Boone
https://orcid.org/0000-0001-9654-9062
Neeley School of Business, Texas Christian University, TX, USA
first
---
Benjamin T. Hazen
None
Department of Operational Sciences, Air Force Institute of Technology, OH, USA
middle
---
Joseph B. Skipper
None
Stafford School of Business, Abraham Baldwin Agricultural College, Tifton, USA
middle
---
Robert E. Overstreet
https://orcid.org/0000-0002-5047-2415
Department of Supply Chain and Information Systems, Iowa State University, Ames, USA
last
---


In [33]:
# Print out all affiliations (short, normalized form) of the authors with a nested loop
for i in data['authorships']:
    for j in i['institutions']:
        print(j['display_name'])

Texas Christian University
Air Force Institute of Technology
Abraham Baldwin Agricultural College
Iowa State University
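
The two loops above can also be combined to flatten `authorships` into one row per author and affiliation, which is convenient for tabular analysis. This is a sketch under assumptions: `authorship_rows` is a hypothetical helper, and `sample_authorships` is a shortened copy of the first two authors shown above.

```python
def authorship_rows(authorships):
    """Return one flat dictionary per (author, institution) pair."""
    rows = []
    for authorship in authorships:
        author = authorship.get('author') or {}
        # Keep authors without any institution as a row with an empty affiliation
        institutions = authorship.get('institutions') or [{}]
        for institution in institutions:
            rows.append({
                'name': author.get('display_name'),
                'orcid': author.get('orcid'),
                'position': authorship.get('author_position'),
                'institution': institution.get('display_name'),
            })
    return rows

# Shortened copy of the first two authorships shown above
sample_authorships = [
    {'author_position': 'first',
     'author': {'display_name': 'Christopher A. Boone',
                'orcid': 'https://orcid.org/0000-0001-9654-9062'},
     'institutions': [{'display_name': 'Texas Christian University'}]},
    {'author_position': 'middle',
     'author': {'display_name': 'Benjamin T. Hazen', 'orcid': None},
     'institutions': [{'display_name': 'Air Force Institute of Technology'}]},
]

for row in authorship_rows(sample_authorships):
    print(row)
```

A list of flat dictionaries like this can be passed directly to `pd.DataFrame(...)` if a table is needed.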


### Use the search parameter

With a slight adjustment you can use the *Works* endpoint for a simple **search in the title and abstract fields** and in the **fulltext**, if available to OpenAlex. You can search for multiple terms just by separating them with a space. You can also search for **exact phrases** using quotation marks.

The ranking (`relevance_score`) depends on similarity between request and field data, proximity of the single terms and citation counts. See [here](https://docs.openalex.org/api-entities/works/search-works) for a more detailed description of the search functionality.

In [34]:
# Search 2 terms in title, abstract and fulltext, where available
url = 'https://api.openalex.org/works?search=einstein philosoph'

# Compare
#url = 'https://api.openalex.org/works?search=einsteins philosophie'

# Search an exact phrase using quotes ""
#url = 'https://api.openalex.org/works?search="einstein als philosoph"'

params = {'mailto': email}

r = requests.get(url, params=params)
data = r.json()

In [35]:
data['meta']

{'count': 453,
 'db_response_time_ms': 118,
 'page': 1,
 'per_page': 25,
 'groups_count': None}

In [36]:
for i in data['results'][:10]:
    print(i['title'])
    print(i['relevance_score'])
    print(i['type'])
    print('---')

Albert Einstein als Philosoph und Naturforscher
197.5913
book
---
Albert Einstein als Philosoph und Naturforscher : eine Auswahl
79.943756
article
---
Albert Einstein als Philosoph und Naturforscher
72.85244
book
---
Einsteins Philosoph: Moritz Schlick und die Relativitätstheorie
56.678917
article
---
Die Korrespondenz Einstein-Schlick: Zum Verhältnis der Physik zur Philosophie
32.561615
article
---
Teil V: Schlussbemerkungen: Einstein – Wissenschaftler und Philosoph
17.587198
book-chapter
---
Schill, P. A. : Albert Einstein als Philosoph und Naturforscher
16.57221
article
---
Semiconductor Optics
14.787688
book
---
Albert Einstein und Ernst Mach: Das Machsche Prinzip und die Krise des logischen Positivismus: Walther Gerlach zum Gedächtnis
13.120136
article
---
P. A. Schilpp (Ed.): <i>Albert Einstein als Philosoph und Naturforscher.</i> Eine Auswahl. Friedr. Vieweg &amp; Sohn, Braunschweig/Wiesbaden 1983. 250 Seiten, Preis: DM 34,—
11.933929
article
---


### Access multiple Work entities

After retrieving one specific publication record successfully, it's time to look at sets of publications. First of all, it is easy to **access all available works** with the OpenAlex API: just use the *Works* endpoint without any further parameters. You will presumably need this amount of data only in exceptional cases, though.

You can easily **add several filters** to the *Works* endpoint to narrow down your search. You will find the **available filters** for *Work* entities [here](https://docs.openalex.org/api-entities/works/filter-works). As you may notice, these are mainly the attributes and sub-attributes of the *Work* entities seen above.
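
Several filters go into one `filter` parameter as `attribute:value` pairs. The sketch below assembles such a string from a dictionary. It assumes the comma-separated syntax described in the OpenAlex filter documentation, where commas combine conditions with AND; `build_filter` and `example_params` are names introduced here for illustration.

```python
def build_filter(conditions):
    """Join attribute/value pairs into an OpenAlex filter string (illustrative helper)."""
    return ','.join(f'{attribute}:{value}' for attribute, value in conditions.items())

# Example: dissertations by one author (both filter names are used in this notebook)
example_filter = build_filter({'type': 'dissertation',
                               'author.id': 'A5067955208'})

# Placeholder e-mail address; the dictionary can be passed as params= to requests.get()
example_params = {'mailto': 'you@example.org', 'filter': example_filter}
print(example_params['filter'])
```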

#### Filter works by publication type

Let's start by filtering all available publications by a **certain publication type**, for instance book parts or reports. Note that OpenAlex uses the same publication types as [Crossref](https://api.crossref.org/types). Feel free to try them out in the code cell below.

In [37]:
# Accessing all publications of a certain publication type
#url = 'https://api.openalex.org/works'   # all available publications

url = 'https://api.openalex.org/works?filter=type:dissertation'
params = {'mailto': email}

r = requests.get(url, params=params) 
data = r.json()

In [38]:
print(list(data))
data['meta']

['meta', 'results', 'group_by']


{'count': 6151129,
 'db_response_time_ms': 41,
 'page': 1,
 'per_page': 25,
 'groups_count': None}

In the `results` section you will find the matching records.

The `count` attribute in the `meta` section tells you how many results there are in total. The `per_page` attribute tells you how many records are delivered per request by default.
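
From `count` and `per_page` you can work out how many requests a complete result set would take. A minimal sketch, with `pages_needed` as a hypothetical helper name:

```python
import math

def pages_needed(count, per_page):
    """Number of result pages required to cover count records."""
    return math.ceil(count / per_page)

# Values taken from "meta" sections in this notebook
print(pages_needed(133, 25))    # prints: 6
print(pages_needed(365, 100))   # prints: 4
```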

In [39]:
# Pick one example from the results list
data['results'][11]

{'id': 'https://openalex.org/W4240619026',
 'doi': 'https://doi.org/10.25148/etd.fi14061505',
 'title': 'The construct of work commitment: testing an integrative framework',
 'display_name': 'The construct of work commitment: testing an integrative framework',
 'publication_year': 2017,
 'publication_date': '2017-11-13',
 'ids': {'openalex': 'https://openalex.org/W4240619026',
  'doi': 'https://doi.org/10.25148/etd.fi14061505'},
 'language': 'en',
 'primary_location': {'is_oa': True,
  'landing_page_url': 'https://doi.org/10.25148/etd.fi14061505',
  'pdf_url': 'https://digitalcommons.fiu.edu/cgi/viewcontent.cgi?article=3834&context=etd',
  'source': None,
  'license': None,
  'version': 'publishedVersion',
  'is_accepted': True,
  'is_published': True},
 'type': 'dissertation',
 'type_crossref': 'dissertation',
 'indexed_in': ['crossref'],
 'open_access': {'is_oa': True,
  'oa_status': 'bronze',
  'oa_url': 'https://digitalcommons.fiu.edu/cgi/viewcontent.cgi?article=3834&context=etd',


#### Filter works by "source": journal, repository etc.

Next, we try to retrieve all publications of a certain "source". ***Source* entities** in OpenAlex are the venues that host publications, like **journals, conferences, preprint repositories,** and **institutional repositories**. For journal articles this will be their journal, for book parts their volume. Sources can also be monograph series like [S4210167446](https://api.openalex.org/sources/S4210167446) or repositories like [S4306402567](https://api.openalex.org/sources/S4306402567).

Source information is given in the *Works* attributes `primary_location` and `locations`. To get an impression of sources, you may request a random one via `https://api.openalex.org/sources/random`.

To identify the desired source you can use its OpenAlex ID (starting with an "S"), for instance `S4210209919`. This identifier is also a sub-attribute of `locations.source`, and can be addressed with `locations.source.id` in the *Works* endpoint filter. You may also use the popular [ISSN](https://en.wikipedia.org/wiki/International_Standard_Serial_Number) identifier in the same way.
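
Which filter attribute to use depends on the kind of identifier at hand. As a small sketch, the choice could be made from the shape of the identifier. The function `source_filter` is hypothetical, and the two patterns are simplifying assumptions ("S" plus digits for OpenAlex IDs, `NNNN-NNNC` for ISSNs).

```python
import re

def source_filter(identifier):
    """Return a Works filter string for an OpenAlex source ID or an ISSN."""
    identifier = identifier.strip()
    if re.fullmatch(r'S\d+', identifier):
        return f'locations.source.id:{identifier}'
    if re.fullmatch(r'\d{4}-\d{3}[\dXx]', identifier):
        return f'locations.source.issn:{identifier}'
    raise ValueError(f'Not recognized as source ID or ISSN: {identifier!r}')

print(source_filter('S4210209919'))
print(source_filter('2391-7652'))
```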

In [40]:
# Accessing works of a certain host venue, using OpenAlex ID
url = 'https://api.openalex.org/works?filter=locations.source.id:S4210209919'

# Using ISSN identifier
#url = 'https://api.openalex.org/works?filter=locations.source.issn:2391-7652'

params = {'mailto': email}

r = requests.get(url, params=params) 
data = r.json()
data['meta']

{'count': 133,
 'db_response_time_ms': 241,
 'page': 1,
 'per_page': 25,
 'groups_count': None}

In [41]:
# Article titles of the result set
[i['title'] for i in data['results']]

['Caste: The Origins of Our Discontents',
 'Caste Identities and Structures of Threats: Stigma, Prejudice and Social Representation in Indian universities',
 'Dalit or Brahmanical Patriarchy? Rethinking Indian Feminism',
 'Politics of Recognition and Caste among Muslims: A Study of Shekhra Biradari of Bihar, India',
 'The Human Dignity Argument against Manual Scavenging in India',
 'The Caste of Campus Habitus: Caste and Gender Encounters of the First-generation Dalit Women Students in Indian Universities',
 'As a Dalit Women',
 'Caste and Socioeconomic Inequality in Child Health and Nutrition in India: Evidences from National Family Health Survey',
 'Manual Scavenging in India: The Banality of An Everyday Crime',
 'Why a Journal on Caste?',
 'Sex as a Weapon to Settle Scores against Dalits: An Quotidian Phenomenon',
 'Editorial Essay',
 'Caste and Consequences',
 'Revolt of the Upper Castes',
 'Population-Poverty Linkages and Health Consequences',
 'Caste, Religion and Ethnicity',
 'W

We can now adapt the request by setting a **bigger result set size** to get more hits per request.

Instead of adding the parameter to the URL directly, you may also use the `params` dictionary for more clarity.

In [42]:
url = 'https://api.openalex.org/works?'

params = {'mailto': email,
          'filter': 'locations.source.id:S4210209919',
          'per_page': 100}

r = requests.get(url, params=params) 
data = r.json()

In [43]:
data['meta']

{'count': 133,
 'db_response_time_ms': 123,
 'page': 1,
 'per_page': 100,
 'groups_count': None}

In [44]:
# Pick one example of the result set
data['results'][67]

{'id': 'https://openalex.org/W3099866883',
 'doi': 'https://doi.org/10.26812/caste.v1i2.198',
 'title': 'Dalit Counterpublic and Social Space on Indian Campuses',
 'display_name': 'Dalit Counterpublic and Social Space on Indian Campuses',
 'publication_year': 2020,
 'publication_date': '2020-10-31',
 'ids': {'openalex': 'https://openalex.org/W3099866883',
  'doi': 'https://doi.org/10.26812/caste.v1i2.198',
  'mag': '3099866883'},
 'language': 'en',
 'primary_location': {'is_oa': True,
  'landing_page_url': 'https://doi.org/10.26812/caste.v1i2.198',
  'pdf_url': 'https://journals.library.brandeis.edu/index.php/caste/article/download/198/52',
  'source': {'id': 'https://openalex.org/S4210209919',
   'display_name': 'Caste',
   'issn_l': '2639-4928',
   'issn': ['2639-4928'],
   'is_oa': True,
   'is_in_doaj': True,
   'host_organization': None,
   'host_organization_name': None,
   'host_organization_lineage': [],
   'host_organization_lineage_names': [],
   'type': 'journal'},
  'licens

In [45]:
# Access 5 publications by title
[i['title'] for i in data['results'][35:40]]

['“Our Poverty has No Shame; the Stomach has No Shame, so We Migrate Seasonally”: Women Sugarcane Cutters from Maharashtra, India',
 'Periyar: Forging a Gendered Utopia',
 'Repertoires of Anti-caste Sentiments in the Everyday Performance: Narratives of a Dalit Woman Singer',
 'The Bir Sunarwala: An Uncharted Dalit Land Movement of Haryana, India',
 'Teaching Dalit Bahujan Utopias: Notes from the Classroom']

In [46]:
# Access the subjects of a sample from the middle of the publication set
[[j['display_name'] for j in i['concepts']] for i in data['results'][35:37]]

[['Caste',
  'Poverty',
  'Shame',
  'Socioeconomics',
  'Sanitation',
  'Economic growth',
  'Work (physics)',
  'Geography',
  'Political science',
  'Business',
  'Sociology',
  'Medicine',
  'Economics',
  'Mechanical engineering',
  'Engineering',
  'Pathology',
  'Law'],
 ['Utopia',
  'Sociology',
  'Feminism',
  'Ideology',
  'Legitimacy',
  'Gender studies',
  'Grassroots',
  'Coercion (linguistics)',
  'Law',
  'Political science',
  'Politics',
  'Philosophy',
  'Linguistics']]

#### Filter works by author & handle big result sets

Similarly to the venue filtering, you may show all publications by a certain author, using their OpenAlex ID (starting with an "A"), for instance `A2420227930`. This identifier can be addressed with `authorships.author.id` or, shorter, `author.id` in the *Works* endpoint filter.

Furthermore, you may access a certain part of the result set (a "result page") via the `page` parameter, as in the example below.

In [47]:
url = 'https://api.openalex.org/works?filter=author.id:A5067955208&page=1&per-page=50'
params = {'mailto': email}

r = requests.get(url, params=params) 
data = r.json()
data['meta']

{'count': 365,
 'db_response_time_ms': 50,
 'page': 1,
 'per_page': 50,
 'groups_count': None}

Since the `per-page` parameter can only be set up to 200, several requests are necessary to access the whole result set.

This can easily be done by writing a small loop.

In [48]:
# 365 hits at 100 records per page need 4 requests (pages 1 to 4)
url = 'https://api.openalex.org/works?'
results = []
for page in range(1, 5):
    params = {'mailto': email,
              'filter': 'author.id:A5067955208',
              'per-page': 100,
              'page': page}
    r = requests.get(url, params=params)
    data = r.json()
    # Append the records of this page to the overall list
    results.extend(data['results'])
len(results)

365
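
The loop above hard-codes the number of pages. A more general sketch keeps requesting pages until the number of collected records reaches `meta['count']`. To keep the example runnable without network access, the page request is passed in as a function; `collect_all` and `fake_fetch_page` are hypothetical names, and `fake_fetch_page` only imitates the shape of an API response.

```python
def collect_all(fetch_page, per_page=100):
    """Collect the records of all result pages.

    fetch_page(page, per_page) must return a dict with 'meta' and 'results',
    shaped like an OpenAlex list response.
    """
    records = []
    page = 1
    while True:
        response = fetch_page(page, per_page)
        batch = response['results']
        records.extend(batch)
        # Stop when everything is collected or a page comes back empty
        if not batch or len(records) >= response['meta']['count']:
            break
        page += 1
    return records

def fake_fetch_page(page, per_page, total=365):
    """Stand-in for a real request; returns numbered dummy records."""
    start = (page - 1) * per_page
    stop = min(start + per_page, total)
    return {'meta': {'count': total},
            'results': [{'id': number} for number in range(start, stop)]}

all_records = collect_all(fake_fetch_page)
print(len(all_records))   # prints: 365
```

For real data, `fetch_page` would wrap the `requests.get(url, params=params).json()` call from the cell above, with `page` and `per-page` set from its arguments. For very large result sets, consult the paging section of the OpenAlex documentation first.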

In [49]:
[i['title'] for i in results[320:330]]

['A new and easier method for the assessment of final erosion depth in cancellous bone',
 'Solid-phase radioimmunoassay of immunoglobulins G, A and M: Applicability in analysis of sucrose gradients',
 'A Model Curriculum for Teaching Teachers To Use Computers as an Instructional Aid.',
 'Review for "History of Previous Fracture and Imminent Fracture Risk in Swedish Women Aged 55–90 Years Presenting with a Fragility Fracture"',
 'En kvinne i 50-årene med residiverende svimmelhet',
 'TransCon PTH, a long-acting PTH, in patients with hypoparathyroidism: Results of the phase 2 PaTH forward trial',
 'Changes in protein profile in bone marrow extracts before and one year after gastric bypass surgery',
 'Contributors',
 'Steroidhormones downregulate integrins in human stroma derived osteobilasts',
 'Reply']

#### Group works by a certain feature

Grouping is another way to adjust your data search. Here, you **split the result set** into groups **according to a certain data attribute**. Note that in doing so you are not targeting the records themselves, but deriving aggregate statements from the data. Because of this, you use `group_by` instead of `filter` as the parameter in your request. Accordingly, the response carries the requested data in the `group_by` section rather than in `results`.

For example, you might look at the hosting sources of all publications, group the publications by this feature, and get the size of each group. Note that the maximum number of groups is 200; if there are more values, you will get the 200 biggest groups.

In [50]:
url = 'https://api.openalex.org/works'
params = {'mailto': email,
         'group_by': 'locations.source.id'}

r = requests.get(url, params=params) 
data = r.json()

In [51]:
list(data)

['meta', 'group_by']

In [52]:
data['meta']

{'count': 251664028,
 'db_response_time_ms': 2091,
 'page': 1,
 'per_page': 200,
 'groups_count': 200}

In [53]:
data['group_by'][:5]

[{'key': 'https://openalex.org/S4306525036',
  'key_display_name': 'PubMed',
  'count': 34833104},
 {'key': 'https://openalex.org/S2764455111',
  'key_display_name': 'PubMed Central',
  'count': 7987269},
 {'key': 'https://openalex.org/S4306401280',
  'key_display_name': 'DOAJ (DOAJ: Directory of Open Access Journals)',
  'count': 5651831},
 {'key': 'https://openalex.org/S4306400806',
  'key_display_name': 'Europe PMC (PubMed Central)',
  'count': 5311547},
 {'key': 'https://openalex.org/S4306400194',
  'key_display_name': 'arXiv (Cornell University)',
  'count': 3516599}]

In [54]:
# Accessing the 20 biggest sources, sorted by descending count by default
[[i['key_display_name'], i['count']] for i in data['group_by'][:20]]

[['PubMed', 34833104],
 ['PubMed Central', 7987269],
 ['DOAJ (DOAJ: Directory of Open Access Journals)', 5651831],
 ['Europe PMC (PubMed Central)', 5311547],
 ['arXiv (Cornell University)', 3516599],
 ['DataCite API', 3258344],
 ['Springer eBooks', 3212360],
 ['HAL (Le Centre pour la Communication Scientifique Directe)', 2748048],
 ['Zenodo (CERN European Organization for Nuclear Research)', 1991504],
 ['Routledge eBooks', 1541318],
 ['De Gruyter eBooks', 1279414],
 ['Oxford University Press eBooks', 1201386],
 ['Elsevier eBooks', 1161028],
 ['OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information)',
  1135259],
 ['RePEc: Research Papers in Economics', 1103339],
 ['Social Science Research Network', 1038875],
 ['Cambridge University Press eBooks', 648440],
 ['LA Referencia (Red Federada de Repositorios Institucionales de Publicaciones Científicas)',
  610943],
 ['CRC Press eBooks', 596660],
 ['CiteSeer X (The Pennsylvania State University)', 576106]]
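Since at most 200 groups come back, it can help to check whether a grouping may have been cut off. The following is a minimal sketch, not part of the original notebook: `summarize_groups` is a hypothetical helper name, and `sample` is a hand-copied excerpt of the response shown above instead of a live API call. It assumes that a `groups_count` equal to `per_page` hints at a truncated grouping.

```python
def summarize_groups(data, n=5):
    """Return the n largest groups as (name, count) pairs plus a truncation hint."""
    meta = data["meta"]
    pairs = [(g["key_display_name"], g["count"]) for g in data["group_by"][:n]]
    # Assumption: if as many groups are reported as fit on one page,
    # further (smaller) groups may exist that were not returned.
    groups_count = meta.get("groups_count")
    possibly_truncated = groups_count is not None and groups_count >= meta["per_page"]
    return pairs, possibly_truncated


# Hand-copied excerpt of the response shown above (first two groups only)
sample = {
    "meta": {"count": 251664028, "page": 1, "per_page": 200, "groups_count": 200},
    "group_by": [
        {"key": "https://openalex.org/S4306525036",
         "key_display_name": "PubMed", "count": 34833104},
        {"key": "https://openalex.org/S2764455111",
         "key_display_name": "PubMed Central", "count": 7987269},
    ],
}

top, possibly_truncated = summarize_groups(sample, n=2)
print(top)
print(possibly_truncated)
```

With a live response you would pass the parsed JSON (`r.json()`) instead of `sample`.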

#### Combine filters and grouping

You can easily combine filters and grouping: for instance, filter by a specific institution such as *University of Bern* (`I118564535`) and group its publications by `concepts.id`. The resulting groups are then based on the filtered subset only.

In [55]:
# Use the "params" dictionary for using filter and group_by at the same time
url = 'https://api.openalex.org/works'
params = {'mailto': email,
          'filter': 'institutions.id:I118564535',
          'group_by': 'concepts.id'}

r = requests.get(url, params=params) 
data = r.json()

In [56]:
# Accessing the 20 most frequent concepts (groups are sorted by descending count by default)
[[i['key_display_name'], i['count']] for i in data['group_by'][:20]]

[['Medicine', 52766],
 ['Biology', 38588],
 ['Internal medicine', 26943],
 ['Computer science', 23669],
 ['Physics', 23046],
 ['Chemistry', 22563],
 ['Biochemistry', 15357],
 ['Mathematics', 12999],
 ['Pathology', 12559],
 ['Genetics', 12508],
 ['Gene', 11837],
 ['Psychology', 11681],
 ['Geology', 11170],
 ['Surgery', 10468],
 ['Political science', 10227],
 ['Philosophy', 9518],
 ['Ecology', 9420],
 ['Geography', 9200],
 ['Materials science', 9198],
 ['Quantum mechanics', 9131]]

In [57]:
concepts_data = [[i['key_display_name'], i['count']] for i in data['group_by']]
concepts_sum = 0
for i in concepts_data:
    concepts_sum += i[1]
concepts_sum

777856

In [58]:
# Calculating and adding percentages (shares of the summed counts of the returned groups)
for i in concepts_data:
    i.append(str(round(i[1]/concepts_sum*100,1)) + '%')
concepts_data[:10]

[['Medicine', 52766, '6.8%'],
 ['Biology', 38588, '5.0%'],
 ['Internal medicine', 26943, '3.5%'],
 ['Computer science', 23669, '3.0%'],
 ['Physics', 23046, '3.0%'],
 ['Chemistry', 22563, '2.9%'],
 ['Biochemistry', 15357, '2.0%'],
 ['Mathematics', 12999, '1.7%'],
 ['Pathology', 12559, '1.6%'],
 ['Genetics', 12508, '1.6%']]
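The sum-and-percentage steps above can be wrapped into one small function. This is only a sketch: `with_shares` is a hypothetical helper name, and the three pairs and the total of 777856 are copied from the outputs above. Keep in mind that the denominator is the sum over the returned groups, so the shares describe the distribution among those groups and not a share of all works.

```python
def with_shares(pairs, total=None):
    """Append a percentage string to each [name, count] pair.

    If no total is given, the sum of the supplied counts is used.
    """
    if total is None:
        total = sum(count for _, count in pairs)
    return [[name, count, f"{round(count / total * 100, 1)}%"]
            for name, count in pairs]


# First three concept groups and the overall sum, copied from the outputs above
top3 = [["Medicine", 52766], ["Biology", 38588], ["Internal medicine", 26943]]
shares = with_shares(top3, total=777856)
print(shares)
```

Passing `total` explicitly lets you choose a different denominator, for example `data['meta']['count']`, if that suits your question better.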

## 👨‍🎓  Authors endpoint

Similar to the *Works* endpoint, there is an ***Authors*** endpoint in OpenAlex, delivering rich information about the authors of scholarly publications: `/authors`. The *Authors* endpoint can be accessed via an OpenAlex identifier (starting with an "A", like `A2156811675`) or, for instance, via the popular [ORCID](https://orcid.org/) identifier. For detailed information see the *Author* object documentation [here](https://docs.openalex.org/about-the-data/author).


In [59]:
base_url = 'https://api.openalex.org/authors/'
params = {'mailto': email} 

# Usable identifiers:
openalex_id = 'A2156811675'
openalex_id_namespace = 'openalex:A2156811675'
orcid = 'orcid:0000-0002-9344-1029'

# Adjust the used identifier
r = requests.get(base_url + orcid, params=params) 
data = r.json()

In [60]:
list(data)

['id',
 'orcid',
 'display_name',
 'display_name_alternatives',
 'works_count',
 'cited_by_count',
 'summary_stats',
 'ids',
 'affiliations',
 'last_known_institution',
 'last_known_institutions',
 'x_concepts',
 'counts_by_year',
 'works_api_url',
 'updated_date',
 'created_date']

In [61]:
data['id']

'https://openalex.org/A5015900970'

In [62]:
# All available person identifiers
data['ids']

{'openalex': 'https://openalex.org/A5015900970',
 'orcid': 'https://orcid.org/0000-0002-9344-1029'}
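The identifiers come back as full URLs, while filters such as the one in `works_api_url` use the short form. A minimal sketch for converting between the two follows; `short_id` is a hypothetical helper name, and the `ids` dictionary is copied from the output above.

```python
def short_id(url):
    """Return the last path segment of an identifier URL, e.g. 'A5015900970'."""
    return url.rstrip("/").rsplit("/", 1)[-1]


# Copied from the output above
ids = {"openalex": "https://openalex.org/A5015900970",
       "orcid": "https://orcid.org/0000-0002-9344-1029"}

author_key = short_id(ids["openalex"])
works_filter = f"author.id:{author_key}"  # usable as 'filter' value on /works
print(author_key)
print(works_filter)
```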

In [63]:
# Normalized plain name and variants
print(data['display_name'])
data['display_name_alternatives']

Christine L. Borgman


['C L. Borgman',
 'Christine Borgman',
 'Christine Louise Borgman',
 'C.L. Christine L. Borgman',
 'C. Borgman',
 'Christine L. Borgman',
 'C. L. Borgman']

In [64]:
# How many times were the publications of this person cited?
print(data['cited_by_count'])
data['summary_stats']

8348


{'2yr_mean_citedness': 0.5217391304347826, 'h_index': 42, 'i10_index': 98}

In [65]:
# Count of published works and citations per year, since 2012
data['counts_by_year']

[{'year': 2024, 'works_count': 1, 'cited_by_count': 82},
 {'year': 2023, 'works_count': 1, 'cited_by_count': 381},
 {'year': 2022, 'works_count': 3, 'cited_by_count': 460},
 {'year': 2021, 'works_count': 20, 'cited_by_count': 482},
 {'year': 2020, 'works_count': 29, 'cited_by_count': 683},
 {'year': 2019, 'works_count': 8, 'cited_by_count': 409},
 {'year': 2018, 'works_count': 24, 'cited_by_count': 376},
 {'year': 2017, 'works_count': 21, 'cited_by_count': 443},
 {'year': 2016, 'works_count': 23, 'cited_by_count': 512},
 {'year': 2015, 'works_count': 42, 'cited_by_count': 487},
 {'year': 2014, 'works_count': 27, 'cited_by_count': 523},
 {'year': 2013, 'works_count': 20, 'cited_by_count': 381},
 {'year': 2012, 'works_count': 22, 'cited_by_count': 376}]
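The yearly records are easy to aggregate. Here is a small sketch that sums works and citations from a given year onwards; `totals_since` is a hypothetical helper name, and `sample` contains the first three records of the output above.

```python
def totals_since(counts_by_year, first_year):
    """Sum works_count and cited_by_count for all records with year >= first_year."""
    rows = [r for r in counts_by_year if r["year"] >= first_year]
    return {
        "works_count": sum(r["works_count"] for r in rows),
        "cited_by_count": sum(r["cited_by_count"] for r in rows),
    }


# First three records of the output above
sample = [
    {"year": 2024, "works_count": 1, "cited_by_count": 82},
    {"year": 2023, "works_count": 1, "cited_by_count": 381},
    {"year": 2022, "works_count": 3, "cited_by_count": 460},
]

recent = totals_since(sample, 2023)
print(recent)
```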

In [66]:
# API link to the publications of the author (using filter of the "Works" endpoint)
data['works_api_url']

'https://api.openalex.org/works?filter=author.id:A5015900970'

In [67]:
# The person's last known affiliation
data['last_known_institution']

{'id': 'https://openalex.org/I161318765',
 'ror': 'https://ror.org/046rm7j60',
 'display_name': 'University of California, Los Angeles',
 'country_code': 'US',
 'type': 'education',
 'lineage': ['https://openalex.org/I161318765',
  'https://openalex.org/I2803209242']}

In [68]:
# Please note: concepts for authors are an experimental feature for now - use with caution!
data['x_concepts']

[{'id': 'https://openalex.org/C41008148',
  'wikidata': 'https://www.wikidata.org/wiki/Q21198',
  'display_name': 'Computer science',
  'level': 0,
  'score': 91.5},
 {'id': 'https://openalex.org/C17744445',
  'wikidata': 'https://www.wikidata.org/wiki/Q36442',
  'display_name': 'Political science',
  'level': 0,
  'score': 50.4},
 {'id': 'https://openalex.org/C136764020',
  'wikidata': 'https://www.wikidata.org/wiki/Q466',
  'display_name': 'World Wide Web',
  'level': 1,
  'score': 48.4},
 {'id': 'https://openalex.org/C199539241',
  'wikidata': 'https://www.wikidata.org/wiki/Q7748',
  'display_name': 'Law',
  'level': 1,
  'score': 43.1},
 {'id': 'https://openalex.org/C2522767166',
  'wikidata': 'https://www.wikidata.org/wiki/Q2374463',
  'display_name': 'Data science',
  'level': 1,
  'score': 42.3},
 {'id': 'https://openalex.org/C111919701',
  'wikidata': 'https://www.wikidata.org/wiki/Q9135',
  'display_name': 'Operating system',
  'level': 1,
  'score': 36.3},
 {'id': 'https://op

### Use the search parameter

Using the search parameter on the `/authors` endpoint looks for the terms in the `display_name` attribute only. Read more about the search functionality [here](https://docs.openalex.org/api-entities/authors/search-authors).

In [69]:
base_url = 'https://api.openalex.org/authors?'
params = {'mailto': email,
         'search': 'erik eriksen'}

r = requests.get(base_url, params=params)
data = r.json()

In [70]:
data['meta']

{'count': 25,
 'db_response_time_ms': 140,
 'page': 1,
 'per_page': 25,
 'groups_count': None}

In [71]:
# Name, number of works and last known institution of the "erik eriksen" authors
[[i['display_name'], i['works_count'], i['last_known_institution']['display_name']] \
                     for i in data['results'][:10]]

[['Erik Fink Eriksen', 362, 'SpesialistSenteret Pilestredet Park'],
 ['Erik Oddvar Eriksen', 146, 'University of Oslo'],
 ['E. Eriksen', 37, 'University of Oslo'],
 ['Tor Erik Eriksen', 61, 'Norwegian Institute for Water Research'],
 ['Erik Nymann Eriksen', 27, 'University of Copenhagen'],
 ['Erik A. Eriksen', 23, 'ExxonMobil (Germany)'],
 ['Erik Fink-Eriksen', 1, 'Aarhus University Hospital'],
 ['Erik Eriksen', 3, 'Oslo University Hospital'],
 ['Jan Erik Eriksen', 2, 'Halliburton (United Kingdom)'],
 ['E. L. Eriksen', 2, 'Novartis (China)']]
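The list comprehension above indexes `i['last_known_institution']['display_name']` directly, which would raise a `TypeError` if that attribute were empty for a record. The sketch below shows a more defensive accessor. It rests on the assumption that the attribute can be `None`. `institution_name` is a hypothetical helper name, the first record is copied from the output above, and the second record is invented for illustration.

```python
def institution_name(author, default="(no known institution)"):
    """Return the display name of the last known institution, or a default."""
    institution = author.get("last_known_institution") or {}
    return institution.get("display_name") or default


results = [
    # Copied from the output above
    {"display_name": "Erik Oddvar Eriksen", "works_count": 146,
     "last_known_institution": {"display_name": "University of Oslo"}},
    # Hypothetical record without an institution
    {"display_name": "Example Author", "works_count": 1,
     "last_known_institution": None},
]

rows = [[a["display_name"], a["works_count"], institution_name(a)] for a in results]
print(rows)
```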

### Access multiple Author entities

As with publications (aka *Works* entities), you can search for multiple authors at once, and **use filters and groupings**.

#### Filter authors by institution & put the data into a dataframe

For example, you might search for all authors whose last known affiliation is *University of Bern*. For this, you can use its OpenAlex identifier `I118564535`. Filtered author lists are sorted by the `works_count` attribute by default. To adjust the default sorting, look [here](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/sort-entity-lists).

In [72]:
url = 'https://api.openalex.org/authors?'
params = {'mailto': email,
         'filter': 'last_known_institution.id:I118564535',
         'per-page': 20}

r = requests.get(url, params=params) 
data = r.json()

In [73]:
data['meta']

{'count': 13108,
 'db_response_time_ms': 201,
 'page': 1,
 'per_page': 20,
 'groups_count': None}
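The `meta` object reports 13108 matching authors, but a single request only returns one page of 20. A rough sketch of walking through several pages follows. It assumes the API accepts a `page` query parameter matching the `page` field in `meta`, so check the paging documentation for limits and alternatives before relying on it. `page_count`, `iter_results`, and `fake_fetch` are hypothetical names, and `fake_fetch` is an offline stand-in for `requests.get(url, params={..., 'page': page}).json()`.

```python
import math


def page_count(total, per_page):
    """Number of pages needed to cover `total` records."""
    return math.ceil(total / per_page)


def iter_results(fetch_page, max_pages):
    """Yield records page by page; fetch_page(page) returns a parsed response."""
    for page in range(1, max_pages + 1):
        data = fetch_page(page)
        if not data["results"]:
            break  # stop early when a page comes back empty
        yield from data["results"]


def fake_fetch(page, per_page=2):
    """Offline stand-in for an API call, serving five invented records."""
    items = [{"display_name": f"Author {i}"} for i in range(1, 6)]
    start = (page - 1) * per_page
    return {"meta": {"count": len(items), "page": page, "per_page": per_page},
            "results": items[start:start + per_page]}


names = [a["display_name"] for a in iter_results(fake_fetch, page_count(5, 2))]
print(names)
print(page_count(13108, 20))  # pages needed for the result set above
```

For real requests you would keep `max_pages` small while experimenting, to avoid sending hundreds of calls by accident.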

By reading out certain attributes of the result set you can easily transform the data into a dataframe for further research.

In [74]:
names = []
ids = []
orcids = []
workcounts = []
citedcounts = []
workslinks = []

for i in data['results']:
    names.append(i['display_name'])
    ids.append(i['id'])
    orcids.append(i['orcid'])
    workcounts.append(i['works_count'])
    citedcounts.append(i['cited_by_count'])
    workslinks.append(i['works_api_url'])
    
zipped = list(zip(names, ids, orcids, workcounts, citedcounts, workslinks))
df = pd.DataFrame(zipped, columns=['author', 'openalex_id', 'orcid', \
                                    'works_count', 'cited_by_count', 'works_link'])

In [75]:
df

| | author | openalex_id | orcid | works_count | cited_by_count | works_link |
|---|---|---|---|---|---|---|
| 0 | H. P. Beck | https://openalex.org/A5080302579 | https://orcid.org/0000-0001-7212-1096 | 1498 | 55948 | https://api.openalex.org/works?filter=author.i... |
| 1 | S. Haug | https://openalex.org/A5055549637 | https://orcid.org/0000-0003-0442-3361 | 1467 | 65351 | https://api.openalex.org/works?filter=author.i... |
| 2 | Peter Würz | https://openalex.org/A5080499966 | https://orcid.org/0000-0002-2603-1169 | 1237 | 18587 | https://api.openalex.org/works?filter=author.i... |
| 3 | Thomas Berger | https://openalex.org/A5086222360 | https://orcid.org/0000-0002-2432-7791 | 1082 | 23433 | https://api.openalex.org/works?filter=author.i... |
| 4 | Bernhard Meier | https://openalex.org/A5088511695 | | 1073 | 40873 | https://api.openalex.org/works?filter=author.i... |
| 5 | Mark A. Rubin | https://openalex.org/A5035068594 | https://orcid.org/0000-0002-8321-9950 | 939 | 85684 | https://api.openalex.org/works?filter=author.i... |
| 6 | Franz H. Messerli | https://openalex.org/A5021945153 | https://orcid.org/0000-0002-4107-2583 | 907 | 41881 | https://api.openalex.org/works?filter=author.i... |
| 7 | Philippe Renaud | https://openalex.org/A5052161335 | https://orcid.org/0000-0002-9069-7109 | 895 | 16524 | https://api.openalex.org/works?filter=author.i... |
| 8 | Norbert Thom | https://openalex.org/A5044612304 | | 821 | 410 | https://api.openalex.org/works?filter=author.i... |
| 9 | A. Miucci | https://openalex.org/A5084822178 | https://orcid.org/0000-0001-8828-843X | 809 | 32653 | https://api.openalex.org/works?filter=author.i... |


#### Filter by multiple features & use logical operators

You can not only apply several filters at once by **simply stacking** them (separated by commas), but also use filters with **logical operators**, like `!` for NOT, `|` for OR, `<` for less-than, or `>` for greater-than. For further details see [this page](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/filter-entity-lists).

In [76]:
# Example: Authors with only a few publications, but high citation count
url = 'https://api.openalex.org/authors?'
params = {'mailto': email,
         'filter': 'works_count:<5,cited_by_count:>50000',
         'per-page': 20}

r = requests.get(url, params=params) 
data = r.json()
data['meta']

{'count': 3,
 'db_response_time_ms': 59,
 'page': 1,
 'per_page': 20,
 'groups_count': None}

In [77]:
[[i['display_name'], i['works_count'], i['cited_by_count'], i['works_api_url']] for i in data['results']]

[['S Nicklen',
  2,
  59717,
  'https://api.openalex.org/works?filter=author.id:A5035256244'],
 ['Rose J. Randall',
  2,
  304570,
  'https://api.openalex.org/works?filter=author.id:A5074535928'],
 ['N. J. Rosebrough',
  1,
  304586,
  'https://api.openalex.org/works?filter=author.id:A5032482932']]
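Writing stacked filter strings by hand gets error-prone as they grow. The following sketch assembles them from a dictionary: each entry becomes `attribute:value`, entries are joined with commas, and a list of values is joined with `|` for OR. `build_filter` is a hypothetical helper name and not part of any OpenAlex client library.

```python
def build_filter(conditions):
    """Build a filter string from a dict of attribute -> value (or list of values)."""
    parts = []
    for attribute, value in conditions.items():
        if isinstance(value, (list, tuple)):
            # Several alternatives for one attribute are combined with OR
            value = "|".join(str(v) for v in value)
        parts.append(f"{attribute}:{value}")
    # Separate conditions are stacked with commas
    return ",".join(parts)


few_but_cited = build_filter({"works_count": "<5", "cited_by_count": ">50000"})
company_or_nonprofit = build_filter(
    {"last_known_institution.type": ["company", "nonprofit"]})
print(few_but_cited)
print(company_or_nonprofit)
```

The resulting string can be passed as the `'filter'` value in the `params` dictionary.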

In [78]:
# Example: Authors from companies or nonprofit institutions outside the USA and China
url = 'https://api.openalex.org/authors?'
params = {'mailto': email,
         'filter': 'last_known_institution.type:company|nonprofit,last_known_institution.country_code:!US|CN',
         'per-page': 20}

r = requests.get(url, params=params) 
data = r.json()
data['meta']

{'count': 1596836,
 'db_response_time_ms': 282,
 'page': 1,
 'per_page': 20,
 'groups_count': None}

In [79]:
[[i['display_name'], i['last_known_institution']['display_name']] for i in data['results']]

[['Marina Bährle-Rapp', 'New Valve Technology (Germany)'],
 ['Ki Wook Kim', 'Robert Bosch (Germany)'],
 ['Miguel Ángel', 'Danieli (Italy)'],
 ['C.H. Arrowsmith', 'Structural Genomics Consortium'],
 ['Alamat Florist', 'Pancur Kasih Association'],
 ['Hiroshi Ishiguro', 'Sumitomo Dainippon Pharma (Japan)'],
 ['Hiroshi Takahashi', 'John Wiley & Sons (United Kingdom)'],
 ['Redaktion Facharztmagazine', 'Springer Nature (Germany)'],
 ['Nigel Collar', 'BirdLife international'],
 ['Àlex', 'Campden BRI (Hungary)'],
 ['A.M. Edwards', 'Structural Genomics Consortium'],
 ['Yusuke Nakamura', 'Japanese Foundation For Cancer Research'],
 ['Ji‐Hyun Lee', 'Samsung (South Korea)'],
 ['T. Dorigo', 'Universal Scientific Education and Research Network'],
 ['Ho Kim', 'Robert Bosch (Germany)'],
 ['Holger Eickhoff', 'Scienion (Germany)'],
 ['Yuming Wang', 'Huawei Technologies (Sweden)'],
 ['Joshua Cantlon-Bruce', 'Scienion (Germany)'],
 ['Kazuo Yamada', 'Honda (Japan)'],
 ['Dong-Won Kim', 'Doosan Heavy Industr

#### Group authors by a certain feature

As a use case, you can group all authors by the country of their last known institution. Just use the `last_known_institution.country_code` sub-attribute of the *Author* object as the **grouping parameter**. Find more attributes to group authors by [here](https://docs.openalex.org/api-entities/authors/group-authors).

In [80]:
url = 'https://api.openalex.org/authors?'
params = {'mailto': email,
         'group_by': 'last_known_institution.country_code'}

r = requests.get(url, params=params) 
data = r.json()

In [81]:
data['meta']

{'count': 91549341,
 'db_response_time_ms': 113,
 'page': 1,
 'per_page': 200,
 'groups_count': 200}

In [82]:
# Accessing the 10 biggest countries of affiliations, default sorted in descending count
[[i['key_display_name'], i['count']] for i in data['group_by'][:10]]

[['United States of America', 6625148],
 ['China', 3422192],
 ['Brazil', 1571521],
 ['Germany', 1427754],
 ['Japan', 1389100],
 ['United Kingdom of Great Britain and Northern Ireland', 1371968],
 ['India', 1333513],
 ['Indonesia', 1271352],
 ['France', 1042041],
 ['Russian Federation', 881732]]

#### Combine filters and grouping

Of course, you can also combine filters and groupings for author data. In the example below, all authors writing about *Ethical Implications of Artificial Intelligence* (`T10883`) are grouped by the type of their (last known) institution.

In [83]:
url = 'https://api.openalex.org/authors?'
params = {'mailto': email,
         'filter': 'topics.id:T10883',
         'group_by': 'last_known_institution.type'}

r = requests.get(url, params=params) 
data = r.json()
data['meta']

{'count': 18132,
 'db_response_time_ms': 55,
 'page': 1,
 'per_page': 200,
 'groups_count': 8}

In [84]:
data['group_by']

[{'key': 'education', 'key_display_name': 'education', 'count': 12038},
 {'key': 'company', 'key_display_name': 'company', 'count': 1036},
 {'key': 'facility', 'key_display_name': 'facility', 'count': 847},
 {'key': 'nonprofit', 'key_display_name': 'nonprofit', 'count': 537},
 {'key': 'healthcare', 'key_display_name': 'healthcare', 'count': 455},
 {'key': 'government', 'key_display_name': 'government', 'count': 400},
 {'key': 'other', 'key_display_name': 'other', 'count': 229},
 {'key': 'archive', 'key_display_name': 'archive', 'count': 33}]

In [85]:
inst_data = [[i['key_display_name'], i['count']] for i in data['group_by']]
inst_sum = 0
for i in inst_data:
    inst_sum += i[1]
inst_sum

15575

In [86]:
# Calculating and adding percentages
for i in inst_data:
    i.append(str(round(i[1]/inst_sum*100,1)) + '%')
inst_data

[['education', 12038, '77.3%'],
 ['company', 1036, '6.7%'],
 ['facility', 847, '5.4%'],
 ['nonprofit', 537, '3.4%'],
 ['healthcare', 455, '2.9%'],
 ['government', 400, '2.6%'],
 ['other', 229, '1.5%'],
 ['archive', 33, '0.2%']]
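The eight groups above add up to 15575, while `meta['count']` reports 18132 matching authors. A small sketch to make that gap visible follows. `ungrouped_count` is a hypothetical helper name, and the numbers are copied from the outputs above. Why these records fall outside the groups is not stated in the response; a plausible guess, and only a guess, is that they are authors without a recorded institution type.

```python
def ungrouped_count(meta, group_by):
    """Difference between the total match count and the sum of all group counts."""
    return meta["count"] - sum(g["count"] for g in group_by)


# Numbers copied from the outputs above
meta = {"count": 18132, "per_page": 200, "groups_count": 8}
group_by = [
    {"key": "education", "count": 12038},
    {"key": "company", "count": 1036},
    {"key": "facility", "count": 847},
    {"key": "nonprofit", "count": 537},
    {"key": "healthcare", "count": 455},
    {"key": "government", "count": 400},
    {"key": "other", "count": 229},
    {"key": "archive", "count": 33},
]

missing = ungrouped_count(meta, group_by)
print(missing)
```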

Now let's have a look at those authors writing about AI Ethics whose last known affiliation is a company. For this, we will use two filters at once: `topics.id:T10883` and `last_known_institution.type:company`.

In [87]:
url = 'https://api.openalex.org/authors?'
params = {'mailto': email,
         'filter': 'topics.id:T10883,last_known_institution.type:company'}

r = requests.get(url, params=params) 
data = r.json()
data['meta']

{'count': 1038,
 'db_response_time_ms': 134,
 'page': 1,
 'per_page': 25,
 'groups_count': None}

In [88]:
# Showing the 20 most productive authors in AI Ethics working at a company
[[i['display_name'], i['last_known_institution']['display_name'], i['works_count']] for i in data['results'][:20]]

[['Alekh Agarwal', 'Systems Analytics (United States)', 445],
 ['J. Vaughan', 'Microsoft (United States)', 406],
 ['Andreas Hepp', 'Walter de Gruyter (Germany)', 398],
 ['Simon M. Lucas', 'Merck (Germany)', 389],
 ['Bernd Stahl', 'Nutricia Research (Netherlands)', 364],
 ['Jun Zhao', 'Aviation Industry Corporation of China (China)', 363],
 ['Jordon Rowe Adams', 'LinkedIn (United States)', 359],
 ['Hal Daumé', 'Microsoft (United States)', 334],
 ['Lora Aroyo', 'Google (United States)', 327],
 ['Elizabeth F. Churchill', 'Google (United States)', 318],
 ['Yunlong Cai', 'Visa (United Kingdom)', 313],
 ['Thomas H. Davenport', 'Deloitte (United States)', 312],
 ['Cacm Staff', 'Intelligent Systems Research (United States)', 302],
 ['Stamatis Karnouskos',
  'Systems, Applications & Products in Data Processing (Germany)',
  293],
 ['Ivica Crnković', 'Software (Spain)', 288],
 ['Ryan A. Rossi', 'Adobe Systems (United States)', 283],
 ['Antonino Rotolo', 'Zambon (Italy)', 280],
 ['Stephen J. Cimb

## 🔖 Topics endpoint

***Topics*** entities are OpenAlex' notion of subjects. They are attached to *Works* by an automated system that takes into account the available information, including title, abstract, source (journal) name, and citations. The highest-scoring topic is that Work's `primary_topic`. A Work entity also has a `topics` attribute listing additional highly ranked topics.

Topics are hierarchically organized in subfields, fields, and domains. For further information see the [Topics page](https://docs.openalex.org/api-entities/topics).


In [89]:
base_url = 'https://api.openalex.org/topics/'
params = {'mailto': email} 

openalex_id = 'T10101'

r = requests.get(base_url + openalex_id, params=params) 
data = r.json()

In [90]:
# Available attributes of the Topic entity
list(data)

['id',
 'display_name',
 'description',
 'keywords',
 'ids',
 'subfield',
 'field',
 'domain',
 'siblings',
 'works_count',
 'cited_by_count',
 'updated_date',
 'created_date']

In [91]:
print(data['display_name'])
print(data['ids'])
data['keywords']

Cloud Computing and Big Data Technologies
{'openalex': 'https://openalex.org/T10101', 'wikipedia': 'https://en.wikipedia.org/wiki/Cloud_computing'}


['Cloud Computing',
 'Big Data',
 'MapReduce',
 'Virtualization',
 'Data Centers',
 'Resource Management',
 'Hadoop',
 'Distributed Systems',
 'Energy Efficiency',
 'Scalability']

In [92]:
data['description']

'This cluster of papers covers a wide range of topics related to cloud computing, big data technologies, and data center management. It includes discussions on MapReduce, virtualization, Hadoop, resource management, energy efficiency, scalability, and distributed systems. The papers also address challenges and opportunities in the field of cloud computing and big data.'
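To illustrate the hierarchy of subfields, fields, and domains, the sketch below renders a topic's position as a readable path. It assumes that `subfield`, `field`, and `domain` are each returned as an object with a `display_name` key, which the output above does not show. `topic_path` is a hypothetical helper name, and the three level names in `sample` are placeholders and not real API values.

```python
def topic_path(topic):
    """Join domain, field, subfield and topic name into one readable path."""
    levels = ("domain", "field", "subfield")  # from broadest to narrowest
    names = [topic[level]["display_name"] for level in levels if topic.get(level)]
    return " > ".join(names + [topic["display_name"]])


sample = {
    "display_name": "Cloud Computing and Big Data Technologies",
    # Placeholder values, assumed structure
    "subfield": {"display_name": "Example subfield"},
    "field": {"display_name": "Example field"},
    "domain": {"display_name": "Example domain"},
}

path = topic_path(sample)
print(path)
```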

## 🏫 Institutions endpoint

***Institution*** entities are the organizations which host the authors when they write their publications (affiliations). Thus, publications can be assigned as output to certain institutions. OpenAlex uses the institution categories of the [ROR](https://ror.org/) registry: education, healthcare, government, nonprofit, archive, company, facility, other. See also the [Institution](https://docs.openalex.org/about-the-data/institution) object documentation.

You can query the `/institutions` endpoint with an OpenAlex identifier (starting with an "I", like `I118564535`) or with the popular ROR ID.

In [93]:
base_url = 'https://api.openalex.org/institutions/'
params = {'mailto': email} 

#openalex_id = 'I118564535'
#openalex_id_namespace = 'openalex:I118564535'
ror_id = 'ror:02k7v4d05'

# Adapt identifier if necessary
r = requests.get(base_url + ror_id, params=params) 
data = r.json()

In [94]:
# Overview of data fields
list(data)

['id',
 'ror',
 'display_name',
 'country_code',
 'type',
 'type_id',
 'lineage',
 'homepage_url',
 'image_url',
 'image_thumbnail_url',
 'display_name_acronyms',
 'display_name_alternatives',
 'repositories',
 'works_count',
 'cited_by_count',
 'summary_stats',
 'ids',
 'geo',
 'international',
 'associated_institutions',
 'counts_by_year',
 'roles',
 'topics',
 'topic_share',
 'x_concepts',
 'works_api_url',
 'updated_date',
 'created_date']

In [95]:
# OpenAlex institution ID
data['id']

'https://openalex.org/I118564535'

In [96]:
# All institution identifiers
data['ids']

{'openalex': 'https://openalex.org/I118564535',
 'ror': 'https://ror.org/02k7v4d05',
 'mag': '118564535',
 'grid': 'grid.5734.5',
 'wikipedia': 'https://en.wikipedia.org/wiki/University%20of%20Bern',
 'wikidata': 'https://www.wikidata.org/wiki/Q659080'}

In [97]:
# Preferred name
data['display_name']

'University of Bern'

In [98]:
# Display name in other languages, e.g. Latin
data['international']['display_name']['la']

'Universitas Bernensis'
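Not every language code will be present under `international`, so a direct lookup can fail with a `KeyError`. Here is a minimal sketch with a fallback to the default `display_name`. `localized_name` is a hypothetical helper name, and the sample values are copied from the outputs above.

```python
def localized_name(institution, lang):
    """Return the display name in the given language, else the default name."""
    international = institution.get("international") or {}
    names = international.get("display_name") or {}
    return names.get(lang) or institution["display_name"]


# Values copied from the outputs above (only the Latin variant is included)
sample = {
    "display_name": "University of Bern",
    "international": {"display_name": {"la": "Universitas Bernensis"}},
}

latin = localized_name(sample, "la")
fallback = localized_name(sample, "xx")  # 'xx' is not a real language code
print(latin)
print(fallback)
```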

In [99]:
# API link to all affiliated publications
data['works_api_url']

'https://api.openalex.org/works?filter=institutions.id:I118564535'

In [100]:
# Number of citations of affiliated publications
data['cited_by_count']

3582551

In [101]:
# Number of affiliated publications and citations by year, since 2012
data['counts_by_year']

[{'year': 2024, 'works_count': 1970, 'cited_by_count': 103562},
 {'year': 2023, 'works_count': 7872, 'cited_by_count': 349367},
 {'year': 2022, 'works_count': 8130, 'cited_by_count': 341962},
 {'year': 2021, 'works_count': 7892, 'cited_by_count': 322673},
 {'year': 2020, 'works_count': 7064, 'cited_by_count': 270942},
 {'year': 2019, 'works_count': 6126, 'cited_by_count': 222126},
 {'year': 2018, 'works_count': 5529, 'cited_by_count': 193331},
 {'year': 2017, 'works_count': 5118, 'cited_by_count': 171436},
 {'year': 2016, 'works_count': 4923, 'cited_by_count': 159501},
 {'year': 2015, 'works_count': 4192, 'cited_by_count': 153036},
 {'year': 2014, 'works_count': 3880, 'cited_by_count': 140873},
 {'year': 2013, 'works_count': 3517, 'cited_by_count': 130026},
 {'year': 2012, 'works_count': 3348, 'cited_by_count': 114152}]

In [102]:
# Associated institutions
data['associated_institutions']

[{'id': 'https://openalex.org/I2801112126',
  'ror': 'https://ror.org/01q9sj412',
  'display_name': 'University Hospital of Bern',
  'country_code': 'CH',
  'type': 'healthcare',
  'relationship': 'related'},
 {'id': 'https://openalex.org/I4210087665',
  'ror': 'https://ror.org/003bz8x96',
  'display_name': 'Wiederkäuerklinik',
  'country_code': 'CH',
  'type': 'healthcare',
  'relationship': 'related'}]

In [103]:
# Geographical information
data['geo']

{'city': 'Bern',
 'geonames_city_id': '2661552',
 'region': None,
 'country_code': 'CH',
 'country': 'Switzerland',
 'latitude': 46.94809,
 'longitude': 7.44744}

### Access multiple Institution entities

You can access all *Institutions* at once, and use filters and groupings on them. See the available filters for *Institution* entities [here](https://docs.openalex.org/api-entities/institutions/filter-institutions) and possible grouping parameters [here](https://docs.openalex.org/api-entities/institutions/group-institutions).

In the example, you will narrow down to Swiss institutions only and group them by institution type.

In [104]:
url = 'https://api.openalex.org/institutions?'
params = {'mailto': email,
         'filter': 'country_code:CH',
         'group_by': 'type'}

r = requests.get(url, params=params) 
data = r.json()

In [105]:
data['meta']

{'count': 1548,
 'db_response_time_ms': 29,
 'page': 1,
 'per_page': 200,
 'groups_count': 8}

In [106]:
[[i['key_display_name'], i['count']] for i in data['group_by']]

[['company', 591],
 ['nonprofit', 202],
 ['other', 177],
 ['healthcare', 163],
 ['education', 129],
 ['government', 100],
 ['archive', 96],
 ['facility', 90]]

Now, let's look at the 96 Swiss archives producing scholarly publications. To get this data, you have to use 2 filters at once: `country_code:CH` and `type:archive`.

In [107]:
url = 'https://api.openalex.org/institutions?'
params = {'mailto': email,
         'filter': 'country_code:CH,type:archive'}

r = requests.get(url, params=params) 
data = r.json()
data['meta']

{'count': 96,
 'db_response_time_ms': 42,
 'page': 1,
 'per_page': 25,
 'groups_count': None}

In [108]:
# Showing the top 20
[[i['display_name'], i['works_count']] for i in data['results'][:20]]

[["Musée d'Art et d'Histoire", 1637],
 ['Swiss Archaeology', 1493],
 ['Natural History Museum of Geneva', 1441],
 ['Natural History Museum of Basel', 845],
 ['Amt für Archäologie', 724],
 ['Zürich Zoological Garden', 716],
 ['Natural History Museum of Bern', 711],
 ['Conservatory and Botanical Garden of the City of Geneva', 657],
 ['International Museum of Horology', 533],
 ['Swiss National Museum', 446],
 ['Swiss Federal Archives', 379],
 ['Musée cantonal de zoologie de Lausanne', 359],
 ['Sukkulenten-Sammlung Zürich', 348],
 ['Centre Suisse de Cartographie de la Faune', 274],
 ["Musée Cantonal d'Archéologie et d'Histoire", 233],
 ['Swiss National Park', 160],
 ['Laténium', 80],
 ['Kantonsarchäologie Aargau', 59],
 ['Natur Museum Luzern', 53],
 ['Zentral und Hochschulbibliothek Luzern', 43]]

## 📰 Sources endpoint

***Sources*** entities in OpenAlex are the **journals, conferences, preprint repositories**, and **institutional repositories** that host the individual publications, aka `Works`. For journal articles this is their journal, for book parts their volume. Sources can also be, for instance, monograph series like `S4210167446` or repositories like `S2734324842`.

To access data about a certain *Source*, you can use its OpenAlex identifier (starting with an "S", like [S4210209919](https://openalex.org/S4210209919)) or the popular [ISSN](https://en.wikipedia.org/wiki/International_Standard_Serial_Number) identifier. See also the [Source](https://docs.openalex.org/api-entities/sources) entity documentation and try a random source with `https://api.openalex.org/sources/random`.

In [109]:
base_url = 'https://api.openalex.org/sources/'
params = {'mailto': email} 

#openalex_id = 'S98348390'
#issn = 'issn:0028-0836'   # non-OA journal
issn = 'issn:1654-6369'   # OA journal

r = requests.get(base_url + issn, params=params) 
data = r.json()

In [110]:
list(data)

['id',
 'issn_l',
 'issn',
 'display_name',
 'host_organization',
 'host_organization_name',
 'host_organization_lineage',
 'works_count',
 'cited_by_count',
 'summary_stats',
 'is_oa',
 'is_in_doaj',
 'ids',
 'homepage_url',
 'apc_prices',
 'apc_usd',
 'country_code',
 'societies',
 'alternate_titles',
 'abbreviated_title',
 'type',
 'topics',
 'topic_share',
 'x_concepts',
 'counts_by_year',
 'works_api_url',
 'updated_date',
 'created_date']

In [111]:
# All identifiers. "issn_l" should be used preferably while using OpenAlex API.
data['ids']

{'openalex': 'https://openalex.org/S107893744',
 'issn_l': '1654-6369',
 'issn': ['1654-6369', '1654-4951'],
 'mag': '107893744'}

In [112]:
# Title of the source & name of host organization/publisher & website
print(data['display_name'])
print(data['host_organization_name'])
data['homepage_url']

Ethics & global politics
Taylor & Francis


'https://www.tandfonline.com/toc/zegp20/current'

In [113]:
# Is it an Open Access venue (and is it in DOAJ registry)?
print(data['is_oa'])
data['is_in_doaj']

True


True

In [114]:
# Number of hosted publications
data['works_count']

243

In [115]:
# Topics
[[i['display_name'] for i in data['topics']]]

[['Theoretical Perspectives on Global Justice',
  'The Responsibility to Protect in International Relations',
  'Politics and Social Implications of Immigration',
  'International Criminal Law and Human Rights Obligations',
  'Biopolitics and State of Exception Studies',
  'Human Rights and Development in Global Governance',
  'Ethics of Just War Theory and Self-Defense',
  'Education for Global Citizenship in a Globalized World',
  'Political Thought of Hannah Arendt',
  'Foucauldian Governmentality and Neoliberalism Studies',
  'Foreign Aid and Development Policies',
  'Antonio Gramsci and his Relevance in Contemporary Politics',
  'Consequences of Nuclear War and Global Security',
  'Critique of Political Economy and Capitalist Development',
  'Income Inequality and Poverty Dynamics',
  'Epistemology and Philosophical Knowledge Studies',
  'Moral Distress in Healthcare Professionals',
  'The Methodology of Emancipatory Education',
  'Civil and Religious Law in Europe',
  'Role of Pu

In [116]:
# APC information
data['apc_prices']

[{'price': 0, 'currency': 'USD'}]

In [117]:
# API link to hosted publications
data['works_api_url']

'https://api.openalex.org/works?filter=primary_location.source.id:S107893744'

### Access multiple Source entities

You can access all *Sources* at once, and use filters and groupings on them. See the available filters for *Source* entities [here](https://docs.openalex.org/api-entities/sources/filter-sources) and possible grouping parameters [here](https://docs.openalex.org/api-entities/sources/group-sources).

In the example at the end, you will narrow down to open access sources only and group them by publisher, i.e. by `host_organization`.

In [118]:
# Accessing all sources (sorted by number of works)
url = 'https://api.openalex.org/sources?'
params = {'mailto': email}

r = requests.get(url, params=params) 
data = r.json()
data['meta']

{'count': 254423,
 'db_response_time_ms': 43,
 'page': 1,
 'per_page': 25,
 'groups_count': None}

In [119]:
# Showing the 20 biggest sources (name, number of works, OA status)
[[i['display_name'], i['works_count'], i['is_oa']] for i in data['results'][:20]]

[['PubMed', 33075864, False],
 ['PubMed Central', 8009760, True],
 ['Europe PMC (PubMed Central)', 5316266, True],
 ['arXiv (Cornell University)', 3015170, True],
 ['DOAJ (DOAJ: Directory of Open Access Journals)', 2672478, True],
 ['HAL (Le Centre pour la Communication Scientifique Directe)', 2571027, True],
 ['Springer eBooks', 2519831, False],
 ['Zenodo (CERN European Organization for Nuclear Research)', 1405433, True],
 ['RePEc: Research Papers in Economics', 1126422, True],
 ['Social Science Research Network', 1079692, False],
 ['De Gruyter eBooks', 1075912, False],
 ['Routledge eBooks', 977695, False],
 ['Elsevier eBooks', 860097, False],
 ['OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information)',
  789870,
  True],
 ['Oxford University Press eBooks', 706494, False],
 ['LA Referencia (Red Federada de Repositorios Institucionales de Publicaciones Científicas)',
  616529,
  True],
 ['Cambridge University Press eBooks', 591290, False],
 ['Lecture notes i

In [120]:
# Inspecting OA sources according to their publishers
url = 'https://api.openalex.org/sources?'
params = {'mailto': email,
         'filter': 'is_oa:True',
         'group_by': 'host_organization'}

r = requests.get(url, params=params) 
data = r.json()
data['meta']

{'count': 44894,
 'db_response_time_ms': 68,
 'page': 1,
 'per_page': 200,
 'groups_count': 200}

In [121]:
pub_data = [[i['key_display_name'], i['count']] for i in data['group_by']]
pub_sum = 0
for i in pub_data:
    pub_sum += i[1]
pub_sum

11560

In [122]:
# Showing the 20 biggest OA publishers (by number of sources); percentages refer to the sum of the returned groups
for i in pub_data:
    i.append(str(round(i[1]/pub_sum*100,1)) + '%')
pub_data[:20]

[['Elsevier BV', 1003, '8.7%'],
 ['Hindawi Publishing Corporation', 669, '5.8%'],
 ['De Gruyter Open', 351, '3.0%'],
 ['Multidisciplinary Digital Publishing Institute', 330, '2.9%'],
 ['BioMed Central', 305, '2.6%'],
 ['Science Publishing Group', 271, '2.3%'],
 ['Medknow', 246, '2.1%'],
 ['SAGE Publishing', 244, '2.1%'],
 ['Taylor & Francis', 241, '2.1%'],
 ['Scientific Research Publishing', 234, '2.0%'],
 ['De Gruyter', 219, '1.9%'],
 ['Springer Nature', 184, '1.6%'],
 ['OMICS Publishing Group', 181, '1.6%'],
 ['Frontiers Media', 177, '1.5%'],
 ['Springer Science+Business Media', 174, '1.5%'],
 ['Wiley', 168, '1.5%'],
 ['Bentham Science Publishers', 158, '1.4%'],
 ['Dove Medical Press', 137, '1.2%'],
 ['Süleyman Demirel University', 109, '0.9%'],
 ['Academic Journals', 104, '0.9%']]