In [1]:
import pandas as pd
import requests
import json
import ast

# The API

---

## What is it?

The Pageview API is a collection of REST endpoints that serve analytical data about pageviews in Wikimedia's projects. It's developed and maintained by WMF's Analytics and Services teams, and is implemented using Analytics' Hadoop cluster and RESTBase. This API is meant to be used by anyone interested in pageview statistics on Wikimedia wikis: Foundation, communities, and the rest of the world.

Data goes back to May 1st, 2015.

## How to access

The API is accessible over HTTPS at wikimedia.org/api/rest_v1. Because it is public, it requires no authentication, and it supports CORS. URLs are structured like this:

```/metrics/pageviews/{endpoint}/{parameter 1}/{parameter 2}/.../{parameter N}```
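
As a minimal sketch of this structure, a request URL can be assembled from the endpoint name and its ordered parameters. The helper name `pageviews_url` is hypothetical (it is not part of any library), and percent-encoding each segment with `urllib.parse.quote` is an assumption that mainly matters for article titles containing characters such as `:` or `á`.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"

def pageviews_url(endpoint, *params):
    """Join the endpoint and its ordered parameters into a request URL.

    Hypothetical helper: every segment is percent-encoded so that titles
    with reserved or non-ASCII characters stay a single path segment.
    """
    segments = [quote(str(p), safe="") for p in (endpoint, *params)]
    return BASE + "/" + "/".join(segments)

url = pageviews_url(
    "per-article", "en.wikipedia", "all-access", "all-agents",
    "Albert_Einstein", "daily", "2015100100", "2015103100",
)
print(url)
```

The parameters are positional on purpose: their order differs between endpoints, so the caller passes them in the order the endpoint expects.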

https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews

https://wikimedia.org/api/rest_v1/

## Daily counts

Get a pageview count timeseries of en.wikipedia's article Albert Einstein for the month of October 2015.

In [30]:
# Daily per-article counts; each record under "items" becomes one row.
r = requests.get(url='https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Albert_Einstein/daily/2015100100/2015103100')
x = r.json()
b = pd.DataFrame(x['items'])
b

Unnamed: 0,access,agent,article,granularity,project,timestamp,views
0,all-access,all-agents,Albert_Einstein,daily,en.wikipedia,2015100100,18860
1,all-access,all-agents,Albert_Einstein,daily,en.wikipedia,2015100200,20816
2,all-access,all-agents,Albert_Einstein,daily,en.wikipedia,2015100300,16009
3,all-access,all-agents,Albert_Einstein,daily,en.wikipedia,2015100400,19494
4,all-access,all-agents,Albert_Einstein,daily,en.wikipedia,2015100500,21198
5,all-access,all-agents,Albert_Einstein,daily,en.wikipedia,2015100600,22515
6,all-access,all-agents,Albert_Einstein,daily,en.wikipedia,2015100700,22269
7,all-access,all-agents,Albert_Einstein,daily,en.wikipedia,2015100800,20835
8,all-access,all-agents,Albert_Einstein,daily,en.wikipedia,2015100900,18319
9,all-access,all-agents,Albert_Einstein,daily,en.wikipedia,2015101000,17088
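
The `timestamp` values follow a `YYYYMMDDHH` pattern, so they can be parsed into real datetimes for plotting or resampling. The sketch below works offline on a small payload shaped like the response: the three records repeat the first rows of the output above, and representing the timestamps as strings is an assumption about the raw JSON.

```python
import pandas as pd

# Sample payload shaped like the API response; values copied from the
# first three rows of the output above (timestamps assumed to be strings).
payload = {
    "items": [
        {"access": "all-access", "agent": "all-agents", "article": "Albert_Einstein",
         "granularity": "daily", "project": "en.wikipedia",
         "timestamp": "2015100100", "views": 18860},
        {"access": "all-access", "agent": "all-agents", "article": "Albert_Einstein",
         "granularity": "daily", "project": "en.wikipedia",
         "timestamp": "2015100200", "views": 20816},
        {"access": "all-access", "agent": "all-agents", "article": "Albert_Einstein",
         "granularity": "daily", "project": "en.wikipedia",
         "timestamp": "2015100300", "views": 16009},
    ]
}

# Each record is a flat dict, so the list converts directly to rows.
df = pd.DataFrame(payload["items"])

# Parse the YYYYMMDDHH stamps and use them as the index of a views series.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y%m%d%H")
daily_views = df.set_index("timestamp")["views"]
print(daily_views)
```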


Get a pageview count timeseries of de.wikipedia's article Johann Wolfgang von Goethe from October 13th, 2015 to October 27th, 2015, counting only the pageviews generated by human users.

In [31]:
# The "user" agent segment restricts the counts to human traffic.
r = requests.get(url='https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/de.wikipedia/all-access/user/Johann_Wolfgang_von_Goethe/daily/2015101300/2015102700')
x = r.json()
d = pd.DataFrame(x['items'])
d

Unnamed: 0,access,agent,article,granularity,project,timestamp,views
0,all-access,user,Johann_Wolfgang_von_Goethe,daily,de.wikipedia,2015101300,2934
1,all-access,user,Johann_Wolfgang_von_Goethe,daily,de.wikipedia,2015101400,3104
2,all-access,user,Johann_Wolfgang_von_Goethe,daily,de.wikipedia,2015101500,2938
3,all-access,user,Johann_Wolfgang_von_Goethe,daily,de.wikipedia,2015101600,2420
4,all-access,user,Johann_Wolfgang_von_Goethe,daily,de.wikipedia,2015101700,2305
5,all-access,user,Johann_Wolfgang_von_Goethe,daily,de.wikipedia,2015101800,2739
6,all-access,user,Johann_Wolfgang_von_Goethe,daily,de.wikipedia,2015101900,3344
7,all-access,user,Johann_Wolfgang_von_Goethe,daily,de.wikipedia,2015102000,3068
8,all-access,user,Johann_Wolfgang_von_Goethe,daily,de.wikipedia,2015102100,3048
9,all-access,user,Johann_Wolfgang_von_Goethe,daily,de.wikipedia,2015102200,2973


Get the number of pageviews of es.wiktionary's entry hoy generated via mobile web on November 1st, 2015

In [32]:
# A single day is requested by using the same start and end timestamp.
r = requests.get(url='https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/es.wiktionary/mobile-web/all-agents/hoy/daily/2015110100/2015110100')
x = r.json()
f = pd.DataFrame(x['items'])
f

Unnamed: 0,access,agent,article,granularity,project,timestamp,views
0,mobile-web,all-agents,hoy,daily,es.wiktionary,2015110100,2
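
The start and end segments in the requests above are `YYYYMMDDHH` strings, and a single day uses the same value for both. A small formatting helper can build them from `datetime.date` objects; the name `api_timestamp` is hypothetical and only illustrates the pattern visible in the URLs.

```python
from datetime import date

def api_timestamp(day, hour=0):
    """Format a date and an hour as the YYYYMMDDHH string used in the URLs."""
    return f"{day:%Y%m%d}{hour:02d}"

# Same start and end stamp selects exactly one day.
start = api_timestamp(date(2015, 11, 1))
end = api_timestamp(date(2015, 11, 1))
print(start, end)
```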


## Monthly counts

Get monthly pageview counts for de.wikipedia's article Barack_Obama for the year 2016.

In [33]:
# The "monthly" granularity returns one record per calendar month.
r = requests.get(url='https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/de.wikipedia/all-access/all-agents/Barack_Obama/monthly/2016010100/2016123100')
x = r.json()
h = pd.DataFrame(x['items'])
h

Unnamed: 0,access,agent,article,granularity,project,timestamp,views
0,all-access,all-agents,Barack_Obama,monthly,de.wikipedia,2016010100,112458
1,all-access,all-agents,Barack_Obama,monthly,de.wikipedia,2016020100,114036
2,all-access,all-agents,Barack_Obama,monthly,de.wikipedia,2016030100,134238
3,all-access,all-agents,Barack_Obama,monthly,de.wikipedia,2016040100,127099
4,all-access,all-agents,Barack_Obama,monthly,de.wikipedia,2016050100,96700
5,all-access,all-agents,Barack_Obama,monthly,de.wikipedia,2016060100,76502
6,all-access,all-agents,Barack_Obama,monthly,de.wikipedia,2016070100,85732
7,all-access,all-agents,Barack_Obama,monthly,de.wikipedia,2016080100,75056
8,all-access,all-agents,Barack_Obama,monthly,de.wikipedia,2016090100,87647
9,all-access,all-agents,Barack_Obama,monthly,de.wikipedia,2016100100,117706


## Slice and dice pageview counts

Get a daily pageview count timeseries of all projects for the month of October 2015

In [34]:
# The "aggregate" endpoint sums pageviews over a whole project (here: all projects).
r = requests.get(url='https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/all-projects/all-access/all-agents/daily/2015100100/2015103100')
x = r.json()
j = pd.DataFrame(x['items'])
j

Unnamed: 0,access,agent,granularity,project,timestamp,views
0,all-access,all-agents,daily,all-projects,2015100100,614236484
1,all-access,all-agents,daily,all-projects,2015100200,594526358
2,all-access,all-agents,daily,all-projects,2015100300,580496576
3,all-access,all-agents,daily,all-projects,2015100400,629356620
4,all-access,all-agents,daily,all-projects,2015100500,653133733
5,all-access,all-agents,daily,all-projects,2015100600,666303084
6,all-access,all-agents,daily,all-projects,2015100700,658445283
7,all-access,all-agents,daily,all-projects,2015100800,623923450
8,all-access,all-agents,daily,all-projects,2015100900,598831259
9,all-access,all-agents,daily,all-projects,2015101000,583322547


Get an hourly timeseries of all projects' pageviews from human users visiting via the mobile app on October 1st, 2015.

In [35]:
# Hourly granularity: the last two digits of each timestamp are the hour (00 to 23).
r = requests.get(url='https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/all-projects/mobile-app/user/hourly/2015100100/2015100123')
x = r.json()
l = pd.DataFrame(x['items'])
l

Unnamed: 0,access,agent,granularity,project,timestamp,views
0,mobile-app,user,hourly,all-projects,2015100100,190238
1,mobile-app,user,hourly,all-projects,2015100101,197807
2,mobile-app,user,hourly,all-projects,2015100102,199163
3,mobile-app,user,hourly,all-projects,2015100103,193361
4,mobile-app,user,hourly,all-projects,2015100104,190302
5,mobile-app,user,hourly,all-projects,2015100105,188688
6,mobile-app,user,hourly,all-projects,2015100106,188102
7,mobile-app,user,hourly,all-projects,2015100107,184144
8,mobile-app,user,hourly,all-projects,2015100108,179789
9,mobile-app,user,hourly,all-projects,2015100109,186098


Get the number of pageviews of ca.wikipedia generated by spiders on mobile web on November 1st, 2015

In [36]:
# The "spider" agent segment restricts the counts to crawler traffic.
r = requests.get(url='https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/ca.wikipedia/mobile-web/spider/daily/2015110100/2015110100')
x = r.json()
n = pd.DataFrame(x['items'])
n

Unnamed: 0,access,agent,granularity,project,timestamp,views
0,mobile-web,spider,daily,ca.wikipedia,2015110100,31434


## Most viewed articles

Get the top 1000 most visited articles from en.wikipedia for October 10th, 2015

In [43]:
# The "top" endpoint returns a single record whose "articles" field holds the ranked list.
r = requests.get(url='https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-access/2015/10/10')
x = r.json()
p = pd.DataFrame(x['items'])
p

Unnamed: 0,access,articles,day,month,project,year
0,all-access,"[{'article': 'Main_Page', 'views': 18793503, '...",10,10,en.wikipedia,2015
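
The ranked list itself sits nested inside the `articles` column of that single row. One way to get one row per article is sketched below; `top_articles` is a hypothetical helper, and the sample payload is illustrative only: its first entry reuses the `Main_Page` count shown above, while the second entry and both `rank` values are placeholders rather than real data.

```python
import pandas as pd

def top_articles(payload):
    """Return one row per article from a 'top' response payload."""
    item = payload["items"][0]  # the output above shows a single record
    df = pd.DataFrame(item["articles"])
    # Carry over the record-level fields so each row is self-describing.
    for key in ("project", "access", "year", "month", "day"):
        df[key] = item[key]
    return df

# Illustrative payload; only the Main_Page view count comes from the output above.
sample = {
    "items": [
        {
            "project": "en.wikipedia", "access": "all-access",
            "year": "2015", "month": "10", "day": "10",
            "articles": [
                {"article": "Main_Page", "views": 18793503, "rank": 1},
                {"article": "Placeholder_Article", "views": 100, "rank": 2},
            ],
        }
    ]
}

top = top_articles(sample)
print(top[["rank", "article", "views"]])
```

With a live response, the same function would be called as `top_articles(r.json())`.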


Get the top 1000 articles from pt.wikipedia visited via the mobile app on November 1st, 2015

In [40]:
# Top articles restricted to the mobile app; "q" holds the response so "r" can name the table.
q = requests.get(url='https://wikimedia.org/api/rest_v1/metrics/pageviews/top/pt.wikipedia/mobile-app/2015/11/01')
x = q.json()
r = pd.DataFrame(x['items'])
r

Unnamed: 0,access,articles,day,month,project,year
0,mobile-app,"[{'article': 'Wikipedia:Página_principal', 'vi...",1,11,pt.wikipedia,2015


Get the top 1000 most visited articles from en.wikisource for all days in October, 2015

In [44]:
# "all-days" in the day segment requests the ranking for the whole month.
r = requests.get(url='https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikisource/all-access/2015/10/all-days')
x = r.json()
t = pd.DataFrame(x['items'])
t

Unnamed: 0,access,articles,day,month,project,year
0,all-access,"[{'article': 'Main_Page', 'views': 92640, 'ran...",all-days,10,en.wikisource,2015


## Pageviews for ALL projects

### Daily

In [45]:
# Daily totals across all projects, all access methods, and all agents.
r = requests.get(url='https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/all-projects/all-access/all-agents/daily/2015100100/2015103000')
x = r.json()
v = pd.DataFrame(x['items'])
v

Unnamed: 0,access,agent,granularity,project,timestamp,views
0,all-access,all-agents,daily,all-projects,2015100100,614236484
1,all-access,all-agents,daily,all-projects,2015100200,594526358
2,all-access,all-agents,daily,all-projects,2015100300,580496576
3,all-access,all-agents,daily,all-projects,2015100400,629356620
4,all-access,all-agents,daily,all-projects,2015100500,653133733
5,all-access,all-agents,daily,all-projects,2015100600,666303084
6,all-access,all-agents,daily,all-projects,2015100700,658445283
7,all-access,all-agents,daily,all-projects,2015100800,623923450
8,all-access,all-agents,daily,all-projects,2015100900,598831259
9,all-access,all-agents,daily,all-projects,2015101000,583322547


### Monthly

In [46]:
# Monthly totals across all projects; "w" holds the parsed JSON so "x" can name the table.
r = requests.get(url='https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/all-projects/all-access/all-agents/monthly/2015100100/2016103000')
w = r.json()
x = pd.DataFrame(w['items'])
x

Unnamed: 0,access,agent,granularity,project,timestamp,views
0,all-access,all-agents,monthly,all-projects,2015100100,19551810896
1,all-access,all-agents,monthly,all-projects,2015110100,19269221845
2,all-access,all-agents,monthly,all-projects,2015120100,19012482674
3,all-access,all-agents,monthly,all-projects,2016010100,20865413322
4,all-access,all-agents,monthly,all-projects,2016020100,19491676571
5,all-access,all-agents,monthly,all-projects,2016030100,20053032262
6,all-access,all-agents,monthly,all-projects,2016040100,19999876232
7,all-access,all-agents,monthly,all-projects,2016050100,19918073483
8,all-access,all-agents,monthly,all-projects,2016060100,18955145337
9,all-access,all-agents,monthly,all-projects,2016070100,19188170725
