This handbook is a summary of https://m.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial, with some parts taken from other tutorials. It is built for quick reference. It is recommended to read the tutorials before using it. These tutorials are released under the Creative Commons Attribution-ShareAlike License. Feel free to copy and adapt this notebook under that licence.


# I/ Initialization

Import Pywikibot and set the site to work on.

In [None]:
import pywikibot

wikidata_site = pywikibot.Site("wikidata", "wikidata")
wikidata_repo = wikidata_site.data_repository()

# II/ Modifying a textual page

This manipulation is available on all wikis. It lets you modify a plain text page, not an item.

In [None]:
page = pywikibot.Page(wikidata_site, 'Wikidata:WikiProject_Materials/Test')
page.exists()

In [None]:
page.text

In [None]:
page.text = 'Hello world !!!'
page.save("Testing Pywikibot (sandbox page)")  # Beware, this replaces all the content!

In [None]:
page.text
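
Because `page.save()` writes whatever is in `page.text` and replaces the previous content, it can help to build the new text from the old one first. The following is a minimal sketch of that idea in plain Python. The helper name `append_line` is hypothetical and is not part of Pywikibot, and a plain string stands in for `page.text`.

```python
def append_line(existing_text, new_line):
    """Return existing_text with new_line appended on its own line."""
    if not existing_text:
        return new_line
    # Avoid stacking blank lines if the page already ends with a newline.
    return existing_text.rstrip("\n") + "\n" + new_line


# Stand-in for the current content of page.text
old_text = "Hello world !!!"
new_text = append_line(old_text, "A second line")
print(new_text)
```

With a real page, this would presumably be used as `page.text = append_line(page.text, 'A second line')` before calling `page.save(...)`.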

# III/ The "ItemPage" class

ItemPage objects are Pywikibot objects that store the data of Wikidata items.

In [None]:
item = pywikibot.ItemPage(wikidata_repo, "Q2225")
print(item)

print('\nItemPage instances are Pywikibot objects:')
print(type(item))
item

In [None]:
# Showing methods for this object
dir(item)

In [None]:
#Methods to print the title and properties
print('\nitem.title() = ' + str(item.title()))
print('\nitem.properties() = ' + str(item.properties()))

In [None]:
# The get() method unpacks the data of the item
item_dict = item.get()
print('get() provides a: ' + str(type(item_dict)) + '\n')
item_dict

In [None]:
# Go to the claim dictionary
clm_dict = item_dict["claims"]
clm_dict

In [None]:
#Focus on a specific property's claims list
clm_list = clm_dict["P2069"]
print(clm_list)
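
Indexing the claim dictionary with a property the item does not have raises a `KeyError`. The sketch below shows one defensive way to look a property up. It assumes the claims mapping supports `.get()` like a regular dict, and it uses a plain dict with placeholder strings instead of real Claim objects. The helper name `claims_for` is hypothetical.

```python
def claims_for(clm_dict, property_id):
    """Return the list of claims for property_id, or an empty list if absent."""
    return list(clm_dict.get(property_id, []))


# Plain-dict stand-in for item_dict["claims"]; the string values are placeholders
# for the Claim objects that Pywikibot would store here.
example_claims = {"P2069": ["claim-1", "claim-2"], "P31": ["claim-3"]}

print(claims_for(example_claims, "P2069"))
print(claims_for(example_claims, "P9999"))
```

Returning an empty list lets a following `for clm in ...` loop simply do nothing when the property is missing.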

In [None]:
#Get general data
repo = wikidata_site.data_repository()  # this is a DataSite object
item = pywikibot.ItemPage(repo, 'Q42')  # This will be functionally the same as the other item we defined
item.get()  # you need to call it to access any data.
sitelinks = item.sitelinks
aliases = item.aliases
if 'en' in item.labels:
    print('The label in English is: ' + item.labels['en'])
if item.claims:
    if 'P31' in item.claims: # instance of
        print(item.claims['P31'][0].getTarget())
        print(item.claims['P31'][0].sources[0])  # let's just assume it has sources.

# IV / The "Claim" class

A Claim is a Pywikibot object that stores a Wikidata claim.

In [None]:
#For each claim in the claim list get the content of the claim

for clm in clm_list:
    print(clm.toJSON())
    
    print('\nThe claim is another Pywikibot object:')
    print(type(clm))
    
    print('\nThe claim has several methods:')
    print(dir(clm))
    
    print('\nclm.rank = ' + str(clm.rank))
    print('\nclm.id = ' + str(clm.id))
    print('\nclm.isReference = ' + str(clm.isReference))
    print('\nclm.snak = ' + str(clm.snak))
    print('\nclm.on_item = ' + str(clm.on_item))
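
The dictionary printed by `clm.toJSON()` can be inspected like any nested dict. The sketch below pulls a few fields out of such a dictionary. The example dict is hand-written to follow the general Wikibase JSON layout (a `mainsnak` with a `datavalue`, plus a `rank`); its values are illustrative, and the helper name `summarize_claim_json` is hypothetical.

```python
def summarize_claim_json(claim_json):
    """Extract property, rank and datavalue type from a claim's JSON dict."""
    mainsnak = claim_json.get("mainsnak", {})
    datavalue = mainsnak.get("datavalue", {})
    return {
        "property": mainsnak.get("property"),
        "rank": claim_json.get("rank"),
        "value_type": datavalue.get("type"),
    }


# Hand-written stand-in following the Wikibase JSON layout; the values are illustrative.
example = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P2069",
        "datavalue": {"value": {"amount": "+1", "unit": "1"}, "type": "quantity"},
    },
    "type": "statement",
    "rank": "normal",
}
print(summarize_claim_json(example))
```

Using `.get()` at each level keeps the helper from failing on claims that carry no value.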
    

# V / The target classes

These are Pywikibot objects that store the target of a Wikidata claim.

There are actually several classes, depending on the datatype. Here, the target is a WbQuantity object, which represents a quantity with an upper and a lower bound.

In [None]:
#Claim's target is the value of the property
for clm in clm_list:
    
    print('\nFocus on a claim\'s target using getTarget() method:')
    clm_trgt = clm.getTarget()
    print(clm_trgt)
    
    print('\nTarget type is:')
    print(type(clm_trgt))
    
    print('\nTarget class\'s methods are:')
    print(dir(clm_trgt))
    
    print('\nclm_trgt.amount = ' + str(clm_trgt.amount))
    print('\nclm_trgt.unit = '+ str(clm_trgt.unit))
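
To make the idea of a quantity with an upper and a lower bound concrete, here is a small sketch in plain Python. It assumes a symmetric uncertainty (amount ± error) and uses illustrative numbers; the helper name `bounds` is hypothetical and this is not how Pywikibot itself is implemented.

```python
from decimal import Decimal


def bounds(amount, error):
    """Return (lower, upper) for a symmetric uncertainty amount ± error."""
    amount, error = Decimal(amount), Decimal(error)
    return amount - error, amount + error


# Strings are passed in so that Decimal keeps the exact written digits.
lower, upper = bounds("14.1", "0.1")
print(lower, upper)
```

Decimal arithmetic is used here because binary floats cannot represent values such as 0.1 exactly.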

# VI/ Selecting Item by Wikidata Statement

This is the method to select items using a SPARQL query.

In [None]:
from pywikibot import pagegenerators as pg

with open('pka-query.rq', 'r') as query_file:  # The query is stored in a separate file.
    QUERY = query_file.read()
    print('The content of pka-query.rq is:\n\n'+QUERY)

wikidata_site = pywikibot.Site("wikidata", "wikidata")
generator = pg.WikidataSPARQLPageGenerator(QUERY, site=wikidata_site)

print('\n\nItems in the generator are:')
for item in generator:
    print(item)

It is of course possible to interpolate arguments into the query string:

In [None]:
property = 'P1117'

QUERY2 = f'''
SELECT ?item ?value
WHERE 
{{
  ?item wdt:{property} ?value .
}}
'''

generator2 = pg.WikidataSPARQLPageGenerator(QUERY2, site=wikidata_site)

print('Items in the generator are:')
for item in generator2:
    print(item)
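
When the property ID comes from user input or a file, it may be worth checking it before it is interpolated into the query. The sketch below wraps the query construction in a function with such a check. The function name `build_property_query` and the validation pattern are assumptions made for this example.

```python
import re

# A property ID is the letter P followed by a positive integer.
PROPERTY_ID = re.compile(r"P[1-9][0-9]*")


def build_property_query(property_id):
    """Return a SPARQL query selecting items that have a value for property_id."""
    if not PROPERTY_ID.fullmatch(property_id):
        raise ValueError(f"Not a property ID: {property_id!r}")
    return f"""
SELECT ?item ?value
WHERE
{{
  ?item wdt:{property_id} ?value .
}}
"""


print(build_property_query("P1117"))
```

The returned string can then be passed to `pg.WikidataSPARQLPageGenerator` in the same way as `QUERY2`.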

# VII/ Using the sandbox wikidata site for tests

We will use the sandbox item https://test.wikidata.org/wiki/Q194617 on the test site.

In [None]:
site = pywikibot.Site("test", "wikidata")
repo = site.data_repository()
item = pywikibot.ItemPage(repo, "Q194617")
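
Before running code that writes data, a small guard can confirm that the script is pointed at the test site. The sketch below assumes that a Pywikibot site object exposes `code` and `family.name`; to stay runnable offline it uses `SimpleNamespace` stand-ins instead of real site objects. The helper name `is_test_wikidata` is hypothetical.

```python
from types import SimpleNamespace


def is_test_wikidata(site):
    """Return True if site looks like test.wikidata (code "test", family "wikidata")."""
    return site.code == "test" and site.family.name == "wikidata"


# Stand-ins mimicking the two attributes used above, so the sketch runs offline.
test_site = SimpleNamespace(code="test", family=SimpleNamespace(name="wikidata"))
live_site = SimpleNamespace(code="wikidata", family=SimpleNamespace(name="wikidata"))

print(is_test_wikidata(test_site))
print(is_test_wikidata(live_site))
```

A script could refuse to continue when this check returns False, unless the edit is really meant for the live site.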

# VIII/ Changing labels, descriptions and aliases

We can use specific Pywikibot functions:

In [None]:
new_labels = {"en": "bear2", "de": "Bär2"}
new_descr = {"en": "gentle creature of the forrest2", "de": "Friedlicher Waldbewohner2"}
new_alias = {"en": ["brown bear2", "grizzly bear2", "polar bear2"], "de": ["Braunbär2", "Grizzlybär2", "Eisbär2"]}
item.editLabels(labels=new_labels, summary="Setting new labels2.")
item.editDescriptions(new_descr, summary="Setting new descriptions2.")
item.editAliases(new_alias, summary="Setting new aliases2.")

Or we can use the general editEntity() function:

In [None]:
data = {"labels": {"en": "bear", "de": "Bär"},
  "descriptions": {"en": "gentle creature of the forrest", "de": "Friedlicher Waldbewohner"},
       "aliases": {"en": ["brown bear", "grizzly bear", "polar bear"], "de": ["Braunbär", "Grizzlybär", "Eisbär"]},
     "sitelinks": [{"site": "enwiki", "title": "Bear"}, {"site": "dewiki", "title": "Bär"}]}
item.editEntity(data, summary=u'Edited item: set labels, descriptions, aliases')
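
When the labels, descriptions and aliases come from a table with one row per language, the dictionary for `editEntity()` can be assembled programmatically. The following is a minimal sketch in plain Python. The helper name `build_entity_data` and the row layout are assumptions made for this example, and sitelinks are left out.

```python
def build_entity_data(rows):
    """Build an editEntity-style dict from {lang: (label, description, aliases)}."""
    data = {"labels": {}, "descriptions": {}, "aliases": {}}
    for lang, (label, description, aliases) in rows.items():
        data["labels"][lang] = label
        data["descriptions"][lang] = description
        data["aliases"][lang] = list(aliases)
    return data


rows = {
    "en": ("bear", "gentle creature of the forest", ["brown bear", "grizzly bear"]),
    "de": ("Bär", "Friedlicher Waldbewohner", ["Braunbär", "Grizzlybär"]),
}
data = build_entity_data(rows)
print(data["labels"])
```

The resulting `data` has the same shape as the hand-written dictionary in the cell above, minus the `"sitelinks"` key.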

# IX/ Changing claims

In this example, we will fix a "color" property that was wrongly set to homonyms of colors.

Beware that this is a real example that modifies the real Wikidata.

In [None]:
import pywikibot
from pywikibot import pagegenerators as pg

wikidata_site = pywikibot.Site("wikidata", "wikidata")
wikidata_repo = wikidata_site.data_repository()

property = "P462"  # The "color" property, which should point to an item representing a color.

# The error dict has homonyms as keys and the matching color items as values.
# Items whose color property is set to a key must be changed to the corresponding value.
error_dict = {"Q13191": "Q39338",    # orange - "fruit": "color"
              "Q897": "Q208045",     # gold - "element": "color"
              "Q753": "Q2722041",    # copper - "element": "color"
              "Q25381": "Q679355",   # amber - "material": "color"
              "Q134862": "Q5069879", # champagne - "drink": "color"
              "Q1090": "Q317802",    # silver - "element": "color"
              "Q1173": "Q797446",    # burgundy - "region": "color"
              "Q13411121": "Q5148721", # peach - "fruit": "color"
              }

def correct_claim(generator, key):
    '''Iterate over the items yielded by a generator and, for each claim of the
    color property whose target is the given key (a homonym), change the target
    to the matching color item from error_dict.'''
    for page in generator:
        item_dict = page.get()  # the dictionary containing all the values of the item
        claim_list = item_dict["claims"][property]  # the claims for the color property (variable set above)
        for claim in claim_list:
            trgt = claim.getTarget()
            if trgt is not None and trgt.id == key:  # the claim target is a key of the error dictionary
                print(f'Correcting {key} to {error_dict[key]}')
                correct_page = pywikibot.ItemPage(wikidata_repo, error_dict[key], 0)  # the right value for the property, from the error dict
                claim.changeTarget(correct_page)  # change the target to the right value

for key in error_dict:
    query = f'''
    SELECT ?item
    WHERE 
    {{
      ?item wdt:{property} wd:{key} .
     }}
    ''' # selects items whose color is set to a homonym
    generator = pg.WikidataSPARQLPageGenerator(query, site=wikidata_site)  # a generator yielding these items
    generator = wikidata_site.preloadpages(generator, pageprops=True)  # preload the pages; this may improve performance
    correct_claim(generator, key)
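
Since this code edits the live site, it can be useful to first compute which corrections would be made without writing anything. The sketch below does this on plain data. The helper name `plan_corrections` and the item IDs `Q_A`, `Q_B`, `Q_C` are hypothetical, made up for the example; only the two error-dict entries are copied from the cell above.

```python
def plan_corrections(current_targets, error_dict):
    """Map each item ID to (wrong_target, right_target) for targets listed in error_dict."""
    plan = {}
    for item_id, target_id in current_targets.items():
        if target_id in error_dict:
            plan[item_id] = (target_id, error_dict[target_id])
    return plan


# Two entries copied from the error dict above.
error_dict = {"Q13191": "Q39338", "Q897": "Q208045"}
# Hypothetical items and their current "color" targets, made up for the example.
current_targets = {"Q_A": "Q13191", "Q_B": "Q39338", "Q_C": "Q897"}

for item_id, (wrong, right) in plan_corrections(current_targets, error_dict).items():
    print(f"{item_id}: would change {wrong} to {right}")
```

Reviewing such a plan before calling `claim.changeTarget()` gives a chance to catch mistakes in the error dictionary.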
    

# X/ Adding claims

**This advanced code creates a full claim with a quantity ± uncertainty and sources:**

In [None]:
import pywikibot
from pywikibot.data import api
import pprint

# FIXME Hardcoded for test.wikidata
# Define properties and data
p_stated_in = "P149"
p_half_life = "P525"
p_ref_url = "P93"
precision = 10 ** -10
# data = [quantity, uncertainty, unit (Q1748 = hours)]
# source = [stated in item, ref url]
half_life_data = {"uranium-240": {"data": ["14.1", "0.1", "Q1748"],
                                  "source": ["Q1751", "http://www.nndc.bnl.gov/chart/reCenter.jsp?z=92&n=148"]}
                  }

site = pywikibot.Site("test", "wikidata")  # Please only modify the test site unless you know what you are doing!
repo = site.data_repository()

def get_items(site, item_title):
    """
    Requires a site and search term (item_title) and returns the results.
    """
    params = {"action": "wbsearchentities",
              "format": "json",
              "language": "en",
              "type": "item",
              "search": item_title}
    request = api.Request(site=site, **params)
    return request.submit()

def check_claim_and_uncert(item, property, data):
    """
    Requires an item, a property and data = [value, uncertainty, unit].
    Returns the claim whose amount and bounds match within the defined
    precision, or None. This is used to see if the claim is already set.
    """
    item_dict = item.get()
    value, uncert, unit = data
    value, uncert = float(value), float(uncert)
    try:
        claims = item_dict["claims"][property]
    except KeyError:  # the item has no claim for this property
        return None

    for claim in claims:
        wb_quant = claim.getTarget()
        if wb_quant is None or wb_quant.lowerBound is None or wb_quant.upperBound is None:
            continue  # no value or no uncertainty bounds to compare against
        # WbQuantity stores Decimal values; convert them before mixing with floats.
        amount = float(wb_quant.amount)
        delta_lower = amount - float(wb_quant.lowerBound)
        delta_upper = float(wb_quant.upperBound) - amount
        claim_exists = abs(amount - value) < precision
        uncert_set = (abs(uncert - delta_lower) < precision
                      and abs(delta_upper - uncert) < precision)
        if claim_exists and uncert_set:
            return claim
    return None

def check_source_set(claim, property, data):
    """
    Takes a claim, a property and the source data.
    Returns True if one of the claim's references already has a
    "stated in" value matching data[0], False otherwise.
    """
    source_claims = claim.getSources()
    if len(source_claims) == 0:
        return False  # there are no sources at all

    for source in source_claims:
        try:
            stated_in_claim = source[p_stated_in]  # check whether the "stated in" property is set
        except KeyError:
            return False  # if not, we can create a new reference
        for claim in stated_in_claim:
            trgt = claim.target
            if trgt.id == data[0]:
                return True  # the reference exists and matches our import dataset
    return False

def set_claim(item, property, data):
    """
    Set the claim's property according to our import data.
    """
    value, uncert, unit = data  # unpack our import data
    value, uncert = float(value), float(uncert)
    claim = pywikibot.Claim(repo, property)  # create a claim object with the wanted property
    # Build the unit's entity URI from the unit ID instead of hardcoding Q1748 (test.wikidata only).
    entity_helper_string = "http://test.wikidata.org/entity/{}".format(unit)
    wb_quant = pywikibot.WbQuantity(value, entity_helper_string, uncert, site=repo)  # quantity with unit and uncertainty
    claim.setTarget(wb_quant)  # set the quantity as the claim's target
    
    item.addClaim(claim, bot=False, summary="Adding half-life claim from NNDC.")  # finally, add the claim object to the item
    
    print('Running set_claim...\n   On item: ' + str(item) + '\n   Setting claim:\n' + str(claim) + '\n')
    
    return claim

def create_source_claim(claim, source_data):
    trgt_item, ref_url = source_data
    trgt_itempage = pywikibot.ItemPage(repo, trgt_item) #create an item object for the source
    source_claim = pywikibot.Claim(repo, p_stated_in, isReference=True) #create the claim object for the source
    source_claim.setTarget(trgt_itempage) #set the item source object as a target for the claim object
    
    claim.addSources([source_claim]) #by the end we can add the source to the claim
    
    print('Running create_source_claim...\n\n   On claim :\n' + str(claim) + '\n\n   Setting source:\n' + str(source_claim) + '\n')
    
    return True

for key in half_life_data: #since there is actually only 1 key in our example the loop will run once
    search_results = get_items(site, key)
    print('Value of search_result is:\n' + str(search_results) + '\n')
    if len(search_results["search"]) == 1:  # exactly one item must match the search term (uranium-240)
        item = pywikibot.ItemPage(repo, search_results["search"][0]["id"])  # we will modify this item
        print('Value of item from results is:' + str(item) + '\n')
        data = half_life_data[key]["data"] #getting the value of the property from the data we want to import
        print('Value of data is:' + str(data) + '\n')
        source_data = half_life_data[key]["source"] #getting the value of the source of the property from the data we want to import
        print('Value of source_data is:' + str(source_data) + '\n')

        claim = check_claim_and_uncert(item, p_half_life, data)  # check whether our claim is already set correctly in Wikidata
        print('Value of claim from check_claim_and_uncert is : ' + str(claim) + '\n')
        if claim:  # if the claim already exists, check whether the source exists too and create it if not
            source = check_source_set(claim, key, source_data)
            print('Value of source is: ' + str(source) + '\n')
            if source:
                pass
            else:
                create_source_claim(claim, source_data)
        else:  # if the claim does not exist, create it together with its source
            claim = set_claim(item, p_half_life, data)
            create_source_claim(claim, source_data)
            
    else:  # exactly one item must match the search term (uranium-240); otherwise the program cannot know which one to edit
        print("No result or too many found for {}.".format(key))
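
The "exactly one match" rule used in the loop above can be isolated in a small function, which makes it easy to check without any network access. This is a sketch only: the helper name `pick_single_result` is hypothetical, and the dictionaries are hand-written stand-ins shaped like the part of the search response that the loop reads (`"search"`, then `"id"`), with made-up IDs.

```python
def pick_single_result(search_results):
    """Return the ID of the only search hit, or None if there are zero or several."""
    hits = search_results.get("search", [])
    if len(hits) != 1:
        return None
    return hits[0]["id"]


# Hand-written stand-ins shaped like the part of the response used above.
one_hit = {"search": [{"id": "Q1234"}]}
two_hits = {"search": [{"id": "Q1234"}, {"id": "Q5678"}]}

print(pick_single_result(one_hit))
print(pick_single_result(two_hits))
```

Returning None for an ambiguous search keeps the script from editing an item chosen arbitrarily.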


# XI/ Code templates

These are code templates for copy and paste. They have already been described above.

In [None]:
# Typical generator to iterate over a SPARQL query

import pywikibot
from pywikibot import pagegenerators as pg

with open('pka-query.rq', 'r') as query_file:
    QUERY = query_file.read()

wikidata_site = pywikibot.Site("wikidata", "wikidata")
generator = pg.WikidataSPARQLPageGenerator(QUERY, site=wikidata_site)

print('Items in the generator are:')
for item in generator:
    print(item)

In [None]:
#Create Items

import pywikibot
site = pywikibot.Site("test", "wikidata")

def create_item(site, label_dict):
    new_item = pywikibot.ItemPage(site)
    new_item.editLabels(labels=label_dict, summary="Setting labels")
    # Add description here or in another function
    return new_item.getID()

some_labels = {"en": "Hamburg Main Station", "de": "Hamburg Hauptbahnhof"}
new_item_id = create_item(site, some_labels)