# Riksdagen corpus v0.13.0

2024-01-15

This Colab notebook demonstrates how to quickly access data from the Riksdagen corpus.

First, we download and unzip the data. On your local machine, you can also download the file in your browser using the link below.

In [1]:
!wget https://github.com/welfare-state-analytics/riksdagen-corpus/releases/latest/download/corpus.zip --show-progress
!7z x corpus.zip

--2024-01-15 14:54:52--  https://github.com/welfare-state-analytics/riksdagen-corpus/releases/latest/download/corpus.zip
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/welfare-state-analytics/riksdagen-corpus/releases/download/v0.13.0/corpus.zip [following]
--2024-01-15 14:54:52--  https://github.com/welfare-state-analytics/riksdagen-corpus/releases/download/v0.13.0/corpus.zip
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/346788931/27860700-b906-43c3-a3ba-ffaee1bded4a?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240115%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240115T135420Z&X-Amz-Expires=300&X-Amz-Signature=51
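
On a local machine without `wget` or `7z`, the same two steps can be approximated with Python's standard library. This is a minimal sketch, not part of the original notebook: the function names `download_corpus` and `extract_corpus` are hypothetical, and the only thing taken from the cell above is the release URL.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the wget command above
CORPUS_URL = (
    "https://github.com/welfare-state-analytics/riksdagen-corpus"
    "/releases/latest/download/corpus.zip"
)


def download_corpus(url=CORPUS_URL, destination="corpus.zip"):
    """Download the archive to `destination` and return its path (hypothetical helper)."""
    destination = Path(destination)
    urllib.request.urlretrieve(url, destination)
    return destination


def extract_corpus(zip_path, target_dir="."):
    """Unpack the archive into `target_dir` (hypothetical helper)."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)


# Usage (commented out because it downloads the full corpus):
# extract_corpus(download_corpus())
```

Extracting into the current directory should leave the same `corpus/` folder that the rest of the notebook reads from.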

Now we can start working with the data. For that, we need a couple of Python modules. Let's install them and set things up.

In [2]:
%pip install pyriksdagen
from lxml import etree
import progressbar
from pyparlaclarin.read import paragraph_iterator, speeches_with_name
from pyriksdagen.utils import protocol_iterators

# We need a parser for reading in XML data
parser = etree.XMLParser(remove_blank_text=True)

Note: you may need to restart the kernel to use updated packages.


Now we can go over some protocols from, say, 1955-1956.

In [3]:
protocols = list(protocol_iterators("corpus/protocols/", start=1955, end=1956))
protocols[:5]

['corpus/protocols/1955/prot-1955--ak--001.xml',
 'corpus/protocols/1955/prot-1955--ak--002.xml',
 'corpus/protocols/1955/prot-1955--ak--003.xml',
 'corpus/protocols/1955/prot-1955--ak--004.xml',
 'corpus/protocols/1955/prot-1955--ak--005.xml']
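
The file names in this listing appear to encode a year, a chamber code and a running number. A minimal sketch for splitting them apart follows. The regular expression is an assumption inferred only from the five paths shown above, and `ProtocolId` and `parse_protocol_path` are hypothetical names; paths that follow another naming scheme return `None`.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from paths like "prot-1955--ak--001.xml"
PROTOCOL_NAME = re.compile(r"prot-(\d+)--([a-z]+)--(\d+)\.xml$")


class ProtocolId(NamedTuple):
    year: int
    chamber: str
    number: int


def parse_protocol_path(path: str) -> Optional[ProtocolId]:
    """Split a protocol path into its parts, or return None if it does not match."""
    match = PROTOCOL_NAME.search(path)
    if match is None:
        return None
    year, chamber, number = match.groups()
    return ProtocolId(int(year), chamber, int(number))


example = parse_protocol_path("corpus/protocols/1955/prot-1955--ak--001.xml")
print(example)
```

With such a helper, a list like `protocols` could be filtered by chamber code, for instance keeping only entries whose `chamber` equals `"ak"`.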

It is straightforward to print out all content, including speeches, dates, speaker introductions and topic titles.

In [4]:
# Select a single protocol, since each one contains a lot of text
protocol_in_question = protocols[12]
print(protocol_in_question)
root = etree.parse(protocol_in_question, parser).getroot()

corpus/protocols/1955/prot-1955--ak--013.xml


In [5]:
for elem in list(paragraph_iterator(root, output="lxml"))[:7]:
  print(" ".join(elem.itertext()))



          RIKSDAGENS Ar PROTOKOLL
        

          1955 ANDRA KAMMAREN Nr 13
        

          13—15 april
        

          Debatter m. m.
        

          Onsdagen den 13 april Sid.
        

          Familjerådgivning «... 5 Interpellation av herr Ericsson : i
          Näs ang. de minskade perioderna
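
The printed paragraphs keep the indentation and line breaks of the underlying XML. If that is unwanted, the whitespace can be collapsed before printing. The sketch below uses the standard library's `xml.etree.ElementTree` and a made-up `<note>` element so that it runs on its own; `normalized_text` is a hypothetical helper, and it only relies on `itertext()`, which lxml elements also provide.

```python
import xml.etree.ElementTree as ET


def normalized_text(elem) -> str:
    """Join all text inside `elem` and collapse runs of whitespace to single spaces."""
    return " ".join("".join(elem.itertext()).split())


# Made-up element that mimics the indentation seen in the output above
sample = ET.fromstring(
    "<note>\n          1955 ANDRA KAMMAREN Nr 13\n        </note>"
)
print(normalized_text(sample))
```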
        


The metadata catalogues are also available. They are stored as CSV files and can be accessed with pandas in Python or with the spreadsheet program of your choice.

In [6]:
import pandas as pd
from pyriksdagen.db import filter_db
from pyriksdagen.utils import parse_date

mop = pd.read_csv("corpus/metadata/member_of_parliament.csv")
name = pd.read_csv("corpus/metadata/name.csv")
name = name[name["primary_name"]][["swerik_id", "name"]]
person = pd.read_csv("corpus/metadata/person.csv")

# We merge mandate periods of the MOPs with the names of the MOPs
mop = mop.merge(name, on="swerik_id", how="left")
# Let's also add person-level metadata, such as birth year and gender
mop = mop.merge(person, on="swerik_id", how="left")
mop

Unnamed: 0,swerik_id,start,end,district,role,name,born,dead,gender,riksdagen_id
0,i-PFAPNmRqeUAaxDzNRTG1x1,1867,1867,Eskilstuna och Strängnäs valkrets,andrakammarledamot,Sven Palmgren,1821-08-30,1880-09-29,man,
1,i-QSYHiJ6G54WwZYYDpVnD4u,1867,1867,Västra Götalands läns västra valkrets,förstakammarledamot,Gustaf Daniel Björck,1806-05-30,1888-01-03,man,
2,i-Ddmtm1uG9esPH37c8XjUXZ,1867,1867,Torna härads valkrets,andrakammarledamot,Robert De la Gardie den äldre,1823-12-17,1916-05-19,man,
3,i-65rmwEXkkhA1kxSrD4oMUw,1867,1867,Värmlands läns valkrets,förstakammarledamot,Gustaf Ekman,1804-05-26,1876-05-03,man,
4,i-AvGpsGJvs5PXTcEG4DtbFt,1867,1867,Kristianstads läns valkrets,förstakammarledamot,Eric Gyllenstierna,1825-01-26,1870-08-09,man,
...,...,...,...,...,...,...,...,...,...,...
13198,i-SRx5MYgiUBkgGLPueXgws5,2023-11-14,,Skåne läns västra valkrets,ledamot,Ola Möller,1983-02-06,,man,0271338654822
13199,i-AvpuCBNCdN7LHjCsU7ixe5,2023-11-27,,Malmö kommuns valkrets,ledamot,Zinaida Kajevic,1979,,woman,0990587970528
13200,i-59GzwSB23e1tdcfxS3Mstm,2024-01-01,,Värmlands läns valkrets,ledamot,Mona Smedman,1975,,woman,0287315215921
13201,i-x1CuoKmRHYgQr9i2kh3B5,,,,ledamot,Bengt Nording,,,man,
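
In the table above, `start` and `end` mix year-only values (`1867`) with full dates (`2023-11-14`), and some cells are empty. A minimal sketch for comparing such values at year precision is shown below. `year_of` and `active_in` are hypothetical helpers written for illustration, independent of the `pyriksdagen` utilities imported earlier; treating a missing `end` as an open-ended mandate is an assumption.

```python
from typing import Optional


def year_of(value) -> Optional[int]:
    """Return the year from 'YYYY' or 'YYYY-MM-DD', or None if the value is missing."""
    if value is None:
        return None
    text = str(value).strip()
    if len(text) < 4 or not text[:4].isdigit():
        return None  # covers "", "nan" and other non-date values
    return int(text[:4])


def active_in(start, end, year: int) -> bool:
    """True if `year` lies within the mandate; a missing end is assumed open-ended."""
    first, last = year_of(start), year_of(end)
    if first is None:
        return False  # unknown start: cannot decide, so exclude
    return first <= year and (last is None or year <= last)


print(active_in("1941-12-05", "1959-05-07", 1955))
```

Applied to the merged frame, this could look like `mop[[active_in(s, e, 1955) for s, e in zip(mop["start"], mop["end"])]]`.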


Let's find a specific person based on their name, for example Elis Håstad.

In [7]:
# Let's find Mr. Håstad
mop[mop.name.str.contains("Håstad")]

Unnamed: 0,swerik_id,start,end,district,role,name,born,dead,gender,riksdagen_id
4463,i-S9dMiL9yRpDaYPfVGjK1Gp,1941-12-05,1959-05-07,Stockholms kommuns valkrets,andrakammarledamot,Elis Håstad,1900-01-18,1959-05-07,man,


His identifier is i-S9dMiL9yRpDaYPfVGjK1Gp. Using that, we can find all his speeches. Let's do that and print out the first one.

In [8]:
# Elis Håstad (i-S9dMiL9yRpDaYPfVGjK1Gp)
hastad_speeches = []
for protocol in progressbar.progressbar(protocols):
  root = etree.parse(protocol, parser).getroot()
  protocol_speech = []
  for speech in speeches_with_name(root, name="i-S9dMiL9yRpDaYPfVGjK1Gp"):
    protocol_speech.append(speech)
  protocol_speech = "\n".join(protocol_speech).strip()
  if protocol_speech != "":
    hastad_speeches.append(protocol_speech)

print(hastad_speeches[0])

100% (130 of 130) |######################| Elapsed Time: 0:00:00 Time:  0:00:00


Herr talman! Då debatten är så långt framskriden, skall jag inte
            gå in på hela detta stora ämne. Jag kan i allt väsentligt instämma
            i de rent praktiska synpunkter, som herr Fast här har givit till
            känna. Jag skall inte heller gå in på frågan om det stora intrång
            på den kyrkliga självstyrelsen, som motionen evident syftar till.
          

            Vad jag skulle vilja begränsa mig till är frågan, huruvida herr
            Lundberg och herr Edberg och övriga motionärer i denna kammare,
            när de nu kräver att en stor del av beslutanderätten på det kyrkliga
            området skall överflyttas till den borgerliga kommunen, företräder
            en mera demokratisk åsikt än anhängarna av den nu gällande ordningen.
            Det har väl allmänt betraktats som en stor vinning, att vi 1862
            gjorde en åtskillnad mellan den borgerliga och den kyrkliga kommunen.
            Denna vinning har ju bestått sedan man på bägge
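
Once the speeches are collected as a list of strings, simple summaries are easy to compute. The sketch below is only an illustration: `speech_stats` is a hypothetical helper, words are counted by naive whitespace splitting, and the two sample strings stand in for the real `hastad_speeches` list, which holds one joined string per protocol.

```python
def speech_stats(speeches):
    """Count entries and whitespace-separated words in a list of speech strings."""
    word_counts = [len(text.split()) for text in speeches]
    return {
        "entries": len(speeches),
        "total_words": sum(word_counts),
        "longest_entry_words": max(word_counts, default=0),
    }


# Stand-in data; in the notebook this would be hastad_speeches
sample_speeches = [
    "Herr talman! Då debatten är så långt framskriden",
    "Herr talman!",
]
print(speech_stats(sample_speeches))
```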