This repository has been archived by the owner on May 8, 2024. It is now read-only.

Setup quality control for MP database #265

Closed
6 of 10 tasks
MansMeg opened this issue Apr 5, 2023 · 35 comments

Comments

@MansMeg
Collaborator

MansMeg commented Apr 5, 2023

We want to add three data integrity tests for the MP database that can be checked automatically.

  1. Are all persons in the registers (@emla5688's files) included in the database?
  2. Are all iort in the registers (@emla5688's files) included in the database?
  3. Do we have the correct number of MPs in parliament at each date?

The files (wd) can be found here:
https://github.com/MansMeg/Wikidata_riksdagen-corpus/tree/main/data

So what we should do is to:

  • Combine @emla5688's and Salgo's files into a quality control file in the quality assessment folder, called, say, known_mps.csv. I described in a README here how to set up the file: https://github.com/welfare-state-analytics/riksdagen-corpus/tree/quality_assesment_feature/corpus/quality_assesment/known_mps
    Note: There are duplicates. See: FYI dubletter i WD Register - Enkammarriksdagen 1971-1993_94 Band1-2  #270
  • As a test suite, check that all these MPs are included in our MP database.
  • Fix the MPs that are missing in our database so they are included. This might include fixing things at Wikidata. Errors or missing data at Wikidata should be filed as a separate issue so we can get some help from the Wikidata people.
  • As a test suite, check that all the iort in these files are included in our MP database.
  • Check that all names in these files are included in our database, especially the duplicated names in FYI dubletter i WD Register - Enkammarriksdagen 1971-1993_94 Band1-2  #270
  • Fix iort that are missing in our database so they are included. This might mean adding things to Wikidata or removing them. Errors or missing data at Wikidata should be filed as a separate issue so we can get some help from the Wikidata people.
  • Include the number of people per party and year in the quality assessment folder. @fredrik1984 has a book with this information for a large part of the period. Create a csv-file called party_mp_frequencies.csv where the frequency of each party is included. I describe how here: https://github.com/welfare-state-analytics/riksdagen-corpus/tree/quality_assesment_feature/corpus/quality_assesment/party_mp_frequencies
  • Check that this number of people (or more, due to replacements) is included in the MP database for every day of that year. We know that the minimum number of persons that should be included is the number of seats. If we have the information at a daily level, we can also easily see when there are too many for a longer time period (a replacement means that there can be more than the set number for a day). Then check that (a) there are not too many MPs for a longer time period (like a month) and (b) there are no days with fewer MPs than the number of seats in the parliament.
  • Further add the same frequencies for the one-chamber period by taking the frequencies from Wikipedia. Add them to the same file and also check the MPs for this period.
  • Again, check that this number of people (or more, due to replacements) is included in the MP database, as above.
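
The first two checks in the list above amount to set differences between the register files and the database. A minimal sketch, assuming both sides can be read as CSV with a `wiki_id` column; the column name, the function names and the sample rows are placeholders for illustration, not the actual schema:

```python
import csv
import io


def read_ids(csv_text, column="wiki_id"):
    """Return the set of non-empty values found in `column` of a CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row.get(column) or "").strip() for row in reader} - {""}


def missing_ids(known_csv, database_csv, column="wiki_id"):
    """IDs listed in the known-MP file but absent from the MP database."""
    return sorted(read_ids(known_csv, column) - read_ids(database_csv, column))


# Made-up sample data standing in for known_mps.csv and the MP database.
known = "wiki_id,name\nQA,Example One\nQB,Example Two\nQC,Example Three\n"
database = "wiki_id,name\nQA,Example One\nQC,Example Three\n"

print(missing_ids(known, database))
```

A unit test can then simply assert that the returned list is empty, and print the list when it is not. The same function would work for iort by passing a different `column`.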
@MansMeg MansMeg added this to the MP database milestone Apr 5, 2023
@MansMeg

This comment was marked as resolved.

@salgo60

This comment was marked as resolved.

@fredrik1984
Collaborator

fredrik1984 commented Apr 5, 2023

The book that Måns refers to, which I have, is Nils Stjernquist's "Tvåkammartiden" from 1996.
img_8342_720

See also mail from Lotta 2023-04-13 that confirms that these tables are correct:

I have now looked in Stjernquist and in Nilsson and Hägglund https://www.riksdagen.se/globalassets/15.-bestall-och-ladda-ned/ovriga-trycksaker/riksbankens-jubileumsfond---tvakammarriksdagen-1867-1970.pdf
From what I can tell there, it has changed over time.
Stjernquist pp. 107-108
Stjernquist p. 107: The number of members had not been fixed at definite figures in 1866. A growing population increased the number of members.
1867: 125 FK (First Chamber) and 190 AK (Second Chamber)
The 1894 reform ("vingklippningen", the wing-clipping): the numbers were then fixed at 150 FK and 230 AK.
As the countryside depopulated: in 1909, every Second Chamber constituency was guaranteed three seats. In 1953 the minimum guarantee was raised to five seats (if the constituency was entitled to three seats). From 1957 this also applied to the First Chamber. The guarantee rules (p. 108) resulted in 151 FK and 233 AK.

Nilsson and Hägglund p. 46 give the same number of seats in the chambers as Stjernquist for the period 1945-1969.

@fredrik1984
Collaborator

fredrik1984 commented Apr 6, 2023

And as Väinö mentioned, we should get the exact dates on which the Riksdag opened and closed each year. These differ over the years, so it will probably require some manual work. This Wikipedia page has that information for every Riksdag year: https://sv.wikipedia.org/wiki/Riksm%C3%B6tets_%C3%B6ppnande
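
Once the opening and closing dates are known, the seat check described in the issue can be run day by day within each session. A rough sketch, assuming each MP record boils down to a (start, end) pair of dates and that the session dates and seat count are supplied by hand; the function names, the 30-day threshold and the sample data are all hypothetical:

```python
from datetime import date, timedelta


def daily_counts(mandates, opening, closing):
    """Map each day in [opening, closing] to the number of active mandates."""
    counts = {}
    day = opening
    while day <= closing:
        counts[day] = sum(1 for start, end in mandates if start <= day <= end)
        day += timedelta(days=1)
    return counts


def check_seats(mandates, opening, closing, seats, max_surplus_days=30):
    """Return (days with too few MPs, surplus runs longer than max_surplus_days)."""
    too_few, long_surplus, run = [], [], []
    for day, n in daily_counts(mandates, opening, closing).items():
        if n < seats:
            too_few.append(day)
        if n > seats:
            run.append(day)          # extend the current run of surplus days
        else:
            if len(run) > max_surplus_days:
                long_surplus.append((run[0], run[-1]))
            run = []
    if len(run) > max_surplus_days:  # a run that lasts until the closing date
        long_surplus.append((run[0], run[-1]))
    return too_few, long_surplus


# Made-up example: two seats, one MP leaves and is replaced after a one-day gap.
mandates = [
    (date(2000, 1, 1), date(2000, 1, 31)),
    (date(2000, 1, 1), date(2000, 1, 14)),
    (date(2000, 1, 16), date(2000, 1, 31)),
]
too_few, long_surplus = check_seats(
    mandates, date(2000, 1, 10), date(2000, 1, 20), seats=2
)
print(too_few, long_surplus)
```

Counting per day is slow but easy to reason about; a short surplus (a replacement overlapping the outgoing MP) is tolerated, while any day below the seat count is reported.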

@salgo60

This comment was marked as off-topic.

@MansMeg

This comment was marked as resolved.

@MansMeg MansMeg changed the title from "Setup quality assesment documents for MPs" to "Setup quality control for MP database" Apr 7, 2023
@salgo60

This comment was marked as resolved.

@MansMeg

This comment was marked as resolved.

@BobBorges
Collaborator

BobBorges commented Apr 14, 2023

Do we know anything about these Nils Wallis instances?
image
Edit: I stuck the wrong screenshot in there first. Sorry for the confusion.

@BobBorges
Collaborator

@fredrik1984 I just edited the comment. I stuck in the wrong screenshot. I added a dummy wiki_id because I couldn't find either of them on Wikidata, but I didn't want to delete them from Emil's list.

@fredrik1984
Collaborator

fredrik1984 commented Apr 14, 2023

Hm, I suspect these are the following two persons: Nils Wallmark i Smedsbyn https://www.wikidata.org/wiki/Q6231420 and Nils Petter Wallmark i Selånger https://www.wikidata.org/wiki/Q6231427

I don't know how they ended up as Wallis though. Maybe @emla5688 or @salgo60 has a clue?

@salgo60
Contributor

salgo60 commented Apr 14, 2023

I am out walking, but we had an error with "Wallis" in one version of the list. I can check later tonight.

Check if you find Wallis in the book; if not, it's just a typo...

@BobBorges
Collaborator

There were no wiki IDs for those two entries... I'm going through the whole list now b/c some of the surname/iort columns have commas that throw off the csv alignment.
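
Commas inside a field only break the alignment when the field is written without quoting. As an illustration (not the code used in the repository), Python's `csv` module quotes such fields on write and restores them on read; the row below is made up and the column names are assumptions:

```python
import csv
import io

rows = [
    ["wiki_id", "name", "iort"],
    ["Q_EXAMPLE", "Andersson, Anders", "Stockholm, senare Malmö"],
]

buffer = io.StringIO()
csv.writer(buffer).writerows(rows)  # fields containing commas get quoted
text = buffer.getvalue()

# A proper CSV parser keeps three columns; a naive split does not.
parsed = list(csv.reader(io.StringIO(text)))
naive = [line.split(",") for line in text.splitlines()]

print(len(parsed[1]), len(naive[1]))
```

Checking that every parsed row has the same length as the header is a cheap way to spot rows where an unquoted comma has shifted the columns.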

@MansMeg
Collaborator Author

MansMeg commented Apr 14, 2023

Do you have the book? Then you can look them up using the references (i.e. 5:444).

As Magnus said, I also remember there was an issue here. Maybe @emla5688 knows?

@fredrik1984
Collaborator

@BobBorges, I am now sending you the two biography books on Swedish MPs from 1867 to 1994 via Sprend (they are very big PDF files).

@BobBorges
Collaborator

ok!

@BobBorges
Collaborator

Indeed it was the Wallmarks @fredrik1984 mentioned above.

@emla5688
Contributor

In my file there is no wiki-id for them, but according to the book it is supposed to be Wallmark i Selånger and Wallmark i Smedsbyn

@emla5688
Contributor

Yeah, I checked and @fredrik1984 is correct with the wiki-ids

@MansMeg
Collaborator Author

MansMeg commented Apr 14, 2023

Great!

@BobBorges
Collaborator

I want to reread Måns' instructions before I commit / push anything, but some unit tests are ready:
image

@MansMeg
Collaborator Author

MansMeg commented Apr 14, 2023

This looks excellent!

Now we have a failing test. Now the next step is to identify why the test fails. =) Why are these missing?

@MansMeg
Collaborator Author

MansMeg commented Apr 14, 2023

Feel free to do a PR to the dev branch, then me and @ninpnin can do a code review.

@BobBorges
Collaborator

I was hoping there would be lots of overlap in missing IDs (new unit tests failing because of the same IDs), but unfortunately that seems not to be the case: there are 1400+ "issue" IDs from the Emil/Magnus files. Additionally, there are some things to sort out before we apply this new MP catalog to the corpus metadata:

  1. There are duplicate IDs: some more or less obviously refer to the same person, others perhaps refer to different people. How should we handle this?

  2. There are three people without wiki_ids. I can find them in the bio-books, but not on wikidata. I guess we should add them (?).
    image

  3. Four people don't have an iort.
    image

Sth for next week – good weekend all!

@salgo60
Contributor

salgo60 commented Apr 15, 2023

@BobBorges are you using the latest file?

There are duplicate IDs: some more or less obviously refer to the same person, others perhaps refer to different people. How should we handle this?

The 5 books were written over time, and one person can have more than one article.

  • When I use OpenRefine to find Q-numbers that are used more than once, I get the following:

Columns: Qnumber, I Riksdagen kallad (name used in the Riksdag), Förnamn (given names), Band (volume:page)
Q4963023 Alm i Stockholm Ulla G 1:49
Q4963023 Alm-Lindström i Stockholm Ulla G 1:49
Q5548010 Alströmer i Östad Jonas 4:199
Q5548010 Alströmer i Östad C Thore J 4:199
Q6086540 Anderson i Råstock senare Björkbrotorp Anders 4:470
Q5554662 Andersson i Löbbo K Abel M 2:139
Q5554662 Andersson i Södergården senare Löbbo Gustav E 2:140
Q5578035 Andersson i Björkäng senare Lindesberg Karl A 4:471
Q6213695 Andersson i Torngård Oscar S 2:313
Q6224947 Andersson i Vigelsbo senare Mamre Sven J 1:422
Q5566043 Barnekow i Sörbytorp Adolf G 3:105
Q5566043 Barnekow i Sörbytorp K Ragnar M-D 3:106
Q5578035 Björkänge i Björkäng senare Lindesberg Karl A 4:471
Q6215663 Bonde i Trolleholm Carl J 3:198
Q5603467 Carlsson i Malmberget senare Frosterud o Skåre Lars Johan 4:393 5:410
Q5603467 Carlsson-Frosterud i Skåre Lars Johan 4:393 5:410
Q5777511 Casenberg i Kasenberg Arthur W 4:226
Q53626 De Geer af Finspång i Hanaskog Arvid L G 3:113
Q53626 De Geer af Finspång i Hanaskog senare Kristianstad G Louis 3:113
Q5887544 Dockered i Sjövik G Robert 4:241
Q5623378 Domö i Domö senare Eggby Mariestad o åter Eggby J Fritiof 4:320
Q4946074 Ekendahl i Stockholm Sigrid H E 1:82
Q5711044 Enander i Borås A Johan 4:220
Q5711044 Enander i Lillhärdal S Johan 5:294
Q5718128 Eriksson i Björka O August 5:58
Q4946074 Eriksson i Stockholm Sigrid H E 1:82
Q5718128 Ernfors i Björka O August 5:58
Q5394979 Eskhult i Eskhult Ernst W 4:354
Q5724702 Falla i Falla Gunnar 4:506
Q5744512 Friggeråker i Friggeråker Johan 4:239
Q5774246 Gränebo i Gränebo C Petrus V 2:274
Q4951427 Gunne i Göteborg A Stina M 4:167
Q5777511 Gustafson i Kasenberg Arthur W 4:226
Q5623378 Gustafson i Järpås senare Domö Eggby Mariestad o åter Eggby J Fritiof 4:320
Q16945247 Gustafsson i Borås J Axel T 4:226
Q16945247 Gustafsson i Jönköping senare Lindesberg T D M Axel 2:155
Q5886669 Gustafsson i Brånsta Lars Erik 4:486
Q5621134 Hallagård i Hallagården Gustav A 4:328
Q6082121 Hansson i Rubbestad senare Vrine Axel L 4:230
Q5798778 Hellbacken i Hällbacken Gustaf K 5:88
Q5820900 Högstedt i Hanåsa P A Julius 2:281
Q5820900 Jansson i Hanåsa P A Julius 2:281 - duplicate
Q5621134 Johanson i Hallagården Gustav A 4:328
Q5744512 Johansson i Friggeråker Johan 4:239
Q6028693 Johansson i Onsjö senare Larv Johannes 4:329
Q5886669 Johansson i Brånsta John H 4:494
Q5888233 Johansson i Jönköping C Robert 2:164
Q5887544 Johansson i Dockered senare Sjövik G Robert 4:241
Q5888233 Johansson-Dahr i Jönköping C Robert 2:164
Q5653837 Johnsson i Skoglösa J Harald 3:126
Q5895976 Jönsson i Gärds Köpinge Carl 3:127
Q5895976 Jönsson i Gärds Köpinge Karl 3:127
Q5942265 Leander i Ystad Axel 3:231
Q5942265 Leander-Jönsson i Ystad Axel 3:231
Q5955465 Linder i Stockholm Nils 1:132
Q5955465 Linder i Stockholm senare Malmö F William 3:133
Q5955991 Lindgren i Örebro Adolf 4:499
Q5955991 Lindgren i Örebro Göran O V 4:499
Q111493181 Lindqvist-Pettersson i Östersund Anna E 5:305
Q4963023 Lindström i Stockholm Ulla G 1:49
Q4964561 Lundberg-Löfqvist i Mörrum Thyra C 3:65
Q4964561 Löfqvist i Mörrum Thyra C 3:65
Q6014609 Magnell i Kristinehamn Johan Jakob 4:421
Q6010824 Nilsson i Landskrona Carl J 3:247
Q6010824 Nilsson i Landskrona Karl J 3:247
Q5774246 Nilsson i Gränebo C Petrus V 2:274
Q6012727 Nisser i Falun Ernst M W 5:79
Q6012727 Nisser i Falun S Martin 5:79
Q6014609 Nordenfelt i Kristinehamn Carl H 4:427
Q6020574 Nyström i Stockholm Carl L H 1:153 4:131
Q6158007 Närlinge i Närlinge Carl Gustaf 1:326 5:82
Q6020574 Olauson i Västerås Daniel 1:447
Q6158007 Olsson i Norrhyttan senare Golvvasta o Närlinge Carl Gustaf 1:326 5:82
Q6187601 Olsson i Staxäng Ernst V 4:133
Q6028693 Onsjö i Onsjö senare Larv Johannes 4:329
Q6029305 Orgård i Undersvik senare Tomterna Per L 5:165
Q53701 Pehrsson i Bramstorp Axel A 3:255
Q53701 Pehrsson-Bramstorp i Bramstorp Axel A 3:255
Q5724702 Persson i Falla Gunnar 4:506
Q6029305 Persson i Undersvik senare Tomterna Per L 5:165
Q6045550 Petersson i Ugglekull A Fredrik 2:232
Q6045550 Petersson i Ugglekull Peter 2:232
Q5798778 Pettersson i Hällbacken Gustaf K 5:88
Q4974518 Renström i Kumla N Lena 4:508
Q4974518 Renström-Ingenäs i Kumla N Lena 4:508
Q111493181 Rönnebäck i Östersund Anna E 5:305
Q6082121 Rubbestad i Vrine Axel L 4:230
Q6086540 Råstock i Råstock senare Björkbrotorp Anders 4:470
Q5653837 Skoglösa i Skoglösa senare Önnestad J Harald 3:126
Q6187601 Staxäng i Staxäng Ernst V 4:133
Q6192954 Stråle af Ekna i Stockholm G Holdo 1:174
Q6192954 Stråle af Ekna i Stockholm Wilhelm 1:268
Q5394979 Svensson i Eskhult Ernst W 4:354
Q6213695 Tornegård Oscar S 2:313
Q6215663 Trolle-Bonde i Trolleholm Carl J 3:198
Q6215672 Trolle-Wachtmeister i Trolle-Ljungby Axel K 3:154
Q6215672 Wachtmeister af Johannishus i Trolle-Ljungby Axel K 3:154
Q4951427 Wallerius i Göteborg A Stina M 4:167
Q6224947 Vigelsbo i Mamre Sven J 1:422
Q4990681 Witzell i Karlshamn Hugo S 3:85
Q4990681 Wohlin i Stocksund Margit 1:277
Q6256173 Öberg i Domsjö Carl Jonas 5:269
Q6256173 Öberg i Domsjö Erik 5:269

@salgo60
Contributor

salgo60 commented Apr 15, 2023

  • Video showing how to use OpenRefine to find candidates where the same Q-number is used for different people

My candidates for mistakes (see the video):


Columns: Qnumber, I Riksdagen kallad (name used in the Riksdag), Förnamn (given names), Band (volume:page), assessment
Q5548010 Alströmer i Östad Jonas 4:199 Wrong I guess Q5552501
Q5548010 Alströmer i Östad C Thore J 4:199 Correct
Q5554662 Andersson i Löbbo K Abel M 2:139 Wrong I guess Q5553810
Q5554662 Andersson i Södergården senare Löbbo Gustav E 2:140 Correct
Q5566043 Barnekow i Sörbytorp Adolf G 3:105 Wrong I guess Q5566027
Q5566043 Barnekow i Sörbytorp K Ragnar M-D 3:106 Correct
Q53626 De Geer af Finspång i Hanaskog Arvid L G 3:113 Wrong I guess Q5618613
Q53626 De Geer af Finspång i Hanaskog senare Kristianstad G Louis 3:113 Correct
Q5711044 Enander i Borås A Johan 4:220 Correct
Q5711044 Enander i Lillhärdal S Johan 5:294 Wrong I guess Q5711053
Q16945247 Gustafsson i Borås J Axel T 4:226 Correct
Q16945247 Gustafsson i Jönköping senare Lindesberg T D M Axel 2:155 Wrong I guess Q105086918
Q5886669 Gustafsson i Brånsta Lars Erik 4:486 Wrong I guess Q117714137
Q5886669 Johansson i Brånsta John H 4:494 Correct
Q5955465 Linder i Stockholm Nils 1:132 Correct
Q5955465 Linder i Stockholm senare Malmö F William 3:133 Wrong I guess Q5955520
Q5955991 Lindgren i Örebro Adolf 4:499
Q5955991 Lindgren i Örebro Göran O V 4:499
Q6014609 Magnell i Kristinehamn Johan Jakob 4:421
Q6012727 Nisser i Falun Ernst M W 5:79
Q6012727 Nisser i Falun S Martin 5:79
Q6014609 Nordenfelt i Kristinehamn Carl H 4:427
Q6020574 Nyström i Stockholm Carl L H 1:153 4:131
Q6020574 Olauson i Västerås Daniel 1:447
Q6045550 Petersson i Ugglekull A Fredrik 2:232
Q6045550 Petersson i Ugglekull Peter 2:232
Q6192954 Stråle af Ekna i Stockholm G Holdo 1:174
Q6192954 Stråle af Ekna i Stockholm Wilhelm 1:268
Q4990681 Witzell i Karlshamn Hugo S 3:85
Q4990681 Wohlin i Stocksund Margit 1:277
Q6256173 Öberg i Domsjö Carl Jonas 5:269
Q6256173 Öberg i Domsjö Erik 5:269

@MansMeg
Collaborator Author

MansMeg commented Apr 15, 2023

  1. It's not clear to me what you refer to as duplicates. Are they duplicates in the Emil/Magnus test file or duplicate Wikidata entries? To solve duplicate entries in Wikidata (i.e. multiple Wikidata IDs for the same person), we need our own persistent IDs, see Make metadata IDs persistant #269. What Magnus has identified are errors/duplicates of names that are incorrect. This should be checked (as Magnus has started to do). I will open a new issue to look into these errors in the Emil/Magnus test file. @salgo60, please move the list above over to that issue. Then we can work with that.
  2. I will separate that out as its own issue. Maybe something @emla5688 and @salgo60 could check?
  3. Is this a problem? I'm not sure I follow. Do they have an iort in the proceedings?

@MansMeg
Collaborator Author

MansMeg commented Apr 15, 2023

@BobBorges I updated the issue with a link to where files can be found (to check that you have the latest versions).

@BobBorges
Collaborator

"Duplicate" for me was a wiki id used more than once where the name + iort strings didn't match. I didn't really check if there were duplicate identical rows... moving over to the other issues now.
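
That definition can be sketched as a small grouping step. The function name and the (wiki_id, name, iort) row layout are assumptions for illustration; the Enander rows are taken from the list earlier in this thread, and the repeated Alm row is added only to show that identical rows are not flagged:

```python
from collections import defaultdict


def conflicting_ids(rows):
    """Return wiki IDs that occur with more than one distinct (name, iort) pair."""
    seen = defaultdict(set)
    for wiki_id, name, iort in rows:
        seen[wiki_id].add((name.strip(), iort.strip()))
    return {wid: sorted(pairs) for wid, pairs in seen.items() if len(pairs) > 1}


rows = [
    ("Q5711044", "Enander", "Borås"),
    ("Q5711044", "Enander", "Lillhärdal"),
    ("Q4963023", "Alm", "Stockholm"),
    ("Q4963023", "Alm", "Stockholm"),  # identical repeat, not a conflict
]

print(conflicting_ids(rows))
```

This only flags candidates: a mismatch can be a name change for the same person or a wrong ID for a different person, so each hit still needs a manual check against the books.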

@fredrik1984
Collaborator

One thing regarding the correct number of MPs in parliament. For the unicameral Riksdag there were 350 MPs from 1971–1976, then 349 MPs. The speaker of the house (talmannen) is also an MP, but NOT included in the 349. In the capacity of speaker, he/she has a replacement in the Riksdag. The three vice speakers of the house are at the same time ordinary MPs: they can vote, but not participate in debates.

I am not sure how this worked during the Bicameral Riksdag.

@BobBorges
Collaborator

I was also wondering how we should handle the substitutes – is it a role we want to track (in our metadata and on wikidata), or it's outside the scope? It came up because one guy I looked up yesterday was a sub for part of a period and I guess got promoted to RL. Same guy was a sub in the previous riksdagen year.

@fredrik1984
Collaborator

Yes, the ordinary/replacement role of an MP is an important part of the MP database. Before 1974 there was no designated replacement for each ordinary MP. Instead, if an MP left before the mandate period had ended, a new one was selected, I think based on the previous election. Hence, if I have understood it correctly, the replacement role is only important for the period from 1974 onwards.

In the bio books for 1971–1993/94 the replacement role is specifically written out. If an MP sat for less than about 240 days (a riksmöte), he or she did not get a bio.

@MansMeg

This comment was marked as resolved.

@fredrik1984

This comment was marked as resolved.

@BobBorges
Collaborator

Lots of chaos in this thread -- closing, with the remaining issues tracked in the next-generation issues:

swerik-project/riksdagen-persons#15
swerik-project/riksdagen-persons#14

6 participants