This repository has been archived by the owner on May 8, 2024. It is now read-only.

Check duplicates in the MP test file #276

Closed
MansMeg opened this issue Apr 15, 2023 · 6 comments

@MansMeg
Collaborator

MansMeg commented Apr 15, 2023

In #265 we have identified that there are duplicate entries in the test files. @salgo60 has started to look at this in #265. We need to double-check that these duplicates are correct.

@BobBorges
Collaborator

BobBorges commented Apr 15, 2023

Duplicates in the sense of multiple identical rows aren't a problem, but I found some wiki_ids that don't have identical name strings. These are:

  • Q6215672 (Axel K Wachtmeister af Johannishus i Trolle-Ljungby)
  • -- Q6215672 (Axel K Trolle-Wachtmeister i Trolle-Ljungby)
  • -- Q6215672 (Axel K Wachtmeister af Johannishus i Trolle-Ljungby)
  • Q4964561 (Thyra C Löfqvist i Mörrum)
  • -- Q4964561 (Thyra C Lundberg-Löfqvist i Mörrum)
  • -- Q4964561 (Thyra C Löfqvist i Mörrum)
  • Q6082121 (Axel L Rubbestad i Vrine)
  • -- Q6082121 (Axel L Hansson i Rubbestad senare Vrine)
  • -- Q6082121 (Axel L Rubbestad i Vrine)
  • Q724410 (L Gabriel Romanus i Sollentuna)
  • -- Q724410 (L Gabriel Romanus i Rotebro)
  • -- Q724410 (L Gabriel Romanus i Sollentuna)
  • Q4981208 (Ingrid E Sundberg i Bjuv, senare Skanör)
  • -- Q4981208 (Ingrid E Sundberg i Bjuv)
  • -- Q4981208 (Ingrid E Sundberg i Bjuv, senare Skanör)
  • Q4974616 (Anne Sörensen i Stockholm)
  • -- Q4974616 (Anne Rhenman i Stockholm)
  • -- Q4974616 (Anne Sörensen i Stockholm)
  • Q6340136 (Jan- Erik Wikström i Stockholm, senare Österskär)
  • -- Q6340136 (Jan-Erik Wikström i Stockholm)
  • -- Q6340136 (Jan- Erik Wikström i Stockholm, senare Österskär)
  • Q6250583 (Bertil J Zachrisson i Spånga, senare Kista)
  • -- Q6250583 (Bertil J Zachrisson i Bromma)
  • -- Q6250583 (Bertil J Zachrisson i Spånga, senare Kista)
  • Q5711044 (S Johan Enander i Lillhärdal)
  • -- Q5711044 (A Johan Enander i Borås)
  • -- Q5711044 (S Johan Enander i Lillhärdal)
  • Q5955465 (F William Linder i Stockholm senare Malmö)
  • -- Q5955465 (Nils Linder i Stockholm)
  • -- Q5955465 (F William Linder i Stockholm senare Malmö)
  • Q6012727 (S Martin Nisser i Falun)
  • -- Q6012727 (Ernst M W Nisser i Falun)
  • -- Q6012727 (S Martin Nisser i Falun)
  • Q6014609 (Carl H Nordenfelt i Kristinehamn)
  • -- Q6014609 (Johan Jakob Magnell i Kristinehamn)
  • -- Q6014609 (Carl H Nordenfelt i Kristinehamn)
  • Q6192954 (Wilhelm Stråle af Ekna i Stockholm)
  • -- Q6192954 (G Holdo Stråle af Ekna i Stockholm)
  • -- Q6192954 (Wilhelm Stråle af Ekna i Stockholm)
  • Q6215663 (Carl J Trolle-Bonde i Trolleholm)
  • -- Q6215663 (Carl J Bonde i Trolleholm)
  • -- Q6215663 (Carl J Trolle-Bonde i Trolleholm)
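
A check like the one that produced this list could be sketched as follows. This is only an illustration, not the code actually used: the column name `name` and the helper `conflicting_names` are assumptions, and the sample rows are copied from the list above.

```python
import csv
import io
from collections import defaultdict


def conflicting_names(rows, id_col="wiki_id", name_col="name"):
    """Return {wiki_id: sorted distinct names} for ids with more than one name.

    Column names are assumptions; adjust them to the real file header.
    """
    names = defaultdict(set)
    for row in rows:
        names[row[id_col]].add(row[name_col].strip())
    # Identical repeated rows collapse in the set, so only real conflicts remain.
    return {wid: sorted(ns) for wid, ns in names.items() if len(ns) > 1}


# Small in-memory sample standing in for the real CSV file.
sample = io.StringIO(
    "wiki_id,name\n"
    "Q724410,L Gabriel Romanus i Sollentuna\n"
    "Q724410,L Gabriel Romanus i Rotebro\n"
    "Q724410,L Gabriel Romanus i Sollentuna\n"
    "Q4964561,Thyra C Löfqvist i Mörrum\n"
    "Q4964561,Thyra C Löfqvist i Mörrum\n"
)
conflicts = conflicting_names(csv.DictReader(sample))
for wid, ns in conflicts.items():
    print(wid, ns)
```

In this sample the two identical Q4964561 rows are not reported, while Q724410 is flagged because it carries two different name strings.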

There is now a file at input/known_MPs/catalog.csv (on branch mpqc), which is a joined and cleaned version of the Emil / Magnus list. Where a wiki_id in the above list refers to the same individual, one of the offending lines can simply be removed from the csv. Where it refers to multiple individuals, they have to be sorted out some other way.
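
For the same-individual case, removing the offending line might look roughly like this. It is a sketch under assumptions: the column name `name`, the helper `drop_rejected`, and the `keep` mapping are hypothetical, and which name string to keep is a manual decision (the choice below is purely illustrative).

```python
import csv
import io


def drop_rejected(rows, keep, id_col="wiki_id", name_col="name"):
    """Drop rows whose id is in `keep` but whose name differs from the chosen one.

    Ids that are not in `keep` are left untouched, so unresolved cases survive.
    """
    return [r for r in rows if keep.get(r[id_col], r[name_col]) == r[name_col]]


# Hypothetical, hand-made decision: one name string per resolved wiki_id.
keep = {"Q6215663": "Carl J Trolle-Bonde i Trolleholm"}

# Small in-memory sample standing in for catalog.csv.
sample = io.StringIO(
    "wiki_id,name\n"
    "Q6215663,Carl J Trolle-Bonde i Trolleholm\n"
    "Q6215663,Carl J Bonde i Trolleholm\n"
    "Q6014609,Carl H Nordenfelt i Kristinehamn\n"
    "Q6014609,Johan Jakob Magnell i Kristinehamn\n"
)
cleaned = drop_rejected(list(csv.DictReader(sample)), keep)

# Write the cleaned rows back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["wiki_id", "name"])
writer.writeheader()
writer.writerows(cleaned)
print(out.getvalue())
```

Rows for Q6014609 are kept as they are, since that id is not in `keep` and still needs to be sorted out by hand.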

@BobBorges
Collaborator

@salgo60, I got two lists from Emil yesterday, which have been compiled into the catalog.csv file mentioned above. I assumed these files were up to date.

@MansMeg
Collaborator Author

MansMeg commented Apr 15, 2023

@BobBorges I have now opened a PR for your work. Please move the files to corpus/quality_assesment/known_mp instead. I have a README there as well with info.

We should store data there that is used for quality control.

@BobBorges
Collaborator

OK. Let me move the catalog.csv file to where it ought to be and update path info in tests/db.py before you merge.

@MansMeg
Collaborator Author

MansMeg commented Apr 15, 2023

I think we need to fix the stuff that fails before we merge (main branch tests should always pass). But it is nice to keep the work in a PR so we can see the changes and discuss the code.

@BobBorges
Collaborator

Closed as duplicate of #316
