# Genomic Data of 2155 Dogs

For our third empirical study, we use genomic data of 2155 dogs published by Morrill et al. (2022; https://doi.org/10.1126/science.abk0639).

To run this notebook, you first need to download the supplementary data from the Dryad repository: https://doi.org/10.5061/dryad.g4f4qrfr0.

You only need to download the `DarwinsArk.zip` and `GeneticData.zip` archives. Move them into a new directory called `DogGenomes` and unpack them there.
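
The unpacking step can also be scripted. The sketch below uses Python's standard `zipfile` module and extracts both archives directly into `DogGenomes`. `unpack_archives` is a hypothetical helper, and the sketch assumes that each archive contains a top-level folder (`DarwinsArk/`, `GeneticData/`), which is what the paths in the next cell expect.

```python
import pathlib
import zipfile


def unpack_archives(archives, target_dir="DogGenomes"):
    """Extract each zip archive into target_dir; return the top-level entries extracted.

    Assumes each archive contains a top-level folder (e.g. DarwinsArk/) so that the
    paths used later in this notebook resolve to DogGenomes/DarwinsArk etc.
    """
    target = pathlib.Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    top_level = set()
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
            # First path component of every member, e.g. "DarwinsArk"
            top_level.update(name.split("/")[0] for name in zf.namelist())
    return sorted(top_level)


# Usage (assumes both archives sit in the current working directory):
# unpack_archives(["DarwinsArk.zip", "GeneticData.zip"])
```

If the returned names are not `DarwinsArk` and `GeneticData`, the archives are laid out differently than assumed, and the paths below need adjusting.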

Using the scripts provided under DOI https://doi.org/10.5281/zenodo.5808329, we first restricted the dataset to dogs with genetic data. We then extracted two groups of dogs:
- dogs with confirmed purebred status (n = 601) -> `confirmed_purebred`
- mutt dogs (n = 1200) -> `mutt`
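
Each group corresponds to a boolean column in the filtered table, so collecting the member IDs is a simple mask. A minimal sketch, where `group_ids` is a hypothetical helper and the toy frame mirrors the first rows of the table at the end of this section. It assumes the group columns are plain booleans without missing values, as produced by the `&` expressions in the filtering cell.

```python
import pandas as pd


def group_ids(dogs: pd.DataFrame, groups=("confirmed_purebred", "mutt")) -> dict:
    """Map each boolean group column to the list of dog IDs flagged True in it."""
    return {group: dogs.loc[dogs[group], "id"].tolist() for group in groups}


# Toy frame mirroring the first rows of the table at the end of this section
toy = pd.DataFrame({
    "id": [3, 8, 14, 23],
    "confirmed_purebred": [True, True, False, False],
    "mutt": [False, False, True, True],
})
print(group_ids(toy))  # {'confirmed_purebred': [3, 8], 'mutt': [14, 23]}
```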

Using PLINK, we extracted the genetic data for the dogs in each group by their IDs and converted it to EIGENSTRAT format.
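
The ID-based extraction can be sketched as follows. PLINK's `--keep` option reads a whitespace-delimited file with the family ID (FID) and individual ID (IID) in its first two columns; the hypothetical helper `write_keep_file` builds such a file from a `.fam` file (six columns: FID, IID, father, mother, sex, phenotype). It assumes that the genetic data is a PLINK binary fileset and that the IID column holds the Darwin's Ark dog ID. Neither is stated above, so check both against the files in `GeneticData`.

```python
import pathlib

import pandas as pd

# Standard column order of a PLINK .fam file (no header line)
FAM_COLUMNS = ["fid", "iid", "father", "mother", "sex", "phenotype"]


def write_keep_file(fam_path, dog_ids, keep_path):
    """Write a PLINK --keep file (FID, IID per line) for samples whose IID is in dog_ids.

    Assumption: the IID column of the .fam file holds the Darwin's Ark dog ID.
    Returns the number of samples written.
    """
    fam = pd.read_csv(fam_path, sep=r"\s+", header=None, names=FAM_COLUMNS, dtype=str)
    wanted = {str(dog_id) for dog_id in dog_ids}
    keep = fam.loc[fam["iid"].isin(wanted), ["fid", "iid"]]
    keep.to_csv(pathlib.Path(keep_path), sep="\t", header=False, index=False)
    return len(keep)
```

With PLINK 1.9, the subset could then be written with `plink --bfile <prefix> --keep confirmed_purebred.keep --make-bed --out confirmed_purebred`, where `<prefix>` is a placeholder for the fileset's actual name. Comparing the returned count with the group size is a quick way to spot IDs that did not match.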

Note that we did not apply additional LD pruning or MAF filtering, as the provided data was already filtered by the authors of the study.
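
Because no additional MAF filter was applied, it can be worth checking the lowest minor allele frequency within each extracted group: allele frequencies in a subset can differ from those in the full cohort on which the original filter was applied. A minimal sketch, assuming the EIGENSTRAT `.geno` layout (one line per SNP, one character per individual giving the number of reference alleles, `9` for a missing genotype); `minor_allele_frequencies` is a hypothetical helper.

```python
import numpy as np


def minor_allele_frequencies(geno_lines):
    """Compute per-SNP minor allele frequencies from EIGENSTRAT .geno lines.

    Each line holds one SNP; each character is the number of reference alleles
    (0, 1 or 2) carried by one individual, with 9 marking a missing genotype.
    """
    mafs = []
    for line in geno_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        calls = np.array([int(c) for c in line])
        observed = calls[calls != 9]
        if observed.size == 0:
            mafs.append(np.nan)  # SNP missing in every individual
            continue
        # Reference allele frequency among the non-missing diploid genotypes
        ref_freq = observed.sum() / (2 * observed.size)
        mafs.append(min(ref_freq, 1 - ref_freq))
    return np.array(mafs)
```

For example, `with open(path) as fh: mafs = minor_allele_frequencies(fh)` followed by `np.nanmin(mafs)` gives the smallest MAF in a group, where `path` points to that group's `.geno` file.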

```python
import pandas as pd
import pathlib

# Root directory of the unpacked Dryad download and its two subdirectories
base_dir = pathlib.Path("DogGenomes")
darwins_ark = base_dir / "DarwinsArk"
genetic_data = base_dir / "GeneticData"
```

```python
# Filtering according to the scripts provided under DOI https://doi.org/10.5281/zenodo.5808329
dogs = pd.read_csv(darwins_ark / "DarwinsArk_20191115_dogs.csv")
answers = pd.read_csv(darwins_ark / "DarwinsArk_20191115_answers.csv")
breedcalls = pd.read_csv(darwins_ark / "DarwinsArk_20191115_breedcalls.csv")

# Keep dogs that were surveyed or have genetic breed calls
dogs_surveyed = answers.dog.unique()
dogs_filtered = dogs.loc[dogs.id.isin(dogs_surveyed) | dogs.id.isin(breedcalls.dog)].copy()
dogs_filtered["surveyed"] = dogs_filtered.id.isin(dogs_surveyed)

# Group flags are only kept for surveyed dogs
dogs_filtered["candidate_purebred"] = dogs_filtered.cand & dogs_filtered.surveyed
dogs_filtered["confirmed_purebred"] = dogs_filtered.conf & dogs_filtered.surveyed
dogs_filtered["mutt"] = dogs_filtered.mutt & dogs_filtered.surveyed

# Restrict to dogs with genetic breed calls, i.e. dogs with genetic data
dogs_to_use = dogs_filtered.loc[lambda x: x.id.isin(breedcalls.dog.unique())]
dogs_to_use
```

| Unnamed: 0 | id | sex | sterilized | birth_date | flagged_deceased_date | region | environ | origin | size | breed1 | ... | owner_label | responses | response_rate | mutt | cand | conf | consensus_breed | surveyed | candidate_purebred | confirmed_purebred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | female | yes | 2010-10-28 |  | Northeast (New England) | suburban | breeder | 1.0 | boston terrier | ... |  | 118.0 | 1.000000 | False | True | True | boston terrier | True | True | True |
| 5 | 8 | male | yes | 2014-06-17 |  | Northeast (New England) | urban | breeder | 2.0 | shiba inu | ... |  | 118.0 | 1.000000 | False | True | True | shiba inu | True | True | True |
| 11 | 14 | female | yes | 2014-01-01 |  | Northeast (New England) | urban | rescue | 3.0 |  | ... |  | 116.0 | 0.983051 | True | False | False |  | True | False | False |
| 16 | 19 | male | yes | 2011-03-02 |  | Northeast (New England) | suburban | breeder |  | leonberger | ... |  |  |  | False | True | True | leonberger | False | False | False |
| 20 | 23 | male | yes | 2011-10-15 |  | Northeast (New England) | urban | rescue | 1.0 | jack russell terrier | ... |  | 117.0 | 0.991525 | True | False | False |  | True | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 23158 | 23577 | female | yes | 2019-05-20 |  | Midwest (East North Central) | urban | rescue |  |  | ... |  | 118.0 | 1.000000 | True | False | False |  | True | False | False |
| 23169 | 23588 | female | yes | 2018-05-23 |  | West (Pacific) | urban | rescue | 2.0 |  | ... |  | 118.0 | 1.000000 | True | False | False |  | True | False | False |
| 23400 | 23822 | male | yes | 2016-06-30 |  |  |  |  |  | poodle | ... |  |  |  | False | True | True | poodle | True | True | True |
| 23502 | 23925 | female | yes | 2008-01-01 |  |  |  |  |  | boxer | ... |  |  |  | False | False | False |  | False | False | False |
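
As a sanity check on the filtering, the group sizes in `dogs_to_use` can be compared with those stated at the top of this notebook (n = 601 and n = 1200). A minimal sketch, where `check_group_sizes` is a hypothetical helper that counts the `True` entries in each group column; it is exercised here on toy data, since running it on `dogs_to_use` requires the downloaded files.

```python
import pandas as pd


def check_group_sizes(dogs: pd.DataFrame, expected: dict) -> tuple:
    """Count dogs flagged True in each group column and report deviations from `expected`.

    Returns (counts, mismatches), where mismatches maps a group to (observed, expected).
    """
    counts = {group: int(dogs[group].sum()) for group in expected}
    mismatches = {
        group: (counts[group], n)
        for group, n in expected.items()
        if counts[group] != n
    }
    return counts, mismatches


# Usage in the notebook (needs the downloaded data):
# counts, mismatches = check_group_sizes(dogs_to_use, {"confirmed_purebred": 601, "mutt": 1200})
```

An empty `mismatches` dictionary would indicate that the filtering reproduces the stated group sizes; any entry shows the observed and expected count for that group.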
