# Recreating the Mathieson et al. (2015) dataset analyses
In their analyses, Mathieson et al. project 230 ancient Eurasian samples onto a PCA of the West Eurasian subset of the Human Origins dataset (NearEastPublic).

Mathieson, I., Lazaridis, I., Rohland, N. et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528, 499–503 (2015). https://doi.org/10.1038/nature16152

**Note: run the NearEastPublic.ipynb notebook first; these analyses need the West Eurasian dataset it creates.**

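```python
# Download the genotype archive, unpack it and delete the archive
! wget https://reich.hms.harvard.edu/sites/reich.hms.harvard.edu/files/inline-files/MathiesonEtAl_genotypes_April2016.tar.gz
! tar -xf MathiesonEtAl_genotypes_April2016.tar.gz && rm MathiesonEtAl_genotypes_April2016.tar.gz
# Remove the mtgens directory (not used here) and create the output directory for the merged dataset
! cd MathiesonEtAl_genotypes && rm -r mtgens && mkdir westEurasian_ancient
```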
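If `wget` or a POSIX shell is not available (e.g. on Windows), the same steps can be done from Python. The following is a minimal standard-library sketch; the function names `extract_and_prepare` and `fetch_genotypes` are hypothetical, and it assumes the archive unpacks into a top-level `MathiesonEtAl_genotypes/` directory, as the `cd` above implies.

```python
import pathlib
import shutil
import tarfile
import urllib.request

URL = ("https://reich.hms.harvard.edu/sites/reich.hms.harvard.edu/files/"
       "inline-files/MathiesonEtAl_genotypes_April2016.tar.gz")


def extract_and_prepare(archive, dest=pathlib.Path(".")):
    """Unpack the archive into dest, drop mtgens/ and create westEurasian_ancient/.

    Assumes the archive contains a top-level MathiesonEtAl_genotypes/ directory.
    """
    dest = pathlib.Path(dest)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter when this Python version provides it
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    base = dest / "MathiesonEtAl_genotypes"
    # Like `rm -r mtgens`, but does not fail if the directory is already gone
    shutil.rmtree(base / "mtgens", ignore_errors=True)
    (base / "westEurasian_ancient").mkdir(exist_ok=True)
    return base


def fetch_genotypes(dest=pathlib.Path(".")):
    """Download the archive, prepare the directories, then delete the archive."""
    dest = pathlib.Path(dest)
    archive = dest / "MathiesonEtAl_genotypes_April2016.tar.gz"
    urllib.request.urlretrieve(URL, archive)
    base = extract_and_prepare(archive, dest)
    archive.unlink()  # only reached if extraction succeeded, like `&& rm`
    return base
```

Calling `fetch_genotypes()` from the notebook's working directory should leave the same directory layout as the shell commands.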

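```python
import pathlib

# Local helper module providing merge_datasets() and indfile_to_dataframe()
from filter_merge_utils import *

base_dir = pathlib.Path("MathiesonEtAl_genotypes")
# Dataset prefixes: each stands for a set of .geno/.snp/.ind files
west_eurasia_prefix = pathlib.Path("NearEastPublic") / "westEurasia" / "HumanOriginsPublic2068.westEurasian"
ancient_prefix = base_dir / "full230"
```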
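Each prefix names an EIGENSOFT-format dataset: three files sharing the prefix, `.geno` (genotypes), `.snp` (SNP metadata) and `.ind` (one line per individual: sample ID, sex, population label). The sketch below is a stand-in and not the `filter_merge_utils` implementation; `eigenstrat_files` and `read_ind` are hypothetical names. It also shows why file names are built with f-strings: `Path.with_suffix` would replace the `.westEurasian` part of `HumanOriginsPublic2068.westEurasian` instead of appending to it.

```python
import pathlib
from collections import namedtuple

Individual = namedtuple("Individual", ["sample_id", "sex", "population"])


def eigenstrat_files(prefix):
    """Return the .geno, .snp and .ind paths belonging to a dataset prefix."""
    # Append the extension; with_suffix() would overwrite '.westEurasian'
    return {ext: pathlib.Path(f"{prefix}.{ext}") for ext in ("geno", "snp", "ind")}


def read_ind(path):
    """Parse a .ind file: whitespace-separated sample ID, sex and population label."""
    individuals = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                individuals.append(Individual(*fields[:3]))
    return individuals
```

For example, `eigenstrat_files(west_eurasia_prefix)["ind"]` ends in `HumanOriginsPublic2068.westEurasian.ind`, whereas `west_eurasia_prefix.with_suffix(".ind")` would point at a non-existent `HumanOriginsPublic2068.ind`.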

### Merging the West Eurasian samples with the ancient samples
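Conceptually, merging two such datasets means keeping the SNPs present in both and concatenating their individuals. The sketch below illustrates only the SNP-matching step, assuming `.snp` lines of the form `ID chromosome genetic_position physical_position ref alt`. It is not a description of what `merge_datasets` actually does (for instance, how it treats strand flips or allele mismatches), and `shared_snps` is a hypothetical name.

```python
def parse_snp_line(line):
    """Split a .snp line into (snp_id, chromosome, physical position, allele pair)."""
    snp_id, chrom, _genetic_pos, physical_pos, ref, alt = line.split()[:6]
    return snp_id, chrom, int(physical_pos), frozenset((ref, alt))


def shared_snps(snp_lines_a, snp_lines_b):
    """Return IDs of SNPs found in both inputs with the same position and alleles.

    Alleles are compared as unordered pairs, so a SNP whose ref/alt are swapped
    between datasets still matches (its genotypes would then need recoding).
    """
    index_b = {}
    for line in snp_lines_b:
        if line.strip():  # skip blank lines
            snp_id, *record = parse_snp_line(line)
            index_b[snp_id] = tuple(record)
    shared = []
    for line in snp_lines_a:
        if line.strip():
            snp_id, *record = parse_snp_line(line)
            if index_b.get(snp_id) == tuple(record):
                shared.append(snp_id)
    return shared
```

Both arguments can be open file handles, e.g. the `.snp` files of `west_eurasia_prefix` and `ancient_prefix`.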

```python
merged_prefix = base_dir / "westEurasian_ancient" / "Mathieson230_ModernWestEurasia"

# Merge the modern West Eurasian and the ancient datasets.
# The ancient dataset includes some samples we want to exclude before the PCA (e.g. chimpanzee sequences).
merge_datasets(
    prefix_ds1=west_eurasia_prefix,
    prefix_ds2=ancient_prefix,
    prefix_out=merged_prefix,
    redo=False
)

# Collect the population labels used by the ancient samples
ancient_ind_file = pathlib.Path(f"{ancient_prefix}.ind")
ancient_ind_data = indfile_to_dataframe(ancient_ind_file)
ancient_populations = set(ancient_ind_data.population.unique())

# Finally, write the modern population names to a file (one per line) for the later PCA projection
ind_file = pathlib.Path(f"{merged_prefix}.ind")
ind_df = indfile_to_dataframe(ind_file)

modern = [p for p in ind_df.population.unique() if p not in ancient_populations]
modern_populations = base_dir / "westEurasian_ancient" / "modern.poplist.txt"
# write_text() opens, writes and closes the file in one call
modern_populations.write_text("\n".join(modern) + "\n")
```
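The population list lets the PCA axes be computed from the modern samples only, with the ancient samples, which have many missing calls, projected onto those axes afterwards (presumably via smartpca's `poplistname` parameter). The toy NumPy sketch below illustrates that idea under simplifying assumptions: genotypes form a samples × SNPs matrix of 0/1/2 with `NaN` for missing calls, SNPs are only mean-centred (no smartpca-style normalisation), and each ancient sample is projected by least squares on its observed SNPs, in the spirit of smartpca's `lsqproject` option. `fit_pcs` and `project` are hypothetical names; this is not the pipeline used in the paper.

```python
import numpy as np


def fit_pcs(modern, n_components=2):
    """Compute principal axes from a complete modern genotype matrix (samples x SNPs)."""
    means = modern.mean(axis=0)
    centred = modern - means
    # Rows of vt are orthonormal principal axes in SNP space
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return means, vt[:n_components]


def project(sample, means, axes):
    """Least-squares projection of one sample using only its non-missing SNPs."""
    observed = ~np.isnan(sample)
    x = sample[observed] - means[observed]
    a = axes[:, observed].T  # observed SNPs x components
    coords, *_ = np.linalg.lstsq(a, x, rcond=None)
    return coords
```

For a sample with no missing data, `project` returns the same coordinates as the ordinary PC scores; with missing data, it uses only the SNPs that sample actually has.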