# Loading a PED file

Pedigree files come in multiple parts:
 * a DAT file, defining the phenotypic and genotypic information contained in ...
 * a PED file, describing the family structure, and the geno/phenotypic information per individual
 * a MAP file, describing the locations of markers defined in the DAT file
 
For a semi-useful description, see http://csg.sph.umich.edu/abecasis/merlin/tour/input_files.html
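
To make the relationship between the DAT and PED parts concrete, here is a minimal, standalone sketch of how the two could be read together. It is not how biu parses them. The file contents below are hypothetical, the helper names `parse_dat` and `parse_ped` are made up for illustration, and the sketch assumes the Merlin-style layout described at the link above: five fixed PED columns (family, individual, father, mother, sex), `0` for a missing parent, and two columns per marker.

```python
# Hypothetical file contents in a Merlin-style layout (not the tutorial's example files).
DAT_TEXT = """\
A some_disease
T some_trait
M some_marker
"""

PED_TEXT = """\
1 1 0 0 1 1 x 3 3
1 2 0 0 2 1 x 4 4
1 3 1 2 2 2 1.234 3 4
"""


def parse_dat(text):
    """Return a list of (type_code, name) pairs, one per non-empty DAT line."""
    features = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        type_code, name = line.split(None, 1)
        features.append((type_code, name))
    return features


def parse_ped(text, features):
    """Return one dict per individual, using the DAT features to label the data columns."""
    individuals = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        family_id, person_id, father, mother, sex = fields[:5]
        rest = fields[5:]
        values = {}
        pos = 0
        for type_code, name in features:
            if type_code == "M":
                # Assumption: a marker occupies two columns, one per allele.
                values[name] = "/".join(rest[pos:pos + 2])
                pos += 2
            else:
                values[name] = rest[pos]
                pos += 1
        individuals.append({
            "family": family_id,
            "id": person_id,
            "father": None if father == "0" else father,
            "mother": None if mother == "0" else mother,
            "sex": sex,
            "features": values,
        })
    return individuals


features = parse_dat(DAT_TEXT)
people = parse_ped(PED_TEXT, features)
print(people[2]["features"])
```

The point of the sketch is that the PED file cannot be interpreted on its own: the DAT file decides how many data columns there are and what each one means.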

In [1]:
import biu

## Load a PED file, together with a DAT file
The example file defines a single family, and the DAT file declares four features for each individual:
 * Affection status: 'some_disease'
 * Quantitative Trait: 'some_trait'
 * Marker: 'some_marker'
 * Marker: 'another_marker'

In [2]:
ped = biu.formats.PED('example_files/example_pedigree.ped', 'example_files/example_pedigree.dat')

In [3]:
print(ped)

PED object
 Where: example_files/example_pedigree.ped
 DAT file: example_files/example_pedigree.dat
 Families: 1
  Founders: 3
  Total: 6
 Features: 4
  Affections: 1
  Covariates: 0
  Traits: 1
  Markers: 2



## Accessing families in the pedigree

In [4]:
for familyID in ped.families:
    print(ped[familyID])

Pedigree Family
 Members: 6
 Founders: 3



## Accessing members of a family

In [5]:
family = ped["1"]
for memberID in family:
    print(family[memberID])
    print('\n')

Pedigree Individual
 Family ID: 1
 Individual ID: 1
 Mother/Father ID: None/None
 Gender: m
 Affection status:
  some_disease : 1


Pedigree Individual
 Family ID: 1
 Individual ID: 2
 Mother/Father ID: None/None
 Gender: f
 Affection status:
  some_disease : 1


Pedigree Individual
 Family ID: 1
 Individual ID: 3
 Mother/Father ID: None/None
 Gender: m
 Affection status:
  some_disease : 1


Pedigree Individual
 Family ID: 1
 Individual ID: 4
 Mother/Father ID: 2/1
 Gender: f
 Affection status:
  some_disease : 1


Pedigree Individual
 Family ID: 1
 Individual ID: 5
 Mother/Father ID: 4/3
 Gender: f
 Affection status:
  some_disease : 2


Pedigree Individual
 Family ID: 1
 Individual ID: 6
 Mother/Father ID: 4/3
 Gender: m
 Affection status:
  some_disease : 2
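
The Mother/Father IDs printed above are enough to see where the "Founders: 3" figure comes from. The sketch below re-types those parent IDs by hand and counts founders, assuming the usual definition (a member with neither parent recorded). It does not use biu, and the `founders` helper is a made-up name.

```python
# Parent IDs copied by hand from the printed individuals: member -> (father, mother).
parents = {
    "1": (None, None),
    "2": (None, None),
    "3": (None, None),
    "4": ("1", "2"),
    "5": ("3", "4"),
    "6": ("3", "4"),
}


def founders(parents):
    """Return the IDs of members that have neither a father nor a mother recorded."""
    return [
        member_id
        for member_id, (father, mother) in parents.items()
        if father is None and mother is None
    ]


print("Members:", len(parents))
print("Founders:", len(founders(parents)))
```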




## Rename family members
You can rename family members to something that is easier to read.
If the member is a parent, the new ID is also applied to the parent references of their children.

In [6]:
family.changeMemberID("4", "Family_Mother")

print(family["Family_Mother"])
print("\n")
print(family["5"])

Pedigree Individual
 Family ID: 1
 Individual ID: Family_Mother
 Mother/Father ID: 2/1
 Gender: f
 Affection status:
  some_disease : 1


Pedigree Individual
 Family ID: 1
 Individual ID: 5
 Mother/Father ID: Family_Mother/3
 Gender: f
 Affection status:
  some_disease : 2
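
The propagation seen in the output (member 5 now lists Family_Mother as its mother) can be pictured with a small standalone sketch. This is an illustration of the idea, not biu's implementation of changeMemberID. The `change_member_id` function and the plain-dict family are assumptions made for the example.

```python
def change_member_id(members, old_id, new_id):
    """Rename a member in place and update every parent reference to the old ID."""
    if old_id not in members:
        raise KeyError(old_id)
    if new_id in members:
        raise ValueError(f"ID {new_id!r} is already in use")
    # Rebuild the mapping so the renamed member keeps its original position.
    renamed = {}
    for member_id, (father, mother) in members.items():
        key = new_id if member_id == old_id else member_id
        renamed[key] = (
            new_id if father == old_id else father,
            new_id if mother == old_id else mother,
        )
    members.clear()
    members.update(renamed)


# Hypothetical family: member -> (father, mother).
members = {
    "1": (None, None),
    "2": (None, None),
    "3": (None, None),
    "4": ("1", "2"),
    "5": ("3", "4"),
}
change_member_id(members, "4", "Family_Mother")
print(members["5"])
```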


## Delete a family member
You can remove family members from a family.
If the member is a parent, the reference to them is also removed from their children.

In [7]:
family.delMember("Family_Mother")

print(family["5"])

Pedigree Individual
 Family ID: 1
 Individual ID: 5
 Mother/Father ID: None/3
 Gender: f
 Affection status:
  some_disease : 2
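
Deleting works the other way around from renaming: references to the removed member are blanked rather than rewritten, which is why member 5 above ends up with None as its mother. A rough standalone sketch of that behaviour follows. It is not biu's code, and `del_member` here is a hypothetical helper operating on a plain dict.

```python
def del_member(members, member_id):
    """Remove a member and clear any parent reference that pointed at them."""
    del members[member_id]
    # Iterate over a snapshot so the dict can be updated safely inside the loop.
    for child_id, (father, mother) in list(members.items()):
        members[child_id] = (
            None if father == member_id else father,
            None if mother == member_id else mother,
        )


# Hypothetical family: member -> (father, mother).
members = {
    "3": (None, None),
    "Family_Mother": ("1", "2"),
    "5": ("3", "Family_Mother"),
    "6": ("3", "Family_Mother"),
}
del_member(members, "Family_Mother")
print(members["5"])
```

Note that the children keep their remaining parent, so under the "no parents recorded" definition they do not become founders.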


## Adding a new family

In [8]:
# Add a new family
newFamily   = ped.newFamily("Best_Family")

# Populate the family
newFather   = newFamily.newMember("Father", None, None, 'm')
newMother   = newFamily.newMember("Mother", None, None, 'f')
# You can reference the parents with the IDs you provided in the definition
newSon      = newFamily.newMember("Son", "Father", "Mother", 'm')
# You can also provide the individual structures that came from the definition
newDaughter = newFamily.newMember("Daughter", newFather, newMother, 'f')
newSonInLaw = newFamily.newMember("SonInLaw", None, None, 'm')
newGrandSon = newFamily.newMember("GrandSon", newSonInLaw, newDaughter, 'm')

E: More DAT features than provided. Filling with unknown values. Verify your PED/DAT file.
E: More DAT features than provided. Filling with unknown values. Verify your PED/DAT file.
E: More DAT features than provided. Filling with unknown values. Verify your PED/DAT file.
E: More DAT features than provided. Filling with unknown values. Verify your PED/DAT file.
E: More DAT features than provided. Filling with unknown values. Verify your PED/DAT file.
E: More DAT features than provided. Filling with unknown values. Verify your PED/DAT file.


## Accessing geno/phenotype information for a specific individual

In [9]:
member = family.members["6"]
print(member.features)

# Access the value
print(member.features["some_disease"])

# Set the value
member.setFeature("some_disease", 1)
print(member.features["some_disease"])

Genotype and Phenotype Object
 some_disease: 2
 some_trait: 4.321
 some_marker: 2/4
 another_marker: 2/2

2
1


### Accessing individual values

In [10]:
# Access the value
print(member.features["some_disease"])

# Set the value
member.setFeature("some_disease", 1)
print(member.features["some_disease"])

1
1


## Adding a new phenotype to all individuals

In [11]:
# Add the feature to all individuals (the value defaults to missing)
ped.addFeature("T", "Some_other_trait")
# Valid types:
# * T : Trait
# * C : Covariate
# * A : Affection status
# * M : Genetic Marker
# * S : Column to skip

# For a given family member...
member = family.members["6"]
print(member.features)

# Set the feature value.
member.setFeature("Some_other_trait", 123.5)
print(member.features)

Genotype and Phenotype Object
 some_disease: 1
 some_trait: 4.321
 some_marker: 2/4
 another_marker: 2/2
 Some_other_trait: X

Genotype and Phenotype Object
 some_disease: 1
 some_trait: 4.321
 some_marker: 2/4
 another_marker: 2/2
 Some_other_trait: 123.5



## Mask/unmask certain features
You can mask and unmask specific features. In the written file, masked features appear as 'S' columns.

In [12]:
ped.maskFeature("some_trait")
ped.unmaskFeature("some_trait")

some_trait
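
To illustrate what "appear as 'S' columns" could mean for the DAT file, here is a small sketch that swaps the type code of masked features when generating DAT lines. It is a guess at the idea, not biu's writer: `dat_lines` is a hypothetical helper, and how a masked marker (which spans two PED columns) should be written is deliberately left out.

```python
def dat_lines(features, masked):
    """Yield one DAT line per feature, using type code 'S' for masked features."""
    for type_code, name in features:
        code = "S" if name in masked else type_code
        yield f"{code} {name}"


# Feature list matching the types and names used in this tutorial.
features = [
    ("A", "some_disease"),
    ("T", "some_trait"),
    ("M", "some_marker"),
    ("M", "another_marker"),
]

for line in dat_lines(features, masked={"some_trait"}):
    print(line)
```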


## Select a subset of families
You can select a subset of families.
The entire structure is deep-copied, resulting in an entirely independent structure.
Changes made to the subset do not affect the original.

In [13]:
# Make subset
subped = ped.subset(["1"])

# Change a value in the subset
subped["1"]["1"].setFeature("some_disease", 2)

# Check the values
print("Value in subped: ", subped["1"]["1"].getFeature("some_disease"))
print("Value in original ped:", ped["1"]["1"].getFeature("some_disease"))

Value in subped:  2
Value in original ped: 1
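
The independence shown above is what Python's `copy.deepcopy` provides. The following standalone sketch reproduces the same experiment on nested dictionaries. The `subset` function and the data are stand-ins for illustration, not biu objects.

```python
import copy

# Hypothetical pedigree: family ID -> member ID -> feature values.
ped = {
    "1": {"1": {"some_disease": 1}},
    "2": {"7": {"some_disease": 2}},
}


def subset(ped, family_ids):
    """Return a new pedigree holding deep copies of the requested families."""
    return {family_id: copy.deepcopy(ped[family_id]) for family_id in family_ids}


subped = subset(ped, ["1"])

# Change a value in the subset; the original is left untouched.
subped["1"]["1"]["some_disease"] = 2

print("Value in subped:", subped["1"]["1"]["some_disease"])
print("Value in original ped:", ped["1"]["1"]["some_disease"])
```

With a shallow copy instead, both names would share the same inner dictionaries and the change would show up in the original as well.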


## Writing PED files

Given a PED structure, you can write it to disk using the write method.
This will produce two files:
 * FILENAME.ped
 * FILENAME.dat

In [15]:
ped.write("testing.ped")
# Or you can specify both:
ped.write("testing.ped", "testing.dat")
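
When only the PED file name is given, the DAT file name presumably follows from it, as the FILENAME.ped / FILENAME.dat pair above suggests. One way such a default could be derived is sketched below. This is an assumption about the naming rule, not a description of biu's code, and `default_dat_path` is a hypothetical helper.

```python
from pathlib import Path


def default_dat_path(ped_path):
    """Return the PED path with its extension replaced by .dat (assumed naming rule)."""
    return Path(ped_path).with_suffix(".dat")


print(default_dat_path("testing.ped"))
```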