
Genome diversity

Haibao Tang edited this page Jul 1, 2024 · 5 revisions

We have included a suite of tools for studying variation between varieties, including pedigree analysis and copy-number visualization. These tools can be useful in resequencing projects that study genome diversity.

Tip: Download the test dataset here.

Pedigree analysis

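One basic analysis is to visualize the pedigree relationships among varieties, which illustrate their breeding history. The pedigree information can be encoded in a standard .ped file, with one line per individual giving its family ID, its own ID, its paternal and maternal IDs, sex and phenotype: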

#Family ID	Individual ID	Paternal ID	Maternal ID	Sex (1=male; 2=female; other=unknown)	Phenotype	
F001	Variety10	Variety11	Variety12	0	0
F001	Variety8	Variety9	Variety10	0	0
F001	Variety7	Variety9	Variety9	0	0
F001	Variety4	Variety7	Variety8	0	0
F001	Variety2	Variety6	Variety4	0	0
F001	Variety3	Variety4	Variety5	0	0
F001	Variety1	Variety2	Variety3	0	0
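Because the .ped file is plain text, it is easy to check which varieties are root nodes before plotting. The following is a minimal Python sketch, assuming whitespace-separated columns and "0" for an unknown parent as in PLINK .ped files; `read_ped` and `root_nodes` are hypothetical helpers, not part of jcvi.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: "0" marks an unknown parent, as in PLINK .ped files.
MISSING = {"0"}

PED_EXAMPLE = """\
#Family ID  Individual ID  Paternal ID  Maternal ID  Sex  Phenotype
F001 Variety10 Variety11 Variety12 0 0
F001 Variety8 Variety9 Variety10 0 0
F001 Variety7 Variety9 Variety9 0 0
F001 Variety4 Variety7 Variety8 0 0
F001 Variety2 Variety6 Variety4 0 0
F001 Variety3 Variety4 Variety5 0 0
F001 Variety1 Variety2 Variety3 0 0
"""

Parents = Dict[str, Tuple[Optional[str], Optional[str]]]


def read_ped(text: str) -> Parents:
    """Map each individual to (paternal ID, maternal ID); None means unknown."""
    parents: Parents = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header comment
        _family, ind, pat, mat = line.split()[:4]
        parents[ind] = (None if pat in MISSING else pat,
                        None if mat in MISSING else mat)
    # Parents never listed as individuals have no parent information themselves.
    for pat, mat in list(parents.values()):
        for p in (pat, mat):
            if p is not None and p not in parents:
                parents[p] = (None, None)
    return parents


def root_nodes(parents: Parents) -> List[str]:
    """Root nodes: individuals with neither parent known."""
    return sorted(i for i, (pat, mat) in parents.items() if pat is None and mat is None)
```

For the example file, the root nodes are Variety5, Variety6, Variety9, Variety11 and Variety12. Note that Variety7 has Variety9 as both parents (a self), which is where inbreeding enters this pedigree.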

We can then visualize the pedigree:

python -m jcvi.compara.pedigree pedigree pedigree.ped --ploidy=8 --N 10000 \
    --title "Pedigree of Variety1"

Figure: pedigree.ped.png

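The root nodes (nodes with no parent information) are assumed to be outcrossing, i.e. not inbred. We can then estimate the parentage of each variety, i.e. the proportion of its genome derived from each root node, displayed as pie charts colored by root node. Where there is inbreeding (for example, Variety7 is a self of Variety9), the inbreeding coefficient ($F$) can also be estimated.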
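For intuition, both quantities can be computed for a diploid with the classic textbook recursions. Each variety inherits, in expectation, half of its genome from each parent, and the inbreeding coefficient of an individual equals the kinship (coancestry) coefficient of its two parents. This is a swapped-in method, not jcvi's implementation: it assumes diploidy, whereas the command above uses `--ploidy=8`, so jcvi's estimates may differ. `PARENTS`, `contribution`, `kinship` and `inbreeding` are hypothetical names.

```python
from functools import lru_cache
from typing import Dict, Optional, Tuple

# (paternal, maternal) for each variety, transcribed from pedigree.ped.
# Root nodes have (None, None); every non-root is assumed to have both parents known.
PARENTS: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "Variety10": ("Variety11", "Variety12"),
    "Variety8": ("Variety9", "Variety10"),
    "Variety7": ("Variety9", "Variety9"),
    "Variety4": ("Variety7", "Variety8"),
    "Variety2": ("Variety6", "Variety4"),
    "Variety3": ("Variety4", "Variety5"),
    "Variety1": ("Variety2", "Variety3"),
    **{r: (None, None) for r in ("Variety5", "Variety6", "Variety9", "Variety11", "Variety12")},
}


@lru_cache(maxsize=None)
def contribution(ind: str) -> Dict[str, float]:
    """Expected fraction of ind's genome derived from each root node."""
    pat, mat = PARENTS[ind]
    if pat is None and mat is None:
        return {ind: 1.0}
    result: Dict[str, float] = {}
    for parent in (pat, mat):  # each parent transmits half of the genome
        for root, frac in contribution(parent).items():
            result[root] = result.get(root, 0.0) + 0.5 * frac
    return result


@lru_cache(maxsize=None)
def generation(ind: Optional[str]) -> int:
    """Generations below the roots: roots are 0, an unknown parent is -1."""
    if ind is None:
        return -1
    pat, mat = PARENTS[ind]
    return 1 + max(generation(pat), generation(mat))


@lru_cache(maxsize=None)
def kinship(a: Optional[str], b: Optional[str]) -> float:
    """Diploid kinship: P(random alleles from a and b are identical by descent)."""
    if a is None or b is None:
        return 0.0
    if a == b:
        return 0.5 * (1.0 + inbreeding(a))
    if generation(a) < generation(b):
        a, b = b, a  # expand the later one; it cannot be an ancestor of the other
    pat, mat = PARENTS[a]
    return 0.5 * (kinship(pat, b) + kinship(mat, b))


def inbreeding(ind: str) -> float:
    """Inbreeding coefficient F = kinship of the two parents (0 for roots)."""
    pat, mat = PARENTS[ind]
    return kinship(pat, mat)
```

Under these assumptions, selfing Variety9 gives `inbreeding("Variety7") == 0.5`, and `contribution("Variety1")` assigns 0.375 of the genome to Variety9, 0.25 each to Variety5 and Variety6, and 0.0625 each to Variety11 and Variety12.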

CNV between varieties

In resequencing projects, it is often useful to visualize copy-number variation (CNV) between varieties. Assuming the varieties have been resequenced and their reads mapped to a reference genome (producing sorted, indexed BAM files), we can use mosdepth to compute the mean read depth in windows tiled along the genome, here with a bin size of 1,000,000 bp (1 Mb):

mosdepth --by 1000000 VAR1_srtd.wgs VAR1_srtd.wgs.bam
mosdepth --by 1000000 VAR2_srtd.wgs VAR2_srtd.wgs.bam
mosdepth --by 1000000 VAR3_srtd.wgs VAR3_srtd.wgs.bam

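These commands generate three BED files suitable for CNV plotting (VAR1_srtd.wgs.regions.bed.gz, VAR2_srtd.wgs.regions.bed.gz and VAR3_srtd.wgs.regions.bed.gz), each listing the mean depth in every 1 Mb window. The three files can then be plotted together: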

python -m jcvi.graphics.landscape depth \
    VAR?_srtd.wgs.regions.bed.gz \
    --chrinfo chrinfo.txt \
    --titleinfo titleinfo.txt
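The regions.bed.gz files passed to landscape depth are ordinary gzipped BED files, so they can also be inspected directly. A minimal sketch, assuming the mean depth is the last column; dividing by the median window depth is one common way to express relative copy number and may not be what landscape depth does internally. `read_regions` and `relative_depth` are hypothetical helpers.

```python
import gzip
import statistics
from typing import List, Tuple

Bin = Tuple[str, int, int, float]  # chrom, start, end, depth


def read_regions(path: str) -> List[Bin]:
    """Read a mosdepth *.regions.bed.gz file into (chrom, start, end, mean depth)."""
    bins: List[Bin] = []
    with gzip.open(path, "rt") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            # Take the mean depth from the last column (a name column precedes it
            # only when --by is given a BED file instead of a window size).
            bins.append((fields[0], int(fields[1]), int(fields[2]), float(fields[-1])))
    return bins


def relative_depth(bins: List[Bin]) -> List[Bin]:
    """Scale each window's depth by the median window depth (1.0 = typical)."""
    med = statistics.median(depth for _c, _s, _e, depth in bins)
    if med <= 0:
        raise ValueError("median depth is zero; cannot normalize")
    return [(c, s, e, d / med) for c, s, e, d in bins]
```

For example, `relative_depth(read_regions("VAR1_srtd.wgs.regions.bed.gz"))` gives values near 1.0 for typical windows and lower values for windows with reduced depth.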

The color and display label of each chromosome can be customized in chrinfo.txt, one chromosome per line (chromosome name, hex color, label):

chr01A, #c51b7d, 1A
chr01B, #4d9221, 1B
chr02A, #c51b7d, 2A
chr02B, #4d9221, 2B
chr03A, #c51b7d, 3A
chr03B, #4d9221, 3B
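Because chrinfo.txt is simple comma-separated text, it can be checked before plotting, for instance to list chromosomes present in the depth files that have no entry. This is a hypothetical parser mirroring the three columns shown above, not jcvi's own; the fallback to the chromosome name when no label is given is an assumption.

```python
from typing import Dict, Iterable, List, NamedTuple


class ChrInfo(NamedTuple):
    color: str  # e.g. "#c51b7d"
    label: str  # display name, e.g. "1A"


def parse_chrinfo(text: str) -> Dict[str, ChrInfo]:
    """Parse 'chromosome, color, label' lines as in chrinfo.txt."""
    info: Dict[str, ChrInfo] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [f.strip() for f in line.split(",")]
        name, color = fields[0], fields[1]
        # Assumption: fall back to the chromosome name when no label is given.
        label = fields[2] if len(fields) > 2 else name
        info[name] = ChrInfo(color, label)
    return info


def unconfigured(chroms: Iterable[str], info: Dict[str, ChrInfo]) -> List[str]:
    """Chromosomes seen in the depth files but missing from chrinfo.txt."""
    return sorted(set(chroms) - set(info))
```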

The title of each resequencing panel can be customized in titleinfo.txt, one depth file per line (file name, species name, variety label):

VAR1_srtd.wgs.regions.bed.gz, *S. species*, ‘Variety 1’
VAR2_srtd.wgs.regions.bed.gz, *S. species*, ‘Variety 2’
VAR3_srtd.wgs.regions.bed.gz, *S. species*, ‘Variety 3’
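Since each line is keyed by a depth file name, a name that does not match any input file presumably leaves that panel without its intended title. A hypothetical cross-check, assuming the first column must match the input path exactly as given on the command line (jcvi may match differently, e.g. by basename):

```python
from typing import Dict, List, Tuple


def parse_titleinfo(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each depth file name to its (species, variety) title parts."""
    titles: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first two commas only, in case a label contains a comma.
        filename, species, variety = (f.strip() for f in line.split(",", 2))
        titles[filename] = (species, variety)
    return titles


def missing_titles(bedfiles: List[str], titles: Dict[str, Tuple[str, str]]) -> List[str]:
    """Input depth files that have no entry in titleinfo.txt."""
    return [f for f in bedfiles if f not in titles]
```

For example, passing `sorted(glob.glob("VAR?_srtd.wgs.regions.bed.gz"))` and the parsed titleinfo.txt to `missing_titles` should return an empty list when every panel has a title.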

Finally, with all the plotting elements configured, we can visualize the CNV across the three varieties.

Figure: VAR.png

Interestingly, there are regions of reduced depth (putative "deletions") on chromosomes 3A and 5B, which may warrant further investigation.
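To follow up on such regions, low-depth windows can be listed programmatically. This is a minimal sketch, assuming depths have already been normalized so that 1.0 is typical (e.g. divided by the median window depth). The 0.5 threshold and the two-window minimum are hypothetical defaults, and a real CNV call would need more careful statistics.

```python
from typing import List, Tuple

Bin = Tuple[str, int, int, float]  # chrom, start, end, relative depth


def low_depth_segments(bins: List[Bin], threshold: float = 0.5,
                       min_bins: int = 2) -> List[Bin]:
    """Merge adjacent windows on one chromosome whose relative depth < threshold.

    Returns (chrom, start, end, mean relative depth) for runs of at least
    min_bins windows. Bins are assumed sorted by chromosome and position.
    """
    segments: List[Bin] = []
    run: List[Bin] = []

    def flush() -> None:
        # Emit the current run if it is long enough, then start a new one.
        if len(run) >= min_bins:
            depths = [b[3] for b in run]
            segments.append((run[0][0], run[0][1], run[-1][2], sum(depths) / len(depths)))
        run.clear()

    for b in bins:
        chrom, start, _end, rel = b
        contiguous = bool(run) and run[-1][0] == chrom and run[-1][2] == start
        if rel < threshold:
            if run and not contiguous:
                flush()  # a gap or new chromosome ends the previous run
            run.append(b)
        else:
            flush()
    flush()
    return segments
```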