yvan edited this page Oct 23, 2014 · 1 revision

Creation of region files Iktyose V2

  1. Create a working directory on Yvan's machine

  2. Download the new file from Tove (see mail of Sept. 12)

    cd /Volumes/Analysis/genePanel/IktioseV2

    mv ~/Downloads/mart_Iktyose\ v2.0_export.xls ./

  3. Open the file, save it as CSV (tab separated), and remove the first 5 lines

    mart_Iktyose v2.0_export.csv

  4. Replace carriage returns (\r) with newlines (\n)

    tr '\r' '\n' < mart_Iktyose\ v2.0_export.csv > mart_Iktyose_v2.0_export.edited.csv

  5. Run the Python script make_region_files.py

    python /Users/yvans/Home/workspace/amg/scripts/make_region_files.py \
     --biomartfile=./mart_Iktyose_v2.0_export.edited.csv \
     --refgene=/Volumes/Analysis/dataDistro_r01_d01_LocalCopy/b37/funcAnnot/refSeq/refGene_131119.tab \
     --outputdir=./ \
     --genepanel=Iktyosis \
     --version=02 \
     --dict=/Volumes/Analysis/dataDistro_r01_d01_LocalCopy/b37/genomic/gatkBundle_2.5/human_g1k_v37_decoy.dict
     
  6. Place the files in the gene panel directory: copy the BED files, copy the original file (mart_Iktyose v2.0_export.xls) under the name Ensembl_biomart_export.xls, create a symlink named coverageRegions.bed that points to Iktyosis_OUS_medGen_v02_b37.codingExons.slop2.bed, and copy coverageRegions.bed to the clinical gene panel folder.

    mkdir ~/Home/workspace/amg/clinicalGenePanels/Iktyose_OUS_medGen_v02_b37/

    cp Iktyosis_OUS_medGen_v02_b37.* ~/Home/workspace/amg/clinicalGenePanels/Iktyose_OUS_medGen_v02_b37/

    cp mart_Iktyose\ v2.0_export.xls ~/Home/workspace/amg/clinicalGenePanels/Iktyose_OUS_medGen_v02_b37/Ensembl_biomart_export.xls

    ln -s Iktyosis_OUS_medGen_v02_b37.codingExons.slop2.bed coverageRegions.bed

    cp coverageRegions.bed ~/Home/workspace/amg/clinicalGenePanels/Iktyose_OUS_medGen_v02_b37/
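
Steps 3 and 4 (dropping the header lines and fixing the line endings) could also be scripted. The following is only a minimal sketch: it assumes the export has already been saved as tab-separated text with the 5 header lines still in place, and the function names `clean_export` and `clean_file` are hypothetical, not part of make_region_files.py.

```python
def clean_export(raw_text, header_lines=5):
    """Normalize line endings to LF and drop the leading header lines."""
    # Convert CRLF first so it does not turn into a double newline,
    # then convert bare CR, which is what the tr command in step 4 targets.
    normalized = raw_text.replace("\r\n", "\n").replace("\r", "\n")
    lines = normalized.split("\n")
    return "\n".join(lines[header_lines:])


def clean_file(src, dst, header_lines=5):
    """Read src, clean it with clean_export, and write the result to dst."""
    # newline="" keeps the original line endings intact while reading.
    with open(src, newline="") as handle:
        raw = handle.read()
    # newline="\n" writes LF endings regardless of platform.
    with open(dst, "w", newline="\n") as handle:
        handle.write(clean_export(raw, header_lines))
```

If the header lines were already removed by hand as in step 3, pass `header_lines=0` so that only the line endings are changed.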

Warnings

/Users/yvans/Home/workspace/amg/scripts/make_region_files.py:91: UserWarning: Note that Refseq NM_016006 is far away from Ensembl txStart/stop (distances 13 and 3676)  
  warnings.warn("Note that Refseq {} is far away from Ensembl txStart/stop (distances {} and {})".format(rsName, abs(rs.start-txStart), abs(rs.stop - txEnd)))  
/Users/yvans/Home/workspace/amg/scripts/make_region_files.py:91: UserWarning: Note that Refseq NM_145068 is far away from Ensembl txStart/stop (distances 0 and 3344)  
  warnings.warn("Note that Refseq {} is far away from Ensembl txStart/stop (distances {} and {})".format(rsName, abs(rs.start-txStart), abs(rs.stop - txEnd)))  
/Users/yvans/Home/workspace/amg/scripts/make_region_files.py:91: UserWarning: Note that Refseq NM_006579 is far away from Ensembl txStart/stop (distances 618 and 0)  
  warnings.warn("Note that Refseq {} is far away from Ensembl txStart/stop (distances {} and {})".format(rsName, abs(rs.start-txStart), abs(rs.stop - txEnd)))  
/Users/yvans/Home/workspace/amg/scripts/make_region_files.py:91: UserWarning: Note that Refseq NM_000400 is far away from Ensembl txStart/stop (distances 31 and 1554)  
  warnings.warn("Note that Refseq {} is far away from Ensembl txStart/stop (distances {} and {})".format(rsName, abs(rs.start-txStart), abs(rs.stop - txEnd)))  
/Users/yvans/Home/workspace/amg/scripts/make_region_files.py:91: UserWarning: Note that Refseq NM_006783 is far away from Ensembl txStart/stop (distances 255 and 9)  
  warnings.warn("Note that Refseq {} is far away from Ensembl txStart/stop (distances {} and {})".format(rsName, abs(rs.start-txStart), abs(rs.stop - txEnd)))  
/Users/yvans/Home/workspace/amg/scripts/make_region_files.py:91: UserWarning: Note that Refseq NM_021978 is far away from Ensembl txStart/stop (distances 225 and 14)  
  warnings.warn("Note that Refseq {} is far away from Ensembl txStart/stop (distances {} and {})".format(rsName, abs(rs.start-txStart), abs(rs.stop - txEnd)))  
/Users/yvans/Home/workspace/amg/scripts/make_region_files.py:91: UserWarning: Note that Refseq NM_001942 is far away from Ensembl txStart/stop (distances 0 and 401)  
  warnings.warn("Note that Refseq {} is far away from Ensembl txStart/stop (distances {} and {})".format(rsName, abs(rs.start-txStart), abs(rs.stop - txEnd)))  
From 40 refseqNames, found total of 40 records in refGene. 0 names have duplicates.
transcripts.csv and region files written to path './'
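
To review which transcripts deviate most, the warnings above can be pulled into a small table. This is a sketch under assumptions: the script output is assumed to have been captured as text, and the names `parse_distance_warnings` and `rank_by_distance` are hypothetical helpers, not part of make_region_files.py.

```python
import re

# Matches the message text of the UserWarning lines shown above.
# The echoed source line (with "{}" placeholders) does not match,
# because the pattern requires digits for both distances.
WARNING_RE = re.compile(
    r"Refseq (\S+) is far away from Ensembl txStart/stop "
    r"\(distances (\d+) and (\d+)\)"
)


def parse_distance_warnings(log_text):
    """Return (refseq_name, start_distance, end_distance) tuples."""
    records = []
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            name, start_dist, end_dist = match.groups()
            records.append((name, int(start_dist), int(end_dist)))
    return records


def rank_by_distance(records):
    """Sort records so the largest start or end distance comes first."""
    return sorted(records, key=lambda rec: max(rec[1], rec[2]), reverse=True)
```

Applied to the output above, the first ranked entry would be NM_016006, whose end distance of 3676 is the largest value reported.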