yvancouver edited this page Nov 15, 2013 · 1 revision

How I did it

1. Create the working directory /Users/yvans/Home/workspace/amg/CoverageByTony
2. Adjust $PYTHONPATH so that it contains ONLY the amg workspace: `export PYTHONPATH=/Users/yvans/Home/workspace/amg`
3. Ask ENSEMBL for an export file covering the two RefSeq numbers of the BRCA1 and BRCA2 genes used in diagnostics, and save it in the working directory (it is used below as mart_export_BRCA_08112013.xls)
4. Get the BED file

4.1. Go to the UCSC Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables?command=start

4.2. Select <b>track:</b>RefSeq Genes

4.3. <b>output format:</b> BED - browser extensible data

4.4. Press get output

4.5. On the next page in the <b>Create one BED record per:</b> select Coding Exons

4.6. Click get BED

4.7. Save the file in the working directory as NM_000059_NM_007294.bed

5. Run the code

        > python make_region_files.py --biomartfile ../CoverageByTony/mart_export_BRCA_08112013.xls --refgene ../CoverageByTony/NM_000059_NM_007294.bed
        Traceback (most recent call last):
          File "make_region_files.py", line 220, in <module>
            sys.exit(main())
          File "make_region_files.py", line 190, in main
            rsName_chosenTranscript = load_biomart_file(f)
          File "make_region_files.py", line 33, in load_biomart_file
            eGeneID, eTranscriptID, txStart, txEnd = parts[0:11]
        ValueError: need more than 7 values to unpack

5.1. Need to change the format of the file. According to Tony's documentation, it should have these columns:

      `chromosome, ensemblGeneStart, ensemblGeneEnd, geneSymbol, refseq, strand, band, ensemblGeneID, ensemblTranscriptID, ensemblTxStart, ensemblTxend`

       Check in <file:///Users/yvans/Home/workspace/amg/scripts/tests/ensembl_biomart_export> for a correct format
    
        5.1.1. Clear formatting, URLs and others
        5.1.2. Remove the lines
                    Ensembl Genes 73
                    Homo sapiens genes GRCh37.p12
        5.1.3. Save as CSV, but change the field delimiter to **Tabs** and use no text delimiters
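
Before rerunning the script, a quick check of the saved file can catch rows that still have the wrong number of fields. The following is a minimal sketch, not part of make_region_files.py: `find_bad_rows` and `EXPECTED_COLUMNS` are hypothetical names, and it assumes only what is stated above, namely that every row is tab-delimited and carries the 11 listed columns.

```python
# Column names as listed in Tony's documentation (step 5.1).
EXPECTED_COLUMNS = [
    "chromosome", "ensemblGeneStart", "ensemblGeneEnd", "geneSymbol",
    "refseq", "strand", "band", "ensemblGeneID", "ensemblTranscriptID",
    "ensemblTxStart", "ensemblTxend",
]


def find_bad_rows(lines, expected=len(EXPECTED_COLUMNS)):
    """Return (line_number, field_count) for each non-empty line whose
    tab-separated field count differs from `expected`.

    Hypothetical helper: it only counts fields and does not validate values.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        if not stripped:
            continue  # ignore blank lines
        count = len(stripped.split("\t"))
        if count != expected:
            bad.append((number, count))
    return bad
```

For example, `find_bad_rows(open("mart_export_BRCA_08112013.csv"))` should return an empty list once every row has 11 tab-separated fields.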
    
5.2. Again:
        5.2.1
            > python make_region_files.py --biomartfile ../CoverageByTony/mart_export_BRCA_08112013.csv --refgene ../CoverageByTony/NM_000059_NM_007294.bed
            Traceback (most recent call last):
              File "make_region_files.py", line 249, in <module>
                sys.exit(main())
              File "make_region_files.py", line 210, in main
                refseqs = refseq.load_refseqs_from_UCSC_refGene_file(refGenePath)
              File "/Users/yvans/Home/workspace/amg/annotation/gene/refseq.py", line 138, in load_refseqs_from_UCSC_refGene_file
                bin, name, chromosome, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, idpart, name2, cdsStartStat, cdsEndStat, exonFrames = line.strip().split()
            ValueError: need more than 6 values to unpack
        5.2.2 Need to reformat the BED file: re-download the file, but this time with <b>all fields from selected table</b> selected, and save it as uscs_refseq_export
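
The traceback shows the loader unpacking 16 whitespace-separated fields per line, while the coding-exon BED export evidently supplied fewer. Below is a minimal sketch for checking a line against that layout. `REFGENE_FIELDS` and `parse_refgene_line` are hypothetical names, the field names are copied from the traceback, and skipping lines that start with `#` is an assumption about the export's header line, not something refseq.py is known to do.

```python
# Field names copied from the unpacking statement shown in the traceback.
REFGENE_FIELDS = (
    "bin name chromosome strand txStart txEnd cdsStart cdsEnd exonCount "
    "exonStarts exonEnds idpart name2 cdsStartStat cdsEndStat exonFrames"
).split()


def parse_refgene_line(line):
    """Split a line on whitespace, as the loader in the traceback does.

    Returns a dict keyed by field name, or None when the line is a
    '#' header (assumption) or does not have exactly 16 fields.
    """
    if line.startswith("#"):
        return None
    parts = line.strip().split()
    if len(parts) != len(REFGENE_FIELDS):
        return None
    return dict(zip(REFGENE_FIELDS, parts))
```

A six-column BED line makes `parse_refgene_line` return `None`, which corresponds to the case where the unpacking in the traceback fails.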
5.3. Again:
        5.3.1
            yvans@macus40:~/Home/workspace/amg/CoverageByTony
            > python ../scripts/make_region_files.py --biomartfile mart_export_BRCA_08112013.csv --refgene NM_000059_NM_007294_uscs_export --genepanel BR
            From 2 refseqNames, found total of 2 records in refGene. 0 names have duplicates.
            transcripts.csv and region files written to path '.'

        WORKS!!!
