coverage by Tony
yvancouver edited this page Nov 15, 2013 · 1 revision
1. Create the working directory /Users/yvans/Home/workspace/amg/CoverageByTony
2. Set $PYTHONPATH so that it contains this path ONLY: export PYTHONPATH=/Users/yvans/Home/workspace/amg
3. Export from Ensembl (BioMart) the two RefSeq numbers for the BRCA1 and BRCA2 genes used in diagnostics, and save the file in the working directory
4. Get the bed file
4.1. Go to the UCSC Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables?command=start
4.2. Select <b>track:</b> RefSeq Genes
4.3. Set <b>output format:</b> BED - browser extensible data
4.4. Press get output
4.5. On the next page, under <b>Create one BED record per:</b>, select Coding Exons
4.6. Click get BED
4.7. Save the file in the working directory as NM_000059_NM_007294.bed
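Before running the script, it can help to confirm that the downloaded BED file contains records for both transcripts. The following is a minimal sketch and not part of Tony's code: the function name `count_records_per_transcript` is hypothetical, and it assumes that the BED name column (4th field) begins with the RefSeq accession, for example `NM_000059_cds_0_...`, which should be checked against the actual file.

```python
import re
from collections import Counter

# Assumption: the 4th BED column starts with the RefSeq accession,
# e.g. "NM_000059_cds_0_0_chr13_32890598_f".
ACCESSION = re.compile(r"^(N[MR]_\d+)")


def count_records_per_transcript(bed_lines):
    """Return a Counter mapping RefSeq accession -> number of BED records."""
    counts = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional UCSC header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # no name column, nothing to count
        match = ACCESSION.match(fields[3])
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Made-up records for illustration only; coordinates are not real.
    sample = [
        "chr13\t100\t200\tNM_000059_cds_0_0_chr13_101_f\t0\t+",
        "chr13\t300\t400\tNM_000059_cds_1_0_chr13_301_f\t0\t+",
        "chr17\t500\t600\tNM_007294_cds_0_0_chr17_501_r\t0\t-",
    ]
    print(count_records_per_transcript(sample))
```

With the real file, the lines could be passed in as `open("NM_000059_NM_007294.bed")`; both NM_000059 and NM_007294 should then appear in the result.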
5. Run the code
> python make_region_files.py --biomartfile ../CoverageByTony/mart_export_BRCA_08112013.xls --refgene ../CoverageByTony/NM_000059_NM_007294.bed
Traceback (most recent call last):
File "make_region_files.py", line 220, in
sys.exit(main())
File "make_region_files.py", line 190, in main
rsName_chosenTranscript = load_biomart_file(f)
File "make_region_files.py", line 33, in load_biomart_file
eGeneID, eTranscriptID, txStart, txEnd = parts[0:11]
ValueError: need more than 7 values to unpack
5.1. The format of the BioMart file needs to change. According to Tony's documentation, it should have these columns:
`chromosome, ensemblGeneStart, ensemblGeneEnd, geneSymbol, refseq, strand, band, ensemblGeneID, ensemblTranscriptID, ensemblTxStart, ensemblTxend`
See <file:///Users/yvans/Home/workspace/amg/scripts/tests/ensembl_biomart_export> for an example of the correct format.
5.1.1. Clear formatting, URLs and other markup
5.1.2. Remove the lines "Ensembl Genes 73" and "Homo sapiens genes GRCh37.p12"
5.1.3. Save as CSV, but set the field delimiter to **Tabs** and use no text delimiter
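The earlier `ValueError` came from a row with too few fields, so a quick column count on the saved file can catch the problem before the script is run again. This is a hedged sketch rather than anything from Tony's code: `find_malformed_rows` is a hypothetical helper, and it assumes every non-empty line (including any header line) should hold exactly the 11 tab-separated columns listed above.

```python
# Column names as listed in Tony's documentation.
EXPECTED_COLUMNS = [
    "chromosome", "ensemblGeneStart", "ensemblGeneEnd", "geneSymbol",
    "refseq", "strand", "band", "ensemblGeneID", "ensemblTranscriptID",
    "ensemblTxStart", "ensemblTxend",
]


def find_malformed_rows(lines, expected=len(EXPECTED_COLUMNS)):
    """Return (line_number, field_count) for rows without `expected` fields."""
    problems = []
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        if not row:
            continue  # ignore empty lines
        field_count = len(row.split("\t"))
        if field_count != expected:
            problems.append((number, field_count))
    return problems


if __name__ == "__main__":
    # Made-up rows for illustration: one with 11 fields, one with 7.
    good = "\t".join(["x"] * 11)
    bad = "\t".join(["x"] * 7)
    print(find_malformed_rows([good, bad]))
```

For the real export, the lines could come from `open("mart_export_BRCA_08112013.csv")`; an empty result means every row has 11 fields.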
5.2. Run again:
> python make_region_files.py --biomartfile ../CoverageByTony/mart_export_BRCA_08112013.csv --refgene ../CoverageByTony/NM_000059_NM_007294.bed
Traceback (most recent call last):
File "make_region_files.py", line 249, in
sys.exit(main())
File "make_region_files.py", line 210, in main
refseqs = refseq.load_refseqs_from_UCSC_refGene_file(refGenePath)
File "/Users/yvans/Home/workspace/amg/annotation/gene/refseq.py", line 138, in load_refseqs_from_UCSC_refGene_file
bin, name, chromosome, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, idpart, name2, cdsStartStat, cdsEndStat, exonFrames = line.strip().split()
ValueError: need more than 6 values to unpack
5.2.1. The BED file is in the wrong format. Re-download the file, this time with <b>all fields from selected table</b> selected, and save it as uscs_refseq_export
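The traceback shows that `refseq.py` unpacks each line into 16 whitespace-separated fields, whereas the coding-exon BED file has only 6 columns. The sketch below reproduces that field list with an explicit length check, so a file in the wrong format fails with a clearer message. It is an illustration only: `RefGeneRecord` and `parse_refgene_line` are hypothetical names, not part of the `amg` code.

```python
from collections import namedtuple

# Field names taken from the unpacking line shown in the traceback.
REFGENE_FIELDS = [
    "bin", "name", "chromosome", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
    "idpart", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]

RefGeneRecord = namedtuple("RefGeneRecord", REFGENE_FIELDS)


def parse_refgene_line(line):
    """Split one refGene line into a RefGeneRecord, checking the field count."""
    parts = line.strip().split()
    if len(parts) != len(REFGENE_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(REFGENE_FIELDS), len(parts))
        )
    return RefGeneRecord(*parts)


if __name__ == "__main__":
    # Made-up line for illustration; the values are not real annotation data.
    line = "1 NM_TEST chr1 + 100 500 150 450 2 100,300, 200,500, 0 GENE cmpl cmpl 0,1,"
    record = parse_refgene_line(line)
    print(record.name, record.exonCount)
```

All values are kept as strings here; converting coordinates to integers is left out to keep the sketch short.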
5.3. Run again, this time from the working directory and with the --genepanel option:
yvans@macus40:~/Home/workspace/amg/CoverageByTony > python ../scripts/make_region_files.py --biomartfile mart_export_BRCA_08112013.csv --refgene NM_000059_NM_007294_uscs_export --genepanel BR
From 2 refseqNames, found total of 2 records in refGene.
0 names have duplicates.
transcripts.csv and region files written to path '.'
WORKS!!!