# Mapping 

## 1. Mapping single & paired reads with `kallisto` using bash scripts on Slurm
* adapted Maxim's scripts
* created symlinks: `ln -s /path/to/file /path/to/symlink`
* directory of indexes: `/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/indexes/kallisto`
* created indexes with a k-mer length of 31 (the `kallisto index` default):
    - triticum: `kallisto index -i /nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/indexes/kallisto/triticum_transcritpt.idx transcript.fasta`
    - hordeum: `kallisto index -i /nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/indexes/kallisto/hordeum_transcript.idx transcript.fasta`
* location of kallisto tool: `/home/pgsb/vanda.marosi/anaconda3/envs/seqtools/bin/kallisto`
* bash scripts are available at: `~/scripts/{triticum,hordeum}/run_04_..._kallisto_{single,paired}.sh`
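The difference between the single and paired runs can be sketched in R as a small command builder. This is only an illustration, not the actual Slurm scripts: `build_kallisto_cmd` is a hypothetical helper, the FASTQ file names are made up, and the fragment length mean and standard deviation (`-l 200 -s 20`) are placeholder values, since the values used in this project are not recorded here. The one fixed point is that `kallisto quant` needs `--single` together with `-l` and `-s` for single-end reads.

```r
# Hypothetical helper: builds (does not run) a `kallisto quant` command line.
# frag_len and frag_sd are placeholders, not the values used in this project.
build_kallisto_cmd <- function(index, out_dir, reads, frag_len = 200, frag_sd = 20) {
  stopifnot(length(reads) %in% c(1, 2))
  cmd <- c("kallisto", "quant", "-i", shQuote(index), "-o", shQuote(out_dir))
  if (length(reads) == 1) {
    # single-end reads need --single plus fragment length mean (-l) and sd (-s)
    cmd <- c(cmd, "--single", "-l", frag_len, "-s", frag_sd)
  }
  paste(c(cmd, shQuote(reads)), collapse = " ")
}

# made-up FASTQ names, for illustration only
cmd_single <- build_kallisto_cmd("hordeum_transcript.idx", "out/ERR781039",
                                 "ERR781039.fastq.gz")
cmd_paired <- build_kallisto_cmd("hordeum_transcript.idx", "out/ERR1248084",
                                 c("ERR1248084_1.fastq.gz", "ERR1248084_2.fastq.gz"))
cat(cmd_single, cmd_paired, sep = "\n")
```

The resulting strings could be pasted into an sbatch script or passed to `system()`.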

## 1.1 Location of reference genomes
* all of them are symlinked under: `/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/genomes/`
* counts per file, listed as `file:count`:
    - `./Horvu/CDS.fasta:49281`
    - `./Horvu/protein.fasta:46294`
    - `./Horvu/transcript.fasta:49281`
    - `./Horvu/genome.fasta:8`
    - `./Triae/protein.fasta:122722`
    - `./Triae/genome.fasta:22`
    - `./Triae/transcript.fasta:123075`
    - `./Triae/CDS.fasta:122722`

* for reference, Daniel's Snakemake kallisto pipeline: https://ibis-gitlab.helmholtz-muenchen.de/daniel.lang/kallisto/-/tree/master

## 1.2 After mapping
* by default `kallisto` writes 3 files per sample:
    - `abundance.h5` - binary (HDF5) file
    - `abundance.tsv` - tab-separated file with the columns `target_id, length, eff_length, est_counts, tpm`
    - `run_info.json` - information about the run: command, time, tool version etc.
* kallisto already reports normalized transcript-level abundances (TPM), which can be used directly for a PCA
* to summarize the transcripts to genes, the `readr` and `tximport` packages have to be used to combine the results with the annotation.gff files and build a gene-level count table
* the count table can then be log-transformed and used for further statistical validation and PCAs with `r-sleuth`, `DESeq2`, `HR`
* manuals & example guidelines:
    - tximport: https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#kallisto
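The log-transformation and PCA step mentioned above can be tried out on a toy matrix with base R alone. This is a minimal sketch under stated assumptions: the TPM values are randomly generated, the sample and transcript names are invented, and `prcomp` stands in for the PCA functions of the packages named above.

```r
set.seed(1)
# Toy TPM matrix: 50 transcripts x 6 samples (made-up values, two groups of three)
tpm <- matrix(rexp(300, rate = 0.01), nrow = 50,
              dimnames = list(paste0("tx", 1:50), paste0("sample", 1:6)))
tpm[1:10, 4:6] <- tpm[1:10, 4:6] * 8   # shift ten transcripts in the second group

log_tpm <- log2(tpm + 1)               # pseudocount avoids log2(0)
keep <- apply(log_tpm, 1, var) > 0     # constant rows cannot be scaled
pca <- prcomp(t(log_tpm[keep, ]), center = TRUE, scale. = TRUE)

# proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained[1:2], 3)
head(pca$x[, 1:2])                     # sample coordinates on PC1 and PC2
```

Samples are rows of the transposed matrix because `prcomp` treats rows as observations.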

## 2. Map transcripts to genes and create count-tables
* first, only the paired OR only the single reads were converted into a count table; the merged table was made afterwards

In [2]:
library(rhdf5, warn.conflicts = FALSE)
library(readr, warn.conflicts = FALSE)
library(tximport, warn.conflicts = FALSE)
library(GenomicFeatures, warn.conflicts = FALSE)
# do not load tidyverse here!!! dplyr's `select()` masks the `select()` that is used below on the TxDb object
# load tidyverse only after all necessary tx2gene objects have been created

### Triticum paired

In [3]:
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/")
trit_paired <- read.table("wheat_project_table_trimmomatic_paired.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
str(trit_paired)

'data.frame':	180 obs. of  3 variables:
 $ ID          : chr  "SRR10737427" "SRR10737428" "SRR10737429" "SRR10737430" ...
 $ dataset_name: chr  "pistillody of stamen" "pistillody of stamen" "pistillody of stamen" "pistillody of stamen" ...
 $ tissue      : chr  "anther" "anther" "anther" "anther" ...


In [4]:
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/04_Kallisto_paired")
files_tp <- file.path(trit_paired$ID, "abundance.h5")
names(files_tp) <- paste0(trit_paired$ID)
head(files_tp)
all(file.exists(files_tp))
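`all(file.exists(...))` only answers yes or no. When it returns `FALSE`, it helps to list the samples whose `abundance.h5` is missing. The sketch below shows this on a temporary folder with invented sample IDs (`sampleA` to `sampleC`); it does not touch the project directories.

```r
# Temporary stand-in for a kallisto output folder (hypothetical sample IDs)
out_dir <- file.path(tempdir(), "kallisto_demo")
ids <- c("sampleA", "sampleB", "sampleC")
for (id in ids[1:2]) {                 # sampleC is deliberately left out
  dir.create(file.path(out_dir, id), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(out_dir, id, "abundance.h5"))
}

files_demo <- file.path(out_dir, ids, "abundance.h5")
names(files_demo) <- ids

# names of the samples without an abundance file
missing_ids <- names(files_demo)[!file.exists(files_demo)]
missing_ids
```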

Transcripts need to be associated with gene IDs for gene-level summarization:
1. First make a data.frame called `tx2gene` with two columns:
    - 1) transcript ID and
    - 2) gene ID.
2. The column names do not matter, but this column order must be used. The transcript IDs must be the same ones used in the abundance files.
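The required layout can be illustrated without a GFF file. The sketch below assumes that a gene ID is simply the transcript ID without its trailing `.<number>` suffix, which matches the IDs shown in this notebook but is not guaranteed for every annotation; `tx2gene_sketch` is a name chosen for illustration only. The notebook itself derives the mapping from the annotation via `makeTxDbFromGFF`.

```r
# Transcript IDs as they appear in the head() outputs of this notebook
tx_ids <- c("TraesCHI1A01G000600.1", "TraesCHI1A01G000800.1",
            "Horvu_MOREX_1H01G000100.1")

# Assumption: gene ID = transcript ID minus a trailing ".<number>"
tx2gene_sketch <- data.frame(
  TXNAME = tx_ids,
  GENEID = sub("\\.[0-9]+$", "", tx_ids),
  stringsAsFactors = FALSE
)
tx2gene_sketch

# sanity checks: transcript column comes first, every transcript occurs once
stopifnot(identical(colnames(tx2gene_sketch)[1], "TXNAME"),
          !anyDuplicated(tx2gene_sketch$TXNAME))
```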

In [5]:
# create tx2gene data.frame
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/genomes/Triae/")
txdb_trit <- makeTxDbFromGFF("annotation.gff3", organism = "Triticum aestivum")
#makeTxDbFromGFF(file, format=c("auto", "gff3", "gtf"), dataSource=NA, organism=NA, taxonomyId=NA, circ_seqs=DEFAULT_CIRC_SEQS, chrominfo=NULL, miRBaseBuild=NA, dbxrefTag)
str(txdb_trit)

Import genomic features from the file as a GRanges object ... 
OK

Prepare the 'metadata' data frame ... 
OK

Make the TxDb object ... 
OK



Reference class 'TxDb' [package "GenomicFeatures"] with 5 fields
 $ conn           :Formal class 'SQLiteConnection' [package "RSQLite"] with 7 slots
  .. ..@ ptr                :<externalptr> 
  .. ..@ dbname             : chr ""
  .. ..@ loadable.extensions: logi TRUE
  .. ..@ flags              : int 70
  .. ..@ vfs                : chr ""
  .. ..@ ref                :<environment: 0x55c1e4dc3360> 
  .. ..@ bigint             : chr "integer64"
 $ packageName    : chr(0) 
 $ user_seqlevels : chr [1:22] "chr1A" "chr1B" "chr2A" "chr2B" ...
 $ user2seqlevels0: int [1:22] 1 2 3 4 5 6 7 8 9 10 ...
 $ isActiveSeq    : logi [1:22] TRUE TRUE TRUE TRUE TRUE TRUE ...
 and 16 methods, of which 2 are  possibly relevant:
   finalize, initialize


In [6]:
k_trit <- keys(txdb_trit, keytype = "TXNAME")
tx2gene_trit <- select(txdb_trit, k_trit, "GENEID", "TXNAME")
head(tx2gene_trit)

'select()' returned 1:1 mapping between keys and columns



| | TXNAME | GENEID |
|---|---|---|
| | \<chr\> | \<chr\> |
| 1 | TraesCHI1A01G000600.1 | TraesCHI1A01G000600 |
| 2 | TraesCHI1A01G000800.1 | TraesCHI1A01G000800 |
| 3 | TraesCHI1A01G001100.1 | TraesCHI1A01G001100 |
| 4 | TraesCHI1A01G001400.1 | TraesCHI1A01G001400 |
| 5 | TraesCHI1A01G001500.1 | TraesCHI1A01G001500 |
| 6 | TraesCHI1A01G001700.1 | TraesCHI1A01G001700 |


Transcripts can also be imported as a transcript-level count table, without assigning them to genes:
* `txi = tximport(files, type = "kallisto", txOut = TRUE)`
* `head(txi$counts)`
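What changes between transcript-level and gene-level counts can be shown on a toy matrix. The sketch below sums the rows of all transcripts that belong to the same gene, which is how estimated counts are combined when a `tx2gene` table is supplied. It covers counts only; the values and the names `tx_counts` and `tx2gene_toy` are invented.

```r
# Toy transcript-level counts (made-up values): 4 transcripts x 2 samples
tx_counts <- matrix(c(10, 5, 0, 3,
                      20, 1, 7, 0),
                    nrow = 4,
                    dimnames = list(c("geneA.1", "geneA.2", "geneB.1", "geneC.1"),
                                    c("s1", "s2")))
tx2gene_toy <- data.frame(TXNAME = rownames(tx_counts),
                          GENEID = c("geneA", "geneA", "geneB", "geneC"),
                          stringsAsFactors = FALSE)

# look up the gene of every transcript row, then sum the rows per gene
gene_of_tx <- tx2gene_toy$GENEID[match(rownames(tx_counts), tx2gene_toy$TXNAME)]
gene_counts <- rowsum(tx_counts, group = gene_of_tx)
gene_counts
```

The two `geneA` isoforms collapse into one row, so the gene-level table has three rows instead of four.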

In [58]:
# create count table
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/04_Kallisto_paired/")
txi_trit_paired <- tximport(files_tp, type = "kallisto", tx2gene = tx2gene_trit)
head(txi_trit_paired$counts)

1 2 3 ... 180 


summarizing abundance

summarizing counts

summarizing length

summarizing inferential replicates



Unnamed: 0,SRR10737427,SRR10737428,SRR10737429,SRR10737430,SRR10737431,SRR10737432,CRR088963,CRR088962,CRR088961,CRR088960,⋯,SRR8413505,SRR8413506,SRR8413507,SRR8413508,SRR5186313,SRR5186364,SRR5186375,SRR5186382,SRR5186387,SRR5186416
TraesCHI1A01G000100,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,⋯,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
TraesCHI1A01G000200,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,⋯,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
TraesCHI1A01G000300,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,⋯,6.0,7.878112,14.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
TraesCHI1A01G000400,0.0,0.0,0.0,4.893036,4.726423,1.938802,0.0,0.0,0.0,0.0,⋯,140.5432,108.839845,177.3651,112.3966,1.993108,0.0,0.0,0.0,2.004102,0.0
TraesCHI1A01G000500,1.0,8.0,7.0,9.0,2.0,3.0,1.0,8.0,7.0,9.0,⋯,5.0,11.0,12.0,5.0,1.0,0.0,2.0,0.0,0.0,0.0
TraesCHI1A01G000600,193.3952,2.465932,159.4838,51.088736,8.853329,80.785644,193.3877,2.465675,159.5168,126.3058,⋯,387.2507,513.135328,458.2691,355.3096,0.0,2.649372,1.307648,4.297154,4.083578,1.356089


In [77]:
# save count table
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/")
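# write only the gene-level counts matrix as text; the complete tximport list is saved as .rds below
write.table(txi_trit_paired$counts, file = "wheat_count_table_paired.txt", append = FALSE, quote = FALSE, sep = "\t", dec = ".",
            row.names = TRUE, col.names = NA)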
saveRDS(txi_trit_paired, file = "wheat_count_table_paired.rds")
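Whether a saved text table can be read back with its gene IDs intact can be checked on a small example. The sketch below uses an invented 3 x 2 matrix and a temporary file. `col.names = NA` makes `write.table` write an empty header cell above the row names, so header and data rows have the same number of fields.

```r
# Toy count matrix (made-up values) with gene IDs as row names
counts_demo <- matrix(c(0, 1.5, 193.3952, 2, 8, 2.465932), nrow = 3,
                      dimnames = list(c("gene1", "gene2", "gene3"), c("s1", "s2")))
out_file <- tempfile(fileext = ".txt")

# col.names = NA writes an empty header cell above the row-name column
write.table(counts_demo, file = out_file, quote = FALSE, sep = "\t",
            row.names = TRUE, col.names = NA)

# read back; the first column becomes the row names again
counts_back <- as.matrix(read.delim(out_file, row.names = 1, check.names = FALSE))
all.equal(counts_demo, counts_back)
```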

### Triticum single

In [60]:
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/")
trit_single <- read.table("wheat_project_table_trimmomatic_single.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
glimpse(trit_single)

Rows: 35
Columns: 3
$ ID           <chr> "CRR078059", "CRR078085", "CRR078084", "CRR078083", "CRR…
$ dataset_name <chr> "tf q", "tf q", "tf q", "tf q", "tf q", "tf q", "tf q", …
$ tissue       <chr> "spike", "spike", "spike", "spike", "spike", "spike", "s…


In [61]:
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/04_Kallisto_single")
files_ts <- file.path(trit_single$ID, "abundance.h5")
names(files_ts) <- paste0(trit_single$ID)
head(files_ts)
all(file.exists(files_ts))

In [62]:
# create count table
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/04_Kallisto_single/")
txi_trit_single <- tximport(files_ts, type = "kallisto", tx2gene = tx2gene_trit)
head(txi_trit_single$counts)

1 2 3 ... 35 


summarizing abundance

summarizing counts

summarizing length

summarizing inferential replicates



Unnamed: 0,CRR078059,CRR078085,CRR078084,CRR078083,CRR078082,CRR078081,CRR078080,CRR078079,CRR078078,CRR078077,⋯,CRR078061,CRR078060,SRR5464524,SRR5464523,SRR5464520,SRR5464519,SRR5464518,SRR5464515,SRR5464508,SRR5464507
TraesCHI1A01G000100,0.0,0.0,1.075752,0.0,0.0,1.188353,0.0,0.0,0.0,0.0,⋯,0.0,0,637.8077,466.84146,616.33167,665.257416,722.3597,831.6903,750.68504,474.0
TraesCHI1A01G000200,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,⋯,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
TraesCHI1A01G000300,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,⋯,0.0,0,24.0,13.0,8.0,14.926194,4.0,6.0,4.0,3.0
TraesCHI1A01G000400,13.85653,4.800697,10.421303,8.036155,5.824054,14.05039,8.947824,3.314993,7.413128,7.000971,⋯,9.170686,4,53.8823,31.90793,98.87497,120.65211,109.353,119.754,112.31362,88.60057
TraesCHI1A01G000500,0.0,0.0,0.0,0.0,0.0,0.0,2.028355,2.042307,0.0,0.0,⋯,0.0,0,0.0,0.0,0.0,2.073806,0.0,0.0,0.0,0.0
TraesCHI1A01G000600,0.0,0.0,0.0,0.0,0.0,4.247404e-08,0.0,0.0,0.0,0.0,⋯,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,14.29808,0.0


In [78]:
# save count table
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/")
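# write only the gene-level counts matrix as text; the complete tximport list is saved as .rds below
write.table(txi_trit_single$counts, file = "wheat_count_table_single.txt", append = FALSE, quote = FALSE, sep = "\t", dec = ".",
            row.names = TRUE, col.names = NA)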
saveRDS(txi_trit_single, file = "wheat_count_table_single.rds")

### Hordeum paired

In [66]:
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/")
hord_paired <- read.table("barley_project_table_trimmomatic_paired.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
glimpse(hord_paired)

Rows: 193
Columns: 3
$ ID           <chr> "ERR1248084", "ERR1248085", "ERR1248086", "ERR1248087", …
$ dataset_name <chr> "ref dataset drought", "ref dataset drought", "ref datas…
$ tissue       <chr> "spike", "spike", "spike", "spike", "spike", "spike", "l…


In [67]:
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/04_Kallisto_paired")
files_hp <- file.path(hord_paired$ID, "abundance.h5")
names(files_hp) <- paste0(hord_paired$ID)
head(files_hp)
all(file.exists(files_hp))

In [7]:
# create tx2gene data.frame
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/genomes/Horvu/")
txdb_hord <- makeTxDbFromGFF("annotation.gff3", organism = "Hordeum vulgare")
#makeTxDbFromGFF(file, format=c("auto", "gff3", "gtf"), dataSource=NA, organism=NA, taxonomyId=NA, circ_seqs=DEFAULT_CIRC_SEQS, chrominfo=NULL, miRBaseBuild=NA, dbxrefTag)
# str() instead of glimpse(): tidyverse is not loaded yet at this point
str(txdb_hord)

Import genomic features from the file as a GRanges object ... 
OK

Prepare the 'metadata' data frame ... 
OK

Make the TxDb object ... 
OK


In [8]:
k_hord <- keys(txdb_hord, keytype = "TXNAME")
tx2gene_hord <- select(txdb_hord, k_hord, "GENEID", "TXNAME")
head(tx2gene_hord)

'select()' returned 1:1 mapping between keys and columns



| | TXNAME | GENEID |
|---|---|---|
| | \<chr\> | \<chr\> |
| 1 | Horvu_MOREX_1H01G000100.1 | Horvu_MOREX_1H01G000100 |
| 2 | Horvu_MOREX_1H01G000200.1 | Horvu_MOREX_1H01G000200 |
| 3 | Horvu_MOREX_1H01G000300.1 | Horvu_MOREX_1H01G000300 |
| 4 | Horvu_MOREX_1H01G000700.1 | Horvu_MOREX_1H01G000700 |
| 5 | Horvu_MOREX_1H01G000800.1 | Horvu_MOREX_1H01G000800 |
| 6 | Horvu_MOREX_1H01G001400.1 | Horvu_MOREX_1H01G001400 |


In [70]:
# create count table
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/04_Kallisto_paired/")
txi_hord_paired <- tximport(files_hp, type = "kallisto", tx2gene = tx2gene_hord)
head(txi_hord_paired$counts)

1 2 3 ... 193 


summarizing abundance

summarizing counts

summarizing length

summarizing inferential replicates



Unnamed: 0,ERR1248084,ERR1248085,ERR1248086,ERR1248087,ERR1248088,ERR1248089,ERR1248116,ERR1248117,ERR1248118,ERR1248119,⋯,ERR515188,ERR515189,ERR515190,ERR515191,ERR515192,ERR515193,ERR515194,ERR515195,ERR515196,ERR515197
Horvu_MOREX_1H01G000100,1102.0,2178.0,2399.0,3877.0,925.0,2010.0,1773.0,2407.0,2224.0,3328.0,⋯,1232.0,2237.0,1503.0,1833.0,1695.0,2477.0,2265.0,2578.0,2468.0,2317.0
Horvu_MOREX_1H01G000200,3.0,4.0,38.0,14.0,1.0,65.0,641.0,2034.0,2230.0,2022.0,⋯,1.0,3.0,10.0,1.0,2.0,8.0,6.0,0.0,2.0,2.0
Horvu_MOREX_1H01G000300,346.0,682.0,613.0,1097.0,369.0,903.0,1513.0,2346.0,2464.0,2226.0,⋯,428.0,610.0,519.0,466.0,492.0,652.0,569.0,677.0,710.0,584.0
Horvu_MOREX_1H01G000400,1282.0,1703.0,681.0,1293.0,1645.0,1748.0,490.0,484.0,578.0,685.0,⋯,971.0,1640.0,1312.0,1160.0,2093.0,2738.0,2795.0,3687.0,3044.0,3045.0
Horvu_MOREX_1H01G000500,2693.0,5151.0,2940.0,4303.0,3550.0,3758.0,1589.0,2689.0,2926.0,3255.0,⋯,3656.0,4336.0,4734.0,4469.0,2296.0,2418.0,2349.0,3908.0,3340.0,3221.0
Horvu_MOREX_1H01G000600,1759.535,2989.789,1806.463,2680.405,2580.237,3770.057,2290.44,2551.647,2561.501,2173.419,⋯,2808.661,3316.393,2766.549,2833.275,2984.255,4250.868,3915.629,4172.974,4106.843,4045.146


In [79]:
# save count table
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/")
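# write only the gene-level counts matrix as text and keep the gene IDs as row names;
# the complete tximport list is saved as .rds below
write.table(txi_hord_paired$counts, file = "barley_count_table_paired.txt", append = FALSE, quote = FALSE, sep = "\t", dec = ".",
            row.names = TRUE, col.names = NA)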
saveRDS(txi_hord_paired, file = "barley_count_table_paired.rds")

### Hordeum single

In [72]:
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/")
hord_single <- read.table("barley_project_table_trimmomatic_single.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
glimpse(hord_single)

Rows: 47
Columns: 3
$ ID           <chr> "ERR781039", "ERR781040", "ERR781041", "ERR781042", "ERR…
$ dataset_name <chr> "inflorescence development", "inflorescence development"…
$ tissue       <chr> "apex", "apex", "apex", "apex", "apex", "apex", "apex", …


In [73]:
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/04_Kallisto_single")
files_hs <- file.path(hord_single$ID, "abundance.h5")
names(files_hs) <- paste0(hord_single$ID)
head(files_hs)
all(file.exists(files_hs))

In [74]:
# create count table
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/04_Kallisto_single/")
txi_hord_single <- tximport(files_hs, type = "kallisto", tx2gene = tx2gene_hord)
head(txi_hord_single$counts)

1 2 3 ... 47 


summarizing abundance

summarizing counts

summarizing length

summarizing inferential replicates



Unnamed: 0,ERR781039,ERR781040,ERR781041,ERR781042,ERR781043,ERR781044,ERR781045,ERR781046,ERR781047,ERR781048,⋯,ERR781076,ERR781077,ERR781078,ERR781079,ERR781080,ERR781081,ERR781082,ERR781083,ERR781084,ERR781085
Horvu_MOREX_1H01G000100,336.0,122.0,66.0,208.0,354.0,452.0,388.0,405.0,359.0,657.0,⋯,265.0,434.0,280.0,478.0,216.0,214.0,267.0,574.0,345.0,262.0
Horvu_MOREX_1H01G000200,1.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,⋯,106.0,13.0,8.0,11.0,113.0,86.0,155.0,70.0,22.0,7.0
Horvu_MOREX_1H01G000300,116.0,56.0,45.0,98.0,108.0,163.0,160.0,133.0,117.0,284.0,⋯,173.0,113.0,73.0,117.0,148.0,145.0,170.0,231.0,107.0,114.0
Horvu_MOREX_1H01G000400,550.0,184.0,133.0,351.0,549.0,657.0,577.0,605.0,540.0,1183.0,⋯,105.0,400.0,294.0,501.0,111.0,120.0,46.0,342.0,178.0,258.0
Horvu_MOREX_1H01G000500,556.0,167.0,116.0,382.0,511.0,635.0,772.0,607.0,630.0,1166.0,⋯,413.0,539.0,353.0,611.0,278.0,252.0,289.0,855.0,465.0,558.0
Horvu_MOREX_1H01G000600,707.3084,260.5543,146.4439,381.1077,572.695,776.8168,827.0809,874.5452,566.1862,1397.799,⋯,428.2955,682.448,428.6905,848.264,295.0219,242.4567,280.6461,950.6211,504.0559,688.9697


In [80]:
# save count table
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/")
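# write only the gene-level counts matrix as text and keep the gene IDs as row names;
# the complete tximport list is saved as .rds below
write.table(txi_hord_single$counts, file = "barley_count_table_single.txt", append = FALSE, quote = FALSE, sep = "\t", dec = ".",
            row.names = TRUE, col.names = NA)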
saveRDS(txi_hord_single, file = "barley_count_table_single.rds")

# Merging paired & single reads
#### 1. symlink single & paired reads into a merge folder using `ln -s /path/to/file /path/to/link`:
* for barley
    - `ln -s /nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/04_Kallisto_single/* /nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/04_Kallisto_allreads_symlinked/`
    - `ln -s /nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/04_Kallisto_paired/* /nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/04_Kallisto_allreads_symlinked/`
* for wheat
    - `ln -s /nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/04_Kallisto_single/* /nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/04_Kallisto_allreads_symlinked/`
    - `ln -s /nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/04_Kallisto_paired/* /nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/04_Kallisto_allreads_symlinked/`
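Linking two folders into one only works cleanly if no sample ID occurs in both. The sketch below repeats the symlink step from R on temporary folders and checks for such clashes first. The folder layout mirrors the one above, but the sample IDs `S1`, `P1` and `P2` are invented, and using `file.symlink` instead of `ln -s` is an assumption made for illustration.

```r
base_dir   <- file.path(tempdir(), "merge_demo")
single_dir <- file.path(base_dir, "04_Kallisto_single")
paired_dir <- file.path(base_dir, "04_Kallisto_paired")
merged_dir <- file.path(base_dir, "04_Kallisto_allreads_symlinked")

# hypothetical sample folders standing in for kallisto output directories
for (d in c(file.path(single_dir, "S1"), file.path(paired_dir, c("P1", "P2")), merged_dir)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

single_ids <- list.files(single_dir)
paired_ids <- list.files(paired_dir)

# a sample ID present in both folders would make the second link fail
clash <- intersect(single_ids, paired_ids)
stopifnot(length(clash) == 0)

sources <- c(file.path(single_dir, single_ids), file.path(paired_dir, paired_ids))
ok <- file.symlink(sources, file.path(merged_dir, basename(sources)))
list.files(merged_dir)
```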
#### 2. create file paths to all reads in this folder
### Hordeum

In [10]:
# load tidyverse only after the tx2gene objects have been created, because its functions mask the ones used above!!
library(tidyverse, warn.conflicts = FALSE)
setwd("/home/vanda.marosi/floral_development_thesis_vm/datatables/")
barley_meta <- read.table("barley_final.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
barley_meta <- dplyr::select(barley_meta, Run.ID, Dataset)
colnames(barley_meta) <- c("ID", "dataset")

In [11]:
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/04_Kallisto_allreads_symlinked/")
files_h <- file.path(barley_meta$ID, "abundance.h5")
names(files_h) <- paste0(barley_meta$ID)
head(files_h)
all(file.exists(files_h))

In [12]:
# reuse the tx2gene data.frame made earlier (tx2gene_hord, built from txdb_hord)
# create count table
txi_hord <- tximport(files_h, type = "kallisto", tx2gene = tx2gene_hord)
dim(txi_hord$counts)
head(txi_hord$counts)

1 2 3 ... 222

Unnamed: 0,ERR781039,ERR781040,ERR781041,ERR781042,ERR781043,ERR781044,ERR781045,ERR781046,ERR781047,ERR781048,⋯,ERR515188,ERR515189,ERR515190,ERR515191,ERR515192,ERR515193,ERR515194,ERR515195,ERR515196,ERR515197
Horvu_MOREX_1H01G000100,336.0,122.0,66.0,208.0,354.0,452.0,388.0,405.0,359.0,657.0,⋯,1232.0,2237.0,1503.0,1833.0,1695.0,2477.0,2265.0,2578.0,2468.0,2317.0
Horvu_MOREX_1H01G000200,1.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,⋯,1.0,3.0,10.0,1.0,2.0,8.0,6.0,0.0,2.0,2.0
Horvu_MOREX_1H01G000300,116.0,56.0,45.0,98.0,108.0,163.0,160.0,133.0,117.0,284.0,⋯,428.0,610.0,519.0,466.0,492.0,652.0,569.0,677.0,710.0,584.0
Horvu_MOREX_1H01G000400,550.0,184.0,133.0,351.0,549.0,657.0,577.0,605.0,540.0,1183.0,⋯,971.0,1640.0,1312.0,1160.0,2093.0,2738.0,2795.0,3687.0,3044.0,3045.0
Horvu_MOREX_1H01G000500,556.0,167.0,116.0,382.0,511.0,635.0,772.0,607.0,630.0,1166.0,⋯,3656.0,4336.0,4734.0,4469.0,2296.0,2418.0,2349.0,3908.0,3340.0,3221.0
Horvu_MOREX_1H01G000600,707.3084,260.5543,146.4439,381.1077,572.695,776.8168,827.0809,874.5452,566.1862,1397.799,⋯,2808.661,3316.393,2766.549,2833.275,2984.255,4250.868,3915.629,4172.974,4106.843,4045.146


In [14]:
# save count table
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/hordeum/")
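# write only the gene-level counts matrix as text and keep the gene IDs as row names;
# the complete tximport list is saved as .rds below
write.table(txi_hord$counts, file = "barley_count_table_merged.txt", append = FALSE, quote = FALSE, sep = "\t", dec = ".",
            row.names = TRUE, col.names = NA)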
saveRDS(txi_hord, file = "barley_count_table_merged.rds")

### Triticum

In [15]:
setwd("/home/vanda.marosi/floral_development_thesis_vm/datatables/")
wheat_meta <- read.table("wheat_final.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
wheat_meta <- dplyr::select(wheat_meta, Run.ID, Dataset)
colnames(wheat_meta) <- c("ID", "dataset")
glimpse(wheat_meta)

Rows: 215
Columns: 2
$ ID      <chr> "SRR10737427", "SRR10737428", "SRR10737429", "SRR10737430", "…
$ dataset <chr> "cytoplasmic_male_sterility", "cytoplasmic_male_sterility", "…


In [16]:
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/04_Kallisto_allreads_symlinked/")
files_t <- file.path(wheat_meta$ID, "abundance.h5")
names(files_t) <- paste0(wheat_meta$ID)
head(files_t)
all(file.exists(files_t))

In [17]:
# reuse the tx2gene data.frame made earlier (tx2gene_trit, built from txdb_trit)
# create count table
txi_trit <- tximport(files_t, type = "kallisto", tx2gene = tx2gene_trit)
dim(txi_trit$counts)
head(txi_trit$counts)

1 2 3 ... 215 


summarizing abundance

summariz

Unnamed: 0,SRR10737427,SRR10737428,SRR10737429,SRR10737430,SRR10737431,SRR10737432,CRR088963,CRR088962,CRR088961,CRR088960,⋯,SRR8413505,SRR8413506,SRR8413507,SRR8413508,SRR5186313,SRR5186364,SRR5186375,SRR5186382,SRR5186387,SRR5186416
TraesCHI1A01G000100,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,⋯,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
TraesCHI1A01G000200,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,⋯,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
TraesCHI1A01G000300,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,⋯,6.0,7.878112,14.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
TraesCHI1A01G000400,0.0,0.0,0.0,4.893036,4.726423,1.938802,0.0,0.0,0.0,0.0,⋯,140.5432,108.839845,177.3651,112.3966,1.993108,0.0,0.0,0.0,2.004102,0.0
TraesCHI1A01G000500,1.0,8.0,7.0,9.0,2.0,3.0,1.0,8.0,7.0,9.0,⋯,5.0,11.0,12.0,5.0,1.0,0.0,2.0,0.0,0.0,0.0
TraesCHI1A01G000600,193.3952,2.465932,159.4838,51.088736,8.853329,80.785644,193.3877,2.465675,159.5168,126.3058,⋯,387.2507,513.135328,458.2691,355.3096,0.0,2.649372,1.307648,4.297154,4.083578,1.356089


In [18]:
# save count table
setwd("/nfs/pgsb/projects/comparative_triticeae/phenotype/flower_development/refsets/triticum/")
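# write only the gene-level counts matrix as text; the complete tximport list is saved as .rds below
write.table(txi_trit$counts, file = "wheat_count_table_merged.txt", append = FALSE, quote = FALSE, sep = "\t", dec = ".",
            row.names = TRUE, col.names = NA)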
saveRDS(txi_trit, file = "wheat_count_table_merged.rds")

In [76]:
sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/vanda.marosi/anaconda3/envs/tximport/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] GenomicFeatures_1.40.0 AnnotationDbi_1.50.0   Biobase_2.48.0        
 [4] GenomicRanges_1.40.0   GenomeInfoDb_1.24.0    IRanges_2.22.1        
 [7] S4Vectors_0.26.0       BiocGenerics_0.34.0    forcats_0.5.0         
[10] stringr_1.4.0          dplyr_1.0.0    