Code and data associated with the PastDB website and publication (Martín et al., Genome Biol 2021). For any further enquiries, please feel free to contact Manuel Irimia (mirimia@gmail.com) and/or Guiomar Martín (guiomarm@igc.gulbenkian.pt). Additional information can be found in PastDB.
Full citation: Martín, G., Márquez, Y., Duque, P., Irimia, M. (2021). Alternative splicing landscapes in Arabidopsis thaliana across tissues and stress conditions highlight major functional differences with animals. Genome Biol, 22:35.
-
Scripts in bin (all Perl scripts include a help option explaining how to run them, as well as internal comments):
- Get_Event_Stats.pl: to calculate general statistics per AS event from any INCLUSION table.
- Get_PanAS_Events.pl: to define PanAS events from any INCLUSION table.
- Get_Stress_Cores.pl: to get abiotic and biotic stress AS core sets, as well as the associated control sets.
- Get_Tissue_Specific_AS.pl: to get tissue-specific AS events from any INCLUSION table.
- Get_Tissue_Specific_GE.pl: to get genes with tissue-specific expression from any cRPKM/TPM table.
- Quantify_AS_by_Subsampling.pl: to calculate the fraction of genes that are alternatively spliced by event type from an INCLUSION table.
- Get_Plots_Stress_vs_Tissues.R: used to plot Figure 5c (comparing stress vs tissue AS contributions in the four species).
- Calculate_SS_SCORES_From_PWMs.R: to calculate PWM-based splice site scores.
- Pipeline_Get_Chain_Aln.sh: bash pipeline to obtain liftOver files.
- Get_Results_From_Liftover.pl: used to parse the pairwise liftOver outputs.
- Get_Results_From_ExOrthist.pl: used to perform the 4-way overlap between core AS sets.
-
Files from PastDB: the main data files used for the analyses are available for download in PastDB, and are also copied here:
-
AS events:
- EVENTS table: Information about AS event coordinates and sequences. TAIR10 assembly.
- MAIN PSI table: Inclusion patterns of AS events across tissues, cell types and developmental stages (main PSI plot).
- ABIOTIC PSI table: Inclusion patterns of AS events in ABIOTIC stress experiments (special dataset).
- BIOTIC PSI table: Inclusion patterns of AS events in BIOTIC stress experiments (special dataset).
- LIGHT PSI table: Inclusion patterns of AS events in LIGHT experiments (special dataset).
- SPL_FACTORS PSI table: Inclusion patterns of AS events upon SPLICING FACTOR disruption (special dataset).
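
As an illustration of how a PSI table might be summarised per event, here is a minimal Python sketch. It is not the code in Get_Event_Stats.pl. It assumes a tab-delimited table with a header line, an event identifier column named EVENT, one numeric PSI column per sample, and NA for missing values; the column names, sample names and values in the toy input are made up, so check the downloaded files for their actual layout.

```python
import csv
import io
import statistics


def psi_summary(table_text, sample_columns, event_column="EVENT"):
    """Summarise PSI values per event.

    Returns {event_id: (n_values, mean_psi, psi_range)}.
    Assumes tab-delimited text with a header and "NA" (or empty) for
    missing values; these are assumptions, not a documented format.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    summary = {}
    for row in reader:
        values = []
        for col in sample_columns:
            cell = row[col]
            if cell not in ("", "NA"):
                values.append(float(cell))
        # Events without any quantified sample are skipped.
        if values:
            summary[row[event_column]] = (
                len(values),
                statistics.mean(values),
                max(values) - min(values),
            )
    return summary


# Toy input with hypothetical gene names, event IDs and sample names.
example = (
    "GENE\tEVENT\tRoot\tLeaf\tFlower\n"
    "GeneA\tEvent1\t10\t50\t90\n"
    "GeneB\tEvent2\t100\tNA\t80\n"
)

for event, stats in psi_summary(example, ["Root", "Leaf", "Flower"]).items():
    print(event, stats)
```

For a real file, the text can be read with gzip.open(path, "rt").read() if the download is gzip-compressed, and the sample columns taken from the header.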
-
Event features:
- SPLICE SITES table: Sequences and strength scores of 5' and 3' splice sites of alternative exons.
- PCR VALIDATION table: Suggested primer sequences and expected band lengths for validation of AS events by RT-PCR.
-
Protein impact:
- PROTEIN IMPACT table: Effect of the AS event on the open reading frame of the transcript. Version v3.
- PROTEIN ISOFORMS: Mappings of events to ProteinIDs.
- DOMAINS table (PFAM): Mappings to Pfam domains.
- DOMAINS table (PROSITE): Mappings to PROSITE domains.
- PROTEIN DISORDERED REGIONS table: Intrinsic disorder rates for A, C1 and C2 exons, using DISOPRED3.
-
Genes:
- GENES table: Information about gene names, descriptions, genomic coordinates and biotypes.
- MAIN EXPRESSION table: Gene expression across tissues, cell types and developmental stages. Measured in cRPKM and in raw reads (main GE plot).
- ABIOTIC EXPRESSION table: Gene expression in ABIOTIC stress experiments (special dataset).
- BIOTIC EXPRESSION table: Gene expression in BIOTIC stress experiments (special dataset).
- LIGHT EXPRESSION table: Gene expression in LIGHT experiments (special dataset).
- SPL_FACTORS EXPRESSION table: Gene expression upon SPLICING FACTOR disruption (special dataset).
- GENE-EVENTS table: Table relating genes to AS events.
-
Samples:
- SAMPLE_INFO table: SRA identifiers and other information related to RNA-seq data used in this database.
-
-
Files in data/ folder:
-
General files:
- AllEvents_for_comparison-Ath.txt.gz (1.3M)
- Ath.Event-Gene.IDs.txt (9.7M)
- Stress_vs_Tissues-input_table.tab (5.1M)
-
Splice sites to calculate SS scores based on PWMs:
- Annotated_ACCEPTORS-Ath.fasta.gz (1.2M)
- Annotated_DONORS-Ath.fasta.gz (615K)
- REFERENCE-ALL_ANNOT-Ath163-3ss.fasta.gz (3.6M)
- REFERENCE-ALL_ANNOT-Ath163-5ss.fasta.gz (1.8M)
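
To show what a PWM-based splice site score involves, here is a minimal Python sketch of a generic log-odds position weight matrix. It is not a port of Calculate_SS_SCORES_From_PWMs.R, whose exact scoring scheme is not described here. The sketch assumes that all reference sites have the same length, a uniform background of 0.25 per base, and a pseudocount of 1; the example sequences are made up.

```python
import math

BASES = "ACGT"


def build_pwm(sequences, pseudocount=1.0):
    """Build a log2-odds PWM from equal-length reference sites.

    The uniform 0.25 background and the pseudocount are assumptions
    made for this sketch.
    """
    length = len(sequences[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for seq in sequences:
            base = seq[pos].upper()
            if base in counts:  # ignore N and other ambiguity codes
                counts[base] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / 0.25) for base in BASES}
        )
    return pwm


def score_site(pwm, site):
    """Sum the per-position log-odds; unknown bases contribute 0."""
    return sum(column.get(base, 0.0) for column, base in zip(pwm, site.upper()))


# Hypothetical 5' splice site sequences (3 exonic + 6 intronic bases).
reference = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"]
pwm = build_pwm(reference)

print(round(score_site(pwm, "CAGGTAAGT"), 2))
print(round(score_site(pwm, "TTTTTTTTT"), 2))
```

In practice the reference sites would be read from the REFERENCE FASTA files and the scored sites from the Annotated_ACCEPTORS and Annotated_DONORS files, with donors and acceptors handled by separate matrices.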
-
Lifted events to Brassicaceae species by event type:
- EX-Ath-to-Aal-FILTERED.tab.gz (803K)
- EX-Ath-to-Aly-FILTERED.tab.gz (920K)
- EX-Ath-to-Bra-FILTERED.tab.gz (766K)
- EX-Ath-to-Csa-FILTERED.tab.gz (892K)
- INT-Ath-Aal-FILTERED.tab.gz (498K)
- INT-Ath-Aly-FILTERED.tab.gz (1.0M)
- INT-Ath-Bra-FILTERED.tab.gz (573K)
- INT-Ath-Csa-FILTERED.tab.gz (941K)
- ALTA-Ath-to-Aal-FILTERED.tab.gz (459K)
- ALTA-Ath-to-Aly-FILTERED.tab.gz (648K)
- ALTA-Ath-to-Bra-FILTERED.tab.gz (412K)
- ALTA-Ath-to-Csa-FILTERED.tab.gz (580K)
- ALTD-Ath-to-Aal-FILTERED.tab.gz (245K)
- ALTD-Ath-to-Aly-FILTERED.tab.gz (356K)
- ALTD-Ath-to-Bra-FILTERED.tab.gz (216K)
- ALTD-Ath-to-Csa-FILTERED.tab.gz (315K)
-
Gene and exon orthology clusters:
- gene_cluster_file-araTha10_ce11_dm6_hg38.gz (286K)
- EX_clusters-int2b.tab (3.3M)
-