## Create scatter plots with DESeq output files

DESeq identifies genes or other elements whose expression changes significantly between two conditions and reports the output in a table. This script generates scatter plots from multiple pre-processed DESeq tables. It includes a loop that iterates through the entries and assigns colors to all significantly up- and down-regulated genes (in this case, retrotransposons).

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import glob

In [8]:
files = glob.glob('files/*.interval')    # create list with filenames (pre-processed output files from DESeq)

colnames = ['Chr', 'Start', 'End', 'Strand', 'Transposon', 'Mean', 'log2fold', 'x1', 'x2', 'pvalue', 'qvalue']

for file_name in files:                       # iterate through list with file names
    df = pd.read_csv(file_name, sep='\t', names=colnames)

    color = []                         # assign colours (red/blue/grey) to the TEs, one entry per row, save as list in 'color'
    for row in df.itertuples():
        if row.log2fold > 0 and row.qvalue < 0.05:    # transposons that are significantly upregulated will be shown in red
            color.append('red')
        elif row.log2fold < 0 and row.qvalue < 0.05:  # transposons that are significantly downregulated will be shown in blue
            color.append('blue')
        else:
            color.append('grey')             # not significantly changed transposons will be shown in grey

    name = file_name[6:-9]                      # input file name without the 'files/' prefix and '.interval' extension
    df.plot(x='Mean', y='log2fold', kind='scatter', s=1, c=color)   # create scatter plot
    plt.xscale('log')
    plt.xlabel('Mean TE expression')
    plt.ylabel('Log2-fold change')
    plt.title(name)                             # plot title is input file name without extension
    plt.axhline(color='black')                  # horizontal line at log2-fold change = 0
    plt.savefig(name)                           # save plot as file named after the input file
    plt.close()
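
The row-by-row loop can be slow on large tables. As an alternative, the same colour rule could be computed in one step with `numpy.select`. This is only a sketch and not part of the original script: the helper name `assign_colors`, the `alpha` parameter and the example values are assumptions made for illustration, and the column names follow `colnames` from the cell above.

```python
import numpy as np
import pandas as pd


def assign_colors(df, alpha=0.05):
    """Return one colour per row (hypothetical helper, not part of the original script).

    red  = significantly upregulated   (log2fold > 0 and qvalue < alpha)
    blue = significantly downregulated (log2fold < 0 and qvalue < alpha)
    grey = everything else
    """
    significant = df['qvalue'] < alpha
    conditions = [
        significant & (df['log2fold'] > 0),
        significant & (df['log2fold'] < 0),
    ]
    return np.select(conditions, ['red', 'blue'], default='grey')


# Made-up values, used only to show the behaviour
example = pd.DataFrame({
    'log2fold': [2.1, -1.7, 0.3, -0.2],
    'qvalue': [0.01, 0.001, 0.40, 0.04],
})
colors = assign_colors(example)
print(colors.tolist())
```

The resulting array could be passed as `c=colors` to `df.plot(...)` in place of the `color` list built by the loop.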
