
# **Assignment:**
Intersect the BED file with FUS data from Homo sapiens K562 cells (https://www.encodeproject.org/files/ENCFF861KMV/) with the Human
Release 33 (GRCh38.p13) annotation file (gff3; this notebook downloads the GTF version) (https://www.gencodegenes.org/human/) using bedtools. 
First install bedtools and pybedtools, then save the contents of the BED and annotation files to new files. Intersect these files and print the first ten lines of the result.

# Bedtools

If bedtools is not installed properly, uninstall and reinstall it, then restart the virtual machine (runtime).

In [0]:
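# The -y flag answers the confirmation prompts, which a notebook cell cannot answer interactively.
! apt-get remove -y bedtools
! pip uninstall -y pybedtools
! pip uninstall -y bedtools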

Install bedtools and pybedtools, then run bedtools to check that the installation works.

In [0]:
# Clean installation of bedtools and pybedtools:
! apt-get install -y bedtools
! pip install pybedtools
# Running bedtools without arguments prints its version and usage.
! bedtools

Reading package lists... Done
Building dependency tree       
Reading state information... Done
bedtools is already the newest version (2.26.0+dfsg-5).
The following package was automatically installed and is no longer required:
  libnvidia-common-430
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 25 not upgraded.
bedtools is a powerful toolset for genome arithmetic.

Version:   v2.26.0
About:     developed in the quinlanlab.org and by many contributors worldwide.
Docs:      http://bedtools.readthedocs.io/
Code:      https://github.com/arq5x/bedtools2
Mail:      https://groups.google.com/forum/#!forum/bedtools-discuss

Usage:     bedtools <subcommand> [options]

The bedtools sub-commands include:

[ Genome arithmetic ]
    intersect     Find overlapping intervals in various ways.
    window        Find overlapping intervals within a window around an interval.
    closest       Find the closest, potentially non-overlapping interval.
    coverage      C

**Optional:**

In [0]:
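# Testing the installation of pybedtools with its bundled example files.
import pybedtools

def test_bedtools():
    a = pybedtools.example_bedtool("a.bed")
    b = pybedtools.example_bedtool("b.bed")
    # subtract() calls the bedtools binary, so it fails here if bedtools is missing.
    c = a.subtract(b)
    # Return the unmodified example file a; its first lines are shown below.
    return a

test = test_bedtools()
test.head(2)

# Testing the intersect and subtract methods.
a = pybedtools.example_bedtool("a.bed")
b = pybedtools.example_bedtool("b.bed")
print(a.intersect(b))
# s=True: only subtract features that lie on the same strand.
print(a.subtract(b, s=True))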

chr1	1	100	feature1	0	+
 chr1	100	200	feature2	0	+
 chr1	155	200	feature2	0	+
chr1	155	200	feature3	0	-
chr1	900	901	feature4	0	+

chr1	1	100	feature1	0	+
chr1	100	200	feature2	0	+
chr1	150	155	feature3	0	-
chr1	200	500	feature3	0	-
chr1	901	950	feature4	0	+



**There are two options:**

**I. You can download the files directly from the databases, step by step.**

In [0]:
import pandas as pd

# Download file from URL
url = "https://www.encodeproject.org/files/ENCFF861KMV/@@download/ENCFF861KMV.bed.gz"
file_name = "ENCFF861KMV.pandas.bed"

# The dataframe is created by reading the file with pandas.read_csv. The gzip-compressed
# source BED file is decompressed on the fly, the columns are split on the tab
# delimiter, and the ten columns are given names.
dataframe = pd.read_csv(url,
                        compression="gzip",
                        sep="\t",
                        names=["chrom", "start", "end", "name", "score", "strand", "7", "8", "9", "10"]
                        )

# Dump dataframe to file.
dataframe.to_csv(file_name, sep="\t", header=False, index=False)
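
The same reading step can be sketched with the standard library alone, which makes explicit what `compression`, `sep`, and `names` do. This is a minimal sketch rather than the notebook's method: the file `demo.bed.gz` and its two records are invented for illustration, and `read_gzipped_bed` is a hypothetical helper name.

```python
import csv
import gzip
import os
import tempfile

# Column names as in the pandas call above ("7".."10" are placeholders).
COLUMNS = ["chrom", "start", "end", "name", "score", "strand", "7", "8", "9", "10"]

def read_gzipped_bed(path):
    """Read a gzip-compressed, tab-delimited BED file into a list of dicts."""
    records = []
    # Mode "rt" decompresses on the fly and yields text lines.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
        for row in reader:
            # Coordinates are read as strings; convert them to integers.
            row["start"] = int(row["start"])
            row["end"] = int(row["end"])
            records.append(row)
    return records

# Hypothetical two-line demo file (values invented for illustration).
demo_lines = [
    "chr1\t100\t200\tpeak_a\t1000\t+\t3.2\t6.7\t-1\t-1",
    "chr2\t500\t650\tpeak_b\t1000\t-\t5.0\t10.2\t-1\t-1",
]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bed.gz")
with gzip.open(demo_path, "wt") as handle:
    handle.write("\n".join(demo_lines) + "\n")

records = read_gzipped_bed(demo_path)
print(records[0]["chrom"], records[0]["start"], records[0]["end"])
```

For real data, pandas remains the more convenient choice; the sketch only shows the individual steps.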

In [3]:
import ftplib

ftp = ftplib.FTP("ftp.ebi.ac.uk")
# Provide user and password (empty strings give an anonymous login).
ftp.login("", "")
# Move to path.
ftp.cwd("/pub/databases/gencode/Gencode_human/release_33/")
# Retrieve the file and save it to a local file; the with block closes the file.
with open("gencode.v33.annotation.gtf.gz", "wb") as local_file:
    ftp.retrbinary("RETR " + "gencode.v33.annotation.gtf.gz", local_file.write)
# Quit server.
ftp.quit()

'221 Goodbye.'

**II. Or you can automate the download of the BED and GTF files by writing the corresponding functions.**



A:
BED reader.

In [0]:
# Download the BED file from a URL with the requests module and decompress it with gzip.
from requests import get
from gzip import decompress
import os

# Define a function that downloads the compressed (gz) BED file
# and decompresses it to text.
# The helper function get_gzipped_bed_ is nested inside the main bed_reader function.
def bed_reader(target_url, output_name):
  def get_gzipped_bed_(target_url):
    response = get(target_url)
    # Stop early on an HTTP error instead of trying to decompress an error page.
    response.raise_for_status()
    return decompress(response.content)
  # Decode the bytes to text (utf-8 is the default).
  bed_file_human_readable = get_gzipped_bed_(target_url).decode()
  # Write the BED text to a new file on disk; the with block closes the file.
  with open(output_name, "w") as output_file:
    output_file.write(bed_file_human_readable)
  return output_name

# Call the outer function with two arguments: the URL and an arbitrary output name.
bed_reader("https://www.encodeproject.org/files/ENCFF861KMV/@@download/ENCFF861KMV.bed.gz", 
           "ENCFF861KMV.bed.gz")

# A notebook cell displays only the value of its last expression, so the file name
# returned by bed_reader is not shown. Instead, True is displayed if the path exists.
os.path.exists("ENCFF861KMV.bed.gz")

True
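
Note that `bed_reader` writes the decompressed text, so the saved file holds plain text even though the name chosen here ends in `.gz`. A quick way to check what a file really contains is to look at its first two bytes, because every gzip stream starts with `0x1f 0x8b`. The sketch below uses a made-up in-memory record, and `is_gzipped` is a hypothetical helper name.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def is_gzipped(data: bytes) -> bool:
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC

# Made-up BED record used only for this demonstration.
plain = b"chr1\t100\t200\tpeak_a\t1000\t+\n"
compressed = gzip.compress(plain)

print(is_gzipped(compressed))                   # check the compressed bytes
print(is_gzipped(gzip.decompress(compressed)))  # check the decompressed text
```

To apply the check to a file on disk, read its first two bytes in binary mode and pass them to the helper.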

Similarly, you can create an FTP reader function. 

In [0]:
import ftplib

# Pass empty strings as user and password for an anonymous login.
def ftp_reader(server_address, user, password, path, filename):
  # Connect to server.
  ftp = ftplib.FTP(server_address) 
  # Provide user, password.
  ftp.login(user, password) 
  # Move to path.
  ftp.cwd(path)
  # Retrieve the file and save it to a local file; the with block closes the file.
  with open(filename, "wb") as local_file:
    ftp.retrbinary("RETR " + filename, local_file.write)
  # Quit server.
  ftp.quit()
  return filename

# The filename is not arbitrary: it must match the name of the file on the server.
ftp_reader("ftp.ebi.ac.uk", "", "", "/pub/databases/gencode/Gencode_human/release_33/", 
           "gencode.v33.annotation.gtf.gz")

'gencode.v33.annotation.gtf.gz'

If you need to download several files, you can put their names in a list and iterate over it.


B:

In [0]:
list_downloads = ["gencode.v33.long_noncoding_RNAs.gtf.gz", "gencode.v33.annotation.gtf.gz"]

for download in list_downloads:
  print("The name of the downloaded file is:", download)
  ftp_reader("ftp.ebi.ac.uk", "", "", "/pub/databases/gencode/Gencode_human/release_33/", download)
  print("Done")

# The procedure is the same as calling the function for each file
ftp_reader("ftp.ebi.ac.uk", "", "", "/pub/databases/gencode/Gencode_human/release_33/", 
           "gencode.v33.long_noncoding_RNAs.gtf.gz")
ftp_reader("ftp.ebi.ac.uk", "", "", "/pub/databases/gencode/Gencode_human/release_33/", 
           "gencode.v33.annotation.gtf.gz")

The name of the downloaded file is: gencode.v33.long_noncoding_RNAs.gtf.gz
Done
The name of the downloaded file is: gencode.v33.annotation.gtf.gz
Done


'gencode.v33.annotation.gtf.gz'

**Once both the BED and GTF files are downloaded, you can intersect them:**


You can load the BED data either from the dataframe created in option I:


In [0]:
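from pybedtools import BedTool

# dataframe is the pandas dataframe created in option I.
a = BedTool.from_dataframe(dataframe)
b = BedTool("gencode.v33.annotation.gtf.gz")
# s=True: report only overlaps between features on the same strand.
data = a.intersect(b, s=True)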

The output is large, so print only the first ten lines of the result.

In [0]:
data.head(10)

chr9	137102003	137102187	FUS_K562_IDR	1000	+	3.2259544061812297	6.780834778169661	-1	-1
 chr9	137102003	137102187	FUS_K562_IDR	1000	+	3.2259544061812297	6.780834778169661	-1	-1
 chr9	137102003	137102187	FUS_K562_IDR	1000	+	3.2259544061812297	6.780834778169661	-1	-1
 chr9	137102003	137102187	FUS_K562_IDR	1000	+	3.2259544061812297	6.780834778169661	-1	-1
 chr9	137102003	137102187	FUS_K562_IDR	1000	+	3.2259544061812297	6.780834778169661	-1	-1
 chr3	128492692	128492792	FUS_K562_IDR	1000	+	5.07968689911725	10.2944552008675	-1	-1
 chr3	128492692	128492792	FUS_K562_IDR	1000	+	5.07968689911725	10.2944552008675	-1	-1
 chr3	128492692	128492792	FUS_K562_IDR	1000	+	5.07968689911725	10.2944552008675	-1	-1
 chr3	128492692	128492792	FUS_K562_IDR	1000	+	5.07968689911725	10.2944552008675	-1	-1
 chr8	143593204	143593275	FUS_K562_IDR	1000	-	4.68211356648989	6.0097055538715605	-1	-1
 

Or from the files created with the BED and FTP reader functions:

In [0]:
from pybedtools import BedTool

a = BedTool("gencode.v33.annotation.gtf.gz")
b = BedTool("ENCFF861KMV.bed.gz")
# c=True: for each feature in a, append the number of overlapping features in b.
c = a.intersect(b, c=True)

In [0]:
c.head(10)

chr1	HAVANA	gene	11869	14409	.	+	.	gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; hgnc_id "HGNC:37102"; havana_gene "OTTHUMG00000000961.2";	0
 chr1	HAVANA	transcript	11869	14409	.	+	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; level 2; transcript_support_level "1"; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";	0
 chr1	HAVANA	exon	11869	12227	.	+	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; transcript_support_level "1"; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; hava
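
With `c=True`, each feature from `a` is reported once, with the number of overlapping features from `b` appended as the last column (the trailing `0` above). The sketch below mimics that count in plain Python under stated assumptions: GTF coordinates are 1-based and inclusive, BED coordinates are 0-based and half-open, so the GTF start is shifted by one before comparing. The function names and the peak intervals are hypothetical, and this is not how bedtools is implemented.

```python
def gtf_to_half_open(start, end):
    """Convert 1-based inclusive GTF coordinates to 0-based half-open ones."""
    return start - 1, end

def count_overlaps(feature, peaks):
    """Count BED peaks overlapping one GTF feature given as (chrom, start, end)."""
    chrom, gtf_start, gtf_end = feature
    start, end = gtf_to_half_open(gtf_start, gtf_end)
    count = 0
    for peak_chrom, peak_start, peak_end in peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        if peak_chrom == chrom and peak_start < end and start < peak_end:
            count += 1
    return count

# Gene coordinates taken from the output above; the peaks are invented.
gene = ("chr1", 11869, 14409)
peaks = [
    ("chr1", 11868, 11870),  # covers the first base of the gene
    ("chr1", 14409, 14500),  # starts right after the last base of the gene
    ("chr2", 12000, 12100),  # different chromosome
]

print(count_overlaps(gene, peaks))
```

Only the first invented peak is counted: the second begins exactly where the gene ends, and the third lies on another chromosome.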

**Alternatively, you can intersect two BED files given as strings:**

In [0]:
# A simple example of strings containing BED data.
from pybedtools import BedTool

bed_1 = """
chrX  1  100 interval_1a . +
chrX 25  800 interval_2a . +
"""

bed_2 = """
chrX  5 10 interval_1b . -
chrX  801 900 interval_1b . -
"""

# Create BedTool objects with the BedTool class (imported from pybedtools above).
# The first argument is normally a file name; here it is the string itself, and
# from_string=True tells BedTool to read the data from that string.
a = BedTool(bed_1, from_string=True)
b = BedTool(bed_2, from_string=True)

# Call the BedTool method "intersect" to find the overlapping intervals.
# Strandedness (s): if set to True, only overlaps on the same strand are reported.

a_b = a.intersect(b, s=False) # a is a BedTool object; you are calling its method intersect

# Print the first N=2 lines of a_b, the new BedTool object
# resulting from the intersection of a and b.
a_b.head(2) 



chrX	5	10	interval_1a	.	+
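
To see why only one line comes back, the overlap rule can be written out by hand. This is a simplified sketch of the idea, not how bedtools is implemented: BED intervals are treated as half-open, so interval_2a (ending at 800) does not touch the interval starting at 801. The function name and the `same_strand` parameter are hypothetical stand-ins for `intersect` and `s=True`.

```python
def intersect(a_records, b_records, same_strand=False):
    """Return the overlapping part of every overlapping pair of records."""
    hits = []
    for a_chrom, a_start, a_end, a_name, a_strand in a_records:
        for b_chrom, b_start, b_end, _b_name, b_strand in b_records:
            if a_chrom != b_chrom:
                continue
            if same_strand and a_strand != b_strand:
                continue
            # The shared region runs from the later start to the earlier end.
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:
                hits.append((a_chrom, start, end, a_name, a_strand))
    return hits

# The same intervals as in the strings above, written as tuples.
bed_1 = [("chrX", 1, 100, "interval_1a", "+"), ("chrX", 25, 800, "interval_2a", "+")]
bed_2 = [("chrX", 5, 10, "interval_1b", "-"), ("chrX", 801, 900, "interval_1b", "-")]

print(intersect(bed_1, bed_2))                    # [('chrX', 5, 10, 'interval_1a', '+')]
print(intersect(bed_1, bed_2, same_strand=True))  # []
```

With `same_strand=True` nothing is reported, because all intervals in the first string are on the + strand and all intervals in the second are on the - strand.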
 

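However, the same approach fails for the BED and gff3 strings in the next cell. Differing field counts are probably not the problem in themselves, since the BED and GTF files above intersect without error. A plausible cause (not confirmed, because the traceback is hidden) is the row of column names at the top of the gff3 string, which is not a valid feature line.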

In [0]:
# Intersecting bed and gff3 file.
# If you import BedTool from pybedtools, you avoid writing pybedtools.BedTool()
# every time and you can use BedTool() instead.
from pybedtools import BedTool

bed = """chr1	11895	11995	interval1	.	+
chr1	11900	14309	interval2	.	+
chr1	12000	12090	interval3	.	+
chr1	12700	12710	interval4	.	+
chr1	13300	14209	interval5	.	+
chr1	12080	13570	interval6	.	+
chr1	12040	12047	interval7	.	+	
chr1	12190	12210	interval8	.	+
chr1	12655	12680	interval9	.	+
chr1	12990	13040	interval10  .	+
chr1	13300	13354	interval11	.	+"""

gff3 = """Chromosome Source Ft Start End Score Strand Phase Gen_id Gene_type Gene_name Transcript_type Transcript_name Level Transcript_support_level hgnc_id tag havana_gene havana_transcript
chr1	HAVANA	gene	11869	14409	.	+	.	ID=ENSG00000223972.5;gene_id=ENSG00000223972.5;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;level=2;hgnc_id=HGNC:37102;havana_gene=OTTHUMG00000000961.2
chr1	HAVANA	transcript	11869	14409	.	+	.	ID=ENST00000456328.2;Parent=ENSG00000223972.5;gene_id=ENSG00000223972.5;transcript_id=ENST00000456328.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_name=DDX11L1-202;level=2;transcript_support_level=1;hgnc_id=HGNC:37102;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1
chr1	HAVANA	exon	11869	12227	.	+	.	ID=exon:ENST00000456328.2:1;Parent=ENST00000456328.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000456328.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_name=DDX11L1-202;exon_number=1;exon_id=ENSE00002234944.1;level=2;transcript_support_level=1;hgnc_id=HGNC:37102;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1
chr1	HAVANA	exon	12613	12721	.	+	.	ID=exon:ENST00000456328.2:2;Parent=ENST00000456328.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000456328.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_name=DDX11L1-202;exon_number=2;exon_id=ENSE00003582793.1;level=2;transcript_support_level=1;hgnc_id=HGNC:37102;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1
chr1	HAVANA	exon	13221	14409	.	+	.	ID=exon:ENST00000456328.2:3;Parent=ENST00000456328.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000456328.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_name=DDX11L1-202;exon_number=3;exon_id=ENSE00002312635.1;level=2;transcript_support_level=1;hgnc_id=HGNC:37102;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1
chr1	HAVANA	transcript	12010	13670	.	+	.	ID=ENST00000450305.2;Parent=ENSG00000223972.5;gene_id=ENSG00000223972.5;transcript_id=ENST00000450305.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=transcribed_unprocessed_pseudogene;transcript_name=DDX11L1-201;level=2;transcript_support_level=NA;hgnc_id=HGNC:37102;ont=PGO:0000005,PGO:0000019;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000002844.2
chr1	HAVANA	exon	12010	12057	.	+	.	ID=exon:ENST00000450305.2:1;Parent=ENST00000450305.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000450305.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=transcribed_unprocessed_pseudogene;transcript_name=DDX11L1-201;exon_number=1;exon_id=ENSE00001948541.1;level=2;transcript_support_level=NA;hgnc_id=HGNC:37102;ont=PGO:0000005,PGO:0000019;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000002844.2
chr1	HAVANA	exon	12179	12227	.	+	.	ID=exon:ENST00000450305.2:2;Parent=ENST00000450305.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000450305.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=transcribed_unprocessed_pseudogene;transcript_name=DDX11L1-201;exon_number=2;exon_id=ENSE00001671638.2;level=2;transcript_support_level=NA;hgnc_id=HGNC:37102;ont=PGO:0000005,PGO:0000019;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000002844.2
chr1	HAVANA	exon	12613	12697	.	+	.	ID=exon:ENST00000450305.2:3;Parent=ENST00000450305.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000450305.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=transcribed_unprocessed_pseudogene;transcript_name=DDX11L1-201;exon_number=3;exon_id=ENSE00001758273.2;level=2;transcript_support_level=NA;hgnc_id=HGNC:37102;ont=PGO:0000005,PGO:0000019;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000002844.2
chr1	HAVANA	exon	12975	13052	.	+	.	ID=exon:ENST00000450305.2:4;Parent=ENST00000450305.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000450305.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=transcribed_unprocessed_pseudogene;transcript_name=DDX11L1-201;exon_number=4;exon_id=ENSE00001799933.2;level=2;transcript_support_level=NA;hgnc_id=HGNC:37102;ont=PGO:0000005,PGO:0000019;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000002844.2
chr1	HAVANA	exon	13221	13374	.	+	.	ID=exon:ENST00000450305.2:5;Parent=ENST00000450305.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000450305.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=transcribed_unprocessed_pseudogene;transcript_name=DDX11L1-201;exon_number=5;exon_id=ENSE00001746346.2;level=2;transcript_support_level=NA;hgnc_id=HGNC:37102;ont=PGO:0000005,PGO:0000019;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000002844.2
"""

a = BedTool(bed, from_string=True)
b = BedTool(gff3, from_string=True)

i = a.intersect(b, s=True) 
s = a.subtract(b, s=True)
print(i)
print(s)

BEDToolsError: ignored
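
A quick way to spot lines that cannot be parsed as features is to check that their start and end fields are integers. The sketch below is an assumption-laden helper, not part of pybedtools: `find_invalid_lines` is a hypothetical name, and the two-line sample is a shortened copy of the gff3 string above. In the sample it flags the row of column names.

```python
def find_invalid_lines(text, start_col=3, end_col=4):
    """Return (line_number, line) pairs whose start/end fields are not integers.

    The default column indices (0-based) match gff3; use 1 and 2 for BED.
    """
    invalid = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        try:
            int(fields[start_col])
            int(fields[end_col])
        except (IndexError, ValueError):
            invalid.append((number, line))
    return invalid

# Shortened copy of the gff3 string: one row of column names, one feature.
sample = """Chromosome Source Ft Start End Score Strand Phase Gen_id
chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tID=ENSG00000223972.5
"""

for number, line in find_invalid_lines(sample):
    print(f"line {number} is not a feature: {line[:30]}")
```

Removing or commenting out any flagged line before calling `BedTool(..., from_string=True)` is one thing to try; whether that resolves the error above is an assumption to verify.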

# Pandas overlaps

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Interval.overlaps.html

This approach is suitable for two numerical intervals rather than for a file:

In [0]:
# First, you need to import the corresponding packages.
import pandas as pd

i1 = pd.Interval(11895,11995)
i2 = pd.Interval(11869,14409)
i1.overlaps(i2)

True

Because you want to compare the files as a whole rather than each pair of intervals by hand, pandas overlaps is not suitable here: it works on Interval objects, not on file names, as the failing cell below shows. Bedtools is better suited for this purpose.

In [0]:
import pandas as pd
# Every time the runtime is restarted, you need to re-upload the desired files
# to the Colab notebook: click the folder icon on the left, then upload.
i1 = pd.Interval("gencode.v33.annotation.gtf.gz")
i2 = pd.Interval("ENCFF861KMV.bed.gz")
i1.overlaps(i2)

TypeError: ignored

# PyRanges
https://biocore-ntnu.github.io/pyranges/index.html

Reading a BED or gff3 file as .csv with pandas does not give a satisfactory result for the BED file:

In [0]:
# Every time the runtime is restarted, you need to run the installation again.

# Once the installation is done and you only want to change or rerun this cell
# or any following cell, you can comment the installation out, as the requirement
# remains satisfied throughout the runtime.
!pip install pyranges
import pandas as pd
import pyranges as pr

# gff3 file
# The "names" are arbitrary, but it is strongly recommended to name the columns
# according to the database convention.
df = pd.read_csv("test.gff3", header=None, names=["chrom", "origin", "type", "start", 
                                                  "end", "5", "strand", "7", "name"], sep="\t")
# how="any" drops every row that contains at least one NaN (missing value).
# dropna returns a new dataframe; df itself is left unchanged.
df.dropna(how="any") 



Unnamed: 0,chrom,origin,type,start,end,5,strand,7,name
0,chr1,HAVANA,gene,11869,14409,.,+,.,ID=ENSG00000223972.5;gene_id=ENSG00000223972.5...
1,chr1,HAVANA,transcript,11869,14409,.,+,.,ID=ENST00000456328.2;Parent=ENSG00000223972.5;...
2,chr1,HAVANA,exon,11869,12227,.,+,.,ID=exon:ENST00000456328.2:1;Parent=ENST0000045...
3,chr1,HAVANA,exon,12613,12721,.,+,.,ID=exon:ENST00000456328.2:2;Parent=ENST0000045...
4,chr1,HAVANA,exon,13221,14409,.,+,.,ID=exon:ENST00000456328.2:3;Parent=ENST0000045...
5,chr1,HAVANA,transcript,12010,13670,.,+,.,ID=ENST00000450305.2;Parent=ENSG00000223972.5;...
6,chr1,HAVANA,exon,12010,12057,.,+,.,ID=exon:ENST00000450305.2:1;Parent=ENST0000045...
7,chr1,HAVANA,exon,12179,12227,.,+,.,ID=exon:ENST00000450305.2:2;Parent=ENST0000045...
8,chr1,HAVANA,exon,12613,12697,.,+,.,ID=exon:ENST00000450305.2:3;Parent=ENST0000045...
9,chr1,HAVANA,exon,12975,13052,.,+,.,ID=exon:ENST00000450305.2:4;Parent=ENST0000045...
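
The last column still holds all gff3 attributes as one `key=value;key=value` string. A small parser can split it into a dictionary. The sketch below assumes plain gff3 attribute syntax without URL-escaped characters, and `parse_gff3_attributes` is a hypothetical helper name.

```python
def parse_gff3_attributes(attribute_string):
    """Split a gff3 attribute column into a {key: value} dictionary."""
    attributes = {}
    for pair in attribute_string.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        attributes[key] = value
    return attributes

# Attribute string copied from the first gff3 record used in this notebook.
gene_attributes = (
    "ID=ENSG00000223972.5;gene_id=ENSG00000223972.5;"
    "gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;"
    "level=2;hgnc_id=HGNC:37102;havana_gene=OTTHUMG00000000961.2"
)

parsed = parse_gff3_attributes(gene_attributes)
print(parsed["gene_name"], parsed["gene_type"])
```

On the dataframe above, the helper could be applied to the whole column with `df["name"].apply(parse_gff3_attributes)`.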


In [0]:
!pip install pyranges 
# You do not need to repeat the installation of pyranges if you have installed
# it previously.
import pandas as pd
import pyranges as pr

# BED file
df = pd.read_csv("test.bed", header=None, names=["chrom", "start", "end", "interval", "score", "strand"], sep="\t")
df.dropna(how="any") 


Unnamed: 0,chrom,start,end,interval,score,strand
0,chr1,13300,14209,interval5,.,+
1,chr1,12080,13570,interval6,.,+
2,chr1,12040,12047,interval7,.,+
3,chr1,12190,12210,interval8,.,+
4,chr1,12655,12680,interval9,.,+
5,chr1,12990,13040,interval10,.,+
6,chr1,13300,13354,interval11,.,+


The best approach when you have string data is to transform it into a dataframe. 

In [0]:
!pip install pyranges
import pyranges as pr
from pyranges import PyRanges
import pandas as pd
from io import StringIO
gff3 = """Chromosome Source Ft Start End Score Strand Phase Gen_id Gene_type Gene_name Transcript_type Transcript_name Level Transcript_support_level hgnc_id tag havana_gene havana_transcript
chr1	HAVANA	gene	11869	14409	.	+	.	ID=ENSG00000223972.5;gene_id=ENSG00000223972.5;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;level=2;hgnc_id=HGNC:37102;havana_gene=OTTHUMG00000000961.2
chr1	HAVANA	transcript	11869	14409	.	+	.	ID=ENST00000456328.2;Parent=ENSG00000223972.5;gene_id=ENSG00000223972.5;transcript_id=ENST00000456328.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_name=DDX11L1-202;level=2;transcript_support_level=1;hgnc_id=HGNC:37102;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1
chr1	HAVANA	exon	11869	12227	.	+	.	ID=exon:ENST00000456328.2:1;Parent=ENST00000456328.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000456328.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_name=DDX11L1-202;exon_number=1;exon_id=ENSE00002234944.1;level=2;transcript_support_level=1;hgnc_id=HGNC:37102;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1
chr1	HAVANA	exon	12613	12721	.	+	.	ID=exon:ENST00000456328.2:2;Parent=ENST00000456328.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000456328.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_name=DDX11L1-202;exon_number=2;exon_id=ENSE00003582793.1;level=2;transcript_support_level=1;hgnc_id=HGNC:37102;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1
chr1	HAVANA	exon	13221	14409	.	+	.	ID=exon:ENST00000456328.2:3;Parent=ENST00000456328.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000456328.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=processed_transcript;transcript_name=DDX11L1-202;exon_number=3;exon_id=ENSE00002312635.1;level=2;transcript_support_level=1;hgnc_id=HGNC:37102;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000362751.1
chr1	HAVANA	transcript	12010	13670	.	+	.	ID=ENST00000450305.2;Parent=ENSG00000223972.5;gene_id=ENSG00000223972.5;transcript_id=ENST00000450305.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=transcribed_unprocessed_pseudogene;transcript_name=DDX11L1-201;level=2;transcript_support_level=NA;hgnc_id=HGNC:37102;ont=PGO:0000005,PGO:0000019;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000002844.2
chr1	HAVANA	exon	12010	12057	.	+	.	ID=exon:ENST00000450305.2:1;Parent=ENST00000450305.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000450305.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=transcribed_unprocessed_pseudogene;transcript_name=DDX11L1-201;exon_number=1;exon_id=ENSE00001948541.1;level=2;transcript_support_level=NA;hgnc_id=HGNC:37102;ont=PGO:0000005,PGO:0000019;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000002844.2
chr1	HAVANA	exon	12179	12227	.	+	.	ID=exon:ENST00000450305.2:2;Parent=ENST00000450305.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000450305.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=transcribed_unprocessed_pseudogene;transcript_name=DDX11L1-201;exon_number=2;exon_id=ENSE00001671638.2;level=2;transcript_support_level=NA;hgnc_id=HGNC:37102;ont=PGO:0000005,PGO:0000019;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000002844.2
chr1	HAVANA	exon	12613	12697	.	+	.	ID=exon:ENST00000450305.2:3;Parent=ENST00000450305.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000450305.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=transcribed_unprocessed_pseudogene;transcript_name=DDX11L1-201;exon_number=3;exon_id=ENSE00001758273.2;level=2;transcript_support_level=NA;hgnc_id=HGNC:37102;ont=PGO:0000005,PGO:0000019;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000002844.2
chr1	HAVANA	exon	12975	13052	.	+	.	ID=exon:ENST00000450305.2:4;Parent=ENST00000450305.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000450305.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=transcribed_unprocessed_pseudogene;transcript_name=DDX11L1-201;exon_number=4;exon_id=ENSE00001799933.2;level=2;transcript_support_level=NA;hgnc_id=HGNC:37102;ont=PGO:0000005,PGO:0000019;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000002844.2
chr1	HAVANA	exon	13221	13374	.	+	.	ID=exon:ENST00000450305.2:5;Parent=ENST00000450305.2;gene_id=ENSG00000223972.5;transcript_id=ENST00000450305.2;gene_type=transcribed_unprocessed_pseudogene;gene_name=DDX11L1;transcript_type=transcribed_unprocessed_pseudogene;transcript_name=DDX11L1-201;exon_number=5;exon_id=ENSE00001746346.2;level=2;transcript_support_level=NA;hgnc_id=HGNC:37102;ont=PGO:0000005,PGO:0000019;tag=basic;havana_gene=OTTHUMG00000000961.2;havana_transcript=OTTHUMT00000002844.2
"""
# Read the whitespace-separated string into a dataframe.
# The raw string r"\s+" avoids an invalid-escape warning.
df = pd.read_csv(StringIO(gff3), sep=r"\s+")

pyr1 = pr.PyRanges(df)
print(pyr1)

bed = """Chromosome Start End Name Score Strand
chr1	11895	11995	interval1	.	+
chr1	11900	14309	interval2	.	+
chr1	12000	12090	interval3	.	+
chr1	12700	12710	interval4	.	+
chr1	13300	14209	interval5	.	+
chr1	12080	13570	interval6	.	+
chr1	12040	12047	interval7	.	+	
chr1	12190	12210	interval8	.	+
chr1	12655	12680	interval9	.	+
chr1	12990	13040	interval10  .	+
chr1	13300	13354	interval11	.	+"""

df = pd.read_csv(StringIO(bed), sep=r"\s+")

pyr2 = pr.PyRanges(df)
print(pyr2)
print(pyr1.intersect(pyr2))

print(pyr1.subtract(pyr2))

+--------------+------------+------------+-----------+-----------+-------+
| Chromosome   | Source     | Ft         | Start     | End       | +14   |
| (category)   | (object)   | (object)   | (int32)   | (int32)   | ...   |
|--------------+------------+------------+-----------+-----------+-------|
| chr1         | HAVANA     | gene       | 11869     | 14409     | ...   |
| chr1         | HAVANA     | transcript | 11869     | 14409     | ...   |
| chr1         | HAVANA     | exon       | 11869     | 12227     | ...   |
| chr1         | HAVANA     | exon       | 12613     | 12721     | ...   |
| ...          | ...        | ...        | ...       | ...       | ...   |
| chr1         | HAVANA     | exon       | 12179     | 12227     | ...   |
| chr1         | HAVANA     | exon       | 12613     | 12697     | ...   |
| chr1         | HAVANA     | exon       | 12975     | 13052     | ...   |
| chr1         | HAVANA     | exon       | 13221     | 13374     | ...   |
+--------------+---------

However, this is not the optimal approach if you have a file instead of a string; in that case, use bedtools instead.