Target gene assignment in vignette: Problem with "_PAR_Y" #5

Closed
MelinaKlostermann opened this issue Jan 18, 2023 · 1 comment

MelinaKlostermann commented Jan 18, 2023

Hello,
I am using your vignette (https://www.bioconductor.org/packages/release/bioc/vignettes/BindingSiteFinder/inst/doc/vignette.html) and got an error in the "Target gene assignment" part (with gencode.v31.annotation.gff3):

> annoDb = makeTxDbFromGFF(file = mygff3, format = "gff3")
> annoInfo = import(mygff3, format = "gff3")
> 
> # Get genes as GRanges
> gns = genes(annoDb)
> idx = match(gns$gene_id, annoInfo$gene_id)
> elementMetadata(gns) = cbind(elementMetadata(gns),
+                              elementMetadata(annoInfo)[idx,])

Error: subscript contains NAs

The problem seems to be the gene IDs ending in _PAR_Y, which have no match in annoInfo$gene_id:

gns$gene_id[!(gns$gene_id %in% annoInfo$gene_id)]

[1] "ENSG00000002586.20_PAR_Y" "ENSG00000124333.16_PAR_Y" "ENSG00000124334.17_PAR_Y" "ENSG00000167393.17_PAR_Y" "ENSG00000168939.11_PAR_Y" "ENSG00000169084.14_PAR_Y"
[7] "ENSG00000169093.16_PAR_Y" "ENSG00000169100.14_PAR_Y" "ENSG00000178605.13_PAR_Y" "ENSG00000182162.11_PAR_Y" "ENSG00000182378.14_PAR_Y" "ENSG00000182484.15_PAR_Y"
[13] "ENSG00000185203.12_PAR_Y" "ENSG00000185291.11_PAR_Y" "ENSG00000185960.14_PAR_Y" "ENSG00000196433.13_PAR_Y" "ENSG00000197976.12_PAR_Y" "ENSG00000198223.16_PAR_Y"
[19] "ENSG00000205755.11_PAR_Y" "ENSG00000214717.12_PAR_Y" "ENSG00000223274.6_PAR_Y" "ENSG00000223484.7_PAR_Y" "ENSG00000223511.7_PAR_Y" "ENSG00000223571.6_PAR_Y"
[25] "ENSG00000223773.7_PAR_Y" "ENSG00000225661.7_PAR_Y" "ENSG00000226179.6_PAR_Y" "ENSG00000227159.8_PAR_Y" "ENSG00000228410.6_PAR_Y" "ENSG00000228572.7_PAR_Y"
[31] "ENSG00000229232.6_PAR_Y" "ENSG00000230542.6_PAR_Y" "ENSG00000234622.6_PAR_Y" "ENSG00000234958.6_PAR_Y" "ENSG00000236017.8_PAR_Y" "ENSG00000236871.7_PAR_Y"
[37] "ENSG00000237040.6_PAR_Y" "ENSG00000237531.6_PAR_Y" "ENSG00000237801.6_PAR_Y" "ENSG00000265658.6_PAR_Y" "ENSG00000270726.6_PAR_Y" "ENSG00000275287.5_PAR_Y"
[43] "ENSG00000277120.5_PAR_Y" "ENSG00000280767.3_PAR_Y" "ENSG00000281849.3_PAR_Y"
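
The mismatch can be reproduced without the annotation file. The sketch below uses two made-up ID vectors (toy_gns_ids and toy_anno_ids are hypothetical stand-ins for gns$gene_id and annoInfo$gene_id, not values read from the GFF3) to show that match() returns NA for an ID that has no counterpart, which is the kind of subscript the subsetting step then rejects.

```r
# Toy stand-ins (assumed values, not read from the GFF3 file)
toy_gns_ids  <- c("ENSG00000182378.14", "ENSG00000182378.14_PAR_Y")
toy_anno_ids <- c("ENSG00000182378.14", "ENSG00000002586.20")

# match() returns NA wherever an ID has no counterpart in the second vector
idx <- match(toy_gns_ids, toy_anno_ids)

# Checking for NAs before subsetting makes the failure explicit
has_unmatched <- anyNA(idx)
unmatched_ids <- toy_gns_ids[is.na(idx)]
```

A check like anyNA(idx) right after the match() call would flag the unmatched IDs before the cbind() step fails.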

A solution is to remove the _PAR_Y genes before matching. Maybe include that in the vignette?

annoDb = makeTxDbFromGFF(file = mygff3, format = "gff3")
annoInfo = import(mygff3, format = "gff3")

# Get genes as GRanges
gns = genes(annoDb)
gns = gns[!grepl(pattern = "_PAR_Y", gns$gene_id)]
idx = match(gns$gene_id, annoInfo$gene_id)
elementMetadata(gns) = cbind(elementMetadata(gns),
                             elementMetadata(annoInfo)[idx,])
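
A possible alternative that keeps the _PAR_Y genes is to strip the suffix for the lookup only. This is a sketch, assuming that annoInfo$gene_id holds the same IDs without the suffix (not verified for every entry); the vectors are hypothetical toy stand-ins for gns$gene_id and annoInfo$gene_id.

```r
# Toy stand-ins (assumed values, not read from the GFF3 file)
toy_gns_ids  <- c("ENSG00000182378.14", "ENSG00000182378.14_PAR_Y")
toy_anno_ids <- c("ENSG00000182378.14", "ENSG00000002586.20")

# Remove a trailing "_PAR_Y" for the lookup only; the original IDs stay untouched
lookup_ids <- sub("_PAR_Y$", "", toy_gns_ids)
idx <- match(lookup_ids, toy_anno_ids)
```

With the real objects this would correspond to match(sub("_PAR_Y$", "", gns$gene_id), annoInfo$gene_id). Since match() returns the first hit, the _PAR_Y genes would then receive the metadata of the first record with that gene_id, so it is worth checking that this is the intended behaviour.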

MirkoBr commented Jun 22, 2023

This is fixed with the latest major update on our way to version 2.0. See commit 80de317.

MirkoBr closed this as completed Jun 22, 2023
MirkoBr self-assigned this Apr 20, 2024