Support reading BED files with fewer than 12 columns. #144
Comments
Thank you for raising this. Let me have a look today and I'll follow up. At first blush it makes sense to me. Also, FWIW, I was thinking about adding specific bigBed support along with indexed search (I see your link is bigBed-related, not just BED). Is that something you would/could use?
@ghuls I think this should be doable now if you update biobear.
In [5]: session.read_bed_file('./test-three.bed', options=bb.BEDReadOptions(n_fields=3)).to_polars()
Out[5]:
shape: (10, 3)
┌─────────────────────────┬───────┬───────┐
│ reference_sequence_name ┆ start ┆ end │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════════════════════════╪═══════╪═══════╡
│ chr1 ┆ 11874 ┆ 12227 │
│ chr1 ┆ 12613 ┆ 12721 │
│ chr1 ┆ 13221 ┆ 14409 │
│ chr1 ┆ 14362 ┆ 14829 │
│ chr1 ┆ 14970 ┆ 15038 │
│ chr1 ┆ 15796 ┆ 15947 │
│ chr1 ┆ 16607 ┆ 16765 │
│ chr1 ┆ 16858 ┆ 17055 │
│ chr1 ┆ 17233 ┆ 17368 │
│ chr1 ┆ 17606 ┆ 17742 │
└─────────────────────────┴───────┴───────┘
Technically things shouldn't fail anymore if you don't specify the number of fields and the BED file has fewer than the full complement of fields; it just fills the additional columns with null.
In [7]: session.read_bed_file('./test-three.bed').to_polars()
Out[7]:
shape: (10, 12)
┌─────────────────────────┬───────┬───────┬──────┬───┬───────┬─────────────┬─────────────┬──────────────┐
│ reference_sequence_name ┆ start ┆ end ┆ name ┆ … ┆ color ┆ block_count ┆ block_sizes ┆ block_starts │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ str ┆ ┆ str ┆ i64 ┆ str ┆ str │
╞═════════════════════════╪═══════╪═══════╪══════╪═══╪═══════╪═════════════╪═════════════╪══════════════╡
│ chr1 ┆ 11874 ┆ 12227 ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ chr1 ┆ 12613 ┆ 12721 ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ chr1 ┆ 13221 ┆ 14409 ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ chr1 ┆ 14362 ┆ 14829 ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ chr1 ┆ 14970 ┆ 15038 ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ chr1 ┆ 15796 ┆ 15947 ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ chr1 ┆ 16607 ┆ 16765 ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ chr1 ┆ 16858 ┆ 17055 ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ chr1 ┆ 17233 ┆ 17368 ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
│ chr1 ┆ 17606 ┆ 17742 ┆ null ┆ … ┆ null ┆ null ┆ null ┆ null │
└─────────────────────────┴───────┴───────┴──────┴───┴───────┴─────────────┴─────────────┴──────────────┘
I'm gonna close this task, but please reopen if it remains an issue. Thanks!
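For anyone wondering what "fills the additional columns with null" amounts to, here is a minimal plain-Python sketch of the idea. It is not biobear's implementation; the helper name `pad_bed_fields` is hypothetical, and it only assumes a tab-separated BED line with up to 12 fields.

```python
from typing import List, Optional

FULL_BED_WIDTH = 12  # a full BED record has 12 columns


def pad_bed_fields(line: str, width: int = FULL_BED_WIDTH) -> List[Optional[str]]:
    """Split one tab-separated BED line and pad missing trailing fields with None.

    Hypothetical helper for illustration only; values are kept as strings.
    """
    fields: List[Optional[str]] = list(line.rstrip("\n").split("\t"))
    if len(fields) > width:
        raise ValueError(f"expected at most {width} fields, got {len(fields)}")
    # Missing trailing columns become None, mirroring the null cells in Out[7].
    return fields + [None] * (width - len(fields))


row = pad_bed_fields("chr1\t11874\t12227")
print(len(row), row[:4])
```

A three-column line comes back as a 12-element row whose last nine entries are `None`, which is the shape shown in `Out[7]` above.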
I didn't find the |
Cool, thanks for the context.
It would be nice if BED files with fewer than 12 columns could be read.
For example, BEDReadOptions could let you specify how many of the BED columns follow the spec.
Additional columns could be read as String columns.
Similarly to UCSC bigBed:
BED3 or -type=bedN[+[P]], where N is an integer between 3 and 12 and the optional +[P] parameter specifies the number of extra fields, not required, but preferred
http://genome.ucsc.edu/goldenPath/help/bigBed.html
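To illustrate the request, here is a rough plain-Python sketch of how a `bedN[+[P]]` type string could drive parsing: the first N columns are treated as standard BED fields and anything after them is kept as strings. This is not biobear's or UCSC's code. The helpers `parse_bed_type` and `parse_bed_line` and the `extra_1`, `extra_2`, ... column names are hypothetical, and the column names hidden behind `…` in the output above (`score`, `strand`, `thick_start`, `thick_end`) are assumed from the BED spec.

```python
import re
from typing import Dict, Optional, Tuple, Union

# Standard BED columns in spec order. The four middle names are assumptions;
# the rest match the column names visible in the biobear output in this thread.
BED_COLUMNS = [
    "reference_sequence_name", "start", "end", "name", "score", "strand",
    "thick_start", "thick_end", "color", "block_count", "block_sizes", "block_starts",
]


def parse_bed_type(spec: str) -> Tuple[int, Optional[int]]:
    """Parse 'bedN', 'bedN+' or 'bedN+P' into (N, P); P is None when unspecified."""
    m = re.fullmatch(r"bed(\d+)(\+(\d*))?", spec)
    if m is None:
        raise ValueError(f"not a bedN[+[P]] type string: {spec!r}")
    n = int(m.group(1))
    if not 3 <= n <= 12:
        raise ValueError("N must be between 3 and 12")
    if m.group(2) is None:
        return n, 0          # no '+': no extra fields expected
    if m.group(3) == "":
        return n, None       # bare '+': any number of extra fields
    return n, int(m.group(3))


def parse_bed_line(line: str, n_fields: int) -> Dict[str, Union[str, int]]:
    """Read the first n_fields as standard BED columns, the rest as string extras."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < n_fields:
        raise ValueError(f"expected at least {n_fields} fields, got {len(parts)}")
    record: Dict[str, Union[str, int]] = dict(zip(BED_COLUMNS[:n_fields], parts[:n_fields]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Extra columns keep their raw string values under hypothetical names.
    for i, value in enumerate(parts[n_fields:], start=1):
        record[f"extra_{i}"] = value
    return record


n, extra = parse_bed_type("bed3+2")
print(parse_bed_line("chr1\t11874\t12227\tfoo\tbar", n))
```

Only `start` and `end` are converted to integers here; a real reader would also type the remaining standard columns (for example `block_count`) according to the spec.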