Add zero-copy array access to ReferenceSequence #1989

@jeromekelleher

The ReferenceSequence.data attribute returns the reference sequence data as a string. For large references we almost certainly don't want to do this, as it creates a new Python string and copies the data. So, it would be good to have a numpy array view of the data.

We should first see how we might use this, though. The only place we're using the data attribute at the moment is in the alignments method. In this case we can definitely sidestep the full Python string, because we immediately turn the data into a numpy array there. So, it would be quite easy to have an internal API, using something like data_array, which is a view.
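To make the difference concrete, here is a minimal sketch of a view versus a copy. It uses a plain bytearray as a stand-in for the underlying storage; the buffer, the int8 dtype and the variable names are assumptions for illustration, not how the library actually holds the data.

```python
import numpy as np

# Stand-in for the reference data buffer (an assumption, not the real storage).
buffer = bytearray(b"ACGTACGTAC")

# Zero-copy route: np.frombuffer wraps the existing memory.
data_array = np.frombuffer(buffer, dtype=np.int8)

# Copying route: decode to a Python string, then re-encode into an array.
# Both the decode and the encode allocate new memory.
data_str = buffer.decode("ascii")
copied = np.frombuffer(data_str.encode("ascii"), dtype=np.int8)

# Changing the underlying buffer is visible through the view only.
buffer[0] = ord("T")
view_sees_change = data_array[0] == ord("T")
copy_sees_change = copied[0] == ord("T")
```

If the underlying object were immutable (for example bytes), the array returned by np.frombuffer would be read-only, which fits the read-only mode mentioned for #1935.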

However, it might not be worth doing this because we'll have to implement alignments in C fairly soon anyway (#1589).

If it's easy I'll implement the data_array when we're in read-only mode for #1935, which is soon on the menu.

In general, I don't think we'll be accessing the data attribute directly much, as we'll want to present a higher-level interface in Python (for example, we implement __getitem__ to support pulling out a slice of a reference, which can operate on either the data or url - see #1988).
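As a rough sketch of what such a higher-level interface could look like, the hypothetical class below implements __getitem__ on top of an array view, so only the requested slice is converted to a string. ReferenceSketch and its _data_array attribute are made up for illustration, and the url-backed case is not covered.

```python
import numpy as np


class ReferenceSketch:
    """Hypothetical stand-in for illustration; not the real ReferenceSequence."""

    def __init__(self, raw: bytes):
        # Read-only view over the raw bytes; no copy is made here.
        self._data_array = np.frombuffer(raw, dtype=np.int8)

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("only slices are supported in this sketch")
        # Only the requested slice is copied and decoded into a string.
        return self._data_array[key].tobytes().decode("ascii")


ref = ReferenceSketch(b"ACGTACGTAC")
piece = ref[2:6]
```

The cost of each lookup then scales with the size of the slice rather than the size of the whole reference.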

Metadata

Labels

Python API: Issue is about the Python API.
future: Issues that are closed as they are not planned in the medium-term, but which are still desirable.