Description
The ReferenceSequence.data attribute returns the reference sequence data as a string. For large references we almost certainly don't want this, because it creates a new Python string, i.e. a full copy of the data. It would be good to have a numpy array view of the data instead.
We should first see how we might use this, though. The only place we currently use it is in the alignments method. There we can definitely sidestep the full Python string, because we immediately turn the data into a numpy array. So it'll be quite easy to have an internal API, something like data_array, that returns a view.
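To make the copy-versus-view distinction concrete, here is a minimal sketch. It assumes the underlying reference is available as a bytes-like buffer; `ReferenceSequenceSketch`, its `_raw` attribute and the `data_array` property are hypothetical names for illustration, not the real tskit API.

```python
import numpy as np


class ReferenceSequenceSketch:
    """Hypothetical stand-in for ReferenceSequence (not the tskit API)."""

    def __init__(self, raw: bytes):
        # Assumption: the reference data is held in a bytes-like buffer.
        self._raw = raw

    @property
    def data(self) -> str:
        # Mirrors the current behaviour: decoding builds a new Python str,
        # which is a full copy of the reference.
        return self._raw.decode("ascii")

    @property
    def data_array(self) -> np.ndarray:
        # Hypothetical internal API: a read-only int8 view over the same
        # memory, so nothing is copied however long the reference is.
        return np.frombuffer(self._raw, dtype=np.int8)


ref = ReferenceSequenceSketch(b"ACGTACGTNN")

# Copying path, as in an alignments-style consumer today:
# bytes -> str (copy) -> bytes (copy) -> array.
copied = np.frombuffer(ref.data.encode("ascii"), dtype=np.int8)

# View path: the array does not own its data and cannot be written to.
view = ref.data_array
```

Because `np.frombuffer` over an immutable `bytes` object yields a read-only array, a view like this fits naturally with a read-only mode, where callers must not mutate the reference in place.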
However, it might not be worth doing this, because we'll have to implement alignments in C fairly soon anyway (#1589).
If it's easy, I'll implement data_array when we're in read-only mode for #1935, which is soon on the menu.
In general, I don't think we'll be accessing the data attribute directly much, as we'll want to present a higher-level interface in Python (for example, we implement __getitem__ to support pulling out a slice of a reference, which can operate on either the data or the url; see #1988).
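A slice-based `__getitem__` is also where a view pays off: slicing the array view copies only the requested region, whereas `data[a:b]` would first materialise the whole reference as a string. The sketch below covers only the in-memory data case (not the url case from #1988); `ReferenceSliceSketch` and `_raw` are hypothetical names, not tskit's implementation.

```python
import numpy as np


class ReferenceSliceSketch:
    """Hypothetical illustration of slicing a reference via a numpy view."""

    def __init__(self, raw: bytes):
        # Assumption: the reference data is held in a bytes-like buffer.
        self._raw = raw

    def __getitem__(self, key):
        # This sketch only handles slices; integer indexing and the
        # url-backed case are deliberately left out.
        if not isinstance(key, slice):
            raise TypeError("only slices are supported in this sketch")
        # Slicing the view is free; tobytes() copies just the selected region.
        region = np.frombuffer(self._raw, dtype=np.int8)[key]
        return region.tobytes().decode("ascii")


ref = ReferenceSliceSketch(b"ACGTACGTNN")
middle = ref[2:6]
strided = ref[::2]
```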