
Feature Request - local baseline detection to allow for slow drift in solid-state nanopore baseline #69

Closed
shadowk29 opened this issue Dec 11, 2015 · 9 comments

Comments

@shadowk29
Collaborator

Solid-state nanopores often change size over the course of a few hours of recording, making the baseline statistics calculated at the beginning of a run inapplicable to later sections of the same data set. An option to calculate a local baseline for each new chunk of data requested would be helpful for analyzing long solid-state nanopore runs.
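To illustrate the request with a toy example (synthetic data, not a real nanopore trace): a single global baseline estimate misrepresents later chunks of a slowly drifting trace, while per-chunk local estimates track the drift.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic current trace whose baseline drifts slowly from 1.0 to 2.0
# over the run, mimicking a pore that grows during the experiment.
n = 100_000
trace = np.linspace(1.0, 2.0, n) + rng.normal(0.0, 0.01, n)

global_mean = trace.mean()        # one estimate for the whole run (~1.5)
chunks = trace.reshape(10, -1)    # ten consecutive chunks of data
local_means = chunks.mean(axis=1) # one baseline estimate per chunk

# The global estimate is off by ~0.45 for the last chunk;
# the local estimates follow the drift.
```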

@abalijepalli
Member

I set up a new branch devel-1.0-ticket69 that we can use to test this before merging back to devel-1.0.

abalijepalli added a commit that referenced this issue Dec 12, 2015
Recalculate baseline currents when drift checks are disabled and baseline detection is set to automatic.
@abalijepalli
Member

When you get a chance, test the fix in commit 101b449 with your data set. You will have to set `driftThreshold` and `maxDriftRate` to negative values to turn off drift checking. Also, set the baseline estimation to automatic by setting `meanOpenCurr`, `sdOpenCurr` and `slopeOpenCurr` to -1. The partition function should then update the baseline for each new chunk of data.
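MOSAIC reads its parameters from a JSON settings file; assuming the parameters named above live in an `eventSegment` section (the section name and file layout are assumptions here, only the parameter names come from this thread), the relevant fragment for this test would look roughly like:

```json
{
    "eventSegment": {
        "meanOpenCurr": -1,
        "sdOpenCurr": -1,
        "slopeOpenCurr": -1,
        "driftThreshold": -1,
        "maxDriftRate": -1
    }
}
```

The three `-1` sentinels request automatic baseline estimation; the negative drift values disable drift checking as described above.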

@shadowk29
Collaborator Author

Not sure yet whether this is unique to this branch, since I have a test running at the moment, but MOSAIC currently crashes with a ValueError if the length of the data file divides evenly into an integer number of data blocks.
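This is a classic chunked-reader corner case. As a minimal sketch (illustrative only, not MOSAIC's actual reader; `read_blocks` is a hypothetical name), a safe pattern guards against an empty trailing block so an exact multiple of the block size causes no error:

```python
import numpy as np

def read_blocks(data, block_size):
    """Yield successive blocks of at most block_size samples.

    When len(data) is an exact multiple of block_size, a naive reader
    can request one block past the end and fail on the empty result;
    iterating on start offsets and skipping empty slices avoids that.
    """
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        if block.size:  # defensively skip an empty trailing block
            yield block

# Exact multiple of the block size: four full blocks, no crash.
blocks = list(read_blocks(np.arange(1000), 250))
```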

@shadowk29
Collaborator Author

A couple of bugs, I think. I may be misunderstanding how it is set up, but let me know if I have this right and I can fix them:

`eventSegment._checkdrift()` is never called from `eventSegment._eventsegment()`, so the baseline update is currently not performed. I think `_checkdrift()` should be called in `_eventsegment()` right after

```python
t=self.currData.popleft()
self.globalDataIndex+=1
```

as `self._checkdrift(t)`.

Within `_checkdrift()`, after the first time it is called,

```python
if self.meanOpenCurr == -1. or self.sdOpenCurr == -1. or self.slopeOpenCurr == -1.:
```

will fail, because those variables were overwritten the last time `_checkdrift()` was called. I think we can simply remove that condition?
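Together the two fixes amount to: call the drift check on every chunk, and key the baseline recomputation off an "automatic" flag latched at startup rather than the `-1` sentinels, which get overwritten on first use. A minimal runnable sketch of that control flow (`EventSegmentSketch` and its simplified statistics are hypothetical stand-ins, not MOSAIC's implementation):

```python
from collections import deque
import statistics

class EventSegmentSketch:
    """Illustrative stand-in for eventSegment; names mirror the thread."""

    def __init__(self):
        # -1 means "estimate the baseline automatically"; latch that
        # choice once, since the sentinels are overwritten on first use.
        self.meanOpenCurr = -1.0
        self.sdOpenCurr = -1.0
        self.baselineAuto = (self.meanOpenCurr == -1.0)

    def _checkdrift(self, chunk):
        # With automatic estimation, recompute stats for every chunk
        # instead of testing the already-overwritten sentinels.
        if self.baselineAuto:
            self.meanOpenCurr = statistics.fmean(chunk)
            self.sdOpenCurr = statistics.pstdev(chunk)

    def _eventsegment(self, currData):
        means = []
        while currData:
            t = currData.popleft()
            self._checkdrift(t)  # proposed call site from this comment
            means.append(self.meanOpenCurr)
        return means

seg = EventSegmentSketch()
chunks = deque([[1.0, 1.1, 0.9], [2.0, 2.1, 1.9]])
means = seg._eventsegment(chunks)  # baseline updates once per chunk
```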

Let me know.

@shadowk29
Collaborator Author

I added pull request #70 with a correction to the baseline updates. There are some other bugs I am trying to track down (specifically, the AbsEventStart column in my output does not match the location of events in the data file). It's not clear yet whether this is specific to this branch. It seems baseline limits might be necessary, though, since the program gets bogged down detecting thousands of events during clogged states that last longer than the BlockSize.
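The "baseline limits" idea could be as simple as skipping event detection while the local baseline is outside a user-set window. A hypothetical helper (the name and interface are assumptions, not MOSAIC's API):

```python
def within_baseline_limits(mean_curr, lo, hi):
    """Return True when the local baseline magnitude lies inside
    user-set limits [lo, hi] (in the same units as the current).

    Event detection would be skipped for chunks where this is False,
    e.g. during a clog that collapses the open-pore current.
    """
    return lo <= abs(mean_curr) <= hi

within_baseline_limits(-100.0, 80.0, 120.0)  # healthy baseline: detect events
within_baseline_limits(-40.0, 80.0, 120.0)   # clogged state: skip the chunk
```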

@shadowk29
Collaborator Author

I think I screwed up that pull request and did not push my local changes. Will fix tomorrow.

@shadowk29
Collaborator Author

Submitted Pull request #72 to partially address the issues here.

Outstanding issue: on clogs that only slightly overlap the good baseline, MOSAIC gets hung up, thinking there are events at every data point. This is true even for the regular MOSAIC approach that calculates the baseline only at the start. It's not clear yet what is causing this, but it will be the first thing I debug when I get back in January.

@shadowk29
Collaborator Author

Pull request #83 should cover the issues here, pending further tests.

@abalijepalli
Member

I'll close this for now. We can reopen it if other issues arise.
