baseline detection fails if the histogram built by ionic_current_stats contains many zeros #82

Closed
shadowk29 opened this issue Jul 26, 2016 · 8 comments

Comments

@shadowk29
Collaborator

If the histogram built by ionic_current_stats contains a large number of zeros, fitting will fail and give enormous values for perr. The solution is simple: weight the residuals by the y values. You can see an explanation here:

http://math.stackexchange.com/questions/1771660/analytical-solution-to-nonlinear-least-squares-problem

I'm not 100% sure how to fit this into curve_fit, however. It provides a sigma parameter, which weights data points as 1/sigma^2, so to get y-weighting you would have to provide sigma=1/sqrt(y), which is undefined for y=0. I think the proper solution would be to use scipy.optimize.minimize directly and write a custom residuals function.
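
A minimal sketch of what that custom objective might look like. This is not code from the project: the Gaussian model, the synthetic noise-free data, the initial guess, and the choice of Nelder-Mead are all assumptions made for illustration. Because each squared residual is multiplied by y rather than divided by anything, empty bins simply get zero weight.

```python
import numpy as np
from scipy.optimize import minimize


def gaussian(x, amp, mu, sigma):
    """Gaussian peak, assumed here as a stand-in for the baseline model."""
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


def weighted_sse(params, x, y):
    """Sum of squared residuals, each weighted by its own y value.

    Bins with y == 0 contribute nothing, and no division is involved.
    """
    residuals = y - gaussian(x, *params)
    return np.sum(y * residuals ** 2)


# Synthetic, noise-free "histogram" on a window far wider than the peak,
# so most of the points are (numerically) zero.
x = np.linspace(-4.0, 6.0, 1001)
y = gaussian(x, 100.0, 1.0, 0.1)

result = minimize(
    weighted_sse,
    x0=[80.0, 0.9, 0.2],  # hypothetical initial guess
    args=(x, y),
    method="Nelder-Mead",  # derivative-free; chosen for the sketch only
    options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 5000},
)
amp_fit, mu_fit, sigma_fit = result.x
print(amp_fit, mu_fit, abs(sigma_fit))
```

Note that minimize does not return a covariance matrix the way curve_fit does, so an equivalent of perr would have to be estimated separately.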

@abalijepalli
Member

Does this become an issue with sparse data? What steps will reproduce this problem?

@shadowk29
Collaborator Author

shadowk29 commented Jul 26, 2016

This is mainly an issue with the baseline detection changes in the Ticket 69 branch. If the bounds set on minBaseline and maxBaseline are too wide, the fitting algorithm fails. I need to work out a way to allow fitting to work even for large bounds, since the drift can be significant.

To reproduce it, try building a histogram with an x range of more than 20 standard deviations or so, so that the tail is full of zeros. Fitting will fail and perr will be orders of magnitude larger than popt.
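
A rough sketch of how such a histogram could be built for testing. The sample values (mean 100, standard deviation 2), the bin count, and the window of 30 standard deviations are made up for illustration and are not taken from ionic_current_stats. It also shows why sigma=1/sqrt(y) cannot be used directly: every empty bin produces an infinite sigma.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical baseline current samples (arbitrary units).
samples = rng.normal(loc=100.0, scale=2.0, size=50_000)

mean, std = samples.mean(), samples.std()
# Window of +/-30 standard deviations, wider than the ~20 mentioned above.
counts, edges = np.histogram(
    samples, bins=600, range=(mean - 30 * std, mean + 30 * std)
)
centers = 0.5 * (edges[:-1] + edges[1:])

zero_fraction = np.mean(counts == 0)
print(f"{zero_fraction:.0%} of bins are empty")

# The naive y-weighting is undefined wherever counts == 0.
with np.errstate(divide="ignore"):
    naive_sigma = 1.0 / np.sqrt(counts)
print("non-finite sigma values:", np.count_nonzero(~np.isfinite(naive_sigma)))
```

The arrays centers and counts can then be passed to the fitting routine under test.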

@abalijepalli
Member

It sounds like when the bounds are large, fitting doesn't converge due to some combination of initial guesses being off and other factors. Also, since the baseline moves a lot, you need to set limit to 0, rather than 0.5 or -0.5.

To get around this, could we weight the histogram with the counts in each bin like you suggested, and simply add a small epsilon value to the weights to prevent divide by zero errors?

@shadowk29
Collaborator Author

That might work. I'll try it for my next data set and get back to you.

@shadowk29
Collaborator Author

Setting sigma=1/np.sqrt(y+1e-10) seems to work well and allows for very large window sizes while still getting accurate fits.
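
For reference, a self-contained sketch of that setting applied through curve_fit. Only the sigma expression comes from this thread; the Gaussian model, the synthetic samples, the window, and the initial guess are assumptions for illustration. Since curve_fit weights each squared residual by 1/sigma^2, this sigma gives a weight of y+1e-10, so empty bins are effectively ignored instead of causing a division by zero.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amp, mu, sigma):
    """Gaussian peak, assumed here as the model for the baseline histogram."""
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


rng = np.random.default_rng(0)
# Hypothetical baseline samples: mean 100, standard deviation 2.
samples = rng.normal(loc=100.0, scale=2.0, size=50_000)

# Window of +/-30 standard deviations, so most bins are empty.
y, edges = np.histogram(samples, bins=600, range=(40.0, 160.0))
x = 0.5 * (edges[:-1] + edges[1:])
y = y.astype(float)

# Weighting from the comment above: 1/sigma^2 == y + 1e-10.
sigma = 1.0 / np.sqrt(y + 1e-10)

# Initial guess taken from the data itself (an assumption of this sketch).
p0 = [y.max(), x[np.argmax(y)], samples.std()]
popt, pcov = curve_fit(gaussian, x, y, p0=p0, sigma=sigma)
perr = np.sqrt(np.diag(pcov))
print("popt:", popt)
print("perr:", perr)
```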

@abalijepalli
Member

That's great, we should integrate it into devel-1.0 with a PR.

@shadowk29
Collaborator Author

Currently I have it on the ticket69 branch, but the changes there broke a few things when the reorg happened. Let me fix things and I'll submit a PR there, and that branch should be ready to merge into devel-1.0 after that.

@shadowk29
Collaborator Author

Covered by pull request #83
