scaling... again #116
Hi Lorenzo,

The final paper results should have been produced on main, I believe, so that probably means that yes, that line was left commented (I need to check tomorrow to confirm whether it is also commented on my local version). The with-scaling versions seem not to have fully converged and look like they require more iterations. I would be interested to see what happens with more, but it is still not very clear to me why rescaling has worse performance.
I'm still not sure what's going on. The right one looks like it will converge at some point, but it's odd that it's not as fast as the other one.

It is possible that AL is prone to picking points at the outskirts of the output manifold, so the variance of the scaler becomes very big and isn't offset enough by the training. It would be helpful to visualise that (i.e. the distribution of flux of the acquired points). The original scaler instead contains info about less biased points, and so it gives a better generalisation MSE over a random test set. At the same time, I remember that rescaling everything with the new scaler after each acquisition gave much better unscaled MSEs... Just dumping these points here for the moment, as I'm sure I'll have forgotten in a week when I'm back.

It's also possible that the NN starts from a point that would struggle to generalise when seeing more data. One quick experiment could be to retrain the regressor NN from scratch after every acquisition. In this case using the same seed would be crucial (as we want the same net as before, but trained with extra data). I'm more keen on the first idea, though.
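To make the first idea concrete, here is a toy numpy sketch of the check I have in mind (all numbers are synthetic stand-ins, not our data): refit the scaler's statistics with and without an acquired batch biased towards the outskirts, and compare the variances.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins for the real data: an initial labelled set, and an
# acquired batch biased towards the outskirts of the output manifold
# (high flux), which is the hypothesised AL behaviour.
initial_flux = rng.normal(loc=1.0, scale=0.5, size=500)
acquired_flux = rng.normal(loc=4.0, scale=1.5, size=50)

# Variance the scaler would see before vs after refitting on the new pool
# (np.var stands in for refitting e.g. a StandardScaler).
var_before = initial_flux.var()
var_after = np.concatenate([initial_flux, acquired_flux]).var()

print(f"scaler variance before acquisition: {var_before:.3f}")
print(f"scaler variance after acquisition:  {var_after:.3f}")

# A histogram of acquired_flux overlaid on initial_flux would show the
# bias directly, e.g. plt.hist([initial_flux, acquired_flux], ...).
```

If the acquired points really are biased towards extremes, the refitted variance should blow up relative to the original; running the same comparison on the actual acquired fluxes would confirm or kill the hypothesis.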
@zcapjdb @thandi1908 Please have a look at my comment before tomorrow if you can, so we can discuss how to proceed for the AL part of the poster. If what I say above makes sense, I think the AL experiments should be run again with the correct scaler. Hopefully we'll be able to show that we can still be competitive with the full-dataset training scenario with a small number of labels (now it's 0.4%; I reckon we'll get to at least ~5%, which is not bad, but also not great).

We will need more iterations to be competitive with the full-dataset training scenario, which implies that the candidates not selected must all be added back to the unlabelled pool (I don't think that's the case now?), which otherwise gets depleted of potentially interesting points. Judging by my CL figures where I ran AL (with scaler) and random sampling, I think the former should still do better than the latter, so at least we should have that.
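The pool bookkeeping I mean is roughly the following (a hypothetical index-based sketch, not our actual code; the scoring is a random stand-in for the real acquisition function): only the selected candidates leave the unlabelled pool, so the unselected ones remain available in later rounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Indices into the full dataset.
unlabelled = set(range(1000))
labelled = set()

def acquisition_round(unlabelled, labelled, n_candidates=100, n_select=10):
    """Score a candidate batch, label the top n_select, keep the rest in the pool."""
    candidates = rng.choice(sorted(unlabelled), size=n_candidates, replace=False)
    # Stand-in for the real acquisition score (e.g. predictive uncertainty).
    scores = rng.random(n_candidates)
    order = np.argsort(scores)[::-1]
    selected = set(candidates[order[:n_select]].tolist())
    labelled |= selected
    # Crucial step: remove only the selected points, so the unselected
    # candidates implicitly go back to the unlabelled pool.
    unlabelled -= selected
    return unlabelled, labelled

for _ in range(5):
    unlabelled, labelled = acquisition_round(unlabelled, labelled)

# The pool shrinks only by what was actually labelled: 10 points per round.
print(len(labelled), len(unlabelled))
```

If instead the whole candidate batch were discarded each round, the pool would lose 100 points per round and quickly run out of interesting candidates.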
Hey @zcapjdb @thandi1908, just a quick update to move forward. This has nearly driven me insane, but the y vs yhat plots seem to reveal that the case of no scaling of the regressor leads to predicted negative fluxes. You can reproduce this on branch

The MSE depends only on sigma, as I showed in my previous post, which might make it seem that having a fixed scaler is a good thing. This is not true, because the ys depend on mu as well. When you compute y = sigma*z + mu with the unchanged regressor scaler, you'll get lower (and potentially negative) ys, because your z was scaled during training with much larger parameters (mu increases; see my previous post). Note that this happens for both y and yhat, so mu cancels out in the MSE, which therefore looks nicer. So, effectively, if the regressor scaler is not updated, we are not looking at the fluxes we are actually trying to fit. The better MSEs are an artifact of the way the scaling is currently done, and should not be used in the paper.

Also, I have tried again to test this discussion: I confirm that the performance is the same in terms of unscaled loss for the two cases:
1. the case where we do the scaling and scale the regressor as well (teal line),
2. the case where the part where we unscale and rescale everything is commented out and the scaler is kept fixed (magenta line).

In summary:
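The mu-cancellation argument can be checked with a toy numpy snippet (all parameter values are synthetic, chosen only to mimic "mu and sigma grow after acquisition"): the unscaled MSE is exactly sigma**2 times the scaled MSE, so the stale scaler's smaller sigma makes the MSE look nicer while the unscaled fluxes themselves come out too low, and some go negative.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Stale" scaler parameters fitted on the initial labelled set...
mu_old, sigma_old = 1.0, 0.5
# ...vs the parameters after acquisition, where mu and sigma have grown.
mu_new, sigma_new = 5.0, 2.0

# Targets and predictions in z-space (the space the network is trained in).
z_true = rng.normal(size=1000)
z_pred = z_true + 0.1 * rng.normal(size=1000)  # small model error

# Scaled-space MSE depends only on the residuals.
mse_scaled = np.mean((z_pred - z_true) ** 2)

# Unscale y = sigma*z + mu with each scaler. mu is added to both y and
# yhat, so it cancels in the MSE: the unscaled MSE is sigma**2 * mse_scaled.
y_true_stale = sigma_old * z_true + mu_old
y_pred_stale = sigma_old * z_pred + mu_old
mse_stale = np.mean((y_pred_stale - y_true_stale) ** 2)

y_true_new = sigma_new * z_true + mu_new
y_pred_new = sigma_new * z_pred + mu_new
mse_new = np.mean((y_pred_new - y_true_new) ** 2)

print(mse_stale, sigma_old**2 * mse_scaled)  # equal: mu has cancelled
print(mse_new, sigma_new**2 * mse_scaled)    # also equal, but larger
# With the stale (too small) mu and sigma, some unscaled fluxes go negative:
print((y_pred_stale < 0).any())
```

The better-looking stale MSE here comes purely from sigma_old being smaller than sigma_new, not from a better fit, which is exactly why it shouldn't go in the paper.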
I have been busy figuring this out over the last couple of days, so I haven't done any work on the CL. I'll try to plough through it today and early next week (my parents are visiting this weekend). In particular, I'll use the "scaling regressor" (i.e. uncomment this line).

If you can, it would really be great if you could update all the plots, tables and results pertaining to AL in the paper. You can use the "scaling regressor" option and check whether my suggestion in point 3 above works, if you've got the time. I appreciate this is quite stressful and annoying, but unfortunately it's necessary. Let me know if you need any help! Also, could you please acknowledge you've seen this? Otherwise I won't know whether you're on it or not.
Hi Lorenzo, I am still mid-transit (I ended up missing my connecting flight from NYC yesterday). I get into Baltimore in the afternoon, so it'll probably be tomorrow before I'm able to try running your new changes.
Hi @zcapjdb @thandi1908
I was trying to reproduce your results because some CL results seemed odd, but I'm getting a strange thing with the scaling (again!).
Looking at main (which I assume is where the paper results are coming from), it seems that the line where the regressor's scaler is updated is commented.
Un-commenting that line has quite a drastic effect on the paper's results.
The left plot is similar to what is in the paper, but it's obtained with the scaler bit commented, which I thought we had settled on keeping uncommented (this is how I have been doing the CL all this time). But maybe I don't recall correctly? Could you please confirm this when you can?