
V2 launch: are you having problems? #113

Closed
theosanderson opened this issue Apr 22, 2022 · 17 comments

Comments

@theosanderson
Owner

I've just launched Taxonium V2, a ground-up rebuild of Taxonium designed to support a server-side backend. (The backend is not yet running on the live site, but it will be in a couple of days.)

While I have tried to keep things backwards compatible, the changes may have broken some people's workflows. Please say if so and we can try to resolve things.

The old version of Taxonium is still available at https://cov2tree-git-v1-theosanderson.vercel.app/ for now.

@AngieHinrichs

Thanks for all your work on this, Theo! I'm trying to load a large tree (9M sequences) from a local file that I was able to view with V1, but Chrome stops the page with "Aw, Snap! Something went wrong while displaying this webpage. Error code: 5". That happened pretty quickly the first time I tried. Then I opened the developer tools console in case there were any informative error messages, but it seemed to take forever (over an hour) to read the file, and when I closed dev tools, it got as far as 'sorting on y' or something like that before the "Aw, Snap!". I will try Firefox in case the problem is that I already have a zillion tabs open in Chrome. :)

@theosanderson
Owner Author

Thanks for the report, Angie, and sorry you are having issues. I can reproduce the higher memory usage. I think V2 will (unfortunately) always use somewhat more memory, but I hope we can make it possible to load these big trees. I think this will require changing the Taxonium input format to a new format better suited to this approach, so that fewer conversions are needed at startup (which will also speed things up). More on that very soon.

@theosanderson
Owner Author

Separately, and related to issues I was discussing with @kvargha (thanks for getting in touch, Koorous), I realise I should be more explicit about some of the other changes. The way URL parameters define searches and colorBy has changed. Where it was previously ?search=, it is now srch=, and the encoded object is different. Similarly, colourBy has changed to color, with a different format. For searches, each object should have a unique key (which can just be an arbitrary value). I'd encourage people to set up the searches they want manually, then inspect the URL, decode it with e.g. https://meyerweb.com/eric/tools/dencoder/, and adapt it as needed. There is something in place to try to map search onto srch, but it is imperfect. I apologise for the work this requires.
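As a rough sketch of the decode-and-inspect workflow above, the Python below decodes a Taxonium-style URL with the standard library instead of a web decoder. It assumes (not confirmed in this thread) that the srch value is URL-encoded JSON holding a list of search objects; the example URL and the fields inside the search object are hypothetical, apart from the unique key mentioned above.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse


def decode_params(url: str) -> dict:
    """Return the query parameters of `url`, JSON-decoding values where possible.

    Assumption: values such as srch are URL-encoded JSON. Anything that is
    not valid JSON is returned as the plain decoded string.
    """
    query = parse_qs(urlparse(url).query)
    decoded = {}
    for key, values in query.items():
        value = values[0]  # parse_qs returns a list per key; keep the first
        try:
            decoded[key] = json.loads(value)
        except json.JSONDecodeError:
            decoded[key] = value
    return decoded


# Hypothetical search list: each object carries its own unique "key";
# the "text" field is an illustrative placeholder, not a documented schema.
searches = [{"key": "search1", "text": "BA.2"}]
example_url = "https://taxonium.org/?" + urlencode({"srch": json.dumps(searches)})

params = decode_params(example_url)
print(params["srch"][0]["key"])  # search1
```

The same helper works for the color parameter; editing the decoded object and re-encoding it with json.dumps and urlencode gives a new link.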

@theosanderson
Owner Author

Hi @AngieHinrichs,

If you wouldn't mind, could you (next week, or whenever suits) try the following, adapted for your big-tree use case? You need to use my GenBank file, as this script can't handle orf1AB as a single entity.

wget https://hgwdev.gi.ucsc.edu/~angie/UShER_SARS-CoV-2/public-latest.metadata.tsv.gz
wget https://raw.githubusercontent.com/theosanderson/taxonium/master/taxoniumtools/test_data/hu1.gb
wget https://hgwdev.gi.ucsc.edu/~angie/UShER_SARS-CoV-2/public-latest.all.masked.pb.gz

pip install taxoniumtools
usher_to_taxonium --input public-latest.all.masked.pb.gz --output ./public.jsonl.gz --metadata public-latest.metadata.tsv.gz --genbank hu1.gb --columns genbank_accession,country,date,pangolin_lineage

The resulting jsonl.gz should load in Taxonium.org, and I'd be keen to know whether it works any better for you.

Sorry for the hassle
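For anyone scripting the conversion above, here is a minimal Python sketch that assembles the same usher_to_taxonium call. It uses only the flags shown in the command; the file names are the ones fetched by the wget lines, and it assumes taxoniumtools is installed so that the usher_to_taxonium executable is on PATH. The helper names are hypothetical.

```python
import subprocess
from typing import List, Sequence


def build_usher_to_taxonium_cmd(
    input_pb: str,
    output: str,
    metadata: str,
    genbank: str,
    columns: Sequence[str],
) -> List[str]:
    """Build the argument list for usher_to_taxonium (flags as used in this thread)."""
    return [
        "usher_to_taxonium",
        "--input", input_pb,
        "--output", output,
        "--metadata", metadata,
        "--genbank", genbank,
        "--columns", ",".join(columns),  # comma-separated, no spaces
    ]


def run(cmd: List[str]) -> None:
    """Run the conversion; raises CalledProcessError if the tool exits non-zero."""
    subprocess.run(cmd, check=True)


cmd = build_usher_to_taxonium_cmd(
    input_pb="public-latest.all.masked.pb.gz",
    output="./public.jsonl.gz",
    metadata="public-latest.metadata.tsv.gz",
    genbank="hu1.gb",
    columns=["genbank_accession", "country", "date", "pangolin_lineage"],
)
print(" ".join(cmd))
```

Calling run(cmd) performs the actual conversion; keeping the build step separate makes it easy to check the arguments first.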

@theosanderson
Owner Author

@kvargha, my thinking has evolved a bit: I plan to redirect old parameter sets to the V1 URL I linked above, and then drop all support for the old protobuf format from V2 (apart from that redirect). Hopefully the redirect will be in tomorrow and will rescue your links; sorry again for the confusion.
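The redirect described here can be sketched roughly as follows. This is a hypothetical Python illustration, not Taxonium's actual (browser-side) code: it assumes that the presence of the old search or colourBy parameters marks an old-style link, and forwards the untouched query string to the V1 URL given at the top of this thread.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

V1_URL = "https://cov2tree-git-v1-theosanderson.vercel.app/"
# Assumption: these old-style parameter names identify a V1 link.
OLD_PARAMS = ("search", "colourBy")


def redirect_target(url: str) -> Optional[str]:
    """Return the V1 URL to redirect to, or None if `url` uses V2 parameters."""
    parsed = urlparse(url)
    # keep_blank_values so that e.g. "?search=" still counts as old-style
    params = parse_qs(parsed.query, keep_blank_values=True)
    if any(name in params for name in OLD_PARAMS):
        # Keep the original query string so the old link still works on V1.
        return V1_URL + "?" + parsed.query
    return None


print(redirect_target("https://taxonium.org/?search=abc"))
```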

@AngieHinrichs

Thanks Theo, that worked much better! Really nice to have the Revertant search too.

The only thing I'm missing from the V1 / matUtils-generated taxonium protobufs is the nucleotide mutations. Alex added that feature more recently and it's been extremely helpful for inspecting the UShER tree. Any chance you could add those to taxoniumtools? :)

@theosanderson
Owner Author

Really glad that works!

Nucleotides should now be in!

And, obviously, apologies for breaking the wonderful matUtils integration. Taxoniumtools is a lot faster than the Python scripts I used to have, but clearly it's nothing like C++. If there's any desire to get that functionality working again in the future, I would of course do what I could to support it. I think the format is settling down now; the only thing I know will change a little is the first line of the JSON file (which is a sort of metadata about the file as a whole; more will be added to it).
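Because that first line is special, a consumer can read it separately from the node records. A minimal Python sketch, assuming only what is said here: the output is gzip-compressed JSON Lines, and line one is a JSON object of file-level metadata. The keys inside that object (and in the records) are not described in this thread, so the sample data below is hypothetical.

```python
import gzip
import json
import os
import tempfile
from typing import Iterator


def read_header(path: str) -> dict:
    """Return the file-level metadata object stored on the first line."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.loads(handle.readline())


def iter_records(path: str) -> Iterator[dict]:
    """Yield every record after the header line, one JSON object per line."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        handle.readline()  # skip the header line
        for line in handle:
            if line.strip():
                yield json.loads(line)


# Hypothetical two-line file: a header object, then one record.
with tempfile.NamedTemporaryFile(suffix=".jsonl.gz", delete=False) as tmp:
    path = tmp.name
with gzip.open(path, "wt", encoding="utf-8") as out:
    out.write(json.dumps({"title": "My tree"}) + "\n")
    out.write(json.dumps({"node_id": 0}) + "\n")

print(read_header(path))  # {'title': 'My tree'}
```

Reading the header on its own means a tool only needs to tolerate extra keys there as the format grows.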

@jen-martin

One feature that we found especially useful (with Cluster Tracker) was labels for the metadata in the Node Info box (e.g., Date, Country, etc). We used the feature that allows adding additional metadata fields to the protobuf, so adding metadata labels would be a nice feature to add back into the new version.

Also, I noticed before you implemented the redirect to V1 that the Node Info display wasn't handling additional metadata fields containing spaces very well. It would show only the first word in the string (e.g., it would display "New" when the full metadata field was "New York"). This may have been a side effect of trying to be backwards compatible with protobuf-formatted data, but I thought I'd flag it just in case. We haven't built the new jsonl file format yet, but I'll let you know if we still see that issue with the new data format.

Thanks for all your work on this! I'm really loving the ability to pan the map left/right! And thank you for putting in the redirect to V1.
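One common way to end up with "New" instead of "New York" is splitting a row on any whitespace rather than on the tab delimiter. The Python below is only a generic illustration of that failure mode for tab-separated metadata such as public-latest.metadata.tsv.gz; it is not a diagnosis of Taxonium's code, and the sample row is made up.

```python
import csv
import io

# Hypothetical tab-separated metadata: one value contains a space.
tsv = "strain\tcountry\tdate\nsample1\tNew York\t2022-04-01\n"

# Buggy: str.split() with no argument splits on *any* whitespace, so
# "New York" becomes two fields and the later columns shift.
header, row = (line.split() for line in tsv.strip().split("\n"))
buggy = dict(zip(header, row))

# Correct: split only on tabs, here via the csv module.
reader = csv.DictReader(io.StringIO(tsv), delimiter="\t")
correct = next(reader)

print(buggy["country"])    # New
print(correct["country"])  # New York
```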

@theosanderson
Owner Author

Thanks @jen-martin, that's very useful feedback. I should also flag that there is a known issue with zoomToSearch in V2, where it doesn't always zoom to the right location in X, which may prevent ClusterTracker from upgrading for now.

@AngieHinrichs

Wow that was fast, thanks so much for adding nucleotides!

@theosanderson
Owner Author

(that zoomToSearch issue is now resolved)

@theosanderson
Owner Author

Thanks for flagging that the field names were useful, @jen-martin; those are now back.

@theosanderson
Owner Author

theosanderson commented May 5, 2022

Other new features that may be of interest: in taxoniumtools, --title "My tree" sets a title for your tree, and --overlay_html myfile.html lets you provide an HTML file for the About/Acknowledgements page, containing something like:

<p>This tree displays lorem ipsum dolor sic <a href="http://google.com" style="color:blue; text-decoration:underline">amet</a>.</p>
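A small Python sketch of preparing those two options: it writes the HTML snippet above to a file and returns the extra arguments to append to a usher_to_taxonium invocation. Only --title and --overlay_html come from this comment; the helper name and the temporary output location are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import List

# The example snippet from this comment, kept verbatim.
OVERLAY = (
    '<p>This tree displays lorem ipsum dolor sic '
    '<a href="http://google.com" style="color:blue; text-decoration:underline">amet</a>.</p>'
)


def presentation_args(title: str, overlay_html: str, overlay_path: Path) -> List[str]:
    """Write the overlay HTML to disk and return --title/--overlay_html arguments."""
    overlay_path.write_text(overlay_html, encoding="utf-8")
    return ["--title", title, "--overlay_html", str(overlay_path)]


# Hypothetical output location; in practice this would sit next to your other inputs.
out_dir = Path(tempfile.mkdtemp())
args = presentation_args("My tree", OVERLAY, out_dir / "myfile.html")
print(args[:2])  # ['--title', 'My tree']
```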

@AngieHinrichs

AngieHinrichs commented May 5, 2022

For child branches, horizontal lines connect to the middle of the vertical lines to their right. Did they previously connect to the top of the vertical line? I find that style easier to work with -- when connecting to the vertical center, there are often more vertical lines stacked to the left than there would be when connecting to the top. I think connecting to the top would also make it easier to see where one branch ends and the next one starts when their samples run together.

Also, when I hover over branch lines to see mutations, the mutations shown are often not what I would expect. This would take quite a few words to explain... should I send a video? It would probably be pretty quick to demonstrate in a zoom call with screenshare.

@theosanderson
Owner Author

Hi @AngieHinrichs ,

Massive apologies -- I never saw this message. I probably lost it in my very noisy git history.

I don't think they ever connected to the top in my version, but looking now I see that they do in your matUtils exports (this is decided by the script that lays out the tree during the processing step). I'll have a think about how best to handle this.
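To make the two conventions concrete: in a left-to-right tree drawing, each internal node's vertical line spans its children, and the incoming horizontal line attaches at the node's y-coordinate. The hypothetical Python sketch below (not the actual taxoniumtools or matUtils layout code) gives leaves consecutive y values and places each internal node either at the midpoint of its children's span or at its first (top) child.

```python
from typing import Dict, List


def layout_y(children: Dict[str, List[str]], root: str, mode: str = "middle") -> Dict[str, float]:
    """Assign a y-coordinate to every node.

    Leaves get consecutive integers in depth-first order (y grows downward,
    so the first child is at the top). Internal nodes sit at the midpoint of
    their first and last child ("middle") or at their first child's y
    ("top"); this decides where the parent's horizontal line meets the
    node's vertical line.
    """
    y: Dict[str, float] = {}
    next_leaf = 0

    # Recursive for readability; a tree with millions of nodes would need an
    # iterative traversal to stay within Python's recursion limit.
    def visit(node: str) -> float:
        nonlocal next_leaf
        kids = children.get(node, [])
        if not kids:
            y[node] = float(next_leaf)
            next_leaf += 1
            return y[node]
        kid_ys = [visit(k) for k in kids]
        if mode == "top":
            y[node] = kid_ys[0]
        else:
            y[node] = (kid_ys[0] + kid_ys[-1]) / 2
        return y[node]

    visit(root)
    return y


tree = {"root": ["A", "B"], "A": ["a1", "a2", "a3"], "B": ["b1"]}
print(layout_y(tree, "root")["A"])              # 1.0
print(layout_y(tree, "root", mode="top")["A"])  # 0.0
```

With "top", each horizontal line meets the upper end of the vertical line below it, which is the style described in the matUtils exports.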

Has anything improved with the mutations in the meantime? There may have been a few bug fixes.

Separately, new features should at least make it clearer which branch you're actually hovering over, and after clicking a node, the node details panel on the right now has a "jump to parent node" button, which is also helpful for investigating these things.

@AngieHinrichs

Ah, makes sense about the matUtils. The hovering lines and node circles help a lot! Cool, I hadn't noticed the jump to parent node button. :) Thanks as always for all of the improvements, taxonium is invaluable for finding problems with the tree! (Like BA.1.1 is split at the moment, yikes...)

@theosanderson
Owner Author

Another feature you might not be aware of is the --name_internal_nodes flag (https://taxonium.readthedocs.io/en/latest/taxoniumtools.html#usher-to-taxonium). [Sorry if I already flagged this, I forget!]
