Migrating analysis subtypes and switching entity types #241
Comments
I made a dinky module that converts Tripal 3 analyses of type unigene to new entity types: Transcriptome and Genome. https://github.com/bradfordcondon/tripal_manage_analyses There are other analyses that I need to convert, so I'll probably wrap them in here as well. I don't know about the utility of converting entities in general: in theory migration is fine (it's just that in this case we had manually modified stuff, and the analysis migration is hard-coded right now). If it's something that would be useful, I can turn the entity transformation module into a manageable utility where you pick a source and destination bundle and a prop qualifier. |
@chunhuaicheng and I have spoken about creating some API functions to allow an extension module to migrate its own content. We had to make a generic decision about migrating analyses because we just don't know how someone may want to make their analyses available, and for new sites that don't have analyses yet we don't know what kind they will have. So, for consistency, Tripal migrates and initiates with just the single Analysis type. I thought that the Unigene module did set an analysis type. I'll have to look into that.... |
Oh, I don't see a problem with having a module that converts a content type or perhaps splits one later after it's already been created. Someone created a very similar thing for Drupal node types. |
Proposed solution agreed with @spficklin that migration, in the case of analysis,
|
An API function 'tripal_chado_migrate_tripal_content_type($type)' has been created. It takes one argument that specifies the term to be used (see example below) and will create a bundle for the term if it does not already exist. The function will then try to publish all content for the newly created content type. Example type array: |
Here are my term suggestions as promised. I know the practice is to have the bundle labels == cvterm, but in this case it might be best to use something else, largely because the term I found for interpro is so ridiculous....
I know I keep asking this, but are we sure there's no way to build a compound cvterm? analysis + (term for the database) would do nicely.
Another alternative is to propose new cvterms for the ontology of our choice. Maybe EDAM: Operation -> Analysis -> Annotation analysis -> GO/KEGG/BLAST/INTERPROSCAN. If we did this, we would need to discuss whether we preferred to have the analysis refer to the METHOD or the DATABASE: for example, interproscan vs SWISSPROT, or BLAST vs NCBI-NR. As it is, we do a mix (blast/interpro as methods vs go/kegg as databases. Well, vocabularies, but based on a database with no assertion about method).
Analysis Unigene
I split my unigenes into two terms: Transcriptome assembly and Genome assembly (differentiated by the associated feature cvterm: mrna and mrna_contig). For Tripal 3 core, we can migrate analysis unigenes to Sequence Assembly, which nicely couches both transcriptomes and genomes. I am very confident in this term. In fact the "Unigene" data type is a misnomer, because Unigene is a specific pipeline/data organizational structure within GenBank. Sequence Assembly is much better.
Analysis blast
I am 70% confident in this term. Its shortcoming: it does not quite state that it's an analysis to generate the evidence. But I cannot find a better term in OLS.
Analysis interpro
match to InterPro member signature evidence
Ridiculously verbose term, but the equivalent of BLAST evidence above.
Analysis GO
Can't find a good one.
Analysis KEGG
Can't find a good one. |
@chunhuaicheng the addition of Myalchemist module will handle converting entities from one type to another, but if the migration gets rerun, it will still result in duplicates (see #261). Your API call will work very well for content types that core does nothing with (all custom content in chado for example). |
I'm reopening to continue the conversation... |
I think in #261 we established the duplicate entity issue is annoying but not a deal breaker. So we can mark it as low priority for core to migrate analysis to the right sub-analysis because Alchemist can handle that. |
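For reference, the rerun problem can be pictured with a small sketch. It assumes (this is not confirmed in the thread or in #261) that duplicates appear because a rerun only looks at the target bundle and misses records that were already published under a different bundle. The function names, the bundle names, and the dict layout are all hypothetical stand-ins for the chado_bio_data mapping tables, not Tripal API.

```python
# Illustrative sketch only: plain dicts stand in for the chado_bio_data_*
# mapping tables (record_id -> entity_id). All names here are hypothetical.

def already_published(record_id, mappings_by_bundle):
    """True if any bundle already maps this Chado record to an entity."""
    return any(record_id in mapping for mapping in mappings_by_bundle.values())


def migrate(record_ids, target_bundle, mappings_by_bundle, next_entity_id):
    """Create entities only for records that no bundle has published yet.

    Returns the list of newly created entity ids.
    """
    target = mappings_by_bundle.setdefault(target_bundle, {})
    created = []
    for record_id in record_ids:
        # Check every bundle, not just the target, so a record that was
        # converted to another type is not published a second time.
        if already_published(record_id, mappings_by_bundle):
            continue
        target[record_id] = next_entity_id
        created.append(next_entity_id)
        next_entity_id += 1
    return created
```

With this check, running the migration a second time creates nothing, and a record that was moved to another bundle in the meantime is left alone.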
It would be great if analyses could migrate like features migrate. The features table has a type column, so the migration just uses the cvterm in the type column as the cvterm for the entity to migrate to.
Unfortunately analysis has no such column. Instead, the analysis type is set by the analysis_type prop, and should be equal to the module responsible for the type. I don't know how consistent this is: for example, I made a PR to analysis unigene because it wasn't setting this prop. Since analysis_type points to plain text equal to the module name instead of a cvterm, this approach won't work for analysis anyway: instead we'll need a method that assigns a cvterm based on the module name.
Here are the problems as I see them:
My solution
What I plan on doing is creating a hook_update_N that will move entities from one chado_bio_data table to another.
If I'm missing something I'd love to get feedback. If this sounds like a problem most sites would have, fixing the migration would help (but not help me because of the manual content).
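The "assign a cvterm based on the module name" step mentioned above could look roughly like the sketch below. It is a plain-data illustration, not Tripal code: the module names used as keys are assumed values of the analysis_type prop, the term labels are taken from the suggestions earlier in this thread, and anything without a known mapping falls back to the generic Analysis type.

```python
# Illustrative sketch only. The keys are assumed analysis_type prop values
# (module names); the labels come from the term suggestions in this thread.
MODULE_TO_TERM = {
    "tripal_analysis_unigene": "Sequence Assembly",
    "tripal_analysis_interpro": "match to InterPro member signature evidence",
}

# Fallback when the prop is missing or the module has no agreed term yet.
DEFAULT_TERM = "Analysis"


def resolve_analysis_term(analysis_props):
    """Return the destination term label for one analysis's props dict."""
    module_name = analysis_props.get("analysis_type")
    if module_name is None:
        return DEFAULT_TERM
    return MODULE_TO_TERM.get(module_name.strip().lower(), DEFAULT_TERM)


def group_by_term(analyses):
    """Group analysis ids by the term they would migrate to."""
    grouped = {}
    for analysis_id, props in analyses.items():
        grouped.setdefault(resolve_analysis_term(props), []).append(analysis_id)
    return grouped
```

Grouping first makes it easy to see which analyses would land in which bundle before any entity is actually moved, and analyses whose module never set the prop simply stay under Analysis.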