Migrating analysis subtypes and switching entity types #241

Closed
bradfordcondon opened this issue Jan 2, 2018 · 9 comments
Labels
Community - Discussion Any issue focused on discussion from the community. It does not apply to enhancements. Community - Enhancement Suggestions for improvements or enhancements to Tripal.

Comments

@bradfordcondon
Member

It would be great if analyses could migrate the way features do. The features table has a type column, so the migration simply uses the cvterm in that column as the cvterm for the entity to migrate to.
Unfortunately, analysis has no such column. Instead, the analysis type is set by the analysis_type prop, whose value should equal the name of the module responsible for the type. I don't know how consistent this is: for example, I made a PR to analysis unigene because it wasn't setting this prop. Since analysis_type holds plain text equal to the module name rather than a cvterm, the feature approach won't work for analysis anyway. Instead, we'll need a method that assigns a cvterm based on the module name.

Here are the problems as I see them:

  • Modules can't hook into migrations, so the core migration would have to know about every module's possible content types.
  • Even if I could migrate my analyses, we've manually moved text that was attached to the node and didn't migrate properly. Remigrating would undo all that work, so I'd like to remap the analyses to the correct type instead.
  • Entity types can't be switched, so I can't reassign my analyses to the type they should be mapped to (Transcriptome/Genome for analysis_unigene, for example).

My solution

What I plan on doing is creating a hook_update_N that will move entities from one chado_bio_data table to another.

If I'm missing something I'd love to get feedback. If this sounds like a problem most sites would have, fixing the migration would help (but not help me because of the manual content).

@bradfordcondon
Member Author

I made a dinky module that converts Tripal 3 unigene analyses to new entity types: Transcriptome and Genome.

https://github.com/bradfordcondon/tripal_manage_analyses

There are other analyses that I need to convert, so I'll probably wrap them in here as well.

I don't know how useful converting entities is in general. In theory, migration is fine; it's just that in this case we had manually modified content, and the analysis migration is hard-coded right now. If it's something that would be useful, I can turn the entity transformation module into a manageable utility where you pick a source and destination bundle and a prop qualifier.

@spficklin
Member

spficklin commented Jan 8, 2018

@chunhuaicheng and I have spoken about creating some API functions to allow an extension module to migrate its own content. We had to make a generic decision about migrating analyses because we just don't know how someone may want to make their analyses available, and for new sites that don't have analyses yet we don't know what kind they will have. So, for consistency, Tripal migrates and initializes with just the single Analysis type.

I thought that the Unigene module did set an analysis type. I'll have to look into that....

@spficklin
Member

Oh, I don't see a problem with having a module that converts a content type or perhaps splits one later after it's already been created. Someone created a very similar thing for Drupal node types.

@spficklin spficklin added Community - Discussion Any issue focused on discussion from the community. It does not apply to enhancements. Community - Enhancement Suggestions for improvements or enhancements to Tripal. labels Jan 8, 2018
@bradfordcondon
Member Author

Proposed solution, agreed with @spficklin: in the case of analysis, the migration should

  • Look for the property that analysis uses to assign that analysis subtype
  • Migrate to a term that describes that analysis (Bradford will provide the terms)

@chunhuaicheng
Member

An API function, tripal_chado_migrate_tripal_content_type($type), has been created. It takes one argument that specifies the term to be used (see the example below) and will create a bundle for the term if one does not already exist. The function will then try to publish all content for the newly created content type.

Example type array:

  $type = array(
    'vocabulary' => 'OBI',
    'accession' => '0100026',
    'term_name' => 'organism',
    'storage_args' => array(
      'data_table' => $table
    )
  );
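
As a rough sketch of how the array above might be assembled and handed to the new function: the helper name example_build_type() is hypothetical, and passing 'organism' as the data table is an assumption made only for illustration (the example above leaves $table unspecified). The call is guarded so the snippet also runs outside a Tripal site.

```php
<?php
// Hypothetical helper (not part of Tripal): builds the $type array in the
// shape shown in the example above.
function example_build_type($vocabulary, $accession, $term_name, $table) {
  return array(
    'vocabulary' => $vocabulary,
    'accession' => $accession,
    'term_name' => $term_name,
    'storage_args' => array(
      'data_table' => $table,
    ),
  );
}

// Assumption: the organism term is stored in a Chado table named 'organism'.
$type = example_build_type('OBI', '0100026', 'organism', 'organism');

// Only call the API function when Tripal has actually defined it.
if (function_exists('tripal_chado_migrate_tripal_content_type')) {
  tripal_chado_migrate_tripal_content_type($type);
}
```

Outside of Tripal the guarded call is skipped, so the snippet only builds the array.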

@bradfordcondon
Member Author

bradfordcondon commented Jan 19, 2018

Here are my term suggestions as promised.

I know the practice is to have the bundle labels == cvterm, but in this case it might be best to use something else. Largely because the term I found for interpro is so ridiculous....

I know I keep asking this, but are we sure there's no way to build a compound cvterm? analysis + (term for the database) would do nicely.

Another alternative is to propose new cvterms for the ontology of our choice. Maybe EDAM: Operation -> Analysis -> Annotation analysis -> GO/KEGG/BLAST/INTERPROSCAN. If we did this, we would need to discuss whether we prefer to have the analysis refer to the METHOD or the DATABASE. For example, interproscan vs SWISSPROT, or BLAST vs NCBI-NR. As it is, we do a mix: blast/interpro as methods vs go/kegg as databases (well, vocabularies, but based on a database with no assertion about method).

Analysis Unigene

I split my unigenes into two terms: Transcriptome assembly and Genome assembly (differentiated by the associated feature cvterm: mrna and mrna_contig).

For Tripal 3 core, we can migrate analysis unigenes to Sequence Assembly, which nicely couches both transcriptomes and genomes. I am very confident in this term. In fact the "Unigene" data type is a misnomer, because Unigene is a specific pipeline/data organizational structure within GenBank. Sequence Assembly is much better.

Analysis blast

BLAST evidence

A type of pairwise sequence alignment evidence obtained with basic local alignment search tool (BLAST). [ ECO:MCC ]

I am 70% confident in this term. Its shortcoming: it does not quite state that it's an analysis to generate the evidence. But I cannot find a better term in OLS.

Analysis interpro

match to InterPro member signature evidence

A type of match to sequence model evidence resulting from a positive match of a protein, or set of proteins to a predictive model (signature) in the InterPro database. [ PMC:2686546 url:http://www.ncbi.nlm.nih.gov/mesh?term=Nucleic+Acid+Hybridization ]

Ridiculously verbose term, but the equivalent of BLAST evidence above.

Analysis GO

Can't find a good one.

Analysis KEGG

Can't find a good one.
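
The suggestions above could be summarized as a lookup from the analysis_type prop value to a proposed term label. This is only a sketch: the function name is hypothetical, the module-name keys are assumptions about what each module writes into analysis_type, and GO and KEGG are deliberately left out because no good term was found, so they fall back to the generic Analysis type.

```php
<?php
// Hypothetical lookup (not part of Tripal). Keys are ASSUMED analysis_type
// values; labels are the term suggestions from this comment.
function example_suggested_analysis_term($analysis_type) {
  $map = array(
    'tripal_analysis_unigene' => 'Sequence Assembly',
    'tripal_analysis_blast' => 'BLAST evidence',
    'tripal_analysis_interpro' => 'match to InterPro member signature evidence',
    // GO and KEGG are omitted on purpose: no suitable term was found.
  );
  // Unmapped types stay on the generic Analysis type.
  return isset($map[$analysis_type]) ? $map[$analysis_type] : 'Analysis';
}
```

Keeping the fallback explicit means an unrecognized or missing prop value still lands on the single Analysis type that core already uses.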

@bradfordcondon
Member Author

bradfordcondon commented Jan 19, 2018

@chunhuaicheng the addition of tripal_chado_migrate_tripal_content_type only partially resolves this issue, because core still migrates analysis subtypes to analysis. You will end up with duplicate entities: the ones migrated by core to analysis and the ones migrated via the API call.

My Alchemist module will handle converting entities from one type to another, but if the migration gets rerun, it will still result in duplicates (see #261).

Your API call will work very well for content types that core does nothing with (all custom content in chado for example).

@spficklin
Member

I'm reopening to continue the conversation...

@spficklin spficklin reopened this Jan 19, 2018
@bradfordcondon
Member Author

I think in #261 we established the duplicate entity issue is annoying but not a deal breaker.

So we can mark it as low priority for core to migrate analysis to the right sub-analysis because Alchemist can handle that.
