
Dev seeders #57

Merged: bradfordcondon merged 9 commits into master from dev_seeders on Jul 31, 2018

bradfordcondon (Member) commented Jul 24, 2018

  • This PR creates a default seeder with every execution NULL'd out, so it doesn't run by default.
  • The seeder imports every file provided in the standard Tripal dev seed dataset.
  • Users can customize the seeder for their own dataset by editing the protected class-level variables and the constructor.
  • Users can also modify the advanced form options if they like.
  • We use remote files in all cases. This keeps the test suite lightweight and doesn't make the dev seed repository a dependency.

implements #56

bradfordcondon (Member, Author) commented:

So an important thing to consider:

We load our files the hWG way. However, that approach requires a regexp that links each protein to its parent feature.

If that isn't possible, we can still load everything, BUT we have to (a) load the protein FASTA without attempting to link, and rely on the GFF for the links, and (b) load the InterProScan annotations differently. We can supposedly use the CDS files themselves to RUN InterProScan; I think we should consider that. Alternatively, we just load the annotations associated with the peptides, no harm done.
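To make the regexp requirement concrete: the loader derives each protein's parent name from the protein's own FASTA ID. Here is a minimal PHP sketch, assuming a hypothetical naming convention in which a protein ID is its parent mRNA ID plus a `-P` suffix. The pattern, the IDs, and the `parent_from_protein_id()` helper are illustrative only; they are not taken from the seed dataset or from Tripal's FASTA loader.

```php
<?php
// Hypothetical convention: protein "GENE0001.1-P" belongs to mRNA "GENE0001.1".
// The pattern and IDs are illustrative assumptions, not the seed dataset's format.
const PARENT_REGEX = '/^(.+)-P$/';

/**
 * Returns the parent name captured by $regex from a protein ID,
 * or NULL if the ID does not match (linking must then come from the GFF).
 */
function parent_from_protein_id($protein_id, $regex = PARENT_REGEX) {
  if (preg_match($regex, $protein_id, $matches) === 1) {
    return $matches[1];
  }
  return NULL;
}

foreach (['GENE0001.1-P', 'GENE0002.1'] as $id) {
  $parent = parent_from_protein_id($id);
  echo $id, ' => ', $parent === NULL ? '(no match)' : $parent, PHP_EOL;
}
```

A NULL result corresponds to case (a): load the FASTA without linking and let the GFF supply the relationship.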

bradfordcondon (Member, Author) commented:

I just pushed a commit. All seeders work with the remote files except InterProScan (still waiting on annotations :P).

That said...

  • A new analysis is created for the sequence and expression datasets each time, which is no good. It's also created regardless of whether anyone loads those datasets. I propose instead that the user sets an "expression analysis name" static variable at the top of the class, and the analysis is created only if it's provided.
  • Similarly, I have second thoughts about commenting out the actual data loaders. Maybe instead we comment out the file locations and run each importer only if the class's file location for it is set.

So, for example:

```php
// Uncomment to load.
// Make sure you also uncomment the sequence analysis variable, or the loader will fail.
// protected $landmark_file = ['file_remote' => 'https://raw.githubusercontent.com/statonlab/tripal_dev_seed/master/Fexcel_mini/sequences/empty_landmarks.fasta'];
```
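The proposal above (run an importer only if its file location is set, and create an analysis only if its name is set) could look roughly like this. It is a hypothetical, self-contained PHP sketch: `ExampleDevSeeder`, `$sequence_analysis_name`, `run()`, and the log strings are illustrative stand-ins. A real seeder would submit Tripal importer jobs where this one only records what it would do.

```php
<?php
/**
 * Hypothetical seeder skeleton: each step runs only if its setting is non-NULL.
 * Names are illustrative; a real seeder would submit Tripal importer jobs
 * where this sketch only appends a description to $log.
 */
class ExampleDevSeeder {
  // Leave NULL to skip. Set (or uncomment) to enable the matching step.
  protected $sequence_analysis_name = NULL;
  protected $landmark_file = NULL;

  /** @var string[] Steps that would have been executed, in order. */
  protected $log = [];

  /** Optional overrides, e.g. ['landmark_file' => ['file_remote' => '...']]. */
  public function __construct(array $overrides = []) {
    foreach ($overrides as $name => $value) {
      if (property_exists($this, $name) && $name !== 'log') {
        $this->$name = $value;
      }
    }
  }

  /** Runs the enabled steps and returns their descriptions. */
  public function run() {
    $this->log = [];

    // Create the analysis only if the user gave it a name.
    if ($this->sequence_analysis_name !== NULL) {
      $this->log[] = 'create analysis: ' . $this->sequence_analysis_name;
    }

    // Run the landmark loader only if its file location is set.
    if ($this->landmark_file !== NULL) {
      if ($this->sequence_analysis_name === NULL) {
        throw new RuntimeException('landmark_file is set but sequence_analysis_name is not.');
      }
      $this->log[] = 'load landmarks: ' . $this->landmark_file['file_remote'];
    }

    return $this->log;
  }
}

// Default: nothing is set, so nothing runs.
$default = new ExampleDevSeeder();
print_r($default->run());

// Hypothetical customization using the remote landmark file from the snippet.
$custom = new ExampleDevSeeder([
  'sequence_analysis_name' => 'Fexcel_mini sequences (example)',
  'landmark_file' => ['file_remote' => 'https://raw.githubusercontent.com/statonlab/tripal_dev_seed/master/Fexcel_mini/sequences/empty_landmarks.fasta'],
]);
print_r($custom->run());
```

With nothing set, `run()` does nothing, which matches the "doesn't run by default" goal. Setting `landmark_file` without `sequence_analysis_name` fails early, mirroring the warning in the commented-out snippet.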

bradfordcondon (Member, Author) commented:

InterProScan seeder confirmed with older XML data. We should be good to go.

bradfordcondon mentioned this pull request Jul 31, 2018
bradfordcondon merged commit 26bd0a7 into master Jul 31, 2018
bradfordcondon deleted the dev_seeders branch August 29, 2018 19:06