Updated internal variable names to match that of datapackage #860

Closed
wants to merge 2 commits into from

Conversation

henrykironde
Contributor

Updated internal variable names to match those of the datapackage spec (#765).
The following variable names were changed:

tags -> keywords
nulls -> missingValues
name -> title
shortname -> name
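
For illustration, the renames above can be written as a single mapping. This is a minimal sketch, not code from this PR: `RENAMES` and `rename_fields` are hypothetical names, and it assumes a script's metadata is available as a plain dict. Because `name -> title` and `shortname -> name` overlap, the mapping is applied in one pass rather than key by key, so the old `name` is not overwritten before it is moved to `title`.

```python
# Hypothetical helper (not part of retriever): old attribute name -> new name.
RENAMES = {
    "shortname": "name",
    "name": "title",
    "tags": "keywords",
    "nulls": "missingValues",
}


def rename_fields(old):
    """Return a copy of `old` with legacy keys replaced by the new names.

    Every key is looked up against the ORIGINAL dict, so overlapping
    renames (name -> title, shortname -> name) cannot clobber each other.
    """
    return {RENAMES.get(key, key): value for key, value in old.items()}


legacy = {
    "shortname": "iris",
    "name": "Iris Plants Database",
    "tags": ["plants"],
    "nulls": ["NA"],
}
migrated = rename_fields(legacy)
```

Keys that are not in the mapping pass through unchanged.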

The changes were made in the following files:

retriever/lib/compile.py
retriever/lib/datapackage.py
retriever/lib/engine.py
retriever/lib/parse_script_to_json.py
retriever/lib/templates.py
retriever/lib/tools.py
scripts/bioclim.py
scripts/biomass_allometry_db.py
scripts/breed_bird_survey.py
scripts/breed_bird_survey_50stop.py
scripts/forest_inventory_analysis.py
scripts/gentry_forest_transects.py
scripts/npn.py
scripts/plant_life_hist_eu.py
scripts/prism_climate.py
scripts/vertnet.py
scripts/wood_density.py
scripts/*.json (almost all datapackages): transition missingValues -> missing_values
test/test_retriever.py
retriever/__main__.py

@henrykironde
Contributor Author

@jainamritanshu, please work with this branch (in reference to #822).

@jainamritanshu
Contributor

@henrykironde Sure!

@jainamritanshu
Contributor

@henrykironde @ethanwhite I have been thinking about how to handle the breaking change to missingValues for versions < 2.0.0. I came up with this:

# Retriever versions before 2.1.dev still expect the old keyword, nulls.
if parse_version(VERSION) < parse_version("2.1.dev"):
    engine.auto_create_table(Table("weather", pk="RouteDataId",
                                   cleanup=Cleanup(correct_invalid_value, nulls=['NULL'])),
                             filename="weather_new.csv")
else:
    # Newer versions use the renamed keyword, missingValues.
    engine.auto_create_table(Table("weather", pk="RouteDataId",
                                   cleanup=Cleanup(correct_invalid_value, missingValues=['NULL'])),
                             filename="weather_new.csv")

But I think this is somewhat redundant and not very clean. I cannot think of anything else, so I will make a PR with these changes unless you have something cleaner in mind.

@ethanwhite
Member

@jainamritanshu How about setting that value in the same chunk of code you're already using for the alternative values? So, e.g., in scripts/biomass_allometry_db.py:

        if parse_version(VERSION) < parse_version("2.1.dev"):
            self.shortname = self.name
            self.name = self.title
            self.tags = self.keywords
            self.cleanup_func = Cleanup(correct_invalid_value, nulls=['NA'])
        else:
            self.cleanup_func = Cleanup(correct_invalid_value, missingValues=['NA'])
...
        # creating data from baad_data.csv
        engine.auto_create_table(Table("data", cleanup=self.cleanup_func),
                                 filename="baad_data.csv")
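
The same idea can be factored into a small helper that picks the keyword name once and returns it as keyword arguments. This is only a sketch under stated assumptions: `version_tuple` and `cleanup_kwargs` are hypothetical names, and `version_tuple` is a simplified stdlib stand-in for `parse_version` that keeps only the leading numeric parts of a version string (so it treats "2.1.dev" as 2.1 and does not implement full version ordering).

```python
import re


def version_tuple(version):
    """Simplified stand-in for parse_version (assumption, not the real API).

    Keeps the leading numeric components only: "2.1.dev" -> (2, 1).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at the first non-numeric component, e.g. "dev"
        parts.append(int(match.group()))
    return tuple(parts)


def cleanup_kwargs(retriever_version, values, threshold="2.1.dev"):
    """Return the missing-value keyword argument for the given version.

    Versions below the threshold get the old name, nulls; the threshold
    and anything newer get the new name, missingValues.
    """
    is_old = version_tuple(retriever_version) < version_tuple(threshold)
    key = "nulls" if is_old else "missingValues"
    return {key: values}


old_style = cleanup_kwargs("2.0.0", ["NA"])
new_style = cleanup_kwargs("2.1.0", ["NA"])
```

In a script this could be unpacked as `Cleanup(correct_invalid_value, **cleanup_kwargs(VERSION, ['NA']))`, which keeps the version check out of each `auto_create_table` call.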

@jainamritanshu
Contributor

Thanks @ethanwhite for the suggestion. This looks much better than my approach. I will follow it and make the changes right away.

@jainamritanshu
Contributor

@henrykironde I cannot see why the commit is failing the checks with the following message:
IOError: [Errno socket error] [Errno 111] Connection refused
Can you please help me here?

@henrykironde
Contributor Author

@jainamritanshu I will go through this.

@henrykironde
Contributor Author

The failing tests are fixed in "Moving from Ecological Archieves to Figshare" (#861).

@henrykironde
Contributor Author

Could you provide the output of retriever ls? I think something is wrong on my machine.

@jainamritanshu
Contributor

@henrykironde This is the result I got when I ran retriever ls:

Available datasets : 63

abalone-age                                                     
amniote-life-hist                                               
antarctic-breed-bird                                            
Aquatic Animal Excretion                                        
BAAD: a Biomass And Allometry Database for woody plants         
Bioclim 2.5 Minute Climate Data                                 
bird-size                                                       
breast-cancer-wi                                                
butterfly-population-network                                    
car-eval                                                        
community-abundance-misc                                        
elton-traits                                                    
fish-parasite-hosts                                             
forest-biomass-china                                            
forest-fires-portugal                                           
forest-inventory-analysis                                       
forest-plots-michigan                                           
forest-plots-wghats                                             
fray-jorge-ecology                                              
gentry-forest-transects                                         
Global wood density database - Zanne et al. 2009                
Gulf of Maine intertidal density/cover (Petraitis et al. 2008)  
home-ranges                                                     
iris                                                            
leaf-herbivory                                                  
Mammal Super Tree                                               
mammal-community-db                                             
mammal-diet                                                     
mammal-life-hist                                                
mammal-masses                                                   
mammal-metabolic-rate                                           
mapped-plant-quads-co                                           
mapped-plant-quads-id                                           
mapped-plant-quads-ks                                           
mapped-plant-quads-mt                                           
mt-st-helens-veg                                                
nyc-tree-count                                                  
pantheria                                                       
phytoplankton-size                                              
plant-comp-ok                                                   
plant-life-hist-eu                                              
plant-occur-oosting                                             
plant-taxonomy-us                                               
poker-hands                                                     
portal                                                          
portal-dev                                                      
predator-prey-size-marine                                       
predicts                                                        
PRISM Climate Data                                              
Tree growth, mortality, physical condition - Clark, 2006        
tree-demog-wghats                                               
USA National Phenology Network                                  
USGS North American Breeding Bird Survey                        
USGS North American Breeding Bird Survey 50 stop                
veg-plots-sdl                                                   
Vertnet Amphibians                                              
Vertnet Birds                                                   
Vertnet Fishes                                                  
Vertnet Mammals                                                 
Vertnet Reptiles                                                
vertnet:                                                        
wine-composition                                                
wine-quality

@henrykironde
Contributor Author

If you debug the script list, it shows that the variables are not initialized correctly for the Python scripts.

@henrykironde
Contributor Author

Let me know if you understand the current problem; otherwise I can try to explain it differently.

@henrykironde
Contributor Author

Regarding the check parse_version(VERSION) < parse_version("2.1.dev"):
Okay, I think we should consider a different version number.

@henrykironde
Contributor Author

Maybe 2.0.0.

@jainamritanshu
Contributor

@henrykironde If I am not mistaken, you are talking about the script variable initialization in __main__.py, but I do not fully understand the problem you are referring to.

I will change the version number as soon as possible.

@henrykironde
Contributor Author

If you check out master, then run retriever reset all and retriever ls, you get:

(env35)]> retriever ls
Available datasets : 62

abalone-age                   mammal-metabolic-rate
amniote-life-hist             mammal-super-tree
antarctic-breed-bird          mapped-plant-quads-co
aquatic-animal-excretion      mapped-plant-quads-id
bioclim                       mapped-plant-quads-ks
biomass-allometry-db          mapped-plant-quads-mt
bird-size                     mt-st-helens-veg
breast-cancer-wi              NPN
breed-bird-survey             nyc-tree-count
breed-bird-survey-50stop      pantheria
butterfly-population-network  phytoplankton-size
car-eval                      plant-comp-ok
community-abundance-misc      plant-life-hist-eu
elton-traits                  plant-occur-oosting
fish-parasite-hosts           plant-taxonomy-us
forest-biomass-china          poker-hands
forest-fires-portugal         portal
forest-inventory-analysis     portal-dev
forest-plots-michigan         predator-prey-size-marine
forest-plots-wghats           prism-climate
fray-jorge-ecology            tree-demog-wghats
gentry-forest-transects       veg-plots-sdl
home-ranges                   vertnet
intertidal-abund-me           vertnet-amphibians
iris                          vertnet-birds
la-selva-trees                vertnet-fishes
leaf-herbivory                vertnet-mammals
mammal-community-db           vertnet-reptiles
mammal-diet                   wine-composition
mammal-life-hist              wine-quality
mammal-masses                 wood-density

This only displays the scripts' shortnames.
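
One way to spot the difference between the two listings is to flag entries that do not look like shortnames. The sketch below is hypothetical and not part of retriever: it assumes a shortname consists of letters and digits separated by single hyphens, which matches the master output above but is not a documented rule.

```python
import re

# Assumed shortname shape: alphanumeric chunks joined by single hyphens.
SHORTNAME = re.compile(r"[A-Za-z0-9]+(-[A-Za-z0-9]+)*")


def suspicious_names(names):
    """Return the entries that do not look like a script shortname."""
    return [name for name in names if not SHORTNAME.fullmatch(name.strip())]


listing = [
    "abalone-age",
    "Aquatic Animal Excretion",
    "NPN",
    "breed-bird-survey-50stop",
    "vertnet:",
]
flagged = suspicious_names(listing)
```

Run against the earlier listing from this branch, a check like this would flag the entries that show a full title (or a stray colon) where a shortname is expected.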

@jainamritanshu
Contributor

@henrykironde when I ran retriever ls -l, I got

1. 
Name: Aquatic Animal Excretion

2. 
Name: Gulf of Maine intertidal density/cover (Petraitis et al. 2008)

3. 
Name: Mammal Super Tree

4. 
Name: Tree growth, mortality, physical condition - Clark, 2006
Keywords: ['plants', 'time-series']

5. A database on the life history traits of the Northwest European flora
Name: plant-life-hist-eu
Keywords: ['plants', 'observational']

6. Abalone Age and Size Data
Name: abalone-age

7. Alwyn H. Gentry Forest Transect Dataset
Name: gentry-forest-transects
Keywords: ['plants', 'global-scale', 'observational']

8. Amniote life History database
Name: amniote-life-hist
Keywords: [u'mammals', u'literature-compilation']

9. Antarctic Site Inventory breeding bird survey data, 1994-2013
Name: antarctic-breed-bird
Keywords: [u'birds']

10. BAAD: a Biomass And Allometry Database for woody plants
Name: BAAD: a Biomass And Allometry Database for woody plants
Keywords: ['plants', 'observational']

11. Bioclim 2.5 Minute Climate Data
Name: Bioclim 2.5 Minute Climate Data
Keywords: ['climate']

12. Biomass and Its Allocation in Chinese Forest Ecosystems (Luo, et al., 2014)
Name: forest-biomass-china
Keywords: [u'biomass', u'China', u'climate', u'forest']

13. Biovolumes for freshwater phytoplankton - Colin et al. 2014
Name: phytoplankton-size
Keywords: [u'phytoplankton', u'literature-compilation', u'size']

14. Bird Body Size and Life History (Lislevand et al. 2007)
Name: bird-size
Keywords: [u'birds', u'literature-compilation']

15. Car Evaluation
Name: car-eval
Keywords: [u'categorical', u'multivariate']

16. Database of Vertebrate Home Range Sizes - Tamburello et al., 2015
Name: home-ranges
Keywords: [u'literature-compilation', u'birds', u'mammals', u'reptiles', u'fishes']

17. Fish parasite host ecological characteristics (Strona, et al., 2013)
Name: fish-parasite-hosts

18. Foraging attributes for birds and mammals (Wilman, et al., 2014)
Name: elton-traits
Keywords: [u'mammals', u'birds', u'literature-compilation']

19. Forest fire data for Montesinho natural park in Portugal
Name: forest-fires-portugal

20. Forest Inventory and Analysis
Name: forest-inventory-analysis
Keywords: ['plants', 'continental-scale', 'observational']

21. Fray Jorge community ecology database (Kelt et al. 2013)
Name: fray-jorge-ecology
Keywords: [u'plants', u'local-scale', u'time-series', u'mammals']

22. Global wood density database - Zanne et al. 2009
Name: Global wood density database - Zanne et al. 2009
Keywords: ['Taxon > Plants', 'Spatial Scale > Global', 'Data Type > Observational']

23. Indian Forest Stand Structure and Composition (Ramesh et al. 2010)
Name: forest-plots-wghats
Keywords: [u'plants', u'regional-scale', u'observational']

24. Iris Plants Database
Name: iris
Keywords: [u'plants', u'literature-compilation', u'categorical']

25. Mammal Community DataBase (Thibault et al. 2011)
Name: mammal-community-db
Keywords: [u'mammals', u'global-scale', u'observational', u'literature-compilation']

26. Mammal Life History Database - Ernest, et al., 2003
Name: mammal-life-hist
Keywords: [u'mammals', u'literature-compilation', u'life-history']

27. MammalDIET
Name: mammal-diet
Keywords: [u'mammals', u'literature-compilation']

28. Mapped plant quadrat time-series from Kansas (Adler et al. 2007)
Name: mapped-plant-quads-ks
Keywords: [u'plants', u'local-scale', u'time-series', u'observational']

29. Mapped plant quadrat time-series from Montana (Anderson et al. 2011)
Name: mapped-plant-quads-mt

30. Marine Predator and Prey Body Sizes - Barnes et al. 2008
Name: predator-prey-size-marine
Keywords: [u'fish', u'literature-compilation', u'size']

31. Masses of Mammals (Smith et al. 2003)
Name: mammal-masses
Keywords: [u'mammals', u'literature-compilation', u'size']

32. Michigan forest canopy dynamics plots - Woods et al. 2009
Name: forest-plots-michigan
Keywords: [u'plants', u'local-scale', u'time-series', u'observational']

33. Miscellaneous Abundance Database (figshare 2012)
Name: community-abundance-misc

34. Mount St. Helens vegetation recovery plots (del Moral 2010)
Name: mt-st-helens-veg
Keywords: [u'plants', u'local-scale', u'time-serie', u'observational']

35. New York City TreesCount
Name: nyc-tree-count
Keywords: [u'trees', u'new-york-city', u'biology', u'observational']

36. Oosting Natural Area (North Carolina) plant occurrence (Palmer et al. 2007)
Name: plant-occur-oosting
Keywords: [u'plants', u'local-scale', u'time-series', u'observational']

37. Pantheria (Jones et al. 2009)
Name: pantheria
Keywords: [u'mammals', u'literature-compilation', u'life-history']

38. Percentage leaf herbivory across vascular plant species
Name: leaf-herbivory
Keywords: [u'plants', u'literature-compilation']

39. Phylogeny and metabolic rates in mammals (Ecological Archives 2010)
Name: mammal-metabolic-rate
Keywords: [u'mammals', u'literature-compilation', u'physiology']

40. Poker Hand dataset
Name: poker-hands
Keywords: [u'games', u'poker']

41. Portal Project Data (Ernest et al. 2009)
Name: portal
Keywords: [u'mammals', u'desert', u'time-series', u'experimental', u'observational']

42. PRISM Climate Data
Name: PRISM Climate Data

43. Sagebrush steppe mapped plant quadrats (Zachmann et al. 2010)
Name: mapped-plant-quads-id
Keywords: [u'plants', u'local-scale', u'time-series', u'observational']

44. Shortgrass steppe mapped plants quads - Chu et al. 2013
Name: mapped-plant-quads-co
Keywords: [u'plants', u'local-scale', u'time-series', u'observational']

45. Sonoran Desert Lab perennials vegetation plots
Name: veg-plots-sdl
Keywords: [u'plants']

46. Spatial Population Data Alpine Butterfly - Matter et al 2014
Name: butterfly-population-network
Keywords: [u'butterflies', u'observational']

47. Tree demography in Western Ghats, India - Pelissier et al. 2011
Name: tree-demog-wghats
Keywords: [u'plants', u'time-series', u'observational']

48. USA National Phenology Network
Name: USA National Phenology Network
Keywords: ['Data Type > Phenology', 'Spatial Scale > Continental']

49. USDA plant list - taxonomy for US plant species
Name: plant-taxonomy-us
Keywords: [u'plants', u'taxonomy']

50. USGS North American Breeding Bird Survey
Name: USGS North American Breeding Bird Survey
Keywords: ['birds', 'continental-scale']

51. USGS North American Breeding Bird Survey 50 stop
Name: USGS North American Breeding Bird Survey 50 stop
Keywords: ['birds', 'continental-scale']

52. Vascular plant composition - McGlinn, et al., 2010
Name: plant-comp-ok
Keywords: [u'plants', u'local-scale', u'time-series', u'observational']

53. Vertnet Amphibians
Name: Vertnet Amphibians
Keywords: ['amphibians']

54. Vertnet Birds
Name: Vertnet Birds
Keywords: ['birds']

55. Vertnet Fishes
Name: Vertnet Fishes
Keywords: ['fishes']

56. Vertnet Mammals
Name: Vertnet Mammals
Keywords: ['mammals']

57. Vertnet Reptiles
Name: Vertnet Reptiles
Keywords: ['reptiles']

58. vertnet:
Name: vertnet:
Keywords: ['Taxon > animals']

59. Wine Composition
Name: wine-composition
Keywords: [u'wine', u'alcohol']

60. Wine Quality
Name: wine-quality
Keywords: [u'wine', u'alcohol']

61. Wisconsin Breast Cancer Database
Name: breast-cancer-wi
Keywords: [u'cancer', u'health', u'disease', u'medicine']

As far as I understand the CLI arguments, retriever ls just displays the names of the scripts, and the additional -l argument displays the name, title, and keywords as well. Sorry if I still have not understood the problem; maybe I am missing something.
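
The behaviour described here can be sketched as follows. This is an illustration only, not retriever's implementation: `format_listing` is a hypothetical function, the scripts are assumed to be plain dicts with `name`, `title`, and `keywords` keys, and the long layout simply mirrors the output pasted above.

```python
def format_listing(scripts, long=False):
    """Format a script list the way `ls` and `ls -l` are described above.

    scripts: list of dicts with 'name', 'title' and 'keywords' keys
    (a hypothetical shape chosen for this sketch).
    """
    if not long:
        # Short form: one shortname per line, sorted case-insensitively.
        return "\n".join(sorted((s["name"] for s in scripts), key=str.lower))

    entries = []
    ordered = sorted(scripts, key=lambda s: s["title"].lower())
    for index, script in enumerate(ordered, start=1):
        lines = ["{}. {}".format(index, script["title"]),
                 "Name: {}".format(script["name"])]
        if script.get("keywords"):
            lines.append("Keywords: {}".format(script["keywords"]))
        entries.append("\n".join(lines))
    return "\n\n".join(entries)


scripts = [
    {"name": "iris", "title": "Iris Plants Database", "keywords": ["plants"]},
    {"name": "abalone-age", "title": "Abalone Age and Size Data", "keywords": []},
]
short_output = format_listing(scripts)
long_output = format_listing(scripts, long=True)
```

With correctly initialized scripts, the short form shows only shortnames, and the long form shows the title on the numbered line with the shortname under "Name:".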

@henrykironde
Contributor Author

henrykironde commented Apr 12, 2017

@jainamritanshu waiting for updates on the version number

@jainamritanshu
Contributor

@henrykironde I am getting the error "No such file or directory" while committing. Can you help me here?

@henrykironde
Contributor Author

@jainamritanshu run rm .git/hooks/pre-commit

@henrykironde
Contributor Author

@henrykironde this commit had been giving the previous errors about urlretrieve, so I fixed some scripts. It is still giving a couple of errors. Can you have a look at it?
Hi @jainamritanshu, I have just merged your branch. I know it is failing, but that is because our branch was not up to date with master.
So I suggest that you update the branch or, better, create a fresh PR with these changes on a branch that is up to date with master. The changes as a whole are fine.

@henrykironde
Contributor Author

Let me know what you want to do; I can help you.

@henrykironde
Contributor Author

@jainamritanshu I have tried to update this locally and there are many changes, so I advise you to create a new branch and transfer the changes to it. The main problem comes from the recent changes in the scripts.

You can make two commits: one for the general changes and the other for the changes in the scripts.

@jainamritanshu
Contributor

Sure @henrykironde. Working on it.

ethanwhite pushed a commit that referenced this pull request May 4, 2017
* Updating Internal Variables in scripts

* Updating retriever general environment to handle the updated internal variables as per the datapackages/scripts in /retriever/lib/

* Cleaning up Updated internal variable names PR

* fixing inconsistency of Cleanup function and missingValues in python scripts
@ethanwhite
Member

This was replaced by #897 which has now been merged.

@ethanwhite ethanwhite closed this May 16, 2017
@henrykironde henrykironde deleted the new822 branch September 5, 2017 01:09