Upstream issue: Individual IDs are not unique for each eventID #6

Open
bw4sz opened this issue Apr 20, 2020 · 3 comments

bw4sz commented Apr 20, 2020

We should either warn users or, preferably, provide a merging solution so that duplicates do not trickle into downstream analysis. I do not know why the following check fails, but I believe the cause is upstream of this package:

  library(dplyr)
  BART_data <- retrieve_VST_data(site = "BART")

  # Verify the same individual ID in the same year doesn't have more than one height
  multiple_heights <- BART_data[[3]] %>%
    group_by(individualID, eventID) %>%
    summarize(n = length(unique(height))) %>%
    filter(n > 1)

> expect_equal(nrow(multiple_heights), 0)
Error: nrow(multiple_heights) not equal to 0.
1/1 mismatches
[1] 104 - 0 == 104

After debugging, this turns out to come from neonUtilities:

  vst <- neonUtilities::loadByProduct("DP1.10098.001", check.size = FALSE,
                                      site = site, startdate = start, enddate = enddate)

> multiple_heights<-vst[[3]] %>% group_by(individualID,eventID)  %>% summarize(n=length(unique(height))) %>% filter(n>1)
> 
> head(multiple_heights)
# A tibble: 6 x 3
# Groups:   individualID [6]
  individualID             eventID           n
  <fct>                    <fct>         <int>
1 NEON.PLA.D01.BART.00094  vst_BART_2016     2
2 NEON.PLA.D01.BART.00105  vst_BART_2018     2
3 NEON.PLA.D01.BART.00111  vst_BART_2015     2
4 NEON.PLA.D01.BART.00210  vst_BART_2015     2
5 NEON.PLA.D01.BART.00226A vst_BART_2016     2
6 NEON.PLA.D01.BART.00306  vst_BART_2015     2
> dim(multiple_heights)
[1] 104   3
> vst[[3]] %>% filter(individualID=="NEON.PLA.D01.BART.00105")
                                   uid         namedLocation       date       eventID domainID siteID   plotID subplotID
1 1d8ae27f-ea73-4c70-bc30-023552748106 BART_047.basePlot.vst 2015-09-03 vst_BART_2015      D01   BART BART_047        NA
2 71f8056f-3aea-4037-a74e-9a8ccf92c56b BART_047.basePlot.vst 2016-08-31 vst_BART_2016      D01   BART BART_047        NA
3 a74a9509-1e3e-46a7-866f-a80e2608bd3c BART_047.basePlot.vst 2017-09-12 vst_BART_2017      D01   BART BART_047        NA
4 aadaeaf0-be3e-4ab3-a3a3-ffe68d304fae BART_047.basePlot.vst 2018-08-20 vst_BART_2018      D01   BART BART_047        NA
5 0964ee3a-e558-4cb1-8689-303773d58514 BART_047.basePlot.vst 2018-08-20 vst_BART_2018      D01   BART BART_047        NA
             individualID tempShrubStemID tagStatus       growthForm plantStatus stemDiameter measurementHeight height
1 NEON.PLA.D01.BART.00105              NA      <NA> single bole tree        Live         27.1               130   16.8
2 NEON.PLA.D01.BART.00105              NA        ok single bole tree        Live         27.0               130   16.5
3 NEON.PLA.D01.BART.00105              NA        ok single bole tree        Live         26.8               130   16.4
4 NEON.PLA.D01.BART.00105              NA        ok single bole tree        Live         27.0               130   15.7
5 NEON.PLA.D01.BART.00105              NA        ok single bole tree        Live         28.1               130   16.6
  baseCrownHeight breakHeight breakDiameter maxCrownDiameter ninetyCrownDiameter canopyPosition shape basalStemDiameter
1              NA          NA            NA               NA                  NA           <NA>                      NA
2              NA          NA            NA               NA                  NA           <NA>                      NA
3              NA          NA            NA               NA                  NA           <NA>                      NA
4              NA          NA            NA               NA                  NA           <NA>                      NA
5              NA          NA            NA               NA                  NA           <NA>                      NA
  basalStemDiameterMsrmntHeight maxBaseCrownDiameter ninetyBaseCrownDiameter remarks                  recordedBy
1                            NA                   NA                      NA               ccahill@field-ops.org
2                            NA                   NA                      NA                  mday@field-ops.org
3                            NA                   NA                      NA              jbreault@field-ops.org
4                            NA                   NA                      NA         jlerner@battelleecology.org
5                            NA                   NA                      NA         jlerner@battelleecology.org
                  measuredBy     dataQF
1      dcrandall@neoninc.org legacyData
2    ramundson@field-ops.org legacyData
3 llukas@battelleecology.org legacyData
4 llukas@battelleecology.org       <NA>
5 llukas@battelleecology.org       <NA>

The 2018 event (vst_BART_2018) has two records for this individual, both dated 2018-08-20, with different heights (15.7 and 16.6).
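A minimal sketch of the "warn users" option, assuming the table has the `individualID`, `eventID`, and `height` columns shown in the output above. The helper name `check_duplicate_heights` is hypothetical and is not part of neonUtilities or this package. It uses `.groups = "drop"`, which assumes dplyr 1.0.0 or later. The toy data copies the 2017 and 2018 heights from the printed records.

```r
library(dplyr)

# Hypothetical helper: returns individualID/eventID pairs that have more
# than one distinct height, and warns if any are found.
check_duplicate_heights <- function(df) {
  dupes <- df %>%
    group_by(individualID, eventID) %>%
    summarize(n = n_distinct(height), .groups = "drop") %>%
    filter(n > 1)
  if (nrow(dupes) > 0) {
    warning(nrow(dupes), " individualID/eventID pairs have more than one height")
  }
  dupes
}

# Toy data: one 2017 record and the two conflicting 2018 records
toy <- data.frame(
  individualID = rep("NEON.PLA.D01.BART.00105", 3),
  eventID = c("vst_BART_2017", "vst_BART_2018", "vst_BART_2018"),
  height = c(16.4, 15.7, 16.6)
)

# suppressWarnings() only keeps the example quiet; drop it to see the warning
dupes <- suppressWarnings(check_duplicate_heights(toy))
```

This only reports the conflict. Which of the conflicting records to keep is a separate decision that the thread does not settle.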

Author

bw4sz commented Apr 20, 2020

I wrote to NEON about this.

Owner

vscholl commented Apr 23, 2020

Good catch, and thanks for contacting NEON about this. Looks like there are indeed duplicates.

One thought I had was that multi-bole trees have multiple entries in the woody vegetation structure data set, which could potentially cause an issue like this. However, their individual IDs have a letter on the end to indicate this: NEON.PLA.D01.BART.00226A, NEON.PLA.D01.BART.00226B, etc. Just something to keep in mind, since individual bole entries don't have all of the measurements that a single-bole tree has.

Author

bw4sz commented Apr 24, 2020

I'll cc you. They are well aware, and the problems are pretty extensive. In the full dataset I see ~2000 duplicate locations (take the most recent), 9000 duplicate IDs with different heights from the same eventID, and hundreds of between-year height changes of more than 6 m. I'm working on a vignette here that I'll clean up:

https://github.com/bw4sz/neonVegWrangleR/blob/master/vignettes/Field_Data.Rmd
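Two of the checks mentioned above could be sketched roughly as follows. This is not the vignette's code. The helper names `keep_most_recent` and `flag_height_jumps` are hypothetical, the toy IDs "A" and "B" are made up, and the sketch assumes each table has `individualID`, `date`, and (for the height check) `height` columns.

```r
library(dplyr)

# Hypothetical helper: keep only the most recent record per individualID
# (the "take most recent" rule for duplicate locations).
keep_most_recent <- function(df) {
  df %>%
    arrange(individualID, desc(date)) %>%
    distinct(individualID, .keep_all = TRUE)
}

# Hypothetical helper: flag records whose height differs from the same
# individual's previous record by more than `threshold` meters.
flag_height_jumps <- function(df, threshold = 6) {
  df %>%
    arrange(individualID, date) %>%
    group_by(individualID) %>%
    mutate(height_change = height - lag(height)) %>%
    ungroup() %>%
    filter(abs(height_change) > threshold)
}

# Toy data: individual "A" drops 7.5 m between 2016 and 2017
toy <- data.frame(
  individualID = c("A", "A", "A", "B", "B"),
  date = as.Date(c("2015-09-03", "2016-08-31", "2017-09-12",
                   "2016-08-31", "2017-09-12")),
  height = c(16.8, 16.5, 9.0, 10.0, 10.5)
)

recent <- keep_most_recent(toy)
jumps <- flag_height_jumps(toy)
```

`flag_height_jumps` compares consecutive records, so it only measures between-year change if duplicates within the same eventID have been resolved first.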
