How to tackle uncertainty due to range of year_of_introduction variable #18

damianooldoni · 2018-03-27T14:41:54Z

No description provided.

timadriaens · 2018-04-12T08:35:18Z

Several options are envisaged to tackle the uncertainty associated with time periods, or the uncertainty around the date of first introduction of a species:

- imputing missing data, cf. the methods described in Onkelinx et al.;
- randomly selecting a year within the range to avoid arbitrary peaks, cf. Seebens et al.;
- fitting functions to the time series to test for different shapes of the temporal trends in first record rates, and reducing the deviation between predicted and observed series with an optimization algorithm, cf. Seebens et al.
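The second option can be illustrated with a minimal base-R sketch. It assumes a hypothetical data frame with columns `year_min` and `year_max` bounding each species' year of first introduction (names and values are illustrative), and it is not the procedure of Seebens et al., only the idea of drawing a random year within each range and averaging over many draws.

```r
# Hypothetical example data (names and values are assumptions): the earliest
# and latest possible year of first introduction for three species.
species <- data.frame(
  species  = c("sp_a", "sp_b", "sp_c"),
  year_min = c(1950L, 1980L, 2001L),
  year_max = c(1975L, 1980L, 2010L)
)

# Draw one year uniformly at random from year_min:year_max for each species.
# runif() returns values strictly between 0 and 1, so the offset lies in
# 0:(width - 1) and the drawn year never leaves the range.
draw_year <- function(year_min, year_max) {
  width <- year_max - year_min + 1L
  as.integer(year_min + floor(runif(length(year_min)) * width))
}

years <- 1950:2010
n_draws <- 1000

set.seed(42)
# Each column holds one random realisation of the number of first records
# per year; tabulate() counts how often each year index occurs.
draws <- replicate(n_draws, {
  drawn <- draw_year(species$year_min, species$year_max)
  tabulate(drawn - min(years) + 1L, nbins = length(years))
})

# Averaging over realisations spreads each species over its range instead of
# piling it up on one arbitrary year.
mean_per_year <- rowMeans(draws)
```

A species with a single known year (here `sp_b`) always lands on that year, while species with wide ranges contribute a small fraction to every year in their range.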

Here, we chose to simply show the uncertainty around time periods using the minimum and maximum year of each time interval.
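Under the original reading of the two columns as bounds on the introduction year, the minimum and maximum year translate into a lower and an upper bound on the cumulative number of species. A minimal base-R sketch, assuming hypothetical columns `year_min` and `year_max`:

```r
# Hypothetical data: each species' first introduction is only known to lie
# between year_min and year_max (column names are assumptions).
species <- data.frame(
  year_min = c(1950L, 1980L, 2001L),
  year_max = c(1975L, 1980L, 2010L)
)
years <- 1940:2020

# Upper bound: species that may already have been introduced by year y.
upper <- vapply(years, function(y) sum(species$year_min <= y), integer(1))
# Lower bound: species that were certainly introduced by year y.
lower <- vapply(years, function(y) sum(species$year_max <= y), integer(1))

bounds <- data.frame(year = years, lower = lower, upper = upper)
```

The gap between `lower` and `upper` could then be drawn as a shaded band, for example with `ggplot2::geom_ribbon()`.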

peterdesmet · 2018-04-12T09:06:18Z

> Here, we chose to simply show uncertainty around time periods using the minimum and maximum year with time intervals.

How?

damianooldoni · 2018-04-12T09:47:38Z

The point is that the columns year_of_introduction_min and year_of_introduction_max in the data output, as explained in issue #17 (which is also related to issues #19 and #20), have a different meaning from what we first thought.
What we called year_of_introduction_min is actually year_of_introduction, while year_of_introduction_max is a kind of year_of_extinction (@peterdesmet: maybe you can find a better name for this column?).
It is therefore difficult to apply the methods listed above to calculate uncertainty.

peterdesmet · 2018-04-12T10:18:37Z

Indeed, the dates are: first observed / last observed. Quite often those are the same (i.e. we only have a single date).

Plotting first observed

I played a bit with the Alien plant data only, and this is what we would get if we just plot first observed:

The line is cumulative and will never drop. As you can see, there are some arbitrary peaks in the data, which I don't consider a problem.
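A minimal base-R sketch of this cumulative line, assuming a hypothetical vector of first-observed years (one element per species) and an assumed end year of 2018:

```r
# Hypothetical first-observed years, one per species.
first_observed <- c(1950L, 1980L, 1980L, 2001L)
years <- seq(min(first_observed), 2018L)

# Number of species first observed in each year, then the running total.
new_per_year <- tabulate(first_observed - min(years) + 1L, nbins = length(years))
cumulative <- cumsum(new_per_year)
```

Because `cumsum()` only adds non-negative counts, the line can never drop; two species sharing 1980 simply produce a larger step in that year.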

Plotting first/last observed

I think it would be worth taking an approach where, for each year, you check for each species whether its date range includes that year:

It has the advantage that it makes use of first observed and last observed, so those species that are only recorded for a short time also drop from the timeline. It also shows a steep drop at the end, because of the last assessment year in the checklist, which does reflect the actual data we have.

Note: I couldn't find a smart algorithm to generate this chart: it was created in a spreadsheet.
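The per-year check can be done without a spreadsheet by comparing every species' range with every year at once. A minimal base-R sketch, assuming hypothetical vectors `first` and `last` of first and last observed years (one element per species):

```r
# Hypothetical first/last observed years, one element per species.
first <- c(1950L, 1980L, 2001L, 2015L)
last  <- c(1975L, 2018L, 2001L, 2018L)
years <- seq(min(first), max(last))

# Logical matrix with one row per species and one column per year: TRUE when
# the year falls within that species' first-last range (inclusive).
in_range <- outer(first, years, "<=") & outer(last, years, ">=")

# Number of species recorded in each year.
n_recorded <- colSums(in_range)
```

The matrix has one cell per species-year pair, which stays small for a checklist of a few thousand species over a few centuries.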

damianooldoni · 2018-04-12T13:09:48Z

Using first_observed, we get an answer to the cumulative number of species; see issue #20.

peterdesmet · 2018-04-12T13:13:08Z

It would be cool if we could switch between the two charts, but it might be best to shelve that for later.

peterdesmet · 2018-04-12T13:58:07Z

If we want a line chart as proposed in #20, then the error margin could be the line in plot 2. I'm not sure how to explain this without a visual example, but: the trend line = the cumulative number of species by first year of introduction, and the error line = that cumulative number minus the cumulative number of species whose last observation lies before that year. The error area will always be below the trend line and could be shown as a shaded area. It expresses the lack of recent assessments.
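The relation between the two lines can be checked with a small base-R sketch, assuming hypothetical vectors `first` and `last` of first and last observed years. Subtracting the running count of species last observed before each year from the trend line gives exactly the per-year count of species whose range includes that year (the line in plot 2):

```r
# Hypothetical first/last observed years, one element per species.
first <- c(1950L, 1980L, 2001L, 2015L)
last  <- c(1975L, 2018L, 2001L, 2018L)
years <- seq(min(first), max(last))
year_index <- function(y) y - min(years) + 1L

# Trend line: cumulative number of species first observed up to each year.
trend <- cumsum(tabulate(year_index(first), nbins = length(years)))

# Cumulative number of species whose last observation lies before each year.
# last + 1 can fall one past the final year; tabulate() ignores such values.
dropped <- cumsum(tabulate(year_index(last + 1L), nbins = length(years)))

# Error line: species whose first-last range includes the year.
error_line <- trend - dropped

# Shaded error area between the two lines (e.g. for ggplot2::geom_ribbon()).
band <- data.frame(year = years, ymin = error_line, ymax = trend)
```

Shifting `last` by one year keeps a species counted in its last observed year, so the error line never exceeds the trend line.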

peterdesmet · 2018-04-16T18:10:43Z

@stijnvanhoey @SanderDevisscher can you make a version of #20 (comment) as described in the comment above? The line you currently have would be the error line.

stijnvanhoey · 2018-04-17T14:53:02Z

@peterdesmet as a first test:

The data used to create the plot, where n_defined is the data created for #25 (the number of species whose range includes that year), n_introduced is the number of species introduced in that specific year (cf. #17), and cum_n_introduced is the cumulative number of species introduced:

```
# A tibble: 6 x 4
   year n_defined n_introduced cum_n_introduced
  <int>     <int>        <dbl>            <dbl>
1  1201         1            1                1
2  1202         1            0                1
3  1203         1            0                1
4  1204         1            0                1
5  1205         1            0                1
6  1206         1            0                1
...
   2016      1263           42             2537
   2017      1096           40             2577
   2018       741           17             2594
```

Plotting these data gives the following result:

stijnvanhoey · 2018-04-17T15:14:46Z

Should we include this in the cumulative indicator? Maybe someone can suggest more appropriate names for both lines?

For reference, the figure above was created with this code:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
library(INBOtheme)  # provides theme_inbo(); installed from GitHub (inbo/INBOtheme)

# df_cleaned: one row per species, with integer columns startDate and endDate
# (first and last year the species was observed).

# Number of species first introduced in each year.
introduction_count <- df_cleaned %>%
  count(year = startDate, name = "n_introduced")

# Expand each species to one row per year in its startDate:endDate range.
df_extended <- df_cleaned %>%
  mutate(year = map2(startDate, endDate, seq)) %>%
  unnest(year)

# Number of species whose date range includes each year.
totals <- df_extended %>%
  count(year, name = "n_defined")

start_year_plot <- 1900
x_scale_stepsize <- 10  # assumed value; not defined in the original snippet

plot_info <- totals %>%
  left_join(introduction_count, by = "year") %>%
  replace_na(list(n_introduced = 0)) %>%
  arrange(year) %>%
  mutate(cum_n_introduced = cumsum(n_introduced))

maxDate <- max(df_extended$year)
plot <- ggplot(plot_info, aes(x = year)) +
  # Draw the shaded area first so both lines stay visible on top of it.
  geom_ribbon(aes(ymin = n_defined, ymax = cum_n_introduced),
              fill = "grey", alpha = 0.5) +
  geom_line(aes(y = n_defined, color = "described alien species")) +
  geom_line(aes(y = cum_n_introduced,
                color = "cumulative number of introductions")) +
  xlab("Year") +
  ylab("Number of alien species") +
  scale_x_continuous(breaks = seq(start_year_plot, maxDate, x_scale_stepsize),
                     limits = c(start_year_plot, maxDate)) +
  theme_inbo()
plot
```

peterdesmet · 2018-04-17T15:25:19Z

n_introduced and cum_n_introduced look fine as names. For n_defined, I would use n_recorded ("Recorded alien species" for that year).

timadriaens · 2018-05-30T14:13:28Z

After some discussion with Wolfgang Rabitsch and @damianooldoni, we conclude that it does not make sense to put a cumulative and a non-cumulative graph on the same plot.

damianooldoni · 2018-06-01T12:30:52Z

Thanks @timadriaens ! I think we can close this issue.

This was referenced Mar 27, 2018:

- Indicator: number of new introductions of alien species per year in Belgium #17 (closed)
- Indicator: cumulative number of alien species #20 (closed)

stijnvanhoey mentioned this issue May 16, 2018:

- Cumulative implementation #25 (merged)

damianooldoni closed this as completed Jun 1, 2018
