
provide option to plot actual (realized) distribution on map #21

Closed
timadriaens opened this issue Jun 1, 2020 · 32 comments
@timadriaens
Member

Hi, I think it would be useful to give users the option to plot the actual distribution (from the cube or from GBIF directly), so they can explore to what extent the potential area in Belgium has already been invaded. We need to think about proper date cut-offs (e.g. from 2000) and the type of data to use (human observations or other).

@niconoe niconoe self-assigned this Jun 2, 2020
@niconoe
Collaborator

niconoe commented Jun 2, 2020

@timadriaens: yep, I can see the usefulness of this!

I can see two approaches:

  • Ask GBIF directly and dynamically to generate the tiles for us via their maps API, and add the ability to display such overlays on our maps.
  • Write some scripts to transform the data cube (and/or additional data) into a "current distribution" GeoTiff that can be shown via the existing machinery.

I lean towards the former approach since it's simpler to implement and seems (at first look) to do all we need. The drawback is that it's a bit less flexible: since GBIF does all the data preparation work, we are stuck with their restrictions (only GBIF data, and not every cut-off/filtering option we want may be available).

I'd be tempted to try this solution soon and see how it works. To make sure we are working as efficiently as possible:

  • @timadriaens and @SoVDH: can you already think about the cut-offs and the type of data to use (so I can check they're indeed available via the GBIF APIs)? Are you comfortable with the fact that all this data will come from GBIF (no other sources)?
  • @peterdesmet: I'm interested in your opinion on all of this. Do you agree with the suggested approach(es)?

@peterdesmet
Member

I would definitely try the GBIF maps API first. The fact that it is GBIF data only is not a problem; the fact that it isn't processed into a gridded cube might be (though it can show more detail). I therefore suggest creating a quick test and having @timadriaens and @SoVDH see if that meets the demand.

@timadriaens
Member Author

Indeed, I too prefer to plot the real xy data from GBIF rather than the "squarified" data; this will also make for a clearer distinction between the gridded risk maps and the real observations. I am in favour of using dots.

@niconoe Selection criteria: basisOfRecord = human observation mostly (excluding records with geospatial issues), but you can probably simply use the code for the indicators discussed in this issue, which uses everything but fossil specimens:

    occ_clean <- occ %>%
      filter(basisOfRecord != "FOSSIL_SPECIMEN") %>%
      filter(hasCoordinate == "TRUE") %>%
      filter(hasGeospatialIssues == "FALSE") %>%
      filter(is.na(coordinateUncertaintyInMeters) | coordinateUncertaintyInMeters < 708) %>%
      # select desirable variables
      select(taxonKey, species, scientificName, decimalLatitude, decimalLongitude,
             eventDate, year, coordinateUncertaintyInMeters, datasetKey,
             countryCode, establishmentMeans) %>%
      # drop coordinates with one decimal place or fewer (too imprecise)
      filter(!grepl("^[0-9]+(\\.[0-9]{0,1})?$", decimalLatitude)) %>%
      filter(!grepl("^[0-9]+(\\.[0-9]{0,1})?$", decimalLongitude))

I would however perhaps exclude data with too large a coordinate uncertainty. If we eventually plot maps in Harmonia, it's probably best to make sure these use the same criteria to select occurrences from GBIF, so there are no discrepancies between TrIAS products anywhere.

@niconoe
Collaborator

niconoe commented Jun 4, 2020

I've now implemented a first version of this feature, visible here as usual.

I've taken the simplest approach discussed above (make GBIF render the maps for us!). I therefore had to live with some limitations, both in terms of data selection and visual rendering, and to diverge slightly from what was discussed before:

In terms of selection criteria:

(@timadriaens: unfortunately we can't just run random R code in the web API)

  • Unfortunately, I can't filter on coordinateUncertaintyInMeters.
  • I can't filter on hasGeospatialIssues (but I assume those records are excluded from the maps API anyway - we can check with GBIF).
  • I asked for the following values for basisOfRecord: OBSERVATION, HUMAN_OBSERVATION, MACHINE_OBSERVATION, MATERIAL_SAMPLE, PRESERVED_SPECIMEN, LIVING_SPECIMEN, LITERATURE. I assume this is equivalent to excluding FOSSIL_SPECIMEN and UNKNOWN (should I include the latter?).

In terms of display:

  • Showing a point per occurrence isn't really visible against the map background, because it's only one pixel per occurrence.
  • I therefore had to show a density map (occurrences aggregated into squares or hexagons - I chose hexagons for now).
  • Showing those hexagons and the squared model simultaneously is completely unreadable, so I made it possible to show one or the other.
  • I don't think GBIF can give us a color legend ("X occurrences gives color Y"), contrary to what we do with Amy's models.

I'm interested in all feedback, but my main question to @timadriaens and @SoVDH is: can we live with the limitations listed above? (Other changes such as colors, application interface, display logic, ... can still take place.) If yes, the work on this issue is almost done 🥳. Otherwise, just tell me and I'll look at a heavier but more flexible approach!
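For readers following along, the tile requests described above can be sketched as follows. This is a minimal illustration assuming the public GBIF v2 maps API (density layer, repeatable basisOfRecord parameter, hexagon binning); the taxon key and tile coordinates are placeholder values, not ones used by the project.

```python
from urllib.parse import urlencode

BASE = "https://api.gbif.org/v2/map/occurrence/density"

def density_tile_url(taxon_key, z, x, y, basis_of_record, bin_mode="hex"):
    """Build the URL of one density tile for a taxon, binned into hexagons."""
    params = [("taxonKey", taxon_key), ("bin", bin_mode), ("srs", "EPSG:3857")]
    # basisOfRecord is a repeatable parameter: one pair per accepted value
    params += [("basisOfRecord", b) for b in basis_of_record]
    return f"{BASE}/{z}/{x}/{y}@1x.png?" + urlencode(params)

# The values niconoe lists above (everything except FOSSIL_SPECIMEN/UNKNOWN)
accepted = ["OBSERVATION", "HUMAN_OBSERVATION", "MACHINE_OBSERVATION",
            "MATERIAL_SAMPLE", "PRESERVED_SPECIMEN", "LIVING_SPECIMEN",
            "LITERATURE"]
url = density_tile_url(2436775, 8, 131, 86, accepted)  # placeholder taxon/tile
```

Each such tile is a PNG that a web map library can stack as an overlay on a base layer; `bin=hex` is what produces the hexagon density rendering mentioned above.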

@timadriaens
Member Author

Looking great, but it would be handier if we could visualize those superimposed on the risk map. Do you think this is possible?

@SoVDH

SoVDH commented Jun 4, 2020

This is GREAT! Regarding Tim's comment above, I would even suggest being able to choose 'risk map', 'realized distribution' or 'superimposed'.
Thanks a lot for this, Nico. This is a great achievement already :-)

@niconoe
Collaborator

niconoe commented Jun 4, 2020

@timadriaens and @SoVDH: I made a few quick tests previously and the result was visually messy and hard to read. But I'll try again with different settings and colours and see what we can get!

@niconoe
Collaborator

niconoe commented Jun 5, 2020

@timadriaens and @SoVDH: I implemented Sonia's suggestion and tweaked a few things (colours, how the opacity settings work, ...) and I think we have reached something decent.

Can you have a look?

@SoVDH

SoVDH commented Jun 5, 2020

Yes! Giving the choice between the 3 visualizations is indeed a good idea, and by playing with the opacity I find it very readable. Thanks Nico, it's really nice :-)

@peterdesmet
Member

Nice work! I noticed an error ("surimposed" rather than "superimposed") in the labels, and would rename them to:

  • Modelled data
  • Occurrence data
  • Both

I'm not sure we should use "realized distribution", because there might be more distribution than there is occurrence data. I think it is fine not to mention GBIF in the label name, as we'll need to explain anyway that both the modelled and the occurrence data are based on GBIF.

@peterdesmet
Member

Also, why is the modelled data still showing when I select "Occurrence data"?

niconoe pushed a commit that referenced this issue Jun 9, 2020
@niconoe
Collaborator

niconoe commented Jun 9, 2020

Thanks @peterdesmet: I fixed a couple of bugs in the display logic and updated the labels, I think the situation is better now.

@niconoe
Collaborator

niconoe commented Jun 11, 2020

If the current implementation seems decent to everyone, I suggest closing this issue.

@qgroom
Contributor

qgroom commented Jun 14, 2020

It looks good to me

@niconoe niconoe closed this as completed Jun 16, 2020
@amyjsdavis

amyjsdavis commented Jul 1, 2020

Hi all: I am sorry I missed this last month. I had asked Nico (while I was unaware of this thread) to show only the occurrence data I used to make the risk models, because there is sometimes a big difference between the data I used and the occurrences that are shown in the risk mapping application. Now that I've read through this thread, it seems the biggest differences between the occurrence data he is showing and the data I used for the risk modelling are the time period and the exclusion of data based on coordinate uncertainty. In order to align the models with the historic climate data, only data from 1976 to 2005 are used in my models. I had prepared shapefiles that show the occurrence data used in the models for Belgium, and that is ideally what I would like to use, since he can't filter based on coordinate uncertainty. The second-best option is to filter the data based on time; I think that would make the occurrence data closely approximate what was used. What do you all think? @niconoe @timadriaens @SoVDH @DiederikStrubbe

@niconoe
Collaborator

niconoe commented Jul 1, 2020

Indeed, let's continue the discussion here rather than by e-mail. My two cents:

  • About which data to include: I'll let the scientists answer :)
  • About the webapp implementation, if I understand correctly: the way the occurrences are shown and the user interface should stay as they are now (same logic, styling, ...), the difference being that the data source is now @amyjsdavis's shapefiles instead of data loaded directly via the GBIF API. Is this correct? (That's basically all I need to know to implement it properly.)

Related questions:

  • What's the timeline for this change? If I understand correctly, @SoVDH would like to use the tool ASAP?
  • @amyjsdavis: would it be possible to change the shapefiles' naming convention so it's consistent with the modelled GeoTiff files (for example, using the GBIF taxon ID rather than the scientific name)? This would make things much simpler and smoother in terms of automation (it's easier for a machine to match the different files related to a given species/taxon).
  • @amyjsdavis: as discussed in other threads, we think it would be great and more in line with the project philosophy if your various data transformations were fully available and documented at all times on GitHub, and repeatable, commentable and improvable by everyone. I am thinking that maybe the generation process of those shapefiles is a good candidate for a fully "open" workflow from scratch? I understand working with GitHub and pipelining multiple tools together can be time-consuming at first and a bit out of your comfort zone, so if you think a short four-hands "hackathon" could help, just tell me and I'll free some time for you.

@damianooldoni / @timadriaens / @SoVDH / @peterdesmet / @qgroom / @DiederikStrubbe : as usual, your opinion is appreciated!

@timadriaens
Member Author

@amyjsdavis The idea is to give the assessor an idea of the realised niche in Belgium. I see no reason why you would show only the data you used for drafting your models. And certainly, it makes little sense not to provide a risk assessor with the last 15 years (post-2005) of data. This would in effect mean you would not show any waarnemingen.be data, as that recording platform only started in 2006. But indeed, we need to show verified data only, and if this preprocessing is not possible using the GBIF API, perhaps it's better to use the cube as a data source for the visualization? I feel a hackathon would certainly be useful. Perhaps there could also be a session alongside it, for dummies like me, to actually explain how to run the trias packages and produce graphs and maps for species. As end users, we don't need to crack the functions in depth, but we will want to use them to produce the indicator graphs and risk maps for the species we want.

@amyjsdavis

amyjsdavis commented Jul 1, 2020

@niconoe: they don't need to be shapefiles; they can be text files if that makes life easier. And sorry, I meant to rename them using the taxon key before sending.
Also, my entire workflow is already on GitHub, with the exception of this last step, where the EU occurrences were clipped to the Belgian border. I did not publish this latest workflow (or data transformation) because it is still undecided whether these data will be used.
Update: I used the TrIAS workflow to create my global download, but this is not evident in my script. I will change that, and I will add the scripts that create all the data products used in the modelling.

@amyjsdavis

amyjsdavis commented Jul 1, 2020

@timadriaens: Indeed, I can see the utility of showing more recent occurrences, but not the old ones that predate the climate data. However, all the data used to show the realized niche should follow the same filtering criteria used to make the models, with the exception of the time period: the logic being that if the data were not good enough to include in the model, they are not good enough to indicate the realized niche.

@damianooldoni

I have just updated the occurrence cubes. 🥳 🎈 I still have to upload them to Zenodo; I want to double-check them after I return from holiday.
So, maybe @niconoe can use the Belgian cube for the visualization? It is made using verified data only, and it is easy to remove squares based on year or minimal coordinate uncertainty. As it makes use of the 1 km squares of the EEA, the shapefiles can be read as well. Just an idea.

@amyjsdavis

@damianooldoni : I really like this idea.

@niconoe
Collaborator

niconoe commented Jul 2, 2020

@timadriaens / @SoVDH / @DiederikStrubbe: do you agree the Belgian cube should be the source of data displayed in the map viewer?

@ALL: in that case, I suppose the appropriate rendering would be:

  • show the EEA grid squares where we have occurrences of the selected species (after filtering by year and minimal coordinate uncertainty)
  • have the colour of each square reflect the number of occurrences (darker = more occurrences, or something similar)
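The filter-then-shade logic proposed above could be sketched like this. This is only an illustration: the row shape `(year, eea_cell_code, n_occurrences, min_coord_uncertainty_m)`, the cut-off values and the three shade buckets are assumptions for the example, not the Belgian cube's actual schema or styling.

```python
from collections import defaultdict

def cells_to_plot(rows, min_year=2000, max_uncertainty_m=708):
    """Aggregate occurrence counts per EEA 1 km cell after filtering."""
    counts = defaultdict(int)
    for year, cell, n, uncertainty in rows:
        if year >= min_year and uncertainty <= max_uncertainty_m:
            counts[cell] += n
    return dict(counts)

def shade(count, buckets=(1, 10, 100)):
    """Map an occurrence count to a darkness level (higher = darker)."""
    return sum(count >= b for b in buckets)

rows = [
    (1999, "1kmE3910N3110", 4, 500),   # dropped: predates the year cut-off
    (2005, "1kmE3910N3110", 2, 500),
    (2010, "1kmE3910N3110", 9, 30),
    (2010, "1kmE3911N3110", 1, 2000),  # dropped: uncertainty too large
    (2012, "1kmE3912N3111", 120, 100),
]
counts = cells_to_plot(rows)
# counts == {"1kmE3910N3110": 11, "1kmE3912N3111": 120}
# shade(11) == 2, shade(120) == 3
```

The two filters mirror the selection criteria discussed earlier in the thread (year cut-off and the 708 m coordinate-uncertainty threshold), and the shade level is what the map would translate into square colours.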

@DiederikStrubbe

DiederikStrubbe commented Jul 2, 2020 via email

@timadriaens
Member Author

Regarding point (c): I wonder if it is very useful to show a risk assessor which data were used for the model versus which were not (unless you provide him/her with a lot of explanation of why this is, they will not understand why the model did not incorporate everything). Could a simple legend showing the temporal range of the occurrences not be more informative? For example, black dots for post-2000 records, hollow dots for pre-2000 ones?

As a general remark, I feel we should perhaps avoid showing different distribution maps in TrIAS at different places?

@DiederikStrubbe

DiederikStrubbe commented Jul 2, 2020 via email

@DiederikStrubbe

DiederikStrubbe commented Jul 2, 2020 via email

@amyjsdavis

If we can find a date for a hackathon, that would be great! :-) In that case, I can go over the model with those interested and we can work together to improve the modelling code. I am signing off for now! Ciao

@qgroom
Contributor

qgroom commented Jul 2, 2020

I'd also be interested in a hackathon, as would some of the team members you don't usually meet.

Quoting @timadriaens: "Regarding the point (c): I wonder if it is very useful to show a risk assessor the data that were used for the model versus the data that were not used (unless you provide him/her with lots of explanation why this is, they will not understand why the model did not incorporate all). Could a simple legend showing temporal range of the occurrences not be more informative? For example, black dots for >2000 records, hollow dots for <2000?"

BTW: you have to be cautious aggregating data in the cube across years. Due to the random assignment of observations to grid squares, it is not impossible for a single isolated tree to be assigned to more than 5 different grid cells. This is not such a problem for single years, but the risk increases as you aggregate.
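A toy simulation makes this caution concrete. It is an illustration of the idea only, not the cube's actual assignment algorithm; the location, the ~700 m uncertainty and the 1 km cell size are made-up values.

```python
import random

def assign_cell(x_m, y_m, uncertainty_m, rng, cell_size_m=1000):
    """Randomly place an observation within its uncertainty box, then snap it
    to the grid cell containing the randomized point."""
    px = x_m + rng.uniform(-uncertainty_m, uncertainty_m)
    py = y_m + rng.uniform(-uncertainty_m, uncertainty_m)
    return (int(px // cell_size_m), int(py // cell_size_m))

rng = random.Random(42)                # fixed seed for reproducibility
tree = (3910500.0, 3110500.0)          # one isolated tree, at a cell centre
# One record per year for 10 years: each is independently re-randomized,
# so aggregating across years can spread the same tree over several cells.
cells = {assign_cell(*tree, 700, rng) for _ in range(10)}
```

Within a single year there is only one assignment, but the aggregated 10-year set `cells` can contain up to nine distinct 1 km cells (the 3x3 neighbourhood the uncertainty circle overlaps), which is exactly the inflation qgroom warns about.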

@timadriaens
Member Author

--> Not sure what this remark is referring to? Correspondence between risk maps and temporal extent of occurrence maps?

No. I just mean that the distribution maps we'll eventually show on Harmonia (at least, I thought that was the idea) should not deviate from distribution maps in other TrIAS products, such as this tool to explore the risk maps. Unless, of course, the idea is to integrate this tool entirely.

@SoVDH

SoVDH commented Jul 7, 2020

I agree with Tim about the maps produced. As much as possible, we should avoid producing a diversity of maps. We should aim for a cartographic tool that is as 'generalist' as possible and that can be reused as much as possible for TrIAS and for future integration in Harmonia or in regional portals. After talking with Nico, I understand that this wish is a bit illusory and that it is rare that 'recycling' for other purposes can be envisaged. I believe, however, that we must try to maximize the possible uses.

Also, regarding 'Regarding the point (c): I wonder if it is very useful to show a risk assessor the data that were used for the model versus the data that were not used (unless you provide him/her with lots of explanation why this is, they will not understand why the model did not incorporate all).'
--> I am unconvinced of the value of showing the PRA assessor the difference between points used for risk mapping and points not used, in addition to the occurrence data. It is important to keep it simple and only give information that is useful for assessing current and future establishment capacity. They are not asked to assess the quality of the modelling, only to consider the map and its associated uncertainty.

'Could a simple legend showing temporal range of the occurrences not be more informative? For example, black dots for >2000 records, hollow dots for <2000?'
--> Indeed, this may be informative.

@niconoe
Collaborator

niconoe commented May 19, 2021

I have to admit I am a bit lost in this huge thread going in multiple directions.

Can we close it, or are there still active action/discussion points?

@peterdesmet
Member

I am fine with closing it. The scope of the current application should be kept limited, especially since the RShiny dashboards will likely provide much more.
