no background points #81

pegsus · 2018-03-05T14:44:29Z

Hi
I have presence/absence data and am trying to run a model with only the presence points for comparison, but for some reason no background points are being generated.

Here is the code I am running:
# subsetting presences
VespulaVulgarisPO <- VespulaVulgarisPA[VespulaVulgarisPA$Vespula_vulgaris == 1, ]
write.csv(VespulaVulgarisPO, "data/04_VespulaVulgarisPO.csv")
# workflow
VulgarisPO <- workflow(
  occurrence = LocalOccurrenceData(filename = "data/04_VespulaVulgarisPO.csv",
                                   occurrenceType = "presence",
                                   columns = c(long = "Longitude", lat = "Latitude", value = "Vespula_vulgaris")),
  covariate = Bioclim(extent = c(-7.5, 2, 48, 60), resolution = 10, layers = c(1,12)),
  process = Background(1000),
  model = LogisticRegression,
  output = Chain(InteractiveOccurrenceMap, PrintMap))

The error message is claiming that the 'algorithm does not converge'
Any ideas as to what might be happening here?
Thanks
Peggy

03_VespulaVulgarisPA.txt
04_VespulaVulgarisPO.txt

timcdlucas · 2018-03-05T15:11:29Z

Hi, thanks for getting in touch. I'm having a brief look at this now.

Notes for myself or whoever comes along to look at this.
I guess it was implicit but you didn't mention that the PA model works.

# works
w = workflow(
  LocalOccurrenceData(filename = "data/03_VespulaVulgarisPA.csv",
                      occurrenceType = "presence/absence",
                      columns = c(long = "Longitude", lat = "Latitude", value = "Vespula_vulgaris")),
  covariate = Bioclim(extent = c(-7.5, 2, 48, 60), resolution = 10, layers = c(1,12)),
  process = NoProcess,
  model = LogisticRegression,
  output = Chain(PrintOccurrenceMap))

# Doesn't work
w = workflow(
  LocalOccurrenceData(filename = "data/04_VespulaVulgarisPO.csv",
                      occurrenceType = "presence",
                      columns = c(long = "Longitude", lat = "Latitude", value = "Vespula_vulgaris")),
  covariate = Bioclim(extent = c(-7.5, 2, 48, 60), resolution = 10, layers = c(1,12)),
  process = OneThousandBackground,
  model = LogisticRegression,
  output = Chain(PrintOccurrenceMap, PrintMap))

timcdlucas · 2018-03-05T15:19:13Z

Hi @pegsus
OK, yes this is a bug on our side. Or untidiness at least.

For now you can fix it by making sure your csv files have no row names.

# read.csv (rather than read.table) so the header and comma separator are parsed
VespulaVulgarisPA <- read.csv('data/03_VespulaVulgarisPA.csv')
VespulaVulgarisPO <- VespulaVulgarisPA[VespulaVulgarisPA$Vespula_vulgaris == 1, ]

write.csv(VespulaVulgarisPO, "data/05_VespulaVulgarisPO_norownums.csv", row.names = FALSE)
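To see what `row.names = FALSE` changes, here is a small check on a throwaway data frame. The values and the temporary files are made up for illustration; only the `write.csv` behaviour itself is the point.

```r
# Toy data standing in for the real presence records (values are invented)
df <- data.frame(Longitude = c(-2.1, -2.5),
                 Latitude = c(51.6, 53.3),
                 Vespula_vulgaris = c(1, 1))

with_rn <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

# Default: row names are written as an extra, unnamed first column
write.csv(df, with_rn)
# Workaround: suppress the row names column
write.csv(df, without_rn, row.names = FALSE)

cols_with <- names(read.csv(with_rn))
cols_without <- names(read.csv(without_rn))

length(cols_with)     # one more column than the original data frame
length(cols_without)  # same number of columns as the original data frame
```

The extra column is the one that ends up in the occurrence data frame and confuses the Background step.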


w = workflow(
  LocalOccurrenceData(filename = "data/05_VespulaVulgarisPO_norownums.csv",
                      occurrenceType = "presence",
                      columns = c(long = "Longitude", lat = "Latitude", value = "Vespula_vulgaris")),
  covariate = Bioclim(extent = c(-7.5, 2, 48, 60), resolution = 10, layers = c(1,12)),
  process = Background(n = 100),
  model = LogisticRegression,
  output = Chain(PrintOccurrenceMap))

For zoon people

I don't have time to fix this right now but the diagnosis is:

The occurrence dataframe starts with a column V1 (in this case caused by rownames).

      V1   longitude latitude value       type fold
1      1 -2.10036200 51.58466     1   presence    1
2      7 -2.53606700 53.30412     1   presence    1

V1 gets set as NA in the background points and therefore they all get removed in the line

https://github.com/zoonproject/modules/blob/c0c92f6e9f04d57f054446c2731dbfc0fbf82ee6/R/Background.R#L129
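The failure mode can be reproduced in base R without zoon. This is only a sketch, not the actual Background module code: the background coordinates and the `rbind` step are assumptions standing in for whatever the module does internally. Only the stray V1 column and the NA-dropping step come from the diagnosis above.

```r
# Presence records carrying a stray V1 column (left over from CSV row names)
occ <- data.frame(V1 = c(1, 7),
                  longitude = c(-2.100362, -2.536067),
                  latitude = c(51.58466, 53.30412),
                  value = 1,
                  type = "presence",
                  fold = 1,
                  stringsAsFactors = FALSE)

# Hypothetical background points: nothing sensible exists for V1,
# so that column can only be filled with NA
bg <- data.frame(V1 = NA,
                 longitude = c(-1.5, 0.3, -4.2),
                 latitude = c(52.1, 51.0, 55.9),
                 value = 0,
                 type = "background",
                 fold = 1,
                 stringsAsFactors = FALSE)

combined <- rbind(occ, bg)

# Dropping incomplete rows removes every background point,
# because each of them has NA in V1
cleaned <- na.omit(combined)
table(cleaned$type)
```

With no background rows left, the model is fitted to presences only, which would explain the convergence complaint from the logistic regression.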

I assume we want some way to clean this up. I'm not sure we want to na.omit the whole occurrence data frame. There might be other columns people want passed through Background with the NAs being retained.

On a related note, I've written a removeNAs process module for some of my work. I'll be uploading it soon. I think we want to NOT na.omit inside modules as a general rule except modules whose explicit job is to remove NAs.

pegsus · 2018-03-05T17:52:50Z

Excellent thanks very much!


goldingn · 2018-03-05T20:57:44Z

Glad that helps @pegsus!

Thanks @timcdlucas. I think we need to check for extra columns in LocalOccurrenceData, and either error or remove them with a message/warning, so that the dataframe is in the correct format for the rest of the workflow.
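One possible shape for such a check, as a rough sketch only. `dropExtraColumns` is a hypothetical helper, not an existing zoon function, and the expected column set is simply taken from the data frame printed earlier in this thread.

```r
# Hypothetical helper (not part of zoon): keep only the columns the rest of
# the workflow expects, and tell the user about anything that was dropped
dropExtraColumns <- function(df,
                             expected = c("longitude", "latitude", "value",
                                          "type", "fold")) {
  absent <- setdiff(expected, names(df))
  if (length(absent) > 0) {
    stop("occurrence data is missing columns: ",
         paste(absent, collapse = ", "))
  }
  extra <- setdiff(names(df), expected)
  if (length(extra) > 0) {
    message("removing unexpected columns: ", paste(extra, collapse = ", "))
  }
  df[, expected, drop = FALSE]
}

# Example with the stray V1 column from this issue
occ <- data.frame(V1 = c(1, 7),
                  longitude = c(-2.100362, -2.536067),
                  latitude = c(51.58466, 53.30412),
                  value = 1,
                  type = "presence",
                  fold = 1)

clean <- dropExtraColumns(occ)
names(clean)
```

Whether to stop, warn, or just message on extra columns is a design choice for the module. The sketch uses `message()` so that existing workflows keep running.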

We also need to work out if we really need that is.na() there.

goldingn · 2018-03-06T04:03:16Z

BTW @pegsus another solution to the problem would be to use the module LocalOccurrenceDataFrame, which takes the dataframe directly:

LocalOccurrenceDataFrame(VespulaVulgarisPO,
                      occurrenceType = "presence",
                      columns = c(long = "Longitude", lat = "Latitude", value = "Vespula_vulgaris"))

AugustT · 2018-03-06T09:31:35Z

Great work team! God damn row names. @pegsus let me know if you need any more help. Closing as the user issue has been addressed

This was referenced Mar 6, 2018

LocalOccurrenceData* leaving unwanted columns in zoonproject/zoon#414

Open

chill out, Background zoonproject/zoon#415

Open

AugustT closed this as completed Mar 6, 2018
