no background points #81

Closed
pegsus opened this issue Mar 5, 2018 · 6 comments

Comments

@pegsus

pegsus commented Mar 5, 2018

Hi
I have presence/absence data and am trying to run a model with only the presence points to compare against it, but for some reason no background points are being generated.

Here is the code I am running:
# subsetting presences
VespulaVulgarisPO <- VespulaVulgarisPA[VespulaVulgarisPA$Vespula_vulgaris == 1, ]
write.csv(VespulaVulgarisPO, "data/04_VespulaVulgarisPO.csv")
# workflow
VulgarisPO <- workflow(
  occurrence = LocalOccurrenceData(filename = "data/04_VespulaVulgarisPO.csv",
                                   occurrenceType = "presence",
                                   columns = c(long = "Longitude", lat = "Latitude", value = "Vespula_vulgaris")),
  covariate = Bioclim(extent = c(-7.5, 2, 48, 60), resolution = 10, layers = c(1,12)),
  process = Background(1000),
  model = LogisticRegression,
  output = Chain(InteractiveOccurrenceMap, PrintMap))

The error message says that the 'algorithm does not converge'.
Any ideas about what might be happening here?
Thanks
Peggy

03_VespulaVulgarisPA.txt
04_VespulaVulgarisPO.txt

@timcdlucas

Hi, thanks for getting in touch. I'm having a brief look at this now.

Notes for myself or whoever comes along to look at this.
I guess it was implicit but you didn't mention that the PA model works.

# works
w = workflow(
  LocalOccurrenceData(filename = "data/03_VespulaVulgarisPA.csv",
                      occurrenceType = "presence/absence",
                      columns = c(long = "Longitude", lat = "Latitude", value = "Vespula_vulgaris")),
  covariate = Bioclim(extent = c(-7.5, 2, 48, 60), resolution = 10, layers = c(1,12)),
  process = NoProcess,
  model = LogisticRegression,
  output = Chain(PrintOccurrenceMap))

# Doesn't work
w = workflow(
  LocalOccurrenceData(filename = "data/04_VespulaVulgarisPO.csv",
                      occurrenceType = "presence",
                      columns = c(long = "Longitude", lat = "Latitude", value = "Vespula_vulgaris")),
  covariate = Bioclim(extent = c(-7.5, 2, 48, 60), resolution = 10, layers = c(1,12)),
  process = OneThousandBackground,
  model = LogisticRegression,
  output = Chain(PrintOccurrenceMap, PrintMap))

@timcdlucas

Hi @pegsus
OK, yes this is a bug on our side. Or untidiness at least.

For now you can fix it by making sure your csv files have no row names.

# read.csv rather than read.table, so the header row and comma separator are parsed
VespulaVulgarisPA <- read.csv('data/03_VespulaVulgarisPA.csv')
VespulaVulgarisPO <- VespulaVulgarisPA[VespulaVulgarisPA$Vespula_vulgaris == 1, ]

write.csv(VespulaVulgarisPO, "data/05_VespulaVulgarisPO_norownums.csv", row.names = FALSE)


w = workflow(
  LocalOccurrenceData(filename = "data/05_VespulaVulgarisPO_norownums.csv",
                      occurrenceType = "presence",
                      columns = c(long = "Longitude", lat = "Latitude", value = "Vespula_vulgaris")),
  covariate = Bioclim(extent = c(-7.5, 2, 48, 60), resolution = 10, layers = c(1,12)),
  process = Background(n = 100),
  model = LogisticRegression,
  output = Chain(PrintOccurrenceMap))
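
To see where the extra column comes from, here is a small base R sketch with made-up data (the toy table and coordinates are assumptions, not the attached files). Subsetting a data frame keeps the original row numbers as row names, and write.csv writes row names by default, so the file gains an unnamed first column holding them. The name that column receives depends on how the file is read back: base read.csv calls it X, while the zoon printout in this thread shows V1.

```r
# Toy stand-in for the presence/absence table (values are made up).
pa <- data.frame(Longitude = c(-2.1, -1.5, -2.5),
                 Latitude = c(51.6, 52.0, 53.3),
                 Vespula_vulgaris = c(1, 0, 1))

# Keep presences only; the row names are now "1" and "3".
po <- pa[pa$Vespula_vulgaris == 1, ]

with_rownames <- tempfile(fileext = ".csv")
without_rownames <- tempfile(fileext = ".csv")

write.csv(po, with_rownames)                        # default: row.names = TRUE
write.csv(po, without_rownames, row.names = FALSE)  # the workaround

# Compare the columns that come back from each file.
cols_with <- names(read.csv(with_rownames))
cols_without <- names(read.csv(without_rownames))
print(cols_with)
print(cols_without)
```

The first file comes back with one more column than the second, and that extra column holds the old row numbers.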

For zoon people

I don't have time to fix this right now but the diagnosis is:

The occurrence dataframe starts with a column V1 (in this case caused by row names).

      V1   longitude latitude value       type fold
1      1 -2.10036200 51.58466     1   presence    1
2      7 -2.53606700 53.30412     1   presence    1

V1 gets set to NA in the background points, and therefore they all get removed in the line

https://github.com/zoonproject/modules/blob/c0c92f6e9f04d57f054446c2731dbfc0fbf82ee6/R/Background.R#L129
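
A minimal base R sketch of that failure mode, for anyone who wants to reproduce it outside a workflow. The background coordinates are invented, and complete.cases is only a stand-in for whatever NA filter the linked line applies; the actual Background.R code is not reproduced here.

```r
# Two presence rows carrying a leftover row-number column, as in the printout above.
occ <- data.frame(V1 = c(1, 7),
                  longitude = c(-2.100362, -2.536067),
                  latitude = c(51.58466, 53.30412),
                  value = 1,
                  type = "presence",
                  fold = 1,
                  stringsAsFactors = FALSE)

# Hypothetical background points: only the standard columns are known,
# so the extra column has to be filled with NA before the rows can be bound.
bg <- data.frame(V1 = NA,
                 longitude = c(-1.0, 0.5, -3.2),
                 latitude = c(52.0, 54.1, 50.9),
                 value = 0,
                 type = "background",
                 fold = 1,
                 stringsAsFactors = FALSE)

combined <- rbind(occ, bg)

# Stand-in for the NA filter: drop every row that contains any NA.
cleaned <- combined[complete.cases(combined), ]

print(table(combined$type))
print(table(cleaned$type))
```

After the filter only the presence rows are left, so the model would be fitted to a response that is all 1s, which is consistent with the convergence complaint in the original report.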

I assume we want some way to clean this up. I'm not sure we want to na.omit the whole occurrence data frame. There might be other columns people want passed through Background with the NAs being retained.

On a related note, I've written a removeNAs process module for some of my work. I'll be uploading it soon. I think we want to NOT na.omit inside modules as a general rule except modules whose explicit job is to remove NAs.

@pegsus
Author

pegsus commented Mar 5, 2018 via email

@goldingn
Member

goldingn commented Mar 5, 2018

Glad that helps @pegsus!

Thanks @timcdlucas. I think we need to check for extra columns in LocalOccurrenceData, and either throw an error or remove them with a message/warning, so that the dataframe is in the correct format for the rest of the workflow.

We also need to work out if we really need that is.na() there.
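
A rough sketch of what such a column check could look like. drop_extra_columns and expected_cols are hypothetical names, not existing zoon functions, and the expected column set is simply taken from the printout earlier in this thread.

```r
# Hypothetical helper, not part of zoon: keep only the columns the rest of
# the workflow expects, warning about anything that gets dropped.
expected_cols <- c("longitude", "latitude", "value", "type", "fold")

drop_extra_columns <- function(df, expected = expected_cols) {
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  extra <- setdiff(names(df), expected)
  if (length(extra) > 0) {
    warning("Dropping unexpected columns: ", paste(extra, collapse = ", "))
  }
  df[, expected, drop = FALSE]
}

# Example: an occurrence table with a leftover row-number column.
occ <- data.frame(V1 = c(1, 7),
                  longitude = c(-2.100362, -2.536067),
                  latitude = c(51.58466, 53.30412),
                  value = 1,
                  type = "presence",
                  fold = 1,
                  stringsAsFactors = FALSE)

clean <- suppressWarnings(drop_extra_columns(occ))
print(names(clean))
```

Warning rather than stopping keeps existing scripts running. Swapping warning() for stop() would give the stricter behaviour if that is preferred.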

@goldingn
Member

goldingn commented Mar 6, 2018

BTW @pegsus another solution to the problem would be to use the module LocalOccurrenceDataFrame, which takes the dataframe directly:

LocalOccurrenceDataFrame(VespulaVulgarisPO,
                      occurrenceType = "presence",
                      columns = c(long = "Longitude", lat = "Latitude", value = "Vespula_vulgaris"))

@AugustT
Member

AugustT commented Mar 6, 2018

Great work team! God damn row names. @pegsus let me know if you need any more help. Closing as the user issue has been addressed.

@AugustT AugustT closed this as completed Mar 6, 2018