Modify all summary functions to take into account design information (i.e., weights). #3

krivit · 2017-05-20T08:23:13Z

The simplest way to do that is to use weights(egor) to get the ego weights according to the design. Note that the weights are not normalized: they do not necessarily sum to 1.

A more advanced use is svymean(x, ego.design(egor)), which takes a vector or a matrix x and evaluates its sample mean in accordance with the survey design.
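Because the weights returned by weights(egor) are not normalized, it is worth seeing that this does not matter for a weighted mean. The following base-R sketch uses made-up values; netsize and w are hypothetical stand-ins for an ego-level variable and for weights(egor). Only the point estimate is shown; svymean() additionally reports a design-based standard error.

```r
# Hypothetical ego-level values and unnormalized design weights
# (stand-ins for an ego variable and for weights(egor)).
netsize <- c(3, 5, 8, 2)
w <- c(2, 1, 1, 4)

# weighted.mean() divides by sum(w) itself, so the weights need not sum to 1.
m_raw <- weighted.mean(netsize, w)

# Rescaling the weights to sum to 1 gives the same estimate.
m_norm <- weighted.mean(netsize, w / sum(w))

# The same quantity written out explicitly: sum(w * x) / sum(w).
m_hand <- sum(w * netsize) / sum(w)

# Ignoring the design gives a different answer.
m_plain <- mean(netsize)
```

Here m_raw, m_norm and m_hand are all 27 / 8 = 3.375, while the unweighted mean is 4.5.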


krivit · 2017-05-20T20:26:44Z

@tilltnet, @raffaelevacca, this one might have to be you: I only understand the data structure and survey design parts of egor.


tilltnet · 2017-05-20T22:37:14Z

> svymean(x, ego.design(egor))

I included this in summary.egor.

All other analysis functions calculate per-ego values from alters/alter_ties, so I think there is no need to include weights there, but I should include examples that use the weights for summarising, e.g. compositional results with weighted.mean(). EDIT: or the svymean() thing, of course!
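A compositional summary of this kind can be sketched in base R: first compute a per-ego measure from the alters, then average it across egos using the ego weights. The data frames alters and egos, their columns and the weight values are all hypothetical; in egor the weights would come from the ego design.

```r
# Hypothetical alter-level data: one row per alter, keyed by .egoID.
alters <- data.frame(
  .egoID = c("a", "a", "a", "b", "b", "c"),
  female = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)
# Hypothetical ego-level design weights, one per ego.
egos <- data.frame(.egoID = c("a", "b", "c"), weight = c(1, 2, 1))

# Step 1: per-ego compositional measure (share of female alters).
share <- tapply(alters$female, alters$.egoID, mean)

# Step 2: look up each ego's weight by .egoID.
w <- egos$weight[match(names(share), egos$.egoID)]

# Step 3: design-weighted vs. unweighted average of the per-ego shares.
weighted_share <- weighted.mean(share, w)
unweighted_share <- mean(share)
```

With these numbers the per-ego shares are 2/3, 0 and 1, so the weighted average is 5/12 while the unweighted one is 5/9: ego "b", with no female alters, counts double.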

krivit · 2020-01-15T10:44:38Z

I've changed the code so that egor$ego is now a tbl_svy object with all that that entails. Do the functions that call for weights still work?

Edit: The changes are in the ego_srvyr branch.

krivit · 2020-09-10T05:46:49Z

As far as I can gather, we have two kinds of summary functions:

Those that calculate some statistic for each ego and return a tibble with the egoID and the statistic.
Those that calculate the summary over all egos together.

I am not actually sure which, except for the dplyr functions, are of the second type.

Based on this, I think that in order to properly incorporate sampling weights, we should probably have functions of the first type return a tbl_svy object instead of a tibble, so that the user could then use sampling-aware summaries. Any thoughts?

tilltnet · 2020-09-10T17:01:56Z

I think the first type is the main kind we have to take care of. I think it'd be easiest to have an internal function that can be inserted at the end of each of these summary/analytics functions, which would check whether there is an ego.design and, if there is, convert the result into a tbl_svy that includes the ego.design.
Currently there really is only one function that provides summary stats for the whole data set: summary.egor(). It computes the mean network size and the mean density. I can update it so that it takes the ego.design into account again.

krivit · 2020-09-15T06:44:50Z

With #53 being resolved, we can now modify the first type of functions to return data of the appropriate type. I am increasingly of the opinion that we should not output different formats depending on whether the underlying egor has ego design information, because, unfortunately, tbl_svy does not provide a very transparent interface to tibble; in particular, $ indexes the underlying data structure, not the table. For this reason, I ultimately decided to have as_tibble and as_alters_df always return a tibble, as_survey and as_alters_survey always a tbl_svy, and so on.

In light of this, we need some way to control the output format for the first type, such as EI. Two additional arguments come to mind:

survey=logical: TRUE, FALSE, or NA. If TRUE, always return a tbl_svy (even if the underlying egor does not have design information), if FALSE, always drop it, and if NA, inherit.
output=c("tibble","df","survey","inherit"): "df" is an alias for "tibble". Others are self-explanatory.

Any thoughts?
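A minimal sketch of how either proposed argument could be resolved into a single yes/no decision. The helpers resolve_survey() and resolve_output() are hypothetical, not part of egor, and has_design stands in for checking whether the egor object carries ego design information.

```r
# Hypothetical helper for the proposed `survey` argument:
# TRUE = always return a survey object, FALSE = always drop the design,
# NA = inherit from the egor object.
resolve_survey <- function(survey = NA, has_design) {
  stopifnot(length(survey) == 1, is.logical(survey))
  if (is.na(survey)) has_design else survey
}

# Hypothetical helper for the alternative `output` argument;
# "df" is treated as an alias for "tibble" via switch() fall-through.
resolve_output <- function(output = c("tibble", "df", "survey", "inherit"),
                           has_design) {
  output <- match.arg(output)
  switch(output,
         tibble = ,
         df = FALSE,
         survey = TRUE,
         inherit = has_design)
}
```

Both helpers return TRUE when the result should be wrapped as a tbl_svy; the actual conversion would happen elsewhere.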

tilltnet · 2020-09-15T17:54:50Z

Both options are good I think. I have a slight preference for the first. I would use TRUE/FALSE/NULL not NA. Following your argument the default would be FALSE?

Stepping back a bit, though, I am not sure this offers the right workflow to people who have an ego.design. If I have an ego.design, I probably want to use it for all of my results, so I'd always have to be explicit about that, choosing specific functions and setting arguments.

The way I teach it in the workshop is that ego-level results should be joined to the $ego tibble. If the same could easily be done with the results that come from an egor object with an ego.design, a summarization of the results could then easily incorporate the ego.design. Since srvyr doesn't allow joins, maybe we could have a join_results() function that works like left_join(), taking the egor object as the first argument and the results as the second.

Long story short, I think if we go in this direction we should also consider implementing a left_join()-like function that works with tbl_svy objects, as you suggested in #53.

krivit · 2020-09-16T00:14:38Z

> Both options are good I think. I have a slight preference for the first. I would use TRUE/FALSE/NULL not NA. Following your argument the default would be FALSE?

Either works.

> Stepping back a bit, though, I am not sure this offers the right workflow to people who have an ego.design. If I have an ego.design, I probably want to use it for all of my results, so I'd always have to be explicit about that, choosing specific functions and setting arguments.

By that argument, should the default be NA/NULL?

> The way I teach it in the workshop is that ego-level results should be joined to the $ego tibble. If the same could easily be done with the results that come from an egor object with an ego.design, a summarization of the results could then easily incorporate the ego.design. Since srvyr doesn't allow joins, maybe we could have a join_results() function that works like left_join(), taking the egor object as the first argument and the results as the second.

Do you mean that they should return an egor object with the egor$ego tibble augmented with additional columns? Then we already have methods for that. Suppose that x is our original egor object, and res is a tibble with an .egoID column and whatever ego-level index we've computed, and there are no duplicated .egoIDs. Then,

x %>% activate("ego") %>% left_join(res, ".egoID")

should get the desired result.

If we just want a survey with two variables, the ego ID and the result, it becomes

out <- x$ego
out$variables <- left_join(out$variables[".egoID"], res, ".egoID")
out
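The join above relies on left-join semantics: every ego row is kept in its original order, and egos without a result get NA. That behaviour can be illustrated without dplyr or srvyr; this base-R sketch uses hypothetical ego and res tables and match() in place of left_join().

```r
# Hypothetical ego table and per-ego result (e.g. network size).
ego <- data.frame(.egoID = c(1, 2, 3, 4), age = c(30, 45, 28, 60))
res <- data.frame(.egoID = c(3, 1, 2), netsize = c(5, 2, 7))

# The join is only well defined if res has no duplicated .egoIDs.
stopifnot(!anyDuplicated(res$.egoID))

# For each ego row, the position of its result row (NA if it has none).
idx <- match(ego$.egoID, res$.egoID)

# Copy every result column across, keeping ego row order.
ego_joined <- ego
for (col in setdiff(names(res), ".egoID")) {
  ego_joined[[col]] <- res[[col]][idx]
}
```

Ego 4 has no entry in res, so its netsize is NA; the other egos receive 2, 7 and 5 respectively.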

> Long story short, I think if we go in this direction we should also consider implementing a left_join()-like function that works with tbl_svy objects, as you suggested in #53.

I opened a ticket on srvyr back in January (gergness/srvyr#65). In principle, it should be possible to implement the left_join() and the inner_join() methods for tbl_svy since the underlying survey package handles indexing intelligently.

tilltnet · 2020-09-23T19:26:41Z

Ok, I looked into it and joining results to the egor object works fine, as demonstrated by your example above.

Given this I think we should add the survey argument to all summary/analysis functions that return ego level results and the options would be TRUE/FALSE/NULL. Defaulting to FALSE. Or would you prefer another default?

krivit · 2020-10-12T01:05:29Z

Apologies for the slow reply... I like the neutral (NULL/NA) as the default better in principle, but in practice, I suspect most people will be expecting FALSE.

krivit · 2020-10-12T01:05:55Z

Actually, could we make it an options() option?

tilltnet · 2020-10-22T20:11:00Z

For maintaining the package, a global option set with options() sounds like a good idea to me. We could have a global option called, say, "egor.return.results.with.design" and handle it as described above: when it is unset or NULL, we inherit; TRUE returns the results as a srvyr object; and FALSE discards the design when returning the results. As for the default value, I am a bit unsure currently, but I think we can finalize the decision later on. I'll start working on this today, and for now I'll go with NULL as the default.
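Under the semantics proposed here, the option could be read as follows. The helper results_with_design() is hypothetical; only the option name comes from this discussion, and has_design stands in for checking the egor object.

```r
# Hypothetical helper reading the proposed global option:
# unset/NULL = inherit, TRUE = attach the design, FALSE = drop it.
results_with_design <- function(has_design) {
  opt <- getOption("egor.return.results.with.design")
  if (is.null(opt)) has_design else isTRUE(opt)
}

# Usage: options() returns the previous values, so they can be restored.
old <- options(egor.return.results.with.design = NULL)  # unset: inherit
inherit_case <- results_with_design(TRUE)

options(egor.return.results.with.design = FALSE)         # always drop
drop_case <- results_with_design(TRUE)

options(old)  # restore the previous setting
```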


mbojan · 2020-10-22T22:54:06Z

Just a quick note: NULL is equivalent to an option being absent:

options( foobleh = NULL )
o <- options()
"foobleh" %in% names(o)
## FALSE
getOption("foobleh")
## NULL

krivit · 2020-10-22T23:06:45Z

Good point. That's a part of why I prefer NA.
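The difference between NULL and NA can be checked directly in base R. Setting an option to NULL removes it, whereas NA keeps it visibly present, so NA can act as an explicit "inherit" value distinct from "never set". The option name egor.demo_opt is a throwaway name used only for this illustration.

```r
# Setting an option to NULL removes it entirely.
options(egor.demo_opt = NULL)
null_present <- "egor.demo_opt" %in% names(options())

# Setting it to NA keeps it registered with an NA value.
options(egor.demo_opt = NA)
na_present <- "egor.demo_opt" %in% names(options())
na_value <- getOption("egor.demo_opt")

# Clean up.
options(egor.demo_opt = NULL)
```

Here null_present is FALSE, na_present is TRUE, and na_value is NA.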

mbojan · 2020-10-22T23:18:14Z

If the option is needed in a (hopefully) single place in the code, then simply getOption("egor.return_with_design", FALSE) can be used to set the default. Otherwise, to keep the defaults stored in a single place, we can have a non-exported list, say egor_option_defaults <- list(egor.return_with_design = FALSE), and in the functions use something like getOption("egor.return_with_design", egor_option_defaults$egor.return_with_design)...
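The single-place-defaults pattern can be sketched as follows. egor_option_defaults follows the suggestion above; the accessor egor_getOption() is a hypothetical name for illustration. It relies only on the default argument of base R's getOption().

```r
# Non-exported defaults, kept in one place in the package namespace.
egor_option_defaults <- list(egor.return_with_design = FALSE)

# Hypothetical accessor: a user setting wins, otherwise the stored default.
egor_getOption <- function(name) {
  getOption(name, egor_option_defaults[[name]])
}

# Usage: unset, the stored default is returned.
options(egor.return_with_design = NULL)
from_default <- egor_getOption("egor.return_with_design")

# A user setting (e.g. from .Rprofile) takes precedence.
options(egor.return_with_design = TRUE)
from_user <- egor_getOption("egor.return_with_design")

options(egor.return_with_design = NULL)  # clean up
```

Because nothing is written to options() at load time, a user's own setting is never overwritten.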

tilltnet · 2020-10-22T23:29:13Z

> Just a quick note: NULL is equivalent to an option being absent:

I think that is an advantage in this case. If we want `inherit` to be the standard, we don't need to set the option anywhere, and only users who want something other than the default will have to set it.

The implementation I pushed in a666c93 differs a bit from what was said above. The options now are to inherit the tbl_svy or not. Returning a tbl_svy when no ego_design is present seemed useless to me, but I might be wrong?!?

Currently I am not setting the global option, and the default is `inherit`. But as I said above, I am not completely sure what the default should be.

To summarize pros and cons from the previous discussion:

Default = `inherit`

Pros:

summary stats on results can be fed into functions that take weights into account

Cons:

egor workflow of binding results to the ego dataframe is a bit more complicated (currently requires copy = TRUE) but that could be smoothed out
results are harder to inspect, as the tbl_svy does not print the actual values but rather only the sampling design info

Default = plain tibble

Pros:

results can be easily joined to ego data

Cons:

sampling design not present in results at all

Is there anything else?

tilltnet · 2020-10-22T23:33:02Z

> If the option is needed in a (hopefully) single place in the code, then simply getOption("egor.return_with_design", FALSE) can be used to set the default. Otherwise, to keep the defaults stored in a single place, we can have a non-exported list, say egor_option_defaults <- list(egor.return_with_design = FALSE), and in the functions use something like getOption("egor.return_with_design", egor_option_defaults$egor.return_with_design)...

The option is only accessed in one place currently. I was thinking of using .onLoad or .onAttach if we want to set it. But we could also just change the behavior for when the option is NULL to whatever we want the default to be.

mbojan · 2020-10-23T00:22:53Z

Using the hooks for setting options is a bad idea. .onAttach will not set it if the package is only imported (e.g., by ergm.ego). Both will override potential user settings in, e.g., .Rprofile.

krivit · 2020-10-23T00:37:53Z

The options set in .onLoad() will work even if the package is not attached.

It's possible to set the options without clobbering existing ones using some code along the following lines (currently used in ergm):

  OPTIONS <- list(ergm.eval.loglik=TRUE,
                  ergm.loglik.warn_dyads=TRUE,
                  ergm.cluster.retries=5)
  current <- names(options())
  for(opt in names(OPTIONS)){
    if(! opt%in%current){
      do.call(options, OPTIONS[opt])
    }
  }

mbojan · 2020-10-23T00:45:46Z

If the user sets the option in, say, .Rprofile (as is common for setting options per user or per project), .onLoad will overwrite it.

Edit: OK, your code will not overwrite it. Still, I think having the defaults in the package namespace, as I showed earlier, is much cleaner.

krivit · 2020-10-23T00:52:10Z

The defaults can go anywhere .onLoad() can see them.

By the way, here's an even simpler implementation, assuming PKGOPTIONS is set somewhere:

do.call(options, PKGOPTIONS[setdiff(names(PKGOPTIONS), names(options()))])
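The one-liner can be exercised outside of .onLoad() to confirm that it fills in missing defaults without clobbering a value the user already set. The option names egor.demo.a and egor.demo.b are throwaway names for this illustration.

```r
# Hypothetical package defaults.
PKGOPTIONS <- list(egor.demo.a = TRUE, egor.demo.b = 5)

# Simulate a user who already set one of them, e.g. in .Rprofile.
options(egor.demo.a = FALSE)

# Set only the options that are not already present.
missing_opts <- setdiff(names(PKGOPTIONS), names(options()))
do.call(options, PKGOPTIONS[missing_opts])

kept_user_value <- getOption("egor.demo.a")  # the user's FALSE is kept
filled_default <- getOption("egor.demo.b")   # the default 5 is filled in

options(egor.demo.a = NULL, egor.demo.b = NULL)  # clean up
```

One caveat ties back to the earlier note on NULL: a user cannot "set" an option to NULL to opt out of a default, because an option set to NULL is simply absent and will be filled in.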

krivit · 2020-10-23T04:38:30Z

I've just implemented a function statnet.common::default_options() which wraps options() to avoid overwriting existing settings. It might make sense to import that, or maybe copy to egor.

tilltnet · 2020-11-03T17:10:07Z

tilltnet · 2020-11-03T17:41:33Z

> I've just implemented a function statnet.common::default_options() which wraps options() to avoid overwriting existing settings. It might make sense to import that, or maybe copy to egor.

Thanks, I used that as inspiration.

The defaults are now set with .onLoad() and I added additional options to influence the behavior of print.egor() (see #54).

The default for egor.return.results.with.design is currently TRUE. But I feel that setting it to FALSE makes more sense until srvyr prints tbl_svy objects in a way that makes the values visible.

krivit · 2020-11-10T02:05:28Z

Thanks! The name seems a bit verbose. Also, I would suggest replacing the dots after the first with underscores.

tilltnet · 2020-11-10T14:27:44Z

egor.results_with_design

Would that be better? Since these options are not typed out regularly, but at most once at the beginning of a session, the name should be as verbose as necessary to get across what it does.

krivit added enhancement help wanted labels May 20, 2017

tilltnet added a commit that referenced this issue May 20, 2017

#3 Updated summary.egor taking design weights into accout, when calculating average netzsize and average density.

6734682

tilltnet self-assigned this May 20, 2017

tilltnet added a commit that referenced this issue Oct 22, 2020

feat: results are now returned as tbl_svy when derived from egor object with ego_design #3

a666c93

tilltnet added a commit that referenced this issue Nov 3, 2020

feat: added egor_options() and set values for global options defaults #3

de540cc

tilltnet mentioned this issue Nov 10, 2020

Update and test behavior of print.egor() #54

Closed

tilltnet closed this as completed Jul 1, 2021
