Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vinecop structure selection #49

Merged
merged 44 commits into from
Mar 9, 2017
Merged

Vinecop structure selection #49

merged 44 commits into from
Mar 9, 2017

Conversation

tnagler
Copy link
Collaborator

@tnagler tnagler commented Mar 1, 2017

So now that there is a function version (unit tests still missing), we can start discussing a couple of design choices:

  1. I removed the options allow_rotations and preselect_families in Vinecop::structure_select() and always use their default (true). They became additional arguments in VineCopula to ensure backwards compatibilty, but we don't have to worry about that.

    1. Excluding specific rotations doesn't make a lot of sense, because if you want to model the dependence of (X,Y), the structure of (-X,Y) should be feasible as well. If somebody wants to be more specific than that, he or she will probably specify the exact model and use fit() (not yet implemented for Vinecop).
    2. The pre-selection is usually leading to the same results than a full selection. Exceptions mostly happen when all parametric models are misspecified anyway. I don't think that having an on/off switch for that feature is very useful for users, so I removed it to keep the API simple.
    3. Also, I reordered the arguments according to how often they're used (in my experience). This makes it a bit more convenient since you cannot pass named arguments in C++, but can omit all trailing arguments that have default values.

    Do you think this is reasonable? Then I'd suggest we make the same modifications to Bicop::select().

  2. I'm not sure if we should follow VineCopula on having two functions for StructureSelect and CopSelect. I find the name StructureSelect a bit misleading because it also selects all families. What do you think about a unified interface select(data, matrix = MatXd(0, 0), ...) similar to kdevine, where specifying the matrix is optional?

tnagler and others added 30 commits January 11, 2017 23:18
* Vincop's data member `vine_matrix_` is now of class RVineMatrix.
* `construct_d_vine_matrix` has been moved to RVineMatrix and declared
as static member function.
Vine matrix utilities refactored
* add simulation method to Vinecop class

* use unaryExpr to generate uniforms in Bicop::simulate

* fixes for requests in in #25

* Declare invert_order in vinecop_class.hpp
* Move to_upper_tri from vinecop_class.cpp to vinecop_class.hpp
* Move is_element from bicop_class.cpp to bicop_class.hpp
* Rename boost_tools.hpp to distribution_tools.hpp
* New function simulate_uniform

* remove misplaced return in simulate_uniform

* vectorized version of Vinecop.simulate()

We avoid exceeding the memory by recursively splitting
the problem into sub-problems of half the size. This only comes into
play when the memory required exceeds 1GB.
@tvatter
Copy link
Collaborator

tvatter commented Mar 1, 2017

  1. Concerning this argument issue:
    1. Agreed.
    2. I'm against this idea. Since the preselection is an arguable heuristic, I would still like to let the choice to remove it. In practical applications, misspecification of all parametric models is plausible.
    3. OK.
  2. Answered to R-Vine matrix order #48.
  3. I completely agree. I always found weird the logic in VineCopula (although I used the same for gamCopula). However, I would use set matrix as the last argument of select for the frequency of usage reason mentioned above.

I will review the code in details ASAP!

@tnagler
Copy link
Collaborator Author

tnagler commented Mar 1, 2017

Concerning ii: The same could be said about removing negatively oriented families whenever tau > 0. Anyway, how about we leave it in, but move it to the last position?

@tvatter
Copy link
Collaborator

tvatter commented Mar 7, 2017

Concerning ii: that is fine with me.

Now:

  1. Although you don't like it, I think that adding a truncate argument is important.
  2. I'm not sure about the name make_pc_store, because you explicitly use pair_copula and pair_copulas in other places. Similarly comment for pc_data in structselect_tools.hpp.
  3. Concerning get_family, could an overloading to return the whole matrix be of interest?
  4. We should find a special place for template functions such as intersect, find_position and others. IMO, is_member, currently in bicop_class.hpp, falls in the same category. Maybe we could create vector_tools.hpp ?
  5. In get_pc_data , what is int n = std::max(tree[v0].hfunc1.size(), tree[v0].hfunc2.size()); doing?
  6. In as_vinecop, is there a way to start from the first tree rather than the last? It would be far more convenient for truncated PCCs... I've tried to think about a solution, but couldn't find one.
  7. In print_pair_copulas, I would refrain from printing the parameter. IMO, information such as rank correlations, tail dependence coefficients and asymmetry measures are far more interesting (especially in the nonparametric case).

@tnagler
Copy link
Collaborator Author

tnagler commented Mar 8, 2017

  1. I do like it, truncation is certainly necessary, I just forgot about it ;) Will add it.
  2. Yeah I also wasn't that happy with it. It was just that make_pair_copula_store and pair_copula_data are quite unhandy. I'll change the first. The second I would keep aspc_data - it's only used internally and the code would become rather clunky otherwise.
  3. Good idea, I'll add it after the pull request in a larger refactoring-PR (including the change of matrix style).
  4. How about std_tools to keep it more flexible?
  5. It's just extracting the sample size. It's a relict of an earlier version though, it was just hedging against one of hfunc1 and hfunc2 being empty. But in the final version, hfunc1 is always specified, so I'll shorten it to int n = tree[v0].hfunc1.size().
  6. I don't think there is a choice. The first column always has to be filled with the numbers of the pair copula in the last tree, which in turn implies which copula of the second-to-last tree fills the second row (and so on). If we start from the bottom we would need to constantly keep track of the columns and switch entries of the matrix. That's probably much more cumbersome.
  7. That was actually just a tool for debugging. It's currently hidden in the structure_select namespace. For now the priority should be to have a nice summary output in the R and python interfaces. I'd worry about the C++ method later.

@tvatter
Copy link
Collaborator

tvatter commented Mar 8, 2017

  1. OK
  2. Although more cumbersome, there is no other way if we want to "break the loop" at a given level for truncated vines (as opposed to the current implement in VineCopula that still "selects" the structure but set the families to independence). This may not be the most urgent priority, but "breaking the loop" will be crucial for high-dimensional applications and we need to keep that in mind.

I've also wondered: is there a way to compute the likelihood of the joint model sequentially as part of the algorithm? I'm asking because, instead of being an argument, the truncation level could/should IMO be selected as part of the model (either by AIC/BIC or by a Vuong-like test). However, this is something that's, to the best of my knowledge, completely missing in VineCopula, although very useful in practice (again, especially for high-dimensional applications).

@tnagler
Copy link
Collaborator Author

tnagler commented Mar 8, 2017

  1. Before we do that, we would need to figure out a way to fill the incomplete matrix. I guess then it would be possible to build the matrix from the truncation level down and fill the missing entries later.

Regarding the other question: I have a non-public version of RVineStructureSelect that does that (with mBIC as selection criterion) and even more (selection of a threshold parameter for Kendall's tau). It is part of an upcoming paper that Claudia will talk about in Munich beginning of April. I intend to implement this in the not-so-far future in C++, if time permits even before the conference, so we can advertise the new library. But it causes an óverhead because the loglikelihood has to be calculate for every pair (which is currently not the case when method = "itau"), so I would prefer having this as a separate function sparse_select() or similar.

tnagler added 3 commits March 8, 2017 17:28
* rename make_pc_store() -> make_pair_copula_store()
* rename structure_select() -> select()
* add truncation_level argument (working)
* add matrix argument (cannot be used for now)
* remove n = std::max(...) line
* matrix argument default had wrong type
* as_vinecop had wrong entry in the trivial column
* as_vinecop sometimes reused the same pair-copulas in another column
tests for select() will be added after changing matrix style
@tnagler tnagler changed the base branch from vinecop-class to master March 8, 2017 20:10
@tnagler tnagler mentioned this pull request Mar 8, 2017
3 tasks
@tnagler
Copy link
Collaborator Author

tnagler commented Mar 8, 2017

I think we should be good to go now (if checks pass). Open issues are summarized in #51, and I'll work on them in a new branch.

@tvatter
Copy link
Collaborator

tvatter commented Mar 9, 2017

Well, we can merge for now, but we are still doing a lot (really, A LOT) of useless stuff when we truncate... IMO, solving this should have a high priority on the "to-do" list and I added this to #51.

@tvatter
Copy link
Collaborator

tvatter commented Mar 9, 2017

I just noticed something weird with the includes:

vinecop_class.hpp is included in structselect_tools.hpp and in vinecop_class.cpp, but then structselect_tools.hpp is also included in vinecop_class.cpp. Maybe you can remove vinecop_class.hpp from vinecop_class.cpp?

@tnagler
Copy link
Collaborator Author

tnagler commented Mar 9, 2017

I'll comment further on the truncation in #51.

Regarding the headers: I could remove it, but I don't think it makes it better. Including the same header twice is almost the same as including it once (because of the ifndef guards). But it's less intuitive to not have vinecop_class.hpp included at the top of vinecop_class.cpp.

@tvatter
Copy link
Collaborator

tvatter commented Mar 9, 2017

Alright, we can merge then!

@tnagler tnagler merged commit bcc9368 into master Mar 9, 2017
@tnagler tnagler deleted the vinecop-structsel branch March 10, 2017 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants