Description
I am using patsy as a key dependency in a stats project, and I found myself needing to identify which variables are categorical after constructing a dataframe using patsy formulas.
After an attempt using regexps ("...now you have two problems..."), I read "Model specification for experts and computers" a few times and spent a lot of time poking around in X.design_info (where y, X = dmatrices(formula, data, return_type='dataframe')). Thankfully, I ended up with something much shorter and more robust than my regexp attempt.
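For readers unfamiliar with design_info, here is a minimal sketch of this kind of lookup. It is not the function described in this issue. It assumes patsy 0.4 or later, where design_info.factor_infos maps each factor to a FactorInfo whose type is either "numerical" or "categorical". The helper name categorical_factor_names and the toy data are hypothetical, chosen only for illustration.

```python
import pandas as pd
from patsy import dmatrices


def categorical_factor_names(design_info):
    """Return the names of all factors that patsy treated as categorical.

    Hypothetical helper: relies on DesignInfo.factor_infos, whose values
    are FactorInfo objects with a `type` attribute.
    """
    return [
        factor.name()
        for factor, info in design_info.factor_infos.items()
        if info.type == "categorical"
    ]


# Toy data (illustrative only): one numeric and one string-valued predictor.
data = pd.DataFrame(
    {
        "y": [1.0, 2.0, 3.0, 4.0],
        "x": [0.5, 1.5, 2.5, 3.5],
        "g": ["a", "b", "a", "b"],
    }
)

y, X = dmatrices("y ~ x + g", data, return_type="dataframe")
print(categorical_factor_names(X.design_info))
```

The names returned are factor names (the code of each formula factor, such as g or C(g)), not design-matrix column names such as g[T.b]. Mapping factors to columns would need further information from design_info, for example term_codings or term_name_slices.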
I have two questions:
- I'm still not sure whether I've used the internal details of X.design_info correctly. It does what I want, but there are places where multiple things provide the same info. I'd love to have someone "in the know" look at the function and tell me if I should make a different choice. Is there a way to do this? (The function is about 60 lines with comments and about 30 without.)
- Is there any interest in having something like this contributed back to the project? I've already commented and unit tested the function, and I'm happy to make sure the final comments and tests conform to your norms and standards. I skimmed the issues before posting, and, for example, issue #155 ("patsy equivalent of R's all.vars") might benefit from my function (not exactly the same, but perhaps close enough).