Semantics of change_on_boundary in ceiling_date #443
Comments
As for defaults, I don't mind changing that. I find it ugly as well. But what is so "very^3" bad about it? It's just as bad or good as any other defaults in other functions that change the computation. |
I think it's a suboptimal name currently because it's very long, and it doesn't connect to any bigger concepts. Sure Having an option that changes the behaviour of computation is a terrible idea because it means an option set a very long way away in the code (potentially in another package) can affect computation. (i.e. if you ever use one of these functions in a package, you have to set it every time in order to ensure consistent behaviour). I feel very strongly about this because of many bad experiences with |
I have removed the global option. As to the name, that change has been released and I know for a fact that quite a few people are already using it (there are 5 related issues on our GitHub page alone). The naming was discussed in #390 and no one came up with a better name. I don't think a long name is a problem. It hardly matters with completion and you can always partial-match it in interactive work. I also think that if we decide to add similar option to Your interpretation of open/closed intervals is also not quite the right model of what's happening. For instance when ceiling To wrap up, if we are to change the name, the new one must be much, much better to be worth the pain of the change. So I am closing this one for now; feel free to re-open if you have further ideas. |
I don't use the global option personally. It's ok for me. Another solution would be to define a new function name with change-on-boundary set to TRUE… |
@hadley, how about |
Hmmm, let's think about this a bit more from first principles. It seems reasonable for repeated invocations of floor or ceiling to always return the same value, i.e.
So I think this does come down to the side of the interval that's closed. Ceiling is usually defined as min(n in Z, n >= x), but when |
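To make the idempotency argument concrete, here is a minimal sketch in Python (used only for illustration; `floor_month`, `ceiling_month` and `ceiling_month_changing` are hypothetical helpers, not lubridate functions, and naive timestamps are assumed). It compares the classical ceiling, the smallest boundary n >= x, with a variant that always moves forward when x already sits on a boundary.

```python
from datetime import datetime

def floor_month(t: datetime) -> datetime:
    """Largest month boundary <= t."""
    return datetime(t.year, t.month, 1)

def next_month(t: datetime) -> datetime:
    """First instant of the month after the one containing t."""
    if t.month == 12:
        return datetime(t.year + 1, 1, 1)
    return datetime(t.year, t.month + 1, 1)

def ceiling_month(t: datetime) -> datetime:
    """Classical ceiling: smallest month boundary >= t."""
    return t if t == floor_month(t) else next_month(t)

def ceiling_month_changing(t: datetime) -> datetime:
    """Variant that changes on the boundary: smallest month boundary > t."""
    return next_month(t)

x = datetime(2016, 8, 1, 12, 30)
once = ceiling_month(x)

# The classical ceiling and the floor are idempotent.
assert ceiling_month(once) == once
assert floor_month(floor_month(x)) == floor_month(x)

# The boundary-changing variant is not: every call advances one more month.
assert ceiling_month_changing(ceiling_month_changing(x)) != ceiling_month_changing(x)
```

The only difference between the two variants is whether the comparison against the boundary is `>=` or `>`, which is exactly the question of which side of the interval is closed.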
Oh that's just In which case does it even need to be an option? |
That would also fix the lack of symmetry with So I'd recommend deprecating |
Hmm. I wonder why didn't this occur to me :/. I was always computing it from the first principles. Deprecation seems to be the right way to go. Some efficiency will be lost but that's not a big deal. |
There is an inconvenience in this approach. First, not all ceiling units have a constructor counterpart ("bimonth", "halfyear"). Second, with multi-unit rounding you would need something like I am also doubting the readability of the code. For a person who is not aware of the boundary issue, this code might not be at all clear. |
Well, the insight should at least make it easier to implement, and I think it argues that However, it's not clear to me that this is such a common operation that lubridate needs to bend over backwards to make it happen, and I wonder if forcing people to realise this is a floor + increment might improve their mental model of the underlying data manipulation. |
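The "floor + increment" reading can be sketched as follows. This is a Python illustration under stated assumptions, not lubridate's implementation: `floor_month`, `add_months` and `ceiling_change_on_boundary` are hypothetical names, only single-month units are handled, and timestamps are naive.

```python
from datetime import datetime

def floor_month(t: datetime) -> datetime:
    """Round down to the first instant of the month containing t."""
    return t.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

def add_months(month_start: datetime, n: int) -> datetime:
    """Shift a first-of-month timestamp forward by n months."""
    index = month_start.year * 12 + (month_start.month - 1) + n
    return month_start.replace(year=index // 12, month=index % 12 + 1)

def ceiling_change_on_boundary(t: datetime) -> datetime:
    """Ceiling that always moves forward: floor, then add one unit."""
    return add_months(floor_month(t), 1)

print(ceiling_change_on_boundary(datetime(2016, 8, 1)))       # boundary input
print(ceiling_change_on_boundary(datetime(2016, 8, 17, 9)))   # mid-month input
```

Both calls land on the start of September 2016, so under this definition a value on the boundary is treated like any other value inside the month.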
I think this makes sense and is very much related to #459. There is no reason to stick to multiples of a unit either; one should be able to do
The use case is primarily for units that don't have natural 0 - months, weeks and multi-days. If your observation is on Monday, you would probably want to shift it towards next Monday together with all other observations within that week. Not changing on boundary rarely makes sense for weeks and months.
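The weekly use case can be illustrated with a small Python sketch. The helper names are hypothetical, weeks are assumed to start on Monday, and the flag only mimics the idea of `change_on_boundary`; it is not lubridate code.

```python
from datetime import date, timedelta

def floor_week(d: date) -> date:
    """Monday of the week containing d (date.weekday() is 0 for Monday)."""
    return d - timedelta(days=d.weekday())

def ceiling_week(d: date, change_on_boundary: bool) -> date:
    """Round d up to a Monday; the flag decides what happens when d is a Monday."""
    start = floor_week(d)
    if d == start and not change_on_boundary:
        return d
    return start + timedelta(days=7)

# Three observations from the same week: Monday, Wednesday, Sunday.
week = [date(2016, 8, 1), date(2016, 8, 3), date(2016, 8, 7)]

print([ceiling_week(d, change_on_boundary=True) for d in week])
print([ceiling_week(d, change_on_boundary=False) for d in week])
```

With the flag set, all three observations map to Monday 2016-08-08. Without it, the Monday observation stays on 2016-08-01 and is separated from the rest of its week, which is the behaviour described above as rarely making sense.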
I don't see why that would be. "closed" is a shortcut for "closed_restriction", "open" is for "open_restriction", "change" is for "change_on_boundary". I think your chain of thought starts with the ceiling formula as Note that open/closed in this definition refers to the restriction that starts with |
Not sure if my idea is relevant to your topic... One of my observations is that, at least for days, weeks, or months, when we talk about the monthly ceiling of 2016.08.01, what we really want to say is the ceiling of the middle of 2016.08.01, instead of the 2016.08.01 00:00:00. However, what I think this might be the root of the issue... |
That's a valid point and it applies to smaller units as well. Hour is also an interval not an instant. When you think about events occurring in the first hour after midnight you also think about the "middle" between 00:00 and 01:00. So if you would need to round |
@hadley, how about always change on boundary for "months", "weeks" and "days" and removing the argument altogether? I fail to see a single use case when current behavior for these units would be useful. The mathematical argument of I think this would solve the issue, if anyone would ever want the current behavior it can always be achieved as |
I think if you want to do that, you have to give the function a different name, because it's no longer implementing the ceiling. Thinking of it as an (] or a [) interval seems totally natural to me, but obviously not to anyone else 😓 |
A new function is a good idea, and it's what I do for myself, actually. The name could be something like |
This is open to debate and entirely depends on our confounding of If you define
I wouldn't be that sure about that, but we might indeed not have a better solution. Two virtually identical functions, one of which seems to be doing the wrong thing and is unusable for practical applications, will likely bring more confusion. |
Well technically speaking floating point numbers are also intervals that we discuss as if they were points 😉 |
Not sure what you are trying to say by this. The set of machine floating numbers is well defined and each of them is a point on a real line. Date In any case, I am afraid we need to reach a conclusion on this. Would you have a proposal for a new function? Unless we can find a good name for an extra function I will be changing the semantics. It's all about tradeoffs and my balance is heavily leaning towards changing the semantics. The month ceiling issue has been running for too long. I would like to settle it once and for all. |
I feel very strongly that your proposed change would not be a ceiling function. |
The current version is not a ceiling function either. Technically In any case, this issue is directly connected with the new POSIX <> Date comparison so I am delaying the release till we are fully clarified on the semantics of Date <> POSIX interface. |
I have checked for how other languages deal with rounding. In Java world, C++ boost.date_time and cctz, and python's dateutils don't deal with rounding either. So, leaving alone the thorny issue of converting Dates to Instants, even for ceiling of instants there is little to build upon. |
I agree. Also, maybe the way of treating |
I think @shrektan, your intuition is right. The stuff is confusing due to an implicit convolution of several things. I am preparing a formal statement on the semantics of lubridate operations which I will be including into the official docs. The problem that you are referring to is the forward looking nature of 1-based units (day,months). If months and dates were 0-based then |
Why is 2001-01-01 00:00 not part of 2001-01-01? |
If something has started at 0 and ended at 0, did it happen? If you include 0 in your time measurement then an abstract event that lasted for 0 seconds must have happened. This doesn't make sense IMO. On the other hand, inclusion of the upper bound in the measurement always makes sense. When you measure 1 meter, do you think of it as an upper open interval? When you run for 10sec would you think of it as 9.9999999... seconds? I couldn't find any reference to open/closed time intervals on the web. ISO 8601 seems to explicitly avoid defining it in mathematical terms. Leaving out 0 provides a clean way to generalize ceiling functions for sets, and it would work both for higher day/month units and smaller HMS units. I don't see such a definition for right closed interval. While R/lubridate don't support partials other than Date, thinking about more "friendly" partials like hours and minutes can help to gain more insight. Would you round up the first hour of the day to 00:00:00 or to 01:00:00? If the latter, then the same should hold for the first day of the month. |
I don't understand the reasoning. A day starts at a particular time and ends before the start of the next day. 9.999999 repeating is usually considered to be the same as 10. |
I think it would be useful if you used standard interval notation. I think we might be arguing about whether a day is (00:00, 23:59] or [00:00, 23:59). Maybe to make it more clear: is a day (0, 86400] or [0, 86400)? |
You are measuring days (just like kilos, meters or whatever else). Starting from the unix origin you have to measure the passage of first day, then second. etc. So if you accept that first day is (0, 24:00:00] and not [0, 24:00:00) then you should accept the same for all the days that followed.
Right, so it doesn't even make sense to say that mass of water in 1liter is open from above - 1kg). If you say you measured 1kg, it's 1kg. Why can't you say the same for days? 1 day is exactly 86400 seconds from 0 with the upper bound included, just like 1kg is 1000g from 0.
Yes, or (00:00:00, 24:00:00] vs [00:00:00, 24:00:00). This mathematical aside adds rigor, but it adds only a small point to the argument. I think intuitive and practical arguments should have already made it clear what the right solution should be. |
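The two conventions under discussion differ only at the boundary instants. A minimal Python sketch, assuming whole seconds counted from the origin and hypothetical helper names, shows which day a given second is assigned to under each one:

```python
SECONDS_PER_DAY = 86400

def day_index_left_closed(t: int) -> int:
    """Day k covers [k * 86400, (k + 1) * 86400): midnight opens the new day."""
    return t // SECONDS_PER_DAY

def day_index_right_closed(t: int) -> int:
    """Day k covers (k * 86400, (k + 1) * 86400]: midnight closes the old day."""
    # -(-t // n) is the integer ceiling of t / n.
    return -(-t // SECONDS_PER_DAY) - 1

for t in (0, 1, 86399, 86400, 86401):
    print(t, day_index_left_closed(t), day_index_right_closed(t))
```

The two functions agree everywhere except at exact multiples of 86400. There the left-closed convention puts the instant into the day that is starting, and the right-closed convention puts it into the day that is ending, so second 0 falls into day -1 under (0, 86400].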
Or, a day ends at a particular time and the next day starts after the end of the previous day ;) We are used to treating 00:00:00 as part of the day, but that doesn't necessarily make it useful or meaningful. Can it be part of the day if the first second of that day hasn't started ticking yet? |
Let's say INT is a partial like
This above definition has some interesting properties:
The 3rd implication is definitely lubridate's stretch, but I guess it's exactly what users would expect. It also provides a way to reconcile ceiling with the Date<>POSIX comparison, for which we are confounding date with the 00:00 instant. I think this is the most uniform solution to all our problems. So I would suggest setting change_on_boundary to NULL and making it follow the above semantics. For backward compatibility I would still leave the TRUE and FALSE options. Who knows who would ever need them. |
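One way to read this three-state proposal is sketched below in Python. It is an interpretation, not the definition given in this thread: the rule that a date (a whole-day partial) always rounds up while an instant sitting exactly on a boundary stays put is an assumption, `None` stands in for NULL, and `ceiling_month` and `next_month_start` are hypothetical names.

```python
from datetime import date, datetime

def next_month_start(year: int, month: int) -> datetime:
    """First instant of the month following (year, month)."""
    return datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)

def ceiling_month(x, change_on_boundary=None):
    """Month ceiling for a date or a naive datetime.

    change_on_boundary=None (assumed default): dates always move to the next
    month start, instants on a boundary are left unchanged.
    True / False force one behaviour for both kinds of input.
    """
    is_instant = isinstance(x, datetime)  # datetime subclasses date, so test it first
    if change_on_boundary is None:
        change_on_boundary = not is_instant
    instant = x if is_instant else datetime(x.year, x.month, x.day)
    month_start = datetime(x.year, x.month, 1)
    if instant == month_start and not change_on_boundary:
        result = month_start
    else:
        result = next_month_start(x.year, x.month)
    return result if is_instant else result.date()

print(ceiling_month(date(2016, 8, 1)))                            # date, default
print(ceiling_month(datetime(2016, 8, 1)))                        # instant, default
print(ceiling_month(date(2016, 8, 1), change_on_boundary=False))  # forced old behaviour
```

Keeping the explicit True and False branches alongside the default mirrors the backward-compatibility point made above.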
I have converged on a simple algorithmic solution that IMO achieves the right tradeoff between rigor, intuitiveness and usefulness. Due to current date<>instant conversions and R's Date class limitations whichever way we would define ceiling is bound to be problematic in one regard or another.
|
Why is the date interval open ended on both sides ( |
That's intentional. It's just a representation of an interval without focus on open/closed boundaries. Now I think That asymmetric open/closed interval idea brought more complications than it solved. From a measure-theoretic perspective it doesn't matter anyway - the measure of (0,1) and [0, 1] is 1, but defining ceiling in terms of an asymmetric interval causes problems with the definition of I think the mental model that most people have in mind is that days are part of the body of the month like in
So rounding up of |
Ok, that sounds reasonable to me. |
This patch caused some tears. "Bug compatibility" matters. But please when you change the behavior of a function either, Sure, I can read the changelog. Of course, I should not trust lubridate and validate its correct behavior with a test suite. Remember all that the day when R people decide that read.csv can delete the file after reading, because it is already in memory. And finally thanks for providing a handy library. |
I'm reasonably certain that change_on_boundary is really about whether the intervals are [) or (]. By correspondence with cut(), right would be a better name.

I also think it's a really really really bad idea to make the default value an option.