Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add getLowerTimestamp() and getHigherTimestamp() #140

Closed
wants to merge 21 commits into from

Conversation

davidabian
Copy link

@davidabian davidabian commented Jun 25, 2018

Add public functions getLowerTimestamp() and getHigherTimestamp() together with their tests and private functions. They will be used in WikibaseQualityConstraints to fix https://phabricator.wikimedia.org/T195226.

Add functions getLowerTimestamp() and getHigherTimestamp(). They'll be used in WikibaseQualityConstraints.
PS: I've never programmed in PHP nor I'm familiar with this repository, so please feel free to tell me what I may be doing wrong.
@lucaswerkmeister
Copy link
Member

Can you add some tests for this in TimeValueCalculatorTest.php? I haven’t looked at this in detail yet, but I doubt we’d merge this without tests anyways, and they’ll also make it easier to understand the patch and review it :)

@JeroenDeDauw
Copy link
Contributor

JeroenDeDauw commented Jun 25, 2018

This does not follow the WMDE code style which is used in this repo. If you use PhpStorm, then you can find the definition of the style here: https://github.com/wmde/intellij-settings. If not, perhaps someone can run their own PhpStorm formatting on the code for you.

@davidabian
Copy link
Author

Thanks!

@davidabian
Copy link
Author

davidabian commented Jul 2, 2018

Please let's not forget this proposal, :-) we need it for https://phabricator.wikimedia.org/T195226, and I need to resolve that task to start https://phabricator.wikimedia.org/T141859.

What changes should I make now? Ping @thiemowmde.

@thiemowmde
Copy link
Contributor

Can you please link the relevant Phabricator tickets in the commit message, as well as link from the relevant Phabricator tickets to this patch?

As far as I can see this patch contains a lot of changes that are unrelated to what the commit message talks about:

  • Please do reformatting of unrelated code in a separate patch. This helps a lot in reviewing the two independently.
  • Do the constants TIMESTAMP_ZERO and HIGHEST_TIMESTAMP need to be public?
  • Technically the latest possible timestamp is +9999999999999999-12-31T23:59:61Z, as we allow up to two leap seconds.
  • What was wrong with the documentation of getTimestamp? As far as I can see all you did was adding a newline and reflowing the paragraph. Why?
  • Please replace strcmp( substr( $timestamp, 0, 1 ), '-' ) === 0 with something that reads better, e.g. $timestamp[0] === '-'.

I find the "WARNING: Day 31 will be applied to all months" problematic. What it essentially means is that this code is not yet ready, as it will calculate bad timestamps that don't reflect the precision properly. For example, a timestamp at 2018-03-31 with a range of 1 month will probably calculate a range that starts at – I think – 2018-02-31. PHP converts this to 2018-03-04, which is not 1 month before the original date.

As I don't know where exactly this code is supposed to be used, it's hard to suggest any better approach. Why don't you do the calculations where you need them to be done?

@davidabian
Copy link
Author

Thanks for your comments, @thiemowmde!

As I don't know where exactly this code is supposed to be used, it's hard to suggest any better approach. Why don't you do the calculations where you need them to be done?

Because, to calculate the lower and the higher bounds given a Unix timestamp and a certain precision, you would need to go back to the ISO 8601 format. The constraints use Unix timestamps (currently, getTimestamp()); however, these calculations can't be done with Unix timestamps since, with them, you couldn't know how many seconds you would have to add or substract. After the proposed changes on the TimeValueCalculator, I would replace the use of getTimestamp() with getLowerTimestamp() and getHigherTimestamp() (see RangeCheckerHelper.php). Does this make sense? I think we could even deprecate getTimestamp(), but that's a different story.

I'll check and fix the rest of the issues that you write. Thanks again!

@davidabian
Copy link
Author

Sorry for so many commits. I can't use the function cal_days_in_month() on HHVM. Is that a problem?

Copy link
Contributor

@thiemowmde thiemowmde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note I'm no longer responsible for this codebase. However, I will happily provide input, as long as it is helpful.

I also realized the TimeValueCalculator class is no longer used for the project it was originally written for, but exclusively used in WikibaseQualityConstraints. This means the team responsible for WikibaseQualityConstraints should feel free to do any change they like to this library. :-)

1,
-$this->charsAffectedByPrecision( TimeValue::PRECISION_YEAR ) - 1
);
$daysInMonth = \cal_days_in_month( CAL_GREGORIAN, $month, $year );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cal_days_in_month will fail and return false for many years the Wikibase datatype allows, but PHP does not understand (e.g. the year -100000). I suggest to rely on the same logic as in getSecondsSinceUnixEpoch: $dummyYear = $this->isLeapYear( $year ) ? 1972 : 1970.

Also please add test cases for extreme positive and negative years.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll do it. Thanks!

* Highest positive timestamp strictly earlier than the lowest positive timestamp with
* a length of $MAX_LENGTH_TIMESTAMP + 1.
*/
private $HIGHEST_TIMESTAMP = '+9999999999999999-12-31T23:59:59Z';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel this needs to be 61 seconds, even if I'm not sure if this is relevant for anything, because wikidata.org currently does not allow precisions smaller than a day anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works as expected using 59 seconds. Next minute is one second from this moment according to Unix timestamps.

// WARNING: Day 31 will be applied to all months
$timestampCeiling = substr( $timestamp, 0, -$numCharsToModify ) .
substr( $this->HIGHEST_TIMESTAMP, -$numCharsToModify );
if ( $precision === TimeValue::PRECISION_MONTH ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test cases for this month-specific code path.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a test for this in testGetLowerTimestamp() and another in testGetHigherTimestamp(). But yes, it would be good to have more, I'll add them.

*
* @return int
*/
private function charsAffectedByPrecision( $precision ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand this method tells me how many characters can be cut off from the end of the timestamp string. Lets say the given timestamp is +2017-12-31T23:59:59Z:

  • To get a timestamp with 1 minute precision, you must cut 3 or 4 characters from the end.
  • To get a timestamp with 1 year precision, you must cut 15 or 16 characters, which results in +2017- or +2017.
  • To get a timestamp with 10 years precision, you must cut 17 characters, which results in +201. No option here.

I found the numbers confusing first, but can confirm them with this analysis. :-)

Maybe add some explanation to the documentation header of this function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the number of characters from the timestamp string that we should replace with the ones in $TIMESTAMP_ZERO or $HIGHEST_TIMESTAMP. In other words, the lowest number of irrelevant characters in the timestamp string (being irrelevant for exceeding the precision).

I'll add this to the documentation header. Congrats for the analysis, but you could have asked first! ;-)

TimeValue::CALENDAR_GREGORIAN
);
$unixHigherTimestampAfter2 = $timeValueCalculator->getHigherTimestamp( $timeValueAfter2 );
$this->assertLessThan( $unixHigherTimestampAfter2, $unixHigherTimestampAfter1 );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see the vast majority of the tests relies on "greater/less than" calculations, but never does strict comparisons. I think this is a problem and hides a lot of possible issues. For example, a timestamp with a precision of a million years might as well give you a bunch of Unix timestamps that are all only a few seconds apart. No assertion here can ever catch this. :-(

Copy link
Member

@lucaswerkmeister lucaswerkmeister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I don’t have merge rights in this repository, so someone else will still have to take a look at this :/

* Highest positive timestamp strictly earlier than the lowest positive timestamp with
* a length of $MAX_LENGTH_TIMESTAMP + 1.
*/
private $HIGHEST_TIMESTAMP = '+9999999999999999-12-31T23:59:59Z';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

* @param string $timestamp
* @param int $precision
*/
private function auxTestGetHigherTimestamp( $timestamp, $precision ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function used anywhere? I can’t find any uses of it in this pull request using in-browser search.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, sorry, I should have removed it.

'245-04-30T00:00:00Z',
'1054-02-11T14:00:02Z',
'2012-02-29T23:59:59Z',
#'2012-03-01T00:00:00Z',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this one commented out?

Copy link
Author

@davidabian davidabian Jul 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests fail when checking two consecutive BCE dates with the same year. This must be fixed, but I can't easily deal with BCE Unix timestamps, they weren't designed for that. I'm currently not sure what's wrong, the tests or the main functions. Or both. D:

* - higher before values for the same timestamp always correspond to lower or equal
* getLowerTimestamp() values than lower before values,
* - a few TimeValue values are equal to their expected getLowerTimestamp() ones.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if these tests were split up into a lot more functions. For example:

  • One test for checking specific TimeValues and their expected timestamps, with a @dataProvider (note that MediaWiki prefers names like provideTimeValuesWithLowerTimestamps) for the constants that are used below.
  • One test for comparing the direct timestamp with the lower timestamp, with a data provider for timestamps.
  • One test for comparing timestamps at different precisions, with the same data provider.
  • One test for comparing timestamps at different before values, with the same data provider.
  • One test for checking the ordering of timestamps, with a data provider that returns pairs of timestamps.

Note that the data providers can call each other; for example, given a provideTimestampsWithoutSign (same as your timestampWithoutSignProvider), you could have:

public function provideTimestamps() { // any order
    foreach ( [ '+', '-' ] as $sign ) {
        foreach ( $this->provideTimestampsWithoutSign() as $timestamp ) {
            yield [ $sign . $timestamp ];
        }
    }
}

public function provideTimestampsAndPrecisions() {
    foreach ( $this->provideTimestamps() as $timestamp ) {
        foreach ( $this->providePrecisions() as $precision ) {
            yield [ $timestamp, $precision ];
        }
    }
}

public function provideTimestampPairs() {
    foreach ( [ '+', '-' ] as $sign ) {
        $oldTimestamp = null;
        foreach ( $this->provideTimestampsWithoutSign() as $timestamp ) {
            if ( $oldTimestamp !== null ) {
                yield $sign === '+' ? [ $oldTimestamp, $timestamp ] : [ $timestamp, $oldTimestamp ];
            }
            $oldTimestamp = $timestamp;
        }
    }
}

Same thing goes for the getHigherTimestamp tests (they could probably reuse the same data providers).

Copy link
Member

@lucaswerkmeister lucaswerkmeister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me. I’ve also checked the values in provideTimeValuesAndLowerAndHigherTimestamps against GNU date, and the results look correct. If the tests pass, I’m reasonably confident the code is correct :)

Still not happy with the test setup though :P

* @return \Generator of arrays of:
* - a TimeValue,
* - its lower Unix timestamp, and
* - its Unix higher timestamp.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this should be consistent, either “lower/higher Unix timestamp” or “Unix lower/higher” timestamp, but not both :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. xD

public function provideTimeValuesAndLowerAndHigherTimestamps() {
$timeValuesAndLowerAndHigherTimestamps = [
[
new TimeValue( // BCE calculations can differ one year from external ones
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure what this comment means…

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on whether you consider year 0 or not, the same BCE timestamp could correspond to a year later. I'm assuming the behavior which is already used in Wikidata and implemented in getTimestamp() is the expected, but just for the record.

5999999999999999
],
];
yield $timeValuesAndLowerAndHigherTimestamps;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of yielding the whole array and then iterating over it in the test functions, why not return the array (or yield the individual elements)?

* @dataProvider provideTimeValuesAndLowerAndHigherTimestamps
*/
public function testSpecificLowerTimestamps() {
$timeValuesAndLowerTimestamps = func_get_args();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func_get_args() shouldn’t be necessary. With the change in the provider suggested above, the function should take three arguments: $timeValue, $unixLowerTimestamp and $unixHigherTimestamp. (The third would be ignored, unless you merge testSpecificHigherTimestamps() into this test. Either are fine by me.)

* @dataProvider provideTimeValuesAndLowerAndHigherTimestamps
*/
public function testSpecificHigherTimestamps() {
$timeValuesAndHigherTimestamps = func_get_args();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as in testSpecificLowerTimestamps() here.

$precision,
TimeValue::CALENDAR_GREGORIAN
);
yield [ [ $oldTimeValue, $timeValue ] ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should yield [ $oldTimeValue, $timeValue ] directly, I think, without wrapping it in another array.

public function provideTimestampsAndPrecisions() {
foreach ( $this->provideTimestamps() as $timestamp ) {
foreach ( $this->provideAllPrecisions() as $precision ) {
yield [ [ $timestamp, $precision ] ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here…

if ( $oldTimestamp !== null ) {
yield $sign === '+' ?
[ [ $oldTimestamp, $timestamp ] ] :
[ [ $timestamp, $oldTimestamp ] ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…and here.

public function testOrderingLowerTimestamps() {
$calculator = new TimeValueCalculator();
$timestampPairs = func_get_args();
foreach ( $timestampPairs as $timestampPair ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, func_get_args() and the loop should not be necessary. The generator should yield a single array of function arguments, and the test function should accept those directly. The same comment applies to the other test functions below as well.

@lucaswerkmeister
Copy link
Member

I’ve updated the data providers, but I can’t push to this pull request, so I’ve uploaded the patch here: https://gist.github.com/lucaswerkmeister/14a0b55d400345ac3374ee6c2fe14a5d

@davidabian
Copy link
Author

Thanks a lot for your time! Maybe @mariushoch can merge?

@addshore
Copy link
Contributor

Closing very old PRs

@addshore addshore closed this Mar 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants