Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

divsufsort64 is slower than divsufsort #21

Open
louisabraham opened this issue Feb 23, 2020 · 10 comments
Open

divsufsort64 is slower than divsufsort #21

louisabraham opened this issue Feb 23, 2020 · 10 comments

Comments

@louisabraham
Copy link

louisabraham commented Feb 23, 2020

For very small strings (10), the difference is about 2x. For strings of moderate size (1e7), the difference in time is about 10%.
I have no idea why and the difference is significant.

@kloetzl
Copy link
Contributor

kloetzl commented Feb 24, 2020

For the long strings that might be a simple case of caching effects. Two times larger indices require twice the memory, filling up the CPU caches faster. However, that should not affect strings of length 10.

@louisabraham
Copy link
Author

Hi, my original observation were made in Python (using ctypes).

I can confirm with this code in C:

#include "divsufsort.h"
#include "divsufsort64.h"

#include <time.h>
#include <stdio.h>

int main(){
    char s[6] = "banana";
    
    int64_t suf64[6];
    int32_t suf[6];
    
    int n = 100;
    
    clock_t beg, t1=0, t2=0;
    
    
    for(int rep=0;rep<10; rep++){

        beg = clock();
        for(int _=0;_<n;_++)
            divsufsort(s, suf, 6);   
        t1 += clock() - beg;
        
        beg = clock();
        for(int _=0;_<n;_++)
            divsufsort64(s, suf64, 6);   
        t2 += clock() - beg;
    }
    
    printf("divsufsort:   %d\ndivsufsort64: %d\n", t1, t2);
}

An example run gives:

divsufsort:   792718
divsufsort64: 1031990

I'm using a very slow machine, you might want to change n to reproduce.

@sisong
Copy link

sisong commented Feb 25, 2020

“a very slow machine” “difference in time”
because of 32bit cpu?

@louisabraham
Copy link
Author

No, no, I know a bit how computers work :)

I have a 64 bit ARM processor.

@sisong
Copy link

sisong commented Feb 25, 2020

test your code on my macos:

divsufsort:   122951
divsufsort64: 166432

swap to sais.hxx:

saisxx(s,suf,(int32_t)6):   799
saisxx(s,suf64,(int64_t)6): 801

@kloetzl
Copy link
Contributor

kloetzl commented Feb 25, 2020

swap to sais.hxx:

saisxx(s,suf,(int32_t)6): 799
saisxx(s,suf64,(int64_t)6): 801

These numbers smell fishy. My guess is that the sais function is only in the header, but after being called its result is not used. Thus those numbers represent the overhead of clock() not the SA construction. For reliable measurements you may want to try https://github.com/google/benchmark

@louisabraham
Copy link
Author

I don't understand

the sais function is only in the header, but after being called its result is not used

Why would it behave differently from divsufsort?

@kloetzl
Copy link
Contributor

kloetzl commented Feb 25, 2020

The divsufsort function is in a different compilation unit, actually in a dynamic library. Thus, at the time of compilation, the compiler does not know what it does. It could print the whole string, for all it knows, so it cannot remove the call without changing the observable behavior of the program.

@louisabraham
Copy link
Author

I see, you mean that since sais is a header only library, the compiler is able to optimize the calls.
The issue is still the same for libdivsufsort.

@louisabraham
Copy link
Author

For what it's worth, I just reproduced this benchmark on my M1 machine.

#include "divsufsort.h"
#include "divsufsort64.h"

#include <time.h>
#include <stdio.h>
#include <stdlib.h>

int LEN = 1000000;
int main()
{
    char *s = (char *)malloc(LEN * sizeof(char));
    for (int i = 0; i < LEN; i++)
        s[i] = 'a';

    int64_t *suf64 = (int64_t *)malloc(LEN * sizeof(int64_t));
    int32_t *suf = (int32_t *)malloc(LEN * sizeof(int32_t));

    int n = 10;

    clock_t beg, t1 = 0, t2 = 0;

    for (int rep = 0; rep < 10; rep++)
    {

        beg = clock();
        for (int _ = 0; _ < n; _++)
            divsufsort(s, suf, LEN);
        t1 += clock() - beg;

        beg = clock();
        for (int _ = 0; _ < n; _++)
            divsufsort64(s, suf64, LEN);
        t2 += clock() - beg;
    }

    printf("divsufsort:   %d\ndivsufsort64: %d\n", t1, t2);

    // Don't forget to free the memory!
    free(s);
    free(suf64);
    free(suf);

    return 0;
}

The compiler command is gcc -o test test.c -Iinclude -Llib -ldivsufsort -ldivsufsort64.

divsufsort:   444204
divsufsort64: 451407

The gap disappears for at LEN = 1e6. For LEN = 1e5, there is still a small difference:

divsufsort:   50847
divsufsort64: 55236

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants