Small modification of the constructor in ranvec1.h #6

Closed
ghost opened this issue Dec 27, 2021 · 3 comments

ghost commented Dec 27, 2021

Hi,

I had to make a small addition to the constructors of the Ranvec1 class. I couldn't use the provided init(int seed, ...) functions to set the desired seeds because I use the generator as a static thread_local object, so in this case the seeding has to happen in the constructor. Could this be changed in your repository? It might be useful in other projects as well.

thanks

#if defined(DNN_AVX512BW) || defined(DNN_AVX512)
	typedef Vec16f VecFloat;
	constexpr auto VectorSize = 16ull;
#elif defined(DNN_AVX2) || defined(DNN_AVX)
	typedef Vec8f VecFloat;
	constexpr auto VectorSize = 8ull;
#elif defined(DNN_SSE42) || defined(DNN_SSE41)
	typedef Vec4f VecFloat;
	constexpr auto VectorSize = 4ull;
#endif

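// Returns a vector of floats in which each lane is 1 with probability p, otherwise 0.
inline static auto BernoulliVecFloat(const Float p = Float(0.5)) noexcept
{
	// One generator per thread, seeded in the constructor with Seed<int>() and a hash of the thread id.
	static thread_local auto generator = Ranvec1(Seed<int>(), static_cast<int>(std::hash<std::thread::id>()(std::this_thread::get_id())), 3);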
#if defined(DNN_AVX512BW) || defined(DNN_AVX512)
	return select(generator.random16f() < p, VecFloat(1), VecFloat(0));
#elif defined(DNN_AVX2) || defined(DNN_AVX)
	return select(generator.random8f() < p, VecFloat(1), VecFloat(0));
#elif defined(DNN_SSE42) || defined(DNN_SSE41)
	return select(generator.random4f() < p, VecFloat(1), VecFloat(0));
#endif
}
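
The constraint described above (a static thread_local object can only be set up through its constructor) could also be met without editing the library header, by wrapping the generator in a small derived class whose constructor calls init. The following is a minimal sketch under stated assumptions: `ToyGen`, `SeededToyGen`, `threadSeed` and `nextRandom` are hypothetical names, and `ToyGen` is a simple xorshift stand-in that only mimics the construct-then-init pattern. It is not the real Ranvec1 class.

```cpp
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical stand-in for a generator with two-step setup (construct, then init).
// It only mimics the usage pattern; it is not the real Ranvec1.
class ToyGen {
public:
    explicit ToyGen(int gtype = 3) : gtype_(gtype), state_(0) {}

    void init(int seed1, int seed2) {
        state_ = (static_cast<std::uint64_t>(static_cast<std::uint32_t>(seed1)) << 32)
               | static_cast<std::uint32_t>(seed2);
        if (state_ == 0) state_ = 0x9E3779B97F4A7C15ull;  // xorshift state must be nonzero
    }

    std::uint64_t next() {  // xorshift64 step
        state_ ^= state_ << 13;
        state_ ^= state_ >> 7;
        state_ ^= state_ << 17;
        return state_;
    }

    bool seeded() const { return state_ != 0; }
    int gtype() const { return gtype_; }

private:
    int gtype_;
    std::uint64_t state_;
};

// Wrapper whose constructor performs the seeding, so the wrapped class stays unmodified.
struct SeededToyGen : ToyGen {
    SeededToyGen(int seed1, int seed2, int gtype = 3) : ToyGen(gtype) { init(seed1, seed2); }
};

// Per-thread seed component derived from the thread id, as in the function above.
inline int threadSeed() {
    return static_cast<int>(std::hash<std::thread::id>()(std::this_thread::get_id()));
}

inline std::uint64_t nextRandom(int baseSeed = 12345) {
    // Constructed and seeded on the first call in each thread.
    static thread_local SeededToyGen generator(baseSeed, threadSeed());
    return generator.next();
}
```

Whether this would work with Ranvec1 itself depends on its init functions being callable from a derived constructor, which is an assumption here.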

Small addition to the constructors (ranvec1.h, beginning at line 290):

/******************************************************************************
        Ranvec1: Class for combined random number generator

Make one instance of Ranvec1 for each thread.
Remember to initialize it with a seed. 
Each instance must have a different seed if you want different random sequences
******************************************************************************/

// Combined random number generator. Derived class with various output functions
// (Total size depends on INSTRSET and MAX_VECTOR_SIZE)
class Ranvec1 : public Ranvec1base {
public:
    // Constructor
    Ranvec1(int gtype = 3) : Ranvec1base(gtype), buf32(this), buf64(this), buf128(this)
#if MAX_VECTOR_SIZE >= 256
    , buf256(this)
#endif
#if MAX_VECTOR_SIZE >= 512
    , buf512(this)
#endif
    {
        randomixInterval = randomixLimit = 0;
    }
    Ranvec1(int seed1, int gtype = 3) : Ranvec1base(gtype), buf32(this), buf64(this), buf128(this)
#if MAX_VECTOR_SIZE >= 256
        , buf256(this)
#endif
#if MAX_VECTOR_SIZE >= 512
        , buf512(this)
#endif
    {
        randomixInterval = randomixLimit = 0;
        Ranvec1base::init(seed1);
        resetBuffers();
    }
    Ranvec1(int seed1, int seed2, int gtype = 3) : Ranvec1base(gtype), buf32(this), buf64(this), buf128(this)
#if MAX_VECTOR_SIZE >= 256
    , buf256(this)
#endif
#if MAX_VECTOR_SIZE >= 512
    , buf512(this)
#endif
    {
        randomixInterval = randomixLimit = 0;
        Ranvec1base::init(seed1, seed2);
        resetBuffers();
    }
    Ranvec1(int32_t const seeds[], int numSeeds, int gtype = 3) : Ranvec1base(gtype), buf32(this), buf64(this), buf128(this)
#if MAX_VECTOR_SIZE >= 256
        , buf256(this)
#endif
#if MAX_VECTOR_SIZE >= 512
        , buf512(this)
#endif
    {
        randomixInterval = randomixLimit = 0;
        Ranvec1base::initByArray(seeds, numSeeds);
        resetBuffers();
    }
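
The seeded constructors above repeat the same initializer list and body. One possible way to reduce that repetition is C++11 delegating constructors, sketched below on a hypothetical stand-in class (`ToyRanvec` and its members are illustrative, not part of ranvec1.h). The sketch also gives gtype no default value in the seeded overloads: with defaults as written above, a call with a single int could match both `Ranvec1(int gtype = 3)` and `Ranvec1(int seed1, int gtype = 3)`, which would be ambiguous.

```cpp
#include <cstdint>

// Hypothetical stand-in; the real Ranvec1 has a base class and several buffers.
class ToyRanvec {
public:
    // Unseeded constructor: performs the common setup once.
    explicit ToyRanvec(int gtype) : gtype_(gtype), seedCount_(0), mix_(0) {}

    // Seeded constructors delegate the common setup, then seed.
    // gtype has no default here, so each argument count selects exactly one overload.
    ToyRanvec(int seed1, int gtype) : ToyRanvec(gtype) { init(seed1); }
    ToyRanvec(int seed1, int seed2, int gtype) : ToyRanvec(gtype) { init(seed1, seed2); }
    ToyRanvec(std::int32_t const seeds[], int numSeeds, int gtype) : ToyRanvec(gtype) {
        initByArray(seeds, numSeeds);
    }

    void init(int seed1) { seedCount_ = 1; mix_ = mixIn(0u, seed1); }
    void init(int seed1, int seed2) { seedCount_ = 2; mix_ = mixIn(mixIn(0u, seed1), seed2); }
    void initByArray(std::int32_t const seeds[], int numSeeds) {
        seedCount_ = numSeeds;
        mix_ = 0;
        for (int i = 0; i < numSeeds; ++i) mix_ = mixIn(mix_, seeds[i]);
    }

    int gtype() const { return gtype_; }
    int seedCount() const { return seedCount_; }
    std::uint32_t state() const { return mix_; }

private:
    // Toy seed mixing (FNV-1a style step); only here so the seeds have a visible effect.
    static std::uint32_t mixIn(std::uint32_t h, std::int32_t seed) {
        return (h ^ static_cast<std::uint32_t>(seed)) * 16777619u;
    }

    int gtype_;
    int seedCount_;
    std::uint32_t mix_;
};
```

With this layout, the call in BernoulliVecFloat (two seeds followed by gtype 3) would select the three-int overload, as it does with the constructors above.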

AgnerF (Contributor) commented Dec 27, 2021

It is not efficient to make a thread-local static object. It is more efficient to construct the object in the thread function as explained in the manual (ranvec1_manual.pdf example 2.2).

Your BernoulliVecFloat function can take a reference to the ran object as a parameter, or BernoulliVecFloat could be a member of a class that inherits from ran.
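
As a rough illustration of the first suggestion, the sketch below constructs the generator as a local object inside the thread function, seeds it after construction, and passes it by reference to the Bernoulli function. It is a scalar sketch under stated assumptions: `std::mt19937` stands in for Ranvec1, and `bernoulli`, `worker` and `runWorkers` are hypothetical names, not taken from the manual.

```cpp
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Bernoulli draw that receives the generator by reference instead of owning a thread_local one.
// std::mt19937 is a stand-in; with ranvec1 the parameter would be a reference to the ran object.
inline int bernoulli(std::mt19937& gen, float p = 0.5f) {
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    return uniform(gen) < p ? 1 : 0;
}

// Thread function: the generator is an ordinary local object.
inline void worker(unsigned threadIndex, unsigned baseSeed, int draws, int& onesOut) {
    std::mt19937 gen;                  // construct first ...
    gen.seed(baseSeed + threadIndex);  // ... then seed, with a different seed per thread
    int ones = 0;
    for (int i = 0; i < draws; ++i) ones += bernoulli(gen);
    onesOut = ones;
}

// Starts numThreads workers and returns how many ones each of them drew.
inline std::vector<int> runWorkers(unsigned numThreads, unsigned baseSeed, int draws) {
    std::vector<int> ones(numThreads, 0);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t)
        threads.emplace_back(worker, t, baseSeed, draws, std::ref(ones[t]));
    for (auto& th : threads) th.join();
    return ones;
}
```

Because the seed is computed after construction, no seeding constructor is needed in this arrangement.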

Your solution is OK if the performance is satisfactory. My design does not have initialization in the constructor because there are three different initialization methods, and it may be convenient for the programmer to calculate the seed after constructing the ran object.

I can add an extra constructor with a seed parameter in the next version.

ghost (Author) commented Jan 11, 2022

Thank you very much for your reply and sorry for the long wait!
I followed your advice and changed my code as explained in the ranvec1 manual.
It works perfectly and the performance is very good, but I must admit I don't notice a real increase in execution speed compared to the thread_local static object. If I wrote a dedicated benchmark comparing the two, I would probably see some difference in efficiency. It's certainly not slower.

AgnerF (Contributor) commented Jul 16, 2022

done

AgnerF closed this as completed Jul 16, 2022