Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should the DB columns be indexed? I noticed none of the example code was. #7

Closed
Skylion007 opened this issue Apr 25, 2016 · 3 comments

Comments

@Skylion007
Copy link

Skylion007 commented Apr 25, 2016

So, I am trying to experiment on using RecDB to deal with a MASSIVE database with about 170 million entries and have had some rather slow query times, as you can imagine. I was wondering if indexing userID or itemID columns of my database would provide any noticeable speed advantages since none of the examples scripts appeared to create indexes on the DB. Also for a DB of a 170 million reviews what would you expect the runtime for creating the APRIORI recommender? Hours? Days? Weeks?

Any information would be extremely helpful.

@Sarwat
Copy link
Collaborator

Sarwat commented Apr 25, 2016

Have you tried creating a recommender apriori. This will reduce the response time tremendously. Creating a recommender on a 1 million ratings takes about ~200 seconds on a machine with 3.6 Ghz Quad-Core processor, 16 GB RAM, 500 GB storage, and running Ubuntu Linux 12.04. Make sure to run the experiments on a machine with large memory for 170 million ratings. Otherwise, PostgreSQL will run out of memory.

@Skylion007
Copy link
Author

I see. Are you aware of which settings in PostgreSQL I can use to optimize my DB for the certain eventuality where we will have to rely heavily on PostgreSQL tmp files? One of the real attractions of RecDB is the ability to complete queries, all be it rather slowly, by writing data to disk meaning the massive amounts RAM would be unnecessary for non-time sensitive recommendation queries. Maybe I'll try to create the recommenders APRIORI using a high ram EC2 instance and then copy the DB to run the recommenders locally. Any other suggestions? Thank you so much for your informative and speedy reply.

@Sarwat
Copy link
Collaborator

Sarwat commented Apr 25, 2016

You need to set up the PostgreSQL buffer size to be as large as possible.
Once the recommender is created, the recommendation queries are not supposed to take too much memory or time to run. Plus, you can totally pre-compute the recommendation queries answer and store them in a different table for your application to access them directly (and faster).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants