March data dump #881
Comments
We'll probably do another dump in March.
On Tue, Feb 16, 2016 at 1:11 PM, klop notifications@github.com wrote:
> Maybe with skill data this time!
With skill data would be awesome.
Do we want to make this a quarterly or semiannual thing?
Pushing back because we're doing the import right now.
Posting to say this would be good quarterly (unless you get the BigQuery thing updating live). Will you post a blog post when the next dump happens?
If it were up to me I'd probably do semiannual, but if @albertcui wants to do it quarterly I won't say no (he's the one having to export/upload the data anyway). Regarding future dumps: we could also get away with not keeping snapshots in Google (that would save nearly $100 a month). ETA for the import is 10-15 days.
That'd be great: I'd love to be able to query a DB about matches (like the official API, but not limited to the last x hundred games). If I have to download a massive file first, that's not really a problem. I take it opening up an API of your own would have too high a bandwidth overhead?
Yeah, APIs are expensive to operate.
I'm very interested in using the MMR data for machine learning. Is it included in this data dump? I suspect one can estimate a player's MMR to a very high accuracy.
@albertcui are you planning to dump player_ratings? Or perhaps export a "snapshot" of current MMR data?
I think it would be nice if the dumps were somewhat synchronized with the Majors. That way they could be released at known intervals and roughly line up with big updates.
Import is done. Been talking with @albertcui about doing a full dump this time (with every match ever played). We'd dump matches, player_matches, and match_skill as CSV. Users would have to join the data themselves.
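For anyone who ends up joining the three CSVs by hand, a minimal pandas sketch along these lines should work; the filenames and the match_id join key are assumptions, since the exact dump schema isn't spelled out in this thread:

```python
# Minimal sketch: join the dumped CSVs locally with pandas.
# Filenames and the match_id join key are assumptions about the dump schema.
import pandas as pd

matches = pd.read_csv("matches.csv")
player_matches = pd.read_csv("player_matches.csv")
match_skill = pd.read_csv("match_skill.csv")

# One row per player per match, with match-level and skill columns attached.
joined = (
    player_matches
    .merge(matches, on="match_id", how="left")
    .merge(match_skill, on="match_id", how="left")
)
print(joined.head())
```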
Sounds good.
@albertcui I put sample queries in the OP. You may want to try them locally on your devbox first to make sure they work properly.
Just got too frustrated and wanted to take a break from this, especially since I was traveling. Getting back home tomorrow, going to build a new PC with a bigger disk, will loop back around on this and learn more about torrents sometime this month. I might even try to host my own torrent tracker; it might be a good learning experience.
If you just put the blobs on Google Cloud Storage and shared the download links, would you be able to pay for the download bandwidth/storage on your personal account? Or should we wait until we get a torrent working before making a blog post/public announcement?
I know you guys haven't finished dealing with the original dump, but is there any chance of a fresh one? Like the last month or so. It would be really useful given the dramatic changes in 7.00.
Unfortunately the old code we used for dumps doesn't work anymore since the move to Cassandra. No telling when we'll be able to get a new migration script working. I think @waprin wants to eventually get something set up where match data is directly streamed to BigQuery. If we get that working then it would probably be the best place to obtain fresh data dumps.
I'm unfamiliar with Cassandra, but there seems to be a CAPTURE command that would export the result of queries. I couldn't find any details about its performance. Maybe that could do it? I really like the idea of streaming data to BigQuery, but it doesn't seem to be happening any time soon. The original issue was about just a slice of data. I'm looking for the same thing a year later and it seems to be even further from happening.
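For what it's worth, the same kind of export (query results written to CSV) can also be done from the DataStax Python driver instead of cqlsh's CAPTURE. A rough sketch, where the host, keyspace, table, and column names are placeholders rather than OpenDota's actual schema:

```python
# Rough sketch: export Cassandra query results to CSV with the DataStax Python
# driver, as an alternative to cqlsh's CAPTURE. Host, keyspace, table, and
# column names below are placeholders, not OpenDota's actual schema.
import csv

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("yasp")  # keyspace name is a placeholder

rows = session.execute("SELECT match_id, data FROM matches LIMIT 1000")

with open("matches_export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["match_id", "data"])
    for row in rows:
        writer.writerow([row.match_id, row.data])

cluster.shutdown()
```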
Depending on how many matches you need, you can use the API to fetch the match data for randomly selected matches in the time window you want.
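Sampling matches through the public API could look roughly like the sketch below; the OpenDota endpoint is `/api/matches/{match_id}`, and the match ID range used for sampling is a made-up placeholder for whatever time window you care about:

```python
# Rough sketch: fetch randomly sampled matches from the OpenDota API.
# The ID range below is a made-up placeholder for the desired time window.
import random
import time

import requests

API = "https://api.opendota.com/api/matches/{}"

sampled_ids = random.sample(range(2_000_000_000, 2_050_000_000), 100)

matches = []
for match_id in sampled_ids:
    resp = requests.get(API.format(match_id))
    if resp.status_code == 200:
        matches.append(resp.json())
    time.sleep(1)  # be polite to the public API
```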
I just ordered a new HDD so I can seed from home as soon as it arrives. Torrent does seem like the best option. I would also like to stream matches directly from the API into BigQuery, so I'll look into that.
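Streaming matches into BigQuery could be as simple as the sketch below, assuming a table has already been created; the project/dataset/table names and the row shape are placeholders, not OpenDota's actual schema:

```python
# Rough sketch: stream match rows into an existing BigQuery table.
# The table id and row shape are placeholders, not OpenDota's actual schema.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.opendota.matches"  # hypothetical table

rows = [
    {"match_id": 2900000001, "duration": 2150, "radiant_win": True},
]

errors = client.insert_rows_json(table_id, rows)  # streaming insert
if errors:
    print("BigQuery reported insert errors:", errors)
```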
I've recently scraped all Dota matches for January and I'm making it available as a torrent. It's 33 million matches, not counting the Dark Moon ones.
Downloaded and created torrent files from #881 (comment). Files:
@albertcui, I believe my links should be good to go as long as you upload the files to academictorrents. It should resolve this error in my client:
Is there anyone seeding the files from @bippum, and is there some sample data available? I've got plenty of space and a home server that can seed 24/7 at 200 Mbit/s, but I first need to download the data :). I will leave the torrents provided above running for now, hoping that someone can share them. If there are other torrents available, please let me know.
The OP has small sample datasets. @albertcui can you please upload the torrents to academictorrents?
I've uploaded matches + match_skill. It won't let me upload player_matches: "Sorry, the piece length is too small. The torrent file must be less than 2MB. Increase your piece length to lower the file size" :( For reference, all the torrents are in this collection: http://academictorrents.com/collection/opendota-formerly-yasp-data-dumps
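The 2MB limit makes sense once you remember that a v1 .torrent stores a 20-byte SHA-1 hash per piece, so the piece count (and therefore the piece length) is what drives the metadata size. A quick back-of-the-envelope check, using a made-up dump size rather than the real one:

```python
# Back-of-the-envelope: a v1 .torrent stores a 20-byte SHA-1 hash per piece,
# so larger pieces mean fewer hashes and a smaller .torrent file.
# The file size below is a made-up example, not the actual dump size.
file_size = 120 * 1024**3  # pretend the dump is ~120 GiB

for piece_mib in (1, 2, 4, 8):
    piece_len = piece_mib * 1024**2
    n_pieces = -(-file_size // piece_len)   # ceiling division
    hashes_mib = n_pieces * 20 / 1024**2    # 20 bytes per piece hash
    print(f"{piece_mib} MiB pieces -> ~{hashes_mib:.1f} MiB of piece hashes")
```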
I'll attempt to create the torrent again within the day. |
Thanks, sorry for the delay. Did uploading the other ones fix the error? |
Yes it did. |
Great, thanks guys, I'm currently downloading the files and will continue to seed them :). Good work! |
Glad to hear you are able to download them OK. I switched torrent clients (from Transmission to Deluge) after having difficulty getting Transmission to do anything, let alone upload. Now I wake up to see that match_skill and matches are seeding! I updated the player_matches link with a 4 MiB piece torrent, @albertcui if you could upload that one to academictorrents? Thanks. If this one doesn't work I can try creating it again with 8 MiB pieces. |
Awesome, once they're all up we can write a release blog post and then we can finally close this! :) |
I've uploaded it here: http://academictorrents.com/details/1a0c5736bb54610ad00a45306df2b33628301409 |
Blog post published: |
Hi there guys! I'm currently downloading the "OpenDota - All Matches from March 2016 - Matches" |
Hello, would anyone be willing to seed player_matches.gz? Haven't been able to find a seeder for a week now and could really use this data for a project! Thanks in advance! |
The copy I had got corrupted during a transfer between hard drives. I no longer have the original files from the amazon cloud drive location, and they aren't obtainable either. Sorry for the inconvenience. |
Alright no problem, thanks for the response! |
I wonder if anyone can still seed matches.gz and player_matches.gz? I'm interested in the data for a research project. Thanks. |
Is there any way to get a dump of 6.86 matches only? All I could find were the 500k Dec 2015 and 3.5M dumps.
https://storage.googleapis.com/dota-match-dumps/matches_small.csv
https://storage.googleapis.com/dota-match-dumps/player_matches_small.csv
https://storage.googleapis.com/dota-match-dumps/match_skill.csv