Custom Replay Parsing (uploads) #830

Closed
sozuoka opened this issue Jan 15, 2016 · 29 comments · Fixed by #843

@sozuoka

sozuoka commented Jan 15, 2016

Back in the early days, YASP had an option for users to upload their own replays, but that feature was later removed. Is it possible to re-add it? It's a valuable feature for small communities, since it lets them parse their private lobby matches. I think that if it's an exclusive feature for Cheese users, it won't put too much stress on YASP's servers.

TODO:

  • build frontend form on request page to accept files
  • accept uploads on server side (web.js), set max upload size to prevent abuse
  • figure out storage of replay blobs (Redis)
  • extend parser.js/runParse.js to accept uploaded replays (not downloaded from a URL)
  • figure out storage of result blobs and modify parser to insert into Redis
  • figure out reconstruction of API fields
  • figure out how to route to the match and display using same templates as regular matches (avoid duplicating code), but reading from Redis
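A minimal sketch of the upload/storage bullets above, assuming an Express route with multer and a Redis key prefix that are purely illustrative (not the actual web.js code):

```js
// Hypothetical sketch: accept a replay upload in web.js and stash the raw bytes in Redis.
// The /upload route, multer usage, and key prefix are assumptions for illustration only.
const express = require('express');
const multer = require('multer');
const crypto = require('crypto');
const redis = require('redis');

const app = express();
const redisClient = redis.createClient();

// Keep the file in memory and cap the size to prevent abuse.
const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 100 * 1024 * 1024 }, // 100MB cap (the figure mentioned later in the thread)
});

app.post('/upload', upload.single('replay'), (req, res) => {
  // Key the blob by its MD5 so re-uploads of the same file map to the same match.
  const key = crypto.createHash('md5').update(req.file.buffer).digest('hex');
  // Short TTL: the parser only needs the blob long enough to parse it.
  redisClient.set('upload_blob:' + key, req.file.buffer, 'EX', 3600, (err) => {
    if (err) return res.status(500).send('failed to store replay');
    res.json({ key: key });
  });
});
```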
@howardchung
Member

It's not as much the load as it is that the API doesn't have data for these matches. We use the API for the "Overview" tab which we expect to be there on all matches.

Even when we allowed uploads, it did not work on lobbies. It was mainly useful for parsing replays that had already expired but that someone had downloaded first.

Options:

  • Client-side parse:
    • We have the replay parsed in the browser via JavaScript and displayed in JS views. The data would reside completely on the user's computer (no upload), so we don't have to worry about inaccurate/weird match data going into the DB.
    • However, you wouldn't be able to share a link to the match. You'd have to do something like save the HTML or take screenshots.
    • The annoying part is having to rebuild all the views since the Jade templates will not work for client-parsed data. The parser would also have to be rewritten to use rapier (the JS parsing library I never finished).
  • Server-side parse:
    • We'd have to regenerate the API data from the replay parse, and it probably wouldn't be completely accurate.
    • Bit of a privacy concern since the players lose anonymity (option: make all the account_ids anonymous so we don't screw up aggregations or reveal identities).
    • We have to handle bad inputs gracefully (people trying to upload really big files, etc.).
    • If we did this I would like to sandbox these from the rest of the matches so we don't mess up aggregations, etc.
      • For example, someone could play and win a ton of lobby matches by getting their opponents to throw, then mass-upload these to inflate their stats.

I'll leave it in the backlog, feel free to share ideas.

@howardchung howardchung added this to the Backlog milestone Jan 15, 2016
@howardchung
Member

Server side is probably more practical.

@howardchung
Member

After some more thought, we can probably do the following:

  • Upload replay as binary blob into redis
  • Extend parse jobs to accept inputs stored as binary blobs
  • Save uploaded parse into temporary storage (Redis also?)
  • Make these parses reachable through their own url, maybe something like (/uploads/random string)
  • Do best-effort computation of the overview data from the parse

The uploaded parse would only be available for however long we set the retention to, and would be sandboxed from aggregations.
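A rough sketch of the second bullet above (extending a parse job to read its input from a Redis blob instead of a URL); the job payload shape and key prefix are assumptions, not the actual runParse.js code:

```js
// Hypothetical sketch: let the parse job pull its input either from a replay URL
// (the normal path) or from a Redis blob (the upload path). Payload shape is assumed.
const redis = require('redis');
const request = require('request');

const redisClient = redis.createClient({ return_buffers: true });

function getReplayData(job, cb) {
  if (job.payload.uploadKey) {
    // Uploaded replay: read the raw bytes that web.js stashed in Redis.
    redisClient.get('upload_blob:' + job.payload.uploadKey, cb);
  } else {
    // Regular replay: download it from the replay URL as before.
    request({ url: job.payload.url, encoding: null }, (err, resp, body) => cb(err, body));
  }
}
```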

Does that meet your needs?

@howardchung howardchung changed the title Custom Replay Parsing Custom Replay Parsing (uploads) Jan 16, 2016
@sozuoka
Author

sozuoka commented Jan 16, 2016

That would be great! At first I didn't think this task would require that much work on your end, and I really appreciate your reply. Setting a limited storage time for parsed custom replays is the right choice too, since YASP already has to manage a lot of public replays.

@howardchung howardchung modified the milestones: 2015-5, Backlog Jan 17, 2016
@howardchung howardchung self-assigned this Jan 17, 2016
@howardchung
Member

We have to reconstruct some fields from the replay data since we won't have API access for lobby matches:

```
{
  result: {
    error: "Practice matches are not available via GetMatchDetails"
  }
}
```

Things we have to patch back into the match from the replay:

  • hero_id (otherwise you can't figure out which hero goes in each row)
  • duration (purchase/kill timelines rely on this to know how many columns to render)

Things we could do optionally:

  • kills
  • deaths
  • assists
  • last_hits
  • denies
  • gpm
  • xpm
  • items (more work, since the parser does not handle items at the moment)

Things we probably won't:

  • account_id (privacy issue)
  • game_mode
  • lobby_type
  • cluster
  • ability_upgrades (hard)
  • tower_status/barracks_status
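
A rough sketch of that reconstruction, with the parser-output field names being assumptions for illustration:

```js
// Hypothetical sketch: rebuild the fields the API would normally supply using only
// the parsed replay data. The property names on `parsed` are assumptions.
function reconstructMatch(parsed) {
  return {
    duration: parsed.duration, // timelines need this to know how many columns to render
    players: parsed.players.map(function (p) {
      return {
        hero_id: p.hero_id, // required to tie each table row to a hero
        // Best-effort extras pulled from the parse; anything missing stays undefined.
        kills: p.kills,
        deaths: p.deaths,
        assists: p.assists,
        last_hits: p.last_hits,
        denies: p.denies,
        gold_per_min: p.gold_per_min,
        xp_per_min: p.xp_per_min,
        // account_id is deliberately omitted for privacy; game_mode, lobby_type,
        // cluster, ability_upgrades, and tower/barracks status are not reconstructed.
      };
    }),
  };
}
```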

@sozuoka
Author

sozuoka commented Jan 18, 2016

I don't know much about this, but logically account_id, game_mode, and lobby_type are out of the question due to the nature of practice matches/private lobbies. But is ability_upgrades really that hard to parse?

@howardchung
Member

Account ID is actually possible. The replay data contains the information. I believe it has game mode as well.

For ability upgrades presumably the data is there but we would be starting from scratch since that's something we haven't had to implement before. @spheenik may know more.

@howardchung howardchung modified the milestones: 2015-3, 2015-5 Jan 19, 2016
@howardchung
Member

Almost done. Need to make the form look nice and integrate it with the existing request form. Also needs the Kubernetes migration in order to give parse nodes access to Redis.

@spheenik

Do you still need me? I don't know what you mean by ability upgrade...

@howardchung
Member

Skill build order
On Jan 20, 2016 1:23 PM, "Martin Schrodt" notifications@github.com wrote:

Do you still need me? I don't know what you mean by ability upgrade...

@spheenik

You mean the time when an ability was skilled? Should be easy...
Listen for OnEntityUpdated, check whether the entity is an ability, and if so, check whether one of the updated properties is the ability's level; if it is, emit a record.

@sozuoka
Author

sozuoka commented Jan 21, 2016

That's good to hear; I'm really looking forward to testing this new feature myself!

@howardchung
Member

I think for now we will just leave most of the overview blank. I'm guessing you're more interested in the parse data, and you can get the overview from the client anyway.

@howardchung
Member

Pull request made. Let's get some eyes on the code. We can't deploy to production until the kube migration is complete.

@sozuoka
Author

sozuoka commented Jan 22, 2016

I have two small questions: 1) As I understand it, hero damage/tower damage is normally included in the data given by the game API. Does that mean that if we want it for custom replay parsing, hero damage/tower damage has to be "manually" calculated? 2) Is it possible to make a movement heatmap for the whole game's duration (instead of only the first 10 minutes, as it is right now)? This is actually my suggestion for the public parser too (unless it puts too much load on your servers, of course).

@howardchung
Member

  1. Yes
  2. Full game position data is expensive to store because it takes up a lot of space.

@howardchung
Member

I was able to add some of the data such as level, K, D, A, gpm, xpm. Some is still missing such as HD, TD, HH, items, ability upgrades, and tower status.

@howardchung
Member

Current design:

  • We take the uploaded file and MD5 hash it to generate a key.
  • We use this key for storing the replay blob in Redis, the JSON match blob, and for routing the user to the match.
  • Replay blob expires after an hour.
  • JSON match blob expires after 7 days.
  • parser.js "mocks" the URL by serving the Redis blob through an Express server (localhost)
  • Example: /matches/99bd628c72e82b5e8524ae661ca417d7
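
A small sketch of the flow described above; the port, route names, view locals, and key prefixes are assumptions, not the actual code:

```js
// Hypothetical sketch of the "mock URL" idea: a localhost route serves the Redis blob
// so the existing download-from-URL parse path can be reused, and the finished parse
// is routed by the same MD5 key.
const express = require('express');
const redis = require('redis');

const redisClient = redis.createClient({ return_buffers: true });
const app = express();

// The parse worker is pointed at http://localhost:5100/blob/<md5> instead of a Valve URL.
app.get('/blob/:key', (req, res) => {
  redisClient.get('upload_blob:' + req.params.key, (err, buf) => {
    if (err || !buf) return res.status(404).end();
    res.send(buf);
  });
});

// Uploaded matches are served by MD5 key, e.g. /matches/99bd628c72e82b5e8524ae661ca417d7.
app.get('/matches/:key', (req, res, next) => {
  redisClient.get('upload_parse:' + req.params.key, (err, json) => {
    if (err || !json) return next(); // fall through to the normal match route
    res.render('match', { match: JSON.parse(json) }); // reuse the existing match templates
  });
});

app.listen(5100);
```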

@sozuoka
Author

sozuoka commented Jan 28, 2016

From what I can tell, that's quite a lot of progress so far :D But IMO only 1 hour of replay blob storage is a bit too tight; do you think extending it to 2 or 3 hours is possible?

@howardchung
Member

We don't need the blob for very long, just long enough to parse it.

@howardchung
Member

http://colab-sbx-244.oit.duke.edu:5000/request if you want to beta test

@sozuoka
Author

sozuoka commented Jan 29, 2016

Oh wow, this is much better than I expected. I would say you can release this feature as it is right now, but regardless, I still have a few questions:
1 - I don't know about Hero Healing and Tower Damage, but isn't it easier to calculate hero damage by summing the stats provided by the combat log, like in the Combat tab?
2 - Is it possible to add the drafting info (the ban/pick order and the specific heroes) for Captains Mode? This is actually what I had in mind when I first requested the feature.
3 - Are you planning to add the ability upgrades, or just omit them completely? Personally I think it would be great if we could have them, but it's not a top priority either.

@howardchung
Member

This requires Redis access on parse nodes so we need to complete the Kubernetes migration first. I think we're going to try to do that this weekend.

  1. I guess we can attempt to sum the hero damage from the combat log, but in general that differs from the damage reported in the API for some reason. I'd rather be consistent, and people can manually sum the damage if they want a ballpark figure.
  2. I think the replay does have drafting information in the "epilogue" packet, but it needs processing to extract since it's in a weird format. Maybe I'll look into it later.
  3. I don't really want to add the ability upgrades data. It adds quite a bit of overhead to the parser since we need to check ability entities (which we don't do at all right now).
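
For point 1, a hedged sketch of what summing hero damage from the parsed combat log might look like (the combat log entry fields are assumptions, and as noted above the total won't exactly match the API figure):

```js
// Hypothetical sketch: total hero damage per attacker from parsed combat log entries.
// The entry shape (type, attackername, targethero, targetillusion, value) is assumed.
function heroDamageTotals(combatLog) {
  const totals = {};
  combatLog.forEach(function (entry) {
    // Only count damage events whose target is a real hero (not an illusion).
    if (entry.type === 'DAMAGE' && entry.targethero && !entry.targetillusion) {
      totals[entry.attackername] = (totals[entry.attackername] || 0) + entry.value;
    }
  });
  return totals; // e.g. { npc_dota_hero_axe: 14230, ... }
}
```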

@muratsu

muratsu commented Jan 29, 2016

Did you guys think about malicious demo files? I'm a bit concerned about trusting 3rd parties completely. I'm pretty sure demo parsers weren't coded with security in mind.

@howardchung
Member

Might be a question for @spheenik. I don't think there is any risk in running any binary through clarity. It'll just crash if it's malformed--we're not treating the uploaded file as an executable.

@muratsu

muratsu commented Jan 29, 2016

I'm more concerned about denial-of-service type attacks (infinite loops, etc.).

@howardchung
Member

As far as I know clarity will keep advancing the file stream pointer as it parses, so this can't happen unless the file is of infinite size.

I set a limit of 100MB on uploads. Some replays are larger, but those are generally 2h+ games, which are pretty rare.

@spheenik

I am certain clarity can be crashed in a gazillion ways when handed malicious data.
A Dota replay is a very fragile thing: a single bit off, and you probably can't recover.

I have a hard time thinking of real exploits. Most times, it'll just run into an error, and stop processing.
You could certainly craft data that will make clarity read as much data as you specified, but with the upload limit, each attempt is limited to 100MB. Making it loop forever and eat cpu uselessly might also be possible, but one would have to take a look at how to craft it.

@howardchung
Member

I think we're going to roll this out next week.
