Custom Replay Parsing (uploads) #830

Closed
sozuoka opened this issue Jan 15, 2016 · 29 comments · Fixed by #843

@sozuoka

sozuoka commented Jan 15, 2016

Back in the early days, YASP had an option for users to upload their own replays, but that feature was later removed. Is it possible to re-add it? It's a valuable feature for small communities, since it lets them parse their private lobby matches. I think that if it's an exclusive feature for Cheese users, it won't put too much stress on YASP's servers.

TODO:

  • build frontend form on request page to accept files
  • accept uploads on server side (web.js), set max upload size to prevent abuse
  • figure out storage of replay blobs (Redis)
  • extend parser.js/runParse.js to accept uploaded replays (not downloaded from a URL)
  • figure out storage of result blobs and modify parser to insert into Redis
  • figure out reconstruction of API fields
  • figure out how to route to the match and display using same templates as regular matches (avoid duplicating code), but reading from Redis
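A minimal sketch of the upload/storage bullets above, assuming an Express route with multer and a Redis key prefix that are purely illustrative (not the actual web.js code):

```js
// Hypothetical sketch: accept a replay upload in web.js and stash the raw bytes in Redis.
// The /upload route, multer usage, and key prefix are assumptions for illustration only.
const express = require('express');
const multer = require('multer');
const crypto = require('crypto');
const redis = require('redis');

const app = express();
const redisClient = redis.createClient();

// Keep the file in memory and cap the size to prevent abuse.
const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 100 * 1024 * 1024 }, // 100MB cap (the figure mentioned later in the thread)
});

app.post('/upload', upload.single('replay'), (req, res) => {
  // Key the blob by its MD5 so re-uploads of the same file map to the same match.
  const key = crypto.createHash('md5').update(req.file.buffer).digest('hex');
  // Short TTL: the parser only needs the blob long enough to parse it.
  redisClient.set('upload_blob:' + key, req.file.buffer, 'EX', 3600, (err) => {
    if (err) return res.status(500).send('failed to store replay');
    res.json({ key: key });
  });
});
```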
@howardchung
Member

It's not as much the load as it is that the API doesn't have data for these matches. We use the API for the "Overview" tab which we expect to be there on all matches.

Even when we allowed uploads, it did not work on lobbies. It was mainly useful for parsing replays that had already expired but that someone had downloaded first.

Options:

  • Client-side parse:
    • We have the replay parsed in the browser via JavaScript and displayed in JS views. The data would reside completely on the user's computer (no upload), so we don't have to worry about inaccurate/weird match data going into the DB.
    • However, you wouldn't be able to share a link to the match. You'd have to do something like save the HTML or take screenshots.
    • The annoying part is having to rebuild all the views since the Jade templates will not work for client-parsed data. The parser would also have to be rewritten to use rapier (the JS parsing library I never finished).
  • Server-side parse:
    • We'd have to regenerate the API data from the replay parse, and it probably wouldn't be completely accurate.
    • Bit of a privacy concern since the players lose anonymity (option: make all the account_ids anonymous so we don't screw up aggregations or reveal identities).
    • We have to handle bad inputs gracefully (people trying to upload really big files, etc.).
    • If we did this I would like to sandbox these from the rest of the matches so we don't mess up aggregations, etc.
      • For example, someone could play and win a ton of lobby matches by getting their opponents to throw, then mass-upload these to inflate their stats.

I'll leave it in the backlog, feel free to share ideas.

@howardchung howardchung added this to the Backlog milestone Jan 15, 2016
@howardchung
Member

Server side is probably more practical.

@howardchung
Member

After some more thought, we can probably do the following:

  • Upload replay as binary blob into redis
  • Extend parse jobs to accept inputs stored as binary blobs
  • Save uploaded parse into temporary storage (Redis also?)
  • Make these parses reachable through their own url, maybe something like (/uploads/random string)
  • Do best-effort computation of the overview data from the parse

The uploaded parse would only be available for however long we set the retention to, and would be sandboxed from aggregations.
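A rough sketch of the second bullet above (extending a parse job to read its input from a Redis blob instead of a URL); the job payload shape and key prefix are assumptions, not the actual runParse.js code:

```js
// Hypothetical sketch: let the parse job pull its input either from a replay URL
// (the normal path) or from a Redis blob (the upload path). Payload shape is assumed.
const redis = require('redis');
const request = require('request');

const redisClient = redis.createClient({ return_buffers: true });

function getReplayData(job, cb) {
  if (job.payload.uploadKey) {
    // Uploaded replay: read the raw bytes that web.js stashed in Redis.
    redisClient.get('upload_blob:' + job.payload.uploadKey, cb);
  } else {
    // Regular replay: download it from the replay URL as before.
    request({ url: job.payload.url, encoding: null }, (err, resp, body) => cb(err, body));
  }
}
```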

Does that meet your needs?

@howardchung howardchung changed the title Custom Replay Parsing Custom Replay Parsing (uploads) Jan 16, 2016
@sozuoka
Author

sozuoka commented Jan 16, 2016

That would be great! At first I didn't think this task would require that much work on your end, and I really appreciate your reply. Setting a limited storage time for parsed custom replays is the right choice too, since YASP already has to manage a lot of public replays.

@howardchung howardchung modified the milestones: 2015-5, Backlog Jan 17, 2016
@howardchung howardchung self-assigned this Jan 17, 2016
@howardchung
Member

We have to reconstruct some fields from the replay data since we won't have API access for lobby matches:

```
{
  result: {
    error: "Practice matches are not available via GetMatchDetails"
  }
}
```

Things we have to patch back into the match from the replay:

  • hero_id (otherwise you can't figure out which hero goes in each row)
  • duration (purchase/kill timelines rely on this to know how many columns to render)

Things we could do optionally:

  • kills
  • deaths
  • assists
  • last_hits
  • denies
  • gpm
  • xpm
  • items (more work, since the parser does not handle items at the moment)

Things we probably won't:

  • account_id (privacy issue)
  • game_mode
  • lobby_type
  • cluster
  • ability_upgrades (hard)
  • tower_status/barracks_status
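
A rough sketch of that reconstruction, with the parser-output field names being assumptions for illustration:

```js
// Hypothetical sketch: rebuild the fields the API would normally supply using only
// the parsed replay data. The property names on `parsed` are assumptions.
function reconstructMatch(parsed) {
  return {
    duration: parsed.duration, // timelines need this to know how many columns to render
    players: parsed.players.map(function (p) {
      return {
        hero_id: p.hero_id, // required to tie each table row to a hero
        // Best-effort extras pulled from the parse; anything missing stays undefined.
        kills: p.kills,
        deaths: p.deaths,
        assists: p.assists,
        last_hits: p.last_hits,
        denies: p.denies,
        gold_per_min: p.gold_per_min,
        xp_per_min: p.xp_per_min,
        // account_id is deliberately omitted for privacy; game_mode, lobby_type,
        // cluster, ability_upgrades, and tower/barracks status are not reconstructed.
      };
    }),
  };
}
```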

@sozuoka
Author

sozuoka commented Jan 18, 2016

I don't know much about this, but logically account_id, game_mode, and lobby_type are out of the question due to the nature of practice matches/private lobbies. But is ability_upgrades really that hard to parse?

@howardchung
Member

Account ID is actually possible. The replay data contains the information. I believe it has game mode as well.

For ability upgrades presumably the data is there but we would be starting from scratch since that's something we haven't had to implement before. @spheenik may know more.

@howardchung howardchung modified the milestones: 2015-3, 2015-5 Jan 19, 2016
@howardchung
Member

Almost done. Need to make the form look nice and integrate it with the existing request form. Also needs the Kubernetes migration in order to give parse nodes access to Redis.

@spheenik

Do you still need me? I don't know what you mean by ability upgrade...

@howardchung
Member

Skill build order
On Jan 20, 2016 1:23 PM, "Martin Schrodt" notifications@github.com wrote:

Do you still need me? I don't know what you mean by ability upgrade...

@spheenik

You mean the time when an ability was skilled? Should be easy...
Listen for OnEntityUpdated, check whether the entity is an ability, and if so, check whether one of the updated properties is the ability's level; if it is, emit a record.

@sozuoka
Author

sozuoka commented Jan 21, 2016

That's good to hear; I'm really looking forward to testing this new feature myself!

@howardchung
Member

I think for now we will just leave most of the overview blank. I'm guessing you're more interested in the parse data, and you can get the overview from the client anyway.

@howardchung
Member

Pull request made. Let's get some eyes on the code. We can't deploy to production until the kube migration is complete.

@sozuoka
Author

sozuoka commented Jan 22, 2016

I have two small questions: 1) As I understand it, hero damage/tower damage is normally included in the data given by the game API. Does that mean that if we want it for custom replay parsing, hero damage/tower damage has to be "manually" calculated? 2) Is it possible to make a movement heatmap for the whole game's duration (instead of only the first 10 minutes, as it is right now)? This is actually my suggestion for the public parser too (unless it puts too much load on your servers, of course).

@howardchung
Member

  1. Yes
  2. Full game position data is expensive to store because it takes up a lot of space.

@howardchung
Member

I was able to add some of the data such as level, K, D, A, gpm, xpm. Some is still missing such as HD, TD, HH, items, ability upgrades, and tower status.

@howardchung
Member

Current design:

  • We take the uploaded file and MD5 hash it to generate a key.
  • We use this key for storing the replay blob in Redis, the JSON match blob, and for routing the user to the match.
  • Replay blob expires after an hour.
  • JSON match blob expires after 7 days.
  • parser.js "mocks" the URL by serving the Redis blob through an Express server (localhost)
  • Example: /matches/99bd628c72e82b5e8524ae661ca417d7
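
A small sketch of the flow described above; the port, route names, view locals, and key prefixes are assumptions, not the actual code:

```js
// Hypothetical sketch of the "mock URL" idea: a localhost route serves the Redis blob
// so the existing download-from-URL parse path can be reused, and the finished parse
// is routed by the same MD5 key.
const express = require('express');
const redis = require('redis');

const redisClient = redis.createClient({ return_buffers: true });
const app = express();

// The parse worker is pointed at http://localhost:5100/blob/<md5> instead of a Valve URL.
app.get('/blob/:key', (req, res) => {
  redisClient.get('upload_blob:' + req.params.key, (err, buf) => {
    if (err || !buf) return res.status(404).end();
    res.send(buf);
  });
});

// Uploaded matches are served by MD5 key, e.g. /matches/99bd628c72e82b5e8524ae661ca417d7.
app.get('/matches/:key', (req, res, next) => {
  redisClient.get('upload_parse:' + req.params.key, (err, json) => {
    if (err || !json) return next(); // fall through to the normal match route
    res.render('match', { match: JSON.parse(json) }); // reuse the existing match templates
  });
});

app.listen(5100);
```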

@sozuoka
Author

sozuoka commented Jan 28, 2016

From what I can tell, that's quite a lot of progress so far :D But IMO only 1 hour of replay blob storage is a bit too tight; do you think extending it to 2 or 3 hours is possible?

@howardchung
Member

We don't need the blob for very long, just long enough to parse it.

@howardchung
Member

http://colab-sbx-244.oit.duke.edu:5000/request if you want to beta test

@sozuoka
Author

sozuoka commented Jan 29, 2016

Oh wow, this is much better than I expected. I would say you can release this feature as it is right now, but regardless, I still have a few questions:
1 - I don't know about Hero Healing and Tower Damage, but isn't it easier to calculate hero damage by summing the stats provided by the combat log, like in the Combat tab?
2 - Is it possible to add the drafting info (the ban/pick order and the specific heroes) for Captains Mode? This is actually what I had in mind when I first requested the feature.
3 - Are you planning to add the ability upgrades, or just omit them completely? Personally I think it would be great if we could have them, but it's not a top priority either.

@howardchung
Member

This requires Redis access on parse nodes so we need to complete the Kubernetes migration first. I think we're going to try to do that this weekend.

  1. I guess we can attempt to sum the hero damage from the combat log, but in general that differs from the damage reported in the API for some reason. I'd rather be consistent, and people can manually sum the damage if they want a ballpark figure.
  2. I think the replay does have drafting information in the "epilogue" packet, but it needs processing to extract since it's in a weird format. Maybe I'll look into it later.
  3. I don't really want to add the ability upgrades data. It adds quite a bit of overhead to the parser since we need to check ability entities (which we don't do at all right now).
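
For point 1, a hedged sketch of what summing hero damage from the parsed combat log might look like (the combat log entry fields are assumptions, and as noted above the total won't exactly match the API figure):

```js
// Hypothetical sketch: total hero damage per attacker from parsed combat log entries.
// The entry shape (type, attackername, targethero, targetillusion, value) is assumed.
function heroDamageTotals(combatLog) {
  const totals = {};
  combatLog.forEach(function (entry) {
    // Only count damage events whose target is a real hero (not an illusion).
    if (entry.type === 'DAMAGE' && entry.targethero && !entry.targetillusion) {
      totals[entry.attackername] = (totals[entry.attackername] || 0) + entry.value;
    }
  });
  return totals; // e.g. { npc_dota_hero_axe: 14230, ... }
}
```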

@muratsu

muratsu commented Jan 29, 2016

Did you guys think about malicious demo files? I'm a bit concerned about trusting 3rd parties completely. I'm pretty sure demo parsers weren't coded with security in mind.

@howardchung
Member

Might be a question for @spheenik. I don't think there is any risk in running any binary through clarity. It'll just crash if it's malformed--we're not treating the uploaded file as an executable.

@muratsu

muratsu commented Jan 29, 2016

I'm more concerned about denial-of-service type attacks (infinite loops, etc.).

@howardchung
Member

As far as I know clarity will keep advancing the file stream pointer as it parses, so this can't happen unless the file is of infinite size.

I set a limit of 100MB on uploads. Some replays are larger, but those are generally 2h+ games, which are pretty rare.

@spheenik

I am certain clarity can be crashed in a gazillion ways when handed malicious data.
A Dota replay is a very fragile thing: a single bit off, and you probably can't recover.

I have a hard time thinking of real exploits. Most times, it'll just run into an error, and stop processing.
You could certainly craft data that will make clarity read as much data as you specified, but with the upload limit, each attempt is limited to 100MB. Making it loop forever and eat cpu uselessly might also be possible, but one would have to take a look at how to craft it.

@howardchung
Member

I think we're going to roll this out next week.
