Add support to filter task / job counters #160

piyushnarang · 2017-04-05T23:39:19Z

We noticed that when a job has a very large number of tasks (10K+), requesting task counters results in a very large payload. This PR adds a filter parameter (includeCounter) to the job / task endpoints so that users can specify the set of counters they're interested in. It works like the existing include clause. The response payload then contains only the requested counters plus any fields requested via include.

Here are a couple of examples:

Job:
/api/v1/job/mycluster/my_job_id?include=jobKey&includeCounter=org.apache.hadoop.mapreduce.FileSystemCounter.HDFS_BYTES_WRITTEN

Response:

{
  "jobKey" : {
    "cluster" : "procrev@atla",
    "userName" : "ads",
    "appId" : "ad_shard_log_joiner_procrev_atla_even_hourly",
    "runId" : 1486598274736,
    "jobId" : {
      "cluster" : "procrev@atla",
      "jobEpoch" : 1486504431708,
      "jobSequence" : 30155,
      "jobIdString" : "job_1486504431708_30155"
    },
    "qualifiedJobId" : {
      "cluster" : "procrev@atla",
      "jobEpoch" : 1486504431708,
      "jobSequence" : 30155,
      "jobIdString" : "job_1486504431708_30155"
    },
    "encodedRunId" : 9223370550256501071
  },
  "counters" : {
    "org.apache.hadoop.mapreduce.FileSystemCounter" : {
      "HDFS_BYTES_WRITTEN" : 3269002525990
    }
  },
  "mapCounters" : {
    "org.apache.hadoop.mapreduce.FileSystemCounter" : {
      "HDFS_BYTES_WRITTEN" : 0
    }
  },
  "reduceCounters" : {
    "org.apache.hadoop.mapreduce.FileSystemCounter" : {
      "HDFS_BYTES_WRITTEN" : 3269002525990
    }
  }
}
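For anyone assembling these requests by hand, here is a minimal sketch of how the repeated include / includeCounter query parameters could be built. The helper name `build_counter_url` is hypothetical and not part of hRaven or HRavenRestClient; only the path layout and parameter names are taken from the examples in this description.

```python
from urllib.parse import urlencode


def build_counter_url(endpoint, cluster, job_id, include=(), include_counter=()):
    """Build a request path with repeated include / includeCounter parameters.

    Hypothetical helper for illustration only. Path segments are used as-is,
    so callers with special characters in them would need to quote them first.
    """
    # Each requested field or counter becomes its own key=value pair,
    # mirroring the repeated parameters in the example URLs.
    params = [("include", field) for field in include]
    params += [("includeCounter", counter) for counter in include_counter]
    path = f"/api/v1/{endpoint}/{cluster}/{job_id}"
    return f"{path}?{urlencode(params)}" if params else path


job_url = build_counter_url(
    "job",
    "mycluster",
    "my_job_id",
    include=["jobKey"],
    include_counter=[
        "org.apache.hadoop.mapreduce.FileSystemCounter.HDFS_BYTES_WRITTEN"
    ],
)
print(job_url)
```

Called this way, the helper reproduces the job URL shown above; passing `"tasks"` as the endpoint with several counters yields the repeated-parameter form used for task requests.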

Task:
/api/v1/tasks/mycluster/my_job_id?include=taskType&include=taskId&includeCounter=org.apache.hadoop.mapreduce.TaskCounter.COMMITTED_HEAP_BYTES&includeCounter=org.apache.hadoop.mapreduce.TaskCounter.PHYSICAL_MEMORY_BYTES&includeCounter=org.apache.hadoop.mapreduce.TaskCounter.GC_TIME_MILLIS&includeCounter=org.apache.hadoop.mapreduce.TaskCounter.CPU_MILLISECONDS

Response:

[  
   {  
      "taskId":"task_1486504431708_30155_m_000000",
      "taskType":"MAP",
      "counters":{  
         "org.apache.hadoop.mapreduce.TaskCounter":{  
            "PHYSICAL_MEMORY_BYTES":722796544,
            "GC_TIME_MILLIS":632,
            "CPU_MILLISECONDS":144470,
            "COMMITTED_HEAP_BYTES":798347264
         }
      }
   },
   {  
      "taskId":"task_1486504431708_30155_m_000000",
      "taskType":"MAP",
      "counters":{  
         "org.apache.hadoop.mapreduce.TaskCounter":{  
            "PHYSICAL_MEMORY_BYTES":722796544,
            "GC_TIME_MILLIS":632,
            "CPU_MILLISECONDS":144470,
            "COMMITTED_HEAP_BYTES":798347264
         }
      }
   }
]
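The responses above suggest that each includeCounter value is read as a counter group followed by a counter name, split at the last dot. The sketch below illustrates that mapping on the client side. It is an assumption drawn from the example payloads, not the filtering code this PR adds to RestResource, and `filter_counters` is a hypothetical name.

```python
def filter_counters(counters, wanted):
    """Keep only the requested counters from a nested {group: {name: value}} map.

    `wanted` holds fully qualified names such as
    "org.apache.hadoop.mapreduce.TaskCounter.GC_TIME_MILLIS".
    Assumption: the group is everything before the last dot.
    """
    # Index the requested counter names by their group.
    names_by_group = {}
    for qualified in wanted:
        group, _, name = qualified.rpartition(".")
        names_by_group.setdefault(group, set()).add(name)

    filtered = {}
    for group, names in names_by_group.items():
        kept = {n: v for n, v in counters.get(group, {}).items() if n in names}
        if kept:  # omit groups in which nothing matched
            filtered[group] = kept
    return filtered


# Counter values copied from the task response above.
task_counters = {
    "org.apache.hadoop.mapreduce.TaskCounter": {
        "PHYSICAL_MEMORY_BYTES": 722796544,
        "GC_TIME_MILLIS": 632,
        "CPU_MILLISECONDS": 144470,
        "COMMITTED_HEAP_BYTES": 798347264,
    }
}

subset = filter_counters(
    task_counters,
    [
        "org.apache.hadoop.mapreduce.TaskCounter.GC_TIME_MILLIS",
        "org.apache.hadoop.mapreduce.TaskCounter.CPU_MILLISECONDS",
    ],
)
print(subset)
```

Whether the server filters after loading the counters or pushes the filter down to storage is not stated in this description; the sketch only shows the shape of the result.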

coveralls · 2017-04-05T23:49:21Z

Coverage decreased (-0.02%) to 2.127% when pulling 75b8123 on piyushnarang:task_counter_filters into e17ec79 on twitter:master.

piyushnarang · 2017-04-05T23:50:30Z

cc @vrushalivc / @dieu

vrushalivc · 2017-04-10T21:13:40Z

Thanks @piyushnarang and @dieu

Vrushali and others added 7 commits December 12, 2016 16:38

First check in for memory optimizations for jobs

7d2bea6

Updating the json serialization for String members

449d033

Merge branch 'master' into vrushalivc/memory_tasks_rest_api

592a105

Add some ignore files

b5b61fa

Clean up dead code

e8e81a0

Updates to RestResource to support counter filtering and tests

46340b5

Fix formatting, add comments in HRavenRestClient

75b8123

piyushnarang mentioned this pull request Apr 6, 2017

Add support to request for compressed payloads from the HRavenRestClient #161

Merged

dieu approved these changes Apr 10, 2017


vrushalivc merged commit 2089d69 into twitter:master Apr 10, 2017
