Skip to content

Commit

Permalink
net: Optimize snmp stat aggregation by walking all the percpu data at…
Browse files Browse the repository at this point in the history
… once

Docker container creation linearly increased from around 1.6 sec to 7.5 sec
(at 1000 containers) and perf data showed 50% ovehead in snmp_fold_field.

reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
through per cpu data of an item (iteratively for around 36 items).

idea: This patch tries to aggregate the statistics by going through
all the items of each cpu sequentially which is reducing cache
misses.

Docker creation got faster by more than 2x after the patch.

Result:
                       Before           After
Docker creation time   6.836s           3.25s
cache miss             2.7%             1.41%

perf before:
    50.73%  docker           [kernel.kallsyms]       [k] snmp_fold_field
     9.07%  swapper          [kernel.kallsyms]       [k] snooze_loop
     3.49%  docker           [kernel.kallsyms]       [k] veth_stats_one
     2.85%  swapper          [kernel.kallsyms]       [k] _raw_spin_lock

perf after:
    10.57%  docker           docker                [.] scanblock
     8.37%  swapper          [kernel.kallsyms]     [k] snooze_loop
     6.91%  docker           [kernel.kallsyms]     [k] snmp_get_cpu_field
     6.67%  docker           [kernel.kallsyms]     [k] veth_stats_one

changes/ideas suggested:
Using buffer in stack (Eric), Usage of memset (David), Using memcpy in
place of unaligned_put (Joe).

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
ktraghavendra authored and davem330 committed Aug 31, 2015
1 parent c4c6bc3 commit a3a7737
Showing 1 changed file with 16 additions and 10 deletions.
26 changes: 16 additions & 10 deletions net/ipv6/addrconf.c
Expand Up @@ -4726,27 +4726,33 @@ static inline void __snmp6_fill_statsdev(u64 *stats, atomic_long_t *mib,
}

static inline void __snmp6_fill_stats64(u64 *stats, void __percpu *mib,
int items, int bytes, size_t syncpoff)
int bytes, size_t syncpoff)
{
int i;
int pad = bytes - sizeof(u64) * items;
int i, c;
u64 buff[IPSTATS_MIB_MAX];
int pad = bytes - sizeof(u64) * IPSTATS_MIB_MAX;

BUG_ON(pad < 0);

/* Use put_unaligned() because stats may not be aligned for u64. */
put_unaligned(items, &stats[0]);
for (i = 1; i < items; i++)
put_unaligned(snmp_fold_field64(mib, i, syncpoff), &stats[i]);
memset(buff, 0, sizeof(buff));
buff[0] = IPSTATS_MIB_MAX;

memset(&stats[items], 0, pad);
for_each_possible_cpu(c) {
for (i = 1; i < IPSTATS_MIB_MAX; i++)
buff[i] += snmp_get_cpu_field64(mib, c, i, syncpoff);
}

memcpy(stats, buff, IPSTATS_MIB_MAX * sizeof(u64));
memset(&stats[IPSTATS_MIB_MAX], 0, pad);
}

static void snmp6_fill_stats(u64 *stats, struct inet6_dev *idev, int attrtype,
int bytes)
{
switch (attrtype) {
case IFLA_INET6_STATS:
__snmp6_fill_stats64(stats, idev->stats.ipv6,
IPSTATS_MIB_MAX, bytes, offsetof(struct ipstats_mib, syncp));
__snmp6_fill_stats64(stats, idev->stats.ipv6, bytes,
offsetof(struct ipstats_mib, syncp));
break;
case IFLA_INET6_ICMP6STATS:
__snmp6_fill_statsdev(stats, idev->stats.icmpv6dev->mibs, ICMP6_MIB_MAX, bytes);
Expand Down

0 comments on commit a3a7737

Please sign in to comment.