# Debugging Performance Issues

_Brief abstract/introduction/motivation.  State what the chapter is about in 1-2 paragraphs._
_Then, have an introduction video:_

In [1]:
from bookutils import YouTubeVideo
YouTubeVideo("w4u5gCgPlmg")

**Prerequisites**

* _Refer to earlier chapters as notebooks here, as here:_ [Earlier Chapter](Debugger.ipynb).

## Synopsis
<!-- Automatically generated. Do not edit. -->

To [use the code provided in this chapter](Importing.ipynb), write

```python
>>> from debuggingbook.PerformanceDebugger import <identifier>
```

and then make use of the following features.


_For those only interested in using the code in this chapter (without wanting to know how it works), give an example.  This will be copied to the beginning of the chapter (before the first section) as text with rendered input and output._



In [2]:
import bookutils

In [3]:
import Intro_Debugging

## Some Long-Running Function

In [4]:
from ChangeCounter import ChangeCounter, debuggingbook_change_counter

In [5]:
# with Timer() as t:
#     change_counter = debuggingbook_change_counter(ChangeCounter)

## Simple Profiling

In [6]:
import cProfile

In [7]:
# cProfile.run('debuggingbook_change_counter(ChangeCounter)')

Mining the repository calls `diff` for each pair of versions, which is expensive.
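To illustrate how a profile can point to such a hotspot, here is a minimal sketch using only the standard library. The functions `diff()` and `mine()` are hypothetical stand-ins invented for this example; they are not the `ChangeCounter` implementation.

```python
import cProfile
import io
import pstats
from typing import List


def diff(old: List[int], new: List[int]) -> List[int]:
    """Hypothetical stand-in for an expensive diff: a quadratic list comparison."""
    return [item for item in new if item not in old]


def mine(versions: List[List[int]]) -> List[List[int]]:
    """Hypothetical miner: compute one diff per pair of successive versions."""
    return [diff(versions[i], versions[i + 1])
            for i in range(len(versions) - 1)]


versions = [list(range(i, i + 200)) for i in range(50)]

# Profile only the call we are interested in
profiler = cProfile.Profile()
profiler.enable()
mine(versions)
profiler.disable()

# Write the five entries with the highest cumulative time into a string
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time lists callers such as `mine()` first; the `ncalls` column then shows how often `diff()` was invoked.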

## Alternative: Use a Tracer

In [8]:
# ignore
from typing import Any, Optional, Type, Union, Dict, Tuple, List

In [9]:
from Intro_Debugging import remove_html_markup

In [10]:
from bookutils import print_content

In [11]:
import inspect

In [12]:
print_content(inspect.getsource(remove_html_markup), '.py',
              start_line_number=238)

238  def remove_html_markup(s):  # type: ignore
239      tag = False
240      quote = False
241      out = ""
242  
243      for c in s:
244          assert tag or not quote
245  
246          if c == '<' and not quote:
247              tag = True
248          elif c == '>' and not quote:
249              tag = False
250          elif (c == '"' or c == "'") and tag:
251              quote = not quote
252          el

In [13]:
import Timer

In [14]:
from types import FrameType

In [15]:
from Tracer import Tracer

In [16]:
class PerformanceTracer(Tracer):
    """Record how often each line is hit and how much time is charged to it."""

    def __init__(self) -> None:
        super().__init__()
        self.reset_timer()
        # Both dictionaries are keyed by (function name, line number)
        self.hits: Dict[Tuple[str, int], int] = {}
        self.time: Dict[Tuple[str, int], float] = {}

    def reset_timer(self) -> None:
        self.timer = Timer.Timer()

    def __enter__(self) -> Any:
        super().__enter__()
        self.reset_timer()
        return self

    def traceit(self, frame: FrameType, event: str, arg: Any) -> None:
        # Time elapsed since the previous trace event
        t = self.timer.elapsed_time()
        key = (frame.f_code.co_name, frame.f_lineno)

        # Count the event and add the elapsed time to its location
        self.hits.setdefault(key, 0)
        self.time.setdefault(key, 0.0)
        self.hits[key] += 1
        self.time[key] += t

        # Restart the clock for the next event
        self.reset_timer()

In [17]:
with PerformanceTracer() as perf_tracer:
    for i in range(10000):
        s = remove_html_markup('<b>foo</b>')
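For readers without the book's `Tracer` and `Timer` modules, the same idea can be sketched with `sys.settrace()` and `time.perf_counter()` alone. The class `LineTimer` and the function `count_vowels()` are hypothetical names made up for this illustration. Note one deliberate difference: `PerformanceTracer` adds the elapsed time to the location of the current event, whereas this sketch charges it to the line that was running before the event, and it counts only `'line'` events as hits.

```python
import sys
import time
from collections import defaultdict
from types import FrameType
from typing import Any, Callable, DefaultDict, Optional, Tuple

Location = Tuple[str, int]


class LineTimer:
    """Hypothetical stand-alone line timer built directly on sys.settrace()."""

    def __init__(self) -> None:
        self.hits: DefaultDict[Location, int] = defaultdict(int)
        self.time: DefaultDict[Location, float] = defaultdict(float)
        self._last: Optional[Location] = None  # line currently executing
        self._stamp = 0.0                      # when that line started
        self._previous_trace: Any = None

    def _trace(self, frame: FrameType, event: str,
               arg: Any) -> Optional[Callable]:
        now = time.perf_counter()
        if frame.f_code is LineTimer.__exit__.__code__:
            return None  # do not trace the timer's own __exit__()
        if self._last is not None:
            # Charge the elapsed time to the line that just ran
            self.time[self._last] += now - self._stamp
        if event == 'line':
            self._last = (frame.f_code.co_name, frame.f_lineno)
            self.hits[self._last] += 1
        else:
            self._last = None  # 'call', 'return', 'exception' are not timed
        self._stamp = time.perf_counter()
        return self._trace

    def __enter__(self) -> 'LineTimer':
        self._previous_trace = sys.gettrace()
        self._last = None
        sys.settrace(self._trace)
        return self

    def __exit__(self, *exc_info: Any) -> None:
        # Restore whatever trace function was active before
        sys.settrace(self._previous_trace)


def count_vowels(s: str) -> int:
    n = 0
    for c in s:
        if c in 'aeiou':
            n += 1
    return n


with LineTimer() as timer:
    count_vowels('banana')

for location in sorted(timer.hits):
    print(location, timer.hits[location], timer.time[location])
```

Saving and restoring the previous trace function keeps the sketch from disabling a debugger or coverage tool that was already active.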

In [18]:
perf_tracer.hits

{('__init__', 17): 1,
 ('__init__', 19): 1,
 ('clock', 8): 1,
 ('clock', 12): 2,
 ('__init__', 20): 2,
 ('remove_html_markup', 238): 10000,
 ('remove_html_markup', 239): 10000,
 ('remove_html_markup', 240): 10000,
 ('remove_html_markup', 241): 10000,
 ('remove_html_markup', 243): 110000,
 ('remove_html_markup', 244): 100000,
 ('remove_html_markup', 246): 100000,
 ('remove_html_markup', 247): 20000,
 ('remove_html_markup', 248): 80000,
 ('remove_html_markup', 250): 60000,
 ('remove_html_markup', 252): 60000,
 ('remove_html_markup', 249): 20000,
 ('remove_html_markup', 253): 30000,
 ('remove_html_markup', 255): 20000}

In [19]:
perf_tracer.time

{('__init__', 17): 3.868498606607318e-05,
 ('__init__', 19): 2.702989149838686e-06,
 ('clock', 8): 1.847016392275691e-06,
 ('clock', 12): 2.9189977794885635e-06,
 ('__init__', 20): 3.108987584710121e-06,
 ('remove_html_markup', 238): 0.02706018611206673,
 ('remove_html_markup', 239): 0.021644888940500095,
 ('remove_html_markup', 240): 0.02082053857157007,
 ('remove_html_markup', 241): 0.019359971687663347,
 ('remove_html_markup', 243): 0.19406225995044224,
 ('remove_html_markup', 244): 0.17277625788119622,
 ('remove_html_markup', 246): 0.17115685949102044,
 ('remove_html_markup', 247): 0.03494799174950458,
 ('remove_html_markup', 248): 0.13707106409128755,
 ('remove_html_markup', 250): 0.10299515444785357,
 ('remove_html_markup', 252): 0.1092039400828071,
 ('remove_html_markup', 249): 0.03538127048523165,
 ('remove_html_markup', 253): 0.05167707509826869,
 ('remove_html_markup', 255): 0.035241642995970324}
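The two dictionaries become easier to read when combined. The following sketch copies a handful of entries from the outputs above and ranks them by total time and by average time per hit; the helper `average_time_per_hit()` is a hypothetical name introduced for this example.

```python
from typing import Dict, Tuple

Location = Tuple[str, int]

# A few entries copied from the `hits` and `time` outputs above
hits: Dict[Location, int] = {
    ('remove_html_markup', 238): 10000,
    ('remove_html_markup', 243): 110000,
    ('remove_html_markup', 244): 100000,
    ('remove_html_markup', 246): 100000,
    ('remove_html_markup', 253): 30000,
}
time_spent: Dict[Location, float] = {
    ('remove_html_markup', 238): 0.02706018611206673,
    ('remove_html_markup', 243): 0.19406225995044224,
    ('remove_html_markup', 244): 0.17277625788119622,
    ('remove_html_markup', 246): 0.17115685949102044,
    ('remove_html_markup', 253): 0.05167707509826869,
}


def average_time_per_hit(hits: Dict[Location, int],
                         time_spent: Dict[Location, float]
                         ) -> Dict[Location, float]:
    """Return the mean time charged per hit for every location."""
    return {location: time_spent[location] / count
            for location, count in hits.items() if count > 0}


# Locations ordered by total time, most expensive first
by_total = sorted(time_spent, key=lambda loc: time_spent[loc], reverse=True)

# Location with the highest average cost per hit
averages = average_time_per_hit(hits, time_spent)
slowest_per_hit = max(averages, key=lambda loc: averages[loc])

print(by_total[0], slowest_per_hit)
```

In this sample, the loop header on line 243 accumulates the most time overall simply because it is hit most often, while the entry for line 238 is the most expensive per hit.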

## Collect

In [22]:
from StatisticalDebugger import CoverageCollector, SpectrumDebugger

In [23]:
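class MetricCollector(CoverageCollector):
    """Abstract base class: collects a numeric metric per location."""

    def metric(self, event: Any) -> Optional[float]:
        """Return the metric for `event`, or None if none was collected."""
        return None

    def all_metrics(self, func: str) -> List[float]:
        """Return all metrics collected for the function named `func`."""
        return []

    def total(self, func: str) -> float:
        """Return the sum of all metrics for `func`."""
        return sum(self.all_metrics(func))

    def maximum(self, func: str) -> float:
        """Return the largest metric for `func` (0.0 if there is none)."""
        return max(self.all_metrics(func), default=0.0)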

In [24]:
Location = Tuple[str, int]

In [25]:
class TimeCollector(MetricCollector):
    def __init__(self) -> None:
        super().__init__()
        self.reset_timer()
        self.time: Dict[Location, float] = {}
        self.add_items_to_ignore([Timer.Timer, Timer.clock])

    def collect(self, frame: FrameType, event: str, arg: Any) -> None:
        t = self.timer.elapsed_time()
        super().collect(frame, event, arg)
        location = (frame.f_code.co_name, frame.f_lineno)

        self.time.setdefault(location, 0.0)
        self.time[location] += t

        self.reset_timer()

    def reset_timer(self) -> None:
        self.timer = Timer.Timer()

    def __enter__(self) -> Any:
        super().__enter__()
        self.reset_timer()
        return self

    def metric(self, location: Any) -> Optional[float]:
        if location in self.time:
            return self.time[location]
        else:
            return None
        
    def all_metrics(self, func: str) -> List[float]:
        return [time
                for (func_name, lineno), time in self.time.items()
                if func_name == func]

In [26]:
with TimeCollector() as collector:
    for i in range(100):
        s = remove_html_markup('<b>foo</b>')

In [27]:
for location, time in collector.time.items():
    print(location, time)

('remove_html_markup', 238) 0.00046378184924833477
('remove_html_markup', 239) 0.0003461359301581979
('remove_html_markup', 240) 0.0003651719889603555
('remove_html_markup', 241) 0.00038043787935748696
('remove_html_markup', 243) 0.0033500908175483346
('remove_html_markup', 244) 0.0030661557393614203
('remove_html_markup', 246) 0.0030253041768446565
('remove_html_markup', 247) 0.0006380707200150937
('remove_html_markup', 248) 0.0023946501896716654
('remove_html_markup', 250) 0.0017864947440102696
('remove_html_markup', 252) 0.0019519860798027366
('remove_html_markup', 249) 0.0006006777402944863
('remove_html_markup', 253) 0.0009687698911875486
('remove_html_markup', 255) 0.0005702070484403521


In [28]:
collector.total('remove_html_markup')

0.01990793479490094

## Visualize

In [29]:
class MetricDebugger(SpectrumDebugger):
    """Visualize a metric collected by `MetricCollector` instances."""

    def metric(self, location: Location) -> float:
        """Return the metric for `location`, summed over all collectors."""
        total = 0.0
        for outcome in self.collectors:
            for collector in self.collectors[outcome]:
                assert isinstance(collector, MetricCollector)
                m = collector.metric(location)
                if m is not None:
                    total += m  # type: ignore

        return total

    def total(self, func_name: str) -> float:
        """Return the sum of all metrics collected for `func_name`."""
        total = 0.0
        for outcome in self.collectors:
            for collector in self.collectors[outcome]:
                assert isinstance(collector, MetricCollector)
                total += sum(collector.all_metrics(func_name))

        return total

    def maximum(self, func_name: str) -> float:
        """Return the largest metric for `func_name` (0.0 if there is none)."""
        maximum = 0.0
        for outcome in self.collectors:
            for collector in self.collectors[outcome]:
                assert isinstance(collector, MetricCollector)
                maximum = max(maximum,
                              max(collector.all_metrics(func_name),
                                  default=0.0))

        return maximum

    def suspiciousness(self, location: Location) -> float:
        """Return the share of the function's total metric spent at `location`."""
        func_name, _ = location
        total = self.total(func_name)
        if total == 0:
            return 0.0  # nothing collected for this function
        return self.metric(location) / total

    def color(self, location: Location) -> str:
        """Return a shade of blue; the larger the metric, the darker."""
        func_name, _ = location
        hue = 240  # blue
        saturation = 100  # fully saturated
        maximum = self.maximum(func_name)
        darkness = self.metric(location) / maximum if maximum else 0.0
        lightness = 100 - darkness * 25
        return f"hsl({hue}, {saturation}%, {lightness}%)"

    def tooltip(self, location: Location) -> str:
        return f"{super().tooltip(location)} {self.metric(location)}"

In [30]:
class PerformanceDebugger(MetricDebugger):
    def __init__(self, collector_class: Type, log: bool = False):
        assert issubclass(collector_class, MetricCollector)
        super().__init__(collector_class, log=log)

In [31]:
with PerformanceDebugger(TimeCollector) as debugger:
    for i in range(100):
        s = remove_html_markup('<b>foo</b>')

In [32]:
print(debugger)

 238   2% def remove_html_markup(s):  # type: ignore
 239   1%     tag = False
 240   1%     quote = False
 241   2%     out = ""
 242   0%
 243  16%     for c in s:
 244  15%         assert tag or not quote
 245   0%
 246  15%         if c == '<' and not quote:
 247   2%             tag = True
 248  12%         elif c == '>' and not quote:
 249   2%             tag = False
 250   8%         elif (c == '"' or c == "'") and tag:
 251   0%             quote = not quote
 252   9%         elif not tag:
 253   4%             out = out + c
 254   0%
 255   3%     return out



In [33]:
debugger
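The shading in the rich display presumably comes from the `color()` method of `MetricDebugger`. Its mapping can be tried in isolation; the function `shade()` below is a hypothetical stand-alone copy of that formula, with an added guard for a zero maximum.

```python
def shade(metric: float, maximum: float) -> str:
    """Map a metric onto a blue HSL color; larger metrics give darker shades."""
    hue = 240         # blue
    saturation = 100  # fully saturated
    darkness = metric / maximum if maximum else 0.0
    lightness = 100 - darkness * 25  # ranges from 100% (white) down to 75%
    return f"hsl({hue}, {saturation}%, {lightness}%)"


for value in (0, 5, 10):
    print(value, shade(value, maximum=10))
```

Limiting the lightness to the range between 75% and 100% keeps even the most expensive line light enough for dark text to remain readable on top of it.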

## Other Metrics

In [34]:
class HitCollector(MetricCollector):
    def __init__(self) -> None:
        super().__init__()
        self.hits: Dict[Location, int] = {}

    def collect(self, frame: FrameType, event: str, arg: Any) -> None:
        super().collect(frame, event, arg)
        location = (frame.f_code.co_name, frame.f_lineno)

        self.hits.setdefault(location, 0)
        self.hits[location] += 1
    
    def metric(self, location: Location) -> Optional[int]:
        if location in self.hits:
            return self.hits[location]
        else:
            return None
        
    def all_metrics(self, func: str) -> List[float]:
        return [hits
                for (func_name, lineno), hits in self.hits.items()
                if func_name == func]

In [35]:
with PerformanceDebugger(HitCollector) as debugger:
    for i in range(100):
        s = remove_html_markup('<b>foo</b>')

In [36]:
debugger.total('remove_html_markup')

6400.0

In [37]:
print(debugger)

 238   1% def remove_html_markup(s):  # type: ignore
 239   1%     tag = False
 240   1%     quote = False
 241   1%     out = ""
 242   0%
 243  17%     for c in s:
 244  15%         assert tag or not quote
 245   0%
 246  15%         if c == '<' and not quote:
 247   3%             tag = True
 248  12%         elif c == '>' and not quote:
 249   3%             tag = False
 250   9%         elif (c == '"' or c == "'") and tag:
 251   0%             quote = not quote
 252   9%         elif not tag:
 253   4%             out = out + c
 254   0%
 255   3%     return out
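The percentages in this listing are consistent with each line's hit count divided by the function's total of 6400 hits, truncated to a whole number. The sketch below recomputes them under that assumption. The per-line counts are the 10,000-call counts shown earlier divided by 100, and `share()` is a hypothetical helper mirroring `suspiciousness()`.

```python
from typing import Dict, Tuple

Location = Tuple[str, int]

# Hits per line for 100 calls of remove_html_markup('<b>foo</b>'),
# obtained by dividing the 10,000-call counts shown earlier by 100
hits: Dict[Location, int] = {
    ('remove_html_markup', 238): 100,
    ('remove_html_markup', 239): 100,
    ('remove_html_markup', 240): 100,
    ('remove_html_markup', 241): 100,
    ('remove_html_markup', 243): 1100,
    ('remove_html_markup', 244): 1000,
    ('remove_html_markup', 246): 1000,
    ('remove_html_markup', 247): 200,
    ('remove_html_markup', 248): 800,
    ('remove_html_markup', 249): 200,
    ('remove_html_markup', 250): 600,
    ('remove_html_markup', 252): 600,
    ('remove_html_markup', 253): 300,
    ('remove_html_markup', 255): 200,
}


def share(location: Location, metrics: Dict[Location, int]) -> float:
    """Return the fraction of the function's total metric at `location`."""
    func_name, _ = location
    total = sum(value for (name, _), value in metrics.items()
                if name == func_name)
    return metrics[location] / total if total else 0.0


for location in sorted(hits):
    _, lineno = location
    print(f"{lineno:4} {int(100 * share(location, hits)):3}%")
```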



In [38]:
debugger

## _Section 1_

\todo{Add}

## Lessons Learned

* _Lesson one_
* _Lesson two_
* _Lesson three_

## Next Steps

_Link to subsequent chapters (notebooks) here, as in:_

* [use _assertions_ to check conditions at runtime](Assertions.ipynb)
* [reduce _failing inputs_ for efficient debugging](DeltaDebugger.ipynb)


## Background

_Cite relevant works in the literature and put them into context, as in:_

The idea of ensuring that each expansion in the grammar is used at least once goes back to Burkhardt \cite{Burkhardt1967}, to be later rediscovered by Paul Purdom \cite{Purdom1972}.

## Exercises

_Close the chapter with a few exercises such that people have things to do.  To make the solutions hidden (to be revealed by the user), have them start with_

```
**Solution.**
```

_Your solution can then extend up to the next title (i.e., any markdown cell starting with `#`)._

_Running `make metadata` will automatically add metadata to the cells such that the cells will be hidden by default, and can be uncovered by the user.  The button will be introduced above the solution._

### Exercise 1: _Title_

_Text of the exercise_

In [39]:
# Some code that is part of the exercise
pass

_Some more text for the exercise_

**Solution.** _Some text for the solution_

In [40]:
# Some code for the solution
2 + 2

4

_Some more text for the solution_

### Exercise 2: _Title_

_Text of the exercise_

**Solution.** _Solution for the exercise_