
The performance overhead of test coverage is insane - 5 times slower on python code - REPRO ATTACHED #1916

Open
morotti opened this issue Jan 15, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@morotti

morotti commented Jan 15, 2025

Describe the bug
Hello, I was running some benchmarks on my Python codebase on different hardware.
It turns out there were massive unexplained differences, up to 10 times slower in some environments, while the difference between older and newer CPUs should only be about 2-3x.
After many days of debugging, I finally found the root cause: the slowdown happens when running tests with code coverage.

Running some Python tests with branch code coverage is 5 times slower, or 3 times slower without branch coverage.

To Reproduce

Tests are running with the CTracer (I saw the other bug tickets where you said to check that first :D).

CLOSE TO LATEST
python 3.11.9
pytest 7.4.4 (latest last year, there is a 8.x now)
coverage 7.2.7 (same result on 7.6.10 latest)

OLDER PYTHON FOR COMPARISON
python 3.8.12
pytest 3.10.1
coverage 5.0.3

Source code

test_empty.py is an empty test: `def test_empty(): pass`
test_multiplication.py is a single math operation: `def test_multiplication(): 9 ** 10_000_000`
test_prime.py is the script below, computing prime numbers up to N.

import socket
import time


def is_prime(n):
    """Check if a number is prime."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:  # Exclude even numbers
        return False
    for i in range(3, int(n ** 0.5) + 1, 2):  # Only check odd divisors
        if n % i == 0:
            return False
    return True


def cpu_benchmark_find_primes(limit):
    """Benchmark CPU by finding primes and measuring the time taken."""
    print(f"Starting CPU benchmark on {socket.getfqdn()}...")
    start_time = time.time()

    # Calculate all prime numbers up to a large value
    primes = [n for n in range(2, limit) if is_prime(n)]

    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Found {len(primes)} primes up to {limit}")
    print(f"Time taken: {elapsed_time:.3f} seconds")
    return elapsed_time


def test_find_primes():
    cpu_benchmark_find_primes(3_000_000)


if __name__ == "__main__":
    test_find_primes()

**Actual behavior**

Best of 5 runs. The command is basically `<coveragewithoptions> pytest <testfile>`.

Full test command with hyperfine:

hyperfine -N --runs 3 --parameter-list wrapper "","coverage run","coverage run --branch","coverage run --omit=tests/*","coverage run --branch --omit=tests/*" --parameter-list testfile test_empty.py,test_multiplication.py,test_prime.py --parameter-list pythonversion 38,311 --show-output --export-markdown=result.md --export-csv=result.csv -- '{wrapper} /turbo/rmorotti/pyenvs/python-benchmark-{pythonversion}/bin/pytest tests/unit/{testfile} --durations=0'

(screenshots: hyperfine benchmark result tables)

**Expected behavior**

Obviously, coverage should have as little overhead as possible.
On the single 9**N operation, the overhead is tiny; that's fine.

On the Python code computing prime numbers up to N, the overhead is insane: as much as 5x with branch coverage enabled.
Even when coverage of the test file is disabled with --omit=tests/*, the overhead is still 3x, which makes no sense!
What could possibly explain that?

Additional context
The pure pytest run without coverage takes 5.695 seconds on Python 3.8 vs 4.821 seconds on Python 3.11.
With code coverage, the runtime increases massively, and to roughly the same duration on both versions, despite them being totally different interpreters. I find that extremely odd!
Is it possible that there is a massive fixed overhead with code coverage?

Maybe important to know: Python 3.11+ is 10%-30% faster on most operations, and Python 3.11 is up to 4 times faster on tight loops and list comprehensions.
If code coverage has some fixed overhead around loops, that could have a massive effect on recent Python versions.
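One way to see this fixed per-line cost, independent of coverage.py, is a minimal micro-benchmark of my own (not coverage.py code): install a do-nothing trace function with `sys.settrace` and time the same tight loop with and without it. The interpreter has to invoke the callback on every line, which also forces it to drop its fast specialized bytecode paths.

```python
import sys
import time

def tight_loop(n):
    # A tight loop similar in shape to the prime-sieve hot path.
    total = 0
    for i in range(n):
        total += i
    return total

def noop_tracer(frame, event, arg):
    # Do nothing, but keep per-line tracing enabled by returning
    # the local trace function (itself).
    return noop_tracer

N = 2_000_000

start = time.perf_counter()
tight_loop(N)
plain = time.perf_counter() - start

sys.settrace(noop_tracer)
start = time.perf_counter()
tight_loop(N)
traced = time.perf_counter() - start
sys.settrace(None)

print(f"untraced: {plain:.3f}s  traced: {traced:.3f}s  ratio: {traced / plain:.1f}x")
```

Even though the tracer body does nothing, the ratio is large, so the bottleneck is the callback machinery itself, not the work done inside the tracer.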

@morotti morotti added the bug Something isn't working label Jan 15, 2025
@nedbat
Owner

nedbat commented Jan 17, 2025

Thanks for the code and timings. The performance of coverage is dependent on many things, so it can be hard to isolate a single cause for slowness, and even harder to improve. I'm not sure when I'll be able to dig into this to understand if there is a way to make it faster.

@morotti
Author

morotti commented Jan 23, 2025

Hello, thanks for getting back to me.

I was trying to do a bit of debugging. I've read some of your articles on writing coverage.py; nice blog, by the way. :)

Doing a run with debug stats: the line handler is called over 300M times. That is why coverage is very slow.

coverage run -- test_prime.py 
Starting CPU benchmark on rmorotti.hn.res.ahl...
Found 216816 primes up to 3000000
Time taken: 18.256 seconds

Coverage.py tracer stats:
               calls: 3000029
              errors: 0
               files: 13
               lines: 331479882
              others: 1
             pycalls: 13
             returns: 3000029
         stack_alloc: 20
      stack_reallocs: 1
 start_context_calls: 0

Doing a debug run with the tracer, it's apparent that a lot of the time goes to invoking coverage over and over for the same lines of code in the loop.

...
95584b70:           4   25 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> primes = [n for n in range(2, limit) if is_prime(n)]
95584b70:             5    5 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py traced    -> def is_prime(n):
95584b70:             5    7 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> if n < 2:
95584b70:             5    9 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> if n == 2:
95584b70:             5   11 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> if n % 2 == 0:
95584b70:             5   13 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> for i in range(3, int(n ** 0.5) + 1, 2):
95584b70:             5   14 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> if n % i == 0:
95584b70:             5   13 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> for i in range(3, int(n ** 0.5) + 1, 2):
95584b70:             5   14 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> if n % i == 0:
95584b70:             5   13 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> for i in range(3, int(n ** 0.5) + 1, 2):
95584b70:             5   14 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> if n % i == 0:
95584b70:             5   13 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> for i in range(3, int(n ** 0.5) + 1, 2):
95584b70:             5   14 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> if n % i == 0:
95584b70:             5   13 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> for i in range(3, int(n ** 0.5) + 1, 2):
95584b70:             5   14 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> if n % i == 0:
95584b70:             5   13 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> for i in range(3, int(n ** 0.5) + 1, 2):
95584b70:             5   14 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> if n % i == 0:
95584b70:             5   13 /users/is/rmorotti/PycharmProjects/coveragepy-github/test_prime.py line    -> for i in range(3, int(n ** 0.5) + 1, 2):
...

Doing a run with --omit test_prime.py to ignore the test file: it's still a 13-second run (down from 18 seconds).
I added some debug code to measure the time spent in the respective handlers for total / calls / returns / lines.
It shows that almost no time is spent in the coverage.py CTracer code itself.
I think the vast majority of the time is spent in the interpreter (Python 3.11) preparing the stack frame and arguments to invoke the CTracer. That's quite unfortunate.

Found 216816 primes up to 3000000
Time taken: 12.999 seconds
total Ctracer_trace() 0.544730831 seconds  
total calls() 0.155593790 seconds  
total returns() 0.072397800 seconds  
total lines() 0.000000380 seconds  

Coverage.py tracer stats:
               calls: 3000029
              errors: 0
               files: 13
               lines: 11
              others: 1
             pycalls: 13
             returns: 3000029
         stack_alloc: 20
      stack_reallocs: 1
 start_context_calls: 0
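That matches how `sys.settrace` behaves: even when the trace function declines per-line tracing for a frame (as coverage does for omitted files, by returning None from the "call" event), the interpreter still invokes it once for every call and return. A minimal sketch of my own (illustrative, not coverage.py's actual C code):

```python
import sys

call_count = 0

def global_tracer(frame, event, arg):
    global call_count
    if event == "call":
        call_count += 1
    # Returning None declines local (per-line) tracing for this frame,
    # but the global tracer is still invoked on every new frame.
    return None

def f():
    pass

sys.settrace(global_tracer)
for _ in range(1000):
    f()
sys.settrace(None)

print(call_count)  # one "call" event per invocation of f()
```

So with 3 million function calls in the benchmark, the run still pays 3 million frame setups and trace invocations even though zero lines are being recorded, which would explain the remaining 3x overhead with --omit.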

@nedbat
Owner

nedbat commented Jan 24, 2025

Thanks for digging into it. Unfortunately, we don't have a way to turn off the trace function per line, and we don't know when an entire function has been traced, though that could be an interesting area for exploration.

If you can run on 3.12, and are only doing line coverage, not branch coverage, try the sysmon core; it should be much more efficient.
