Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unsafe from PathNormalizer #56805

Merged
merged 13 commits into from
Aug 5, 2024

Conversation

ladeak
Copy link
Contributor

@ladeak ladeak commented Jul 15, 2024

Remove unsafe from PathNormalizer

Updated ContainsDotSegments and RemoveDotSegments to be safe in Kestrel's PathNormalizer.

Description

Using Span<T> where possible to remove pointer arithmetic and unsafe code. Added some new unit test cases and extended performance comparison from DotSegmentRemovalBenchmark.

While some cases got faster, some got slower. Looking at the performance results, I think the cases that got faster are probably the more life-like scenarios, while the results that are slower may be the less likely to occur. I leave it up to the maintainers to decide which cases should be in focus when it comes to performance.

The idea proposed in the PR uses two integers to track the read and write positions. In my performance comparison this yielded the best (reasonable) performance.
A different implementation uses Span<T> to track the read position instead of the int readPointer. While to code may be simpler, it is also about 7-8ns slower in the SingleDotSegments and DoubleDotSegments benchmarks.

There is another implementation which tries to unroll the IndexOf and copy into vectorized operations - it is not complete; I did not want to spend too much time on it before it is agreed. This second implementation yields SingleDotSegments | 56.36 ns results, while the rest of the measurements are on-par with the *AFTER cases shown below.

Performance

Performance results for the current changes:

BEFORE:

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.22631
12th Gen Intel Core i7-1255U, 1 CPU, 12 logical and 10 physical cores
.NET SDK=9.0.100-preview.6.24316.4
  [Host]     : .NET 9.0.0 (9.0.24.35501), X64 RyuJIT
  Job-SNLRNU : .NET 9.0.0 (9.0.24.30702), X64 RyuJIT

Server=True  Toolchain=.NET Core 9.0  RunStrategy=Throughput

|               Method |     Mean |    Error |   StdDev |         Op/s | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------- |---------:|---------:|---------:|-------------:|------:|--------:|------:|------:|------:|----------:|
|        NoDotSegments | 27.53 ns | 0.154 ns | 0.128 ns | 36,323,959.6 |  1.00 |    0.00 |     - |     - |     - |         - |
|    SingleDotSegments | 72.42 ns | 1.331 ns | 1.245 ns | 13,808,221.4 |  2.63 |    0.04 |     - |     - |     - |         - |
|    DoubleDotSegments | 73.39 ns | 0.972 ns | 0.909 ns | 13,626,033.6 |  2.67 |    0.03 |     - |     - |     - |         - |
| OneSingleDotSegments | 71.97 ns | 1.331 ns | 1.245 ns | 13,894,999.6 |  2.62 |    0.05 |     - |     - |     - |         - |
| OneDoubleDotSegments | 53.33 ns | 0.451 ns | 0.376 ns | 18,749,760.5 |  1.94 |    0.01 |     - |     - |     - |         - |
|               Method |     Mean |    Error |   StdDev |         Op/s | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------- |---------:|---------:|---------:|-------------:|------:|--------:|------:|------:|------:|----------:|
|        NoDotSegments | 27.87 ns | 0.250 ns | 0.234 ns | 35,874,484.8 |  1.00 |    0.00 |     - |     - |     - |         - |
|    SingleDotSegments | 72.49 ns | 0.561 ns | 0.497 ns | 13,794,394.2 |  2.60 |    0.03 |     - |     - |     - |         - |
|    DoubleDotSegments | 73.94 ns | 1.098 ns | 1.027 ns | 13,524,773.4 |  2.65 |    0.05 |     - |     - |     - |         - |
| OneSingleDotSegments | 63.54 ns | 0.882 ns | 0.825 ns | 15,738,183.1 |  2.28 |    0.03 |     - |     - |     - |         - |
| OneDoubleDotSegments | 61.15 ns | 0.934 ns | 0.874 ns | 16,352,111.0 |  2.19 |    0.04 |     - |     - |     - |         - |

AFTER (Commit: Review feedback copy 3 chars when not matched a dot segment)

|               Method |     Mean |    Error |   StdDev |         Op/s | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------- |---------:|---------:|---------:|-------------:|------:|--------:|------:|------:|------:|----------:|
|        NoDotSegments | 11.55 ns | 0.119 ns | 0.100 ns | 86,603,144.6 |  1.00 |    0.00 |     - |     - |     - |         - |
|    SingleDotSegments | 86.58 ns | 0.714 ns | 0.668 ns | 11,549,398.0 |  7.51 |    0.09 |     - |     - |     - |         - |
|    DoubleDotSegments | 92.44 ns | 0.777 ns | 0.727 ns | 10,818,128.3 |  8.01 |    0.11 |     - |     - |     - |         - |
| OneSingleDotSegments | 21.93 ns | 0.184 ns | 0.172 ns | 45,608,856.4 |  1.90 |    0.03 |     - |     - |     - |         - |
| OneDoubleDotSegments | 24.50 ns | 0.312 ns | 0.277 ns | 40,815,649.1 |  2.12 |    0.03 |     - |     - |     - |         - |
|               Method |     Mean |    Error |   StdDev |         Op/s | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------- |---------:|---------:|---------:|-------------:|------:|--------:|------:|------:|------:|----------:|
|        NoDotSegments | 11.65 ns | 0.112 ns | 0.105 ns | 85,866,950.4 |  1.00 |    0.00 |     - |     - |     - |         - |
|    SingleDotSegments | 87.05 ns | 0.908 ns | 0.805 ns | 11,488,305.3 |  7.47 |    0.08 |     - |     - |     - |         - |
|    DoubleDotSegments | 94.28 ns | 0.555 ns | 0.464 ns | 10,606,751.7 |  8.09 |    0.09 |     - |     - |     - |         - |
| OneSingleDotSegments | 22.25 ns | 0.202 ns | 0.189 ns | 44,935,385.0 |  1.91 |    0.02 |     - |     - |     - |         - |
| OneDoubleDotSegments | 24.71 ns | 0.281 ns | 0.262 ns | 40,476,357.9 |  2.12 |    0.03 |     - |     - |     - |         - |

Linked to #56534

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions label Jul 15, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 15, 2024
@ladeak
Copy link
Contributor Author

ladeak commented Jul 16, 2024

The build issue seems like an infra issue.

Copy link
Member

@BrennanConroy BrennanConroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first pass. For the most part looks good, just had a few questions around avoiding extra work and sprinkling comments about which case we're in. Some of the comments might be wrong as I didn't pull the change down and actually test my understanding 😅

@ladeak ladeak requested a review from BrennanConroy July 19, 2024 19:53
@sebastienros
Copy link
Member

/benchmark plaintext aspnet-citrine-lin kestrel

Copy link

pr-benchmarks bot commented Jul 24, 2024

Benchmark started for plaintext on aspnet-citrine-lin with kestrel. Logs: link

@ladeak
Copy link
Contributor Author

ladeak commented Jul 24, 2024

These benchmark results are internal only, right?

Copy link

pr-benchmarks bot commented Jul 24, 2024

plaintext - aspnet-citrine-lin

application plaintext.base plaintext.pr
Max CPU Usage (%) 99 100 +1.01%
Max Cores usage (%) 2,784 2,789 +0.18%
Max Working Set (MB) 92 90 -2.17%
Max Private Memory (MB) 591 591 0.00%
Build Time (ms) 4,447 3,532 -20.58%
Start Time (ms) 238 224 -5.88%
Published Size (KB) 106,158 106,158 0.00%
Symbols Size (KB) 53 53 0.00%
.NET Core SDK Version 9.0.100-preview.7.24374.11 9.0.100-preview.7.24374.11
Max CPU Usage (%) 99 99 +0.16%
Max Working Set (MB) 96 94 -2.00%
Max GC Heap Size (MB) 24 24 +1.39%
Size of committed memory by the GC (MB) 14 16 +20.61%
Max Number of Gen 0 GCs / sec 3.00 3.00 0.00%
Max Number of Gen 1 GCs / sec 2.00 2.00 0.00%
Max Number of Gen 2 GCs / sec 2.00 2.00 0.00%
Max Gen 0 GC Budget (MB) 26 26 0.00%
Max Time in GC (%) 7.00 14.00 +100.00%
Max Gen 0 Size (B) 0 1,758,456 +∞%
Max Gen 1 Size (B) 2,081,728 2,071,512 -0.49%
Max Gen 2 Size (B) 2,383,736 2,456,400 +3.05%
Max LOH Size (B) 95,248 95,248 0.00%
Max POH Size (B) 6,295,168 9,446,968 +50.07%
Max Allocation Rate (B/sec) 14,280,848 14,268,576 -0.09%
Max GC Heap Fragmentation (%) 591% 652% +10.28%
# of Assemblies Loaded 62 62 0.00%
Max Exceptions (#/s) 1,294 1,336 +3.25%
Max Lock Contention (#/s) 332 322 -3.01%
Max ThreadPool Threads Count 48 48 0.00%
Max ThreadPool Queue Length 711 707 -0.56%
Max ThreadPool Items (#/s) 790,065 787,908 -0.27%
Max Active Timers 1 1 0.00%
IL Jitted (B) 257,824 258,360 +0.21%
Methods Jitted 3,113 3,123 +0.32%
load plaintext.base plaintext.pr
Max CPU Usage (%) 99 98 -1.01%
Max Cores usage (%) 2,767 2,752 -0.54%
Max Working Set (MB) 46 46 0.00%
Max Private Memory (MB) 370 370 0.00%
Build Time (ms) 3,350 3,486 +4.06%
Start Time (ms) 0 0
Published Size (KB) 72,283 72,283 0.00%
Symbols Size (KB) 0 0
.NET Core SDK Version 8.0.303 8.0.303
First Request (ms) 114 115 +0.88%
Requests/sec 11,898,111 11,766,220 -1.11%
Requests 179,648,384 177,592,872 -1.14%
Mean latency (ms) 1.44 1.41 -2.08%
Max latency (ms) 61.38 58.71 -4.35%
Bad responses 0 0
Socket errors 0 0
Read throughput (MB/s) 1,433.60 1,413.12 -1.43%
Latency 50th (ms) 0.67 0.68 +0.30%
Latency 75th (ms) 1.03 1.04 +0.97%
Latency 90th (ms) 2.99 2.53 -15.38%
Latency 99th (ms) 18.45 15.07 -18.32%

@sebastienros
Copy link
Member

@ladeak what do you mean by "internal only". These are public benchmarks we run continuously (https://github.com/aspnet/benchmarks) but in this case we have a bot that can pull and build an external PR to run them and report the results automatically.

@ladeak
Copy link
Contributor Author

ladeak commented Jul 25, 2024

At the time I was trying to access the 'Logs: link', I saw the results posted later.

What is the variance of these results?

@sebastienros
Copy link
Member

Numbers are in the standard deviation:

image

Even when including the value that changed a lot between two runs

image

@sebastienros
Copy link
Member

@halter73 @BrennanConroy the bot comments are flowing again (I think they might just be broken for the runtime repository).

@ladeak ladeak force-pushed the ladeak-56534-readpointer branch from ac18517 to 4f78231 Compare July 29, 2024 06:34
@ladeak
Copy link
Contributor Author

ladeak commented Jul 29, 2024

@amcasey - it seems @BrennanConroy is on holiday. Do you happen to know if these tests fail in general/flaky? (they seem to fail for me on main branch too)
ref: https://github.com/dotnet/aspnetcore/pull/56805/checks?check_run_id=28037667558

@ladeak ladeak force-pushed the ladeak-56534-readpointer branch from 0925301 to 8590b1b Compare July 29, 2024 19:18
@amcasey
Copy link
Member

amcasey commented Jul 29, 2024

They failed in my PR too 😆
#56985

@ladeak
Copy link
Contributor Author

ladeak commented Jul 29, 2024

Thank you for confirming.

@mitchdenny
Copy link
Member

/azp run

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@ladeak
Copy link
Contributor Author

ladeak commented Jul 31, 2024

@adityamandaleeka / @BrennanConroy could someone help to review the changes? Not sure if people are on holiday or on other leave (I hope the former). I would be happy to follow up on the open discussion points. It is also not urgent from my side if it is about waiting a few days or a release candidate - I am happy to wait. Just please keep me posted so I can also align my plans.

@BrennanConroy
Copy link
Member

We're just busy with other work right now. I do plan on reviewing sometime this week.

Copy link
Member

@BrennanConroy BrennanConroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the one outstanding comment about incrementing by 3 instead of 2 in a couple cases.

Comment on lines 109 to 114
slashDot.CopyTo(src[writtenLength..]);
writtenLength += 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but that's fine?
In the case below where we have else if (remainingLength == 3) then we can avoid an extra iteration of the loop.

@@ -2,6 +2,7 @@
// The .NET Foundation licenses this file to you under the MIT license.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably do a follow up PR to share this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can follow up if I don't forget (holidays coming soon that wipe my memory)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BrennanConroy
Copy link
Member

Sorry, we meant to fix this test a couple months ago...

public async Task Caching_SendFileWithFullContentLength_Cached()

This test uses the httpsys.dll binary we produce and tests caching with http.sys. But since the default cache limit is ~16MB, if the dll size increases above that it fails the test.
Could you add [QuarantinedTest("new issue")] to the test or make it send a small file we create before-hand, up to you (I'd recommend quarantining and we'll fix later)

@ladeak
Copy link
Contributor Author

ladeak commented Aug 3, 2024

I have added the QuarantinedTest attribute.

@BrennanConroy BrennanConroy merged commit b5a97c4 into dotnet:main Aug 5, 2024
26 checks passed
@dotnet-policy-service dotnet-policy-service bot added this to the 9.0-rc1 milestone Aug 5, 2024
@BrennanConroy
Copy link
Member

Thanks @ladeak!

ladeak pushed a commit to ladeak/aspnetcore that referenced this pull request Aug 5, 2024
- Follow up on comment: dotnet#56805 (comment)
- Moving PathNormalizer from Shared\HttpSys to Shared\PathNormalizer, changing the namespace to AspNetCore.Internal
- Shares the code between HttpSys and Kestrel.Core
- Remaining Kestrel specific code renamed to PathDecoder
- Keeping the tests and benchmarks in Kestrel
ladeak pushed a commit to ladeak/aspnetcore that referenced this pull request Aug 5, 2024
- Follow up on comment: dotnet#56805 (comment)
- Moving PathNormalizer from Shared\HttpSys to Shared\PathNormalizer, changing the namespace to AspNetCore.Internal
- Shares the code between HttpSys and Kestrel.Core
- Remaining Kestrel specific code renamed to PathDecoder
- Keeping the tests and benchmarks in Kestrel
BrennanConroy pushed a commit that referenced this pull request Aug 6, 2024
* Merging PathNormalizer Implementations

- Follow up on comment: #56805 (comment)
- Moving PathNormalizer from Shared\HttpSys to Shared\PathNormalizer, changing the namespace to AspNetCore.Internal
- Shares the code between HttpSys and Kestrel.Core
- Remaining Kestrel specific code renamed to PathDecoder
- Keeping the benchmarks in Kestrel
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions community-contribution Indicates that the PR has been added by a community member Perf
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants