Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix decoding url #207

Merged
merged 22 commits into from
Nov 12, 2021
Merged

fix decoding url #207

merged 22 commits into from
Nov 12, 2021

Conversation

Eomm
Copy link
Contributor

@Eomm Eomm commented Oct 6, 2021

Fixes #206

@mav-rik
Copy link

mav-rik commented Oct 6, 2021

@Eomm I would rather put decodeURI right at sanitizeUrl output.

First of all decodeURI must be wrapped to be called safe:

function safeDecodeURI (v) {
    if (v.indexOf('%') < 0) return v
    try {
        return decodeURI(v)
    } catch (e) {
        return null
    }
}

and the sanitizeUrl:

function sanitizeUrl (url) {
  for (var i = 0, len = url.length; i < len; i++) {
    var charCode = url.charCodeAt(i)
    // Some systems do not follow RFC and separate the path and query
    // string with a `;` character (code 59), e.g. `/foo;jsessionid=123456`.
    // Thus, we need to split on `;` as well as `?` and `#`.
    if (charCode === 63 || charCode === 59 || charCode === 35) {
      return safeDecodeURI(url.slice(0, i))
    }
  }
  return safeDecodeURI(url)
}

All the calls of fastDecode should stay untached for further decoding of dynamic params.

@Eomm
Copy link
Contributor Author

Eomm commented Oct 8, 2021

All the calls of fastDecode should stay untached for further decoding of dynamic params.

Right, and moreover

https://www.rfc-editor.org/rfc/rfc3986#section-2.4

Implementations must not
percent-encode or decode the same string more than once, as decoding
an already decoded string might lead to misinterpreting a percent
data octet as the beginning of a percent-encoding, or vice versa in
the case of percent-encoding an already percent-encoded string.

The main issue is that:

  • we want to decode the URL before executing the routing phase
  • we don't want to decode the /(%2F) because it defines the URL segment and it is a path parameter

I think the sanitizeUrl works fine, but I had to move the call into the find method, removing from the lookup.
I think this is overkill: we should sanitize only once and the find method is recursive.

I need to find how to archive it

@Eomm Eomm marked this pull request as ready for review October 11, 2021 07:27
@Eomm
Copy link
Contributor Author

Eomm commented Oct 11, 2021

we should sanitize only once and the find method is recursive.

This was not true: the iteration happens inside the while loop:

while (true) {

so the sanitizeUrl is executed once

@Eomm
Copy link
Contributor Author

Eomm commented Oct 13, 2021

The @mav-rik solution works (thks for the help)

I have added a failing test to check this issue https://owasp.org/www-community/Double_Encoding
Checking how to solve ⏳

@@ -30,7 +30,7 @@ test('If onBadUrl is defined, then a bad url should be handled differently (look
t.fail('Should not be defaultRoute')
},
onBadUrl: (path, req, res) => {
t.equal(path, '%world')
t.equal(path, '/hello/%world')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a breaking change:
the onBadUrl was getting the path parameter when there were wrong URLs and not the whole path

@mcollina mcollina requested a review from delvedor October 13, 2021 15:38
@Eomm
Copy link
Contributor Author

Eomm commented Oct 14, 2021

I have added two benches on both branches.
When there is an encoded path parameter, the performance drop is quite big

I think we can work on it a bit more: the decode-component should be executed only when there are one of these encoded chars:

; , / ? : @ & = + $ - _ . ! ~ * ' ( ) #

NOTE: this decoding is needed to keep supporting this feature:

The url is sanitized internally, all the parameters and wildcards are decoded automatically.

master branch
lookup static route x 77,362,766 ops/sec ±0.11% (595 runs sampled)
lookup dynamic route x 6,564,291 ops/sec ±0.07% (597 runs sampled)
lookup dynamic multi-parametric route x 3,502,668 ops/sec ±0.11% (596 runs sampled)
lookup dynamic multi-parametric route with regex x 2,688,043 ops/sec ±0.14% (596 runs sampled)
lookup long static route x 5,130,292 ops/sec ±0.09% (595 runs sampled)
lookup long dynamic route x 3,867,784 ops/sec ±0.29% (595 runs sampled)
lookup static route on constrained router x 26,144,108 ops/sec ±0.21% (588 runs sampled)
lookup static versioned route x 4,292,943 ops/sec ±0.11% (596 runs sampled)
lookup static constrained (version & host) route x 4,249,856 ops/sec ±0.38% (595 runs sampled)
find static route x 54,420,447 ops/sec ±0.37% (592 runs sampled)
find dynamic route x 7,250,609 ops/sec ±0.22% (595 runs sampled)
+find dynamic route with encoded parameter x 5,018,355 ops/sec ±0.16% (596 runs sampled)
find dynamic multi-parametric route x 3,886,136 ops/sec ±0.09% (596 runs sampled)
find dynamic multi-parametric route with regex x 2,944,841 ops/sec ±0.29% (595 runs sampled)
find long static route x 7,246,821 ops/sec ±0.25% (596 runs sampled)
find long dynamic route x 5,256,830 ops/sec ±0.13% (595 runs sampled)
find long nested dynamic route x 2,145,797 ops/sec ±0.11% (594 runs sampled)
+find long nested dynamic route with encoded parameter x 1,711,683 ops/sec ±0.09% (597 runs sampled)
find long nested dynamic route with other method x 4,399,929 ops/sec ±0.12% (595 runs sampled)
find long nested dynamic route x 2,115,145 ops/sec ±0.09% (596 runs sampled)
find long nested dynamic route with other method x 4,405,515 ops/sec ±0.09% (596 runs sampled)
this PR
lookup static route x 75,074,382 ops/sec ±0.17% (595 runs sampled)
lookup dynamic route x 6,722,545 ops/sec ±0.09% (597 runs sampled)
lookup dynamic multi-parametric route x 3,511,734 ops/sec ±0.11% (596 runs sampled)
lookup dynamic multi-parametric route with regex x 2,706,381 ops/sec ±0.14% (595 runs sampled)
lookup long static route x 4,966,484 ops/sec ±0.11% (595 runs sampled)
lookup long dynamic route x 3,873,694 ops/sec ±0.11% (595 runs sampled)
lookup static route on constrained router x 25,935,927 ops/sec ±0.24% (587 runs sampled)
lookup static versioned route x 4,210,813 ops/sec ±0.10% (596 runs sampled)
lookup static constrained (version & host) route x 4,201,175 ops/sec ±0.10% (596 runs sampled)
find static route x 50,637,288 ops/sec ±0.38% (590 runs sampled)
find dynamic route x 6,713,246 ops/sec ±0.13% (596 runs sampled)
-find dynamic route with encoded parameter x 1,328,750 ops/sec ±0.18% (594 runs sampled)
find dynamic multi-parametric route x 3,582,311 ops/sec ±0.11% (595 runs sampled)
find dynamic multi-parametric route with regex x 2,765,211 ops/sec ±0.10% (596 runs sampled)
find long static route x 5,171,196 ops/sec ±0.09% (596 runs sampled)
find long dynamic route x 3,942,188 ops/sec ±0.11% (596 runs sampled)
find long nested dynamic route x 1,997,218 ops/sec ±0.11% (596 runs sampled)
-find long nested dynamic route with encoded parameter x 719,727 ops/sec ±0.17% (594 runs sampled)
find long nested dynamic route with other method x 3,961,963 ops/sec ±0.12% (596 runs sampled)
find long nested dynamic route x 1,966,554 ops/sec ±0.11% (596 runs sampled)
find long nested dynamic route with other method x 3,955,305 ops/sec ±0.19% (596 runs sampled)

@mcollina
Copy link
Collaborator

I would land this as a semver-major.

@mcollina
Copy link
Collaborator

I'm not liking the decreased performance, but that's expectable.

@Eomm
Copy link
Contributor Author

Eomm commented Oct 14, 2021

I'm not liking the decreased performance, but that's expectable.

Yes. I want to filter out cases like the space char %20 because in this case, we can skip decodeURIComponent
We should run only the code strictly needed for reducing the performance drop

@Eomm Eomm requested a review from mcollina October 20, 2021 07:08
@mcollina
Copy link
Collaborator

How are we with benchmarks?

@mcollina
Copy link
Collaborator

I think this is just a bad bug we should fix, so I would release as a patch and then deal with the fallout in Fastify (if there is any).

@Eomm
Copy link
Contributor Author

Eomm commented Oct 20, 2021

here the results
the encoded parameter benches have a -50% performance drop

comparing master-pr
  pr:     lookup static route x 76,674,656 ops/sec ±0.12% (595 runs sampled)
+ master: lookup static route x 79,131,796 ops/sec ±0.25% (594 runs sampled)

  pr:     lookup dynamic route x 6,259,776 ops/sec ±0.29% (596 runs sampled)
+ master: lookup dynamic route x 6,604,871 ops/sec ±0.19% (595 runs sampled)

  pr:     lookup dynamic multi-parametric route x 3,508,864 ops/sec ±0.29% (594 runs sampled)
+ master: lookup dynamic multi-parametric route x 3,555,251 ops/sec ±0.11% (595 runs sampled)

  pr:     lookup dynamic multi-parametric route with regex x 2,816,789 ops/sec ±0.46% (596 runs sampled)
+ master: lookup dynamic multi-parametric route with regex x 2,704,261 ops/sec ±0.15% (595 runs sampled)

  pr:     lookup long static route x 4,970,945 ops/sec ±0.11% (595 runs sampled)
+ master: lookup long static route x 5,166,473 ops/sec ±0.19% (593 runs sampled)

  pr:     lookup long dynamic route x 3,939,538 ops/sec ±0.10% (596 runs sampled)
+ master: lookup long dynamic route x 3,884,604 ops/sec ±0.96% (593 runs sampled)

  pr:     lookup static route on constrained router x 24,927,696 ops/sec ±0.20% (588 runs sampled)
+ master: lookup static route on constrained router x 24,286,435 ops/sec ±0.22% (590 runs sampled)

  pr:     lookup static versioned route x 4,206,344 ops/sec ±0.10% (596 runs sampled)
+ master: lookup static versioned route x 4,297,009 ops/sec ±0.17% (595 runs sampled)

  pr:     lookup static constrained (version & host) route x 4,206,567 ops/sec ±0.16% (596 runs sampled)
+ master: lookup static constrained (version & host) route x 4,335,753 ops/sec ±0.20% (596 runs sampled)

  pr:     find static route x 52,060,384 ops/sec ±0.38% (586 runs sampled)
+ master: find static route x 54,845,502 ops/sec ±0.41% (585 runs sampled)

  pr:     find dynamic route x 6,809,845 ops/sec ±0.13% (596 runs sampled)
+ master: find dynamic route x 7,363,324 ops/sec ±0.14% (595 runs sampled)

  pr:     find dynamic route with encoded parameter unoptimized x 1,794,848 ops/sec ±0.13% (596 runs sampled)
+ master: find dynamic route with encoded parameter unoptimized x 4,879,835 ops/sec ±0.15% (596 runs sampled)

  pr:     find dynamic route with encoded parameter optimized x 1,916,581 ops/sec ±0.15% (590 runs sampled)
+ master: find dynamic route with encoded parameter optimized x 4,955,861 ops/sec ±0.12% (596 runs sampled)

  pr:     find dynamic multi-parametric route x 3,347,742 ops/sec ±0.12% (594 runs sampled)
+ master: find dynamic multi-parametric route x 3,957,345 ops/sec ±0.10% (596 runs sampled)

  pr:     find dynamic multi-parametric route with regex x 2,686,346 ops/sec ±0.12% (597 runs sampled)
+ master: find dynamic multi-parametric route with regex x 2,938,081 ops/sec ±0.33% (596 runs sampled)

  pr:     find long static route x 4,479,007 ops/sec ±0.11% (595 runs sampled)
+ master: find long static route x 7,311,289 ops/sec ±0.16% (595 runs sampled)

  pr:     find long dynamic route x 3,392,116 ops/sec ±0.11% (592 runs sampled)
+ master: find long dynamic route x 5,357,651 ops/sec ±0.10% (595 runs sampled)

  pr:     find long nested dynamic route x 1,882,345 ops/sec ±0.12% (595 runs sampled)
+ master: find long nested dynamic route x 2,174,753 ops/sec ±0.09% (596 runs sampled)

  pr:     find long nested dynamic route with encoded parameter unoptimized x 944,112 ops/sec ±0.16% (596 runs sampled)
+ master: find long nested dynamic route with encoded parameter unoptimized x 1,732,654 ops/sec ±0.11% (596 runs sampled)

  pr:     find long nested dynamic route with encoded parameter optimized x 976,013 ops/sec ±0.15% (596 runs sampled)
+ master: find long nested dynamic route with encoded parameter optimized x 1,762,524 ops/sec ±0.11% (596 runs sampled)

  pr:     find long nested dynamic route with other method x 3,700,463 ops/sec ±0.12% (591 runs sampled)
+ master: find long nested dynamic route with other method x 4,476,532 ops/sec ±0.12% (596 runs sampled)

  pr:     find long nested dynamic route x 1,856,434 ops/sec ±0.26% (593 runs sampled)
+ master: find long nested dynamic route x 2,136,509 ops/sec ±0.10% (596 runs sampled)

  pr:     find long nested dynamic route with other method x 3,737,505 ops/sec ±0.12% (596 runs sampled)
+ master: find long nested dynamic route with other method x 4,478,782 ops/sec ±0.10% (595 runs sampled)

@@ -11,7 +11,7 @@ test('If onBadUrl is defined, then a bad url should be handled differently (find
t.fail('Should not be defaultRoute')
},
onBadUrl: (path, req, res) => {
t.equal(path, '/%world')
t.equal(path, '/%world', { todo: 'this is not executed' })
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? Can we remove this assertion then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave it for a follow up PR

@mcollina
Copy link
Collaborator

Does this change impact at all Fastify benchmarks?

Copy link
Owner

@delvedor delvedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Looks good to me. The drop in performances is not that much in normal cases, so I'm fine with it, and in any case, better to be correct from a spec perspective.

Just to be sure tho, I would release this in a major, just in time for Fastify v4.

Copy link
Contributor

@RafaelGSS RafaelGSS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Co-authored-by: Simone Busoli <[email protected]>
@Eomm
Copy link
Contributor Author

Eomm commented Oct 25, 2021

Does this change impact at all Fastify benchmarks?

No, I think the bench does not check path parameters.

I run the benchmark script on fastify and here are the results (3 runs each, and get the best one).

main
[1] ┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐
[1] │ Stat    │ 2.5% │ 50%  │ 97.5% │ 99%  │ Avg     │ Stdev   │ Max   │
[1] ├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤
[1] │ Latency │ 2 ms │ 5 ms │ 6 ms  │ 7 ms │ 4.84 ms │ 1.55 ms │ 75 ms │
[1] └─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘
[1] ┌───────────┬────────┬────────┬─────────┬─────────┬───────────┬─────────┬────────┐
[1] │ Stat      │ 1%     │ 2.5%   │ 50%     │ 97.5%   │ Avg       │ Stdev   │ Min    │
[1] ├───────────┼────────┼────────┼─────────┼─────────┼───────────┼─────────┼────────┤
[1] │ Req/Sec   │ 166015 │ 166015 │ 188287  │ 189823  │ 185962.67 │ 5577.08 │ 165919 │
[1] ├───────────┼────────┼────────┼─────────┼─────────┼───────────┼─────────┼────────┤
[1] │ Bytes/Sec │ 31 MB  │ 31 MB  │ 35.2 MB │ 35.5 MB │ 34.8 MB   │ 1.04 MB │ 31 MB  │
[1] └───────────┴────────┴────────┴─────────┴─────────┴───────────┴─────────┴────────┘
within this branch
[1] ┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐
[1] │ Stat    │ 2.5% │ 50%  │ 97.5% │ 99%  │ Avg     │ Stdev   │ Max   │
[1] ├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤
[1] │ Latency │ 2 ms │ 5 ms │ 6 ms  │ 8 ms │ 4.81 ms │ 1.64 ms │ 72 ms │
[1] └─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘
[1] ┌───────────┬─────────┬─────────┬─────────┬────────┬───────────┬─────────┬─────────┐
[1] │ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%  │ Avg       │ Stdev   │ Min     │
[1] ├───────────┼─────────┼─────────┼─────────┼────────┼───────────┼─────────┼─────────┤
[1] │ Req/Sec   │ 164863  │ 164863  │ 189951  │ 192255 │ 187118.94 │ 6384.67 │ 164805  │
[1] ├───────────┼─────────┼─────────┼─────────┼────────┼───────────┼─────────┼─────────┤
[1] │ Bytes/Sec │ 30.8 MB │ 30.8 MB │ 35.5 MB │ 36 MB  │ 35 MB     │ 1.19 MB │ 30.8 MB │
[1] └───────────┴─────────┴─────────┴─────────┴────────┴───────────┴─────────┴─────────┘

@Eomm
Copy link
Contributor Author

Eomm commented Oct 26, 2021

@mcollina could you re-run the pipeline?
I think there were some flakiness on the last run

@mcollina
Copy link
Collaborator

@mcollina could you re-run the pipeline?

I think there were some flakiness on the last run

seems all passing now

@Eomm Eomm requested a review from mcollina October 29, 2021 07:27
@darkgl0w
Copy link
Contributor

darkgl0w commented Oct 30, 2021

There is a little typo on the README.md file.
It should be :

**Note** that you must encode the parameters containing [reserved characters](https://www.rfc-editor.org/rfc/rfc3986#section-2.2).

Copy link
Collaborator

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mcollina
Copy link
Collaborator

@delvedor could you land and ship a new major?

Co-authored-by: darkgl0w <[email protected]>
@Eomm
Copy link
Contributor Author

Eomm commented Nov 8, 2021

@delvedor I may target a next branch if needed 👍🏽

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

the lib doesn't handle url-encoded URI properly
7 participants