-
Notifications
You must be signed in to change notification settings - Fork 500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make TraceContextExt::span() return an Option #1089
Conversation
I'm happy to look at the breaking tests here, but would like some consensus first that this is a good direction. |
I like the direction! I'll add your remote and run some benchmarks... |
Hm I agree that this is more idiomatic but think it needs to return something to satisfy this requirement of the spec.
Could possibly have another method that has this behavior to work around the spec constraint? but the confusion may make it net negative from a DX perspective. |
@djc, here are the benchmark results for this change. shaun@shaun-amd:~/code/open-telemetry/opentelemetry-rust(djc-optional-span)$ taskset --cpu-list 4 cargo bench -p opentelemetry_sdk --bench trace -- --load-baseline djc --baseline main start-end
Finished bench [optimized] target(s) in 0.12s
Running benches/trace.rs (target/release/deps/trace-b2444cb9aa7f705e)
start-end-span/always-sample
time: [594.56 ns 618.02 ns 639.49 ns]
change: [-12.514% -7.5863% -2.2491%] (p = 0.01 < 0.05)
Performance has improved.
start-end-span/never-sample
time: [171.08 ns 171.50 ns 171.91 ns]
change: [-8.5121% -8.3054% -8.0769%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
start-end-span-4-attrs/always-sample
time: [944.12 ns 977.55 ns 1.0092 µs]
change: [-12.035% -6.8071% -0.5840%] (p = 0.03 < 0.05)
Change within noise threshold.
start-end-span-4-attrs/never-sample
time: [220.33 ns 220.79 ns 221.28 ns]
change: [-4.6267% -4.4331% -4.2439%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
start-end-span-8-attrs/always-sample
time: [1.4526 µs 1.4984 µs 1.5406 µs]
change: [-2.8210% +4.5795% +13.103%] (p = 0.29 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
start-end-span-8-attrs/never-sample
time: [270.36 ns 270.95 ns 271.55 ns]
change: [-5.4790% -5.2004% -4.9441%] (p = 0.00 < 0.05)
Performance has improved.
start-end-span-all-attr-types/always-sample
time: [1.2232 µs 1.2619 µs 1.2969 µs]
change: [-4.2679% +1.8863% +8.1148%] (p = 0.55 > 0.05)
No change in performance detected.
start-end-span-all-attr-types/never-sample
time: [234.85 ns 235.36 ns 235.90 ns]
change: [-5.0780% -4.7854% -4.5055%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
start-end-span-all-attr-types-2x/always-sample
time: [2.0825 µs 2.1587 µs 2.2285 µs]
change: [-12.105% -4.7848% +2.6529%] (p = 0.24 > 0.05)
No change in performance detected.
start-end-span-all-attr-types-2x/never-sample
time: [298.57 ns 299.12 ns 299.63 ns]
change: [-4.5375% -4.1205% -3.7261%] (p = 0.00 < 0.05)
Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
5 (5.00%) low severe
13 (13.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe |
Maybe the trade off could be |
(Might as well add some feedback) If I'm reading that spec correctly, it seems to be designed around languages with pervasive nullability, where retuning a null value could take down the process. @cijothomas mentioned recently that the specification is generally flexible, and .NET has taken some liberties with their official implementation to adhere more closely to .NET's idioms. I don't see why Rust shouldn't follow that precedent, especially if the divergence from the specification is an improvement in UX. Put differently, I think adhering to the spirit of the the specification rather than letter is warranted. |
I was reading some part of the OpenTelemetry spec/documentation earlier that mentioned something like implementations should focus on the conceptual capabilities more than the exact API prescriptions, but of course I can't find that bit now. I do feel like the way this is written doesn't really make sense for Rust -- and I feel like providing an idiomatic API does make things a lot easier. If you look at the code changes here, I think a lot of changes make the code simpler to follow/more robust, and there's potentially other simplifications (like dropping the (Perhaps we should propose a spec modification to explicitly allow this kind of leeway in the quoted language and see how that goes over with the editors?) |
I feel like the reason why spec asks for a noop spans is to make sure the propgation works in absence of a SDK. So as long as we can make sure the following is true, we should return a
|
It's right here (and in every language's contributing guide.): https://github.com/open-telemetry/opentelemetry-rust/blob/main/CONTRIBUTING.md#focus-on-capabilities-not-structure-compliance |
@jtescher any further thoughts on this? |
Hm so I think the spec provides default values in APIs like this mostly for convenience. E.g. as a user of this crate you don't need to check for // can always call methods, values ignored by non recording spans
cx.span().add_event(my_event)
// would have to explicitly handle the none case, with map or if let some..
cx.span().map(|span| span.add_event(my_event)) A similar choice is made in I'm not opposed to deviating from the spec, especially where it makes things more erconomic or idiomatic, I'm just not sure that is the case here. Are there other crates or otel language impls that return |
My concern for the first case above, and the current direction of returning no-op spans, is the performance implications. While it's nice that once can always call all span methods and the values, are ignored, it's definitely not nice that one still has to do all the work to create those ignored values in the first place. (And they are not exactly "cheap" values... one has to give up ownership of a In my ideal world, I want a lazily evaluated lambda to be passed which only gets called back when the all the data it's going to produce is actually going to be consumed for a benefit and not just immediately dropped. If one wants to add multiple attributes to a span (only if its recording) the current api would have us do: cx.span().add_attribute(KeyValue::new(...));
cx.span().add_attribute(KeyValue::new(...));
... Which is a real head-scratcher for me to understand in the context of the goals of instrumenting to be as close to zero-overhead as possible for the case when no sdk is being used or when 99.9% of the spans are not being sampled. So, I strongly prefer the Another alternative is a |
This would be an argument for changing all of the APIs that return spans (and likely meter instruments too) and removing the noop spec impl. E.g. creating a new span would need to also return an option as the span may not be recording. Could consider doing that, but it would be a larger scope.
The spec would have you do a check to let span = cx.span();
if span.is_recording() {
let attrs = expensive_call();
span.set_attributes(attrs);
}
I think that would likely be the most compliant way to accomplish what you're looking for as you'd want to both check that there is an active span and that the span is recording. Would look similar to the check above with a closure vs the current let span = cx.span();
span.visit_recording(|span| {
let attrs = expensive_call();
span.set_attributes(attrs);
}); |
I think I would be in favor of doing that across the board. It makes the notion of recording vs non-recording spans (etc) explicit in the type system, which I think is the idiomatic way to do things in Rust (which also makes it easier to build performant software, since the trade-offs are more obvious).
I think where possible, having a simple |
Agreed, but would still potentially need a way of distinguishing the concepts of an active vs recording. You can have all 3 possibilities: no span, active with recording, active without recording. So either this is returning if let Some(recording_span) = cx.span().filter(|span| span.is_recording()) {
let attrs = expensive_call();
span.set_attributes(attrs);
} Or you need a different way of propagating the current trace id as this method would no longer work: // default instead of existing trace if span is not recording
let span_context = cx.span().map(|span| span.span_context()).unwrap_or_default() The current API requires the explicit check to |
Okay, fair point. So maybe we should yield a trivariant enum? enum SpanState<'a> {
Recording(&'a mut Span),
Active(&'a Span),
None,
} |
I think that would be the clearest way to represent it in the type system. But do we want end users of this crate to have to handle all types at each callsite? Still seems like the current API has the simplest mental model exposed that allows for optional perf optimizations through checking span recording state. |
I mean, we can then add some helper methods on top of that type that make it easy to adapt to the caller's needs? Like I think Rust APIs usually optimize not for simplicity but for accurately modeling the underlying state. Yes, this can make things a bit more verbose, but also less error-prone and often more efficient. For such a core building block, it seems unfortunate to go the other way on this particular convention. |
In the interest of moving this forward towards a decision, here's my opinion on an improvement.
To handle further checking if the span is recording, encourage: let span = cx.span().filter(is_recording); I feel this is the idiomatic middle-ground that is still simple to use. |
@shaun-cox just to clarify, how do you see that being different from the current changes in this PR? (I think it's pretty close?) |
@djc, yep, no different than this PR. Just trying to vote for banking this progress rather than considering a bigger trivariant approach right now. Thanks! |
I personally still think the existing APIs are a better choice as I believe others will find them simpler to understand and use, as well as having the same performance optimizations available that an That being said if the other @open-telemetry/rust-approvers feel that this is the best way forward I won't block it. But if an alternate API is being considered it should be consistent with the |
@jtescher , I'm really trying to see where the same performance optimizations would be available with the current API. Could you expand a bit on it? I'd characterize a lot of the perf wins I've recently been involved with as eliminating expensive |
Which expensive clones on the no ops are you referring to? I was responding to the perf issues you were bringing up in #1089 (comment), e.g. being able to conditionally record fields if there is overhead when not recording. |
I was referring to #1140, where avoiding cloning So instead of this: let span = cx.span();
if span.is_recording() {
let attrs = expensive_call();
span.set_attributes(attrs);
} We could do something like this: cx.span()
.filter(is_recording)
.map(|span| {
let attrs = expensive_call();
span.set_attributes(attrs);
})
}); Now there is no no-op Putting it together with previous optimization to avoid obtaining a cloned Context::map_current(TraceContextEx::span)
.filter(is_recording)
.map(|span| {
let attrs = expensive_call();
span.set_attributes(attrs);
})
}); |
Hm not seeing any overhead of dropping the noop span when running benches vs an option. benchmarksgroup.bench_function(BenchmarkId::new("noop", p), |b| {
b.iter(|| {
black_box(Context::map_current(|cx| {
let span = cx.span();
drop(span)
}))
})
});
group.bench_function(BenchmarkId::new("option", p), |b| {
b.iter(|| {
black_box(Context::map_current(|cx| {
if let Some(span) = cx.span_opt() {
drop(span)
}
}))
})
}); They look effectively the same in terms of active or not active perf
Using this commit for diff jtescher@beabfcd Possibly something else was the source of the perf differences in #1140? |
Sorry, I didn't mean to confuse that it was just no-op Span clone elimination that would explain all of the performance gains, just that an example of using Option-like optimizations ala #1140 rather than returning cloned temporaries did result in performance gains... similar to how this PR has also measured performance gains (up to 8% in some scenarios glancing at the top of this thread) by having span() return an Option rather than a no-op instance. My original question a few replies up was trying to gain insight into how to achieve these performance optimizations without using Option, as you alluded that you think they are still possible? |
I can't reproduce the 8% number above. When I compare the start-end-span benchmarks against main I actually see worse performance on this branch, so something else may have changed in the mean time. |
Perhaps we could talk about this at the next SIG if @jtescher can make it? Since we've been talking about performance. |
Closing old, inactive PRs. Feel free to reopen when ready. Thanks. |
Changes
The changes that @shaun-cox made in #1088 make sense to me, but I wanted to look into the core
TraceContextExt
API, which I don't think makes much sense for Rust. Instead of having aspan()
method that fabricates aNOOP_SPAN
and yields if it there is no active span and a separatehas_active_span()
to check if there is a currently active span, it seems more idiomatic to makespan()
yield anOption
.Merge requirement checklist
CHANGELOG.md
files updated for non-trivial, user-facing changes