datadog_traces - Failed to encode Datadog traces. #14244

Closed
yocca opened this issue Sep 1, 2022 · 2 comments · Fixed by #18903

Labels
  • domain: traces (Anything related to Vector's trace events)
  • sink: datadog_traces (Anything `datadog_traces` sink related)
  • type: bug (A code related bug.)

Comments

@yocca
yocca commented Sep 1, 2022

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

We have been seeing the following error message very often since we updated our Datadog Agent to send traces through Vector:

ERROR sink{component_kind="sink" component_id=datadog_traces_sink component_type=datadog_traces component_name=datadog_traces_sink}: vector::internal_events::datadog_traces: Failed to encode Datadog traces. error=unable to split into small chunks error_reason=message_too_big error_type="encoder_failed" stage="processing"

Configuration

sources:
  datadog_source:
    type: datadog_agent
    address: 0.0.0.0:80
    multiple_outputs: true
sinks:
  datadog_traces_sink:
    type: datadog_traces
    inputs:
      - datadog_source.traces
    default_api_key: $DATADOG_API_KEY

Version

vector 0.24.0 (aarch64-unknown-linux-gnu 43267b9 2022-08-30)

Debug Output

Example Data

No response

Additional Context

We started sending traces from the Datadog Agent through Vector on 8/30 around 1pm. From that point, trace metrics started to drop off, as seen in the following image:
image

But we didn't see any change in the number of traces indexed for that same operation:
image

And we still have the same number of ingested traces in Datadog for the same time period, when traces were going through Vector:
image

One theory is that there is an issue when Vector transforms traces, and that this somehow results in a large portion of traces not being counted in trace metrics. Because we use trace metrics for our APM view, that data became much less useful, so we rolled back sending traces through Vector. We now send traces directly from the Datadog Agent to Datadog.

References

No response

@yocca yocca added the type: bug A code related bug. label Sep 1, 2022
@jszwedko jszwedko added domain: traces Anything related to Vectors' trace events sink: datadog_traces Anything `datadog_traces` sink related labels Sep 6, 2022
@szibis
Contributor

szibis commented Apr 3, 2023

@jszwedko We see something similar, but strangely it is only visible for Java services; Ruby services are fine.
Can we somehow bump this issue? It is confusing for people, and the experience in Datadog is broken.

@neuronull
Contributor

I was able to reproduce this locally by setting PAYLOAD_LIMIT (which is not exposed as a configuration option) to a very small value, around 2k bytes, and changing the batch settings to correspond.

What seems to be happening is that the logic in encode_trace() for splitting a set of traces into smaller encoded payloads does not work when the number of events in the batch is small but the encoded event sizes are very large. Code comments indicate a 3.2 MB limit imposed by the receiving Datadog backend, and the splitting logic is based on that limit.
Further investigation is necessary, but if we are receiving an event that large, there must be a way to split it up. It would be interesting to see how the Agent handles this.
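
To illustrate the failure mode described above, here is a minimal Python sketch. It is not Vector's actual encode_trace() implementation: the halving strategy, the `encode` stand-in, the `split_and_encode` helper, and the `MessageTooBig` exception are all hypothetical names chosen for illustration, and the 2,000-byte limit mirrors the small value used in the local reproduction rather than the real 3.2 MB limit. It only shows why splitting by event count cannot help once a single encoded event exceeds the payload limit.

```python
from typing import List, Sequence

# Hypothetical small limit, mirroring the ~2k bytes used in the local repro.
PAYLOAD_LIMIT = 2_000


class MessageTooBig(Exception):
    """Raised when a batch cannot be split into payloads under the limit."""


def encode(events: Sequence[bytes]) -> bytes:
    # Stand-in for the real trace encoding: plain concatenation.
    return b"".join(events)


def split_and_encode(events: Sequence[bytes], limit: int = PAYLOAD_LIMIT) -> List[bytes]:
    """Encode events, halving the batch by event count until each payload fits."""
    payload = encode(events)
    if len(payload) <= limit:
        return [payload]
    if len(events) <= 1:
        # A single event is already over the limit; splitting by count cannot help.
        raise MessageTooBig("unable to split into small chunks")
    mid = len(events) // 2
    return split_and_encode(events[:mid], limit) + split_and_encode(events[mid:], limit)


if __name__ == "__main__":
    # Many small events: splitting by count succeeds.
    many_small = [b"x" * 100] * 50
    print(len(split_and_encode(many_small)), "payloads")

    # Few events, one of them very large: splitting by count fails.
    few_large = [b"x" * 3_000, b"y" * 10]
    try:
        split_and_encode(few_large)
    except MessageTooBig as err:
        print("error:", err)
```

In this sketch, handling the second case would require splitting inside a single event (for example, across the spans of one trace), which is the open question raised above.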
