add background for TensorArray #4564
Conversation
doc/design/tensor_array.md (Outdated)
@@ -1,9 +1,50 @@
# Design for TensorArray
## Background
Steps are one of the core concepts of RNN. In each time step of RNN, there should be several input segments, states, and output segments; all these components act like arrays, for example, call `states[step_id]` will get the state in `step_id`th time step.
I'd suggest changing the first paragraph as:

This design doc presents the necessity of a new C++ class `TensorArray`. In addition to the very simple C++ implementation

```c++
class TensorArray : public std::vector<LoDTensor> {
 public:
  explicit TensorArray(const LoDTensor&);
  explicit TensorArray(int size);
};
```

we also need to expose it to PaddlePaddle's Python API, because users would want to use it with our very flexible operators such as `WhileOp`. An example for your reference:
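For illustration, here is a rough, hypothetical sketch of what a Python-level `TensorArray` and a step loop over it could look like; the `read`/`write` methods and the `rnn_step` callback are assumptions for this sketch, not PaddlePaddle's confirmed Python API:

```python
# Hypothetical sketch only: TensorArray, read, write and rnn_step are
# illustrative assumptions, not PaddlePaddle's actual Python API.
class TensorArray:
    """A minimal array-of-tensors container, mirroring the C++ sketch above."""

    def __init__(self, size=0):
        self._items = [None] * size

    def write(self, index, value):
        self._items[index] = value

    def read(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


def run_rnn(input_segments, initial_state, rnn_step):
    """Run a step function over pre-split input segments, as a WhileOp-style loop would."""
    outputs = TensorArray(len(input_segments))
    state = initial_state
    for step in range(len(input_segments)):
        # Each step consumes one input segment and the previous state,
        # and writes one output segment.
        state, out = rnn_step(input_segments.read(step), state)
        outputs.write(step, out)
    return outputs, state
```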
I agree with @wangkuiyi. We should introduce the `TensorArray` and then start describing its use case in the RNN.
doc/design/tensor_array.md (Outdated)
## Background
Steps are one of the core concepts of RNN. In each time step of RNN, there should be several input segments, states, and output segments; all these components act like arrays, for example, call `states[step_id]` will get the state in `step_id`th time step.

An RNN could be implemented with the following pseudo codes
An RNN can be implemented with the following pseudocode
doc/design/tensor_array.md (Outdated)
step++;
}
According to the [RNN roadmap](https://github.com/PaddlePaddle/Paddle/issues/4561), there are several different RNNs to support.
several different RNNs that Paddle will eventually support.
doc/design/tensor_array.md (Outdated)
According to the [RNN roadmap](https://github.com/PaddlePaddle/Paddle/issues/4561), there are several different RNNs to support.

Currently, we have an RNN implementation called `recurrent_op` which takes tensor as input; it splits the input tensors into `input_segments`.
Currently, the basic RNN implementation supported by Paddle is the `recurrent_op`, which takes tensors as input and splits them into `input_segments`.
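To make that splitting concrete, here is a small sketch with NumPy standing in for Paddle tensors; the `[time_steps, batch, dim]` layout is an assumption for the sake of the example:

```python
import numpy as np

# Sketch: split a dense input of shape [time_steps, batch, dim] into one
# segment per time step, which is conceptually what recurrent_op does.
# NumPy stands in for Paddle tensors; the layout is an assumption here.
x = np.random.rand(4, 2, 8)               # 4 time steps, batch of 2, dim 8
input_segments = [x[t] for t in range(x.shape[0])]

assert len(input_segments) == 4
assert input_segments[0].shape == (2, 8)  # one [batch, dim] slice per step
```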
doc/design/tensor_array.md (Outdated)
Currently, we have an RNN implementation called `recurrent_op` which takes tensor as input; it splits the input tensors into `input_segments`.

Considering a tensor can't store variable-length sequences directly, we proposed the tensor with the level of details (`LoDTensor` for short). Segmenting the `LoDTensor` is much more complicated than splitting a tensor, that makes it necessary to refactor the `recurrent_op` with `LoDTensor` segmenting support.
The first line can be changed to: Since a tensor cannot store variable-length sequences directly, Paddle implements the tensor with level of details (`LoDTensor` for short).
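For readers new to the concept, a tiny sketch of the usual LoD idea: variable-length sequences are kept as one flat buffer plus boundary offsets (the exact offset layout Paddle uses may differ; this is only illustrative):

```python
# Illustrative sketch of the level-of-details (LoD) idea: sequences of
# different lengths share one flat data array, and offsets mark where each
# sequence starts and ends. Paddle's real offset layout may differ.
data = ["w0", "w1", "w2", "w3", "w4"]    # 5 time steps in total
lod = [0, 3, 5]                          # two sequences: [0, 3) and [3, 5)

sequences = [data[lod[i]:lod[i + 1]] for i in range(len(lod) - 1)]
assert sequences == [["w0", "w1", "w2"], ["w3", "w4"]]
```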
doc/design/tensor_array.md (Outdated)
In the second stage, `dynamic_recurrent_op` should be introduced to handle inputs with variable-length sequences.
The implementation is the same with `recurrent_op` except that **how to split the original input `LoDTensors` and outputs to get the `input_segments` and `output_segments`**.

In the next stage, a dynamic RNN model based on dynamic operators would be supported. Though it can't be built on `recurrent_op` or `dynamic_recurrent_op` directly, the logic about how to split a tensor or a LoD tensor and get `input_segments` is the same.
The second sentence should be: Though it can't be built over `recurrent_op` or `dynamic_recurrent_op` directly, the logic behind splitting a tensor or a LoD tensor into `input_segments` remains the same.
doc/design/tensor_array.md (Outdated)
In the next stage, a dynamic RNN model based on dynamic operators would be supported. Though it can't be built on `recurrent_op` or `dynamic_recurrent_op` directly, the logic about how to split a tensor or a LoD tensor and get `input_segments` is the same.

## Why `TensorArray`
In the three different RNNs, the logic of how to split the inputs to segments, states and outputs are similar and could be shared as a separate module.
the logic behind splitting the inputs to segments, states and outputs is similar and can be shared in a separate module.
doc/design/tensor_array.md (Outdated)
The array of `states`, `input_segments` and `output_segments` would be exposed to users when writing a dynamic RNN model similar to the above pseudo codes.

So there should be an array-like container which might store the segments of a tensor or LoD tensor.
So there should be an array-like container, which can store the segments of a tensor or LoD tensor.
doc/design/tensor_array.md (Outdated)
The array of `states`, `input_segments` and `output_segments` would be exposed to users when writing a dynamic RNN model similar to the above pseudo codes.

So there should be an array-like container which might store the segments of a tensor or LoD tensor.
**This container could store an array of tensor and provides several methods to split a tensor or a LoD tensor**,
This container could store an array of tensors and provides several methods to split a tensor or a LoD tensor
doc/design/tensor_array.md (Outdated)
So there should be an array-like container which might store the segments of a tensor or LoD tensor.
**This container could store an array of tensor and provides several methods to split a tensor or a LoD tensor**,
that's where the notion `TensorArray` comes from.
This is where the notion of `TensorArray` comes from.
I think the idea and design are clean and elegant for LoD tensors of level=1.
Below are some thoughts on generalizing the idea - though this may be beyond the scope of this design.
I think the general problem can be stated as this: We want to convert an LoD tensor to a sequence of minibatches for efficient computation and then restore the computation results to an LoD tensor matching the original levels structure.
What we are allowed to batch together depends on which levels have sequential dependencies.
To use the example in other documents: Suppose an LoD tensor represents a document: It contains several paragraphs, each paragraph contains several sentences, each sentence contains several words.
If we treat the sentences within a paragraph as independent, but the paragraphs as having a sequential dependency, then we can batch together the sentences. So something like `pack(batch_levels=[2])`.
If we treat the paragraphs as independent, too, then we can batch all sentences together. Something like: `pack(batch_levels=[1, 2])`.
If the paragraphs and sentences both have sequential dependency, we have no choice but to run each sentence one by one. This would be `pack(batch_levels=[])`.
Finally, if we treat the paragraphs as independent, but the sentences as having a sequential dependency, then we can batch together the first sentences of all paragraphs, then batch together the second sentences of all paragraphs, and so on. This would be something like `pack(batch_levels=[1])`.
I think one can come up with real-life use cases for all these scenarios, so it would be nice to have that flexibility.
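A small sketch of what those four choices would mean on the document example; the `batch_levels` semantics here are the hypothetical ones proposed above, not an existing API:

```python
# Hypothetical illustration of the proposed batch_levels semantics (not an
# existing API). A document is a list of paragraphs; a paragraph is a list
# of sentences.
doc = [["p0s0", "p0s1", "p0s2"],
       ["p1s0", "p1s1"]]

# batch_levels=[1, 2]: paragraphs and sentences both independent ->
# every sentence can go into one big batch.
all_in_one = [[s for para in doc for s in para]]

# batch_levels=[2]: sentences independent, paragraphs sequential ->
# one batch per paragraph, processed in order.
per_paragraph = [list(para) for para in doc]

# batch_levels=[1]: paragraphs independent, sentences sequential ->
# batch the i-th sentences of all paragraphs together.
max_len = max(len(para) for para in doc)
per_position = [[para[i] for para in doc if i < len(para)]
                for i in range(max_len)]

# batch_levels=[]: everything sequential -> one sentence at a time.
one_by_one = [[s] for para in doc for s in para]

assert per_position == [["p0s0", "p1s0"], ["p0s1", "p1s1"], ["p0s2"]]
```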
Finally, two questions I haven't thought much about yet but was curious if you could clarify:
- Is there an elegant way to handle reverse sequential dependency (so we can implement bidirectional RNNs)?
- Where is the best place to handle "max batch size" issues? Some options seem to be:
  a. `TensorArray::pack`
  b. No max batch size (i.e., the creator of the LoD tensor is responsible for ensuring the pack operation does not create too large batches).
  c. The computation code like `rnnstep` takes care of splitting into smaller batches when necessary.
Good suggestions, I will add some real-life use cases of these scenarios. Reply to those two questions:
The batches have 3, 3, 2, 1 instances respectively. So the max batch size might be the first batch's size 3, the number of batches is 4.
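As a worked illustration, three sequences of lengths 4, 3, and 2 (assumed lengths, chosen only to reproduce the numbers quoted above) yield exactly those step batches:

```python
# Assumed sequence lengths, chosen to reproduce the 3, 3, 2, 1 batches above.
seq_lens = [4, 3, 2]

# At step t, every sequence that still has an element contributes one instance.
batch_sizes = [sum(1 for n in seq_lens if n > t) for t in range(max(seq_lens))]

assert batch_sizes == [3, 3, 2, 1]   # max batch size 3, number of batches 4
```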
LGTM! This is only for the grammatical mistakes. Make sure somebody who has more context on TensorArray verifies the correctness of the information.
LGTM
fix: #4621