
add background for TensorArray #4564

Merged

Conversation

Superjomn (Contributor) commented Oct 3, 2017:

fix: #4621

Superjomn changed the title from "add background" to "add background for TensorArray" on Oct 3, 2017.
@@ -1,9 +1,50 @@
# Design for TensorArray
## Background
Steps are one of the core concepts of RNN. In each time step of RNN, there should be several input segments, states, and output segments; all these components act like arrays, for example, call `states[step_id]` will get the state in `step_id`th time step.
wangkuiyi (Collaborator) commented Oct 3, 2017:

I'd suggest changing the first paragraph to:

This design doc presents the necessity of a new C++ class TensorArray. In addition to the very simple C++ implementation

class TensorArray : public std::vector<LoDTensor> {
 public:
  explicit TensorArray(const LoDTensor&);
  explicit TensorArray(int size);
};

we also need to expose it to PaddlePaddle's Python API, because users would want to use it with our very flexible WhileOp operator. An example for your reference:

A contributor commented:

I agree with @wangkuiyi. We should introduce the TensorArray and then start describing its use case in the RNN.
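For reference, here is a self-contained sketch of the `TensorArray` class suggested above; `LoDTensor` is stood in by a placeholder struct and the constructor bodies are assumptions, added only so the fragment compiles and can be exercised:

```cpp
// Sketch only: LoDTensor is a placeholder for paddle's real LoDTensor,
// and the constructor behavior is assumed, not taken from the design doc.
#include <vector>

struct LoDTensor {};  // placeholder type

class TensorArray : public std::vector<LoDTensor> {
 public:
  // Would unpack `source` into per-step tensors (not shown here).
  explicit TensorArray(const LoDTensor& source) { (void)source; }
  // Pre-allocate `size` empty step slots.
  explicit TensorArray(int size) : std::vector<LoDTensor>(size) {}
};

int main() {
  TensorArray states(/*size=*/10);  // ten empty step slots
  return states.size() == 10 ? 0 : 1;
}
```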


## Background
Steps are one of the core concepts of RNN. In each time step of RNN, there should be several input segments, states, and output segments; all these components act like arrays, for example, call `states[step_id]` will get the state in `step_id`th time step.

An RNN could be implemented with the following pseudo codes
A contributor commented:

An RNN can be implemented with the following pseudocode

step++;
}
```
According to the [RNN roadmap](https://github.com/PaddlePaddle/Paddle/issues/4561), there are several different RNNs to support.
A contributor commented:

several different RNNs that Paddle will eventually support.

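The pseudocode in the excerpt above is truncated (only its closing lines survive). As a hedged illustration of the kind of step loop the Background section describes, here is a minimal sketch; the helper `rnn_step`, the array sizes, and the state update are assumptions for illustration, not text from the design doc:

```cpp
// Illustrative sketch only: input_segments, states and output_segments act
// like arrays indexed by step, matching `states[step_id]` in the Background.
#include <vector>

struct Tensor {};  // placeholder type

// One RNN step: compute a new state from the previous state and the input.
Tensor rnn_step(const Tensor& prev_state, const Tensor& input) {
  (void)prev_state;
  (void)input;
  return Tensor{};  // real code would run the step network here
}

int main() {
  std::vector<Tensor> input_segments(4), states(5), output_segments(4);
  int step = 0;
  while (step < static_cast<int>(input_segments.size())) {
    states[step + 1] = rnn_step(states[step], input_segments[step]);
    output_segments[step] = states[step + 1];
    step++;
  }
  return 0;
}
```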

Currently, we have an RNN implementation called `recurrent_op` which takes tensor as input; it splits the input tensors into `input_segments`.
A contributor commented:

Currently, the basic RNN implementation supported by Paddle is the `recurrent_op`, which takes tensors as input and splits them into `input_segments`.



Considering a tensor can't store variable-length sequences directly, we proposed the tensor with the level of details (`LoDTensor` for short). Segmenting the `LoDTensor` is much more complicated than splitting a tensor, that makes it necessary to refactor the `recurrent_op` with `LoDTensor` segmenting support.
A contributor commented:

The first line can be changed to: Since a tensor cannot store variable-length sequences directly, Paddle implements the tensor with level of details (LoDTensor for short).

In the second stage, `dynamic_recurrent_op` should be introduced to handle inputs with variable-length sequences.
The implementation is the same with `recurrent_op` except that **how to split the original input `LoDTensors` and outputs to get the `input_segments` and `output_segments`** .

In the next stage, a dynamic RNN model based on dynamic operators would be supported. Though it can't be built on `recurrent_op` or `dynamic_recurrent_op` directly, the logic about how to split a tensor or a LoD tensor and get `input_segments` is the same.
A contributor commented:

The second sentence should be:
Though it can't be built over recurrent_op or dynamic_recurrent_op directly, the logic behind splitting a tensor or a LoD tensor into input_segments remains the same.


## Why `TensorArray`
In the three different RNNs, the logic of how to split the inputs to segments, states and outputs are similar and could be shared as a separate module.
A contributor commented:

the logic behind splitting the inputs to segments, states and outputs is similar and can be shared in a separate module.


The array of `states`, `input_segments` and `output_segments` would be exposed to users when writing a dynamic RNN model similar to the above pseudo codes.

So there should be an array-like container which might store the segments of a tensor or LoD tensor.
A contributor commented:

So there should be an array-like container, which can store the segments of a tensor or LoD tensor.

**This container could store an array of tensor and provides several methods to split a tensor or a LoD tensor** ,
A contributor commented:

This container could store an array of tensors and provides several methods to split a tensor or a LoD tensor


that's where the notion `TensorArray` comes from.
A contributor commented:

This is where the notion of TensorArray comes from.
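The passages above describe a container that stores an array of tensors and provides methods to split (unpack) and re-assemble (pack) a tensor or LoD tensor. A minimal sketch of what such an interface might look like follows; the method names `Unpack`, `Pack`, `Read`, and `Write` and their signatures are assumptions drawn from this discussion, not the final implementation, and `LoDTensor` is a placeholder type:

```cpp
// Sketch of the interface implied by the discussion; names and signatures
// are assumptions, and LoDTensor stands in for paddle's real LoDTensor.
#include <cstddef>
#include <vector>

struct LoDTensor {};  // placeholder type

class TensorArray {
 public:
  // Split `source` along the given LoD level into per-step tensors.
  void Unpack(const LoDTensor& source, int level) { (void)source; (void)level; }
  // Concatenate the stored steps back into a single LoD tensor.
  LoDTensor Pack(int level) const { (void)level; return LoDTensor{}; }

  const LoDTensor& Read(std::size_t index) const { return steps_.at(index); }
  void Write(std::size_t index, const LoDTensor& value) {
    if (index >= steps_.size()) steps_.resize(index + 1);
    steps_[index] = value;
  }
  std::size_t size() const { return steps_.size(); }

 private:
  std::vector<LoDTensor> steps_;
};
```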

mkliegl (Contributor) left a comment:

I think the idea and design are clean and elegant for LoD tensors of level=1.

Below are some thoughts on generalizing the idea, though this may be beyond the scope of this design.

I think the general problem can be stated as follows: we want to convert an LoD tensor to a sequence of minibatches for efficient computation and then restore the computation results to an LoD tensor matching the original levels structure.

What we are allowed to batch together depends on which levels have sequential dependencies.

To use the example from other documents: suppose an LoD tensor represents a document; it contains several paragraphs, each paragraph contains several sentences, and each sentence contains several words.

If we treat the sentences within a paragraph as independent, but the paragraphs as having a sequential dependency, then we can batch together the sentences. So something like `pack(batch_levels=[2])`.

If we treat the paragraphs as independent, too, then we can batch all sentences together. Something like: `pack(batch_levels=[1, 2])`.

If the paragraphs and sentences both have sequential dependency, we have no choice but to run each sentence one by one. This would be `pack(batch_levels=[])`.

Finally, if we treat the paragraphs as independent, but the sentences as having a sequential dependency, then we can batch together the first sentences of all paragraphs, then batch together the second sentences of all paragraphs, and so on. This would be something like `pack(batch_levels=[1])`.

I think one can come up with real-life use cases for all these scenarios, so it would be nice to have that flexibility.
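For concreteness, the four scenarios could be written as the following calls; the `pack` name and the `batch_levels` parameter are the reviewer's hypothetical proposal, not an existing Paddle API, and `LoDTensor` is a placeholder type:

```cpp
// Hypothetical API sketch: levels listed in batch_levels are treated as
// independent (batchable); all other levels keep a sequential dependency.
#include <vector>

struct LoDTensor {};  // placeholder type

std::vector<LoDTensor> pack(const LoDTensor& doc,
                            const std::vector<int>& batch_levels) {
  (void)doc;
  (void)batch_levels;
  return {};  // real logic would group instances along the batchable levels
}

int main() {
  LoDTensor doc;  // paragraphs = level 1, sentences = level 2
  auto a = pack(doc, {2});     // sentences independent, paragraphs sequential
  auto b = pack(doc, {1, 2});  // both independent: batch all sentences together
  auto c = pack(doc, {});      // both sequential: one sentence at a time
  auto d = pack(doc, {1});     // paragraphs independent, sentences sequential
  (void)a; (void)b; (void)c; (void)d;
  return 0;
}
```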

Finally, here are two questions I haven't thought much about yet, but I was curious whether you could clarify them:

  1. Is there an elegant way to handle reverse sequential dependency (so we can implement bidirectional RNNs)?
  2. Where is the best place to handle "max batch size" issues? Some options seem to be:
    a. `TensorArray::pack`
    b. No max batch size (i.e., the creator of the LoD tensor is responsible for ensuring the pack operation does not create batches that are too large).
    c. The computation code, such as `rnnstep`, takes care of splitting into smaller batches when necessary.

Superjomn (Contributor, Author) commented Oct 3, 2017:

Good suggestions; I will add some real-life use cases of `pack` and `unpack` later.

Replying to the two questions:

  • On the first question:
    For a reversed RNN, the RNN operators will have a `reversed` attribute, which will make the RNN traverse the sequence from tail to head.
    Another way to do this is to add a `reverse()` function to `TensorArray`.

  • On the second question:
    `TensorArray`'s `unpack` splits a `LoDTensor` into batches. For example, a LoD tensor might contain 3 sequences at some level, with lengths 4, 3, and 2 respectively:

xxxx
xxx
xx

After the `unpack` operation, it will be split into 4 batches:

0      1      2     3
x      x      x     x
x      x      x
x      x

The batches have 3, 3, 2, 1 instances respectively.

So the max batch size is the first batch's size, 3, and the number of batches is 4.

The `pack` operation will concatenate the batches back into the original LoD-formatted tensor.
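For concreteness, here is a small self-contained sketch of the batch-size computation described above, assuming only the example sequence lengths 4, 3, and 2:

```cpp
// Illustrative only: for sequences of lengths {4, 3, 2}, count how many
// instances each time-step batch holds after the unpack described above.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  std::vector<int> seq_lens = {4, 3, 2};
  int num_batches = *std::max_element(seq_lens.begin(), seq_lens.end());

  for (int step = 0; step < num_batches; ++step) {
    // A sequence contributes one instance to batch `step` iff it is longer
    // than `step`.
    int batch_size = static_cast<int>(std::count_if(
        seq_lens.begin(), seq_lens.end(),
        [step](int len) { return len > step; }));
    std::printf("batch %d holds %d instances\n", step, batch_size);
  }
  // Prints 3, 3, 2, 1: the max batch size is 3 and there are 4 batches,
  // matching the reply above.
  return 0;
}
```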

@mkliegl

abhinavarora previously approved these changes Oct 4, 2017

abhinavarora (Contributor) left a comment:

LGTM! This review is only for the grammatical fixes; make sure somebody who has more context on TensorArray verifies the correctness of the information.

wangkuiyi (Collaborator) left a comment:

LGTM

@Superjomn Superjomn merged commit 3419384 into PaddlePaddle:develop Oct 7, 2017
@Superjomn Superjomn deleted the bug/fix_tensor_array_design branch October 7, 2017 01:06
Development

Successfully merging this pull request may close these issues.

more TensorArray background is needed in design
5 participants