Is your feature request related to a problem? Please describe.
It seems that the implemented replay buffers only operate over transitions, with no ability to operate over entire sequences. This prevents the use of recurrent policies for tackling POMDPs.
Describe the solution you'd like
A SequenceReplayBuffer that returns contiguous episodes instead of shuffled transitions.
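To make the request concrete, here is a minimal sketch of what such a buffer could look like. The class name and methods are hypothetical (nothing here is part of the coax API): it accumulates transitions into episodes and samples whole episodes in temporal order instead of shuffled transitions.

```python
import random
from collections import deque


class SequenceReplayBuffer:
    """Hypothetical sketch: store whole episodes and sample contiguous
    sequences, rather than individual shuffled transitions."""

    def __init__(self, capacity=1000):
        # each entry is one complete episode (a list of transitions)
        self._episodes = deque(maxlen=capacity)
        self._current = []

    def add(self, transition, done):
        # accumulate transitions until the episode terminates
        self._current.append(transition)
        if done:
            self._episodes.append(self._current)
            self._current = []

    def sample(self):
        # return one full episode, preserving temporal order,
        # which a recurrent policy can consume as a sequence
        return random.choice(self._episodes)
```

A recurrent policy could then unroll its hidden state over each sampled episode, which is exactly what shuffled per-transition sampling makes impossible.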
Describe alternatives you've considered
Additional context
Thanks, that's a very good suggestion. It's definitely been on my mind.
I'm thinking of having a reward tracer that does something similar to what the frame-stacking wrapper does. The idea is to stack entire transitions rather than only the observations. As long as we make sure to create only shallow copies (i.e. without copying the actual numpy arrays), I think we could keep this fairly lightweight and simple.
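A rough sketch of that idea, purely illustrative (the class and method names are made up, not part of coax): keep a sliding window of the last n transitions and emit them as a stacked sequence, sharing the underlying numpy arrays rather than copying them.

```python
from collections import deque

import numpy as np


class TransitionStackTracer:
    """Hypothetical sketch: stack entire transitions the way a
    frame-stacking wrapper stacks observations."""

    def __init__(self, n):
        self.n = n
        self._window = deque(maxlen=n)

    def add(self, s, a, r, done):
        # store references only; the numpy arrays themselves are shared,
        # which keeps the tracer lightweight
        self._window.append((s, a, r, done))
        seq = tuple(self._window) if len(self._window) == self.n else None
        if done:
            # don't stack transitions across episode boundaries
            self._window.clear()
        return seq
```

Since only the tuple of references is created per step, memory overhead stays proportional to the window length, not to the size of the observations.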
You can also achieve something similar via the record_extra_info option of the NStep reward tracer. It's a little beside the point, but it will give you the n observations, actions, etc. that follow a sampled observation.
I don't actually know enough about the architecture to provide good advice. I just found the design of coax really clean, and was considering porting some of my models to the framework.