  • trainer:
    • Custom scoring now supported for selecting the best model. #1202
  • highlevel:
    • DiscreteSACExperimentBuilder: Expose method with_actor_factory_default #1248 #1250

Breaking Changes

  • data:
    • stats:
      • InfoStats has a new non-optional field best_score which is used for selecting the best model. #1202

Release 1.1.0


Evaluation Package

This release introduces a new package evaluation that integrates best practices for running experiments (seeding test and train environments) and for evaluating them using the rliable library. This should be especially useful for algorithm developers who want to compare performance and create meaningful visualizations. This functionality is currently in alpha state and will be further improved in the next releases. You will need to install tianshou with the extra eval to use it.

The creation of multiple experiments with varying random seeds has been greatly facilitated. Moreover, the ExpLauncher interface has been introduced and implemented with several backends to support the execution of multiple experiments in parallel.

An example of this using the high-level interfaces can be found here; examples that use the low-level interfaces will follow soon.

Improvements in Batch

Apart from that, several important extensions have been added to internal data structures, most notably to Batch. Batches now implement __eq__ and can be meaningfully compared. Applying operations in a nested fashion has been significantly simplified, and checking for NaNs and dropping them is now possible.

One more notable change is that torch Distribution objects are now sliced when slicing a batch. Previously, when a Batch with, say, 10 actions and a dist corresponding to them was sliced to [:3], the dist in the result would still correspond to all 10 actions. Now, the dist is also "sliced" to be the distribution of the first 3 actions.
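The difference between the old and new slicing behavior can be illustrated with a toy model. The sketch below does not use Tianshou or torch; ToyBatch, ToyCategorical and slice_old_style are hypothetical stand-ins introduced only to show why keeping the distribution aligned with the sliced actions matters.

```python
from dataclasses import dataclass


@dataclass
class ToyCategorical:
    """Hypothetical stand-in for a batched distribution: one probability row per action."""

    probs: list[list[float]]

    def __getitem__(self, index: slice) -> "ToyCategorical":
        return ToyCategorical(self.probs[index])


@dataclass
class ToyBatch:
    """Hypothetical stand-in for a batch holding actions and their distribution."""

    act: list[int]
    dist: ToyCategorical

    def __getitem__(self, index: slice) -> "ToyBatch":
        # New behavior: slice the distribution together with the actions.
        return ToyBatch(self.act[index], self.dist[index])

    def slice_old_style(self, index: slice) -> "ToyBatch":
        # Old behavior as described above: actions are sliced, dist is left whole.
        return ToyBatch(self.act[index], self.dist)


batch = ToyBatch(
    act=list(range(10)),
    dist=ToyCategorical([[0.5, 0.5] for _ in range(10)]),
)
old = batch.slice_old_style(slice(0, 3))
new = batch[:3]

print(len(old.act), len(old.dist.probs))  # 3 10 (dist no longer matches the actions)
print(len(new.act), len(new.dist.probs))  # 3 3 (dist matches the actions)
```

With the old behavior, any per-sample computation on the sliced batch (for example a log-probability of the 3 remaining actions) would be fed a distribution of the wrong batch size.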

A detailed list of changes can be found below.


  • evaluation: New package for repeating the same experiment with multiple seeds and aggregating the results. #1074 #1141 #1183
  • data:
    • Batch:
      • Add methods to_dict and to_list_of_dicts. #1063 #1098
      • Add methods to_numpy_ and to_torch_. #1098, #1117
      • Add __eq__ (semantic equality check). #1098
      • keys() deprecated in favor of get_keys() (needed to make iteration consistent with naming) #1105.
      • Major: new methods for applying functions to values, to check for NaNs and drop them, and to set values. #1181
      • Slicing a batch with a torch distribution now also slices the distribution. #1181
    • data.collector:
      • Collector:
        • Introduced BaseCollector as a base class for all collectors. #1123
        • Add method close #1063
        • Method reset is now more granular (new flags controlling behavior). #1063
      • CollectStats: Add convenience constructor with_autogenerated_stats. #1063
  • trainer:
    • Trainers can now control whether collectors should be reset prior to training. #1063
  • policy:
    • introduced attribute in_training_step that is controlled by the trainer. #1123
    • policy automatically set to eval mode when collecting and to train mode when updating. #1123
    • Extended interface of compute_action to also support array-like inputs #1169
  • highlevel:
    • SamplingConfig:
      • Add support for batch_size=None. #1077
      • Add training_seed for explicit seeding of training and test environments; the test_seed is inferred from training_seed. #1074
    • experiment:
      • Experiment now has a name attribute, which can be set using ExperimentBuilder.with_name and which determines the default run name and therefore the persistence subdirectory. The run name can still be overridden; the corresponding parameter is now called run_name rather than experiment_name (although the latter will still be interpreted correctly). #1074 #1131
      • Add class ExperimentCollection for the convenient execution of multiple experiment runs #1131
      • The World object, containing all low-level objects needed for experimentation, can now be extracted from an Experiment instance. This enables customizing the experiment prior to its execution, bridging the low and high-level interfaces. #1187
      • ExperimentBuilder:
        • Add method build_seeded_collection for the sound creation of multiple experiments with varying random seeds #1131
        • Add method copy to facilitate the creation of multiple experiments from a single builder #1131
    • env:
      • Added new VectorEnvType called SUBPROC_SHARED_MEM_AUTO and used it for Atari and MuJoCo venv creation. #1141
  • utils:
    • logger:
      • Loggers can now restore the logged data into python by using the new restore_logged_data method. #1074
      • Wandb logger extended #1183
    • net.continuous.Critic:
      • Add flag apply_preprocess_net_to_obs_only to allow the preprocessing network to be applied to the observations only (without the actions concatenated), which is essential for the case where we want to reuse the actor's preprocessing network #1128
    • torch_utils (new module)
      • Added context managers torch_train_mode and policy_within_training_step #1123
    • print
      • DataclassPPrintMixin now supports outputting a string, not just printing the pretty repr. #1141


Fixes

  • highlevel:
    • CriticFactoryReuseActor: Enable the Critic flag apply_preprocess_net_to_obs_only for continuous critics, fixing the case where we want to reuse an actor's preprocessing network for the critic (affects usages of the experiment builder method with_critic_factory_use_actor with continuous environments) #1128
    • Policy parameter action_scaling value "default" was not correctly transformed to a Boolean value for algorithms SAC, DDPG, TD3 and REDQ. The value "default" being truthy caused action scaling to be enabled even for discrete action spaces. #1191
  • atari_network.DQN:
    • Fix constructor input validation #1128
    • Fix output_dim not being set if features_only=True and output_dim_added_layer is not None #1128
  • PPOPolicy:
    • Fix max_batchsize not being used in logp_old computation inside process_fn #1168
  • Fix Batch.__eq__ to allow comparing Batches with scalar array values #1185
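The action_scaling fix above is an instance of a general Python pitfall: any non-empty string is truthy, so a sentinel such as "default" behaves like True unless it is resolved explicitly. The following is a minimal sketch of that pitfall, not Tianshou's actual code; the helper resolve_action_scaling and the rule that "default" enables scaling only for continuous action spaces are assumptions made for illustration.

```python
from typing import Literal, Union


def resolve_action_scaling(
    value: Union[bool, Literal["default"]],
    is_continuous: bool,
) -> bool:
    """Hypothetical helper: turn the sentinel "default" into a real Boolean."""
    if value == "default":
        # Assumed rule for this sketch: scale actions only in continuous spaces.
        return is_continuous
    return bool(value)


# Buggy pattern: the sentinel string is used directly in a Boolean context.
enabled_buggy = bool("default")

# Fixed pattern: the sentinel is resolved before it is used.
enabled_fixed = resolve_action_scaling("default", is_continuous=False)

print(enabled_buggy)  # True
print(enabled_fixed)  # False
```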

Internal Improvements

  • Collectors rely less on state, the few stateful things are stored explicitly instead of through a .data attribute. #1063
  • Introduced a first iteration of a naming convention for vars in Collectors. #1063
  • Generally improved readability of Collector code and associated tests (still quite some way to go). #1063
  • Improved typing for exploration_noise and within Collector. #1063
  • Better variable names related to model outputs (logits, dist input etc.). #1032
  • Improved typing for actors and critics, using Tianshou classes like Actor, ActorProb, etc., instead of just nn.Module. #1032
  • Added interfaces for most Actor and Critic classes to enforce the presence of forward methods. #1032
  • Simplified PGPolicy forward by unifying the dist_fn interface (see associated breaking change). #1032
  • Use .mode of distribution instead of relying on knowledge of the distribution type. #1032
  • Exception no longer raised on len of empty Batch. #1084
  • Tests and examples are now covered by mypy. #1077
  • NetBase is used more widely, with stricter typing achieved by making it generic. #1077
  • Use explicit multiprocessing context for creating Pipe. #1102
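For readers unfamiliar with the last item, the sketch below shows what creating a Pipe from an explicit multiprocessing context looks like with the Python standard library. It is not taken from the Tianshou code base; the choice of the "spawn" start method and the message contents are assumptions for illustration.

```python
import multiprocessing as mp

# Request an explicit context instead of relying on the platform default
# start method. "spawn" is an assumption here; "fork" and "forkserver"
# are the other standard options (not available on every platform).
ctx = mp.get_context("spawn")

# The context object exposes the same factory functions as the module itself.
parent_conn, child_conn = ctx.Pipe()

# Round trip within a single process, just to show that the pipe works.
parent_conn.send({"cmd": "reset"})
msg = child_conn.recv()
print(msg)

parent_conn.close()
child_conn.close()
```

Creating pipes, queues and processes from the same context object keeps them consistent with one start method, regardless of the global default.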

Breaking Changes

  • data:
    • Collector:
      • Removed .data attribute. #1063
      • Collectors no longer reset the environment on initialization. Instead, the user might have to call reset explicitly or pass reset_before_collect=True. #1063
      • Removed no_grad argument from collect method (was unused in tianshou). #1123
    • Batch:
      • Fixed iter(Batch(...)), which now behaves the same way as Batch(...).__iter__(). Can be considered a bugfix. #1063
      • The methods to_numpy and to_torch are not in-place anymore (use to_numpy_ or to_torch_ instead). #1098, #1117
      • The method Batch.is_empty has been removed. Instead, the user can simply check for emptiness of Batch by using len on dicts. #1144
      • Stricter cat_, only concatenation of batches with the same structure is allowed. #1181
      • to_torch and to_numpy are no longer static methods. So Batch.to_numpy(batch) should be replaced by batch.to_numpy(). #1200
  • utils:
    • logger:
      • BaseLogger.prepare_dict_for_logging is now abstract. #1074
      • Removed deprecated and unused BasicLogger (only affects users who subclassed it). #1074
      • Recurrent now receives and returns a RecurrentStateBatch instead of a dict. #1077
    • Modules with code that was copied from sensAI have been replaced by imports from new dependency sensAI-utils:
      • tianshou.utils.logging is replaced with sensai.util.logging
      • tianshou.utils.string is replaced with sensai.util.string
      • tianshou.utils.pickle is replaced with sensai.util.pickle
  • env:
    • All VectorEnvs now return a numpy array of info-dicts on reset instead of a list. #1063
  • policy:
    • Changed interface of dist_fn in PGPolicy and all subclasses to take a single argument in both continuous and discrete cases. #1032
  • AtariEnvFactory constructor (in examples, so not really breaking) now requires explicit train and test seeds. #1074
  • EnvFactoryRegistered now requires an explicit test_seed in the constructor. #1074
  • highlevel:
    • params: The parameter dist_fn has been removed from the parameter objects (PGParams, A2CParams, PPOParams, NPGParams, TRPOParams). The correct distribution is now determined automatically based on the actor factory being used, avoiding the possibility of misspecification. Persisted configurations/policies continue to work as expected, but code must not specify the dist_fn parameter. #1194 #1195
    • env:
      • EnvFactoryRegistered: parameter seed has been replaced by the pair of parameters train_seed and test_seed. Persisted instances will continue to work correctly. Subclasses such as AtariEnvFactory are also affected and now require explicit train and test seeds. #1074
      • VectorEnvType: SUBPROC_SHARED_MEM has been replaced by SUBPROC_SHARED_MEM_DEFAULT. It is recommended to use SUBPROC_SHARED_MEM_AUTO instead. However, persisted configs will continue working. #1141


Tests

  • Fixed env seeding so that the test doesn't fail randomly. #1081
  • Improved CI triggers and added telemetry (if requested by user) #1177
  • Improved environment used in tests.
  • Improved batch equality tests to also check scalar values. #1185


Dependencies

  • DeepDiff added to help with diffs of batches in tests. #1098
  • Bumped black, idna, pillow
  • New extra "eval"
  • Bumped numba to >=0.60.0, permitting installation on Python 3.12. #1177
  • New dependency sensai-utils

Started after v1.0.0