citylearn.agents.rlc module

class citylearn.agents.rlc.RLC(env: CityLearnEnv, hidden_dimension: List[float] = None, discount: float = None, tau: float = None, alpha: float = None, lr: float = None, batch_size: int = None, replay_buffer_capacity: int = None, standardize_start_time_step: int = None, end_exploration_time_step: int = None, action_scaling_coefficient: float = None, reward_scaling: float = None, update_per_time_step: int = None, **kwargs: Any)[source]

Bases: Agent

Base reinforcement learning controller class.

Parameters:
  • env (CityLearnEnv) – CityLearn environment.

  • hidden_dimension (List[float], default: [256, 256]) – Hidden dimension.

  • discount (float, default: 0.99) – Discount factor.

  • tau (float, default: 5e-3) – Decay rate.

  • alpha (float, default: 0.2) – Temperature; exploration-exploitation balance term.

  • lr (float, default: 3e-4) – Learning rate.

  • batch_size (int, default: 256) – Batch size.

  • replay_buffer_capacity (int, default: 1e5) – Replay buffer capacity.

  • standardize_start_time_step (int, optional) – Time step at which to calculate the mean and standard deviation of, and begin standardizing, observations and rewards in the replay buffer. Defaults to citylearn.citylearn.CityLearnEnv.time_steps - 2.

  • end_exploration_time_step (int, optional) – Time step to stop random or RBC-guided exploration. Defaults to citylearn.citylearn.CityLearnEnv.time_steps - 1.

  • action_scaling_coefficient (float, default: 0.5) – Action scaling coefficient.

  • reward_scaling (float, default: 5.0) – Reward scaling.

  • update_per_time_step (int, default: 2) – Number of updates per time step.

  • **kwargs (Any) – Other keyword arguments used to initialize super class.
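
A minimal construction sketch is shown below; it is illustrative only. The dataset name 'citylearn_challenge_2022_phase_1' and the chosen hyperparameter values are assumptions, not prescriptions:

from citylearn.citylearn import CityLearnEnv
from citylearn.agents.rlc import RLC

# Assumed dataset name; substitute any schema path or bundled dataset.
env = CityLearnEnv('citylearn_challenge_2022_phase_1')
agent = RLC(
    env,
    hidden_dimension=[256, 256],
    discount=0.99,
    tau=5e-3,
    alpha=0.2,
    lr=3e-4,
    batch_size=256,
    replay_buffer_capacity=int(1e5),
    action_scaling_coefficient=0.5,
    reward_scaling=5.0,
    update_per_time_step=2,
)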

property action_scaling_coefficient: float

Action scaling coefficient.

property alpha: float

Temperature; exploration-exploitation balance term.
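
A hedged note on this term's conventional role (this base class only stores alpha; the description matches its use in soft actor-critic subclasses): alpha weights the entropy bonus in the policy objective,

J(\pi) = \mathbb{E}\left[\sum_t r_t + \alpha \, \mathcal{H}(\pi(\cdot \mid s_t))\right],

so a larger alpha rewards more stochastic, exploratory policies while a smaller alpha favors exploitation.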

property batch_size: int

Batch size.

property discount: float

Discount factor.

property end_exploration_time_step: int

Time step to stop exploration. Defaults to citylearn.citylearn.CityLearnEnv.time_steps - 1.

property hidden_dimension: List[float]

Hidden dimension.

property lr: float

Learning rate.

property observation_dimension: int

Number of observations after applying encoders.

property random_seed: int

Pseudorandom number generator seed for repeatable results.

property replay_buffer_capacity: int

Replay buffer capacity.

property reward_scaling: float

Reward scaling.

set_encoders() List[List[Encoder]][source]

Get observation value transformers/encoders for use in the agent's algorithm.

The encoder classes are defined in the preprocessing.py module and include PeriodicNormalization for cyclic observations, OnehotEncoding for categorical observations, RemoveFeature for observations that are not applicable given the available storage systems and devices, and Normalize for observations with known minimum and maximum boundaries.

Returns:

encoders – Encoder classes for observations ordered with respect to active_observations.

Return type:

List[List[Encoder]]
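
The sketch below shows how the returned encoders might be applied to one agent's raw observations. It assumes an initialized agent and environment as in the construction example above; in citylearn.preprocessing each encoder overloads multiplication, so encoder * value yields the transformed value (an array of sine/cosine components for PeriodicNormalization, None for RemoveFeature):

import numpy as np

encoders = agent.set_encoders()[0]  # encoders for the first agent
raw = env.reset()                   # assumption: returns per-agent observation lists
values = np.hstack([e * v for e, v in zip(encoders, raw[0])])
# Drop None entries produced by RemoveFeature; the remaining length
# should match the observation_dimension property.
encoded = np.array([v for v in values if v is not None], dtype=float)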

property standardize_start_time_step: int

Time step at which to calculate the mean and standard deviation of, and begin standardizing, observations and rewards in the replay buffer. Defaults to citylearn.citylearn.CityLearnEnv.time_steps - 2.

property tau: float

Decay rate.
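
A hedged note: this base class only stores tau; in soft actor-critic subclasses it is conventionally the Polyak-averaging coefficient for target-network updates, along the lines of the illustrative helper below (names are assumptions, not RLC attributes):

import torch.nn as nn

def soft_update(target: nn.Module, source: nn.Module, tau: float) -> None:
    # Blend source parameters into the target network at rate tau.
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.data.copy_(tau * s_param.data + (1.0 - tau) * t_param.data)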

property update_per_time_step: int

Number of updates per time step.