citylearn.agents.rlc module

class citylearn.agents.rlc.RLC(env: CityLearnEnv, hidden_dimension: List[float] = None, discount: float = None, tau: float = None, alpha: float = None, lr: float = None, batch_size: int = None, replay_buffer_capacity: int = None, standardize_start_time_step: int = None, end_exploration_time_step: int = None, action_scaling_coefficient: float = None, reward_scaling: float = None, update_per_time_step: int = None, **kwargs: Any)[source]

Bases: Agent

Base reinforcement learning controller class.

Parameters:
  • env (CityLearnEnv) – CityLearn environment.

  • hidden_dimension (List[float], default: [256, 256]) – Hidden dimension.

  • discount (float, default: 0.99) – Discount factor.

  • tau (float, default: 5e-3) – Decay rate.

  • alpha (float, default: 0.2) – Temperature; exploration-exploitation balance term.

  • lr (float, default: 3e-4) – Learning rate.

  • batch_size (int, default: 256) – Batch size.

  • replay_buffer_capacity (int, default: 1e5) – Replay buffer capacity.

  • standardize_start_time_step (int, optional) – Time step at which to calculate the mean and standard deviation of, and begin standardizing, observations and rewards in the replay buffer. Defaults to citylearn.citylearn.CityLearnEnv.time_steps - 2.

  • end_exploration_time_step (int, optional) – Time step to stop random or RBC-guided exploration. Defaults to citylearn.citylearn.CityLearnEnv.time_steps - 1.

  • action_scaling_coefficient (float, default: 0.5) – Action scaling coefficient.

  • reward_scaling (float, default: 5.0) – Reward scaling.

  • update_per_time_step (int, default: 2) – Number of updates per time step.

  • **kwargs (Any) – Other keyword arguments used to initialize super class.
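
A minimal construction sketch is shown below; it is illustrative only. The dataset name 'citylearn_challenge_2022_phase_1' and the chosen hyperparameter values are assumptions, not prescriptions:

from citylearn.citylearn import CityLearnEnv
from citylearn.agents.rlc import RLC

# Assumed dataset name; substitute any schema path or bundled dataset.
env = CityLearnEnv('citylearn_challenge_2022_phase_1')
agent = RLC(
    env,
    hidden_dimension=[256, 256],
    discount=0.99,
    tau=5e-3,
    alpha=0.2,
    lr=3e-4,
    batch_size=256,
    replay_buffer_capacity=int(1e5),
    action_scaling_coefficient=0.5,
    reward_scaling=5.0,
    update_per_time_step=2,
)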

property action_scaling_coefficient: float

Action scaling coefficient.

property alpha: float

Temperature; exploration-exploitation balance term.
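
A hedged note on this term's conventional role (this base class only stores alpha; the description matches its use in soft actor-critic subclasses): alpha weights the entropy bonus in the policy objective,

J(\pi) = \mathbb{E}\left[\sum_t r_t + \alpha \, \mathcal{H}(\pi(\cdot \mid s_t))\right],

so a larger alpha rewards more stochastic, exploratory policies while a smaller alpha favors exploitation.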

property batch_size: int

Batch size.

property discount: float

Discount factor.

property end_exploration_time_step: int

Time step to stop exploration. Defaults to citylearn.citylearn.CityLearnEnv.time_steps - 1.

property hidden_dimension: List[float]

Hidden dimension.

property lr: float

Learning rate.

property observation_dimension: int

Number of observations after applying encoders.

property random_seed: int

Pseudorandom number generator seed for repeatable results.

property replay_buffer_capacity: int

Replay buffer capacity.

property reward_scaling: float

Reward scaling.

set_encoders() List[List[Encoder]][source]

Get observation value transformers/encoders for use in the agent's algorithm.

The encoder classes are defined in the preprocessing.py module and include PeriodicNormalization for cyclic observations, OnehotEncoding for categorical observations, RemoveFeature for observations that are not applicable given the available storage systems and devices, and Normalize for observations with known minimum and maximum boundaries.

Returns:

encoders – Encoder classes for observations ordered with respect to active_observations.

Return type:

List[List[Encoder]]
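
The sketch below shows how the returned encoders might be applied to one agent's raw observations. It assumes an initialized agent and environment as in the construction example above; in citylearn.preprocessing each encoder overloads multiplication, so encoder * value yields the transformed value (an array of sine/cosine components for PeriodicNormalization, None for RemoveFeature):

import numpy as np

encoders = agent.set_encoders()[0]  # encoders for the first agent
raw = env.reset()                   # assumption: returns per-agent observation lists
values = np.hstack([e * v for e, v in zip(encoders, raw[0])])
# Drop None entries produced by RemoveFeature; the remaining length
# should match the observation_dimension property.
encoded = np.array([v for v in values if v is not None], dtype=float)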

property standardize_start_time_step: int

Time step at which to calculate the mean and standard deviation of, and begin standardizing, observations and rewards in the replay buffer. Defaults to citylearn.citylearn.CityLearnEnv.time_steps - 2.

property tau: float

Decay rate.
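
A hedged note: this base class only stores tau; in soft actor-critic subclasses it is conventionally the Polyak-averaging coefficient for target-network updates, along the lines of the illustrative helper below (names are assumptions, not RLC attributes):

import torch.nn as nn

def soft_update(target: nn.Module, source: nn.Module, tau: float) -> None:
    # Blend source parameters into the target network at rate tau.
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.data.copy_(tau * s_param.data + (1.0 - tau) * t_param.data)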

property update_per_time_step: int

Number of updates per time step.