citylearn.agents.sac module

class citylearn.agents.sac.SAC(*args, **kwargs)[source]

Bases: citylearn.agents.rlc.RLC

add_to_buffer(observations: List[List[float]], actions: List[List[float]], reward: List[float], next_observations: List[List[float]], done: bool)[source]

Update replay buffer.

  • observations (List[List[float]]) – Previous time step observations.

  • actions (List[List[float]]) – Previous time step actions.

  • reward (List[float]) – Current time step reward.

  • next_observations (List[List[float]]) – Current time step observations.

  • done (bool) – Indication that episode has ended.
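The buffer update above can be sketched with a minimal deque-backed replay buffer mirroring `add_to_buffer`'s signature; the capacity, class name, and method names are illustrative assumptions, not CityLearn's actual implementation.

```python
import random
from collections import deque

# Minimal replay-buffer sketch mirroring add_to_buffer's signature.
# The deque backing and capacity are assumptions, not CityLearn's code.
class ReplayBuffer:
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, observations, actions, reward, next_observations, done):
        # Store one transition; the oldest entry is evicted at capacity.
        self.buffer.append((observations, actions, reward, next_observations, done))

    def sample(self, batch_size: int):
        # Uniform random minibatch for the SAC gradient update.
        return random.sample(list(self.buffer), batch_size)

buffer = ReplayBuffer(capacity=4)
for t in range(6):
    buffer.push([[float(t)]], [[0.0]], [0.0], [[float(t + 1)]], done=(t == 5))
print(len(buffer.buffer))  # 4: capacity caps the buffer
```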

get_encoded_observations(index: int, observations: List[float]) numpy.ndarray[Any, numpy.dtype[numpy.float64]][source]
get_exploration_actions(observations: List[List[float]]) List[List[float]][source]

Return randomly sampled actions from action_space multiplied by action_scaling_coefficient.


Returns

actions – Action values.

Return type

List[List[float]]

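The random sampling described above can be sketched as uniform draws within each action dimension's bounds, scaled by `action_scaling_coefficient`; the bound layout (one list of bounds per building) and function name are illustrative assumptions.

```python
import random

# Hedged sketch of exploration-action sampling: uniform draws within
# each action dimension's bounds, scaled by action_scaling_coefficient.
def sample_exploration_actions(action_low, action_high, scaling_coefficient):
    return [
        [scaling_coefficient * random.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        for lows, highs in zip(action_low, action_high)
    ]

# One building with two actions, each bounded in [-1, 1], scaled by 0.5.
actions = sample_exploration_actions([[-1.0, -1.0]], [[1.0, 1.0]], 0.5)
```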
get_normalized_observations(index: int, observations: List[float]) numpy.ndarray[Any, numpy.dtype[numpy.float64]][source]
get_normalized_reward(index: int, reward: float) float[source]
get_post_exploration_actions(observations: List[List[float]], deterministic: bool) List[List[float]][source]

Sample actions from the policy at post-exploration time steps.

select_actions(observations: List[List[float]], deterministic: Optional[bool] = None)[source]

Provide actions for current time step.

Returns randomly sampled actions from action_space if end_exploration_time_step >= time_step; otherwise, uses the policy to sample actions.

  • observations (List[List[float]]) – Environment observations.

  • deterministic (bool, default: False) – Whether to return purely exploitative deterministic actions.


Returns

actions – Action values.

Return type

List[List[float]]

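The exploration/exploitation gate described above can be sketched as follows; the callable arguments stand in for the agent's internal sampling methods and are hypothetical names.

```python
# Sketch of the exploration/exploitation gate: random (or RBC) actions
# while time_step is within the exploration phase, policy actions after.
def select_actions(time_step, end_exploration_time_step,
                   explore_fn, policy_fn, observations, deterministic=False):
    if end_exploration_time_step >= time_step:
        # Still in the exploration phase.
        return explore_fn(observations)
    # Post-exploration: sample from the learned policy.
    return policy_fn(observations, deterministic)

explore = lambda obs: 'explore'
policy = lambda obs, deterministic: 'policy'
print(select_actions(5, 10, explore, policy, [[0.0]]))   # 'explore'
print(select_actions(11, 10, explore, policy, [[0.0]]))  # 'policy'
```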

set_encoders() List[List[citylearn.preprocessing.Encoder]][source]

Get observation value transformers/encoders for use in agent algorithm.

The encoder classes are defined in the citylearn.preprocessing module and include PeriodicNormalization for cyclic observations, OnehotEncoding for categorical observations, RemoveFeature for observations that are not applicable given the available storage systems and devices, and Normalize for observations with known minimum and maximum boundaries.


Returns

encoders – Encoder classes for observations, ordered with respect to active_observations.

Return type

List[List[citylearn.preprocessing.Encoder]]

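The per-observation encoding can be sketched with simplified stand-ins for two of the encoder classes named above; the real implementations live in citylearn.preprocessing and differ in detail, and the `encoder * value` application style here is illustrative.

```python
import math

# Simplified stand-ins for the encoder classes named above; treat them
# as illustrative, not as CityLearn's preprocessing implementations.
class PeriodicNormalization:
    # Map a cyclic observation (e.g. hour of day) to sin/cos components.
    def __init__(self, x_max):
        self.x_max = x_max

    def __mul__(self, x):
        angle = 2.0 * math.pi * x / self.x_max
        return [math.sin(angle), math.cos(angle)]

class Normalize:
    # Min-max scale a bounded observation into [0, 1].
    def __init__(self, x_min, x_max):
        self.x_min, self.x_max = x_min, x_max

    def __mul__(self, x):
        return [(x - self.x_min) / (self.x_max - self.x_min)]

# One encoder per active observation, in observation order.
encoders = [PeriodicNormalization(24), Normalize(0.0, 40.0)]
observations = [6, 20.0]  # hour of day, outdoor temperature
encoded = [v for e, x in zip(encoders, observations) for v in (e * x)]
print(encoded)  # hour 6 of 24 -> sin/cos of pi/2; 20 of [0, 40] -> 0.5
```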

set_networks(internal_observation_count: Optional[int] = None)[source]
class citylearn.agents.sac.SACBasicBatteryRBC(*args, **kwargs)[source]

Bases: citylearn.agents.sac.SACBasicRBC

class citylearn.agents.sac.SACBasicRBC(*args, **kwargs)[source]

Bases: citylearn.agents.sac.SACRBC

class citylearn.agents.sac.SACOptimizedRBC(*args, **kwargs)[source]

Bases: citylearn.agents.sac.SACBasicRBC

class citylearn.agents.sac.SACRBC(*args, **kwargs)[source]

Bases: citylearn.agents.sac.SAC

get_exploration_actions(states: List[float]) List[float][source]

Return actions using RBC.


Returns

actions – Action values.

Return type

List[float]


property rbc: citylearn.agents.rbc.RBC

RBC or child class, used to select actions during exploration.
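RBC-guided exploration can be sketched as an agent that defers to a rule-based controller during the exploration phase instead of random sampling; the hour-based rule and class names below are illustrative assumptions, not CityLearn's RBC logic.

```python
# Sketch of SACRBC-style exploration: actions come from an RBC rather
# than random sampling. The hour-based rule is illustrative only.
class HourRBC:
    def predict(self, hour: int):
        # Charge storage off-peak at night, discharge during the day.
        return [[0.5]] if hour < 6 or hour >= 22 else [[-0.5]]

class SACRBCSketch:
    def __init__(self, rbc):
        self.rbc = rbc  # RBC or child class used during exploration

    def get_exploration_actions(self, hour):
        return self.rbc.predict(hour)

agent = SACRBCSketch(HourRBC())
print(agent.get_exploration_actions(2))   # [[0.5]] (night: charge)
print(agent.get_exploration_actions(14))  # [[-0.5]] (day: discharge)
```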