citylearn.agents.sac module

class citylearn.agents.sac.SAC(env: CityLearnEnv, **kwargs: Any)[source]

Bases: RLC

get_encoded_observations(index: int, observations: List[float]) ndarray[Any, dtype[float64]][source]
get_exploration_prediction(observations: List[List[float]]) List[List[float]][source]

Return randomly sampled actions from action_space multiplied by action_scaling_coefficient.
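
For illustration, a rough sketch of what this amounts to for an initialized SAC agent model (a sketch only, not the library's internals; it relies on the action_space and action_scaling_coefficient attributes inherited from the parent classes):

    # uniform random actions from each action space, scaled by action_scaling_coefficient
    actions = [
        list(space.sample() * model.action_scaling_coefficient)
        for space in model.action_space
    ]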

get_normalized_observations(index: int, observations: List[float]) ndarray[Any, dtype[float64]][source]
get_normalized_reward(index: int, reward: float) float[source]
get_post_exploration_prediction(observations: List[List[float]], deterministic: bool) List[List[float]][source]

Sample actions using the policy at post-exploration time steps.

predict(observations: List[List[float]], deterministic: bool = None)[source]

Provide actions for current time step.

Will return randomly sampled actions from action_space if time_step <= end_exploration_time_step, else will use the policy to sample actions (see the usage sketch below).

Parameters:
  • observations (List[List[float]]) – Environment observations

  • deterministic (bool, default: False) – Whether to return purely exploitative deterministic actions.

Returns:

actions – Action values

Return type:

List[float]
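
A minimal usage sketch (the dataset name is only an example; any valid schema works, and the Gymnasium-style reset API of recent CityLearn versions is assumed):

    from citylearn.citylearn import CityLearnEnv
    from citylearn.agents.sac import SAC

    env = CityLearnEnv('citylearn_challenge_2022_phase_1')  # example dataset name
    model = SAC(env)

    observations, _ = env.reset()
    actions = model.predict(observations)  # random or policy actions, depending on time_step
    greedy_actions = model.predict(observations, deterministic=True)  # purely exploitative actions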

set_encoders() List[List[Encoder]][source]

Get observation value transformers/encoders for use in agent algorithm.

The encoder classes are defined in the preprocessing.py module and include PeriodicNormalization for cyclic observations, OnehotEncoding for categorical observations, RemoveFeature for observations that are not applicable given the available storage systems and devices, and Normalize for observations with known minimum and maximum boundaries (see the sketch after the return description below).

Returns:

encoders – Encoder classes for observations ordered with respect to active_observations.

Return type:

List[List[Encoder]]
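
An illustrative sketch of how such encoders transform raw observation values (it assumes the encoder classes in citylearn.preprocessing and their multiplication-based transform; the observation values are made up):

    import numpy as np
    from citylearn.preprocessing import Normalize, PeriodicNormalization

    # e.g. hour (cyclic) and outdoor dry-bulb temperature (bounded)
    encoders = [PeriodicNormalization(x_max=24), Normalize(x_min=0.0, x_max=40.0)]
    observations = [22, 18.5]

    # PeriodicNormalization yields a sine/cosine pair; Normalize maps to [0, 1]
    encoded = np.hstack([e * o for e, o in zip(encoders, observations)])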

set_networks(internal_observation_count: int = None)[source]
update(observations: List[List[float]], actions: List[List[float]], reward: List[float], next_observations: List[List[float]], terminated: bool, truncated: bool)[source]

Update replay buffer.

Parameters:
  • observations (List[List[float]]) – Previous time step observations.

  • actions (List[List[float]]) – Previous time step actions.

  • reward (List[float]) – Current time step reward.

  • next_observations (List[List[float]]) – Current time step observations.

  • terminated (bool) – Indication that episode has ended.

  • truncated (bool) – If episode truncates due to a time limit or a reason that is not defined as part of the task MDP.
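
A sketch of how update fits into an interaction loop (assumes an example dataset name and the Gymnasium-style reset/step API of recent CityLearn versions):

    from citylearn.citylearn import CityLearnEnv
    from citylearn.agents.sac import SAC

    env = CityLearnEnv('citylearn_challenge_2022_phase_1')  # example dataset name
    model = SAC(env)

    observations, _ = env.reset()
    terminated = False

    while not terminated:
        actions = model.predict(observations)
        next_observations, reward, terminated, truncated, _ = env.step(actions)
        model.update(observations, actions, reward, next_observations,
                     terminated=terminated, truncated=truncated)
        observations = next_observations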

class citylearn.agents.sac.SACRBC(env: CityLearnEnv, rbc: RBC | str = None, **kwargs: Any)[source]

Bases: SAC

Uses citylearn.agents.rbc.RBC to select actions during the exploration phase, before switching to the citylearn.agents.sac.SAC policy.

Parameters:
  • env (CityLearnEnv) – CityLearn environment.

  • rbc (RBC | str) – citylearn.agents.rbc.RBC or child class, or string path to an RBC class, used to select actions during exploration.

  • **kwargs (Any) – Other keyword arguments used to initialize super class.

get_exploration_prediction(observations: List[float]) List[float][source]

Return actions using RBC.

property rbc: RBC

citylearn.agents.rbc.RBC class or child class, or string path to an RBC class, e.g. ‘citylearn.agents.rbc.RBC’, used to select actions during exploration.
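
A construction sketch showing both accepted forms of the rbc argument (the dataset name is only an example; the base RBC class is used here, but a child class can be passed the same way):

    from citylearn.citylearn import CityLearnEnv
    from citylearn.agents.rbc import RBC
    from citylearn.agents.sac import SACRBC

    env = CityLearnEnv('citylearn_challenge_2022_phase_1')  # example dataset name

    model = SACRBC(env, rbc=RBC)                         # RBC class (or a child class)
    model = SACRBC(env, rbc='citylearn.agents.rbc.RBC')  # equivalent string path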