citylearn.agents.base module

class citylearn.agents.base.Agent(env: CityLearnEnv, **kwargs: Any)[source]

Bases: Environment

Base agent class.

Parameters:
  • env (CityLearnEnv) – CityLearn environment.

  • **kwargs (dict) – Other keyword arguments used to initialize super class.
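A minimal usage sketch, assuming the named dataset below is available in your CityLearn installation (any schema path works in its place):

    from citylearn.citylearn import CityLearnEnv
    from citylearn.agents.base import Agent

    # the dataset name is an assumption; substitute any schema path or
    # named dataset available in your installation
    env = CityLearnEnv('citylearn_challenge_2022_phase_1')
    agent = Agent(env)

    # the base agent samples random actions at each time step;
    # learn() runs the requested number of episodes against the environment
    agent.learn(episodes=1)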

property action_dimension: List[int]

Number of returned actions.

property action_names: List[List[str]]

Names of active actions that can be used to map action values.

property action_space: List[Box]

Format of valid actions.

property actions: List[List[List[Any]]]

Action history/time series.

property building_metadata: List[Mapping[str, Any]]

Building(s) metadata.

property env: CityLearnEnv

CityLearn environment.

property episode_time_steps: int

Number of time steps in one episode.

learn(episodes: int = None, deterministic: bool = None, deterministic_finish: bool = None, logging_level: int = None)[source]

Train agent.

Parameters:
  • episodes (int, default: 1) – Number of training episodes, >= 1.

  • deterministic (bool, default: False) – Indicator to take deterministic actions, i.e. strictly exploit the learned policy.

  • deterministic_finish (bool, default: False) – Indicator to take deterministic actions in the final episode.

  • logging_level (int, default: 30) – Logging level, where increasing the value silences lower-level information.
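A short sketch of calling learn with its keyword arguments; the values shown are illustrative, not recommendations:

    agent.learn(
        episodes=5,                 # run five training episodes
        deterministic_finish=True,  # exploit the learned policy in the final episode
        logging_level=20,           # INFO level; lower values show more detail
    )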

next_time_step()[source]

Advance to next time_step value.

Notes

Override in subclass for custom implementation when advancing to next time_step.

property observation_names: List[List[str]]

Names of active observations that can be used to map observation values.

property observation_space: List[Box]

Format of valid observations.
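A short sketch of inspecting these per-building properties after constructing an agent; the printed names and dimensions depend on the schema used to build the environment:

    # one entry per building (or a single entry when a central agent is used)
    print(agent.action_names)          # e.g. [['electrical_storage'], ...]
    print(agent.action_dimension)      # number of actions per building
    print(agent.observation_names)     # active observation names per building
    print(agent.action_space[0].low)   # lower bounds of the first action space
    print(agent.action_space[0].high)  # upper bounds of the first action space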

predict(observations: List[List[float]], deterministic: bool = None) List[List[float]][source]

Provide actions for the current time step.

Return randomly sampled actions from action_space.

Parameters:
  • observations (List[List[float]]) – Environment observations.

  • deterministic (bool, default: False) – Whether to return purely exploitative deterministic actions.

Returns:

actions – Action values.

Return type:

List[List[float]]
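A minimal sketch of a manual simulation loop built on predict, assuming the classic gym-style reset/step signatures; newer gymnasium-based releases return (observations, info) from reset and a five-element tuple from step:

    observations = env.reset()
    done = False

    while not done:
        actions = agent.predict(observations, deterministic=False)
        observations, reward, done, info = env.step(actions)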

reset()[source]

Reset environment to initial state.

Calls reset_time_step.

Notes

Override in subclass for custom implementation when resetting environment.

update(*args, **kwargs)[source]

Update replay buffer and networks.

Notes

This implementation is a no-op but is retained so that all agents expose the same API during simulation.
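A hypothetical subclass sketch showing the two methods most agents override; the class name and comments are illustrative, not part of the library:

    from typing import List
    from citylearn.agents.base import Agent

    class MyAgent(Agent):
        def predict(self, observations: List[List[float]], deterministic: bool = None) -> List[List[float]]:
            # replace the base class's random sampling with your own policy;
            # the base implementation is kept here as a placeholder
            return super().predict(observations, deterministic=deterministic)

        def update(self, *args, **kwargs):
            # store transitions and update policy/value networks here;
            # the base class implementation is a no-op
            pass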

class citylearn.agents.base.BaselineAgent(env: CityLearnEnv, **kwargs: Any)[source]

Bases: Agent

Agent class for business-as-usual simulation where the storage systems and heat pumps are not controlled.

This agent provides results for the case where there is no storage for load shifting and no heat pump partial-load control. The prescribed storage actions will be 0.0 and the heat pump will receive no action, i.e. None, causing it to deliver the ideal load in the building time series files.

To ensure that the environment does not expect non-zero and non-null actions, the buildings in the parsed env will be set to have no active actions. This means that you must initialize a new env if you want to simulate with a new agent type.

This agent class is best used to establish a baseline simulation that can then be compared to RBC, RLC, or MPC control algorithms.

Parameters:
  • env (CityLearnEnv) – CityLearn environment.

  • **kwargs (dict) – Other keyword arguments used to initialize super class.
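A sketch of establishing a baseline run that RBC, RLC, or MPC results can be compared against; the dataset name is an assumption, and CityLearnEnv.evaluate() is used here to summarize the resulting KPIs:

    from citylearn.citylearn import CityLearnEnv
    from citylearn.agents.base import BaselineAgent

    # initialize a fresh environment for the baseline run, since BaselineAgent
    # deactivates the buildings' actions in the env it is given
    baseline_env = CityLearnEnv('citylearn_challenge_2022_phase_1')
    baseline_agent = BaselineAgent(baseline_env)
    baseline_agent.learn(episodes=1)

    # key performance indicators for the business-as-usual case
    baseline_kpis = baseline_env.evaluate()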

property env: CityLearnEnv

CityLearn environment.

predict(observations: List[List[float]], deterministic: bool = None) List[List[float]][source]

Provide actions for the current time step.

Return business-as-usual actions, i.e. 0.0 for the storage systems and no action (None) for the heat pump, consistent with the deactivated actions of the parsed buildings.

Parameters:
  • observations (List[List[float]]) – Environment observations.

  • deterministic (bool, default: False) – Whether to return purely exploitative deterministic actions.

Returns:

actions – Action values.

Return type:

List[List[float]]