citylearn.reward_function module

class citylearn.reward_function.ComfortReward(env_metadata: Mapping[str, Any], band: float = None, lower_exponent: float = None, higher_exponent: float = None)[source]

Bases: RewardFunction

Reward for occupant thermal comfort satisfaction.

The reward is calculated as the negative difference between the setpoint and indoor dry-bulb temperature, raised to some exponent, if outside the comfort band. If within the comfort band, the reward is the negative difference when in cooling mode and the temperature is below the setpoint, or when in heating mode and the temperature is above the setpoint. The reward is 0 if within the comfort band and the temperature is above the setpoint in cooling mode, or below the setpoint in heating mode.

Parameters:
  • env_metadata (Mapping[str, Any]) – General static information about the environment.

  • band (float, default = 2.0) – Setpoint comfort band (+/-).

  • lower_exponent (float, default = 2.0) – Penalty exponent applied when in cooling mode and the temperature is above the setpoint upper boundary, or in heating mode and the temperature is below the setpoint lower boundary.

  • higher_exponent (float, default = 2.0) – Penalty exponent applied when in cooling mode and the temperature is below the setpoint lower boundary, or in heating mode and the temperature is above the setpoint upper boundary.
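
The snippet below is a minimal sketch of the piecewise penalty described above for a single building. The heating flag and the way the mode is detected are simplifying assumptions for illustration; the class itself derives this information from the building observations.

def comfort_penalty(indoor_temperature: float, setpoint: float, heating: bool,
                    band: float = 2.0, lower_exponent: float = 2.0,
                    higher_exponent: float = 2.0) -> float:
    # Absolute deviation from the setpoint.
    delta = abs(indoor_temperature - setpoint)

    if indoor_temperature < setpoint - band:
        # Below the lower boundary: lower_exponent in heating mode,
        # higher_exponent in cooling mode (over-cooled).
        exponent = lower_exponent if heating else higher_exponent
        return -(delta ** exponent)
    elif indoor_temperature > setpoint + band:
        # Above the upper boundary: lower_exponent in cooling mode,
        # higher_exponent in heating mode (over-heated).
        exponent = higher_exponent if heating else lower_exponent
        return -(delta ** exponent)
    elif (heating and indoor_temperature > setpoint) or \
            (not heating and indoor_temperature < setpoint):
        # Within the band but overshooting the setpoint (over-heated in
        # heating mode, over-cooled in cooling mode): negative difference.
        return -delta
    else:
        # Within the band on the other side of the setpoint: no penalty.
        return 0.0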

property band: float
calculate(observations: List[Mapping[str, int | float]]) List[float][source]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, retrieved by calling citylearn.building.Building.observations().

Returns:

reward – Reward for the transition to the current time step.

Return type:

List[float]

property higher_exponent: float
property lower_exponent: float
class citylearn.reward_function.IndependentSACReward(env_metadata: Mapping[str, Any])[source]

Bases: RewardFunction

Recommended for use with the SAC controllers.

The returned reward assumes that the building agents act independently of each other, without sharing information through the reward.

Parameters:

env_metadata (Mapping[str, Any]) – General static information about the environment.
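
To make the independence assumption concrete, the sketch below returns one reward entry per building using only that building's own observation mapping. The net_electricity_consumption key is a standard observation name, but the per-building formula here is an illustration rather than this class's exact internals.

from typing import List, Mapping, Union

def independent_rewards(observations: List[Mapping[str, Union[int, float]]]) -> List[float]:
    rewards = []

    for o in observations:
        # Each building's reward depends solely on its own observations;
        # no district-level or neighbouring-building information is used.
        consumption = float(o['net_electricity_consumption'])
        rewards.append(min(-consumption, 0.0))

    return rewards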

calculate(observations: List[Mapping[str, int | float]]) List[float][source]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, retrieved by calling citylearn.building.Building.observations().

Returns:

reward – Reward for the transition to the current time step.

Return type:

List[float]

class citylearn.reward_function.MARL(env_metadata: Mapping[str, Any])[source]

Bases: RewardFunction

MARL reward function class.

Parameters:

env_metadata (Mapping[str, Any]) – General static information about the environment.

calculate(observations: List[Mapping[str, int | float]]) List[float][source]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, retrieved by calling citylearn.building.Building.observations().

Returns:

reward – Reward for the transition to the current time step.

Return type:

List[float]

class citylearn.reward_function.RewardFunction(env_metadata: Mapping[str, Any], exponent: float = None, **kwargs)[source]

Bases: object

Base and default reward function class.

The default reward is the electricity consumption from the grid at the current time step returned as a negative value.

Parameters:
  • env_metadata (Mapping[str, Any]) – General static information about the environment.

  • **kwargs (dict) – Other keyword arguments for custom reward calculation.
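
The sketch below shows a minimal way to customize the reward by subclassing, mirroring the documented default of returning the grid electricity consumption as a negative value. The net_electricity_consumption observation name is the standard CityLearn key, but treat it as an assumption here.

from typing import Any, List, Mapping, Union
from citylearn.reward_function import RewardFunction

class NegativeConsumptionReward(RewardFunction):
    """Illustrative subclass: one reward entry per building, equal to the
    negative of its grid electricity consumption at the current time step."""

    def __init__(self, env_metadata: Mapping[str, Any], **kwargs):
        super().__init__(env_metadata, **kwargs)

    def calculate(self, observations: List[Mapping[str, Union[int, float]]]) -> List[float]:
        # One entry per building observation mapping.
        return [-float(o['net_electricity_consumption']) for o in observations]

Depending on the CityLearn version, such a subclass is typically referenced from the dataset schema or assigned to the environment's reward function before training; consult the environment documentation for the exact wiring.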

calculate(observations: List[Mapping[str, int | float]]) List[float][source]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, retrieved by calling citylearn.building.Building.observations().

Returns:

reward – Reward for the transition to the current time step.

Return type:

List[float]

property central_agent: bool

Whether a single central agent is expected to control all buildings.

property env_metadata: Mapping[str, Any]

General static information about the environment.

property exponent: float
reset()[source]

Use this method to reset variables at the start of an episode.

class citylearn.reward_function.SolarPenaltyAndComfortReward(env_metadata: Mapping[str, Any], band: float = None, lower_exponent: float = None, higher_exponent: float = None, coefficients: Tuple = None)[source]

Bases: RewardFunction

Addition of citylearn.reward_function.SolarPenaltyReward and citylearn.reward_function.ComfortReward.

Parameters:
  • env_metadata (Mapping[str, Any]) – General static information about the environment.

  • band (float, default = 2.0) – Setpoint comfort band (+/-).

  • lower_exponent (float, default = 2.0) – Penalty exponent applied when in cooling mode and the temperature is above the setpoint upper boundary, or in heating mode and the temperature is below the setpoint lower boundary.

  • higher_exponent (float, default = 3.0) – Penalty exponent applied when in cooling mode and the temperature is below the setpoint lower boundary, or in heating mode and the temperature is above the setpoint upper boundary.

  • coefficients (Tuple, default = (1.0, 1.0)) – Coefficients for the citylearn.reward_function.SolarPenaltyReward and citylearn.reward_function.ComfortReward values respectively.
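
The combination is conceptually a weighted, element-wise sum of the two component rewards, one entry per building. The sketch below shows that weighting under the assumption that both components return lists of equal length; it is not the class's literal implementation.

from typing import List, Tuple

def combine_rewards(solar_penalty: List[float], comfort: List[float],
                    coefficients: Tuple[float, float] = (1.0, 1.0)) -> List[float]:
    solar_weight, comfort_weight = coefficients
    # Weight each building's SolarPenaltyReward and ComfortReward values
    # by the respective coefficients, then sum element-wise.
    return [solar_weight * s + comfort_weight * c for s, c in zip(solar_penalty, comfort)]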

calculate(observations: List[Mapping[str, int | float]]) List[float][source]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, retrieved by calling citylearn.building.Building.observations().

Returns:

reward – Reward for the transition to the current time step.

Return type:

List[float]

property coefficients: Tuple
property env_metadata: Mapping[str, Any]

General static information about the environment.

class citylearn.reward_function.SolarPenaltyReward(env_metadata: Mapping[str, Any])[source]

Bases: RewardFunction

The reward is designed to minimize electricity consumption and maximize solar generation to charge energy storage systems.

The reward is calculated for each building, i, and summed to provide the agent with a reward that is representative of the building or buildings (in the centralized case) it controls. It encourages net-zero energy use by penalizing grid load satisfaction when there is energy in the energy storage systems, as well as penalizing net export when the energy storage systems are not fully charged, through the penalty term. There is neither penalty nor reward when the energy storage systems are fully charged during net export to the grid. Conversely, the penalty is maximized when the energy storage systems are charged to capacity and there is net import from the grid.

Parameters:

env_metadata (Mapping[str, Any]) – General static information about the environment.
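
The sketch below illustrates the penalty logic described above for a single building with one storage device. The state of charge is assumed to be normalized to [0, 1], and the parameter names and scaling are illustrative assumptions rather than this class's internals.

def solar_penalty(net_electricity_consumption: float, storage_soc: float) -> float:
    if net_electricity_consumption > 0.0:
        # Net import from the grid: the fuller the storage, the larger the
        # penalty, since stored energy could have served the load instead.
        return -(1.0 + storage_soc) * net_electricity_consumption
    else:
        # Net export to the grid: the emptier the storage, the larger the
        # penalty, since surplus generation could have charged it first.
        # Fully charged storage during export gives neither penalty nor reward.
        return -(1.0 - storage_soc) * abs(net_electricity_consumption)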

calculate(observations: List[Mapping[str, int | float]]) List[float][source]

Calculates reward.

Parameters:

observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, retrieved by calling citylearn.building.Building.observations().

Returns:

reward – Reward for the transition to the current time step.

Return type:

List[float]