citylearn.reward_function module
- class citylearn.reward_function.ComfortReward(env_metadata: Mapping[str, Any], band: float = None, lower_exponent: float = None, higher_exponent: float = None)
Bases: RewardFunction
Reward for occupant thermal comfort satisfaction.
If the indoor dry-bulb temperature is outside the comfort band, the reward is the negative of the absolute difference between the setpoint and the indoor dry-bulb temperature, raised to a penalty exponent. Within the comfort band, the reward is the negative difference when in cooling mode and the temperature is below the setpoint, or when in heating mode and the temperature is above the setpoint. The reward is 0 when the temperature is within the comfort band and above the setpoint in cooling mode, or below the setpoint in heating mode.
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
band (float, default = 2.0) – Setpoint comfort band (+/-). If not provided, the comfort band time series defined in the building file is used, or, if that is unavailable, a default time series value of 2.0.
lower_exponent (float, default = 2.0) – Penalty exponent applied when in cooling mode and the temperature is above the upper comfort-band boundary, or when in heating mode and the temperature is below the lower comfort-band boundary.
higher_exponent (float, default = 2.0) – Penalty exponent applied when in cooling mode and the temperature is below the lower comfort-band boundary, or when in heating mode and the temperature is above the upper comfort-band boundary.
- property band: float
- calculate(observations: List[Mapping[str, int | float]]) → List[float]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
- property higher_exponent: float
- property lower_exponent: float
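The comfort rule described for ComfortReward can be sketched for a single building as follows. This is a hypothetical standalone function, not CityLearn's implementation: the name comfort_reward and the boolean heating flag are assumptions, and how the library decides between heating and cooling mode is not described here. The default values mirror the documented defaults.

```python
def comfort_reward(
    indoor_temperature: float,
    setpoint: float,
    heating: bool,
    band: float = 2.0,
    lower_exponent: float = 2.0,
    higher_exponent: float = 2.0,
) -> float:
    """Hypothetical sketch of the ComfortReward rule for one building.

    `heating` is an assumed flag: True for heating mode, False for cooling mode.
    """
    delta = abs(indoor_temperature - setpoint)
    lower_bound = setpoint - band
    upper_bound = setpoint + band

    if indoor_temperature < lower_bound:
        # Below the band: lower_exponent in heating mode, higher_exponent in cooling mode.
        exponent = lower_exponent if heating else higher_exponent
        return -(delta ** exponent)
    if indoor_temperature > upper_bound:
        # Above the band: lower_exponent in cooling mode, higher_exponent in heating mode.
        exponent = higher_exponent if heating else lower_exponent
        return -(delta ** exponent)
    if heating:
        # Inside the band while heating: only temperatures above the setpoint are penalized.
        return -delta if indoor_temperature > setpoint else 0.0
    # Inside the band while cooling: only temperatures below the setpoint are penalized.
    return -delta if indoor_temperature < setpoint else 0.0


# Cooling mode, 3 degrees above a 22.0 setpoint (outside the +/-2.0 band).
print(comfort_reward(25.0, 22.0, heating=False))  # -9.0
```

Inside the band the penalty is linear and one-sided, so only drifting in the "wasteful" direction is penalized; outside the band the exponents let a user penalize under-conditioning and over-conditioning differently.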
- class citylearn.reward_function.IndependentSACReward(env_metadata: Mapping[str, Any])
Bases: RewardFunction
Recommended for use with the SAC controllers.
The returned reward assumes that the building agents act independently of each other, without sharing information through the reward.
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
- calculate(observations: List[Mapping[str, int | float]]) → List[float]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
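To illustrate what "without sharing information through the reward" means, the sketch below contrasts independent rewards, where each agent's reward depends only on its own building's observations, with a shared reward, where every agent receives the district total. All function names and the observation key net_electricity_consumption are assumptions for illustration; IndependentSACReward's actual per-building formula is not given here and is not reproduced.

```python
from typing import Callable, List, Mapping

Observation = Mapping[str, float]


def independent_rewards(
    observations: List[Observation],
    building_reward: Callable[[Observation], float],
) -> List[float]:
    # Hypothetical: each agent's reward depends only on its own building's observations.
    return [building_reward(o) for o in observations]


def shared_rewards(
    observations: List[Observation],
    building_reward: Callable[[Observation], float],
) -> List[float]:
    # Hypothetical contrast: every agent receives the same district-level total,
    # so information about other buildings reaches each agent through its reward.
    total = sum(building_reward(o) for o in observations)
    return [total] * len(observations)


def negative_consumption(o: Observation) -> float:
    # Hypothetical per-building term: net electricity consumption as a negative value.
    return -o["net_electricity_consumption"]


obs = [{"net_electricity_consumption": 2.0}, {"net_electricity_consumption": 5.0}]
print(independent_rewards(obs, negative_consumption))  # [-2.0, -5.0]
print(shared_rewards(obs, negative_consumption))       # [-7.0, -7.0]
```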
- class citylearn.reward_function.MARL(env_metadata: Mapping[str, Any])
Bases: RewardFunction
Multi-agent reinforcement learning (MARL) reward function class.
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
- calculate(observations: List[Mapping[str, int | float]]) → List[float]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
- class citylearn.reward_function.RewardFunction(env_metadata: Mapping[str, Any], exponent: float = None, **kwargs)
Bases: object
Base and default reward function class.
The default reward is the electricity consumption from the grid at the current time step returned as a negative value.
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
**kwargs (dict) – Other keyword arguments for custom reward calculation.
- calculate(observations: List[Mapping[str, int | float]]) → List[float]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
- property central_agent: bool
Whether a single central agent is expected to control all buildings.
- property env_metadata: Mapping[str, Any]
General static information about the environment.
- property exponent: float
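The default reward and the role of central_agent can be sketched as a standalone function. This is a hypothetical illustration, not the library's code: the function name default_reward and the observation key net_electricity_consumption are assumptions, net export (negative consumption) is assumed to draw nothing from the grid and is clipped to zero, and clipping each building before summing in the central case is likewise an assumption.

```python
from typing import List, Mapping, Union


def default_reward(
    observations: List[Mapping[str, Union[int, float]]],
    central_agent: bool,
) -> List[float]:
    # Hypothetical per-building reward: grid electricity consumption as a negative value.
    # Net export (negative consumption) draws nothing from the grid, so it is clipped to 0.
    per_building = [min(-float(o["net_electricity_consumption"]), 0.0) for o in observations]
    if central_agent:
        # One agent controls all buildings, so it receives a single summed reward.
        return [sum(per_building)]
    # One agent per building: one reward per building, in observation order.
    return per_building


obs = [{"net_electricity_consumption": 3.0}, {"net_electricity_consumption": -1.0}]
print(default_reward(obs, central_agent=False))  # [-3.0, 0.0]
print(default_reward(obs, central_agent=True))   # [-3.0]
```

Either way the return value is a list, which matches the documented List[float] return type of calculate for both the centralized and decentralized cases.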
- class citylearn.reward_function.SolarPenaltyAndComfortReward(env_metadata: Mapping[str, Any], band: float = None, lower_exponent: float = None, higher_exponent: float = None, coefficients: Tuple = None)
Bases: RewardFunction
Addition of citylearn.reward_function.SolarPenaltyReward and citylearn.reward_function.ComfortReward.
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
band (float, default = 2.0) – Setpoint comfort band (+/-). If not provided, the comfort band time series defined in the building file is used, or, if that is unavailable, a default time series value of 2.0.
lower_exponent (float, default = 2.0) – Penalty exponent applied when in cooling mode and the temperature is above the upper comfort-band boundary, or when in heating mode and the temperature is below the lower comfort-band boundary.
higher_exponent (float, default = 3.0) – Penalty exponent applied when in cooling mode and the temperature is below the lower comfort-band boundary, or when in heating mode and the temperature is above the upper comfort-band boundary.
coefficients (Tuple, default = (1.0, 1.0)) – Coefficients for the citylearn.reward_function.SolarPenaltyReward and citylearn.reward_function.ComfortReward values, respectively.
- calculate(observations: List[Mapping[str, int | float]]) → List[float]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
- property coefficients: Tuple
- property env_metadata: Mapping[str, Any]
General static information about the environment.
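A minimal sketch of how the two components might be combined, assuming each coefficient multiplies its component's per-agent reward before the two are added. The function name combined_reward is hypothetical, and the inputs stand in for the lists that SolarPenaltyReward and ComfortReward would return for the same time step.

```python
from typing import List, Tuple


def combined_reward(
    solar_penalty: List[float],
    comfort: List[float],
    coefficients: Tuple[float, float] = (1.0, 1.0),
) -> List[float]:
    # Hypothetical weighted addition: one combined value per agent.
    if len(solar_penalty) != len(comfort):
        raise ValueError("both reward lists must have one entry per agent")
    solar_weight, comfort_weight = coefficients
    return [solar_weight * s + comfort_weight * c for s, c in zip(solar_penalty, comfort)]


print(combined_reward([-2.0, -1.0], [-4.0, 0.0]))              # [-6.0, -1.0]
print(combined_reward([-2.0, -1.0], [-4.0, 0.0], (0.5, 2.0)))  # [-9.0, -0.5]
```

With the default (1.0, 1.0) the result is a plain sum; changing the coefficients trades energy objectives against occupant comfort without altering either component.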
- class citylearn.reward_function.SolarPenaltyReward(env_metadata: Mapping[str, Any])
Bases: RewardFunction
The reward is designed to minimize electricity consumption and maximize solar generation to charge energy storage systems.
The reward is calculated for each building i and summed, so that the agent receives a reward representative of the building it controls or, in the centralized case, of all the buildings it controls. It encourages net-zero energy use through a penalty term that penalizes satisfying load from the grid while there is energy in the energy storage systems, and penalizes net export while the energy storage systems are not fully charged. There is neither penalty nor reward when the energy storage systems are fully charged during net export to the grid. Conversely, when the energy storage systems are charged to capacity and there is net import from the grid, the penalty is maximized.
- Parameters:
env_metadata (Mapping[str, Any]) – General static information about the environment.
- calculate(observations: List[Mapping[str, int | float]]) → List[float]
Calculates reward.
- Parameters:
observations (List[Mapping[str, Union[int, float]]]) – List of all building observations at the current citylearn.citylearn.CityLearnEnv.time_step, obtained by calling citylearn.building.Building.observations().
- Returns:
reward – Reward for transition to current timestep.
- Return type:
List[float]
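The grid and storage cases described for SolarPenaltyReward are all satisfied by a per-storage penalty of the form -(1 + sign(e) · soc) · |e|, where e is the building's net electricity consumption (positive for net import) and soc is a storage system's state of charge scaled to [0, 1]. The sketch below assumes that form; the function names, the dictionary of states of charge, and the [0, 1] scaling are hypothetical, and details such as how storage capacity is taken into account are not reproduced.

```python
from typing import List, Mapping


def solar_penalty(net_consumption: float, storage_soc: Mapping[str, float]) -> float:
    """Hypothetical per-building penalty consistent with the described cases.

    net_consumption > 0 is net import from the grid, < 0 is net export.
    storage_soc maps each storage system to its state of charge in [0, 1].
    """
    sign = 1.0 if net_consumption >= 0 else -1.0
    magnitude = abs(net_consumption)
    # One term per storage system: -(1 + sign * soc) * |net consumption|.
    # Import with full storage gives the largest penalty; export with full storage gives 0.
    return sum(-(1.0 + sign * soc) * magnitude for soc in storage_soc.values())


def solar_penalty_rewards(
    net_consumptions: List[float],
    storage_socs: List[Mapping[str, float]],
    central_agent: bool,
) -> List[float]:
    # Compute the penalty for each building i; sum when one agent controls all buildings.
    per_building = [solar_penalty(e, soc) for e, soc in zip(net_consumptions, storage_socs)]
    return [sum(per_building)] if central_agent else per_building


socs = [{"battery": 1.0}, {"battery": 0.5}]
print(solar_penalty_rewards([2.0, -2.0], socs, central_agent=False))  # [-4.0, -1.0]
print(solar_penalty_rewards([2.0, -2.0], socs, central_agent=True))   # [-5.0]
```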