# Reward Function

A reward is calculated and returned each time `citylearn.citylearn.CityLearnEnv.step()` is called. The reward time series is also accessible through the `citylearn.citylearn.CityLearnEnv.rewards` property.
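For example, a minimal random-action episode that relies on this behavior might look like the following sketch. The dataset name is one of CityLearn's bundled datasets, and the `step()` return signature shown follows the older Gym-style API; adjust both for your installed version:

```python
from citylearn.citylearn import CityLearnEnv

# 'citylearn_challenge_2022_phase_1' is one of the datasets shipped with
# CityLearn; a path to a schema.json works here as well.
env = CityLearnEnv('citylearn_challenge_2022_phase_1', central_agent=True)
observations = env.reset()
done = False

while not done:
    # sample a random action for each action space in the environment
    actions = [space.sample() for space in env.action_space]
    # the reward is calculated and returned on every step; note that newer,
    # Gymnasium-based releases return (obs, reward, terminated, truncated, info)
    observations, reward, done, info = env.step(actions)

# the reward time series accumulated over the episode
print(env.rewards)
```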

CityLearn provides the following built-in reward functions:

| Class | Equation |
|-------|----------|
| `citylearn.reward_function.RewardFunction` | $\min(-e, 0)$ |
| `citylearn.reward_function.MARL` | $\textrm{sign}(-e) \times 0.01 e^2 \times \max(0, E)$ |
| `citylearn.reward_function.IndependentSACReward` | $\min(-e^3, 0)$ |
| `citylearn.reward_function.SolarPenaltyReward` | $\sum_{i=0}^n - \Big( \Big(1 + \frac{e}{\lvert e \rvert} \times \textrm{storage}_{i}^{\textrm{SoC}} \Big) \times \lvert e \rvert \Big)$ |
| `citylearn.reward_function.ComfortReward` | piecewise; see below |

$$
\begin{cases}
-|T_{in} - T_{spt}|^3, & \textrm{if} \ T_{in} < (T_{spt} - T_{b}) \ \textrm{and cooling} \\
-|T_{in} - T_{spt}|^2, & \textrm{if} \ T_{in} < (T_{spt} - T_{b}) \ \textrm{and heating} \\
-|T_{in} - T_{spt}|, & \textrm{if} \ (T_{spt} - T_{b}) \le T_{in} < T_{spt} \ \textrm{and cooling} \\
0, & \textrm{if} \ (T_{spt} - T_{b}) \le T_{in} < T_{spt} \ \textrm{and heating} \\
0, & \textrm{if} \ T_{spt} \le T_{in} \le (T_{spt} + T_{b}) \ \textrm{and cooling} \\
-|T_{in} - T_{spt}|, & \textrm{if} \ T_{spt} \le T_{in} \le (T_{spt} + T_{b}) \ \textrm{and heating} \\
-|T_{in} - T_{spt}|^2, & \textrm{if} \ (T_{spt} + T_{b}) < T_{in} \ \textrm{and cooling} \\
-|T_{in} - T_{spt}|^3, & \textrm{otherwise}
\end{cases}
$$

Where $e$ is a building’s net electricity consumption, $T_{in}$ is a building’s indoor dry-bulb temperature, $T_{spt}$ is a building’s indoor dry-bulb temperature setpoint, $T_{b}$ is a building’s indoor dry-bulb temperature setpoint comfort band, and $E$ is the district’s net electricity consumption. These rewards are defined for a decentralized, single-building application; for a centralized agent controlling all buildings, the reward is the sum of the decentralized values.
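As a concrete illustration of the default `RewardFunction`, $\min(-e, 0)$, and of the centralized aggregation, consider three hypothetical building consumption values:

```python
# hypothetical net electricity consumptions (kWh) for three buildings;
# a negative value means the building exports electricity to the grid
e = [2.5, -1.0, 0.8]

# default RewardFunction, min(-e, 0): consumption is penalized,
# while net export earns no extra credit
decentralized = [min(-v, 0.0) for v in e]  # -> [-2.5, 0.0, -0.8]

# a centralized agent receives the sum of the decentralized values
centralized = [sum(decentralized)]  # -> [-3.3]
```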

## How to Point to the Reward Function

The reward function to use in a simulation is defined in the `reward_function` key-value of the schema:

```json
{
   ...,
   "reward_function": {
      "type": "citylearn.reward_function.RewardFunction",
      ...
   },
   ...
}
```
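When the environment is constructed from the schema, it instantiates the referenced class, which can be verified along these lines (a sketch; the schema path is a placeholder):

```python
from citylearn.citylearn import CityLearnEnv

env = CityLearnEnv('path/to/schema.json')  # placeholder path
print(type(env.reward_function))
# <class 'citylearn.reward_function.RewardFunction'>
```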


## How to Define a Custom Reward Function

CityLearn also allows for custom reward functions by inheriting from the base `citylearn.reward_function.RewardFunction`:

```python
from typing import Any, List, Mapping, Union
from citylearn.reward_function import RewardFunction

class CustomReward(RewardFunction):
    """Calculates custom user-defined multi-agent reward.

    Reward is the :py:attr:`net_electricity_consumption_emission`
    for the entire district if using a central agent setup, otherwise
    it is the :py:attr:`net_electricity_consumption_emission` of
    each building.

    Parameters
    ----------
    env_metadata: Mapping[str, Any]
        General static information about the environment.
    """

    def calculate(self, observations: List[Mapping[str, Union[int, float]]]) -> List[float]:
        r"""Calculates reward.

        Parameters
        ----------
        observations: List[Mapping[str, Union[int, float]]]
            List of all building observations at the current
            :py:attr:`citylearn.citylearn.CityLearnEnv.time_step`,
            obtained by calling :py:meth:`citylearn.building.Building.observations`.

        Returns
        -------
        reward: List[float]
            Reward for the transition to the current time step.
        """

        # carbon emissions associated with each building's net electricity consumption
        net_electricity_consumption_emission = [o['net_electricity_consumption_emission'] for o in observations]

        if self.central_agent:
            # single agent: one reward for the whole district
            reward = [-sum(net_electricity_consumption_emission)]
        else:
            # one reward per building
            reward = [-v for v in net_electricity_consumption_emission]

        return reward
```
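
The custom reward can also be exercised directly, outside a simulation. The sketch below assumes `env_metadata` carries a `central_agent` flag, as the base class expects in recent CityLearn versions; check your version's API:

```python
# hypothetical observations for two buildings
observations = [
    {'net_electricity_consumption_emission': 1.25},
    {'net_electricity_consumption_emission': 0.5},
]

reward_function = CustomReward(env_metadata={'central_agent': False})
print(reward_function.calculate(observations))  # [-1.25, -0.5]

reward_function = CustomReward(env_metadata={'central_agent': True})
print(reward_function.calculate(observations))  # [-1.75]
```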


The schema must then be updated to reference the custom reward function:

```json
{
   ...,
   "reward_function": {
      "type": "custom_module.CustomReward",
      ...
   },
   ...
}
```
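
Note that for the `"type"` string to resolve, `custom_module` must be importable (e.g., on the Python path) when the schema is loaded. The resolution is presumably an ordinary dynamic import, along these lines (an illustrative sketch, not CityLearn's actual loader):

```python
import importlib

# resolve "custom_module.CustomReward" into a class object
module_name, class_name = 'custom_module.CustomReward'.rsplit('.', 1)
reward_class = getattr(importlib.import_module(module_name), class_name)
```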