In this study, a heat pump meets the heating and cooling needs of a building, and two water tanks store heat and cold, respectively. Reinforcement learning (RL) can serve as a model-free control approach: it learns from occupant behaviour, weather conditions, and the thermal behaviour of the building, without an explicit building model, in order to make near-optimal decisions. In this work, we use a specific RL technique called batch Q-learning and integrate it into the urban building energy simulator CitySim. The goal of the controller is to reduce energy consumption while maintaining comfortable indoor temperatures.
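The paragraph names batch Q-learning without describing it. One common reading is that the controller stores a fixed batch of recorded transitions (state, action, reward, next state) and repeatedly sweeps the standard Q-learning update over that batch, rather than updating only once per new observation. The sketch below illustrates this idea on a deliberately tiny, hypothetical problem: a single hot-water tank discretized into five levels and an on/off heat pump. The dynamics, reward weights, and every function name (`step`, `collect_batch`, `batch_q_learning`, `greedy_policy`) are illustrative assumptions. They are not the CitySim model or the controller used in this study.

```python
"""Toy sketch of batch Q-learning for an on/off heat pump charging a hot-water tank.

All dynamics, rewards and constants below are hypothetical illustrations,
not the CitySim model or the controller used in this study.
"""

N_LEVELS = 5        # hypothetical: tank temperature discretized into 5 levels
COMFORT_MIN = 2     # hypothetical: levels below this count as discomfort
ACTIONS = (0, 1)    # 0 = heat pump off, 1 = heat pump on
GAMMA = 0.9         # discount factor
ALPHA = 0.5         # learning rate


def step(state, action):
    """Hypothetical deterministic tank dynamics and reward."""
    if action == 1:
        next_state = min(state + 1, N_LEVELS - 1)   # heating raises the level
    else:
        next_state = max(state - 1, 0)              # heat demand drains the tank
    energy_cost = 1.0 * action                      # running the heat pump costs energy
    discomfort = 5.0 if next_state < COMFORT_MIN else 0.0
    return next_state, -(energy_cost + discomfort)


def collect_batch():
    """Record one transition for every (state, action) pair.

    In practice the batch would come from logged interactions with the
    building or simulator rather than from an exhaustive sweep like this.
    """
    batch = []
    for s in range(N_LEVELS):
        for a in ACTIONS:
            s_next, r = step(s, a)
            batch.append((s, a, r, s_next))
    return batch


def batch_q_learning(batch, n_sweeps=500):
    """Repeatedly apply the Q-learning update over a fixed batch of transitions."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N_LEVELS)]
    for _ in range(n_sweeps):
        for s, a, r, s_next in batch:
            target = r + GAMMA * max(q[s_next])     # bootstrapped one-step target
            q[s][a] += ALPHA * (target - q[s][a])
    return q


def greedy_policy(q):
    """For each state, pick the action with the highest Q-value."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_LEVELS)]


if __name__ == "__main__":
    q_table = batch_q_learning(collect_batch())
    print(greedy_policy(q_table))
```

Running the script prints `[1, 1, 1, 0, 0]`: under these assumed costs, the learned policy switches the heat pump on while the tank is at or near the discomfort threshold and lets it drain when the tank is close to full. Reusing the same batch for many sweeps is what lets the controller extract a policy from a limited set of observations, which matters when each real interaction with a building is slow or costly.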