Demand response allows consumers to reduce their electrical consumption during periods of peak energy use. This lowers peak electrical demand and, consequently, wholesale electricity prices. However, buildings must coordinate with each other so that they do not all delay their electricity consumption at the same time, which would create new, delayed peaks of electrical demand. In this work, we examine this coordination using batch reinforcement learning (BRL). BRL does not require a model and allows the buildings to adapt over time toward optimal behavior. We implemented our controller in CitySim, a building simulator, using TensorFlow, a machine learning library.