11 While it is not the main point here, we encounter the so-called “exploration-exploitation trade-off”: the agent cannot simultaneously gather new information and use previously acquired information to obtain reward. Put simply, while the agent explores at random, it is unlikely to collect much reward, because it is not trying to.
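
To make the trade-off concrete, here is a minimal sketch on a toy three-armed bandit. Everything in it is an illustrative assumption rather than something taken from the text: the epsilon-greedy rule, the function name `run_bandit`, the arm probabilities, and the step count. It compares an agent that only explores at random with one that mostly exploits its current reward estimates.

```python
import random


def run_bandit(epsilon, arm_means, steps, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit; return its average reward.

    epsilon is the probability of picking a random arm (exploring);
    otherwise the agent picks the arm with the highest estimated reward.
    """
    rng = random.Random(seed)
    counts = [0] * len(arm_means)       # pulls per arm
    estimates = [0.0] * len(arm_means)  # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_means))  # explore
        else:
            arm = max(range(len(arm_means)), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps


# Hypothetical arms: each pays 1 with the given probability, else 0.
ARM_MEANS = [0.1, 0.5, 0.9]

random_avg = run_bandit(epsilon=1.0, arm_means=ARM_MEANS, steps=5000)
mostly_greedy_avg = run_bandit(epsilon=0.1, arm_means=ARM_MEANS, steps=5000)

print(f"pure exploration: {random_avg:.3f}")
print(f"mostly exploiting: {mostly_greedy_avg:.3f}")
```

With `epsilon=1.0` the agent learns about every arm but never acts on what it learns, so its average reward stays close to the mean over all arms. With a small `epsilon` it spends most steps on the arm it currently believes is best, and only occasionally pays the cost of exploring.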