This paper describes a reinforcement learning (RL) approach for reducing token usage and cost in retrieval-augmented generation (RAG) chatbots, achieving cost savings while maintaining accuracy. The results show that RL can improve the efficiency of domain-specific question-answering chatbots.
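To make the accuracy-versus-cost trade-off concrete, the following is a minimal sketch and not the method described in this paper. It assumes a toy setting in which an agent chooses, per turn, whether to retrieve context ("fetch") or answer without it ("no_fetch"), and it learns with a one-step, epsilon-greedy bandit-style value update. The state names, token counts, cost weight, and reward rule are all hypothetical and chosen only for illustration.

```python
import random

ACTIONS = ("fetch", "no_fetch")
STATES = ("new_question", "follow_up")
# Hypothetical token counts per turn: retrieved context inflates the prompt.
TOKENS = {"fetch": 1200, "no_fetch": 200}
COST_WEIGHT = 0.5  # hypothetical penalty per 1,000 tokens


def reward(state: str, action: str) -> float:
    """Toy reward: 1 for a correct answer, minus a token-cost penalty."""
    # Assumption: a new question is answered correctly only with retrieval,
    # while a follow-up can be answered from the conversation history.
    correct = 1.0 if (action == "fetch" or state == "follow_up") else 0.0
    return correct - COST_WEIGHT * TOKENS[action] / 1000


def train(episodes: int = 2000, epsilon: float = 0.1, lr: float = 0.1, seed: int = 0):
    """Learn one value estimate per (state, action) pair."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(STATES)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        # Move the estimate a fraction lr toward the observed reward.
        q[(state, action)] += lr * (reward(state, action) - q[(state, action)])
    return q


def policy(q, state: str) -> str:
    """Greedy action under the learned value estimates."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q = train()
    for s in STATES:
        print(s, "->", policy(q, s))
```

Under these assumed rewards, the learned policy retrieves for new questions and skips retrieval for follow-ups, which is where the token savings come from. In a real system the correctness signal would have to come from an evaluation of the generated answers rather than a fixed rule.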