StreamBench: Continuous Improvement of Language Agents
Authors: Cheng-Kuang Wu, Zhi Rui Tam, Chieh-Yen Lin, Yun-Nung Chen, Hung-yi Lee
StreamBench evaluates the continuous improvement capabilities of large language model (LLM) agents. The benchmark simulates an online learning environment in which agents receive a stream of feedback and use it to iteratively enhance their performance. It identifies effective baselines and the components critical to successful streaming strategies, laying a foundation for adaptive AI systems in streaming scenarios.
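To make the online-learning setup concrete, here is a minimal sketch of a streaming evaluation loop. All names (`MemoryAgent`, `run_stream`) are hypothetical and not the actual StreamBench API; the real benchmark uses LLM agents and richer feedback signals, while this toy agent simply memorizes ground-truth answers revealed by the feedback stream.

```python
# Hypothetical sketch of a feedback-stream loop (not the StreamBench API).
# An agent sees instances one at a time, answers, then receives feedback
# it can use to improve on later instances in the same stream.

class MemoryAgent:
    """Toy agent that improves by memorizing feedback from the stream."""

    def __init__(self):
        self.memory = {}  # question -> answer learned from past feedback

    def answer(self, question):
        # Reuse a remembered answer if available, else fall back to a guess.
        return self.memory.get(question, "unknown")

    def update(self, question, gold):
        # Improvement step: store the answer revealed as feedback.
        self.memory[question] = gold


def run_stream(agent, stream):
    """Iterate over (question, gold) pairs; return per-step correctness."""
    results = []
    for question, gold in stream:
        correct = agent.answer(question) == gold
        agent.update(question, gold)  # feedback arrives after the attempt
        results.append(correct)
    return results
```

Because feedback arrives only after each attempt, the agent's accuracy can rise over the course of the stream, which is the behavior StreamBench is designed to measure.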