Researchers recently introduced FoFo, a benchmark for evaluating large language models' (LLMs') format-following capabilities. The benchmark presents a variety of real-world scenarios that test how well models adhere to domain-specific formats.
FoFo marks an important step toward selecting suitable AI agents for specialized tasks. The study underscores both the potential of, and the need for, LLMs with strong format-following skills.