Evaluating LLM Agents in the Game of Avalon

Social deduction games are a useful testbed for analyzing the decision-making and language abilities of AI systems. ‘AvalonBench: Evaluating LLMs Playing the Game of Avalon’ introduces a game environment for evaluating LLM agents in The Resistance: Avalon, a game in which players must be adept at deception and negotiation. AvalonBench comprises a new game environment, baseline bots, and ReAct-style LLM agents with prompts customized for each game role. Noteworthy outcomes include:

ChatGPT’s good-role agents achieving a 22.2% win rate against evil-role bots.
A comprehensive setup for studying multi-agent dynamics in strategic play.
AvalonBench as a promising tool for advancing LLMs through self-play methodologies.

This study is a step toward more capable LLMs and multi-agent frameworks, addressing the complexities of human-like strategic environments such as Avalon.
