Foaster AI

Werewolf Game as LLM Social Intelligence Benchmark

Using Werewolf to benchmark LLMs is clever: you can't win without theory of mind, deception, and reading the room. Standard evals don't test any of that.
