Open Source

Tactical AI Benchmark

Evaluate LLMs and train RL agents on turn-based tactical combat. Compare models head-to-head in competitive tournaments.

terminal
$ pip install reinforcetactics
$ python -m reinforcetactics tournament \
    --agents gpt-4o claude-sonnet gemini-pro \
    --rounds 50 \
    --map crossroads
Tournament started: 3 agents, 50 rounds
Round 50/50 complete
─────────────────────────────
#1 claude-sonnet ELO 1847
#2 gpt-4o ELO 1723
#3 gemini-pro ELO 1630

LLM Evaluation

Benchmark GPT, Claude, Gemini, and custom models on strategic reasoning, spatial awareness, and multi-step planning.

RL Training

Full Gymnasium environment with multi-discrete action space, configurable reward shaping, and headless mode for fast training.
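To make the idea of a multi-discrete action space concrete, here is a minimal pure-Python sketch. It does not use the project's own code: the component layout in `NVEC` (action type, source x/y, target x/y) and the helper names are assumptions chosen only for illustration. It shows how an action made of several independent discrete choices can be validated and mapped to a single flat index and back.

```python
from math import prod

# Hypothetical component sizes, not taken from the project:
# (action type, source x, source y, target x, target y)
NVEC = (4, 8, 8, 8, 8)


def is_valid(action, nvec=NVEC):
    """Check that every component lies inside its own discrete range."""
    return len(action) == len(nvec) and all(
        0 <= a < n for a, n in zip(action, nvec)
    )


def flatten(action, nvec=NVEC):
    """Map a multi-discrete action to one integer (mixed-radix encoding)."""
    index = 0
    for a, n in zip(action, nvec):
        index = index * n + a
    return index


def unflatten(index, nvec=NVEC):
    """Inverse of flatten: recover the per-component choices."""
    parts = []
    for n in reversed(nvec):
        index, a = divmod(index, n)
        parts.append(a)
    return tuple(reversed(parts))


example = (2, 1, 3, 1, 4)  # illustrative action, meaning is assumed
flat = flatten(example)
restored = unflatten(flat)
total_actions = prod(NVEC)
```

A flat index like this can be handy when an algorithm expects a single discrete action, while the tuple form keeps each decision separate.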

Tournaments

Automated round-robin tournaments with ELO ratings, replay recording, and detailed performance analytics.
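As a rough illustration of how round-robin pairing and ELO updates fit together, the sketch below uses only the Python standard library. It is not the project's implementation: the K-factor of 32, the starting rating of 1500, the agent names, and the hard-coded results are all assumptions.

```python
from itertools import combinations

K = 32  # assumed K-factor; the value used by the project is not stated


def expected_score(rating_a, rating_b):
    """Standard logistic expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(ratings, a, b, score_a, k=K):
    """Apply one game result (1.0 win, 0.5 draw, 0.0 loss for A)."""
    delta = k * (score_a - expected_score(ratings[a], ratings[b]))
    ratings[a] += delta
    ratings[b] -= delta  # zero-sum: B loses exactly what A gains


def round_robin(agents):
    """Every agent meets every other agent once."""
    return list(combinations(agents, 2))


agents = ["agent_a", "agent_b", "agent_c"]  # placeholder names
ratings = {name: 1500.0 for name in agents}  # assumed starting rating
pairings = round_robin(agents)

# Hypothetical results: the first agent in each pairing wins.
for a, b in pairings:
    update(ratings, a, b, 1.0)

ranking = sorted(ratings, key=ratings.get, reverse=True)
```

Because each update moves the same number of points from one player to the other, the sum of all ratings stays constant across the tournament.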

Diverse Tactical Maps

14 hand-crafted maps across 1v1, 1v1v1, and 2v2 formats with varied terrain, chokepoints, and strategic objectives.

Crossroads
Island Fortress
Tower Rush
Center Mountains

8 Unique Unit Types

Each unit has distinct stats, abilities, and roles — creating a rich decision space for AI agents to master.

Warrior: Frontline Fighter
Mage: Arcane Striker
Knight: Heavy Cavalry
Archer: Ranged Specialist
Rogue: Stealth Assassin
Cleric: Support Healer

How It Works

1. Install

Install via pip with optional GPU, GUI, and LLM extras. Works on Python 3.10+.

pip install reinforcetactics[llm]

2. Configure

Pick your agents: LLM bots, RL models, rule-based bots, or your own custom agent.

--agents gpt-4o claude-sonnet

3. Compete

Run tournaments, compare ELO ratings, analyze replays, and iterate on your models.

python -m reinforcetactics tournament

Built for AI Research

Gymnasium Compatible

Standard RL interface with observation and action spaces, reward shaping, and episode management.
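The standard interface mentioned above follows the Gymnasium convention: `reset` returns an observation and an info dict, and `step` returns observation, reward, terminated, truncated, and info. The sketch below shows that episode loop against a stand-in environment. `ToyTacticsEnv`, its reward rule, and its observation contents are hypothetical and exist only so the example runs without installing anything; the real environment class and its spaces will differ.

```python
class ToyTacticsEnv:
    """Hypothetical stand-in that mimics Gymnasium's reset/step signatures."""

    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.turn = 0

    def reset(self, seed=None):
        # Gymnasium convention: return (observation, info)
        self.turn = 0
        return {"turn": self.turn}, {}

    def step(self, action):
        # Gymnasium convention:
        # return (observation, reward, terminated, truncated, info)
        self.turn += 1
        reward = 1.0 if action == 1 else 0.0  # made-up reward rule
        terminated = False                     # no win/loss in this toy
        truncated = self.turn >= self.max_turns  # turn limit reached
        return {"turn": self.turn}, reward, terminated, truncated, {}


def run_episode(env, policy, seed=0):
    """Play one episode and return (total reward, number of turns)."""
    obs, info = env.reset(seed=seed)
    total = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total, obs["turn"]


# A trivial policy that always picks action 1.
total, turns = run_episode(ToyTacticsEnv(max_turns=5), lambda obs: 1)
```

Any agent that maps an observation to an action can be dropped into `run_episode` as the `policy` argument, which is the main benefit of a shared interface.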

Multi-Agent Support

PettingZoo integration for multi-agent RL. Train cooperative and competitive policies.

Replay & Analysis

Record battles, export to video, and analyze decision patterns for model interpretability.

Extensible Architecture

Add custom units, maps, reward functions, and AI agents with a clean Python API.

Multiple AI Backends

OpenAI, Anthropic, and Google Gemini SDKs built-in. Plug in any LLM via API.

Docker Tournaments

Containerized tournament runner for reproducible benchmarks at scale.

Ready to benchmark your AI?

Open source and ready for research. Clone the repo and run your first tournament in minutes.