Deep Reinforcement Learning for Competitive Agents in MicroRTS - Architecture, Training, and Tournament Evaluation

Mathis Delsart

Author: Mathis Delsart

Type: Master's thesis

Programme: Master [120] in Computer Science and Engineering

Institution: UCLouvain, École polytechnique de Louvain

Academic year: 2025–2026

Supervisor: Eric Piette

Readers: Quentin Cappart, Achille Morenville, and Benoît Ronval

Abstract

Real-time strategy (RTS) games are among the most demanding benchmarks for sequential decision-making: players gather resources, coordinate many units, and plan over long horizons, in real time and within a combinatorial action space. AlphaStar reached Grandmaster level in StarCraft II, but at the cost of hundreds of accelerators running for weeks, beyond academic reach. MicroRTS distills these difficulties onto small grid maps while keeping training tractable on a modest budget, making it the reference academic testbed and the subject of an annual competition since 2017.

This master's thesis investigates deep reinforcement learning (DRL) for MicroRTS, guided by two questions: which architectural and algorithmic design decisions most improve a MicroRTS agent, and whether an agent competitive with the strongest prior competition entries can be trained within an academic compute budget. Taking RAISocketAI, the first DRL agent to win the competition, as the reference and starting from the Gym-µRTS GridNet baseline, the thesis evaluates every design decision in isolation before combining the best ones. The work contributes (i) a reproducible Conference on Games (CoG)-style tournament framework over twelve maps and fifteen reference agents under five ranking metrics; (ii) an extended, modular Java–Python environment stack with composable wrappers and vectorized self-play; (iii) the UECD architecture, fusing multi-scale convolution, entity-level Transformer reasoning, and bottleneck self-attention to cover an RTS network's local, relational, and global demands; (iv) a modular PPO pipeline whose mechanisms are ablated individually; and (v) a formal analysis of a discount-induced reward collapse under shaped-to-sparse annealing.
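To make the idea behind contribution (v) concrete, the following minimal sketch shows one common way to anneal from a shaped reward to a sparse one, and how strongly discounting shrinks a reward received only at the end of a long episode. It is an illustration under stated assumptions, not the thesis's analysis: the linear blend, the function names, the discount factor of 0.99, and the 2000-step horizon are all hypothetical and do not come from the abstract.

```python
def annealed_reward(shaped: float, sparse: float, progress: float) -> float:
    """Blend shaped and sparse rewards (hypothetical linear schedule).

    progress is the fraction of training completed; it is clamped to [0, 1].
    At progress 0 only the shaped reward counts, at progress 1 only the sparse one.
    """
    alpha = min(max(progress, 0.0), 1.0)
    return (1.0 - alpha) * shaped + alpha * sparse


def discounted_terminal_value(gamma: float, horizon: int,
                              terminal_reward: float = 1.0) -> float:
    """Value at step 0 of a reward that arrives only at step `horizon`."""
    return (gamma ** horizon) * terminal_reward


if __name__ == "__main__":
    # Assumed values for illustration only.
    for progress in (0.0, 0.5, 1.0):
        print(progress, annealed_reward(shaped=2.0, sparse=1.0, progress=progress))
    # A win signal 2000 steps away is almost invisible from the first step.
    print(discounted_terminal_value(gamma=0.99, horizon=2000))
```

Under these assumed values, the learning signal seen early in an episode fades as the blend moves toward the sparse reward, because the discounted terminal term is tiny. This is one plausible reading of how such a collapse could arise.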

The resulting agent, UECD-Best, combines these contributions under a two-phase opponent-curriculum fine-tuning schedule. On the basesWorkers16x16A map, it tops a 19-agent round-robin tournament (96.67% win rate, first on four of the five metrics) and wins 65.7% of its head-to-head games against RAISocketAI. Training used 9.47 GPU-days and approximately 350 million steps, below the approximately 23.6 GPU-days and 500 million steps that RAISocketAI reports for its small-map subset. A second agent, UECD-MultiMap, trained across five layouts of three different sizes, spreads its competence evenly with no per-map collapse, showing that the padded environment and a prioritized-level-replay curriculum make cross-layout training feasible, though it does not yet match the single-map specialist's peak.
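As a rough illustration of how a round-robin win rate like the one quoted above can be computed, the sketch below aggregates per-game results into a win rate per agent. The result format, the agent names, and the choice to count a draw as half a win are assumptions made for this example. The abstract does not describe the thesis's five ranking metrics or its handling of draws.

```python
from collections import defaultdict


def win_rates(games):
    """Compute a win rate per agent from round-robin results.

    games: iterable of (agent_a, agent_b, winner), where winner is one of the
    two agent names, or None for a draw (counted here as half a win each,
    which is an assumption of this sketch).
    """
    wins = defaultdict(float)
    played = defaultdict(int)
    for agent_a, agent_b, winner in games:
        played[agent_a] += 1
        played[agent_b] += 1
        if winner is None:
            wins[agent_a] += 0.5
            wins[agent_b] += 0.5
        else:
            wins[winner] += 1.0
    return {agent: wins[agent] / played[agent] for agent in played}


if __name__ == "__main__":
    # Hypothetical three-agent tournament with one game per pairing.
    results = [("A", "B", "A"), ("A", "C", "A"), ("B", "C", None)]
    for agent, rate in sorted(win_rates(results).items()):
        print(f"{agent}: {rate:.2%}")
```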

The open-source pipeline released with this thesis offers a DRL substrate for future generalist agents and hybrid DRL/LLM systems, as the competition shifts toward language-model-based agents.

Keywords

Deep Reinforcement Learning; MicroRTS; Real-Time Strategy Games; Proximal Policy Optimization; Transformer; Game-Theoretic Evaluation; Reward Shaping.

Suggested citation

Delsart, M. (2026). Deep Reinforcement Learning for Competitive Agents in MicroRTS - Architecture, Training, and Tournament Evaluation. Master's thesis, Université catholique de Louvain (UCLouvain).