Authors:
Dennis J. N. J. Soemers, Eric Piette, Matthew Stephenson, Cameron Browne

Venue:
IEEE Conference on Games (CoG), 2020

Topics:
reinforcement learning, self-play, expert iteration, Monte Carlo Tree Search, game AI

Links: PDF · arXiv (https://arxiv.org/abs/2006.00283)

Abstract

Expert Iteration (ExIt) is an effective framework for learning game-playing policies from self-play. A policy is trained to mimic the behaviour of a tree search algorithm and is in turn used to guide that search, so that the policy and the search iteratively improve each other.

This paper investigates three approaches for manipulating the distribution of data collected during self-play and the sampling procedures used for training: weighting experience based on episode durations, applying prioritized experience replay, and introducing an exploratory policy to diversify trajectories.

Experimental results across fourteen board games show that these modifications can significantly improve early training performance, with more modest gains overall.
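
To make the first two manipulations concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that weighting by episode duration means giving each sample a weight of 1/T, where T is the length of the episode it came from, so that every episode contributes equal total probability mass. It also assumes the commonly used proportional form of prioritized experience replay, with exponents alpha and beta. All function names, parameter names, and default values here are hypothetical.

```python
import random
from typing import List, Sequence


def duration_weights(episode_lengths: Sequence[int]) -> List[float]:
    """Sampling probability per stored sample, one entry per game state.

    Assumption: a sample from an episode of length T gets weight 1/T,
    so long episodes do not dominate the training batches.
    """
    weights = [1.0 / t for t in episode_lengths for _ in range(t)]
    total = sum(weights)
    return [w / total for w in weights]


def per_probabilities(priorities: Sequence[float], alpha: float = 0.6) -> List[float]:
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha.

    Priorities are assumed to be strictly positive; alpha = 0 gives
    uniform sampling.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs: Sequence[float], beta: float = 0.4) -> List[float]:
    """Importance-sampling corrections w_i = (N * P(i))^(-beta).

    The weights are divided by their maximum so that they only scale
    learning updates downwards.
    """
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    max_w = max(raw)
    return [w / max_w for w in raw]


def sample_batch(probs: Sequence[float], batch_size: int, rng: random.Random) -> List[int]:
    """Draw sample indices (with replacement) according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)
```

Under these assumptions, calling duration_weights([2, 4]) assigns the two samples of the short episode the same total probability as the four samples of the long one, and sample_batch can draw a training batch from either set of probabilities.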

Context

This work extends the Expert Iteration framework, a key paradigm in modern game AI combining reinforcement learning and tree search.

It contributes to understanding how the distribution of experience in self-play affects learning efficiency, a critical question for general game playing and reinforcement learning systems.

The paper is closely connected to the development of learning techniques within the Ludii general game system and broader research on general and human-like AI agents.

Full reference

Soemers, D. J. N. J., Piette, E., Stephenson, M., & Browne, C. (2020). Manipulating the Distributions of Experience used for Self-Play Learning in Expert Iteration. IEEE Conference on Games (CoG).

BibTeX

@inproceedings{soemers2020manipulating,
  author    = {Soemers, Dennis J. N. J. and Piette, Eric and Stephenson, Matthew and Browne, Cameron},
  title     = {Manipulating the Distributions of Experience used for Self-Play Learning in Expert Iteration},
  booktitle = {IEEE Conference on Games (CoG)},
  year      = {2020},
  url       = {https://arxiv.org/abs/2006.00283}
}