Determinism in League of Legends: Introduction
Hi, I’m Rick Hoskinson, an engineer on the Deterministic Disaster Recovery team, and I’m here to talk about how we gave ourselves the power to turn back the hands of time in League of Legends. In this series of blog posts, I hope to give you a glimpse of what that work looked like, juicy technical challenges and all. In this first post, I’ll introduce the problem and how we chose to solve it.
Posts in this series:
Part 1: Deterministic Data Recovery (this article)
Part 4: Fixing Divergences: A Case Study in Finding and Resolving Determinism Failures
Project Chronobreak is an esports feature that allows esports officials to “rewind” a live game to a specific point in time. This functionality was developed to deal with the occasional software bug or event hiccup that might otherwise force a complete remake of a game from the very beginning. We implemented the feature by making the LoL game server deterministic so we could re-play a recorded game and restore the server to the exact state it was in at an earlier time.
The Origins of Server Determinism
It may come as a surprise that determinism in League of Legends was not inspired by Project Chronobreak. Rather, it was inspired by a desire to create fast, repeatable tests driven from a set of recordable inputs. This tool was dubbed Delta Checker, and it required a level of client-server determinism to operate reliably.
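To make the idea of repeatable, input-driven tests concrete, here is a minimal Python sketch of that kind of check. It is not Delta Checker itself: the toy simulation, the per-frame state hash, and the names `run_simulation` and `first_divergence` are all assumptions made for illustration. The sketch runs the same recorded inputs twice and reports the first frame where the two runs disagree.

```python
import hashlib
import random


def run_simulation(seed, inputs, num_frames):
    """Run a toy fixed-step simulation and return one state hash per frame.

    `inputs` maps frame number -> integer command. All randomness comes from
    a generator seeded with `seed`, so the run is repeatable.
    (Hypothetical stand-in for a real game server.)
    """
    rng = random.Random(seed)
    state = 0
    hashes = []
    for frame in range(num_frames):
        command = inputs.get(frame, 0)
        # The new state depends only on the old state, the recorded input,
        # and the seeded random stream.
        state = (state * 31 + command + rng.randrange(1000)) % 1_000_003
        hashes.append(hashlib.sha256(str(state).encode()).hexdigest())
    return hashes


def first_divergence(hashes_a, hashes_b):
    """Return the first frame where two runs disagree, or None if they match."""
    for frame, (a, b) in enumerate(zip(hashes_a, hashes_b)):
        if a != b:
            return frame
    return None


recorded_inputs = {2: 7, 5: 3}
baseline = run_simulation(seed=42, inputs=recorded_inputs, num_frames=10)
replay = run_simulation(seed=42, inputs=recorded_inputs, num_frames=10)
print(first_divergence(baseline, replay))  # None: same inputs, same states
```

Comparing a hash per frame, rather than only the final state, is what makes a failure actionable: it points at the frame where the runs first split.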
From Delta Checker to Chronobreak
Around April of 2016, an interesting opportunity emerged as we socialized the work-in-progress deterministic server technology. At that time, the League of Legends esports Features initiative had been investigating the possibility of recovering from an on-stage disaster should a rare bug force a remake of a live game. Remaking a game of League isn’t a good experience for anyone; it’s like re-playing a game of baseball from the first inning after a rain delay. These remakes are frustrating for esports players and fans, and finding a solution became a high priority for Riot.
Before we considered determinism, the esports team had already explored a variety of options for remakes, including save-state snapshots of game memory, using something like the Practice Tool to re-create the state of the game before the bug, and other even more radical solutions involving virtual machines playing the game several minutes behind the live match. Each of these met with a range of problems, and they all failed to give esports officials the precise tools they needed to recover to an exact time within the game.
Delta Checker’s determinism functionality would lead us to combine engineers from multiple teams into a new team called Deterministic Disaster Recovery, the internal working name for Project Chronobreak. The feature work would require:
Creation of the recording technology, systemic changes, and validation technology to make the server deterministic
Realization of server determinism through a pipelined test-and-fix methodology
Creation of tools usable by esports officials to remake games
We delivered our minimum viable product in December of 2016, just in time for the 2017 Spring Split. In the second week of the LCS, Project Chronobreak was used on stage for the first time, in game two of C9 vs. FlyQuest, when a bug caused a cannon minion to block Altec’s Miss Fortune ultimate.
In May of 2017, we finished up version 2.0, which featured more robust, GUI-based tools and a comprehensive automated test system to ensure that the feature works for years to come.
Anatomy of a Chronobreak
Before officials can even start to use Chronobreak, we first have to configure the esports game servers to record each game. These recordings (which we call Server Network Recordings or SNRs) ensure that every game played on esports game servers has a complete record of the inputs, match settings, and configurations used to play the game.
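As a rough illustration of what such a recording has to carry, the Python sketch below bundles match settings, server configuration, and frame-stamped inputs into one serializable object. The real SNR format is not described here, so every field and name (`Recording`, `InputEvent`, `payload`, the JSON encoding) is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class InputEvent:
    frame: int      # simulation frame on which the input was applied
    player_id: int
    payload: str    # opaque command data (hypothetical encoding)


@dataclass
class Recording:
    """Everything needed to replay a game: settings, config, and inputs."""
    match_settings: dict
    server_config: dict
    inputs: list = field(default_factory=list)

    def record(self, frame, player_id, payload):
        self.inputs.append(InputEvent(frame, player_id, payload))

    def to_json(self):
        # asdict() also converts the nested InputEvent objects to dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        events = [InputEvent(**event) for event in raw["inputs"]]
        return cls(raw["match_settings"], raw["server_config"], events)


rec = Recording({"map": "example_map"}, {"ticks_per_second": 30})
rec.record(frame=12, player_id=3, payload="move:100,200")
restored = Recording.from_json(rec.to_json())
print(restored == rec)  # True: the round trip preserves every field
```

The point of the sketch is completeness: anything that can influence the simulation but is missing from the recording becomes a source of divergence on playback.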
We can then use these recordings to play back the game server to a point in time chosen by esports officials. These playbacks can run very quickly, as we’re not throttling the server to its nominal refresh rate. For example, a recovery to 40 minutes often takes less than 3 minutes to execute.
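The reason playback can outrun the live game is easier to see in code. The Python sketch below is a toy model, not the League server: `ToyServer`, `play_back`, and the 30 ticks-per-second rate are assumptions for illustration. A live server waits on the wall clock between fixed steps, while a recovery run executes the same steps back to back and stops at the requested game time.

```python
import time

TICKS_PER_SECOND = 30  # assumed fixed simulation rate, for illustration only


class ToyServer:
    """Stand-in for a deterministic game server with a fixed time step."""

    def __init__(self):
        self.frame = 0
        self.state = 0

    def step(self, commands):
        # The new state depends only on the old state and this frame's commands.
        for value in commands:
            self.state = (self.state * 31 + value) % 1_000_003
        self.frame += 1


def play_back(inputs_by_frame, target_seconds, throttle=False):
    """Replay recorded inputs on a fresh server up to `target_seconds` of game time."""
    server = ToyServer()
    target_frame = round(target_seconds * TICKS_PER_SECOND)
    while server.frame < target_frame:
        server.step(inputs_by_frame.get(server.frame, []))
        if throttle:
            # A live server waits for wall-clock time; recovery skips this wait.
            time.sleep(1 / TICKS_PER_SECOND)
    return server


recorded = {0: [5], 10: [2, 9], 45: [4]}
recovered = play_back(recorded, target_seconds=1.0)
print(recovered.frame)  # 30
```

Because game time here is counted in frames rather than read from the system clock, the throttled and unthrottled runs reach the same state; only the wall-clock duration differs.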
We “commit” to the Chronobreak by killing the bugged-out server instance using a special command that also disconnects all of the real players and broadcast spectators. Players may then reconnect to the new server process through the League client, just as they would after being dropped from a normal game.
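The complete workflow: we record every esports game as an SNR, play the recording back on a new server process up to the time chosen by officials, commit by killing the bugged-out server and disconnecting everyone, and then have players reconnect to the recovered game.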
We haven’t dug too deeply into the technical details of the project just yet, but stay tuned for the next posts in this series, where I’ll cover how we transformed the League of Legends codebase to be deterministic. We’ll also talk about at least one major game engine system that received an overhaul as a result of this effort.
I look forward to any questions you might have!