When a new player joins an experiment, he is assigned to one of the trial lists associated with this experiment. This document describes the process (known as "balancing") used for this assignment, and explains how an experiment manager can control it.
Initially (summer 2020), the goals of balancing were formulated simply as ensuring that an approximately equal number of players is associated with each trial list. Accordingly, a very simple balancing process was used. The system kept track of the number of players in each trial list of each experiment plan (SQL table PlayerInfo), and, every time a player joined an experiment, he was put into a trial list that, at the moment, had the smallest number of players associated with it. In practice that meant that, for example, for 3 trial lists named TL1, TL2, TL3, the player assignment was cyclical: TL1, TL2, TL3, TL1, TL2, TL3, etc.
In December 2020, Gary and Aria reported that, despite an equal number of players being assigned to each list (e.g. we had 12 to 13 players registered in each of the 4 lists of the experiment plan pilot04), the number of players who produced a game record usable for the experiment's purposes varied significantly between lists (from a low of 4 to a high of 9). A "usable" record was one that resulted from the games of a player who had completed the experiment (and received a completion code) and satisfied certain additional data quality criteria.
Accordingly, a new balancing scheme was implemented, as outlined below.
Similarly to the original balancing scheme, the new balancing scheme makes decisions dynamically, i.e. based on the data available at the time when a player is registered. The following numbers are used for each trial list of the experiment: C, the number of "completers", i.e. players assigned to this trial list who have received a completion code; R, the number of "players in progress", i.e. players assigned to this trial list who registered within the last hour but have not yet received a completion code; and D, the "defect" value for this trial list, read from the optional defect file described below (0 if the file is absent or contains no line for this list).
The balancing process works as follows: every time a new player joins the experiment, the system computes the estimated number of "usable" players in each trial list as C + R - D, and assigns the new player to the trial list for which this estimate is the smallest.
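For illustration, here is a minimal sketch of this selection rule in Java. This is not the server's actual code (which lives in the edu.wisc.game packages); the class, field, and method names below are hypothetical and only illustrate the C + R - D comparison.

import java.util.List;

/** A hypothetical illustration of the balancing rule: assign the new
    player to the trial list with the smallest estimate C + R - D. */
public class BalancerSketch {

    /** The per-trial-list numbers defined above. */
    static class TrialListStats {
        final String name;
        final int completers;  // C: players who have received a completion code
        final int inProgress;  // R: registered within the last hour, no completion code yet
        final int defect;      // D: the value from defect.csv (0 if absent)

        TrialListStats(String name, int completers, int inProgress, int defect) {
            this.name = name;
            this.completers = completers;
            this.inProgress = inProgress;
            this.defect = defect;
        }

        /** The estimated number of "usable" players, C + R - D. */
        int estimatedUsable() {
            return completers + inProgress - defect;
        }
    }

    /** Returns the name of the trial list with the smallest estimate;
        a newly registered player would be assigned to that list. */
    static String pickTrialList(List<TrialListStats> lists) {
        TrialListStats best = null;
        for (TrialListStats t : lists) {
            if (best == null || t.estimatedUsable() < best.estimatedUsable()) best = t;
        }
        return best.name;
    }
}

For instance, with the pilot04 counts shown further below (9, 9, 7 and 4 players per list, all of them completers) and a defect of 2 for clock_specific_shape, the estimates would be 9, 7, 7 and 4, so a newly registered player would be assigned to counterClock_specific_shape.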
Let's analyse this algorithm, disregarding the rare possibility that a "quitter" becomes a "completer" by finishing his series of episodes more than 1 hour after the registration, and assuming that there is no defect file.
In the simplest case -- when players join the system at a rate not exceeding 1 per hour -- this scheme ensures that the numbers of completers in different trial lists never differ by more than one. If up to r >> 1 players per hour join an experiment with n trial lists, then the difference between the numbers of completers in different trial lists won't exceed ca. r/n. (This assumes the worst case, when all of the r/n players assigned during a given hour to list A complete the required series of games, while none of the r/n players assigned during that hour to list B complete theirs.)
The defect file serves as a tool for the experiment manager to tell the system: "Even though a certain number of players formally completed their required series of episodes and received a completion code, we don't want to use them in our analysis, and therefore they should not be counted among 'completers' during the balancing process." For example, suppose that in the experiment plan pilot04 you have the situation described by the following table:
Trial list | Players with a completion code | Players with a completion code whose data are good enough to be included in the analysis | The "defect": players with a completion code, but with unusable data |
---|---|---|---|
clock_broad_shape | 9 | 9 | 0 |
clock_specific_shape | 9 | 7 | 2 |
counterClock_broad_shape | 7 | 7 | 0 |
counterClock_specific_shape | 4 | 4 | 0 |
In this case, a file named defect.csv should be created in this experiment plan's directory (/opt/tomcat/game-data/trial-lists/pilot04), containing the following single line:
clock_specific_shape,2
The file can include lines for other trial lists of this experiment plan as well, but the defect values in them should be zeros, e.g.
clock_broad_shape,0
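Putting it all together, a complete defect.csv for the pilot04 example above could look as follows (only the clock_specific_shape line actually matters; the lines with zero defects are optional):
clock_broad_shape,0
clock_specific_shape,2
counterClock_broad_shape,0
counterClock_specific_shape,0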
The defect file, of course, can be used to manage the player assignment based on other considerations too. For example, if you want 20 extra players (beyond what the standard balancing process would consider appropriate) to be assigned to trial_list_A, you can simply write
trial_list_A,20
Conversely, if you want trial_list_B to have 10 players fewer than the "balanced" numbers would justify (e.g. because you already have 10 records for an identical trial list accumulated in a different experiment plan, and plan to add them to your analysis), you can use a negative defect value and write
trial_list_B,-10
If you are working with the Rule Game server's data, you can probably obtain the numbers you want (how many players have been assigned to each trial list, how many of them have received a completion code, etc.) by looking at CSV files exported from the server and making the appropriate calculations. However, you can also see these numbers by directly entering a SQL query, e.g.
use game;
select p.trialListId, count(*) from PlayerInfo p where p.experimentPlan='pilot04' and (p.completionCode is not null or TIMESTAMPDIFF(minute, p.date, now())<60) group by p.trialListId;
+-----------------------------+----------+
| trialListId                 | count(*) |
+-----------------------------+----------+
| clock_broad_shape           |        9 |
| clock_specific_shape        |        9 |
| counterClock_broad_shape    |        7 |
| counterClock_specific_shape |        4 |
+-----------------------------+----------+
The value reported for each trial list here is C+R, i.e. "completers" + "players in progress". You can vary the time cut-off value (60 min in the sample query above) to see how R would change if you used a longer or shorter value.
The balancer unit of the game server re-reads the experiment's defect file (if it exists) every time a new player registers; any error messages go to the server log, /opt/tomcat/logs/catalina.out. If you want to see how the balancer would assign a new player, if one were to register right now, you can use the auxiliary script scripts/test-balancing.sh that comes with the application server code distribution. The script takes two arguments: the look-back time, in hours (how far back a player's registration may lie for him to still be counted as a "player in progress"), and the name of the experiment plan. For example:
~vmenkov/w2020/game/scripts/test-balancing.sh 1 default
Looking back at hrs=1.0 Plan=default
Dec 14, 2020 8:09:25 PM edu.wisc.game.util.Logging info
INFO: EM created, flushMode=AUTO
Read 2 entries from the defect file
C+R-D for (trial_1)=-1
C+R-D for (trial_3)=1
If a player were to register now, it would be assigned to trialList=trial_1
To get a better idea of how players behave, we carried out some measurements on all players who had registered in the pilot* experiment plans, as of 2020-12-13.
The numbers below were obtained using SQL scripts sql/timing.sql and sql/episode-length.sql.
(A) How long did it take for "completers" to get from registration to the end of their last episode? The 77 players are divided into groups based on this time, rounded up to a multiple of 10 min:
+---------+------------------+
| minutes | Completers count |
+---------+------------------+
|      10 |                1 |
|      20 |               23 |
|      30 |               37 |
|      40 |                7 |
|      50 |                7 |
|      60 |                1 |
|      70 |                1 |
+---------+------------------+
We see that 76 "completers" out of 77 achieved completion within 60 min since registration.
(B) How soon after registration did "quitters" end working?
+---------+----------------+
| minutes | Quitters count |
+---------+----------------+
|      10 |             12 |
|      20 |              7 |
|      30 |              1 |
|      40 |              2 |
|      50 |              1 |
|      60 |              1 |
|      80 |              2 |
+---------+----------------+
Out of 26 people, 24 ended their participation within 60 min since registration.
(C) How closely are a player's episodes spaced? For each episode played by the players in these experiment plans, we measured the time since the player's previous activity: for the player's first episode, the time from registration to the end of that episode; for each subsequent episode, the time from the end of the previous episode to the end of this one.
Rounded up to whole minutes, the times are distributed as follows:
+---------+----------------+
| minutes | Episodes count |
+---------+----------------+
|       1 |            989 |
|       2 |            571 |
|       3 |            139 |
|       4 |             41 |
|       5 |             23 |
|       6 |             16 |
|       7 |              4 |
|       8 |              2 |
|       9 |              1 |
|      10 |              1 |
|      11 |              2 |
|      12 |              3 |
|      13 |              1 |
|      14 |              1 |
|      16 |              1 |
|      19 |              1 |
|      21 |              1 |
|      24 |              1 |
|      32 |              1 |
|      33 |              1 |
|      47 |              1 |
|      51 |              1 |
+---------+----------------+
While there are some people who may have taken close to an hour to complete an episode, 99% of all episodes took less than 10 minutes. This indicates that one can discriminate between "quitters" and "players in progress" somewhat more precisely by looking at the time since the player's last activity, rather than the time since registration.