Processing human players' transcripts for Mann-Whitney comparison

Updated 2024-03-03, for Game Server 6.026

This document describes the functionality of the script scripts/analyze-transcripts-mwh.sh, which analyzes human players' transcripts in order to compare rule sets with respect to their ease or difficulty for the human player population. The underlying Java class is edu.wisc.game.tools.MwByHuman.

This command-line script accesses the same functionality as the web interface, but also has some additional functionality which allows you to separate the stages of transcript analysis and M-W comparison.

Usage

There are 3 modes:

(1) You can extract the data from the transcripts and the database and use them for the M-W comparison of rule sets right away.
(2) You can perform the extraction and save the data (-export) for later post-processing by your own tools.
(3) You can import (-import) data that you have prepared using your own tools, and use them for the M-W comparison.

These modes are discussed below.

(1) Extracting data from transcripts, and using them for M-W test.

You just need to specify the names of the experiment plans. All transcripts of players assigned to those plans will be used in the analysis; the M-W comparison will include all rule sets (or, more precisely, "experiences") experienced by these players.

(The script is actually in /home/vmenkov/w2020/game/scripts on sapir; so you can add that directory to your PATH, or just type the full path).

scripts/analyze-transcripts-mwh.sh  [extractionOptions] [MWOptions] [otherOptions] data_selector

where data_selector specifies the set of transcripts to extract, and options specify the mode of processing.

To save space, the script name in the examples below is given without the full path, which in reality you will need to indicate (unless you simply add the scripts directory to your PATH, of course).

Data selection

There are several data selection modes, with the options -plan, -pid, -uid, and -nickname. They work exactly the same as for the original Analyze Transcript tool.

Extraction options

These options control how the raw transcript data are converted into (player,experience) entries.

-targetStreak 10 : this is how many consecutive error-free moves the player must make in order to demonstrate successful learning.
-defaultMStar 300 : this is the value the program will assign as m_* (mStar) to the players who have failed to learn (as per the above criterion) the rule set they were playing. You can use the value Infinity to ensure that all non-learners are distinct from all learners; however, any positive value that's larger than the maximum number of moves in any player's series will work just as well. Because M-W is based on comparisons, 300 will work exactly the same way as Infinity as long as no successful player has put more than 300 move attempts into any rule set.
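The point about 300 versus Infinity can be checked with a small sketch. It assumes only that the comparison depends on which of two mStar values is smaller; the sample values and the helper names (count_wins, replace_default) are made up for illustration and are not part of MwByHuman, whose exact statistic and tie handling are not described here.

```python
import math

def count_wins(a, b):
    """Count pairs (x, y), with x from a and y from b, in which x < y."""
    return sum(1 for x in a for y in b if x < y)

def replace_default(values, old, new):
    """Swap the placeholder assigned to non-learners for another value."""
    return [new if v == old else v for v in values]

# Hypothetical mStar samples for two rule sets; 300 marks the non-learners.
rule_a = [12, 45, 300, 7]
rule_b = [30, 300, 300, 110]

# Pairwise "wins" in both directions with defaultMStar = 300 ...
wins_300 = (count_wins(rule_a, rule_b), count_wins(rule_b, rule_a))

# ... and again with the placeholder replaced by Infinity.
inf_a = replace_default(rule_a, 300, math.inf)
inf_b = replace_default(rule_b, 300, math.inf)
wins_inf = (count_wins(inf_a, inf_b), count_wins(inf_b, inf_a))

print(wins_300, wins_inf)
```

Both pairs of counts come out identical, because every learner's value is below 300 and every non-learner ties with every other non-learner under either placeholder.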

Other options:

-file : if this option is used, then instead of listing plan names (or player IDs, etc.) on the command line, you can put them into a file (one name per line), and put that file's name on the command line. This may be handy if you want to look at data from a large number of plans.

M-W options

These options affect the way the (player,experience) entries are used for the M-W-based ranking of rule sets.

-mDagger: if this option is supplied, the M-W computation will be based on the mDagger values, rather than mStar. Unlike the normal (mStar) mode, players who have not learned any of their rule sets (and therefore have mDagger=NaN) will be ignored in the M-W computation.
-precMode [Naive | Every | EveryCond | Ignore] : This controls how different series are assigned to different "experiences" to be compared. This is how the 4 modes work:
- Naive (the default mode): For each rule set, only include "naive" players (those who played this rule set as their first rule set). In other words, if a player played 3 rule sets, R1, R2, and R3, in this order, then only his experience of playing R1 is analyzed.
- Every : Consider each (rule set + preceding sets) combination as a separate experience to be ranked. (That is, the R1 data from R1:R2:R3, R2:R1:R3, and R2:R3:R1 are viewed as belonging to three distinct experiences, "R1", "R2:R1", and "R2:R3:R1".)
- EveryCond : this is similar to the Every mode, but for each preceding rule set the "outcome" (success or failure of the learning in the series) is considered part of the "condition". Thus, in the above example, one would consider distinct experiences "R1", "true.R2:R1", "false.R2:R1", "true.R2:false.R3:R1", etc., where the prefixes "true." and "false." indicate whether the player attained successful learning when playing the preceding rule set such as R2.
- Ignore : When viewing a rule set's series, ignore preceding rule sets. (In other words, the R1 data from R1:R2:R3, R2:R1:R3, and R2:R3:R1 are merged, viewed as the same "R1 experience").
-csvOut directoryName: with this option, the 3 tables produced by the M-W tool (the raw M-W matrix, the M-W ratio matrix, and the ranking table) will be not only printed as human-readable text to the standard output, but also saved into CSV files in the specified directory. If the directory does not presently exist, it will be created. The files will contain the same numbers as you see in the standard output (or would see in the web-based tool, if you were using it); however, they are split into more columns. (E.g. if the human-readable table has "X/Y", the CSV table will have "X,Y").
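The -precMode labelling described above can be sketched as follows. This is an illustration based only on the examples given in this document (labels such as "R2:R3:R1" and "true.R2:false.R3:R1"); the function name experience_label and its arguments are hypothetical, and the actual MwByHuman code may build the labels differently.

```python
def experience_label(rule_set, preceding, outcomes, prec_mode="Naive"):
    """Return the experience label for one series, or None if the series is skipped.

    rule_set  -- name of the rule set played in this series
    preceding -- names of the rule sets the player played earlier, in order
    outcomes  -- for each preceding rule set, True if the player learned it
    """
    if prec_mode == "Naive":
        # Only players with no preceding rule sets are included.
        return rule_set if not preceding else None
    if prec_mode == "Ignore":
        # Preceding rule sets are disregarded; all series are merged.
        return rule_set
    if prec_mode == "Every":
        # The preceding rule sets become part of the label.
        return ":".join(list(preceding) + [rule_set])
    if prec_mode == "EveryCond":
        # Each preceding rule set is prefixed with its outcome.
        tagged = [f"{str(ok).lower()}.{name}" for name, ok in zip(preceding, outcomes)]
        return ":".join(tagged + [rule_set])
    raise ValueError(f"unknown precMode: {prec_mode}")

# A player who learned R2, failed R3, and is now playing R1.
for mode in ("Naive", "Every", "EveryCond", "Ignore"):
    print(mode, experience_label("R1", ["R2", "R3"], [True, False], mode))
```

In the Naive mode this series yields no label at all, since R1 was not the player's first rule set.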

(2) Extracting data from transcripts, and saving them to file.

Use this mode if you want to post-process the mStar data.

scripts/analyze-transcripts-mwh.sh  [extractionOptions] [MWOptions] [otherOptions] -export outputFileName.csv data_selector

The transcript data will be extracted in the same way as in (1), and the program will carry out the same M-W computations as in (1); in addition, the extracted and processed data will be saved to the specified CSV file.

Therefore, the format of the command line is almost the same as in (1), with the addition of the -export option.

The output file will be in the following format:

#ruleSetName,precedingRules,exp,trialListId,seriesNo,playerId,learned,total_moves,total_errors,mStar,mDagger
ep/1_1_color_4m,,ep/rule_ambiguity/ambiguity4,ambiguity4_4,0,A016079037TD5GXCNYBPH,false,99,47,300.0,300
ep/col_ord_rbyg_1_4,ep/1_1_color_4m,ep/rule_ambiguity/ambiguity4,ambiguity4_4,1,A016079037TD5GXCNYBPH,false,90,49,300.0,NaN
ep/shape_ord_SqCTSt_1_4,ep/1_1_color_4m;ep/col_ord_rbyg_1_4,ep/rule_ambiguity/ambiguity4,ambiguity4_4,2,A016079037TD5GXCNYBPH,false,98,58,300.0,NaN
ep/1_2_color_4m,,ep/rule_ambiguity/ambiguity4,ambiguity4_6,0,A10G8U9316K46H,false,52,16,300.0,NaN
   ...  ... ...

In the output file, each line (after the header line) corresponds to 1 series, i.e. the series of episodes played by one player against one rule set. The meaning of the columns is as follows:

ruleSetName - the rule set being played
precedingRules - the semicolon-separated list of rule sets, if any, this player played before encountering this rule set
exp,trialListId,seriesNo - the experiment plan, the trial list within that plan, and the sequential number (0-based) of the rule set in the trial list
playerId - the ID of the player who played this series
learned - "true" or "false" depending on whether the player has "demonstrated learning" in that series (i.e. managed to make the prescribed number [targetStreak] of consecutive error-free moves)
total_moves - the total number of move and pick attempts in all episodes of the series
total_errors - the total number of failed move and pick attempts in the series
mStar - the number of errors (i.e. failed move and pick attempts) the player has made in this series before he "demonstrated learning", or the defaultMStar value (which can be infinity) if the learning was not demonstrated (or if it took more than defaultMStar errors to achieve learning).
mDagger(P,E) = mStar(P,E) - Avg(mStar(P,*)), where the averaging is over all experiences where player P successfully learned the rule. If P learned no rules at all, then mDagger is reported as NaN for all experiences of that player.
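The mDagger formula above can be sketched as follows. This is only an illustration of the formula as stated; the function name add_m_dagger, the in-memory row format, and the sample values are assumptions, and the values in the tool's own export may differ in details not covered here.

```python
import math
from collections import defaultdict

def add_m_dagger(rows):
    """Return copies of rows with an added mDagger value.

    rows -- list of dicts with keys playerId, learned (bool), mStar (float)
    """
    # Collect each player's mStar values over the series that were learned.
    learned_m_star = defaultdict(list)
    for row in rows:
        if row["learned"]:
            learned_m_star[row["playerId"]].append(row["mStar"])

    result = []
    for row in rows:
        values = learned_m_star.get(row["playerId"])
        if values:
            # Subtract the player's average over the learned series.
            m_dagger = row["mStar"] - sum(values) / len(values)
        else:
            # The player learned nothing: mDagger is undefined.
            m_dagger = math.nan
        result.append({**row, "mDagger": m_dagger})
    return result

# Hypothetical data: P1 learned two rule sets, P2 learned none.
sample = [
    {"playerId": "P1", "learned": True, "mStar": 10.0},
    {"playerId": "P1", "learned": True, "mStar": 20.0},
    {"playerId": "P1", "learned": False, "mStar": 300.0},
    {"playerId": "P2", "learned": False, "mStar": 300.0},
]
result = add_m_dagger(sample)
for row in result:
    print(row["playerId"], row["mStar"], row["mDagger"])
```

Here P1's average over learned series is 15, so P1's three series get mDagger values of -5, 5, and 285, while P2's only series gets NaN.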

(3) M-W computations with imported mStar data

Use this mode if you want to carry out the M-W computations with mStar data that you have computed yourself, using other tools. (Those tools, of course, may consist of a simple perl or awk script post-processing the CSV file produced in (2)).

scripts/analyze-transcripts-mwh.sh  [MWOptions] [otherOptions] -import inputFileName.csv   [ -import file2.csv ... ]

Note that while you cannot supply the extraction options in this mode, you can still supply M-W options (such as -precMode).

The input file can be in exactly the same format as the one used for the output file in (2). (Thus, your post-processing script may, for example, simply remove some data rows from the file, e.g. based on the playerId). However, if you produce the mStar data in a different way, you can choose to omit some columns. The only columns you must keep are ruleSetName and mStar.
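As an illustration of the kind of post-processing mentioned above, here is a minimal sketch that removes the rows of selected players from a file produced in (2). The function name drop_players is hypothetical; the sketch assumes that the first line is the header shown in (2), whose first cell starts with "#", and it copies that header through unchanged.

```python
import csv

def drop_players(in_path, out_path, excluded_ids):
    """Copy a CSV file, omitting rows whose playerId is in excluded_ids.

    Returns the number of data rows written.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, lineterminator="\n")

        # Copy the header line as it is.
        header = next(reader)
        writer.writerow(header)

        # Find the playerId column, ignoring the leading '#' of the header.
        names = [name.lstrip("#") for name in header]
        pid_col = names.index("playerId")

        kept = 0
        for row in reader:
            if not row:
                continue  # skip blank lines
            if row[pid_col] not in excluded_ids:
                writer.writerow(row)
                kept += 1
    return kept
```

For example, drop_players("tmp-mstar.csv", "filtered.csv", {"A10G8U9316K46H"}) would write a copy of tmp-mstar.csv without that player's series, and filtered.csv could then be passed to the script with -import.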

All other columns are optional. Specifically, if the precedingRules column is present, its content can be used together with the ruleSetName to qualify the experience (as per the -precMode option). If the columns learned,total_moves,total_errors are present, they will be used for various statistics in the report; if they are absent, the corresponding fields of the report table will be blank or zeros or similarly non-informative.

If you want to compute the M-W matrix based on mDagger instead of mStar (using the -mDagger option), then of course the mDagger column also should be present. In this case the computation will ignore all entries where the value is NaN (or absent).

You can import multiple CSV files; if you do that, the name of each one must be preceded by its own -import option. (-import a.csv -import b.csv -import c.csv ...). This is convenient if different CSV files have different numbers of columns, so that you cannot just merge them together into a single file with the UNIX cat command.

Examples

1. Full-cycle analysis on the data from several experiment plans:

scripts/analyze-transcripts-mwh.sh ep/rule_ambiguity/ambiguity1 ep/rule_ambiguity/ambiguity2 ep/rule_ambiguity/ambiguity3 ep/rule_ambiguity/ambiguity4

2. Take the input (precomputed mStar) from a.csv. Save the M-W results to files in directory tmp (which will be automatically created if it does not exist).

analyze-transcripts-mwh.sh -import a.csv -csvOut tmp

3. Take the input (precomputed mStar) from a.csv, b.csv and c.csv. Save the M-W results to files in directory tmp (which will be automatically created if it does not exist).

analyze-transcripts-mwh.sh -import a.csv -import b.csv -import c.csv      -csvOut tmp

4. Full-cycle analysis on all plans under "ep/". Save the matrices in CSV files in directory tmp/mstar. Additionally, save the output of the first stage (the data for all (P,E) pairs) to tmp-mstar.csv.

analyze-transcripts-mwh.sh -plan 'ep/%' -csvOut tmp/mstar  -export tmp-mstar.csv

5. Full-cycle analysis on all plans under "ep/", using m_dagger instead of m_star for the M-W computation. Save the matrices in CSV files in directory tmp/mdagger. Additionally, save the output of the first stage (the data for all (P,E) pairs) to tmp-mdagger.csv.

analyze-transcripts-mwh.sh -plan 'ep/%' -csvOut tmp/mdagger -mDagger -export tmp-mdagger.csv