Processing human players' transcripts for Mann-Whitney comparison

Updated 2024-03-03, for Game Server 6.026

This document describes the script scripts/analyze-transcripts-mwh.sh, which analyzes human players' transcripts in order to compare rule sets by how easy or difficult they are for the population of human players. The underlying Java class is edu.wisc.game.tools.MwByHuman.

This command-line script provides the same functionality as the web interface, plus some additional features that let you separate the two stages of the work: transcript analysis and M-W comparison.

Usage

There are 3 modes:

  1. You can extract the data from the transcripts and the database and use them for the M-W comparison of rule sets right away.
  2. You can perform the extraction and save the data (-export) for later post-processing by your own tools.
  3. You can import (-import) data that you have prepared using your own tools, and use them for the M-W comparison.
These modes are discussed below.

(1) Extracting data from transcripts and using them for the M-W test.

You just need to specify the names of the experiment plans. All transcripts of players assigned to those plans will be used in the analysis; the M-W comparison will include all rule sets (or, more precisely, "experiences") experienced by these players.

(The script is actually in /home/vmenkov/w2020/game/scripts on sapir, so you can add that directory to your PATH or just type the full path.)

scripts/analyze-transcripts-mwh.sh  [extractionOptions] [MWOptions] [otherOptions] data_selector
where data_selector specifies the set of transcripts to extract, and options specify the mode of processing.

To save space, the script name in the examples below is given without the full path, which in reality you will need to indicate (unless you simply add the scripts directory to your PATH, of course).
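For instance, in bash you could add the scripts directory to your PATH for the current session as follows. This is a minimal sketch: the directory is the sapir location given above, and the variable name SCRIPTS_DIR is just an illustrative choice; substitute your own path if your copy of the scripts lives elsewhere.

```bash
# Directory holding analyze-transcripts-mwh.sh (the sapir location stated above;
# SCRIPTS_DIR is an illustrative variable name, adjust the path to your setup).
SCRIPTS_DIR=/home/vmenkov/w2020/game/scripts

# Append it to PATH only if it is not already there, so that the script
# can be invoked as plain "analyze-transcripts-mwh.sh".
case ":$PATH:" in
  *":$SCRIPTS_DIR:"*) ;;                     # already on PATH: nothing to do
  *) export PATH="$PATH:$SCRIPTS_DIR" ;;
esac
```

Putting the same lines into your ~/.bashrc makes the change persist across sessions.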

Data selection

There are several data selection modes, with the options -plan, -pid, -uid, and -nickname. They work exactly the same as for the original Analyze Transcript tool.

Extraction options

These options control how the raw transcript data are converted into (player,experience) entries.

Other options:

M-W options

These options affect the way the (player,experience) entries are used for the M-W-based ranking of rule sets.
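To illustrate what a Mann-Whitney comparison of two rule sets involves, here is a sketch that computes the textbook U statistic from an exported CSV file of the kind described in (2). The helper name mw_u is hypothetical and not part of the tool. It groups rows by ruleSetName alone (ignoring precedingRules and -precMode), and it makes no claim to reproduce the exact statistic, tie handling, normalization, or ranking used by MwByHuman. It assumes a header line starting with "#" and no quoted fields.

```bash
# mw_u FILE RULE_A RULE_B   (hypothetical helper, not part of the tool)
# Textbook Mann-Whitney U for RULE_A versus RULE_B: the number of pairs (a, b)
# with mStar(a) < mStar(b), where a tie counts as 0.5. Rows are grouped by
# ruleSetName only; precedingRules is ignored.
mw_u() {
  awk -F, -v A="$2" -v B="$3" '
    NR == 1 {                                  # header: map column names to indices
      sub(/^#/, "", $1)
      for (i = 1; i <= NF; i++) col[$i] = i
      rc = col["ruleSetName"]; mc = col["mStar"]
      next
    }
    $rc == A { a[++na] = $mc + 0 }             # collect mStar values for each rule set
    $rc == B { b[++nb] = $mc + 0 }
    END {
      u = 0
      for (i = 1; i <= na; i++)                # compare every (a, b) pair
        for (j = 1; j <= nb; j++)
          u += (a[i] < b[j]) ? 1 : (a[i] == b[j]) ? 0.5 : 0
      printf "n_A=%d n_B=%d U_A=%g\n", na, nb, u
    }' "$1"
}
```

For example, mw_u tmp-mstar.csv ep/1_1_color_4m ep/1_2_color_4m prints the group sizes and U_A for those two rule sets. Since U_A + U_B = n_A * n_B, swapping the two names gives the complementary value.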

(2) Extracting data from transcripts, and saving them to file.

Use this mode if you want to post-process the mStar data.

scripts/analyze-transcripts-mwh.sh  [extractionOptions] [MWOptions] [otherOptions] -export outputFileName.csv data_selector
  

The transcript data will be extracted in the same way as in (1), and the program will carry out the same M-W computations as in (1); in addition, the extracted and processed data will be saved to the specified CSV file.

Therefore, the format of the command line is almost the same as in (1), with the addition of the -export option.

The output file will be in the following format:

#ruleSetName,precedingRules,exp,trialListId,seriesNo,playerId,learned,total_moves,total_errors,mStar,mDagger
ep/1_1_color_4m,,ep/rule_ambiguity/ambiguity4,ambiguity4_4,0,A016079037TD5GXCNYBPH,false,99,47,300.0,300
ep/col_ord_rbyg_1_4,ep/1_1_color_4m,ep/rule_ambiguity/ambiguity4,ambiguity4_4,1,A016079037TD5GXCNYBPH,false,90,49,300.0,NaN
ep/shape_ord_SqCTSt_1_4,ep/1_1_color_4m;ep/col_ord_rbyg_1_4,ep/rule_ambiguity/ambiguity4,ambiguity4_4,2,A016079037TD5GXCNYBPH,false,98,58,300.0,NaN
ep/1_2_color_4m,,ep/rule_ambiguity/ambiguity4,ambiguity4_6,0,A10G8U9316K46H,false,52,16,300.0,NaN
   ...  ... ...

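Because the exported file is plain CSV, simple tools are enough for a first look at it. The sketch below (summarize_export is a hypothetical helper name) prints, for each ruleSetName, the number of rows and how many of them have learned=true. It looks columns up by their header names rather than by position, and assumes a header line starting with "#" and no quoted fields.

```bash
# summarize_export FILE   (hypothetical helper)
# For each ruleSetName, prints "ruleSetName,rows,rowsWithLearnedTrue",
# sorted by rule set name.
summarize_export() {
  awk -F, '
    NR == 1 {                                  # header: map column names to indices
      sub(/^#/, "", $1)
      for (i = 1; i <= NF; i++) col[$i] = i
      rc = col["ruleSetName"]; lc = col["learned"]
      next
    }
    {
      n[$rc]++                                 # one row per series
      if ($lc == "true") ok[$rc]++
    }
    END { for (r in n) printf "%s,%d,%d\n", r, n[r], ok[r] }
  ' "$1" | sort
}
```

For example, summarize_export tmp-mstar.csv gives a quick overview of how many series each rule set has in an export.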
In the output file, each line (after the header line) corresponds to 1 series, i.e. the series of episodes played by one player against one rule set. The meaning of the columns is as follows:

(3) M-W computations with imported mStar data

Use this mode if you want to carry out the M-W computations with mStar data that you have computed yourself, using other tools. (Those tools may, of course, be as simple as a Perl or awk script that post-processes the CSV file produced in (2).)

scripts/analyze-transcripts-mwh.sh  [MWOptions] [otherOptions] -import inputFileName.csv   [ -import file2.csv ... ]

Note that while you cannot supply the extraction options in this mode, you can still supply M-W options (namely, -precMode).

The input file can be in exactly the same format as the output file in (2). (Thus, your post-processing script may, for example, simply remove some data rows from the file, e.g. based on the playerId.) However, if you produce the mStar data in a different way, you can choose to omit some columns. The only columns you must keep are ruleSetName and mStar.
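As an example of such post-processing, the following sketch removes all rows belonging to the given players while keeping the header line unchanged. The helper name drop_players is hypothetical; it assumes a header line starting with "#", a playerId column, and no quoted fields.

```bash
# drop_players FILE PLAYER_ID...   (hypothetical helper)
# Copies FILE to stdout, keeping the header line and omitting every data row
# whose playerId is one of the given IDs.
drop_players() {
  local file=$1; shift
  awk -F, -v ids="$*" '
    BEGIN {                                    # build the set of IDs to drop
      n = split(ids, list, " ")
      for (i = 1; i <= n; i++) drop[list[i]] = 1
    }
    NR == 1 {
      h = $0; sub(/^#/, "", h)                 # match names without the leading "#"
      m = split(h, names, ",")
      for (i = 1; i <= m; i++) if (names[i] == "playerId") pc = i
      print; next                              # keep the header unchanged
    }
    !($pc in drop)                             # print rows of all other players
  ' "$file"
}
```

Usage, with a player ID from the sample above: drop_players tmp-mstar.csv A016079037TD5GXCNYBPH > filtered.csv, followed by analyze-transcripts-mwh.sh -import filtered.csv -csvOut tmp.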

All other columns are optional. Specifically, if the precedingRules column is present, its content can be used together with ruleSetName to qualify the experience (as per the -precMode option). If the columns learned, total_moves, and total_errors are present, they will be used for various statistics in the report; if they are absent, the corresponding fields of the report table will be blank, zero, or similarly non-informative.
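If you prefer to import only the columns you actually need, a sketch like the following keeps just the named columns, in the order given, and writes the header back with a leading "#". The helper name select_columns is hypothetical; the sketch assumes the importer accepts the same "#"-prefixed header as the exported file, and that fields are not quoted.

```bash
# select_columns FILE COLUMN...   (hypothetical helper)
# Prints only the named columns of FILE, in the given order; fails with an
# error message if a requested column is missing from the header.
select_columns() {
  local file=$1; shift
  awk -F, -v OFS=, -v want="$*" '
    NR == 1 {                                  # header: map column names to indices
      sub(/^#/, "", $1)
      for (i = 1; i <= NF; i++) col[$i] = i
      n = split(want, names, " ")
      for (k = 1; k <= n; k++)
        if (!(names[k] in col)) { print "no such column: " names[k] > "/dev/stderr"; exit 1 }
    }
    {                                          # every line, header included
      line = ""
      for (k = 1; k <= n; k++) line = line (k > 1 ? OFS : "") $(col[names[k]])
      prefix = (NR == 1) ? "#" : ""            # restore the "#" on the header
      print prefix line
    }
  ' "$file"
}
```

For example, select_columns tmp-mstar.csv ruleSetName precedingRules mStar > minimal.csv produces a file with the two required columns plus precedingRules for use with -precMode.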

If you want to compute the M-W matrix based on mDagger instead of mStar (using the -mDagger option), then the mDagger column must, of course, also be present. In this case, the computation will ignore all entries where the mDagger value is NaN (or absent).
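Before running with -mDagger, it may be useful to see how many entries per rule set will be ignored. The sketch below (count_missing_mdagger is a hypothetical helper name) prints, for each ruleSetName, the number of rows whose mDagger is "NaN" or empty, followed by the total number of rows. It assumes a "#"-prefixed header and no quoted fields.

```bash
# count_missing_mdagger FILE   (hypothetical helper)
# Prints "ruleSetName,missing,total" for each rule set, where "missing" counts
# rows whose mDagger is NaN or empty; sorted by rule set name.
count_missing_mdagger() {
  awk -F, '
    NR == 1 {                                  # header: map column names to indices
      sub(/^#/, "", $1)
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("mDagger" in col)) { print "no mDagger column" > "/dev/stderr"; exit 1 }
      rc = col["ruleSetName"]; dc = col["mDagger"]
      next
    }
    {
      total[$rc]++
      if ($dc == "NaN" || $dc == "") miss[$rc]++   # entries -mDagger would skip
    }
    END { for (r in total) printf "%s,%d,%d\n", r, miss[r], total[r] }
  ' "$1" | sort
}
```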

You can import multiple CSV files; if you do that, the name of each one must be preceded by its own -import option (-import a.csv -import b.csv -import c.csv ...). This is convenient if different CSV files have different numbers of columns, so that you cannot simply merge them into a single file with the UNIX cat command.
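When there are many files, typing -import before each name gets tedious. A minimal bash sketch (import_args is a hypothetical helper name, and data/ a hypothetical directory) that generates the pairs for you:

```bash
# import_args FILE...   (hypothetical helper)
# Prints "-import" followed by the file name, one word per line, for each file.
import_args() {
  local f
  for f in "$@"; do
    printf '%s\n' -import "$f"
  done
}

# Usage (bash 4 or later, for mapfile):
#   mapfile -t args < <(import_args data/*.csv)
#   analyze-transcripts-mwh.sh "${args[@]}" -csvOut tmp
```

Passing the words through a bash array keeps file names that contain spaces intact (names containing newlines are not supported by this sketch).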

Examples

1. Full-cycle analysis on the data from several experiment plans:

scripts/analyze-transcripts-mwh.sh ep/rule_ambiguity/ambiguity1 ep/rule_ambiguity/ambiguity2 ep/rule_ambiguity/ambiguity3 ep/rule_ambiguity/ambiguity4
  

2. Take the input (precomputed mStar) from a.csv. Save the M-W results to files in directory tmp (which will be automatically created if it does not exist).

analyze-transcripts-mwh.sh -import a.csv -csvOut tmp

3. Take the input (precomputed mStar) from a.csv, b.csv and c.csv. Save the M-W results to files in directory tmp (which will be automatically created if it does not exist).

analyze-transcripts-mwh.sh -import a.csv -import b.csv -import c.csv -csvOut tmp

4. Full-cycle analysis on all plans under "ep/". Save the matrices in CSV files in directory tmp/mstar. Additionally, save the output of the first stage (the data for all (P,E) pairs) to tmp-mstar.csv.

analyze-transcripts-mwh.sh -plan 'ep/%' -csvOut tmp/mstar -export tmp-mstar.csv

5. Full-cycle analysis on all plans under "ep/", using mDagger instead of mStar for the M-W computation. Save the matrices in CSV files in directory tmp/mdagger. Additionally, save the output of the first stage (the data for all (P,E) pairs) to tmp-mdagger.csv.

analyze-transcripts-mwh.sh -plan 'ep/%' -csvOut tmp/mdagger -mDagger -export tmp-mdagger.csv

See also