This document describes the script scripts/analyze-transcripts-mwh.sh, which analyzes human players' transcripts in order to compare rule sets by how easy or difficult they are for the population of human players. The underlying Java class is edu.wisc.game.tools.MwByHuman.
This command-line script accesses the same functionality as the web interface, but also has some additional functionality which allows you to separate the stages of transcript analysis and M-W comparison.
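(The script is actually in /home/vmenkov/w2020/game/scripts on sapir, so you can add that directory to your PATH, or just type the full path.)
There are 3 modes:
(1) Full-cycle analysis: extract the transcript data and carry out the M-W comparison.
(2) Export: the same as (1), but the extracted data are also saved to a CSV file (the -export option).
(3) Import: carry out the M-W comparison on mStar data read from one or more CSV files (the -import option).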
scripts/analyze-transcripts-mwh.sh [extractionOptions] [MWOptions] [otherOptions] data_selector
where data_selector specifies the set of transcripts to extract, and the options specify the mode of processing. To save space, the examples below give the script name without the full path; in practice you will need to include the path, unless you have added the scripts directory to your PATH.
There are several data selection modes, with the options -plan, -pid, -uid, and -nickname. They work exactly the same as for the original Analyze Transcript tool.
These options control how the raw transcript data are converted into (player,experience) entries.
Other options:
These options affect the way the (player,experience) entries are used for the M-W-based ranking of rule sets.
Use this mode if you want to post-process the mStar data.
scripts/analyze-transcripts-mwh.sh [extractionOptions] [MWOptions] [otherOptions] -export outputFileName.csv data_selector
The transcript data will be extracted in the same way as in (1), and the program will carry out the same M-W computations as in (1). In addition, the extracted and processed data will be saved to the specified CSV file.
Therefore, the format of the command line is almost the same as in (1), with the addition of the -export option.
The output file will be in the following format:
#ruleSetName,precedingRules,exp,trialListId,seriesNo,playerId,learned,total_moves,total_errors,mStar,mDagger
ep/1_1_color_4m,,ep/rule_ambiguity/ambiguity4,ambiguity4_4,0,A016079037TD5GXCNYBPH,false,99,47,300.0,300
ep/col_ord_rbyg_1_4,ep/1_1_color_4m,ep/rule_ambiguity/ambiguity4,ambiguity4_4,1,A016079037TD5GXCNYBPH,false,90,49,300.0,NaN
ep/shape_ord_SqCTSt_1_4,ep/1_1_color_4m;ep/col_ord_rbyg_1_4,ep/rule_ambiguity/ambiguity4,ambiguity4_4,2,A016079037TD5GXCNYBPH,false,98,58,300.0,NaN
ep/1_2_color_4m,,ep/rule_ambiguity/ambiguity4,ambiguity4_6,0,A10G8U9316K46H,false,52,16,300.0,NaN
...
...
...
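A file in this format can be read with any CSV library. The following is a minimal Python sketch, not part of the tool itself; the function name read_series is hypothetical. It assumes only what the sample above shows: a header line whose first name carries a leading "#", and one data row per series.

```python
import csv
import io

def read_series(stream):
    """Return one dict per data row, keyed by column name.

    The leading '#' of the header line is stripped so that the first
    column can be looked up as 'ruleSetName'.
    """
    reader = csv.reader(stream)
    header = next(reader)
    header[0] = header[0].lstrip("#")
    # Skip completely empty lines, if any
    return [dict(zip(header, row)) for row in reader if row]

# Two rows copied from the sample output above
sample = io.StringIO(
    "#ruleSetName,precedingRules,exp,trialListId,seriesNo,playerId,"
    "learned,total_moves,total_errors,mStar,mDagger\n"
    "ep/1_1_color_4m,,ep/rule_ambiguity/ambiguity4,ambiguity4_4,0,"
    "A016079037TD5GXCNYBPH,false,99,47,300.0,300\n"
    "ep/col_ord_rbyg_1_4,ep/1_1_color_4m,ep/rule_ambiguity/ambiguity4,"
    "ambiguity4_4,1,A016079037TD5GXCNYBPH,false,90,49,300.0,NaN\n"
)

rows = read_series(sample)
for r in rows:
    print(r["playerId"], r["ruleSetName"], float(r["mStar"]))
```

For a real file, pass open("outputFileName.csv", newline="") instead of the in-memory sample.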
In the output file, each line (after the header line) corresponds to 1 series, i.e. the series of episodes played by one player against one rule set. The meaning of the columns is as follows:
Use this mode if you want to carry out the M-W computations with mStar data that you have computed yourself, using other tools. (Those tools, of course, may consist of a simple Perl or awk script post-processing the CSV file produced in (2).)
scripts/analyze-transcripts-mwh.sh [MWOptions] [otherOptions] -import inputFileName.csv [ -import file2.csv ... ]
Note that while you cannot supply the extraction options in this mode, you can still supply M-W options (namely, -precMode).
The input file can be in exactly the same format as the output file in (2). (Thus, your post-processing script may, for example, simply remove some data rows from the file, e.g. based on the playerId.) However, if you produce the mStar data in a different way, you can choose to omit some columns. The only columns you must keep are ruleSetName and mStar.
All other columns are optional. Specifically, if the precedingRules column is present, its content can be used together with the ruleSetName to qualify the experience (as per the -precMode option). If the columns learned,total_moves,total_errors are present, they will be used for various statistics in the report; if they are absent, the corresponding fields of the report table will be blank or zeros or similarly non-informative.
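As an illustration of such post-processing, here is a hedged Python sketch (an alternative to a Perl or awk one-liner) that drops all rows belonging to given players and, optionally, keeps only selected columns. The function name filter_export and its parameters are hypothetical. The sketch assumes that the input has a playerId column, and it keeps the leading "#" on the first header name, as in the exported file; whether the importer requires that "#" is not stated here.

```python
import csv
import io

def filter_export(src, dst, excluded_players, keep_columns=None):
    """Copy CSV rows from src to dst, skipping rows of excluded players.

    keep_columns, if given, lists the column names to retain, in order.
    Returns the number of data rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    names = [h.lstrip("#") for h in header]
    player_idx = names.index("playerId")
    if keep_columns is None:
        keep = list(range(len(names)))
    else:
        keep = [names.index(c) for c in keep_columns]
    # Re-create the header in the same style as the exported file
    out_header = [names[i] for i in keep]
    out_header[0] = "#" + out_header[0]
    writer.writerow(out_header)
    kept = 0
    for row in reader:
        if not row or row[player_idx] in excluded_players:
            continue
        writer.writerow([row[i] for i in keep])
        kept += 1
    return kept

# In-memory demonstration with two rows from the sample in (2)
src = io.StringIO(
    "#ruleSetName,precedingRules,exp,trialListId,seriesNo,playerId,"
    "learned,total_moves,total_errors,mStar,mDagger\n"
    "ep/1_1_color_4m,,ep/rule_ambiguity/ambiguity4,ambiguity4_4,0,"
    "A016079037TD5GXCNYBPH,false,99,47,300.0,300\n"
    "ep/1_2_color_4m,,ep/rule_ambiguity/ambiguity4,ambiguity4_6,0,"
    "A10G8U9316K46H,false,52,16,300.0,NaN\n"
)
dst = io.StringIO()
n = filter_export(src, dst, {"A016079037TD5GXCNYBPH"},
                  keep_columns=["ruleSetName", "precedingRules", "mStar"])
print(n)
print(dst.getvalue(), end="")
```

The resulting file keeps the two mandatory columns plus precedingRules, so it can still be used with -precMode.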
If you want to compute the M-W matrix based on mDagger instead of mStar (using the -mDagger option), then of course the mDagger column also should be present. In this case the computation will ignore all entries where the value is NaN (or absent).
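If you prepare the mDagger data yourself, it may help to check in advance which rows will actually count. The Python sketch below mimics the rule stated above (skip entries whose mDagger is NaN or absent); it is an illustration of that rule, not the tool's own code, and the function name usable_mdagger is hypothetical.

```python
import math

def usable_mdagger(rows):
    """Return (ruleSetName, mDagger) pairs for rows with a real mDagger value.

    rows is a list of dicts keyed by column name. Rows where mDagger is
    missing, empty, unparsable, or NaN are skipped.
    """
    usable = []
    for row in rows:
        raw = (row.get("mDagger") or "").strip()
        if not raw:
            continue
        try:
            value = float(raw)
        except ValueError:
            continue
        if math.isnan(value):
            continue
        usable.append((row["ruleSetName"], value))
    return usable

demo_rows = [
    {"ruleSetName": "ep/1_1_color_4m", "mStar": "300.0", "mDagger": "300"},
    {"ruleSetName": "ep/col_ord_rbyg_1_4", "mStar": "300.0", "mDagger": "NaN"},
    {"ruleSetName": "ep/1_2_color_4m", "mStar": "300.0"},  # column absent
]
result = usable_mdagger(demo_rows)
print(result)
```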
You can import multiple CSV files; if you do that, the name of each one must be preceded by its own -import option (-import a.csv -import b.csv -import c.csv ...). This is convenient if different CSV files have different numbers of columns, so that you cannot just merge them into a single file with the UNIX cat command.
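If the list of CSV files is produced by another program, the command line can be assembled mechanically. The following Python sketch shows one way to do that; the function name build_import_command is hypothetical, and the script is assumed to be reachable via your PATH under the name used in this document.

```python
import shlex

def build_import_command(csv_files, out_dir=None,
                         script="analyze-transcripts-mwh.sh"):
    """Build the argument list, giving each file its own -import option."""
    args = [script]
    for name in csv_files:
        args += ["-import", name]
    if out_dir is not None:
        args += ["-csvOut", out_dir]
    return args

cmd = build_import_command(["a.csv", "b.csv", "c.csv"], out_dir="tmp")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) to actually run the script.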
1. Full-cycle analysis on the data from several experiment plans:
scripts/analyze-transcripts-mwh.sh ep/rule_ambiguity/ambiguity1 ep/rule_ambiguity/ambiguity2 ep/rule_ambiguity/ambiguity3 ep/rule_ambiguity/ambiguity4
2. Take the input (precomputed mStar) from a.csv. Save the M-W results to files in directory tmp (which will be automatically created if it does not exist).
analyze-transcripts-mwh.sh -import a.csv -csvOut tmp
3. Take the input (precomputed mStar) from a.csv, b.csv and c.csv. Save the M-W results to files in directory tmp (which will be automatically created if it does not exist).
analyze-transcripts-mwh.sh -import a.csv -import b.csv -import c.csv -csvOut tmp
4. Full-cycle analysis on all plans under "ep/". Save the matrices in CSV files in directory tmp/mstar. Additionally, save the output of the first stage (the data for all (P,E) pairs) to tmp-mstar.csv.
analyze-transcripts-mwh.sh -plan 'ep/%' -csvOut tmp/mstar -export tmp-mstar.csv
5. Full-cycle analysis on all plans under "ep/", using mDagger instead of mStar for the M-W computation. Save the matrices in CSV files in directory tmp/mdagger. Additionally, save the output of the first stage (the data for all (P,E) pairs) to tmp-mdagger.csv.
analyze-transcripts-mwh.sh -plan 'ep/%' -csvOut tmp/mdagger -mDagger -export tmp-mdagger.csv