Originally, the Rule Game Server analysis tools were designed to be used on the same host (sapir) on which the production Game Server was run. This meant that they were working directly with the data (the MySQL server database, and file store) accumulated by the production server.
In 2024, we migrated the production server to one of the so-called Plesk hosts -- virtual hosts running on the DoIT Shared Hosting hardware. Plesk hosts only give you a "chrooted shell", meaning that you can run only a limited set of shell commands on them. This makes it impractical to run the analysis software on those hosts.
Therefore, we now have to distinguish between the "web server host" (the Plesk host that runs the production Game Server instance and accumulates real data from experiments with a player population) and the "analysis host" (the host under your full control, on which you run the analysis software). This document describes the process whereby you can pull the snapshot of the remote host's (web server host's) data to your local host (the analysis host), so that you can work on it using your analysis tools.
The data accumulated by a Rule Game Server instance consist of two parts: a MySQL database and a set of CSV files saved directly in the server host's file system. The Rule Game Server's master configuration file (normally sitting in /opt/w2020/w2020.conf) specifies the location of these two parts, i.e. the name of the MySQL database (usually game) and the directory for the CSV files (usually /opt/w2020/saved). The analysis tools, when you run them without the -config option, look up the location of the data in the master configuration file on the host where you run them.
The data pull script pulls both pieces of data from the specified remote server and puts them in their new location on your analysis host. The MySQL data go into a new database on your MySQL server; the CSV files go into a new directory created for them under your current directory. The pull script then creates a new config file, in which the location of the newly imported data (the database name and the CSV file directory location) is recorded. You save that file somewhere, and then use it with the -config option of analyze-transcript.sh (and other similar tools) when you want those tools to look at the data in this data set. Naturally, you can also read this config file to find the location of the CSV files, if you'd like to look at them directly.
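As a sketch of what reading such a config file might look like: the property names and paths below (FILES_SAVED, JDBC_DATABASE, and the directory) are made-up examples for illustration, not necessarily the ones the pull script emits; check the actual file generated on your host.

```shell
# Create a mock config file in the assumed "name = value;" style:
conf=$(mktemp)
cat > "$conf" <<'EOF'
JDBC_DATABASE = "game_wwwtest_rulegame_2024_01_30";
FILES_SAVED = "/home/you/pulls/wwwtest.rulegame/saved";
EOF

# Extract the CSV directory, so you can inspect the files directly:
csv_dir=$(sed -n 's/^FILES_SAVED *= *"\(.*\)";/\1/p' "$conf")
echo "$csv_dir"
```

The same sed pattern works for any of the quoted properties; only the key name changes.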
You can use as your analysis host any host that has a full suite of Game Server software deployed on it. This includes the MySQL database server in which the user account named game has been created, and the Game Server master configuration file (/opt/w2020/w2020.conf). This host should also have an up-to-date set of experiment control files in /opt/w2020/game-data. (You should be able to update it with "git pull origin master", when needed. If this does not go smoothly, you may need to change the ownership of this directory, to something like pkantor.tomcat, or what have you.)
The analysis host can be your own desktop or laptop computer (preferably running Linux or macOS), or the host we have at CAE (ie-r108304.ie.wisc.edu). If it's the CAE host, you need to run the CAE VPN to access it.
In the rest of this discussion, we will refer to the analysis host (to which you will copy the data) as the "local host", and to the web server host (from which you will copy the data) as the "remote host".
(This has already been done on the CAE host. If you're setting up some other host as the analysis host, you need to do that yourself.)
The host should have the MySQL database server set up, and the account named "game" created, as per the MySQL setup instructions. Additionally, you need to create a MySQL server user account named replicator with certain special rights (which will allow it to create new databases, and to enable user game to work with those databases).
CREATE USER 'replicator'@'localhost' IDENTIFIED BY 'MySQL-W2020';
GRANT ALL ON *.* TO 'replicator'@'localhost';
GRANT GRANT OPTION ON *.* TO 'replicator'@'localhost';
During the data pull process, several database server logins, of various kinds, will take place, as the replicator tool will need to connect both to the remote and local database servers. The pull script is written in such a way that you won't need to enter any of the relevant database passwords in real time. Instead, you need to create, just once, the file ~/.mylogin.cnf in your home directory. This file will contain, in (sort of) encrypted form, the passwords for certain database accounts. If you are on the CAE host, you can simply copy this file from my home directory:
cd
cp ~vmenkov/.mylogin.cnf .
If you are setting up the analysis host on another machine, you'll need to create that file from scratch, using the script scripts/run-mysql-config-editor.sh that comes with the Game Server. It runs, with appropriate parameters, the tool called mysql_config_editor.
During the execution of the script scripts/run-mysql-config-editor.sh you will be asked for several database passwords. That will include the passwords for the accounts game and replicator on your own MySQL server (you should know them, because you likely created them yourself), as well as the passwords for the accounts named game on the MySQL servers on the two Plesk hosts at UW (you can get them from the project staff who work with these hosts).
Whether you have copied the existing cnf file, or created it from scratch, you can test that it works by trying commands like this:
mysql --login-path=local
mysql --login-path=replicator
mysql --login-path=wwwtest.rulegame

and observing a successful login, without a password, each time.
During the data pull process, the script will log in, more than once, to the remote server. Each time, it (well, the ssh command it uses) will ask you for the password. This is not a big deal; but if you are tired of it, you can obviate the need to provide the password by using ssh-keygen, ssh-copy-id, and ssh-agent. You can find the instructions e.g. at these two pages: https://www.thegeekstuff.com/2008/11/3-steps-to-perform-ssh-login-without-password-using-ssh-keygen-ssh-copy-id/ ; https://superuser.com/questions/988185/how-to-avoid-being-asked-enter-passphrase-for-key-when-im-doing-ssh-operatio
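The gist of those instructions can be sketched as follows; the remote host name and login below are placeholders, and in practice you would keep the key in the default ~/.ssh location rather than a temporary directory.

```shell
# 1. Generate a key pair with no passphrase (here, in a temporary
#    directory, so the sketch does not touch your real ~/.ssh):
keydir=$(mktemp -d)
ssh-keygen -t ed25519 -N "" -f "$keydir/id_ed25519" -q

# 2. One time only, install the public key on the remote host
#    (placeholder login and host -- substitute your own):
#      ssh-copy-id -i "$keydir/id_ed25519.pub" your_login@remote.host.example

# After that, ssh (and hence the pull script) logs in without a password.
ls "$keydir"
```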
The pull script does several big things:
Before running the script, choose the location where it will store the downloaded data file. Let's suppose you want them to go under ~/pulls. In this case, run the pull script in that directory, and it will create a subdirectory in it for the data in the particular snapshot.
cd
cd pulls
~vmenkov/game/scripts/pull-remote-data.sh wwwtest.rulegame

During this process you will most likely be asked to enter, more than once, the UNIX password for the remote host in question.
The script will tell you that it has created a config file, e.g. w2020_game_wwwtest_rulegame_2024_01_30.conf, which describes the location of the downloaded data (the file directory location, and the database name). When running analysis scripts on these data later, make sure to pass the name of the config file to the analysis script, with the -config option.
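Since the generated config names embed the snapshot date, a plain lexicographic sort finds the newest one; this can save you retyping the long file name. A sketch (the file names here are made-up examples following the same pattern):

```shell
# Demo directory with two fake snapshot configs:
demo=$(mktemp -d)
cd "$demo"
touch w2020_game_wwwtest_rulegame_2023_12_15.conf \
      w2020_game_wwwtest_rulegame_2024_01_30.conf

# Date-stamped names sort chronologically, so the last one is the newest:
latest=$(ls w2020_game_*.conf | sort | tail -n 1)
echo "$latest"

# Then pass it to the analysis tools, e.g.:
#   analyze-transcript.sh -config "$latest" ...
```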
Additionally, the script also creates a single ZIP file which contains all parts of the data set (the CSV files plus a SQL file with the content of the database), along with the configuration file. You may choose to store that file on your backup disk (as the complete snapshot of the data set on the particular server as of the current date); it can also be copied to another host if you want to create yet another copy of that snapshot there.
What if you want to pull a snapshot of data from a Plesk host, but you don't have a UW netid, which means that you cannot run the UW VPN? If you at least have a CAE id, you can use a more cumbersome, two-step process to move the data via our CAE host (ie-r108304.ie.wisc.edu). The process is as follows:
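The two steps can be sketched as a dry run below; the CAE login name and the remote directory are assumptions, and the ZIP file name is the example used later in this document. The commands that would touch the remote host are only echoed here, not executed.

```shell
# Example snapshot name (from this document):
zipname="download-2024_02_01_193059.zip"

# Step 1: on the CAE host (reachable over the CAE VPN), run the pull
# script there; it produces the ZIP snapshot described above.

# Step 2: from your own machine, fetch the ZIP and import it
# (placeholder login and directory -- substitute your own):
echo "scp your_cae_login@ie-r108304.ie.wisc.edu:pulls/$zipname ."
echo "import-from-zip.sh $zipname"
```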
import-from-zip.sh download-2024_02_01_193059.zip

That should create a new MySQL database and a config file used to access it. Read the message printed by the script to see the name of the config file.