forked from Hanabi/hanabi-league
Maximilian Keßler
4b760e5f84
Whenever a variants gains/loses points now, we ensure that the sum of all variant ratings of that type stays constant by adjusting all other variants in the opposite direction.
354 lines
18 KiB
SQL
354 lines
18 KiB
SQL
/**
|
|
* Some general thoughts first regarding how I think we should design the database:
|
|
*
|
|
* There should be a semantic of what it means for our data to be 'consistent':
|
|
* If it is consistent, our program interacting with the database should be well-behaved, i.e. never crash or lead to weird behaviour.
|
|
* From any consistent state, our program should be able to bring the database up to the most recent state automatically, i.e.
|
|
* - Update games
|
|
* - Calculate ELO
|
|
* - Calculate statistics
|
|
* without us having to fix anything in the database.
|
|
*
|
|
* Conversely, our program should be designed in such a way that every database transaction that we do will leave the database in a consistent state.
|
|
* This ensures that (as long as there is no programming error), we can start/stop/run the program at any time, always leaving the database in a
|
|
* consistent state.
|
|
* In particular, in order to be able to have reasonably small transactions, the following all should be valid transactions:
|
|
* - Adding just the metadata (who, score, when, but not detailed actions or deck) of some game, without processing any of its statistics or elo changes
|
|
* - Adding detailed data of a game (actions, deck)
|
|
* - Calculating any subset of statistics for a game (BDR, moves, loss reasons (like: 'Discard crit', 'Bomb crit', 'Strikeout' or similar))
|
|
* - Adding any set of players to the database
|
|
* - Processing a single game for ELO change
|
|
*
|
|
* Conversely, the database should be designed in such a way that all of the above updates lead to consistent states. In particular, I'd like it to be ok if
|
|
* - Games are stored in the database whose statistics or ELO have not been processed yet
|
|
* - Game metadata is stored in the database, but their actions/deck is not
|
|
* - New players are part of the database, but we did not check for games involving them
|
|
*
|
|
* Also, I think that there should be *no* state-change in any other files except the database when running the program:
|
|
* That is, for a full replica of the server state, the source code and the current database state should be enough.
|
|
* This means no json/csv files or whatever that we both write and read to.
|
|
* (Input config files as part of the repository are ouf course fine, as well as output files like log)
|
|
*
|
|
* One thing I felt like was a bit difficult to handle during season zero is people registering after they played some games.
|
|
* At least I think this caused you quite some effort, especially since it's then not so easy to get their games into the database.
|
|
* I recall that you just stored a single game id which represented the latest one that your program already processed.
|
|
* To account for late sign-ups, I suggest the following:
|
|
* - For each registered league user, store the latest game id that we processed (=downloaded, not necessarily calculated ELO or anything) separately.
|
|
* This ensures that for new users, we will fetch all their games on the first database run, whereas for already-registered ones, we do not.
|
|
* - When checking for new games, we can now send these ids to the hanab.live site (there is an API parameter to request only games with id >= x, I think you did not use it last time)
|
|
*
|
|
* Now there's a slight problem, since calculating ELO depends on the order of games. In case we have late sign-ups, I think it's not so nice if they retroactively
|
|
* affect ELO. In particular, this could lead to ELO changes in players X that did not even play a game with new player N,
|
|
* if N and Y played some game and after that, Y and X did, meaning that Y's elo when playing with X changes retroactively, thus also affecting X.
|
|
*
|
|
* Therefore, I suggest that additionally, each game that we download from hanab.live gets a 'league id', which is just a serial number and represents
|
|
* the ordering of games in which we process them.
|
|
* So in case of late sign-ups we would just process these games a bit later. In particular, in case of no late sign-ups, league ids and game ids from hanab.live
|
|
* should have the same ordering, but this might change when we have late sign-ups.
|
|
* I'd like to store this, because in case of bugs or anything, we would then be able to re-calculate all ELO deterministically with this data.
|
|
* Also, it's nice to be transparent about this, when we do have some game logs on the website or anything, we should order by this league id, since this is the
|
|
* order that we used for calculating the ELO.
|
|
* Let me know if you think this is too complicated, but to me personally, I'd quite like that, since it just means
|
|
* (together with the above mentioned solution of storing latest download ids on a per-user basis) that late sign-ups are handled well automatically.
|
|
*
|
|
*/
|
|
|
|
|
|
/**
|
|
* To store the league participants, I suggest the following:
|
|
* - Represent each participant by internal user id
|
|
* - Store 'display name' (what shows up in the tables etc) for each participant.
|
|
* This will only be relevant for output, not any of the algorithms, so we could easily change this at any time.
|
|
* - Store the account names (from hanab.live) separately for each account, thereby allowing for multiple ones having the same user_id
|
|
* I'm not sure how much we need this, but I feel like it's not too hard to have this extra flexibility in our data structure, so just in case
|
|
* I would support this.
|
|
* To be honest, I'd also like to apply all of the code to some non-league related games where I'd need this :D.
|
|
*/
|
|
|
|
DROP TABLE IF EXISTS users CASCADE;
|
|
CREATE TABLE users (
|
|
id SERIAL PRIMARY KEY,
|
|
player_name TEXT NOT NULL UNIQUE,
|
|
/* Can be null */
|
|
discord_tag TEXT UNIQUE
|
|
);
|
|
|
|
|
|
DROP TABLE IF EXISTS user_accounts CASCADE;
|
|
CREATE TABLE user_accounts (
|
|
/* The username from hanab.live. This will be used a) for associating games to one of our user_ids and b) show up in the player profile. */
|
|
username TEXT PRIMARY KEY,
|
|
/**
|
|
* Hanab.live stores a normalized username for each account to ensure that usernames are case-insensitive and don't have weird unicodes that resemble each other.
|
|
* It uses go-unidecode to generate these from the actual usernames, and I suggest we do the same.
|
|
* The corresponding python package is 'unidecode', so we should just use that here.
|
|
* This will then ensure that no two sign-ups correspond to the same hanab.live account.
|
|
*/
|
|
normalized_username TEXT NOT NULL UNIQUE,
|
|
user_id INTEGER NOT NULL,
|
|
FOREIGN KEY (user_id) REFERENCES users (id) ON DELETE CASCADE
|
|
);
|
|
|
|
/*
|
|
* The latest game id that we already downloaded for a specific user. Note that this can never decrease, since on hanab.live, game ids are only incrementing.
|
|
* In my opinion, it should be a consistent state to *not* have an entry in this table for a user.
|
|
* In that case, when querying, we would of course assume nothing and fetch all the data from hanab.live (from the start of the league)
|
|
* and place an entry here after that.
|
|
* Also I suggest that the latest_game_id actually corresponds to a game that we have stored in the database,
|
|
* this helps with further sanity checks or if we want to know if statistics are up to date.
|
|
*/
|
|
DROP TABLE IF EXISTS downloads;
|
|
CREATE TABLE downloads (
|
|
/** Notice this has to be a hanab.live username, not a user_id */
|
|
normalized_username TEXT PRIMARY KEY REFERENCES user_accounts (normalized_username),
|
|
latest_game_id INTEGER NOT NULL
|
|
);
|
|
|
|
/**
|
|
* For completeness, I suggest we store the hanab.live variants here together with their names.
|
|
* So this will just be a static table that we populate once.
|
|
*/
|
|
DROP TABLE IF EXISTS variants CASCADE;
|
|
CREATE TABLE variants (
|
|
id INTEGER PRIMARY KEY,
|
|
name TEXT NOT NULL,
|
|
num_suits INTEGER NOT NULL,
|
|
clue_starved BOOLEAN NOT NULL,
|
|
rating_type INTEGER NOT NULL
|
|
);
|
|
|
|
/**
|
|
* We should store data on all games locally for two reasons:
|
|
* - Only download detailed data for each game once (The actions taken and the actual deck)
|
|
* - We can evaluate statistics (also complicated ones) whenever we want, independent of the server
|
|
*/
|
|
DROP TABLE IF EXISTS games CASCADE;
|
|
CREATE TABLE games (
|
|
/** Here, we should use the same ids as the game ids from hanab.live */
|
|
id INTEGER PRIMARY KEY,
|
|
num_players SMALLINT NOT NULL,
|
|
/** This should be the same variant id as on hanab.live, even if we don't use all of them. */
|
|
variant_id SMALLINT NOT NULL,
|
|
/** Same here: seed from hanab.live */
|
|
seed TEXT NOT NULL,
|
|
score SMALLINT NOT NULL,
|
|
num_turns SMALLINT NOT NULL,
|
|
/**
|
|
* This is the league id mentioned above that will represent the ordering of games regarding being processed by ELO.
|
|
* Note that this means when fetching new data from hanab.live, we have to fetch *all* of it and insert the games sorted
|
|
* by their ids. It's not technically needed for the ELO system to function, but it's definitely desirable to process them in order, of course.
|
|
*/
|
|
league_id SERIAL NOT NULL UNIQUE,
|
|
FOREIGN KEY (variant_id) REFERENCES variants (id)
|
|
);
|
|
|
|
DROP TABLE IF EXISTS game_participants CASCADE;
|
|
CREATE TABLE game_participants (
|
|
game_id INTEGER NOT NULL,
|
|
/**
|
|
* This is our internal user id for league accounts. In particular, if a league account is associated to multiple hanab.live accounts,
|
|
* we will lose the distinction between them when storing participants in this way (which I think is totally ok).
|
|
* Also, this makes insertion slightly tricky: You always have to look up these ids first before being able to insert a game.
|
|
* There could be one argument that we should store the hanab.live username here: This would enable us to list games (if we were to do such a stats page)
|
|
* with the hanab.live account names instead of the 'display names' that people used for league sign-up.
|
|
* Let me know what you think.
|
|
*/
|
|
user_id INTEGER NOT NULL,
|
|
/**
|
|
* Hanab.live stores this as well, I don't think that we need it currently, but there's no harm done in storing this, since it would enable us to know
|
|
* which player is which if we were to do per-player statistics also inside an individual game.
|
|
* For example, I could imagine stuff like: 'Clues given' or 'Cards played', which might actually vary across players.
|
|
* Unfortunately, when using the 'history-full' api hook, player names are reported alphabetically, so to retrieve
|
|
* the seat order, we have to use the 'export' endpoint of the api, which requires an extra request for each game.
|
|
* This is why we allow for this entry to be null in general, so that we can easily store the games without the seat
|
|
* being known.
|
|
*/
|
|
seat SMALLINT,
|
|
FOREIGN KEY (game_id) REFERENCES games (id),
|
|
FOREIGN KEY (user_id) REFERENCES users (id),
|
|
CONSTRAINT game_participants_unique UNIQUE (game_id, user_id)
|
|
);
|
|
|
|
/**
|
|
* This is just exactly the same table as on hanab.live, since we might want to do statistics like BDR, we have to replicate this.
|
|
* Note that for now, we can just ignore this and code up the rest, but we will need this in the long run if we want to do BDR.
|
|
*/
|
|
DROP TABLE IF EXISTS game_actions CASCADE;
|
|
CREATE TABLE game_actions (
|
|
game_id INTEGER NOT NULL,
|
|
turn SMALLINT NOT NULL,
|
|
|
|
/**
|
|
* - 0 - play
|
|
* - 1 - discard
|
|
* - 2 - color clue
|
|
* - 3 - rank clue
|
|
* - 4 - game over
|
|
*/
|
|
type SMALLINT NOT NULL,
|
|
|
|
/**
|
|
* - If a play or a discard, corresponds to the order of the the card that was played/discarded.
|
|
* - If a clue, corresponds to the index of the player that received the clue.
|
|
* - If a game over, corresponds to the index of the player that caused the game to end or -1 if
|
|
* the game was terminated by the server.
|
|
*/
|
|
target SMALLINT NOT NULL,
|
|
|
|
/**
|
|
* - If a play or discard, then 0 (as NULL). It uses less database space and reduces code
|
|
* complexity to use a value of 0 for NULL than to use a SQL NULL:
|
|
* https://dev.mysql.com/doc/refman/8.0/en/data-size.html
|
|
* - If a color clue, then 0 if red, 1 if yellow, etc.
|
|
* - If a rank clue, then 1 if 1, 2 if 2, etc.
|
|
* - If a game over, then the value corresponds to the "endCondition" values in "constants.go".
|
|
*/
|
|
value SMALLINT NOT NULL,
|
|
|
|
FOREIGN KEY (game_id) REFERENCES games (id) ON DELETE CASCADE,
|
|
PRIMARY KEY (game_id, turn)
|
|
);
|
|
|
|
|
|
/**
|
|
* Hanab.live uses the go random number generator to deterministically compute a deck shuffle out of a seed,
|
|
* where the seed is just used to seed the RNG.
|
|
* Since we can't replicate this, we need to store the seeds explicitly, but of course it's enough to do this on a per-seed basis
|
|
* rather than a per-game basis.
|
|
* Note that for now, we can just not populate this table, since this would only be relevant for statistics anyway.
|
|
* TODO: I think it should be an invariant that whenever we have stored the actions of a game, we ensure that we also have stored the
|
|
* corresponding seed here (but not vice versa).
|
|
* Unfortunately, to my knowledge, it is not possible to encode this well in PostgreSQL itself, so we will need to make sure our program
|
|
* has this invariant.
|
|
*/
|
|
DROP TABLE IF EXISTS seeds CASCADE;
|
|
CREATE TABLE seeds (
|
|
seed TEXT NOT NULL,
|
|
/** The index of the card in the deck of the seed, so 0,1,..., 49 for NoVar */
|
|
card_index SMALLINT NOT NULL,
|
|
suit_index SMALLINT NOT NULL,
|
|
rank SMALLINT NOT NULL,
|
|
PRIMARY KEY (seed, card_index)
|
|
);
|
|
|
|
|
|
/**
|
|
* I think the tables above now store everything that we need to know from hanab.live
|
|
* So the following should collect our statistics and ELO now.
|
|
*
|
|
* To store ELO, I suggest we use an append-only data structure, i.e. we just insert a new row whenever any ELO changes
|
|
* and keep the old ones. This as good for roll-back, easy to maintain, and also useful if we at some point want to model progression somwhere.
|
|
* Also, we need to separately store the initial ratings for the variants and users, in my opinion for two reasons:
|
|
* - Do not rely on some config file in the repository to be available to interpret the data here
|
|
* - Allow for separate base ratings on a per-user basis (this might be useful at some point).
|
|
*/
|
|
|
|
/* TABLES RELATED TO ELO */
|
|
|
|
/**
|
|
* Records the initial ratings of variants
|
|
*/
|
|
DROP TABLE IF EXISTS variant_base_ratings CASCADE;
|
|
CREATE TABLE variant_base_ratings (
|
|
/* Note that a variant already includes the number of suits, so we do not have to add this here. */
|
|
variant_id SMALLINT,
|
|
num_players SMALLINT NOT NULL,
|
|
rating REAL NOT NULL,
|
|
PRIMARY KEY (variant_id, num_players)
|
|
);
|
|
|
|
/**
|
|
* Store a separate row for each elo update to one of the variants
|
|
*/
|
|
DROP TABLE IF EXISTS variant_ratings CASCADE;
|
|
CREATE TABLE variant_ratings (
|
|
/** This should reference the game that triggered the elo update */
|
|
league_id INTEGER NOT NULL REFERENCES games (league_id),
|
|
variant_id SMALLINT NOT NULL REFERENCES variants (id),
|
|
num_players SMALLINT NOT NULL,
|
|
|
|
change REAL NOT NULL,
|
|
value_after REAL NOT NULL,
|
|
|
|
PRIMARY KEY (league_id, variant_id, num_players)
|
|
);
|
|
|
|
DROP TABLE IF EXISTS user_base_ratings CASCADE;
|
|
CREATE TABLE user_base_ratings (
|
|
user_id INTEGER REFERENCES users (id),
|
|
/**
|
|
* Since a user has different ratings now, this should represent which of the ratings we mean.
|
|
* For now, I suggest 0 = NoVar, 1 = ClueStarved.
|
|
* I'd use an integer here to be open for more elos per player in future seasons
|
|
*/
|
|
rating_type SMALLINT NOT NULL,
|
|
|
|
rating REAL NOT NULL,
|
|
|
|
PRIMARY KEY (user_id, rating_type)
|
|
);
|
|
|
|
DROP TABLE IF EXISTS user_ratings CASCADE;
|
|
CREATE TABLE user_ratings (
|
|
/**
|
|
* This should reference the game that triggered the elo update.
|
|
* I would use the league_id here for proper ordering.
|
|
* Also note that this can then be used to identify whether a given league game has already been processed for rating change.
|
|
*/
|
|
league_id INTEGER NOT NULL REFERENCES games (league_id),
|
|
|
|
user_id INTEGER NOT NULL REFERENCES users (id),
|
|
rating_type SMALLINT NOT NULL,
|
|
|
|
change REAL NOT NULL,
|
|
value_after REAL NOT NULL,
|
|
|
|
PRIMARY KEY (league_id, user_id)
|
|
);
|
|
|
|
|
|
/* TABLES RELATED TO STATISTICS */
|
|
|
|
/** This is a rough outline, not really happy with it right now. */
|
|
DROP TABLE IF EXISTS game_statistics CASCADE;
|
|
CREATE TABLE game_statistics (
|
|
game_id INTEGER PRIMARY KEY REFERENCES games (id),
|
|
/** I'd say all of the following can just be null in case we have not evaluated them yet. */
|
|
num_bottom_deck_risks SMALLINT,
|
|
num_crits_lost SMALLINT
|
|
);
|
|
|
|
/**
|
|
* Need a new table here, since a single game might have several outcomes:
|
|
* Think of losing a crit and striking out, for example
|
|
*/
|
|
DROP TABLE IF EXISTS game_outcomes CASCADE;
|
|
CREATE TABLE game_outcomes (
|
|
game_id INTEGER REFERENCES games (id),
|
|
/** This stores the game outcome, corresponding to the values of the statistics.GameOutcome enum */
|
|
outcome SMALLINT,
|
|
CONSTRAINT outcomes_no_duplicates UNIQUE (game_id, outcome)
|
|
);
|
|
|
|
DROP TABLE IF EXISTS user_statistics;
|
|
CREATE TABLE user_statistics (
|
|
user_id INTEGER NOT NULL REFERENCES users (id),
|
|
/** We track stats separately for each variant type */
|
|
variant_type SMALLINT NOT NULL,
|
|
current_streak INTEGER,
|
|
maximum_streak INTEGER,
|
|
maximum_streak_last_game INTEGER REFERENCES games (league_id),
|
|
games_played INTEGER,
|
|
games_won INTEGER,
|
|
games_lost INTEGER GENERATED ALWAYS AS (games_played - games_won) STORED,
|
|
winrate REAL GENERATED ALWAYS AS (CASE WHEN games_played != 0 THEN CAST(games_won AS REAL)/ games_played ELSE NULL END) STORED,
|
|
total_bdr INTEGER,
|
|
/** Number of critical cards that were either discarded or misplayed */
|
|
total_crits_lots INTEGER,
|
|
total_game_moves INTEGER,
|
|
average_bdr REAL GENERATED ALWAYS AS (CASE WHEN games_played != 0 THEN CAST(total_bdr AS REAL) / games_played ELSE NULL END) STORED,
|
|
average_crits_lost REAL GENERATED ALWAYS AS (CASE WHEN games_played != 0 THEN CAST(total_crits_lots AS REAL) / games_played ELSE NULL END) STORED,
|
|
/** TODO: We should probably also evaluate this over normally-finished (or won?) games */
|
|
average_game_moves REAL GENERATED ALWAYS AS (CASE WHEN games_played != 0 THEN CAST(total_game_moves AS REAL) / games_played ELSE NULL END) STORED,
|
|
PRIMARY KEY (user_id, variant_type)
|
|
);
|