/**
 * Some general thoughts first regarding how I think we should design the database:
 *
 * There should be a clear notion of what it means for our data to be 'consistent':
 * If it is consistent, our program interacting with the database should be well-behaved, i.e. never crash or lead to weird behaviour.
 * From any consistent state, our program should be able to bring the database up to the most recent state automatically, i.e.
 * - Update games
 * - Calculate ELO
 * - Calculate statistics
 * without us having to fix anything in the database.
 *
 * Conversely, our program should be designed in such a way that every database transaction we do leaves the database in a consistent state.
 * This ensures that (as long as there is no programming error) we can start/stop/run the program at any time, always leaving the database in a
 * consistent state.
 * In particular, in order to be able to have reasonably small transactions, the following should all be valid transactions:
 * - Adding just the metadata (who, score, when, but not detailed actions or deck) of some game, without processing any of its statistics or ELO changes
 * - Adding detailed data of a game (actions, deck)
 * - Calculating any subset of statistics for a game (BDR, moves, loss reasons (like 'Discard crit', 'Bomb crit', 'Strikeout' or similar))
 * - Adding any set of players to the database
 * - Processing a single game for ELO change
 *
 * Conversely, the database should be designed in such a way that all of the above updates lead to consistent states. In particular, I'd like it to be ok if
 * - Games are stored in the database whose statistics or ELO have not been processed yet
 * - Game metadata is stored in the database, but their actions/deck are not
 * - New players are part of the database, but we did not check for games involving them
 *
 * Also, I think that there should be *no* state change in any files other than the database when running the program:
 * That is, for a full replica of the server state, the source code and the current database state should be enough.
 * This means no json/csv files or whatever that we both write to and read from.
 * (Input config files as part of the repository are of course fine, as are output files like logs.)
 *
 * One thing I felt was a bit difficult to handle during season zero is people registering after they already played some games.
 * At least I think this caused you quite some effort, especially since it's then not so easy to get their games into the database.
 * I recall that you just stored a single game id which represented the latest one that your program had already processed.
 * To account for late sign-ups, I suggest the following:
 * - For each registered league user, store the latest game id that we processed (= downloaded, not necessarily calculated ELO or anything) separately.
 *   This ensures that for new users, we will fetch all their games on the first database run, whereas for already-registered ones, we do not.
 * - When checking for new games, we can now send these ids to the hanab.live site (there is an API parameter to request only games with id >= x; I think you did not use it last time).
 *
 * Now there's a slight problem, since calculating ELO depends on the order of games. In case we have late sign-ups, I think it's not so nice if they retroactively
 * affect ELO. In particular, this could lead to ELO changes for a player X who did not even play a game with the new player N:
 * if N and Y played some game and after that, Y and X did, then Y's ELO at the time of playing with X changes retroactively, thus also affecting X.
 *
 * Therefore, I suggest that additionally, each game that we download from hanab.live gets a 'league id', which is just a serial number and represents
 * the order in which we process the games.
 * So in case of late sign-ups we would just process these games a bit later. In particular, in case of no late sign-ups, league ids and game ids from hanab.live
 * should have the same ordering, but this might change when we have late sign-ups.
 * I'd like to store this, because in case of bugs or anything, we would then be able to re-calculate all ELO deterministically from this data.
 * Also, it's nice to be transparent about this: if we do have some game logs on the website or anything, we should order them by this league id, since this is the
 * order that we used for calculating the ELO.
 * Let me know if you think this is too complicated, but personally, I'd quite like that, since it just means
 * (together with the above-mentioned solution of storing latest download ids on a per-user basis) that late sign-ups are handled well automatically.
 */
/**
 * To store the league participants, I suggest the following:
 * - Represent each participant by an internal user id
 * - Store a 'display name' (what shows up in the tables etc.) for each participant.
 *   This will only be relevant for output, not for any of the algorithms, so we could easily change it at any time.
 * - Store the account names (from hanab.live) separately for each account, thereby allowing multiple accounts to share the same user_id.
 *   I'm not sure how much we need this, but I feel like it's not too hard to have this extra flexibility in our data structure, so just in case
 *   I would support this.
 *   To be honest, I'd also like to apply all of the code to some non-league related games where I'd need this :D.
 */
DROP TABLE IF EXISTS users CASCADE;
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    player_name TEXT NOT NULL UNIQUE
);
DROP TABLE IF EXISTS user_accounts CASCADE;
CREATE TABLE user_accounts (
    /* The username from hanab.live. This will be used a) for associating games to one of our user_ids and b) to show up in the player profile. */
    username TEXT PRIMARY KEY,
    /**
     * Hanab.live stores a normalized username for each account to ensure that usernames are case-insensitive and don't contain weird unicode characters that resemble each other.
     * It uses go-unidecode to generate these from the actual usernames, and I suggest we do the same.
     * The corresponding python package is 'unidecode', so we should just use that here.
     * This will then ensure that no two sign-ups correspond to the same hanab.live account.
     */
    normalized_username TEXT NOT NULL UNIQUE,
    user_id INTEGER NOT NULL,
    FOREIGN KEY (user_id) REFERENCES users (id) ON DELETE CASCADE
);
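/**
 * A minimal sketch of how a sign-up could be inserted while relying on the UNIQUE constraint on normalized_username
 * to reject two sign-ups that map to the same hanab.live account. The concrete values are placeholders, and the idea
 * that the application computes the normalization (unidecode plus lowercasing) before inserting is my assumption,
 * not a fixed decision. Kept as a comment so that this file stays pure DDL.
 *
 *   -- normalized_username is computed in application code before this statement runs
 *   INSERT INTO user_accounts (username, normalized_username, user_id)
 *   VALUES ('SomePlayer', 'someplayer', 1)
 *   ON CONFLICT (normalized_username) DO NOTHING;
 *
 * With ON CONFLICT DO NOTHING a duplicate sign-up is silently ignored; dropping the clause would make it a hard error instead.
 */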
/**
 * The latest game id that we already downloaded for a specific user. Note that this can never decrease, since on hanab.live, game ids only ever increase.
 * In my opinion, it should be a consistent state to *not* have an entry in this table for a user.
 * In that case, when querying, we would of course assume nothing, fetch all the data from hanab.live (from the start of the league),
 * and place an entry here after that.
 * Also, I suggest that the latest_game_id actually corresponds to a game that we have stored in the database;
 * this helps with further sanity checks or if we want to know whether statistics are up to date.
 */
DROP TABLE IF EXISTS downloads;
CREATE TABLE downloads (
    /** Note that this has to be a hanab.live username, not a user_id. */
    normalized_username TEXT PRIMARY KEY REFERENCES user_accounts (normalized_username),
    latest_game_id INTEGER NOT NULL
);
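/**
 * A sketch (kept as a comment) of how this per-account bookkeeping could be used when checking for new games:
 * for every registered account, look up the id from which to resume downloading, falling back for accounts that
 * have no downloads entry yet, i.e. late sign-ups whose full history still needs to be fetched. The result would
 * feed the hanab.live API parameter for "only games with id >= x" mentioned at the top; the fallback value of 0
 * as "fetch everything" is my assumption.
 *
 *   SELECT ua.username,
 *          COALESCE(d.latest_game_id + 1, 0) AS fetch_from_game_id
 *   FROM user_accounts ua
 *   LEFT JOIN downloads d ON d.normalized_username = ua.normalized_username;
 */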
/**
 * For completeness, I suggest we store the hanab.live variants here together with their names.
 * So this will just be a static table that we populate once.
 */
DROP TABLE IF EXISTS variants CASCADE;
CREATE TABLE variants (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    num_suits INTEGER NOT NULL,
    clue_starved BOOLEAN NOT NULL
);
/**
 * We should store data on all games locally for two reasons:
 * - We only download the detailed data for each game (the actions taken and the actual deck) once
 * - We can evaluate statistics (also complicated ones) whenever we want, independently of the server
 */
DROP TABLE IF EXISTS games CASCADE;
CREATE TABLE games (
    /** Here, we should use the same ids as the game ids from hanab.live. */
    id INTEGER PRIMARY KEY,
    num_players SMALLINT NOT NULL,
    /** This should be the same variant id as on hanab.live, even if we don't use all of them. */
    variant_id SMALLINT NOT NULL,
    /** Same here: seed from hanab.live. */
    seed TEXT NOT NULL,
    score SMALLINT NOT NULL,
    num_turns SMALLINT NOT NULL,
    /**
     * This is the league id mentioned above that represents the order in which games are processed for ELO.
     * Note that this means that when fetching new data from hanab.live, we have to fetch *all* of it and insert the games sorted
     * by their ids. It's not technically needed for the ELO system to function, but it's definitely desirable to process them in order, of course.
     */
    league_id SERIAL NOT NULL UNIQUE,
    FOREIGN KEY (variant_id) REFERENCES variants (id)
);
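/**
 * Since league_id is a SERIAL, the "insert in hanab.live id order" discipline is something our program has to uphold;
 * Postgres itself will not enforce it. As a sanity check, the following sketch (kept as a comment, and purely my
 * suggestion) lists the spots where the league order deviates from the hanab.live id order, which should only ever
 * happen for games added due to late sign-ups:
 *
 *   SELECT id, league_id
 *   FROM (
 *       SELECT id,
 *              league_id,
 *              LAG(id) OVER (ORDER BY league_id) AS previous_id
 *       FROM games
 *   ) ordered_games
 *   WHERE id < previous_id;
 *
 * An empty result means league ids and game ids agree on the ordering, as described in the design notes at the top.
 */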
DROP TABLE IF EXISTS game_participants CASCADE;
CREATE TABLE game_participants (
    game_id INTEGER NOT NULL,
    /**
     * This is our internal user id for league accounts. In particular, if a league account is associated with multiple hanab.live accounts,
     * we lose the distinction between them when storing participants in this way (which I think is totally ok).
     * Also, this makes insertion slightly tricky: you always have to look up these ids first before being able to insert a game.
     * One could argue that we should store the hanab.live username here instead: this would enable us to list games (if we were to do such a stats page)
     * with the hanab.live account names instead of the 'display names' that people used for league sign-up.
     * Let me know what you think.
     */
    user_id INTEGER NOT NULL,
    /**
     * Hanab.live stores this as well. I don't think that we need it currently, but there's no harm in storing it, since it would enable us to know
     * which player is which if we were to do per-player statistics inside an individual game as well.
     * For example, I could imagine stuff like 'clues given' or 'cards played', which might actually vary across players.
     * Unfortunately, when using the 'history-full' API hook, player names are reported alphabetically, so to retrieve
     * the seat order, we have to use the 'export' endpoint of the API, which requires an extra request for each game.
     * This is why we allow this entry to be null in general, so that we can easily store games without the seats
     * being known.
     */
    seat SMALLINT,
    FOREIGN KEY (game_id) REFERENCES games (id),
    FOREIGN KEY (user_id) REFERENCES users (id),
    CONSTRAINT game_participants_unique UNIQUE (game_id, user_id)
);
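/**
 * Because game_participants stores our internal user ids, inserting a game means first translating the hanab.live
 * usernames reported by the API into user ids. A minimal sketch of that lookup step (kept as a comment; the game id
 * and the two usernames are placeholders, and in the real code they would be parameters of a prepared statement):
 *
 *   INSERT INTO game_participants (game_id, user_id)
 *   SELECT 123456, ua.user_id
 *   FROM user_accounts ua
 *   WHERE ua.normalized_username IN ('alice', 'bob')
 *   ON CONFLICT ON CONSTRAINT game_participants_unique DO NOTHING;
 *
 * The ON CONFLICT clause makes re-running the download step idempotent, which fits the idea that any transaction may
 * be repeated without breaking consistency; leaving it out is also fine if we guarantee each game is inserted once.
 */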
/**
 * This is exactly the same table as on hanab.live; since we might want to do statistics like BDR, we have to replicate it.
 * Note that for now, we can just ignore this and code up the rest, but we will need it in the long run if we want to do BDR.
 */
DROP TABLE IF EXISTS game_actions CASCADE;
CREATE TABLE game_actions (
    game_id INTEGER NOT NULL,
    turn SMALLINT NOT NULL,
    /**
     * - 0 - play
     * - 1 - discard
     * - 2 - color clue
     * - 3 - rank clue
     * - 4 - game over
     */
    type SMALLINT NOT NULL,
    /**
     * - If a play or a discard, corresponds to the order of the card that was played/discarded.
     * - If a clue, corresponds to the index of the player that received the clue.
     * - If a game over, corresponds to the index of the player that caused the game to end or -1 if
     *   the game was terminated by the server.
     */
    target SMALLINT NOT NULL,
    /**
     * - If a play or discard, then 0 (as NULL). It uses less database space and reduces code
     *   complexity to use a value of 0 for NULL than to use a SQL NULL:
     *   https://dev.mysql.com/doc/refman/8.0/en/data-size.html
     * - If a color clue, then 0 if red, 1 if yellow, etc.
     * - If a rank clue, then 1 if 1, 2 if 2, etc.
     * - If a game over, then the value corresponds to the "endCondition" values in "constants.go".
     */
    value SMALLINT NOT NULL,
    FOREIGN KEY (game_id) REFERENCES games (id) ON DELETE CASCADE,
    PRIMARY KEY (game_id, turn)
);
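/**
 * Once actions are stored, simple per-game statistics reduce to aggregations over this table. As an illustrative
 * sketch (kept as a comment), counting discards and clues per game using the type encoding documented above:
 *
 *   SELECT game_id,
 *          COUNT(*) FILTER (WHERE type = 1)       AS discards,
 *          COUNT(*) FILTER (WHERE type IN (2, 3)) AS clues_given
 *   FROM game_actions
 *   GROUP BY game_id;
 *
 * More involved statistics like BDR additionally need the deck, i.e. the seeds table below.
 */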
/**
 * Hanab.live uses the Go random number generator to deterministically compute a deck shuffle from a seed,
 * where the seed is just used to seed the RNG.
 * Since we can't replicate this, we need to store the decks explicitly, but of course it's enough to do this on a per-seed basis
 * rather than a per-game basis.
 * Note that for now, we can just not populate this table, since it would only be relevant for statistics anyway.
 * TODO: I think it should be an invariant that whenever we have stored the actions of a game, we ensure that we have also stored the
 * corresponding seed here (but not vice versa).
 * Unfortunately, to my knowledge, it is not possible to encode this well in PostgreSQL itself, so we will need to make sure our program
 * maintains this invariant.
 */
DROP TABLE IF EXISTS seeds CASCADE;
CREATE TABLE seeds (
    seed TEXT NOT NULL,
    /** The index of the card in the deck of the seed, so 0, 1, ..., 49 for NoVar. */
    card_index SMALLINT NOT NULL,
    suit_index SMALLINT NOT NULL,
    rank SMALLINT NOT NULL,
    PRIMARY KEY (seed, card_index)
);
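/**
 * The actions-imply-seed invariant from the TODO above cannot easily be expressed as a constraint, but it is cheap to
 * check from the outside. A sketch of such a check (kept as a comment, and only a suggestion): list games that already
 * have actions stored but whose seed is missing from the seeds table.
 *
 *   SELECT DISTINCT g.id, g.seed
 *   FROM games g
 *   JOIN game_actions a ON a.game_id = g.id
 *   WHERE NOT EXISTS (
 *       SELECT 1 FROM seeds s WHERE s.seed = g.seed
 *   );
 *
 * An empty result means the invariant currently holds.
 */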
/**
 * I think the tables above now store everything that we need to know from hanab.live,
 * so the following should collect our statistics and ELO.
 *
 * To store ELO, I suggest we use an append-only data structure, i.e. we just insert a new row whenever any ELO changes
 * and keep the old ones. This is good for roll-back, easy to maintain, and also useful if we want to model progression somewhere at some point.
 * Also, we need to store the initial ratings for the variants and users separately, in my opinion for two reasons:
 * - We do not want to rely on some config file in the repository being available to interpret the data here
 * - It allows for separate base ratings on a per-user basis (this might be useful at some point).
 */
/* TABLES RELATED TO ELO */
/**
 * Records the initial ratings of variants.
 */
DROP TABLE IF EXISTS variant_base_ratings CASCADE;
CREATE TABLE variant_base_ratings (
    /* Note that a variant already includes the number of suits, so we do not have to add this here. */
    variant_id SMALLINT,
    player_count SMALLINT NOT NULL,
    rating REAL NOT NULL,
    PRIMARY KEY (variant_id, player_count)
);
/**
 * Store a separate row for each ELO update to one of the variant ratings.
 */
DROP TABLE IF EXISTS variant_ratings CASCADE;
CREATE TABLE variant_ratings (
    /** This should reference the game that triggered the ELO update. */
    league_id INTEGER PRIMARY KEY REFERENCES games (league_id),
    variant_id SMALLINT NOT NULL,
    player_count SMALLINT NOT NULL,
    change REAL NOT NULL,
    value_after REAL NOT NULL,
    FOREIGN KEY (variant_id) REFERENCES variants (id)
);
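/**
 * With the append-only layout, the current rating of a (variant, player count) pair is simply the value_after of its
 * most recent update, falling back to the base rating if no game has been processed yet. A sketch of that lookup
 * (kept as a comment; purely illustrative):
 *
 *   SELECT b.variant_id,
 *          b.player_count,
 *          COALESCE(
 *              (SELECT r.value_after
 *               FROM variant_ratings r
 *               WHERE r.variant_id = b.variant_id AND r.player_count = b.player_count
 *               ORDER BY r.league_id DESC
 *               LIMIT 1),
 *              b.rating
 *          ) AS current_rating
 *   FROM variant_base_ratings b;
 */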
DROP TABLE IF EXISTS user_base_ratings CASCADE;
CREATE TABLE user_base_ratings (
    user_id SMALLINT,
    /**
     * Since a user now has several ratings, this represents which of them we mean.
     * For now, I suggest 0 = NoVar, 1 = ClueStarved.
     * I'd use an integer here to be open to more ELOs per player in future seasons.
     */
    type SMALLINT NOT NULL,
    rating REAL NOT NULL,
    PRIMARY KEY (user_id, type),
    FOREIGN KEY (user_id) REFERENCES users (id)
);
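/**
 * One way to seed this table, sketched here as a comment: give every registered user the same starting rating for
 * both rating types. The value 1500 is only a placeholder for whatever base rating we agree on, and the (0, 1) type
 * list mirrors the NoVar/ClueStarved suggestion above.
 *
 *   INSERT INTO user_base_ratings (user_id, type, rating)
 *   SELECT u.id, t.type, 1500.0
 *   FROM users u
 *   CROSS JOIN (VALUES (0), (1)) AS t (type)
 *   ON CONFLICT (user_id, type) DO NOTHING;
 *
 * The ON CONFLICT clause keeps any per-user base rating that was already set by hand.
 */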
DROP TABLE IF EXISTS user_ratings CASCADE;
CREATE TABLE user_ratings (
    /**
     * This should reference the game that triggered the ELO update.
     * I would use the league_id here for proper ordering.
     * Also note that this can then be used to identify whether a given league game has already been processed for a rating change.
     */
    league_id INTEGER REFERENCES games (league_id),
    user_id SMALLINT NOT NULL,
    type SMALLINT NOT NULL,
    /**
     * Do we want to store this here as well? It would be nice to display it on some ELO page imo.
     * Note: We don't need to store the result (I guess), since we can easily retrieve that info by looking up the game using the league_id.
     * TODO: Since I'm not even sure about the rating model yet (I could imagine something slightly different than a team rating),
     * I'll leave this here as potentially null for now and not implement it.
     */
    team_rating REAL,
    change REAL NOT NULL,
    value_after REAL NOT NULL,
    FOREIGN KEY (user_id) REFERENCES users (id),
    CONSTRAINT user_change_per_game_unique UNIQUE (league_id, user_id)
);
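/**
 * As noted above, the presence of user_ratings rows tells us which league games have already been processed for
 * rating changes. A sketch of the corresponding query (kept as a comment), yielding the games still to be processed,
 * in league order:
 *
 *   SELECT g.league_id, g.id
 *   FROM games g
 *   WHERE NOT EXISTS (
 *       SELECT 1 FROM user_ratings ur WHERE ur.league_id = g.league_id
 *   )
 *   ORDER BY g.league_id;
 *
 * This fits the consistency idea from the top of the file: games may sit in the database with their ELO not yet
 * processed, and the program can always catch up from that state.
 */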
/* TABLES RELATED TO STATISTICS */
/** This is a rough outline, not really happy with it right now. */
DROP TABLE IF EXISTS statistics CASCADE;
CREATE TABLE statistics (
    game_id INTEGER PRIMARY KEY,
    /** I'd say all of the following can just be null in case we have not evaluated them yet. */
    bdr SMALLINT,
    moves SMALLINT,
    strikeout BOOLEAN
);
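/**
 * Leaving the columns nullable means "not evaluated yet" is representable, again matching the consistency notes at
 * the top of the file. A sketch (kept as a comment) of how the program could find work to do, e.g. games whose BDR
 * still needs to be calculated, including games that have no statistics row at all:
 *
 *   SELECT g.id
 *   FROM games g
 *   LEFT JOIN statistics s ON s.game_id = g.id
 *   WHERE s.game_id IS NULL OR s.bdr IS NULL;
 */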
/**
 * TODO: I'm really unsure right now how to store user-related statistics, stuff like
 * - Current streak
 * - Average BDRs/moves/whatever
 * - Cumulative stuff like 'number of wins/losses', 'number of strikeouts', etc.
 * Computationally it would be just fine to re-evaluate them whenever needed, since it's only looking stuff up in the database plus some linear-time calculation.
 * On the other hand, it would be sort of nice to have them here as well.
 * In the latter case, I'd probably suggest having just one row for each user that we automatically update whenever we process a game of that user,
 * together with an indication of how recent that entry is (so that on query, we always know whether it's up to date).
 */
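/**
 * For reference, re-evaluating a cumulative per-user statistic on the fly really is just a lookup plus aggregation.
 * A sketch of counting games played and wins per user (kept as a comment; the assumption that a win means reaching
 * the maximum score of 5 * num_suits is mine and would need to match however we define wins for the league):
 *
 *   SELECT gp.user_id,
 *          COUNT(*)                                          AS games_played,
 *          COUNT(*) FILTER (WHERE g.score = 5 * v.num_suits) AS wins
 *   FROM game_participants gp
 *   JOIN games g ON g.id = gp.game_id
 *   JOIN variants v ON v.id = g.variant_id
 *   GROUP BY gp.user_id;
 *
 * If queries like this turn out to be fast enough, we might not need a user statistics table at all.
 */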