-
Notifications
You must be signed in to change notification settings - Fork 237
Allow collection of server statistical data #196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
syncplay/server.py
Outdated
|
|
||
| def getVersion(self): | ||
| return self._connector.getVersion() | ||
| pattern = r'\A[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}\Z' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to do that since it's prepared statement anyway you can save anything pretty much (there could be length check at most, but rather that should be connector's job)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this can be removed and also that a check, if any, should be done in _connector. I'll remove this.
syncplay/server.py
Outdated
| if statsDbFile is not None: | ||
| self.statsDbHandle = self._connectToStatsDb(statsDbFile) | ||
| logDelay = 5*(int(self.port)%10 + 1) | ||
| reactor.callLater(logDelay, self._scheduleClientSnapshot, self.statsDbHandle, self.port) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the point of scheduling this instead of running the code on user join?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is for it to occur at set intervals from being started to provide a snapshot, and to have a delay to avoid multiple servers all writing at exactly the same time if they are started at the same time.
syncplay/server.py
Outdated
| def _connectToStatsDb(self, dbPath): | ||
| conn = sqlite3.connect(dbPath) | ||
| c = conn.cursor() | ||
| c.execute('create table if not exists clients_snapshots (snapshot_time integer, port integer, version string, room_index integer, play_status integer)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice if all the DB stuff:
- connection
- schema creating
- queries
were put into a separate class (service)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try. Should I keep this new class in server.py or move it to a new file and module? I would advise for the former since all this is server-related only.
syncplay/server.py
Outdated
| playStatus = room.isPlaying() | ||
| for watcher in room.getWatchers(): | ||
| content = (snapshotTime, int(portNumber), watcher.getVersion(), idx, playStatus, ) | ||
| c.execute("INSERT INTO clients_snapshots VALUES (?, ?, ?, ?, ?)", content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of saving room state?
Room index is non-deterministic as well and can change within multiple executions on the same dictionary (unless python 3.7).
You'll log multiple users multiple times without any real way to distinguish between them.
Also loging the port makes very litte sense and you should have a separate file for every instance you run (https://www.sqlite.org/faq.html#q5 http://howfuckedismydatabase.com/sqlite/)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of room-idx just store the room-name hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can write to single SQLite database with multiple processes - you just need to handle SQLITE_BUSY and wait for other inserts to finish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default, sqlite3 handles SQLITE_BUSY by waiting for 5 seconds and then retrying. That's why a port-based fixed delay was introduced. Having a single SQLite database makes handling and analyses much easier, in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Room index can be used to get a sense of the typical size of a session, which can help inform development/design decisions (e.g. default size of 'list of who is watching what' section of client)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If encountering SQLITE_BUSY has potential delay of 5 seconds, won't that risk stalling entire server process for 5 second during each iteration of this loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding was that only sqlite3.commit locked the database, but probably I was wrong. In any case, I'll try to replace this loop of execute with a list and a single sqlite3.executescript at the end of the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon further discussion, @Et0h and I decided to get rid of room_index and play_status for now. The only recorded stats will be the current timestamp, the client's version, and the server port, needed to understand server's load while keeping only one database file.
syncplay/server.py
Outdated
| watcher.setPlaylistIndex(room.getName(), room.getPlaylistIndex()) | ||
|
|
||
| def _connectToStatsDb(self, dbPath): | ||
| conn = sqlite3.connect(dbPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you should implement non-blocking db connection:
https://howto.lintel.in/asynchronous-db-operations-twisted/
(haven't tested)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do that but, quoting the FAQ you linked before, "SQLite allows multiple processes to have the database file open at once". So, I do not think we should rely on twisted for this.
syncplay/server.py
Outdated
|
|
||
| class StatsRecorder(object): | ||
| def __init__(self): | ||
| self._dbPool = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking: I'd name this _connection
syncplay/server.py
Outdated
| if self._dbPool is not None: | ||
| self._dbPool.close() | ||
|
|
||
| def startRecorder(self, dbpath, roomManager, delay): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking I'd move dbpath, roomManager into __init__
syncplay/server.py
Outdated
| print("--- Error in initializing the stats database. Server Stats not enabled. ---") | ||
|
|
||
| def _initDatabase(self): | ||
| dbpool = adbapi.ConnectionPool("sqlite3", self._dbPath, check_same_thread=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea for separate class was to cover "Single Responsibility Principle" which moves all stuff related to sqlite to a separate class, thus making abstraction layer between business code (which is "log this and that data") and app logic (which is "use database to store data").
So something like this:
class Database(NameForChange):
__init__(dbPath): pass
connect(): pass
addVersionLog(version, time): pass
_createSchema(): pass
which later on can be extended by adding function like fetchUser(username) etc. that all relates to fetching data from data layer (i.e. sqlite)
Uriziel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep up good work
|
Reposting my last comment as it got squashed by one of the changes (my bad.) My idea for separate class was to cover "Single Responsibility Principle" which moves all stuff related to which later on can be extended by adding function like |
Because of the recent increase in the update rate of Syncplay, we would like to have some data on our currently user base.
This PR introduces a new server argument
--stats-db-file [file]that creates a SQLite database in [file] and stores some anonymous information in it.We only store non-sensitive and non-personal data that are already being transmitted by the clients in the current protocol. Specifically, for every user in the server at the time of stats collection, we store (EDIT):
Data collection is scheduled to happen every hour. Nothing else is acquired nor stored in the snapshot.
Thanks to SQLite, full concurrency is supported and multiple servers can use the same DB file at the same time. For further safety, a fixed delay is introduced before the first collection, and its amount is based on the server's port number. (EDIT):
twisted.enterprise.adbapiis employed to ensure asynchronous consistency.