Allow collection of server statistical data #196

albertosottile · 2018-07-25T16:00:08Z

Because of the recent increase in the update rate of Syncplay, we would like to have some data on our currently user base.

This PR introduces a new server argument --stats-db-file [file] that creates a SQLite database in [file] and stores some anonymous information in it.

We only store non-sensitive and non-personal data that are already being transmitted by the clients in the current protocol. Specifically, for every user in the server at the time of stats collection, we store (EDIT):

server's Unix timestamp at the collection time
user's Syncplay client version

Data collection is scheduled to happen every hour. Nothing else is acquired nor stored in the snapshot.

Thanks to SQLite, full concurrency is supported and multiple servers can use the same DB file at the same time. For further safety, a fixed delay is introduced before the first collection, and its amount is based on the server's port number. (EDIT): twisted.enterprise.adbapi is employed to ensure asynchronous consistency.

Uriziel · 2018-07-25T20:55:25Z

syncplay/server.py


    def getVersion(self):
-        return self._connector.getVersion()
+        pattern = r'\A[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}\Z'


You don't need to do that since it's prepared statement anyway you can save anything pretty much (there could be length check at most, but rather that should be connector's job)

I agree this can be removed and also that a check, if any, should be done in _connector. I'll remove this.

Uriziel · 2018-07-25T20:57:18Z

syncplay/server.py

+        if statsDbFile is not None:
+            self.statsDbHandle = self._connectToStatsDb(statsDbFile)
+            logDelay = 5*(int(self.port)%10 + 1)
+            reactor.callLater(logDelay, self._scheduleClientSnapshot, self.statsDbHandle, self.port)


what's the point of scheduling this instead of running the code on user join?

The idea is for it to occur at set intervals from being started to provide a snapshot, and to have a delay to avoid multiple servers all writing at exactly the same time if they are started at the same time.

Uriziel · 2018-07-25T21:00:04Z

syncplay/server.py

+    def _connectToStatsDb(self, dbPath):
+        conn = sqlite3.connect(dbPath)
+        c = conn.cursor()
+        c.execute('create table if not exists clients_snapshots (snapshot_time integer, port integer, version string, room_index integer, play_status integer)')


would be nice if all the DB stuff:

connection

schema creating

queries

were put into a separate class (service)

I'll try. Should I keep this new class in server.py or move it to a new file and module? I would advise for the former since all this is server-related only.

Uriziel · 2018-07-25T21:04:12Z

syncplay/server.py

+            playStatus = room.isPlaying()
+            for watcher in room.getWatchers():
+                content = (snapshotTime, int(portNumber), watcher.getVersion(), idx, playStatus, )
+                c.execute("INSERT INTO clients_snapshots VALUES (?, ?, ?, ?, ?)", content)


What's the point of saving room state?
Room index is non-deterministic as well and can change within multiple executions on the same dictionary (unless python 3.7).
You'll log multiple users multiple times without any real way to distinguish between them.
Also loging the port makes very litte sense and you should have a separate file for every instance you run (https://www.sqlite.org/faq.html#q5 http://howfuckedismydatabase.com/sqlite/)

instead of room-idx just store the room-name hash.

You can write to single SQLite database with multiple processes - you just need to handle SQLITE_BUSY and wait for other inserts to finish.

By default, sqlite3 handles SQLITE_BUSY by waiting for 5 seconds and then retrying. That's why a port-based fixed delay was introduced. Having a single SQLite database makes handling and analyses much easier, in my opinion.

Room index can be used to get a sense of the typical size of a session, which can help inform development/design decisions (e.g. default size of 'list of who is watching what' section of client)

If encountering SQLITE_BUSY has potential delay of 5 seconds, won't that risk stalling entire server process for 5 second during each iteration of this loop?

My understanding was that only sqlite3.commit locked the database, but probably I was wrong. In any case, I'll try to replace this loop of execute with a list and a single sqlite3.executescript at the end of the loop.

Upon further discussion, @Et0h and I decided to get rid of room_index and play_status for now. The only recorded stats will be the current timestamp, the client's version, and the server port, needed to understand server's load while keeping only one database file.

Uriziel · 2018-07-25T21:05:47Z

syncplay/server.py

            watcher.setPlaylistIndex(room.getName(), room.getPlaylistIndex())

+    def _connectToStatsDb(self, dbPath):
+        conn = sqlite3.connect(dbPath)


maybe you should implement non-blocking db connection:
https://howto.lintel.in/asynchronous-db-operations-twisted/
(haven't tested)

We could do that but, quoting the FAQ you linked before, "SQLite allows multiple processes to have the database file open at once". So, I do not think we should rely on twisted for this.

…ated servers

Uriziel · 2018-07-26T22:14:34Z

syncplay/server.py


+class StatsRecorder(object):
+    def __init__(self):
+        self._dbPool = None


Nitpicking: I'd name this _connection

Uriziel · 2018-07-26T22:14:54Z

syncplay/server.py

+        if self._dbPool is not None:
+            self._dbPool.close()
+
+    def startRecorder(self, dbpath, roomManager, delay):


Nitpicking I'd move dbpath, roomManager into __init__

Uriziel · 2018-07-26T22:21:33Z

syncplay/server.py

+            print("--- Error in initializing the stats database. Server Stats not enabled. ---")
+
+    def _initDatabase(self):
+        dbpool = adbapi.ConnectionPool("sqlite3", self._dbPath, check_same_thread=False)


My idea for separate class was to cover "Single Responsibility Principle" which moves all stuff related to sqlite to a separate class, thus making abstraction layer between business code (which is "log this and that data") and app logic (which is "use database to store data").
So something like this:

class Database(NameForChange): __init__(dbPath): pass connect(): pass addVersionLog(version, time): pass _createSchema(): pass

which later on can be extended by adding function like fetchUser(username) etc. that all relates to fetching data from data layer (i.e. sqlite)

Uriziel

Keep up good work

Uriziel · 2018-07-26T22:23:43Z

Reposting my last comment as it got squashed by one of the changes (my bad.)

My idea for separate class was to cover "Single Responsibility Principle" which moves all stuff related to sqlite to a separate class, thus making abstraction layer between business code (which is "log this and that data") and app logic (which is "use database to store data").
So something like this (all the names were chosen at random without a second thought given to them):

class Database(NameForChange):
    __init__(dbPath): pass
    connect(): pass
   addVersionLog(version, time): pass
    _createSchema(): pass

which later on can be extended by adding function like fetchUser(username) etc. that all relates to fetching data from data layer (i.e. sqlite)

albertosottile added 6 commits July 25, 2018 17:34

Enable logging support on server for client versions

8a88596

Server logger: fix bug and sanitize version string

9f52834

Server logger: add roomIndex and playStatus

c32bd11

Server Stats: use parametrized query and change column definitions

14863fb

Server Stats: rename feature

a630860

Server Stats: remove unnecessary type casting before query

bd766b4

Uriziel requested changes Jul 25, 2018

View reviewed changes

albertosottile added 4 commits July 26, 2018 16:06

Server Stats: remove port, room_index, and play_status

d1e0c97

Server Stats: use twisted asynchronous adbapi

6564f22

Server Stats: incapsulate collection code in try-catch

6012a2b

Server Stats: remove sqlite3 dependency and enable stats for non-isol…

c99c6e5

…ated servers

Uriziel reviewed Jul 26, 2018

View reviewed changes

Server Stats: addressing review comments

de4e989

Uriziel reviewed Jul 26, 2018

View reviewed changes

Uriziel approved these changes Jul 26, 2018

View reviewed changes

Server Stats: use a separate class to manage the database

3234d9b

Et0h merged commit b3a7ed8 into Syncplay:master Jul 28, 2018

Allow collection of server statistical data #196

Allow collection of server statistical data #196

Uh oh!

Conversation

albertosottile commented Jul 25, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

albertosottile Jul 25, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

daniel-123 Jul 25, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uriziel Jul 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uriziel left a comment

Choose a reason for hiding this comment

Uh oh!

Uriziel commented Jul 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

albertosottile commented Jul 25, 2018 •

edited

Loading

albertosottile Jul 25, 2018 •

edited

Loading

daniel-123 Jul 25, 2018 •

edited

Loading

Uriziel Jul 26, 2018 •

edited

Loading

Uriziel commented Jul 26, 2018 •

edited

Loading