3
\$\begingroup\$

The following code purports to implement a "shared lock" to protect some resource and that allows multiple "reader" threads to acquire concurrent access to that resource while providing a single "writer" thread exclusive access. In this implementation I have attempted to be "fair", i.e. I want to ensure that both readers and writers are ultimately able to acquire the lock (on the assumption that a lock that is acquired is ultimately released).

  • My main concern is its correctness.

  • Are there any scenarios where the code could indefinitely block a thread?

  • Could the code be implemented more simply or more efficiently?

The Code:

"""
This module implements a "shared lock", implemented with a threading.Condition
instance and its underlying threading.Lock instance.

We allow multiple threads that are just "reading" a resource to
concurrently acquire a "non-exclusive lock" on that
resource while preventing a "writer" thread from acquiring an "exclusive lock"
as long as any thread has acquired the lock. Likewise, when a thread has
acquired exclusive access, all other threads are blocked from acquiring the
lock until the exclusive access is released.
"""

from contextlib import contextmanager
from threading import Condition, Thread, current_thread
from collections import deque

class SharedLock:
    """
    Usage:
        shared_lock = SharedLock()
        ...
        with shared_lock(False):  # request shared access
            ...

        with shared_lock(True):   # request exclusive access
            ...

    The code tries to be "fair" to ensure that all threads can ultimately acquire
    the lock by maintaining a queue of waiting threads. Consider this scenario:
    1. Reader thread 1 requests and acquires shared access at time t=0.
    2. Reader thread 2 requests and acquires shared access at time t=1.
    3. Writer thread 3 requests exclusive access at time t=2 but is now blocked
       because the lock is already held.
    4. Reader thread 4 requests shared access at time t=3. Theoretically, we can
       give the thread shared access since only other "reader" threads have access.
       But if we do so, it is possible that there is always at least one reader
       thread holding a shared lock and the writer thread never gets access.

    Instead, we place writer thread 3 in a queue of threads waiting to acquire
    access and reader thread 4 is likewise enqueued behind thread 3. Thus, when
    reader threads 1 and thread 2 ultimately release its shared lock, then
    the thread at the head of thw waiting queue, i.e. writer thread 3, will be
    given exclusive access.
    """

    def __init__(self):
        self._cond = Condition()
        self._readers = 0
        self._writer_running = False
        self._waiting = deque()

    @contextmanager
    def __call__(self, exclusive: bool=False):
        if not exclusive:  # Shared access
            with self._cond:
                if self._writer_running or self._waiting:
                    this_thread = current_thread()
                    # Add us to the end of the waiting queue:
                    self._waiting.append(this_thread)
                    # And wait until there is no writer running and
                    # we are next to run:
                    self._cond.wait_for(lambda:
                        not self._writer_running and
                        self._waiting[0] == this_thread
                    )
                    self._waiting.popleft()
                self._readers += 1
                self._cond.notify_all()  # Other readers may be waiting

            try:
                yield self
            finally:
                with self._cond:
                    self._readers -= 1
                    if not self._readers:
                        self._cond.notify_all()
        else:  # Exclusive access
            with self._cond:
                if self._readers or self._writer_running or self._waiting:
                    this_thread = current_thread()
                    self._waiting.append(this_thread)
                    self._cond.wait_for(lambda:
                        not self._readers and
                        not self._writer_running and
                        self._waiting[0] == this_thread
                    )
                    self._waiting.popleft()
                self._writer_running = True

            try:
                yield self
            finally:
                with self._cond:
                    self._writer_running = False
                    self._cond.notify_all()

if __name__ == '__main__':
    import time

    shared_lock = SharedLock()

    def reader():
        with shared_lock(False):
            print('start reading', time.time())
            time.sleep(1)
            print('end reading', time.time())

    def writer():
        with shared_lock(True):
            print('start writing', time.time())
            time.sleep(1)
            print('end writing', time.time())

    def test():
        reader_threads1 = [
            Thread(target=reader) for _ in range(4)
        ]

        writer_threads = [
            Thread(target=writer) for _ in range(2)
        ]

        reader_threads2 = [
            Thread(target=reader) for _ in range(2)
        ]

        for t in reader_threads1:
            t.start()

        time.sleep(.01) # Give readers a chance to start

        for t in writer_threads:
            t.start()

        time.sleep(.01) # Give writers a chance to start

        for t in reader_threads2:
            t.start()

        for t in reader_threads1:
            t.join()
        for t in writer_threads:
            t.join()
        for t in reader_threads2:
            t.join()
    test()

Prints:

start reading 1767303995.195186
start reading 1767303995.195186
start reading 1767303995.195186
start reading 1767303995.195186
end reading 1767303996.1962683
end reading 1767303996.1963234
end reading 1767303996.1963234
end reading 1767303996.1963234
start writing 1767303996.197262
end writing 1767303997.1977108
start writing 1767303997.1977108
end writing 1767303998.1983483
start reading 1767303998.1983912
start reading 1767303998.1983912
end reading 1767303999.199174
end reading 1767303999.19917
\$\endgroup\$
5
  • \$\begingroup\$ Interestingly this is a classical problem. The readers-writers problem. It would be relevant to compare your solution with the solution described in Wikipedia with semaphores for the third problem. \$\endgroup\$ Commented Jan 2 at 7:36
  • \$\begingroup\$ As this is a classical problem, regardless of the actual implementation there are some issues to think about. If threads may require two or more resources at the same time you may end up in a deadlock. If thread priority is used (not a standard function in Python but can be used in some OS-s), priority inversion may as well lead to deadlocks. \$\endgroup\$ Commented Jan 3 at 9:08
  • \$\begingroup\$ It does not impact your request for code review, but you do know that there are several fairly popular read / write locks already available at PyPI, right? \$\endgroup\$ Commented 2 days ago
  • \$\begingroup\$ @JohnBollinger No, I did not know and thanks for the info. This post came about after someone had posted a question on stackoverflow.com looking for such a solution and so I came up with this. Had I known I would have just pointed them to PyPi. So which modules should I look at? BTW, I made this post to be sure I had given a reasonably good answer and I have since updated the answer I had originally given based on the comments here. \$\endgroup\$ Commented yesterday
  • \$\begingroup\$ Some mature packages providing read/write locks (a.k.a. reader / writer locks) include readerwriterlock; fasteners, or, for asyncio, aiorwlock. More can be found via PyPi's search feature. \$\endgroup\$ Commented yesterday

3 Answers 3

6
\$\begingroup\$

docstring

The docstring narrative is lovely, and I thank you for it. It helps an application author understand the before & after of calling into your methods. But it could be tighter and more precise.

In the OP title and in the docstring we find too many "quoted" words. It is pretty standard to talk about Reader and Writer threads, with no need for quoting such terms. Similarly the notion of fairness has been well explored in the literature, without need of quoting. It is so mainstream that I don't even feel there's a need to cite a particular author or reference.

The code tries to be "fair" to ensure that all threads can ultimately acquire the lock by maintaining a queue of waiting threads.

I was hoping for greater rigor. The usual terminology around such promises would be that the implementation is free of starvation, or proof there is no deadlock nor livelock.

Talking about the 4 threads is fine, and helpful. But if that's all we have, it feels more like anecdotal evidence rather than proof of some invariant. Remember, the audience for such a docstring is some hapless application author who is trying to figure out the contract: what must I do beforehand? what can I rely on after the call?

count vs. objects

        self._readers = 0

This feels off. It is not a container of the various readers. Maybe _num_readers?

bool parameter in the Public API

    def __call__(self, exclusive: bool = False):

This is nice enough. But in java and in other languages, the collective wisdom around such parameters is that boolean args are ambiguous or are hard to read at the call site. It is easy to lose sight of what they represent. Consider using this signature instead:

    def __call__(self, *, exclusive: bool = False):

That would force callers to mention exclusive= in each call, which has documentation value for maintainers in 2027 and subsequent years.

too long

My screen isn't big enough to show the __call__() definition without vertical scrolling. Now, maybe your screen resolution and font size lets you show a great many angels dancing on the head of a pin. But on my laptop, the OP source code suggests we should have a def _exclusive() helper, and a def _non_exclusive() helper.

Also, it turns out that self._cond is not a helpful identifer. Tell us what it means, rather than what it is.

In particular, I found this hard to read:

        if not self._readers:
            self._cond.notify_all()

Unconditionally sending a notify would be fine, if potentially slow. It is not yet obvious to me why the if makes sense. And I ask that you re-phrase it as if self._readers == 0:

The bigger issue is that the meaning of self._cond is not yet obvious to me.

conditional def's

if __name__ == "__main__":
    import time
        ...
    def reader():
        ...
    def writer():
        ...
    def test():

Please don't import time in that way; it belongs up top. Please don't conditionally def some functions.

Python grants you the freedom to organize your code in modules; please take advantage of that.

Possibly you would prefer to name some of these functions so pytest will discover them as tests. Or you might want them as methods within a class that inherits from unittest.

Also, please black *.py && isort *.py, to improve the legibility of the source code.

\$\endgroup\$
1
  • \$\begingroup\$ The reason for if self._num_readers == 0: self._cond.notify_all() is as follows: First, I assert that when a non-exclusive lock is being released, the self._waiting queue cannot have a reader at the front because a reader will only queue up if the queue is not empty or a writer has access. If anything is on the queue, the front of it must be a writer (for greater efficiency the test should be if self._num_readers == 0 and self._waiting: self._cond_notify_all). So, if num_readers is now 0, then if the queue is non-empty, the front contains a writer that can now be given access. \$\endgroup\$ Commented Jan 2 at 14:29
4
\$\begingroup\$

Imports and testing

The following import should probably be at the top of the file.

if __name__ == '__main__':
    import time

Similarly it feels like the functions you've defined for testing should be outside of the guarded section. Maybe these belong in a separate file that imports SharedLock.

__call__ too long

I would venture that your call function is too long. The two branches should probably be separate functions with __call__ deciding which to call.

The formatting of your test output would benefit from being formatted so the numbers align vertically.

\$\endgroup\$
2
  • \$\begingroup\$ Thanks. If somebody is importing this module I saw no reason to do an import of time that is not going to be used so that is why it is where it is. \$\endgroup\$ Commented Jan 1 at 22:39
  • 1
    \$\begingroup\$ @Booboo in that case, it would probably be better to have your module code and test code separate in your repo/etc. \$\endgroup\$ Commented Jan 2 at 5:51
3
\$\begingroup\$

In addition to the other answers, here are a few more observations.

Layout

I really like the consistent layout of the code.

Simpler

In the __call__ function, this if/else:

if not exclusive:  # Shared access

else:  # Exclusive access

would be simpler without the not and with the statements under each branch swapped:

if exclusive:  # Exclusive access

else:  # Shared access

Then, you arguably don't need the Exclusive access comment.

Output

The output is vague. For example, in the output you posted, these 3 identical lines are shown

end reading 1767303996.1963234
end reading 1767303996.1963234
end reading 1767303996.1963234

It would be helpful to display more context. If possible, show the thread number with each line of output.

Typo

This line in the SharedLock class docstring has a typo:

the thread at the head of thw waiting queue, i.e. writer thread 3, will be

"thw" should be "the". I ran a spell-checker to find this.

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.