FUDCon Pune 2015

Pune, 26th to 28th June, 2015


26th to 28th June, 2015 09:00 am - 06:00 pm

Website: FUDCon Pune 2015


MIT College of Engineering

June 27, 2015, 2:40 pm

Improving thread synchronization in GlusterD (Daemon for Glusterfs) using Userspace RCU (Read-copy-update)


GlusterFS is an open source, scalable, distributed file system that can run on any commodity hardware. GlusterD is the daemon that manages the cluster configuration.

For more information about GlusterFS, please refer to http://www.gluster.org/

What is Big-lock in GlusterD:

GlusterD was originally designed as a single-threaded application that
could handle just one transaction at a time. It was made multi-threaded to
improve responsiveness and to handle multiple transactions at a time. This
was needed for newer features like volume snapshots, which could leave
GlusterD unresponsive for some periods of time.

Making GlusterD multi-threaded required the creation of a thread
synchronization mechanism to protect the shared data structures (mainly
everything under the GlusterD configuration, the glusterd_conf_t struct)
from concurrent access by multiple threads. This was accomplished using the
Big-lock.

The Big-lock is an exclusive lock, so any thread that needs to use the
protected data needs to obtain the Big-lock and give up the Big-lock once

Problem with Big-lock

The Big-lock synchronization solution was added into the GlusterD code to
solve problems that arose when GlusterD was made multi-threaded. This was
supposed to be a quick solution, to allow GlusterD to be shipped.

The Big-lock, as the name suggests, is a coarse-grained lock. The
coarseness of the lock leads to threads contending even when they are
accessing unrelated data, and it has also led to some deadlocks.
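
The contention on unrelated data can be pictured with a small sketch. It
assumes a plain pthread mutex standing in for the Big-lock and a
hypothetical two-field struct standing in for glusterd_conf_t; the field
and function names are invented for illustration and are not taken from
the GlusterD source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the shared configuration. The two fields are
 * unrelated to each other, yet one lock guards both. */
struct conf {
    int volume_count;
    int peer_count;
};

static struct conf conf;
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

static void *add_volumes(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&big_lock);    /* also blocks peer updates */
        conf.volume_count++;
        pthread_mutex_unlock(&big_lock);
    }
    return NULL;
}

static void *add_peers(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&big_lock);    /* also blocks volume updates */
        conf.peer_count++;
        pthread_mutex_unlock(&big_lock);
    }
    return NULL;
}

/* Runs both workers to completion and returns the sum of both counters,
 * or -1 if a thread could not be started. */
int run_workers(void)
{
    pthread_t a, b;

    conf.volume_count = 0;
    conf.peer_count = 0;
    if (pthread_create(&a, NULL, add_volumes, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, add_peers, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return conf.volume_count + conf.peer_count;
}
```

The result is correct, but the two workers never touch the same field and
still take turns on every single increment.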

One example of such a deadlock involves transactions and RPC. If a thread
holding the Big-lock blocks on network I/O, it may end up in a deadlock.
This can happen when the remote endpoint is disconnected. The callback
code is executed in the same thread that has acquired the Big-lock, and
all network I/O handlers, including callbacks, are implemented to acquire
the Big-lock before executing. Put together, the callback waits for a lock
that its own thread already holds, and we have a deadlock.
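
A minimal sketch of this self-deadlock, under some assumptions: the
Big-lock is modelled as a pthread mutex, and on_disconnect() and
transaction_with_disconnect() are hypothetical names, not GlusterD
functions. So that the example terminates, the mutex is created with the
PTHREAD_MUTEX_ERRORCHECK type, which reports the relock as an error
instead of blocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t big_lock;

/* Hypothetical disconnect callback: like every network handler described
 * above, it takes the Big-lock before doing any work. */
static int on_disconnect(void)
{
    int rc = pthread_mutex_lock(&big_lock);

    if (rc == 0)
        pthread_mutex_unlock(&big_lock);
    return rc;
}

/* Hypothetical transaction: holds the Big-lock across a network call
 * whose failure path runs the callback in the same thread.
 * Returns the result of the callback's lock attempt. */
int transaction_with_disconnect(void)
{
    pthread_mutexattr_t attr;
    int rc;

    pthread_mutexattr_init(&attr);
    /* An error-checking mutex reports a relock by its owner as EDEADLK.
     * A default mutex would typically just hang at that point. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&big_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&big_lock);
    rc = on_disconnect();            /* the remote endpoint went away */
    pthread_mutex_unlock(&big_lock);
    pthread_mutex_destroy(&big_lock);
    return rc;
}
```

Calling transaction_with_disconnect() returns EDEADLK, which is the
error-checking mutex's way of reporting the hang described above.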

To avoid this, we release the Big-lock whenever a thread could block on
network I/O. This comes at a price: it opens up a window of time during
which other threads can update the shared data structures, leading to
inconsistencies.
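
The window can be sketched as follows. This is only an illustration: the
blocking network call is replaced by waiting for a second thread, which
makes the interleaving deterministic, and peer_count and the function
names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int peer_count = 3;           /* hypothetical shared state */

/* Another transaction that runs while the first one is blocked. */
static void *remove_peer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&big_lock);
    peer_count--;
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/* Returns 1 if the value read before the blocking call no longer matches
 * the shared state afterwards, 0 if it still matches, -1 on error. */
int snapshot_went_stale(void)
{
    pthread_t other;
    int before, after;

    pthread_mutex_lock(&big_lock);
    before = peer_count;             /* later decisions rely on this */
    pthread_mutex_unlock(&big_lock); /* released to avoid the deadlock */

    /* Stand-in for blocking network I/O: joining the other thread
     * guarantees that its update lands inside the unlocked window. */
    if (pthread_create(&other, NULL, remove_peer, NULL) != 0)
        return -1;
    pthread_join(other, NULL);

    pthread_mutex_lock(&big_lock);   /* reacquired after the I/O */
    after = peer_count;
    pthread_mutex_unlock(&big_lock);

    return before != after;
}
```

Any code that keeps using the value read before the unlock is now working
with state that no longer matches the shared data.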

The Big-lock, in its current state, doesn’t even fully solve the problem
it was introduced for, and has more problems on top of that. These problems
are only going to grow with new features and new code being added to

Possible solutions

The most obvious solution would be to split up the Big-lock into more
fine-grained locks. We could go one step further and replace the mutex
locks (the Big-lock is a mutex lock) with readers-writer locks. This would
bring in more flexibility and finer-grained control, at the cost of
additional overhead, mainly in the complexity of implementation.
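
As a rough sketch of what such a split could look like, the example below
gives two hypothetical groups of data (volumes and peers) their own POSIX
readers-writer lock. The grouping and all names are assumptions made for
illustration; the talk does not say how the data would be partitioned.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical split: each group of data gets its own lock. */
struct volumes { pthread_rwlock_t lock; int count; };
struct peers   { pthread_rwlock_t lock; int count; };

static struct volumes volumes = { PTHREAD_RWLOCK_INITIALIZER, 0 };
static struct peers peers = { PTHREAD_RWLOCK_INITIALIZER, 0 };

int volume_count(void)
{
    int n;

    pthread_rwlock_rdlock(&volumes.lock);  /* shared with other readers */
    n = volumes.count;
    pthread_rwlock_unlock(&volumes.lock);
    return n;
}

void add_volume(void)
{
    pthread_rwlock_wrlock(&volumes.lock);  /* exclusive, volumes only */
    volumes.count++;
    pthread_rwlock_unlock(&volumes.lock);
}

/* Reads the volume count while the peers write lock is held, to show
 * that the two groups no longer contend with each other. */
int read_volumes_while_peers_locked(void)
{
    int n;

    pthread_rwlock_wrlock(&peers.lock);
    peers.count++;
    n = volume_count();                    /* different lock, no wait */
    pthread_rwlock_unlock(&peers.lock);
    return n;
}
```

The implementation cost mentioned above shows up quickly: every access
has to pick the right lock and the right mode, and code that needs both
groups has to agree on a lock order.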

As an alternative to readers-writer locks, we propose to use RCU as the
synchronization mechanism. RCU provides several advantages over
readers-writer locks while providing similar synchronization features.
These advantages make it preferable to readers-writer locks, even
though the implementation complexity remains nearly the same for both.

Read-copy-update (RCU)

RCU, short for Read-Copy-Update, is a synchronization mechanism that can be
used as an alternative to readers-writer locks.

A good introduction to RCU can be found in this series of articles on LWN
[1] and [2]. The articles are with respect to the usage of RCU in the
Linux kernel, where it is used heavily.

The advantages that make RCU preferable to readers-writer locks are the
following:

- Wait free reads
RCU readers have no wait overhead. They can never be blocked by writers.
RCU readers need to notify when they are in their critical sections, but
this notification is much lighter than locks.

- Provides existence guarantees
RCU guarantees that RCU protected data in a reader's critical section will
remain in existence till the end of the critical section. This is achieved
by having the writers work on a copy of the data, instead of using the
existing data.

- Concurrent readers and writers
Wait-free reads and the existence guarantee mean that it is possible to
have readers and writers in concurrent execution. Any readers already in
execution when a writer starts will continue working with the original copy
of the data. The writer will work on a copy, and will use RCU methods to
swap/replace the original data without affecting existing readers. Any
readers coming online after the writer has published its copy will see the
new data. This does mean that some readers will continue to work with stale
data, but this isn't too big a problem as the data at least remains
consistent till the reader finishes.

- Read-side deadlock immunity
RCU readers always run in deterministic time as they never block. This
means that they can never become a part of a deadlock.

- No writer starvation
As RCU readers don't block, writers can never starve.
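
The copy-and-publish step behind these properties can be sketched without
any RCU library. The example below is not RCU itself: it uses a C11 atomic
pointer to publish new versions and leaves the grace period (knowing when
no reader still holds the old version) to the caller. It also assumes a
single writer at a time. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical protected data; both fields must always agree. */
struct conf {
    int generation;
    int peer_count;
};

static _Atomic(struct conf *) current;

/* Publishes the first version. Returns 0 on success, -1 on failure. */
int conf_init(int peer_count)
{
    struct conf *first = malloc(sizeof(*first));

    if (first == NULL)
        return -1;
    first->generation = 0;
    first->peer_count = peer_count;
    atomic_store_explicit(&current, first, memory_order_release);
    return 0;
}

/* Reader: a single load yields a complete version that is never modified
 * in place, so it stays consistent while the reader uses it. */
const struct conf *conf_read(void)
{
    return atomic_load_explicit(&current, memory_order_acquire);
}

/* Writer: copy the current version, change the copy, publish the copy.
 * Returns the replaced version, or NULL if allocation failed. The caller
 * may free the returned version only once no reader can still hold it. */
struct conf *conf_update(int peer_count)
{
    struct conf *old = atomic_load_explicit(&current, memory_order_relaxed);
    struct conf *fresh = malloc(sizeof(*fresh));

    if (fresh == NULL)
        return NULL;
    *fresh = *old;                   /* copy */
    fresh->generation++;             /* update the private copy */
    fresh->peer_count = peer_count;
    atomic_store_explicit(&current, fresh, memory_order_release);
    return old;
}
```

A reader that called conf_read() before conf_update() keeps seeing the old
generation and the old peer count together, which is the "stale but
consistent" behaviour described above. An RCU implementation adds the
missing piece: a way for the writer to wait until every such reader has
finished before the old version is freed.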

Userspace RCU

The kernel uses features provided by the processor to implement its RCU.
Userspace applications cannot make use of these features, but instead can
use the Userspace RCU library.

liburcu [3] provides a userspace implementation of RCU, which is
portable across multiple platforms and operating systems. liburcu also
provides some common data structures and RCU protected APIs to use them.

An introduction to URCU and its APIs can be found in this article on LWN
[4].

[1]: https://lwn.net/Articles/262464/
[2]: https://lwn.net/Articles/263130/
[3]: http://urcu.so/
[4]: https://lwn.net/Articles/573424/

Hall: Seminar Hall [4th Floor] Track: Main Conference Type: Talk