# Sequential Lock

The `readers-writer` lock covered previously can suffer from `writer starvation`. A writer cannot acquire the lock while at least one reader holds it, so under heavy read contention a writer may wait a long time before it gets in.

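To address **writer starvation**, a seqlock keeps a sequence number. Every write increments it, much like a version number in MVCC. There are two kinds of **read lock**. The first kind blocks writers, much like the previous lock, except that (as the header comment below notes) only one locking reader can hold it at a time.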

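The second kind never blocks writers. The reader samples the sequence number before entering the critical section, reads the data, then samples the sequence number again and compares it with the first value. If the two match, no writer ran in between and the read is valid. Otherwise the reader retries.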

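In practice a write increments the sequence number twice, once before and once after the update, so that readers can tell **update in progress** apart from **update finished**. The code later in these notes shows this.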
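
The protocol described above can be sketched in userspace with C11 atomics. This is a minimal illustration under stated assumptions, not the kernel implementation: `demo_pair`, `demo_write`, and `demo_read` are hypothetical names, the C11 fences only approximate `smp_wmb()` and `smp_rmb()`, and writers are assumed to be serialized by the caller.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for data guarded by a sequence counter. */
struct demo_pair {
	atomic_uint seq;   /* even: stable, odd: write in progress */
	atomic_int  a;
	atomic_int  b;
};

/* Writer side. Assumption: the caller already serializes writers. */
static void demo_write(struct demo_pair *p, int a, int b)
{
	unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	assert((s & 1) == 0);                         /* no write may be in flight */
	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);    /* roughly smp_wmb() */

	atomic_store_explicit(&p->a, a, memory_order_relaxed);
	atomic_store_explicit(&p->b, b, memory_order_relaxed);

	/* Release store: the data stores become visible before the even count. */
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Non-blocking reader: retry until both samples are equal and even. */
static void demo_read(struct demo_pair *p, int *a, int *b)
{
	unsigned s1, s2;

	do {
		s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
		*a = atomic_load_explicit(&p->a, memory_order_relaxed);
		*b = atomic_load_explicit(&p->b, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);  /* roughly smp_rmb() */
		s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```

The reader never writes to shared state, which is why it cannot delay a writer. The cost is that a read may have to be repeated.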


## include/linux/seqlock.h

--------------------------------------------

```c
/*
 * Reader/writer consistent mechanism without starving writers. This type of
 * lock for data where the reader wants a consistent set of information
 * and is willing to retry if the information changes. There are two types
 * of readers:
 * 1. Sequence readers which never block a writer but they may have to retry
 *    if a writer is in progress by detecting change in sequence number.
 *    Writers do not wait for a sequence reader.
 * 2. Locking readers which will wait if a writer or another locking reader
 *    is in progress. A locking reader in progress will also block a writer
 *    from going forward. Unlike the regular rwlock, the read lock here is
 *    exclusive so that only one locking reader can get it.
 *
 * This is not as cache friendly as brlock. Also, this may not work well
 * for data that contains pointers, because any writer could
 * invalidate a pointer that a reader was following.
 *
 * Expected non-blocking reader usage:
 * 	do {
 *	    seq = read_seqbegin(&foo);
 * 	...
 *      } while (read_seqretry(&foo, seq));
 *
 *
 * On non-SMP the spin locks disappear but the writer still needs
 * to increment the sequence variables because an interrupt routine could
 * change the state of the data.
 *
 * Based on x86_64 vsyscall gettimeofday 
 * by Keith Owens and Andrea Arcangeli
 */
```

---------------------------


```c
/*
 * Version using sequence counter only.
 * This can be used when code has its own mutex protecting the
 * updating starting before the write_seqcount_begin() and ending
 * after the write_seqcount_end().
 */
typedef struct seqcount {
	unsigned sequence;
} seqcount_t;


typedef struct {
	struct seqcount seqcount;
	spinlock_t lock;
} seqlock_t;

```

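1. `lock` serializes writers against each other and against readers that use the exclusive (locking) mode.

2. An even `sequence` means the last write has completed. An odd value means a writer is currently modifying the critical section.

--------------------------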

```c
static inline void raw_write_seqcount_begin(seqcount_t *s)
{
	s->sequence++;
	smp_wmb();
}

static inline void raw_write_seqcount_end(seqcount_t *s)
{
	smp_wmb();
	s->sequence++;
}

/**
 * write_seqcount_invalidate - invalidate in-progress read-side seq operations
 * @s: pointer to seqcount_t
 *
 * After write_seqcount_invalidate, no read-side seq operations will complete
 * successfully and see data older than this.
 */
static inline void write_seqcount_invalidate(seqcount_t *s)
{
	smp_wmb();
	s->sequence+=2;
}

```

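1. `raw_write_seqcount_begin` is called before the write starts. It increments the sequence, which makes it **odd**.

2. `raw_write_seqcount_end` is called after the write finishes. It increments the sequence again, which makes it **even**.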
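
The counter arithmetic behind these helpers can be traced with a single-threaded model. This is only a sketch: the `demo_*` names are hypothetical, the memory barriers are deliberately left out, and `demo_read_retry` is a simplified stand-in for the comparison a reader makes at the end of its critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the counter; barriers are omitted. */
struct demo_seqcount {
	unsigned sequence;
};

static void demo_write_begin(struct demo_seqcount *s)
{
	s->sequence++;              /* even -> odd: write in progress */
}

static void demo_write_end(struct demo_seqcount *s)
{
	assert(s->sequence & 1);    /* must follow demo_write_begin() */
	s->sequence++;              /* odd -> even: write finished */
}

static void demo_invalidate(struct demo_seqcount *s)
{
	s->sequence += 2;           /* parity unchanged, value changed */
}

static bool demo_writer_active(const struct demo_seqcount *s)
{
	return s->sequence & 1;
}

/* A reader that started at 'start' must retry if the counter has moved. */
static bool demo_read_retry(const struct demo_seqcount *s, unsigned start)
{
	return s->sequence != start;
}
```

Adding 2 in one step, as `write_seqcount_invalidate` does, keeps the counter even, so no reader sees a write in progress, yet every reader that sampled the old value fails its retry check.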

--------------------------

```c

/**
 * raw_write_seqcount_barrier - do a seq write barrier
 * @s: pointer to seqcount_t
 *
 * This can be used to provide an ordering guarantee instead of the
 * usual consistency guarantee. It is one wmb cheaper, because we can
 * collapse the two back-to-back wmb()s.
 *
 *      seqcount_t seq;
 *      bool X = true, Y = false;
 *
 *      void read(void)
 *      {
 *              bool x, y;
 *
 *              do {
 *                      int s = read_seqcount_begin(&seq);
 *
 *                      x = X; y = Y;
 *
 *              } while (read_seqcount_retry(&seq, s));
 *
 *              BUG_ON(!x && !y);
 *      }
 *
 *      void write(void)
 *      {
 *              Y = true;
 *
 *              raw_write_seqcount_barrier(&seq);
 *
 *              X = false;
 *      }
 */
static inline void raw_write_seqcount_barrier(seqcount_t *s)
{
	s->sequence++;
	smp_wmb();
	s->sequence++;
}
```

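1. This is weaker than the usual consistency guarantee, which requires every read to observe X and Y in a stable state. Here only ordering is guaranteed: if the reader sees that X has changed, then Y must have changed as well (though not necessarily in the same write; it may have changed several times).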


--------------------------

```c
/**
 * __read_seqcount_begin - begin a seq-read critical section (without barrier)
 * @s: pointer to seqcount_t
 * Returns: count to be passed to read_seqcount_retry
 *
 * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
 * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
 * provided before actually loading any of the variables that are to be
 * protected in this critical section.
 *
 * Use carefully, only in critical code, and comment how the barrier is
 * provided.
 */
static inline unsigned __read_seqcount_begin(const seqcount_t *s)
{
	unsigned ret;

repeat:
	ret = READ_ONCE(s->sequence);
	if (unlikely(ret & 1)) {
		cpu_relax();
		goto repeat;
	}
	return ret;
}


/**
 * raw_seqcount_begin - begin a seq-read critical section
 * @s: pointer to seqcount_t
 * Returns: count to be passed to read_seqcount_retry
 *
 * raw_seqcount_begin opens a read critical section of the given seqcount.
 * Validity of the critical section is tested by checking read_seqcount_retry
 * function.
 *
 * Unlike read_seqcount_begin(), this function will not wait for the count
 * to stabilize. If a writer is active when we begin, we will fail the
 * read_seqcount_retry() instead of stabilizing at the beginning of the
 * critical section.
 */
static inline unsigned raw_seqcount_begin(const seqcount_t *s)
{
	unsigned ret = READ_ONCE(s->sequence);
	smp_rmb();
	return ret & ~1;
}
```

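1. `__read_seqcount_begin` reads the sequence count and returns only when it is even, meaning the write has finished. While the count is odd it calls `cpu_relax()` and reads again.

2. `raw_seqcount_begin` does not wait. It returns the current count with the low bit cleared, so if a writer was active the later `read_seqcount_retry()` fails. See the comment above the function.

-------------------------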
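
The difference between the waiting and non-waiting begin variants can be sketched in userspace C11. The names `demo_begin_wait`, `demo_begin_nowait`, and `demo_retry` are hypothetical, and the acquire operations are only an approximation of the kernel's `READ_ONCE()` plus `smp_rmb()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Waiting variant: spin until no writer is active, then return the count. */
static unsigned demo_begin_wait(atomic_uint *seq)
{
	unsigned ret;

	for (;;) {
		ret = atomic_load_explicit(seq, memory_order_acquire);
		if (!(ret & 1))
			return ret;
		/* the kernel calls cpu_relax() here before trying again */
	}
}

/* Non-waiting variant: never spins, returns the count with bit 0 cleared. */
static unsigned demo_begin_nowait(atomic_uint *seq)
{
	unsigned ret = atomic_load_explicit(seq, memory_order_acquire) & ~1u;

	assert(!(ret & 1));   /* the snapshot handed to the reader is always even */
	return ret;
}

/* End of the read section: true means the data may be torn, so retry. */
static bool demo_retry(atomic_uint *seq, unsigned start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(seq, memory_order_relaxed) != start;
}
```

If the counter is odd when `demo_begin_nowait` runs, the even snapshot it returns can never match a later sample taken during or after that write, so the first pass through the read section is guaranteed to be discarded instead of spinning up front.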


## include/linux/seqlock.h#latch

A two-version MVCC structure. The comment describes it clearly, and it is a neat implementation.

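1. Compared with ordinary MVCC it gives up some read performance: a read can fail and must be retried.

2. In general MVCC, as long as some version still has readers, the writer creates a brand-new version and readers keep using the old one. The old version is freed only once no reader uses it anymore.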

```c
/**
 * raw_write_seqcount_latch - redirect readers to even/odd copy
 * @s: pointer to seqcount_t
 *
 * The latch technique is a multiversion concurrency control method that allows
 * queries during non-atomic modifications. If you can guarantee queries never
 * interrupt the modification -- e.g. the concurrency is strictly between CPUs
 * -- you most likely do not need this.
 *
 * Where the traditional RCU/lockless data structures rely on atomic
 * modifications to ensure queries observe either the old or the new state the
 * latch allows the same for non-atomic updates. The trade-off is doubling the
 * cost of storage; we have to maintain two copies of the entire data
 * structure.
 *
 * Very simply put: we first modify one copy and then the other. This ensures
 * there is always one copy in a stable state, ready to give us an answer.
 *
 * The basic form is a data structure like:
 *
 * struct latch_struct {
 *	seqcount_t		seq;
 *	struct data_struct	data[2];
 * };
 *
 * Where a modification, which is assumed to be externally serialized, does the
 * following:
 *
 * void latch_modify(struct latch_struct *latch, ...)
 * {
 *	smp_wmb();	<- Ensure that the last data[1] update is visible
 *	latch->seq++;
 *	smp_wmb();	<- Ensure that the seqcount update is visible
 *
 *	modify(latch->data[0], ...);
 *
 *	smp_wmb();	<- Ensure that the data[0] update is visible
 *	latch->seq++;
 *	smp_wmb();	<- Ensure that the seqcount update is visible
 *
 *	modify(latch->data[1], ...);
 * }
 *
 * The query will have a form like:
 *
 * struct entry *latch_query(struct latch_struct *latch, ...)
 * {
 *	struct entry *entry;
 *	unsigned seq, idx;
 *
 *	do {
 *		seq = lockless_dereference(latch->seq);
 *
 *		idx = seq & 0x01;
 *		entry = data_query(latch->data[idx], ...);
 *
 *		smp_rmb();
 *	} while (seq != latch->seq);
 *
 *	return entry;
 * }
 *
 * So during the modification, queries are first redirected to data[1]. Then we
 * modify data[0]. When that is complete, we redirect queries back to data[0]
 * and we can modify data[1].
 *
 * NOTE: The non-requirement for atomic modifications does _NOT_ include
 *       the publishing of new entries in the case where data is a dynamic
 *       data structure.
 *
 *       An iteration might start in data[0] and get suspended long enough
 *       to miss an entire modification sequence, once it resumes it might
 *       observe the new entry.
 *
 * NOTE: When data is a dynamic data structure; one should use regular RCU
 *       patterns to manage the lifetimes of the objects within.
 */
```
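
The latch pattern from the comment above can be tried out in userspace. The following is a minimal sketch, assuming a single writer and using sequentially consistent C11 atomics in place of the kernel's explicit barriers. `demo_latch`, `demo_point`, and the helper functions are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>

/* Two fields that must be observed together. */
struct demo_point {
	atomic_int x;
	atomic_int y;
};

struct demo_latch {
	atomic_uint seq;
	struct demo_point data[2];   /* two copies of the same logical value */
};

static void demo_point_set(struct demo_point *p, int x, int y)
{
	atomic_store(&p->x, x);
	atomic_store(&p->y, y);
}

/* Assumption: modifications are externally serialized (one writer). */
static void demo_latch_modify(struct demo_latch *l, int x, int y)
{
	assert((atomic_load(&l->seq) & 1) == 0);  /* start from an even count */

	atomic_fetch_add(&l->seq, 1);             /* odd: readers use data[1] */
	demo_point_set(&l->data[0], x, y);

	atomic_fetch_add(&l->seq, 1);             /* even: readers use data[0] */
	demo_point_set(&l->data[1], x, y);
}

/* Read the copy selected by the low bit; retry if the count moved. */
static void demo_latch_query(struct demo_latch *l, int *x, int *y)
{
	unsigned seq;

	do {
		seq = atomic_load(&l->seq);
		struct demo_point *p = &l->data[seq & 1];
		*x = atomic_load(&p->x);
		*y = atomic_load(&p->y);
	} while (seq != atomic_load(&l->seq));
}
```

Unlike the plain sequence counter, the reader here never waits for an even count: whichever parity it samples, the matching copy is the one the writer is not touching at that moment.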

------------------------------