thesjg edited this page Oct 6, 2010 · 2 revisions

kevent locking strategy.

the data structures:

TAILQ_HEAD(kqlist, knote);
SLIST_HEAD(klist, knote);
struct kqueue {
       struct kqlist   kq_knpend;
       struct kqlist   kq_knlist;
       int             kq_count;               /* number of pending events */
       struct          sigio *kq_sigio;
       struct          selinfo kq_sel; 
       struct          filedesc *kq_fdp;
       int             kq_state;
       u_long          kq_knhashmask;          /* size of knhash */
       struct          klist *kq_knhash;       /* hash table for knotes */
};
struct knote {
        SLIST_ENTRY(knote)      kn_link;        /* for fd */
        TAILQ_ENTRY(knote)      kn_kqlink;      /* for kq_knlist */
        SLIST_ENTRY(knote)      kn_selnext;     /* for struct selinfo */
        TAILQ_ENTRY(knote)      kn_tqe;         /* for kq_head */
        struct                  kqueue *kn_kq;  /* which queue we are on */
        struct                  kevent kn_kevent;
        int                     kn_status;
        int                     kn_sfflags;     /* saved filter flags */
        intptr_t                kn_sdata;       /* saved data field */
        union {
                struct          file *p_fp;     /* file data pointer */
                struct          proc *p_proc;   /* proc pointer */
        } kn_ptr;
        struct                  filterops *kn_fop;
        caddr_t                 kn_hook;
};
struct kevent {
        uintptr_t       ident;          /* identifier for this event */
        short           filter;         /* filter for event */
        u_short         flags;
        u_int           fflags;
        intptr_t        data;
        void            *udata;         /* opaque user data identifier */
};

data structure lifecycle:

The heart of the system is the kqueue structure. These are allocated individually and are either attached to an lwp in the kernel or accessed via a descriptor from userland. Multiple threads may operate on a single kqueue structure concurrently, although this should not be especially common.

The entry point into the kevent subsystem is the kern_kevent function. Its primary arguments are the kqueue structure to be operated on and the copyin/copyout callbacks.

kevent structures originate in the copyin function supplied to kern_kevent. There are currently three such functions, one each for select, poll and kqueue. kern_kevent retrieves these structures from the copyin function KQ_NEVENTS (currently 8) at a time. After copyin, kern_kevent iterates over the kevent structures and calls kqueue_register for each one in turn.
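The batching described above can be sketched in user space as follows. This is only an illustration, not the kernel code: the copyin_fn callback type, sketch_register(), sketch_kevent_changes() and the array-backed source are hypothetical names invented for the example, and the real kern_kevent signature is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KQ_NEVENTS 8            /* batch size named in the text */

struct kevent {
        uintptr_t       ident;
        short           filter;
        unsigned short  flags;
        unsigned int    fflags;
        intptr_t        data;
        void            *udata;
};

/* Hypothetical callback type: fills up to max events, returns how many. */
typedef int (*copyin_fn)(void *arg, struct kevent *kevp, int max);

/* Hypothetical stand-in for kqueue_register(); it only counts calls. */
static int registered;

static int
sketch_register(struct kevent *kev)
{
        (void)kev;
        registered++;
        return 0;
}

/* Pull changes in batches of KQ_NEVENTS and register each one in turn. */
static int
sketch_kevent_changes(copyin_fn copyin, void *arg)
{
        struct kevent batch[KQ_NEVENTS];
        int n, i, error;

        for (;;) {
                n = copyin(arg, batch, KQ_NEVENTS);
                if (n <= 0)
                        return n;       /* 0: source drained, <0: error */
                assert(n <= KQ_NEVENTS);
                for (i = 0; i < n; i++) {
                        error = sketch_register(&batch[i]);
                        if (error)
                                return error;
                }
        }
}

/* Example copyin source: an in-memory array of pending changes. */
struct array_src {
        struct kevent   *evs;
        int             remaining;
};

static int
array_copyin(void *arg, struct kevent *kevp, int max)
{
        struct array_src *src = arg;
        int n = src->remaining < max ? src->remaining : max;
        int i;

        for (i = 0; i < n; i++)
                kevp[i] = src->evs[i];
        src->evs += n;
        src->remaining -= n;
        return n;
}
```

With 19 queued changes, the callback would be invoked four times (8, 8, 3, then 0 events) and the register stand-in 19 times.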

kqueue_register takes two arguments, the kqueue and kevent structures. It first determines the correct dispatch vector based on the filter set in the kevent structure. It then attempts to locate an existing knote that matches the kevent.

  • In the case of a descriptor, it iterates f_klist, a klist within struct file.
  • In the case of a non-descriptor, it attempts to find a knote corresponding to the kevent in the hash located within the kqueue structure.
  • If a corresponding knote cannot be located, a new one is allocated and f_attach is called on it through the dispatch vector.
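The lookup steps in the list above could look roughly like the sketch below. The structures are cut down to the fields the lookup needs, the hash function (ident masked with kq_knhashmask) is a simplification chosen for the example, and matching on the owning kqueue as well as ident and filter is an assumption. sketch_klist_find() and sketch_find_or_create() are hypothetical names, and the f_attach call is only marked by a comment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

struct kqueue;

struct knote {
        SLIST_ENTRY(knote) kn_link;     /* chains knotes on an fd or bucket */
        struct kqueue   *kn_kq;
        uintptr_t       kn_ident;       /* stands in for kn_kevent.ident */
        short           kn_filter;      /* stands in for kn_kevent.filter */
};
SLIST_HEAD(klist, knote);

struct file {                           /* reduced to the per-file klist */
        struct klist    f_klist;
};

struct kqueue {
        unsigned long   kq_knhashmask;
        struct klist    *kq_knhash;
};

/* Walk one klist looking for a knote with the same kq, ident and filter. */
static struct knote *
sketch_klist_find(struct klist *list, struct kqueue *kq, uintptr_t ident,
    short filter)
{
        struct knote *kn;

        SLIST_FOREACH(kn, list, kn_link) {
                if (kn->kn_kq == kq && kn->kn_ident == ident &&
                    kn->kn_filter == filter)
                        return kn;
        }
        return NULL;
}

/*
 * fp != NULL means the event is descriptor-backed, so search f_klist;
 * otherwise search the kqueue's hash bucket. Allocate on a miss.
 */
static struct knote *
sketch_find_or_create(struct kqueue *kq, struct file *fp, uintptr_t ident,
    short filter)
{
        struct klist *list;
        struct knote *kn;

        assert(fp != NULL || kq->kq_knhash != NULL);
        if (fp != NULL)
                list = &fp->f_klist;
        else
                list = &kq->kq_knhash[ident & kq->kq_knhashmask];

        kn = sketch_klist_find(list, kq, ident, filter);
        if (kn != NULL)
                return kn;

        kn = calloc(1, sizeof(*kn));
        if (kn == NULL)
                return NULL;
        kn->kn_kq = kq;
        kn->kn_ident = ident;
        kn->kn_filter = filter;
        SLIST_INSERT_HEAD(list, kn, kn_link);
        /* The real code would call f_attach through the dispatch vector here. */
        return kn;
}
```

Calling sketch_find_or_create() twice with the same arguments returns the same knote, which is the property the registration path relies on.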

If a kevent is being ADDed, f_event() is called through the dispatch vector, and the knote is activated via KNOTE_ACTIVATE if f_event() returns true.

If a kevent is being DELETEd, f_detach() is called through the dispatch vector. knote_drop() is subsequently called on the knote.

After all registrations have happened, kern_kevent loops around kqueue_scan, which returns events that are in some fashion “ready”.

kqueue_scan iterates the contents of the TAILQ kq_knpend, which is a member of the kqueue structure. Each entry is removed from the TAILQ and operated on in succession. For most events, f_event is called through the dispatch vector to check readiness. Ready events are set to be returned. Events that are not ready are not re-added to the TAILQ; kq_count is decremented and the loop continues with the next entry.

For a ONESHOT event, f_detach is called through the dispatch vector. Unless some flag stipulates that a ready event should not be re-added to the queue (DISABLED, ONESHOT, CLEAR), it is re-added at the end of the TAILQ. The struct kqueue member kq_count is decremented if the knote is not re-added.
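A minimal user-space sketch of the scan loop described above follows. Several things are assumptions made for the example: the SK_* flag bits and the kn_queued, kn_detached and kn_ready fields are local inventions, kn_event stands in for the f_event call through the dispatch vector, and the loop bounds itself with the starting kq_count so that a re-added knote is not visited twice in one pass (the actual kernel code may bound the walk differently).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Flag bits local to this sketch; the kernel's own values differ. */
#define SK_ONESHOT      0x01
#define SK_CLEAR        0x02
#define SK_DISABLED     0x04

struct knote {
        TAILQ_ENTRY(knote) kn_tqe;      /* linkage on kq_knpend */
        int     kn_flags;               /* SK_* bits */
        int     kn_queued;              /* nonzero while on kq_knpend */
        int     kn_detached;            /* set where f_detach would run */
        int     kn_ready;               /* test hook read by sketch_ready */
        int     (*kn_event)(struct knote *);    /* stands in for f_event */
};
TAILQ_HEAD(kqlist, knote);

struct kqueue {
        struct kqlist   kq_knpend;
        int             kq_count;       /* number of pending events */
};

/* Trivial readiness filter used by the example. */
static int
sketch_ready(struct knote *kn)
{
        return kn->kn_ready;
}

/* Collect up to maxout ready knotes; returns how many were stored. */
static int
sketch_scan(struct kqueue *kq, struct knote **out, int maxout)
{
        struct knote *kn;
        int budget = kq->kq_count;      /* visit each pending knote once */
        int nout = 0;

        while (nout < maxout && budget-- > 0 &&
            (kn = TAILQ_FIRST(&kq->kq_knpend)) != NULL) {
                assert(kn->kn_queued);
                TAILQ_REMOVE(&kq->kq_knpend, kn, kn_tqe);
                if (!kn->kn_event(kn)) {
                        /* Not ready: leave it off the queue. */
                        kn->kn_queued = 0;
                        kq->kq_count--;
                        continue;
                }
                out[nout++] = kn;
                if (kn->kn_flags & SK_ONESHOT)
                        kn->kn_detached = 1;    /* f_detach would run here */
                if (kn->kn_flags & (SK_ONESHOT | SK_CLEAR | SK_DISABLED)) {
                        kn->kn_queued = 0;
                        kq->kq_count--;
                } else {
                        TAILQ_INSERT_TAIL(&kq->kq_knpend, kn, kn_tqe);
                }
        }
        return nout;
}
```

The collected knotes are what would then be handed to the copyout function.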

The kn_kevent member of each ready knote is then passed to the copyout function.

Device drivers that want to provide event notifications maintain their own klist (currently within the selinfo structure). When kqueue_register calls the f_attach function through the dispatch vector, the call is simply forwarded to the device-specific attach routine, passing along the supplied knote. Typically the driver will want to be able to wake up a listener. In this case the driver will add the passed knote to its internal klist.

When a driver is ready to wake a listener, it calls KNOTE on its klist with an optional hint.

KNOTE simply wraps knote(), which iterates the klist (SLIST) and calls KNOTE_ACTIVATE on each knote in succession.

KNOTE_ACTIVATE sets the kn_status member of the knote to ACTIVE, and if the knote is not already queued or disabled, calls knote_enqueue().

knote_enqueue() inserts the knote onto the tail of the kq_knpend TAILQ of the kqueue it is on (a knote will only exist on a single kqueue).
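The notification path from KNOTE down to knote_enqueue() can be sketched as below, without any locking. The KN_* bit values are local to the example, the sketch_ function names are hypothetical, the hint is accepted but unused, and incrementing kq_count on enqueue is an assumption based on its "number of pending events" comment.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Status bits; the values here are local to this sketch. */
#define KN_ACTIVE       0x01
#define KN_QUEUED       0x02
#define KN_DISABLED     0x04

struct kqueue;

struct knote {
        SLIST_ENTRY(knote) kn_selnext;  /* linkage on the driver's klist */
        TAILQ_ENTRY(knote) kn_tqe;      /* linkage on kq_knpend */
        struct kqueue   *kn_kq;         /* the single kqueue we belong to */
        int             kn_status;
};
SLIST_HEAD(klist, knote);
TAILQ_HEAD(kqlist, knote);

struct kqueue {
        struct kqlist   kq_knpend;
        int             kq_count;       /* number of pending events */
};

/* Append the knote to the pending list of the kqueue that owns it. */
static void
sketch_knote_enqueue(struct knote *kn)
{
        struct kqueue *kq = kn->kn_kq;

        assert((kn->kn_status & KN_QUEUED) == 0);
        TAILQ_INSERT_TAIL(&kq->kq_knpend, kn, kn_tqe);
        kn->kn_status |= KN_QUEUED;
        kq->kq_count++;
}

/* Mark active; enqueue unless already queued or disabled. */
static void
sketch_knote_activate(struct knote *kn)
{
        kn->kn_status |= KN_ACTIVE;
        if ((kn->kn_status & (KN_QUEUED | KN_DISABLED)) == 0)
                sketch_knote_enqueue(kn);
}

/* What a driver would trigger via KNOTE on its klist. */
static void
sketch_knote(struct klist *list, long hint)
{
        struct knote *kn;

        (void)hint;             /* unused in this simplified walk */
        SLIST_FOREACH(kn, list, kn_selnext)
                sketch_knote_activate(kn);
}
```

Note that a single walk of the driver's list may touch the pending lists of several different kqueues, which is where the locking questions below come from.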


In the driver attach routine the driver may poke around the knote. It is newly allocated and not yet attached to anything, so it will not be modified by another thread.

In the driver detach routine all bets are off. The driver may concurrently be calling KNOTE on its si_list, or trying to detach the same knote from another thread, or …

The driver can take responsibility for locking si_list, but should the list operations instead be wrapped in convenience functions, with a new structure containing a klist pointer and an spl? Call this structure struct kqinfo?

This spl only protects the kn_selnext (to be known as kn_next) pointer within the knote structure.
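One possible shape for such a structure and its convenience functions is sketched below. Everything here is an assumption made for illustration: "spl" is read as a spinlock and modelled in user space with a C11 atomic, the klist is embedded rather than pointed to, and the ki_ field names and kqinfo_ function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/queue.h>

struct knote {
        SLIST_ENTRY(knote) kn_next;     /* kn_selnext under its proposed name */
};
SLIST_HEAD(klist, knote);

struct kqinfo {
        struct klist    ki_note;        /* the driver's list of knotes */
        atomic_int      ki_lock;        /* stand-in for the kernel lock */
};

static void
kqinfo_init(struct kqinfo *ki)
{
        SLIST_INIT(&ki->ki_note);
        atomic_init(&ki->ki_lock, 0);
}

/* Spin until the lock is taken. */
static void
kqinfo_lock(struct kqinfo *ki)
{
        while (atomic_exchange_explicit(&ki->ki_lock, 1,
            memory_order_acquire) != 0)
                ;
}

static void
kqinfo_unlock(struct kqinfo *ki)
{
        atomic_store_explicit(&ki->ki_lock, 0, memory_order_release);
}

/* Driver attach path: link the knote under the lock. */
static void
kqinfo_insert(struct kqinfo *ki, struct knote *kn)
{
        kqinfo_lock(ki);
        SLIST_INSERT_HEAD(&ki->ki_note, kn, kn_next);
        kqinfo_unlock(ki);
}

/* Driver detach path: the knote must currently be on the list. */
static void
kqinfo_remove(struct kqinfo *ki, struct knote *kn)
{
        kqinfo_lock(ki);
        assert(!SLIST_EMPTY(&ki->ki_note));
        SLIST_REMOVE(&ki->ki_note, kn, knote, kn_next);
        kqinfo_unlock(ki);
}

/* Example of a locked traversal; KNOTE would iterate the same way. */
static int
kqinfo_count(struct kqinfo *ki)
{
        struct knote *kn;
        int n = 0;

        kqinfo_lock(ki);
        SLIST_FOREACH(kn, &ki->ki_note, kn_next)
                n++;
        kqinfo_unlock(ki);
        return n;
}
```

Because the lock covers only the list linkage, the other knote fields would still need the protection of whatever guards the owning kqueue.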

kevent structures are translated into knotes, so they need no locking of their own.

Add a token to struct kqueue responsible for protecting all members of the kqueue structure and all knotes it holds. The token must be held for the duration of knote creation and teardown.


Issues:

A device's klist will contain knotes from many different kqueues, so when a device issues a KNOTE it will be grabbing tokens left and right. Maybe this is OK.

KNOTE will need to take the new kqinfo? structure, in order to lock the spl? while iterating. Maybe the list can be RCU’d?

Others?


For now, implement a single global token protecting the entire subsystem. Implement some generic abstractions that could aid in the above strategy, or some other future more fine-grained strategy.
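One way to read "generic abstractions" is a pair of lock/unlock helpers that take the kqueue as an argument but, for now, ignore it and use one global lock. The sketch below shows only that shape. The helper names are hypothetical, and a C11 atomic spinlock stands in for a kernel token, which behaves differently.

```c
#include <assert.h>
#include <stdatomic.h>

struct kqueue {
        int     kq_count;       /* number of pending events */
};

/* Single global lock standing in for the subsystem-wide token. */
static atomic_int sketch_kq_token = 0;

/*
 * Callers always name the kqueue they are about to touch. Today the
 * argument is ignored; a finer-grained scheme could later select a
 * per-kqueue lock here without changing any call site.
 */
static void
sketch_kq_lock(struct kqueue *kq)
{
        (void)kq;
        while (atomic_exchange_explicit(&sketch_kq_token, 1,
            memory_order_acquire) != 0)
                ;
}

static void
sketch_kq_unlock(struct kqueue *kq)
{
        (void)kq;
        assert(atomic_load(&sketch_kq_token) == 1);
        atomic_store_explicit(&sketch_kq_token, 0, memory_order_release);
}

/* Example call site: adjust the pending count under the lock. */
static void
sketch_kq_count_adjust(struct kqueue *kq, int delta)
{
        sketch_kq_lock(kq);
        kq->kq_count += delta;
        sketch_kq_unlock(kq);
}
```

The point of the indirection is that call sites already express which kqueue they operate on, so the locking granularity can change behind the helpers.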
