
MAGiC: A Multimodal Framework for Analysing Gaze in Communication

MAGiC builds on a relatively well-developed subdomain of object recognition: face recognition. Face recognition has been studied intensively in computer vision because of its practical importance in everyday applications, such as security analysis of digital camera recordings. In social communication research, when the eye movements of two participants are recorded, frame-by-frame annotation of gaze-aversion and gaze-contact behavior is time consuming and error-prone. Similar problems exist in the segmentation of audio recordings: manual segmentation into speech and pause components is neither efficient nor reliable, and it may exclude potentially meaningful information (Goldman-Eisler, 1968; Hieke, Kowal, & O’Connell, 1983). MAGiC addresses these problems by providing an automated method for analyzing eye tracking data recorded over dynamic facial scenes, employing automatic face recognition techniques.


Face Tracking Framework

In face tracking, a face is first detected automatically in a video frame and is then tracked throughout the stream. In the present study, we extend a face tracking toolkit called OpenFace, an open-source tool for analyzing facial behavior. The OpenFace toolkit detects a total of 68 facial landmarks (see Figure-1). Determining face boundaries from the facial landmarks, rather than from a rectangle with fixed dimensions, enables more precise calculations.



Figure-1: On the left, a video frame overlaid with the detected facial landmarks; on the right, the 68 landmark positions on a face.
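To illustrate how landmark-based face boundaries can support gaze analysis, the sketch below tests whether a gaze sample falls inside a polygon built from the detected landmarks (for example, the outer face contour). This is a minimal C# example; the names `FaceAoi` and `Contains` are ours and are not part of OpenFace or the MAGiC source.

```csharp
using System.Drawing;   // PointF

// Hypothetical helper: decides whether a gaze sample lies inside the face AOI
// spanned by the outer contour of the detected landmarks.
public static class FaceAoi
{
    // Standard ray-casting point-in-polygon test over the landmark outline.
    public static bool Contains(PointF[] contour, PointF gaze)
    {
        bool inside = false;
        for (int i = 0, j = contour.Length - 1; i < contour.Length; j = i++)
        {
            PointF a = contour[i];
            PointF b = contour[j];
            bool crosses = (a.Y > gaze.Y) != (b.Y > gaze.Y) &&
                           gaze.X < (b.X - a.X) * (gaze.Y - a.Y) / (b.Y - a.Y) + a.X;
            if (crosses) inside = !inside;
        }
        return inside;
    }
}
```

Compared with a fixed-size rectangle, a polygon derived from the landmarks follows the actual face contour, so gaze samples near the cheeks or chin are classified more precisely.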


Speech Segmentation Framework

Speech segmentation is the separation of audio recordings into homogeneous units such as speech, silence, and laughter. CMUSphinx is integrated with LIUM, an open-source toolkit for speaker segmentation and diarization. We extend the CMUSphinx speech recognition system to perform speech analysis. Figure-2 shows the workflow of speaker diarization.



Figure-2: Classical process for speaker diarization and segmentation (adapted from the [LIUM web-site](http://www-lium.univ-lemans.fr/diarization/doku.php/overview)).
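Because LIUM is a Java toolkit, one way to drive the diarization step from a C# application is to launch the jar as an external process. The sketch below is only illustrative: the jar name and command-line flags are assumptions based on typical LIUM_SpkDiarization usage and may differ from the release bundled with MAGiC.

```csharp
using System.Diagnostics;

// Minimal sketch of running the LIUM speaker diarization jar from C#.
public static class Diarization
{
    public static void Run(string wavPath, string segPath)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "java",
            Arguments = $"-Xmx2048m -jar LIUM_SpkDiarization.jar " +
                        $"--fInputMask={wavPath} --sOutputMask={segPath} " +
                        "--doCEClustering showName",
            UseShellExecute = false,
            RedirectStandardOutput = true,
            RedirectStandardError = true
        };

        using (var process = Process.Start(psi))
        {
            process.WaitForExit();   // block until segmentation finishes
        }
    }
}
```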


Software Architecture

Graphical user interfaces (GUIs) are stored under the View folder, and back-end classes are collected under the Controller folder. There is a one-to-one relation between each GUI and its controller class. The AOI analysis module includes the OpenFace and dlib executables together with their dependent libraries; similarly, the speech analysis module includes the CMUSphinx executables. All user interfaces are created programmatically and derive from ParentUI. The home screen implements the Navigation Listener interface, and ParentUI holds a Navigation Listener as a field (see Figure-3).



Figure-3: Software architecture of the MAGiC application
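The relation between ParentUI, the navigation listener, and the home screen can be sketched roughly as follows. This is a hypothetical C# outline inferred from the description and Figure-3; the actual class and member names in the MAGiC source may differ.

```csharp
// Navigation requests raised by child views are handled by the home screen.
public interface INavigationListener
{
    void Navigate(string targetForm);   // e.g. "SpeechAnalysis", "AOIAnalysis"
}

// Common base class for all programmatically created user interfaces.
public class ParentUI : System.Windows.Forms.UserControl
{
    protected INavigationListener NavigationListener { get; set; }
}

// The home screen implements the listener and swaps the active view.
public class HomeScreen : ParentUI, INavigationListener
{
    public void Navigate(string targetForm)
    {
        // Replace the control shown in the working area (details omitted).
    }
}
```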


Graphical User Interface

MAGiC is a desktop application written in the C# programming language. As shown in Figure-4, the navigation pane is located on the left side and is separated from the working area by a collapsible splitter. There are four main modules: Speech Analysis, AOI Analysis, Summary, and Walkthroughs.



Figure-4: The main panel of MAGiC
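The main-panel layout maps naturally onto a WinForms SplitContainer: the navigation pane on the left, the working area on the right, separated by a splitter that can be collapsed. The sketch below is a rough illustration; control names and the use of a TreeView for navigation are assumptions, not taken from the MAGiC source.

```csharp
using System.Windows.Forms;

public class MainForm : Form
{
    public MainForm()
    {
        var split = new SplitContainer
        {
            Dock = DockStyle.Fill,
            FixedPanel = FixedPanel.Panel1,  // keep the navigation pane width fixed on resize
            Panel1MinSize = 0                // allow the navigation pane to collapse
        };

        var navigation = new TreeView { Dock = DockStyle.Fill };
        navigation.Nodes.Add("Speech Analysis");
        navigation.Nodes.Add("AOI Analysis");
        navigation.Nodes.Add("Summary");
        navigation.Nodes.Add("Walkthroughs");

        split.Panel1.Controls.Add(navigation);  // left: navigation pane
        // split.Panel2 hosts the working area (the active module's form)

        Controls.Add(split);
    }
}
```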

An easily accessible help page is provided for each function, describing the step-by-step process to run it. The page briefly states the purpose of the function, characterizes its input parameters, and gives a link to a sample file (see Figure-5).



Figure-5: Walkthrough page, including buttons for navigating to the walkthrough tree and to the related function form.

Moreover, tooltips are provided for almost all fields in the interfaces to enhance usability. Before a function is run, data validation is performed to ensure the correctness and consistency of the input. If validation fails, user-friendly error messages are displayed next to the invalid fields; similarly, a success status is displayed to inform the user (see Figure-6 and Figure-7).



Figure-6: A tooltip is displayed when the mouse hovers over a field.

Figure-7: A red exclamation mark is displayed next to a field if validation fails. The error message is shown when the mouse hovers over the exclamation mark.
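The behavior shown in Figures 6 and 7 corresponds closely to the standard WinForms ToolTip and ErrorProvider components. The sketch below shows one way this could be wired up; the form and field names are hypothetical and are not taken from the MAGiC source.

```csharp
using System.Windows.Forms;

public class SpeechAnalysisForm : Form
{
    private readonly ToolTip toolTip = new ToolTip();
    private readonly ErrorProvider errorProvider = new ErrorProvider();
    private readonly TextBox audioPathField = new TextBox();

    public SpeechAnalysisForm()
    {
        Controls.Add(audioPathField);

        // Shown when the mouse hovers over the field (Figure-6).
        toolTip.SetToolTip(audioPathField, "Full path of the .wav recording to segment.");
    }

    private bool ValidateInputs()
    {
        if (string.IsNullOrWhiteSpace(audioPathField.Text))
        {
            // Red exclamation next to the field; the message appears on hover (Figure-7).
            errorProvider.SetError(audioPathField, "Please select an audio file.");
            return false;
        }
        errorProvider.SetError(audioPathField, string.Empty);  // clear any previous error
        return true;
    }
}
```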