Functionalities and Options
Zenvisage offers several querying modalities for expressing visual queries:
Sketching: Sketching a desired pattern as a query, and searching for similar visualizations.
Drag-and-drop: Dragging and dropping a visualization from the result or overview panel, and searching for similar visualizations.
Pattern Loading: Loading the raw x,y data points depicting a pattern or trend, and searching for matching visualizations.
Input Equation: This function allows you to specify a query pattern using an equation, e.g. y=sin(0.5*x).
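To make the equation example concrete, the sketch below samples y=sin(0.5*x) at evenly spaced x values, producing the kind of raw x,y points that Pattern Loading accepts. The helper name `sample_equation`, the x range, and the number of points are hypothetical choices for illustration; this is not zenvisage's own equation parser.

```python
import math


def sample_equation(f, x_min, x_max, n_points):
    """Sample f at n_points evenly spaced x values in [x_min, x_max]."""
    if n_points < 2:
        raise ValueError("need at least two points")
    step = (x_max - x_min) / (n_points - 1)
    xs = [x_min + i * step for i in range(n_points)]
    return [(x, f(x)) for x in xs]


# Hypothetical query: y = sin(0.5 * x) over one full period (0 to 4*pi).
points = sample_equation(lambda x: math.sin(0.5 * x), 0.0, 4 * math.pi, 9)
```

Each tuple in `points` is one (x, y) pair of the query pattern.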
Options for Filtering and Grouping
Filter Constraint: Filters the subset of data (defined by the Z, X and Y attributes) used for generating visualizations. Any operator that is valid in a SQL WHERE clause and any data column (including non-visualized attributes) can be used to specify filter constraints. Examples include:
- Selecting particular objects of interest (e.g. gene="Esrrb")
- Specifying multiple conditions (e.g. mag>20 AND radius<0.5)
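Since filter constraints follow SQL WHERE syntax, their effect can be previewed with any SQL engine. The sketch below uses Python's built-in sqlite3 module on a made-up table; the table name `stars`, its rows, and the way the constraint is spliced into the query are assumptions for illustration only, not a description of how zenvisage executes filters internally.

```python
import sqlite3

# Hypothetical in-memory table with the columns used in the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stars (name TEXT, mag REAL, radius REAL)")
conn.executemany(
    "INSERT INTO stars VALUES (?, ?, ?)",
    [("a", 21.5, 0.3), ("b", 19.0, 0.2), ("c", 22.0, 0.9)],
)

# The filter constraint exactly as it would be typed into the filter box.
filter_constraint = "mag>20 AND radius<0.5"

# String interpolation is acceptable only because the constraint is a
# hard-coded literal here; never do this with untrusted input.
rows = conn.execute(
    f"SELECT name FROM stars WHERE {filter_constraint} ORDER BY name"
).fetchall()
conn.close()
```

Only row "a" satisfies both conditions, so `rows` contains a single entry.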
Dynamic classes: Using this option, users can dynamically create new classes (in other words, new attributes) representing subsets of data formed by applying constraints on one or more existing attributes. These classes can be used as the Z axis for generating visualizations. More details on how to create and use dynamic classes in zenvisage can be found here.
Options for Matching Visualizations
Similarity: This option allows you to select a distance metric or measure for matching visualizations.
- Euclidean (default): Computes point-to-point distance between two visualizations using L2-norm.
- Segmentation: Smooths the visualization using one or more disjoint line-segments, and calculates the distance between the line-segments across the two visualizations. It is more robust to noise.
- DTW: Allows many-to-one point comparisons across the two visualizations; in other words, it aligns the points across two trends by warping the time axis iteratively until an optimal match is found (depicted in the figure below). It is good for matching two visualizations by their overall shape, instead of doing point-to-point comparisons along the x-axis.
- MVIP: Calculates the distance between only the visually important points (e.g., maxima and minima) in the two visualizations. Visually important points represent the global or coarse trends of the two visualizations. Like Segmentation, MVIP is more robust to noise than Euclidean, and like DTW, it helps in matching visualizations by their overall shape. However, MVIP is much faster than both Segmentation and DTW.
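The difference between the Euclidean and DTW measures can be seen on a small example. The sketch below is a textbook formulation (L2-norm for Euclidean, dynamic programming with an absolute-difference cost for DTW), written under the assumption that both visualizations are plain lists of y values; it is not zenvisage's implementation, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Point-to-point L2 distance; both series must have the same length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_distance(a, b):
    """Textbook DTW: cost[i][j] is the cheapest alignment of a[:i] with b[:j]."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # One point may be matched to several points of the other series.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The same peak, shifted by one step along the x-axis.
query = [0, 1, 2, 1, 0, 0]
shifted = [0, 0, 1, 2, 1, 0]

d_euclid = euclidean_distance(query, shifted)
d_dtw = dtw_distance(query, shifted)
```

Here the Euclidean distance penalizes the one-step shift, while DTW warps the x-axis and finds a perfect alignment, which is why DTW suits matching by overall shape.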
In addition to selecting one of the distance measures, users can use the Consider x-range option to specify whether the x-range should be taken into account while matching.
Consider x-range: Queries, by default, find patterns that happen at similar x ranges. When turned off, the location of the pattern along the x-axis will not affect pattern search. For example, you might be interested in looking for peaks in your time series, independent of whether the peak occurred in September or March.
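One way to picture location-independent matching is a sliding-window search: the query pattern is compared against every window of the series and the best-matching position wins. The sketch below illustrates that idea with a Euclidean distance; it is an assumption about how such a search could work, not the method zenvisage uses, and `best_offset` is a hypothetical helper.

```python
import math


def best_offset(series, pattern):
    """Return (offset, distance) of the window of `series` closest to `pattern`."""
    if len(pattern) > len(series):
        raise ValueError("pattern is longer than the series")
    best = None
    for offset in range(len(series) - len(pattern) + 1):
        window = series[offset:offset + len(pattern)]
        dist = math.sqrt(sum((w - p) ** 2 for w, p in zip(window, pattern)))
        if best is None or dist < best[1]:
            best = (offset, dist)
    return best


# A peak that occurs late in the series is still found.
series = [0, 0, 0, 0, 1, 3, 1, 0]
peak = [1, 3, 1]
offset, dist = best_offset(series, peak)
```

With Consider x-range turned on, the comparison would instead be made only at the x positions where the query was drawn.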
Options for Data Smoothing
Noise within data often adversely affects pattern matching. To address this, zenvisage supports various smoothing options that capture important features in the trend while reducing the noise. The basic idea behind smoothing is to approximate or modify each value based on its neighbouring values within a small window.
Users can select one of the supported smoothing methods from the drop-down menu, and choose the granularity or amount of smoothing using the “smoothing constant” slider. We now briefly describe the supported smoothing methods:
- Moving Average: Takes an equally-weighted average of neighbouring points that lie in a specified window. The smoothing constant determines the size of this boxed window.
- Exponential: Assigns exponentially decreasing weights to points further away from the point being smoothed. The decrease in weights depends on the smoothing parameter.
- Gaussian: Applies Gaussian convolution over the pattern. The width of the Gaussian is specified by the smoothing constant.
- Loess: Locally fits a line using regression over small subsets of data. More specifically, for every point, Loess looks at a small percentage of its neighbouring points, fits a line over them, and then predicts the smoothed value for the point under consideration using the slope and intercept of the fitted line. This tends to work well on densely sampled datasets, but can take a while to run.
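The first three methods above can be sketched in a few lines each. The code below uses common textbook forms and assumes the trend is a plain list of y values; how zenvisage maps the smoothing constant to `window`, `alpha`, or `sigma` is not specified here, so those parameter names are assumptions. The exponential variant shown is the one-sided recursive form, which weights earlier points only. Loess is omitted for brevity.

```python
import math


def moving_average(values, window):
    """Centered, equally weighted average; the window is truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def exponential_smoothing(values, alpha):
    """s[0] = v[0]; s[t] = alpha * v[t] + (1 - alpha) * s[t-1], with 0 < alpha <= 1."""
    if not values:
        return []
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def gaussian_smoothing(values, sigma):
    """Gaussian-weighted average, kernel truncated at 3*sigma, renormalised at edges."""
    radius = max(1, int(3 * sigma))
    out = []
    for i in range(len(values)):
        total = weight_sum = 0.0
        for j in range(max(0, i - radius), min(len(values), i + radius + 1)):
            w = math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
            total += w * values[j]
            weight_sum += w
        out.append(total / weight_sum)
    return out


noisy = [0, 10, 0, 10, 0]
ma = moving_average(noisy, 3)
es = exponential_smoothing(noisy, 0.5)
gs = gaussian_smoothing(noisy, 1.0)
```

In all three cases a larger window, a smaller `alpha`, or a larger `sigma` gives a smoother but less detailed trend.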
Options for Display
The following are display settings for the displayed visualizations:
- Show original sketch: Shows the original query pattern as a pale green line on top of the visualization results. Checked by default.
- Show scatterplot: Shows the raw data points as a scatterplot.
- Reverse y-axis: Flips the y-axis of all visualizations.
Here is an example showing what these plotting changes look like:
Export to CSV or PNG
You can export each visualization as an image by double-clicking on it. We also support exporting the query results, representative patterns and outlier patterns as CSV files through the ⤓ button. Zenvisage offers the following download options:
- Choice of the number of results to download.
- Enforcing a minimum similarity threshold of downloaded output.
- Option to only export the y data. If unchecked, two data files are downloaded, corresponding to the x-axis and y-axis data. This option should be checked if your data has the same x-axis across all visualizations, since the x values are then redundant.
- Option to include the x and y values of the query pattern as the first row in the downloaded data for future reference.
- Option to download all data and their similarity scores. Clustering results download all data by default. For the query and outlier results, the CSV contains the Category axis in the first column, the similarity score in the second column, and the x or y values in the remaining columns, with filenames as .csv. For the cluster downloads, all data are downloaded with the cluster label in the first column (i.e. whether this visualization belongs to cluster #1 or cluster #5), with filename representative_.csv.
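A downloaded query or outlier result with the column layout described above could be read as follows. This is a minimal sketch that assumes no header row and numeric values in every column after the second; the sample rows and the helper name `parse_results` are made up for illustration, and the text is read from a string rather than from a real download.

```python
import csv
import io

# Made-up sample in the described layout: category, similarity score, values...
sample = "series_a,0.92,1.0,2.5,3.0\nseries_b,0.85,0.5,2.0,2.5\n"


def parse_results(text):
    """Split each row into (category, similarity score, list of x or y values)."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        category, score, *values = record
        rows.append((category, float(score), [float(v) for v in values]))
    return rows


results = parse_results(sample)
```

If the query pattern was included as the first row, it would need to be separated from the results before further processing.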
Other System Settings
- Aggregate: The aggregation function describes how data is aggregated along the y-axis. We offer sum, average, and a None option for no aggregation (raw data points) along the y-axis.
- Cluster Size: KMeans clustering is used for identifying representative trends of the whole dataset. This option specifies the number of clusters to be displayed in the representative trends panel.
- Number of Results: Specifies the number of results to be displayed in the query results panel.
- Similarity Cutoff: Specifies the minimum similarity score for a visualization to be displayed in the query results panel.
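The sketch below shows one plausible reading of the Aggregate, Number of Results, and Similarity Cutoff settings. It assumes that aggregation combines y values sharing the same x value and that results are ranked from most to least similar; both are assumptions, and the function names are hypothetical rather than part of zenvisage.

```python
from collections import defaultdict


def aggregate(points, method):
    """Combine y values that share an x value; method is 'sum', 'avg', or 'none'."""
    if method == "none":
        return list(points)  # raw data points, no aggregation
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    if method == "sum":
        return [(x, sum(ys)) for x, ys in sorted(groups.items())]
    if method == "avg":
        return [(x, sum(ys) / len(ys)) for x, ys in sorted(groups.items())]
    raise ValueError(f"unknown aggregation method: {method}")


def select_results(scored, num_results, similarity_cutoff):
    """Keep results scoring at least the cutoff, best first, at most num_results."""
    kept = [(name, score) for name, score in scored if score >= similarity_cutoff]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:num_results]


raw = [(1, 2), (1, 4), (2, 6)]
summed = aggregate(raw, "sum")
averaged = aggregate(raw, "avg")

scored = [("a", 0.9), ("b", 0.4), ("c", 0.7), ("d", 0.8)]
top = select_results(scored, num_results=2, similarity_cutoff=0.5)
```

Here "b" is dropped by the cutoff and "c" by the result limit, leaving the two most similar visualizations.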