Stroke sequence dataset

In our approach, we use sketches from the TU-Berlin dataset. Each sketch is represented as a sequence of strokes. In the WordGuess-160 dataset, the stroke sequences are paired with their corresponding guesswords. As a preprocessing step, each stroke sequence image is morphologically dilated ('thickened'). The dataset of thickened stroke sequences can be accessed at the following link as a .tar.gz archive. The archive contains two directories: sketches_png_css_thickened and sketches_png_css_thickened-sym. Files should be read from the latter directory (sketches_png_css_thickened-sym).
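
As a minimal sketch of the step above, the following Python helper unpacks only the sketches_png_css_thickened-sym directory from the archive and skips the other one. The archive file name and destination directory in the usage comment are hypothetical placeholders, and the layout inside the archive (the directory may sit at the top level or be nested) is an assumption; the helper matches the directory name anywhere in the member path.

```python
import tarfile
from pathlib import Path, PurePosixPath

# Directory the README says files should be read from.
SYM_DIR = "sketches_png_css_thickened-sym"


def extract_sym_sketches(archive_path, dest_dir):
    """Extract only regular files located under SYM_DIR.

    Returns the sorted list of paths written below dest_dir.
    """
    dest = Path(dest_dir)
    written = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            name = PurePosixPath(member.name)
            parts = name.parts
            # Skip directories, links, unsafe paths, and anything outside SYM_DIR.
            if not member.isfile() or name.is_absolute() or ".." in parts:
                continue
            if SYM_DIR not in parts:
                continue
            target = dest.joinpath(*parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src:
                target.write_bytes(src.read())
            written.append(target)
    return sorted(written)


# Hypothetical usage; the archive name is a placeholder, not the real file name:
# files = extract_sym_sketches("thickened_sketches.tar.gz", "data/")
```

Filtering while extracting avoids unpacking the directory that is not meant to be used, which may save disk space if the two directories are of similar size.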

Auxiliary data for Sketchguess recurrent model

  • To test the Sketchguess recurrent model,

    • Download this compressed file and unzip it to
      sketchguess/data/.

      • File info: w2v_data.zip
      • Contents:
        • all_w2v.mat -- word embeddings, stored one per row
        • all_nouns.txt -- list of nouns corresponding to the rows of all_w2v.mat
    • Download this Python pickle file, which contains the CNN features
      that are input to the Sketchguess recurrent model, to sketchguess/data/.
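
The download steps above could be checked with a short Python sketch like the one below. It rests on several assumptions that the README does not confirm: that w2v_data.zip unpacks its two files directly into the target directory, that all_nouns.txt holds one noun per line, and that the pickle may have been written by Python 2 (hence encoding="latin1"). The helper names are hypothetical and not part of the repository. Reading all_w2v.mat is left out; scipy.io.loadmat is one option if the file uses a pre-7.3 MAT format.

```python
import pickle
import zipfile
from pathlib import Path

DATA_DIR = Path("sketchguess/data")  # target directory named in the README


def unpack_w2v(zip_path, data_dir=DATA_DIR):
    """Unzip w2v_data.zip into data_dir and return the extracted file names."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(data_dir)
        return zf.namelist()


def read_nouns(data_dir=DATA_DIR):
    """Read all_nouns.txt, assuming one noun per line (an assumption)."""
    with open(Path(data_dir) / "all_nouns.txt", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def load_cnn_features(pickle_path):
    """Load the pickled CNN features.

    encoding="latin1" is an assumption that helps when the file was
    pickled under Python 2; it is harmless for Python 3 pickles.
    Only unpickle files from a source you trust.
    """
    with open(pickle_path, "rb") as f:
        return pickle.load(f, encoding="latin1")
```

Since row i of all_w2v.mat corresponds to entry i of the noun list, comparing the number of nouns with the number of embedding rows is a quick sanity check after loading both.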