Initial commit
vadimkantorov committed Jul 18, 2014
1 parent 9530b7d commit 7b7593b
Showing 29 changed files with 3,392 additions and 3 deletions.
116 changes: 113 additions & 3 deletions README.md
@@ -1,4 +1,114 @@
cvpr2014
========
Information & Contact
=====================

Code for "Efficient feature extraction, aggregation and classification for action recognition" (Kantorov, Laptev, CVPR'14)
This code was used to compute the results of the following paper:

>"Efficient feature extraction, encoding and classification for action recognition",
Vadim Kantorov, Ivan Laptev,
In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2014

If you use this code, please cite our work:

> @inproceedings{kantorov2014,
      author = {Kantorov, V. and Laptev, I.},
      title = {Efficient feature extraction, encoding and classification for action recognition},
      booktitle = {Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2014},
      year = {2014}
}

For any question or bug report, please contact Vadim Kantorov at vadim.kantorov@inria.fr or vadim.kantorov@gmail.com


Description and usage
=====================

We release two tools in this repository. The first tool **motion_descriptors** is a motion feature extractor based on motion vectors from video compression information. The second is a fast Fisher vector computation tool **fv_fast** that uses vector SSE2 CPU instructions.

### motion_descriptors

The tool accepts a video file path as input and writes descriptors to standard output.
##### Command-line options:

Option | Default | Description
--- | --- | ---
-i video.avi | | specifies the path to the input video
--hog yes/no | **yes** | enables/disables HOG descriptor computation
--hof yes/no | **yes** | enables/disables HOF descriptor computation
--mbh yes/no | **yes** | enables/disables MBH descriptor computation
-f 1-10 | whole video | restricts descriptor computation to the given frame range

The output format:
The first two lines of the standard output are comments explaining the format:
> #descr = hog(96) hof(108) mbh(96 + 96)
#x y pts StartPTS EndPTS Xoffset Yoffset PatchWidth PatchHeight descr


+ **x** and **y** are the normalized frame coordinates of the spatio-temporal (s-t) patch
+ **pts** is the frame number of the s-t patch center
+ **StartPTS** and **EndPTS** are the frame numbers of the first and last frames of the s-t patch
+ **Xoffset** and **Yoffset** are the non-normalized frame coordinates of the s-t patch
+ **PatchWidth** and **PatchHeight** are the non-normalized width and height of the s-t patch
+ **descr** is the array of floats of concatenated descriptors. The size of this array depends on the enabled descriptor types. All values are from zero to one. The first comment line describes the enabled descriptor types, their order in the array, and the dimension of each descriptor in the array.

After the comment lines, every line corresponds to the descriptor extracted from one patch. All numbers in the output are written as floating-point text and are separated by tabs.
The standard error stream carries debug and diagnostic messages, such as time measurements and the parameters in effect.
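
As a rough illustration of the format above, the following C++ sketch splits one output line into the nine leading fields and the descriptor array. The struct `PatchDescriptor` and the function `ParseDescriptorLine` are hypothetical names that are not part of this repository, and the sketch assumes whitespace-separated numeric text with comment lines starting with `#`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one output line of motion_descriptors.
struct PatchDescriptor
{
    float x, y;                    // normalized patch coordinates
    float pts, startPts, endPts;   // center, first and last frame numbers
    float xOffset, yOffset;        // non-normalized patch coordinates
    float patchWidth, patchHeight; // non-normalized patch size
    std::vector<float> descr;      // concatenated descriptors (e.g. hog, hof, mbh)
};

// Parses one line. Returns false for comment lines (starting with '#'),
// empty lines, and lines with fewer than nine leading fields.
bool ParseDescriptorLine(const std::string& line, PatchDescriptor& out)
{
    if(line.empty() || line[0] == '#')
        return false;

    std::istringstream in(line);
    if(!(in >> out.x >> out.y >> out.pts >> out.startPts >> out.endPts
            >> out.xOffset >> out.yOffset >> out.patchWidth >> out.patchHeight))
        return false;

    // Everything after the nine leading fields belongs to the descriptor array.
    out.descr.clear();
    float value;
    while(in >> value)
        out.descr.push_back(value);
    return true;
}
```

With all three descriptor types enabled, the first comment line suggests that `descr` should hold 96 + 108 + 96 + 96 = 396 values, so a caller could check `descr.size()` against the header before using the data.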

##### Examples:
- Compute HOG, HOF, MBH and save the descriptors in descriptors.txt:
> $ ./motion_descriptors -i video.avi > descriptors.txt
- Compute only HOF and MBH from the first 500 frames and save the descriptors in descriptors.txt:
> $ ./motion_descriptors -i video.avi --hog no --hof yes --mbh yes -f 1-500 > descriptors.txt
### fv_fast
The tool accepts descriptors on the standard input and writes the Fisher vector (FV) to the standard output or to a specified HDF5 file.
##### Command-line options:

Option | Default | Description
--- | --- | ---
--xnpos 0 | | specifies the column with **x** coordinate of the s-t patch in the descriptor array
--xntot 1.0 | 1.0 | specifies the frame width. If the **x** coordinate is non-normalized, this option is mandatory
--ynpos 1 | | specifies the column with **y** coordinate of the s-t patch in the descriptor array
--yntot 1.0 | 1.0 | specifies the frame height. If the **y** coordinate is non-normalized, this option is mandatory
--tnpos 2 | | specifies the column with **t** coordinate of the s-t patch in the descriptor array
--tntot 1.0 | 1.0 | specifies the video length in the units of the **t** coordinate. If the **t** coordinate is non-normalized, this option is mandatory
-o out.h5 | | specifies the output HDF5 file
--gmm_k 256 | 256 | specifies the number of GMM components used for FV computation
--knn 5 | 5 | for every input descriptor, only the FV parts corresponding to this many closest GMM centroids are updated
--vocab 9-104 hog_K256.vocab | | specifies descriptor type location and path to GMM vocabs. This option is mandatory, and several options of this kind are allowed.
--grid 1x3x2x | | specifies the layout of the s-t grid (**x** cells times **y** cells times **t** cells). This option is mandatory, and several options of this kind are allowed.
--buildGmmIndex | | this option will have the GMM vocabs computed and saved to the specified path. No Fisher vector will be computed
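
To make the `--vocab` and `--grid` options more concrete, here is a minimal C++ sketch of the bookkeeping they suggest: slicing an inclusive column range such as `9-104` out of a descriptor row, and mapping normalized (x, y, t) coordinates to a cell of the s-t grid. This is an assumption about the intended semantics, not the code of **fv_fast**. The names `ColumnRange`, `ParseColumnRange`, `SliceColumns`, `CellAlongAxis` and `GridCellIndex`, the zero-based inclusive column convention, and the cell ordering are all hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Inclusive, zero-based column range such as "9-104" (assumed convention).
struct ColumnRange
{
    int first, last;
    int Size() const { return last - first + 1; }
};

// Parses "first-last". Returns false on malformed or reversed input.
bool ParseColumnRange(const char* text, ColumnRange& range)
{
    if(std::sscanf(text, "%d-%d", &range.first, &range.last) != 2)
        return false;
    return range.first >= 0 && range.last >= range.first;
}

// Copies the columns of one descriptor type out of a full input row.
std::vector<float> SliceColumns(const std::vector<float>& row, const ColumnRange& range)
{
    assert(range.last < (int)row.size());
    return std::vector<float>(row.begin() + range.first, row.begin() + range.last + 1);
}

// Maps a normalized coordinate in [0, 1] to one of `cells` equal bins.
int CellAlongAxis(float normalized, int cells)
{
    assert(cells > 0);
    int cell = (int)(normalized * cells);
    if(cell < 0) cell = 0;
    if(cell >= cells) cell = cells - 1; // a coordinate of exactly 1.0 falls in the last bin
    return cell;
}

// Flat index of the s-t grid cell, with x varying fastest (assumed ordering).
int GridCellIndex(float x, float y, float t, int xCells, int yCells, int tCells)
{
    int cx = CellAlongAxis(x, xCells);
    int cy = CellAlongAxis(y, yCells);
    int ct = CellAlongAxis(t, tCells);
    return (ct * yCells + cy) * xCells + cx;
}
```

Under the zero-based convention, columns 9 to 104 hold 96 values, which matches the `hog(96)` block that follows the nine leading fields in the **motion_descriptors** output.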


##### Examples:
- Build GMM vocabulary:
> $ cat descriptors.txt | ./fv_fast --buildGmmIndex
- Compute Fisher vector:
> $ cat descriptors.txt | ./fv_fast

Building from source
====================

### Linux
Make sure the dependencies are installed and visible to the CC compiler (normally gcc). If they are installed to a custom path, you may need to adjust the CPATH and LIBRARY_PATH environment variables. Then navigate to the corresponding directory in **src** and type:
> $ make

The binaries will be placed in the **build** sub-directory.

Dependencies for **motion_descriptors**:
- opencv (http://opencv.org)
- ffmpeg (http://ffmpeg.org)

Dependencies for **fv_fast**:
- opencv (http://opencv.org)
- yael (http://gforge.inria.fr/projects/yael/) [optional, needed for computing the GMM vocab]
- hdf5 (http://www.hdfgroup.org/HDF5/) [optional, needed for saving the output to an HDF5 file]

The yael and hdf5 dependencies are optional, though enabled by default. You can switch them off with:
> $ make WITH_HDF5=OFF WITH_YAEL=OFF
### Windows
You have to define the %OPENCV_DIR%, %FFMPEG_DIR% and %HDF5_DIR% environment variables. You can switch off HDF5 in config.h. YAEL and GMM vocab computation are not supported on Windows, so you can either generate your vocabs on Linux or use some other GMM code to compute them. You will also need a modern Visual Studio (or Visual C++ Express). Then navigate to the corresponding directory in **src** and open VS.vcxproj.

The binaries will be placed in the **build** sub-directory.
Empty file added bin/fake_bin
Empty file.
Binary file added cvpr2014_kantorov_paper.pdf
Binary file not shown.
Binary file added cvpr2014_kantorov_poster.pdf
Binary file not shown.
Binary file added cvpr2014_kantorov_poster.pptx
Binary file not shown.
36 changes: 36 additions & 0 deletions src/Commons/io_utils.h
@@ -0,0 +1,36 @@
#include <string>
#include <fstream>
#include <stdexcept>

using namespace std;

#ifndef __IO_UTILS_H__
#define __IO_UTILS_H__

// Returns true if the file can be opened for reading.
bool FileExists(string file)
{
    ifstream f(file.c_str());
    return f.good();
}

// Throws std::runtime_error if the file cannot be opened for reading.
void AssertFileExists(string file, string comment = "")
{
    if(!FileExists(file))
        throw std::runtime_error("File (" + comment + ") doesn't exist: '" + file + "'");
}

static const char * yes = "yes";
static const char * no = "no";

const char* yesno(bool b)
{
    return b ? yes : no;
}

// Returns the extension including the leading dot, or "" if the path contains no dot.
string GetFileExtension(string filePath)
{
    size_t dotPos = filePath.find_last_of(".");
    if(dotPos == string::npos)
        return "";
    return filePath.substr(dotPos);
}

#endif
70 changes: 70 additions & 0 deletions src/Commons/log.h
@@ -0,0 +1,70 @@
#include <cstdio>
#include <cstdlib>
#include <cstdarg>
#include <cstring>
#include <string>
#include <opencv/cv.h>

#ifndef __LOG_H__
#define __LOG_H__

static bool log_enabled = true;

void log_enable()
{
    log_enabled = true;
}

void log_disable()
{
    log_enabled = false;
}

// Writes a printf-style message followed by a newline to the given stream.
void log(FILE* out, const char* fmt, ...)
{
    if(log_enabled)
    {
        va_list argp;
        va_start(argp, fmt);
        vfprintf(out, fmt, argp);
        va_end(argp);
        fprintf(out, "\n");
        fflush(out);
    }
}

// Same as above, but always writes to stderr.
void log(const char* fmt, ...)
{
    FILE* out = stderr;
    if(log_enabled)
    {
        va_list argp;
        va_start(argp, fmt);
        vfprintf(out, fmt, argp);
        va_end(argp);
        fprintf(out, "\n");
        fflush(out);
    }
}

// Logs the name, size and element type (e.g. CV_32FC1) of a matrix.
void logmat(cv::Mat& m, const char* name = "")
{
    int type = m.type();
    uchar depth = type & CV_MAT_DEPTH_MASK;
    uchar chans = 1 + (type >> CV_CN_SHIFT);
    std::string r = "CV_";
    switch ( depth ) {
        case CV_8U: r += "8U"; break;
        case CV_8S: r += "8S"; break;
        case CV_16U: r += "16U"; break;
        case CV_16S: r += "16S"; break;
        case CV_32S: r += "32S"; break;
        case CV_32F: r += "32F"; break;
        case CV_64F: r += "64F"; break;
        default: r += "User"; break;
    }

    r += "C";
    r += (char)(chans + '0');

    log("%s: %dx%d %s", strlen(name) == 0 ? "<no name>" : name, m.rows, m.cols, r.c_str());
}

#endif
118 changes: 118 additions & 0 deletions src/Commons/motion_vector_file_utils.h
@@ -0,0 +1,118 @@
#include <cstdio>
#include <string>
#include <cstring>
#include <iostream>
#include <iomanip>
#include <sstream>
#include <fstream>
#include <vector>
#include <utility>

using namespace std;

#ifndef __READ_MOTION_VECTOR_FILE_H__
#define __READ_MOTION_VECTOR_FILE_H__

// Sentinel value: NoMotionVector() reports true when Dx and Dy both equal NO_MV or -NO_MV.
const int NO_MV = -10000;

struct MotionVector
{
    int X,Y;
    float Dx,Dy;

    int Mx, My;
    char TypeCode, SegmCode;

    bool NoMotionVector()
    {
        return (Dx == NO_MV && Dy == NO_MV) || (Dx == -NO_MV && Dy == -NO_MV);
    }

    bool IsIntra()
    {
        return TypeCode == 'P' || TypeCode == 'A' || TypeCode == 'i' || TypeCode == 'I';
    }
};

// Frame index paired with the motion vectors read for that frame.
typedef pair<int, vector<MotionVector> > FlowPoints;

struct MotionVectorFileWriter
{
    FILE* motionVectorsFile;

    MotionVectorFileWriter(string path = "")
    {
        // Start from NULL so the destructor is safe when no file was opened.
        motionVectorsFile = NULL;
        if(path != "")
            Open(path);
    }

    void Open(string motionVectorsPath)
    {
        motionVectorsFile = fopen(motionVectorsPath.c_str(), "w");
        if(motionVectorsFile != NULL)
            fputs("FrameIndex\tX\tY\tDx\tDy\tMacroBlockTopLeftCornerX\tMacroBlockTopLeftCornerY\tTypeSegm\n", motionVectorsFile);
    }

    void Write(int index, int sx, int sy, double dx, double dy, int mx = -1, int my = -1, char typeCode = '_', char segmCode = '_')
    {
        if(motionVectorsFile != NULL)
            fprintf(motionVectorsFile, "%d\t%d\t%d\t%.2f\t%.2f\t%d\t%d\t%c%c\n", index, sx, sy, dx, dy, mx, my, typeCode, segmCode);
    }

    ~MotionVectorFileWriter()
    {
        if(motionVectorsFile != NULL)
            fclose(motionVectorsFile);
    }
};

struct MotionVectorFileReader2
{
    FILE* motionVectorsFile;
    char curLine[200];

    void Open(string motionVectorsPath)
    {
        curLine[0] = '\0';
        motionVectorsFile = fopen(motionVectorsPath.c_str(), "r");
        if(motionVectorsFile == NULL)
            return;
        // Skip the header line, then preload the first data line.
        if(fgets(curLine, sizeof(curLine), motionVectorsFile) == NULL
            || fgets(curLine, sizeof(curLine), motionVectorsFile) == NULL)
            curLine[0] = '\0';
    }

    MotionVectorFileReader2(string motionVectorsPath = "")
    {
        motionVectorsFile = NULL;
        curLine[0] = '\0';
        if(motionVectorsPath != "")
            Open(motionVectorsPath);
    }

    // Returns the motion vectors of the next frame, or frame index -1 when no data is left.
    FlowPoints ReadFlowPoints()
    {
        vector<MotionVector> pts;
        if(motionVectorsFile == NULL || strcmp(curLine,"") == 0)
            return make_pair(-1, pts);

        int curFrameIndex, frameIndex;
        MotionVector mv;

        sscanf(curLine, "%d %d %d %f %f %d %d %c%c", &curFrameIndex, &mv.X, &mv.Y, &mv.Dx, &mv.Dy, &mv.Mx, &mv.My, &mv.TypeCode, &mv.SegmCode);
        pts.push_back(mv);
        while(true)
        {
            strcpy(curLine, "");
            if(fgets(curLine, sizeof(curLine), motionVectorsFile) == NULL)
                break;
            if(strcmp(curLine, "") == 0)
                break;
            sscanf(curLine, "%d %d %d %f %f %d %d %c%c", &frameIndex, &mv.X, &mv.Y, &mv.Dx, &mv.Dy, &mv.Mx, &mv.My, &mv.TypeCode, &mv.SegmCode);
            // The first line of the next frame stays in curLine for the next call.
            if(frameIndex != curFrameIndex)
                break;
            pts.push_back(mv);
        }

        return make_pair(curFrameIndex, pts);
    }

    ~MotionVectorFileReader2()
    {
        if(motionVectorsFile != NULL)
            fclose(motionVectorsFile);
    }
};


#endif
