Checking LIBSVM Plus 2.9 changes

1 parent 037b7de commit bb1f5a11a43abc90f609f91be03f686d9cefd84d @vincenzo committed
19 Makefile.am
@@ -0,0 +1,19 @@
+AM_CPPFLAGS = -Wconversion
+LIBSVM_SHARED_LIB_VERSION = @LIBSVM_SHARED_LIB_VERSION@
+
+bin_PROGRAMS = svm-predict svm-train svm-scale
+
+svm_predict_SOURCES = svm-predict.c
+svm_train_SOURCES = svm-train.c
+svm_scale_SOURCES = svm-scale.c
+
+svm_predict_LDADD = svm.$(OBJEXT) -lstdc++
+svm_train_LDADD = svm.$(OBJEXT) -lstdc++
+svm_scale_LDADD = svm.$(OBJEXT) -lstdc++
+
+lib_LTLIBRARIES = libsvm.la
+libsvm_la_SOURCES = svm.cpp
+libsvm_la_LDFLAGS = -version-info $(LIBSVM_SHARED_LIB_VERSION)
+
+libsvm_include_HEADERS = svm.h
+libsvm_includedir=$(includedir)
88 README.plus
@@ -0,0 +1,88 @@
+Libsvm Plus is a straightforward improvement of the official
+Libsvm library (http://www.csie.ntu.edu.tw/~cjlin/libsvm).
+
+Author: Vincenzo Russo (http://neminis.org)
+Download: http://neminis.org/software/libsvm-plus
+Version: 2.90
+
+What are the differences?
+=========================
+
+1. Only C++ code supported and maintained, due to lack of time.
+ No Java code provided. Other language interfaces (like Python, etc.)
+ should work, but only with the original features of LIBSVM.
+ However, no tests were made.
+
+2. Only Unix: for the same reason stated above, I only test on Linux and
+ Mac OS X, which makes LIBSVM Plus likely to work also on other modern Unix
+ systems. Anyway, you could try to use the Makefile.win included in official
+ LIBSVM package to compile and test LIBSVM Plus on Windows platforms;
+
+3. Four additional kernels: Stump, Perceptron, Laplacian, Exponential.
+ Such kernels might be called "infinite ensemble kernels" because a nonlinear
+ SVM which uses them corresponds to an infinite ensemble classifier.
+ Look at the publications of Hsuan-Tien Lin for more theoretical explanations:
+
+ http://www.work.caltech.edu/~htlin/publication/
+
+ The code for realizing the above kernels was back-ported from his LIBSVM fork
+
+ http://www.work.caltech.edu/~htlin/program/libsvm/#infensemble
+
+ based on the older 2.8 version;
+
+4. Three additional SVM models: Classification (C-SVM) via L2SVM,
+ Support Vector Domain Description (SVDD) via L1SVM and via L2SVM.
+ The code was back-ported from a LIBSVM tool:
+
+ Calculating the radius of the smallest sphere containing all training data.
+ http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#18
+
+ The SVDD can be used as an alternative to One-Class SVM. More theoretical explanations
+ about SVDD can be found in David J. Tax's PhD thesis and other papers
+
+ http://www-ict.ewi.tudelft.nl/~davidt/papers.html
+
+
+Minor changes
+=============
+
+Some additional comments to the source code are provided and some C structures
+(svm_model and decision_function) were moved from svm.cpp to svm.h to allow
+third-party software to access them more easily. Moreover, the svm_model
+structure now provides three new members: SV_idx (indices of the SVs in the
+original dataset), BSV_idx (indices of the BSVs in the original dataset) and
+lbsv (the number of BSVs). Finally, the enumeration element RBF (which in the
+original LIBSVM refers to the Gaussian kernel) was renamed GAUSSIAN, because
+there are several kernels which belong to the RBF class, not only the Gaussian one.
+
+Windows
+=======
+
+(by Vladislavs Dovgalecs, Universite Bordeaux I, FRANCE)
+
+To compile for Windows, you will need:
+
+1. Microsoft Visual Studio (worked with Visual C++ 2008 Express)
+2. nmake.exe, link.exe and cl.exe are usually found in VC bin directory
+3. Make sure your PATH variable reflects the location of Visual Studio bin directory
+(cl.exe, nmake.exe and link.exe)
+
+Compilation takes a few steps:
+1. Open the console and run VCVARS32.bat. This will set up the VC environment variables.
+2. Run 'nmake -f Makefile.win'
+
+
+License
+=======
+
+For this first release of LIBSVM Plus we chose to use the same license
+as the original LIBSVM library.
+
+
+Version number
+==============
+
+As long as LIBSVM Plus remains a straightforwardly augmented version of the official
+LIBSVM, it will have the same version number as the original LIBSVM code used for
+making the release.
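
Review note on README.plus, item 3: the four added kernels are only named there, not defined. The following is a minimal Python sketch of how they are usually written, assuming the formulations from Hsuan-Tien Lin's infinite ensemble work (stump and perceptron as a constant minus an L1 or L2 distance, Laplacian and exponential as the exponential of a negative scaled L1 or L2 distance). The function names and the `coef0` and `gamma` parameter names are assumptions for illustration; they are not taken from svm.cpp in this commit.

```python
import math


def l1_distance(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def l2_distance(u, v):
    """Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def stump_kernel(u, v, coef0=0.0):
    # Assumed form: constant minus the L1 distance.
    return coef0 - l1_distance(u, v)


def perceptron_kernel(u, v, coef0=0.0):
    # Assumed form: constant minus the L2 distance.
    return coef0 - l2_distance(u, v)


def laplacian_kernel(u, v, gamma=1.0):
    # Assumed form: exp(-gamma * L1 distance).
    return math.exp(-gamma * l1_distance(u, v))


def exponential_kernel(u, v, gamma=1.0):
    # Assumed form: exp(-gamma * L2 distance), i.e. no squaring of the
    # distance, which is what separates it from the Gaussian kernel.
    return math.exp(-gamma * l2_distance(u, v))
```

For u = (0, 0) and v = (3, 4) the L1 distance is 7 and the L2 distance is 5, so the stump and perceptron sketches differ only in which of the two distances they subtract.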
1,578 autogen.sh
1,578 additions, 0 deletions not shown because the diff is too large. Please use a local Git client to view these changes.
28 configure.ac
@@ -0,0 +1,28 @@
+# -*- Autoconf -*-
+# Process this file with autoconf to produce a configure script.
+
+AC_PREREQ(2.59)
+AC_INIT(LIBSVM-Plus, 2.90, nemo@neminis.org)
+AM_INIT_AUTOMAKE([dist-bzip2 foreign])
+AC_PROG_LIBTOOL
+AC_CONFIG_FILES([Makefile])
+AC_CONFIG_SRCDIR([svm.cpp])
+
+AC_PROG_CXXCPP
+AC_PROG_CXX
+AC_PROG_CPP
+
+AC_LANG([C++])
+
+AC_HEADER_STDC
+AC_CHECK_HEADERS([float.h math.h stdio.h ctype.h stdlib.h string.h errno.h])
+AC_FUNC_MALLOC
+
+AC_SUBST([AM_CXXFLAGS])
+AC_SUBST([AM_LDFLAGS])
+
+LIBSVM_SHARED_LIB_VERSION="1:0:0"
+AC_SUBST(LIBSVM_SHARED_LIB_VERSION)
+
+AC_OUTPUT
+
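
Review note on configure.ac and Makefile.am: `LIBSVM_SHARED_LIB_VERSION="1:0:0"` is passed to libtool's `-version-info`, which takes a `current:revision:age` triple and not a release number. The sketch below shows how that triple maps to a shared object file name under the GNU/Linux convention (major = current - age, then age, then revision). The helper names are hypothetical, and other platforms derive their file names differently.

```python
def parse_version_info(version_info):
    """Split a libtool 'current[:revision[:age]]' string into three integers."""
    parts = [int(p) for p in version_info.split(":")]
    # libtool allows revision and age to be omitted; both default to 0.
    while len(parts) < 3:
        parts.append(0)
    current, revision, age = parts
    if age > current:
        raise ValueError("age must not be greater than current")
    return current, revision, age


def linux_shared_object_name(libname, version_info):
    """File name produced under the GNU/Linux naming convention."""
    current, revision, age = parse_version_info(version_info)
    major = current - age
    return "%s.so.%d.%d.%d" % (libname, major, age, revision)


print(linux_shared_object_name("libsvm", "1:0:0"))
```

Under this convention the value in configure.ac would yield libsvm.so.1.0.0, unrelated to the 2.90 package version set in AC_INIT.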
47 msvc/msvc.sln
@@ -0,0 +1,47 @@
+
+Microsoft Visual Studio Solution File, Format Version 10.00
+# Visual C++ Express 2008
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "LIBSVM-Plus", "msvc.vcproj", "{399AA6EA-189D-4B80-BB6C-506FD5FF46E9}"
+EndProject
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "svm-predict", "svm-predict\svm-predict.vcproj", "{6FB3C9B1-652B-465A-A0AE-C69FBAA23358}"
+ ProjectSection(ProjectDependencies) = postProject
+ {399AA6EA-189D-4B80-BB6C-506FD5FF46E9} = {399AA6EA-189D-4B80-BB6C-506FD5FF46E9}
+ EndProjectSection
+EndProject
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "svm-scale", "svm-scale\svm-scale.vcproj", "{698E030B-4BCA-4F3E-AEC4-CF486534ED9E}"
+ ProjectSection(ProjectDependencies) = postProject
+ {399AA6EA-189D-4B80-BB6C-506FD5FF46E9} = {399AA6EA-189D-4B80-BB6C-506FD5FF46E9}
+ EndProjectSection
+EndProject
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "svm-train", "svm-train\svm-train.vcproj", "{57B3C50D-A021-49B7-85A2-CA648EBB1487}"
+ ProjectSection(ProjectDependencies) = postProject
+ {399AA6EA-189D-4B80-BB6C-506FD5FF46E9} = {399AA6EA-189D-4B80-BB6C-506FD5FF46E9}
+ EndProjectSection
+EndProject
+Global
+ GlobalSection(SolutionConfigurationPlatforms) = preSolution
+ Debug|Win32 = Debug|Win32
+ Release|Win32 = Release|Win32
+ EndGlobalSection
+ GlobalSection(ProjectConfigurationPlatforms) = postSolution
+ {399AA6EA-189D-4B80-BB6C-506FD5FF46E9}.Debug|Win32.ActiveCfg = Debug|Win32
+ {399AA6EA-189D-4B80-BB6C-506FD5FF46E9}.Debug|Win32.Build.0 = Debug|Win32
+ {399AA6EA-189D-4B80-BB6C-506FD5FF46E9}.Release|Win32.ActiveCfg = Release|Win32
+ {399AA6EA-189D-4B80-BB6C-506FD5FF46E9}.Release|Win32.Build.0 = Release|Win32
+ {6FB3C9B1-652B-465A-A0AE-C69FBAA23358}.Debug|Win32.ActiveCfg = Debug|Win32
+ {6FB3C9B1-652B-465A-A0AE-C69FBAA23358}.Debug|Win32.Build.0 = Debug|Win32
+ {6FB3C9B1-652B-465A-A0AE-C69FBAA23358}.Release|Win32.ActiveCfg = Release|Win32
+ {6FB3C9B1-652B-465A-A0AE-C69FBAA23358}.Release|Win32.Build.0 = Release|Win32
+ {698E030B-4BCA-4F3E-AEC4-CF486534ED9E}.Debug|Win32.ActiveCfg = Debug|Win32
+ {698E030B-4BCA-4F3E-AEC4-CF486534ED9E}.Debug|Win32.Build.0 = Debug|Win32
+ {698E030B-4BCA-4F3E-AEC4-CF486534ED9E}.Release|Win32.ActiveCfg = Release|Win32
+ {698E030B-4BCA-4F3E-AEC4-CF486534ED9E}.Release|Win32.Build.0 = Release|Win32
+ {57B3C50D-A021-49B7-85A2-CA648EBB1487}.Debug|Win32.ActiveCfg = Debug|Win32
+ {57B3C50D-A021-49B7-85A2-CA648EBB1487}.Debug|Win32.Build.0 = Debug|Win32
+ {57B3C50D-A021-49B7-85A2-CA648EBB1487}.Release|Win32.ActiveCfg = Release|Win32
+ {57B3C50D-A021-49B7-85A2-CA648EBB1487}.Release|Win32.Build.0 = Release|Win32
+ EndGlobalSection
+ GlobalSection(SolutionProperties) = preSolution
+ HideSolutionNode = FALSE
+ EndGlobalSection
+EndGlobal
BIN msvc/msvc.suo
Binary file not shown.
190 msvc/msvc.vcproj
@@ -0,0 +1,190 @@
+<?xml version="1.0" encoding="Windows-1252"?>
+<VisualStudioProject
+ ProjectType="Visual C++"
+ Version="9.00"
+ Name="LIBSVM-Plus"
+ ProjectGUID="{399AA6EA-189D-4B80-BB6C-506FD5FF46E9}"
+ RootNamespace="msvc"
+ TargetFrameworkVersion="196613"
+ >
+ <Platforms>
+ <Platform
+ Name="Win32"
+ />
+ </Platforms>
+ <ToolFiles>
+ </ToolFiles>
+ <Configurations>
+ <Configuration
+ Name="Debug|Win32"
+ OutputDirectory="$(SolutionDir)$(ConfigurationName)"
+ IntermediateDirectory="$(ConfigurationName)"
+ ConfigurationType="2"
+ CharacterSet="2"
+ >
+ <Tool
+ Name="VCPreBuildEventTool"
+ />
+ <Tool
+ Name="VCCustomBuildTool"
+ />
+ <Tool
+ Name="VCXMLDataGeneratorTool"
+ />
+ <Tool
+ Name="VCWebServiceProxyGeneratorTool"
+ />
+ <Tool
+ Name="VCMIDLTool"
+ />
+ <Tool
+ Name="VCCLCompilerTool"
+ Optimization="0"
+ MinimalRebuild="true"
+ BasicRuntimeChecks="3"
+ RuntimeLibrary="3"
+ WarningLevel="3"
+ DebugInformationFormat="4"
+ />
+ <Tool
+ Name="VCManagedResourceCompilerTool"
+ />
+ <Tool
+ Name="VCResourceCompilerTool"
+ />
+ <Tool
+ Name="VCPreLinkEventTool"
+ />
+ <Tool
+ Name="VCLinkerTool"
+ OutputFile="..\win32\lib\LIBSVM-Plus.dll"
+ GenerateDebugInformation="true"
+ TargetMachine="1"
+ />
+ <Tool
+ Name="VCALinkTool"
+ />
+ <Tool
+ Name="VCManifestTool"
+ />
+ <Tool
+ Name="VCXDCMakeTool"
+ />
+ <Tool
+ Name="VCBscMakeTool"
+ />
+ <Tool
+ Name="VCFxCopTool"
+ />
+ <Tool
+ Name="VCAppVerifierTool"
+ />
+ <Tool
+ Name="VCPostBuildEventTool"
+ />
+ </Configuration>
+ <Configuration
+ Name="Release|Win32"
+ OutputDirectory="$(SolutionDir)$(ConfigurationName)"
+ IntermediateDirectory="$(ConfigurationName)"
+ ConfigurationType="2"
+ CharacterSet="2"
+ WholeProgramOptimization="1"
+ >
+ <Tool
+ Name="VCPreBuildEventTool"
+ />
+ <Tool
+ Name="VCCustomBuildTool"
+ />
+ <Tool
+ Name="VCXMLDataGeneratorTool"
+ />
+ <Tool
+ Name="VCWebServiceProxyGeneratorTool"
+ />
+ <Tool
+ Name="VCMIDLTool"
+ />
+ <Tool
+ Name="VCCLCompilerTool"
+ Optimization="2"
+ EnableIntrinsicFunctions="true"
+ RuntimeLibrary="2"
+ EnableFunctionLevelLinking="true"
+ WarningLevel="3"
+ DebugInformationFormat="3"
+ />
+ <Tool
+ Name="VCManagedResourceCompilerTool"
+ />
+ <Tool
+ Name="VCResourceCompilerTool"
+ />
+ <Tool
+ Name="VCPreLinkEventTool"
+ />
+ <Tool
+ Name="VCLinkerTool"
+ OutputFile="..\win32\lib\LIBSVM-Plus.dll"
+ GenerateDebugInformation="true"
+ OptimizeReferences="2"
+ EnableCOMDATFolding="2"
+ TargetMachine="1"
+ />
+ <Tool
+ Name="VCALinkTool"
+ />
+ <Tool
+ Name="VCManifestTool"
+ />
+ <Tool
+ Name="VCXDCMakeTool"
+ />
+ <Tool
+ Name="VCBscMakeTool"
+ />
+ <Tool
+ Name="VCFxCopTool"
+ />
+ <Tool
+ Name="VCAppVerifierTool"
+ />
+ <Tool
+ Name="VCPostBuildEventTool"
+ />
+ </Configuration>
+ </Configurations>
+ <References>
+ </References>
+ <Files>
+ <Filter
+ Name="Source Files"
+ Filter="cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx"
+ UniqueIdentifier="{4FC737F1-C7A5-4376-A066-2A32D752A2FF}"
+ >
+ <File
+ RelativePath="..\svm.cpp"
+ >
+ </File>
+ </Filter>
+ <Filter
+ Name="Header Files"
+ Filter="h;hpp;hxx;hm;inl;inc;xsd"
+ UniqueIdentifier="{93995380-89BD-4b04-88EB-625FBE52EBFB}"
+ >
+ <File
+ RelativePath="..\svm.h"
+ >
+ </File>
+ </Filter>
+ <Filter
+ Name="Resource Files"
+ Filter="rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav"
+ UniqueIdentifier="{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}"
+ >
+ </Filter>
+ </Files>
+ <Globals>
+ </Globals>
+</VisualStudioProject>
195 msvc/svm-predict/svm-predict.vcproj
@@ -0,0 +1,195 @@
+<?xml version="1.0" encoding="Windows-1252"?>
+<VisualStudioProject
+ ProjectType="Visual C++"
+ Version="9.00"
+ Name="svm-predict"
+ ProjectGUID="{6FB3C9B1-652B-465A-A0AE-C69FBAA23358}"
+ RootNamespace="svmpredict"
+ Keyword="Win32Proj"
+ TargetFrameworkVersion="196613"
+ >
+ <Platforms>
+ <Platform
+ Name="Win32"
+ />
+ </Platforms>
+ <ToolFiles>
+ </ToolFiles>
+ <Configurations>
+ <Configuration
+ Name="Debug|Win32"
+ OutputDirectory="$(SolutionDir)$(ConfigurationName)"
+ IntermediateDirectory="$(ConfigurationName)"
+ ConfigurationType="1"
+ CharacterSet="1"
+ >
+ <Tool
+ Name="VCPreBuildEventTool"
+ />
+ <Tool
+ Name="VCCustomBuildTool"
+ />
+ <Tool
+ Name="VCXMLDataGeneratorTool"
+ />
+ <Tool
+ Name="VCWebServiceProxyGeneratorTool"
+ />
+ <Tool
+ Name="VCMIDLTool"
+ />
+ <Tool
+ Name="VCCLCompilerTool"
+ Optimization="0"
+ PreprocessorDefinitions="WIN32;_DEBUG;_CONSOLE"
+ MinimalRebuild="true"
+ BasicRuntimeChecks="3"
+ RuntimeLibrary="3"
+ UsePrecompiledHeader="0"
+ WarningLevel="3"
+ DebugInformationFormat="4"
+ />
+ <Tool
+ Name="VCManagedResourceCompilerTool"
+ />
+ <Tool
+ Name="VCResourceCompilerTool"
+ />
+ <Tool
+ Name="VCPreLinkEventTool"
+ />
+ <Tool
+ Name="VCLinkerTool"
+ OutputFile="..\..\win32\bin\svm-predict.exe"
+ LinkIncremental="2"
+ GenerateDebugInformation="true"
+ SubSystem="1"
+ TargetMachine="1"
+ />
+ <Tool
+ Name="VCALinkTool"
+ />
+ <Tool
+ Name="VCManifestTool"
+ />
+ <Tool
+ Name="VCXDCMakeTool"
+ />
+ <Tool
+ Name="VCBscMakeTool"
+ />
+ <Tool
+ Name="VCFxCopTool"
+ />
+ <Tool
+ Name="VCAppVerifierTool"
+ />
+ <Tool
+ Name="VCPostBuildEventTool"
+ />
+ </Configuration>
+ <Configuration
+ Name="Release|Win32"
+ OutputDirectory="$(SolutionDir)$(ConfigurationName)"
+ IntermediateDirectory="$(ConfigurationName)"
+ ConfigurationType="1"
+ CharacterSet="1"
+ WholeProgramOptimization="1"
+ >
+ <Tool
+ Name="VCPreBuildEventTool"
+ />
+ <Tool
+ Name="VCCustomBuildTool"
+ />
+ <Tool
+ Name="VCXMLDataGeneratorTool"
+ />
+ <Tool
+ Name="VCWebServiceProxyGeneratorTool"
+ />
+ <Tool
+ Name="VCMIDLTool"
+ />
+ <Tool
+ Name="VCCLCompilerTool"
+ Optimization="2"
+ EnableIntrinsicFunctions="true"
+ PreprocessorDefinitions="WIN32;NDEBUG;_CONSOLE"
+ RuntimeLibrary="2"
+ EnableFunctionLevelLinking="true"
+ UsePrecompiledHeader="0"
+ WarningLevel="3"
+ DebugInformationFormat="3"
+ />
+ <Tool
+ Name="VCManagedResourceCompilerTool"
+ />
+ <Tool
+ Name="VCResourceCompilerTool"
+ />
+ <Tool
+ Name="VCPreLinkEventTool"
+ />
+ <Tool
+ Name="VCLinkerTool"
+ OutputFile="..\..\win32\bin\svm-predict.exe"
+ LinkIncremental="1"
+ GenerateDebugInformation="true"
+ SubSystem="1"
+ OptimizeReferences="2"
+ EnableCOMDATFolding="2"
+ TargetMachine="1"
+ />
+ <Tool
+ Name="VCALinkTool"
+ />
+ <Tool
+ Name="VCManifestTool"
+ />
+ <Tool
+ Name="VCXDCMakeTool"
+ />
+ <Tool
+ Name="VCBscMakeTool"
+ />
+ <Tool
+ Name="VCFxCopTool"
+ />
+ <Tool
+ Name="VCAppVerifierTool"
+ />
+ <Tool
+ Name="VCPostBuildEventTool"
+ />
+ </Configuration>
+ </Configurations>
+ <References>
+ </References>
+ <Files>
+ <Filter
+ Name="Source Files"
+ Filter="cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx"
+ UniqueIdentifier="{4FC737F1-C7A5-4376-A066-2A32D752A2FF}"
+ >
+ <File
+ RelativePath="..\..\svm-predict.c"
+ >
+ </File>
+ </Filter>
+ <Filter
+ Name="Header Files"
+ Filter="h;hpp;hxx;hm;inl;inc;xsd"
+ UniqueIdentifier="{93995380-89BD-4b04-88EB-625FBE52EBFB}"
+ >
+ </Filter>
+ <Filter
+ Name="Resource Files"
+ Filter="rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav"
+ UniqueIdentifier="{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}"
+ >
+ </Filter>
+ </Files>
+ <Globals>
+ </Globals>
+</VisualStudioProject>
195 msvc/svm-scale/svm-scale.vcproj
@@ -0,0 +1,195 @@
+<?xml version="1.0" encoding="Windows-1252"?>
+<VisualStudioProject
+ ProjectType="Visual C++"
+ Version="9.00"
+ Name="svm-scale"
+ ProjectGUID="{698E030B-4BCA-4F3E-AEC4-CF486534ED9E}"
+ RootNamespace="svmscale"
+ Keyword="Win32Proj"
+ TargetFrameworkVersion="196613"
+ >
+ <Platforms>
+ <Platform
+ Name="Win32"
+ />
+ </Platforms>
+ <ToolFiles>
+ </ToolFiles>
+ <Configurations>
+ <Configuration
+ Name="Debug|Win32"
+ OutputDirectory="$(SolutionDir)$(ConfigurationName)"
+ IntermediateDirectory="$(ConfigurationName)"
+ ConfigurationType="1"
+ CharacterSet="1"
+ >
+ <Tool
+ Name="VCPreBuildEventTool"
+ />
+ <Tool
+ Name="VCCustomBuildTool"
+ />
+ <Tool
+ Name="VCXMLDataGeneratorTool"
+ />
+ <Tool
+ Name="VCWebServiceProxyGeneratorTool"
+ />
+ <Tool
+ Name="VCMIDLTool"
+ />
+ <Tool
+ Name="VCCLCompilerTool"
+ Optimization="0"
+ PreprocessorDefinitions="WIN32;_DEBUG;_CONSOLE"
+ MinimalRebuild="true"
+ BasicRuntimeChecks="3"
+ RuntimeLibrary="3"
+ UsePrecompiledHeader="0"
+ WarningLevel="3"
+ DebugInformationFormat="4"
+ />
+ <Tool
+ Name="VCManagedResourceCompilerTool"
+ />
+ <Tool
+ Name="VCResourceCompilerTool"
+ />
+ <Tool
+ Name="VCPreLinkEventTool"
+ />
+ <Tool
+ Name="VCLinkerTool"
+ OutputFile="..\..\win32\bin\svm-scale.exe"
+ LinkIncremental="2"
+ GenerateDebugInformation="true"
+ SubSystem="1"
+ TargetMachine="1"
+ />
+ <Tool
+ Name="VCALinkTool"
+ />
+ <Tool
+ Name="VCManifestTool"
+ />
+ <Tool
+ Name="VCXDCMakeTool"
+ />
+ <Tool
+ Name="VCBscMakeTool"
+ />
+ <Tool
+ Name="VCFxCopTool"
+ />
+ <Tool
+ Name="VCAppVerifierTool"
+ />
+ <Tool
+ Name="VCPostBuildEventTool"
+ />
+ </Configuration>
+ <Configuration
+ Name="Release|Win32"
+ OutputDirectory="$(SolutionDir)$(ConfigurationName)"
+ IntermediateDirectory="$(ConfigurationName)"
+ ConfigurationType="1"
+ CharacterSet="1"
+ WholeProgramOptimization="1"
+ >
+ <Tool
+ Name="VCPreBuildEventTool"
+ />
+ <Tool
+ Name="VCCustomBuildTool"
+ />
+ <Tool
+ Name="VCXMLDataGeneratorTool"
+ />
+ <Tool
+ Name="VCWebServiceProxyGeneratorTool"
+ />
+ <Tool
+ Name="VCMIDLTool"
+ />
+ <Tool
+ Name="VCCLCompilerTool"
+ Optimization="2"
+ EnableIntrinsicFunctions="true"
+ PreprocessorDefinitions="WIN32;NDEBUG;_CONSOLE"
+ RuntimeLibrary="2"
+ EnableFunctionLevelLinking="true"
+ UsePrecompiledHeader="0"
+ WarningLevel="3"
+ DebugInformationFormat="3"
+ />
+ <Tool
+ Name="VCManagedResourceCompilerTool"
+ />
+ <Tool
+ Name="VCResourceCompilerTool"
+ />
+ <Tool
+ Name="VCPreLinkEventTool"
+ />
+ <Tool
+ Name="VCLinkerTool"
+ OutputFile="..\..\win32\bin\svm-scale.exe"
+ LinkIncremental="1"
+ GenerateDebugInformation="true"
+ SubSystem="1"
+ OptimizeReferences="2"
+ EnableCOMDATFolding="2"
+ TargetMachine="1"
+ />
+ <Tool
+ Name="VCALinkTool"
+ />
+ <Tool
+ Name="VCManifestTool"
+ />
+ <Tool
+ Name="VCXDCMakeTool"
+ />
+ <Tool
+ Name="VCBscMakeTool"
+ />
+ <Tool
+ Name="VCFxCopTool"
+ />
+ <Tool
+ Name="VCAppVerifierTool"
+ />
+ <Tool
+ Name="VCPostBuildEventTool"
+ />
+ </Configuration>
+ </Configurations>
+ <References>
+ </References>
+ <Files>
+ <Filter
+ Name="Source Files"
+ Filter="cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx"
+ UniqueIdentifier="{4FC737F1-C7A5-4376-A066-2A32D752A2FF}"
+ >
+ <File
+ RelativePath="..\..\svm-scale.c"
+ >
+ </File>
+ </Filter>
+ <Filter
+ Name="Header Files"
+ Filter="h;hpp;hxx;hm;inl;inc;xsd"
+ UniqueIdentifier="{93995380-89BD-4b04-88EB-625FBE52EBFB}"
+ >
+ </Filter>
+ <Filter
+ Name="Resource Files"
+ Filter="rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav"
+ UniqueIdentifier="{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}"
+ >
+ </Filter>
+ </Files>
+ <Globals>
+ </Globals>
+</VisualStudioProject>
195 msvc/svm-train/svm-train.vcproj
@@ -0,0 +1,195 @@
+<?xml version="1.0" encoding="Windows-1252"?>
+<VisualStudioProject
+ ProjectType="Visual C++"
+ Version="9.00"
+ Name="svm-train"
+ ProjectGUID="{57B3C50D-A021-49B7-85A2-CA648EBB1487}"
+ RootNamespace="svmtrain"
+ Keyword="Win32Proj"
+ TargetFrameworkVersion="196613"
+ >
+ <Platforms>
+ <Platform
+ Name="Win32"
+ />
+ </Platforms>
+ <ToolFiles>
+ </ToolFiles>
+ <Configurations>
+ <Configuration
+ Name="Debug|Win32"
+ OutputDirectory="$(SolutionDir)$(ConfigurationName)"
+ IntermediateDirectory="$(ConfigurationName)"
+ ConfigurationType="1"
+ CharacterSet="1"
+ >
+ <Tool
+ Name="VCPreBuildEventTool"
+ />
+ <Tool
+ Name="VCCustomBuildTool"
+ />
+ <Tool
+ Name="VCXMLDataGeneratorTool"
+ />
+ <Tool
+ Name="VCWebServiceProxyGeneratorTool"
+ />
+ <Tool
+ Name="VCMIDLTool"
+ />
+ <Tool
+ Name="VCCLCompilerTool"
+ Optimization="0"
+ PreprocessorDefinitions="WIN32;_DEBUG;_CONSOLE"
+ MinimalRebuild="true"
+ BasicRuntimeChecks="3"
+ RuntimeLibrary="3"
+ UsePrecompiledHeader="0"
+ WarningLevel="3"
+ DebugInformationFormat="4"
+ />
+ <Tool
+ Name="VCManagedResourceCompilerTool"
+ />
+ <Tool
+ Name="VCResourceCompilerTool"
+ />
+ <Tool
+ Name="VCPreLinkEventTool"
+ />
+ <Tool
+ Name="VCLinkerTool"
+ OutputFile="..\..\win32\bin\svm-train.exe"
+ LinkIncremental="2"
+ GenerateDebugInformation="true"
+ SubSystem="1"
+ TargetMachine="1"
+ />
+ <Tool
+ Name="VCALinkTool"
+ />
+ <Tool
+ Name="VCManifestTool"
+ />
+ <Tool
+ Name="VCXDCMakeTool"
+ />
+ <Tool
+ Name="VCBscMakeTool"
+ />
+ <Tool
+ Name="VCFxCopTool"
+ />
+ <Tool
+ Name="VCAppVerifierTool"
+ />
+ <Tool
+ Name="VCPostBuildEventTool"
+ />
+ </Configuration>
+ <Configuration
+ Name="Release|Win32"
+ OutputDirectory="$(SolutionDir)$(ConfigurationName)"
+ IntermediateDirectory="$(ConfigurationName)"
+ ConfigurationType="1"
+ CharacterSet="1"
+ WholeProgramOptimization="1"
+ >
+ <Tool
+ Name="VCPreBuildEventTool"
+ />
+ <Tool
+ Name="VCCustomBuildTool"
+ />
+ <Tool
+ Name="VCXMLDataGeneratorTool"
+ />
+ <Tool
+ Name="VCWebServiceProxyGeneratorTool"
+ />
+ <Tool
+ Name="VCMIDLTool"
+ />
+ <Tool
+ Name="VCCLCompilerTool"
+ Optimization="2"
+ EnableIntrinsicFunctions="true"
+ PreprocessorDefinitions="WIN32;NDEBUG;_CONSOLE"
+ RuntimeLibrary="2"
+ EnableFunctionLevelLinking="true"
+ UsePrecompiledHeader="0"
+ WarningLevel="3"
+ DebugInformationFormat="3"
+ />
+ <Tool
+ Name="VCManagedResourceCompilerTool"
+ />
+ <Tool
+ Name="VCResourceCompilerTool"
+ />
+ <Tool
+ Name="VCPreLinkEventTool"
+ />
+ <Tool
+ Name="VCLinkerTool"
+ OutputFile="..\..\win32\bin\svm-train.exe"
+ LinkIncremental="1"
+ GenerateDebugInformation="true"
+ SubSystem="1"
+ OptimizeReferences="2"
+ EnableCOMDATFolding="2"
+ TargetMachine="1"
+ />
+ <Tool
+ Name="VCALinkTool"
+ />
+ <Tool
+ Name="VCManifestTool"
+ />
+ <Tool
+ Name="VCXDCMakeTool"
+ />
+ <Tool
+ Name="VCBscMakeTool"
+ />
+ <Tool
+ Name="VCFxCopTool"
+ />
+ <Tool
+ Name="VCAppVerifierTool"
+ />
+ <Tool
+ Name="VCPostBuildEventTool"
+ />
+ </Configuration>
+ </Configurations>
+ <References>
+ </References>
+ <Files>
+ <Filter
+ Name="Source Files"
+ Filter="cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx"
+ UniqueIdentifier="{4FC737F1-C7A5-4376-A066-2A32D752A2FF}"
+ >
+ <File
+ RelativePath="..\..\svm-train.c"
+ >
+ </File>
+ </Filter>
+ <Filter
+ Name="Header Files"
+ Filter="h;hpp;hxx;hm;inl;inc;xsd"
+ UniqueIdentifier="{93995380-89BD-4b04-88EB-625FBE52EBFB}"
+ >
+ </Filter>
+ <Filter
+ Name="Resource Files"
+ Filter="rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav"
+ UniqueIdentifier="{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}"
+ >
+ </Filter>
+ </Files>
+ <Globals>
+ </Globals>
+</VisualStudioProject>
147 py-tools/README
@@ -0,0 +1,147 @@
+This directory includes some useful codes:
+
+1. subset selection tools.
+2. parameter selection tools.
+3. LIBSVM format checking tools
+
+Part I: Subset selection tools
+
+Introduction
+============
+
+Training large data is time consuming. Sometimes one should work on a
+smaller subset first. The python script subset.py randomly selects a
+specified number of samples. For classification data, we provide a
+stratified selection to ensure the same class distribution in the
+subset.
+
+Usage: subset.py [options] dataset number [output1] [output2]
+
+This script selects a subset of the given data set.
+
+options:
+-s method : method of selection (default 0)
+ 0 -- stratified selection (classification only)
+ 1 -- random selection
+
+output1 : the subset (optional)
+output2 : the rest of data (optional)
+
+If output1 is omitted, the subset will be printed on the screen.
+
+Example
+=======
+
+> python subset.py heart_scale 100 file1 file2
+
+From heart_scale 100 samples are randomly selected and stored in
+file1. All remaining instances are stored in file2.
+
+
+Part II: Parameter Selection Tools
+
+Introduction
+============
+
+grid.py is a parameter selection tool for C-SVM classification using
+the RBF (radial basis function) kernel. It uses cross validation (CV)
+technique to estimate the accuracy of each parameter combination in
+the specified range and helps you to decide the best parameters for
+your problem.
+
+grid.py directly executes libsvm binaries (so no python binding is needed)
+for cross validation and then draws a contour of the CV accuracy using gnuplot.
+You must have libsvm and gnuplot installed before using it. The package
+gnuplot is available at http://www.gnuplot.info/
+
+On Mac OS X, the precompiled gnuplot file needs the library AquaTerm,
+which thus must be installed as well. In addition, this version of
+gnuplot does not support png, so you need to change "set term png
+transparent small" and use other image formats. For example, you may
+have "set term pbm small color".
+
+Usage: grid.py [-log2c begin,end,step] [-log2g begin,end,step] [-v fold]
+ [-svmtrain pathname] [-gnuplot pathname] [-out pathname] [-png pathname]
+ [additional parameters for svm-train] dataset
+
+The program conducts v-fold cross validation using parameter C (and gamma)
+= 2^begin, 2^(begin+step), ..., 2^end.
+
+You can specify where the libsvm executable and gnuplot are using the
+-svmtrain and -gnuplot parameters.
+
+For Windows users, please use pgnuplot.exe. If you are using gnuplot
+3.7.1, please upgrade to version 3.7.3 or higher. The version 3.7.1
+has a bug. If you use Cygwin on Windows, please use gnuplot-x11.
+
+Example
+=======
+
+> python grid.py -log2c -5,5,1 -log2g -4,0,1 -v 5 -m 300 heart_scale
+
+Users (in particular MS Windows users) may need to specify the path of
+executable files. You can either change paths in the beginning of
+grid.py or specify them in the command line. For example,
+
+> grid.py -log2c -5,5,1 -svmtrain c:\libsvm\windows\svm-train.exe -gnuplot c:\tmp\gnuplot\bin\pgnuplot.exe -v 10 heart_scale
+
+Output: two files
+dataset.png: the CV accuracy contour plot generated by gnuplot
+dataset.out: the CV accuracy at each (log2(C),log2(gamma))
+
+Parallel grid search
+====================
+
+You can conduct a parallel grid search by dispatching jobs to a
+cluster of computers which share the same file system. First, you add
+machine names in grid.py:
+
+ssh_workers = ["linux1", "linux5", "linux5"]
+
+and then set up ssh so that the authentication works without
+asking for a password.
+
+The same machine (e.g., linux5 here) can be listed more than once if
+it has multiple CPUs or has more RAM. If the local machine is the
+best, you can also enlarge the nr_local_worker. For example:
+
+nr_local_worker = 2
+
+Example:
+
+> python grid.py heart_scale
+[local] -1 -1 78.8889 (best c=0.5, g=0.5, rate=78.8889)
+[linux5] -1 -7 83.3333 (best c=0.5, g=0.0078125, rate=83.3333)
+[linux5] 5 -1 77.037 (best c=0.5, g=0.0078125, rate=83.3333)
+[linux1] 5 -7 83.3333 (best c=0.5, g=0.0078125, rate=83.3333)
+.
+.
+.
+
+If -log2c, -log2g, or -v is not specified, default values are used.
+
+If your system uses telnet instead of ssh, you list the computer names
+in telnet_workers.
+
+Part III: LIBSVM format checking tools
+
+Introduction
+============
+
+`svm-train' conducts only a simple check of the input data. To do a
+detailed check, we provide a python script `checkdata.py.'
+
+Usage: checkdata.py dataset
+
+This tool is written by Rong-En Fan at National Taiwan University.
+
+Example
+=======
+
+> cat bad_data
+1 3:1 2:4
+> python checkdata.py bad_data
+line 1: feature indices must be in an ascending order, previous/current features 3:1 2:4
+Found 1 lines with error.
+
+
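
Review note on py-tools/README, Part II: the usage text says the search runs over C (and gamma) = 2^begin, 2^(begin+step), ..., 2^end. A minimal sketch of that expansion follows, assuming both ends are inclusive as the README's formula suggests. The names `range_f` and `parameter_grid` are hypothetical, and the order in which grid.py actually visits the combinations may differ from the plain nested order used here.

```python
def range_f(begin, end, step):
    """Return begin, begin+step, ... up to and including end.

    A negative step counts downwards; a zero step is rejected.
    """
    if step == 0:
        raise ValueError("step must be non-zero")
    values = []
    value = begin
    while (step > 0 and value <= end) or (step < 0 and value >= end):
        values.append(value)
        value += step
    return values


def parameter_grid(log2c, log2g):
    """Expand two (begin, end, step) triples into (C, gamma) pairs."""
    return [(2.0 ** c, 2.0 ** g)
            for c in range_f(*log2c)
            for g in range_f(*log2g)]


# The ranges from the README example: -log2c -5,5,1 -log2g -4,0,1
grid = parameter_grid((-5, 5, 1), (-4, 0, 1))
print(len(grid))
```

With those ranges there are 11 exponents for C and 5 for gamma, so each cross validation fold setting is evaluated on 55 parameter combinations.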
27 py-tools/README.plus
@@ -0,0 +1,27 @@
+This directory includes some useful codes:
+
+1. subset selection tools.
+2. parameter selection tools.
+3. LIBSVM format checking tools
+4. a converter from sparse LIBSVM file format to a classic "dense" file format
+5. a converter from a classic "dense" file format to the sparse LIBSVM file format
+
+The first three tools are inherited from the official LIBSVM distribution, with
+differences in the parameter selection tools:
+
+ a. easy.py now accepts a third parameter (an integer) which selects the
+ kernel to use in the process
+ b. grid.py explicitly handles the kernel given as input through the switch "-t <kernel_number>"
+ c. neither of the above scripts requires gnuplot anymore:
+ the scripts test for gnuplot existence and if it is not installed, they simply
+ do not use it.
+
+The tools 4 and 5 were originally developed by Hsuan-Tien Lin and I got them from
+
+ http://www.work.caltech.edu/~htlin/program/libsvm/#dense
+
+and included as they are.
+
+
+--
+Vincenzo Russo
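
Review note on py-tools/README.plus, point c: the text says the scripts test for gnuplot and skip plotting when it is missing, but the detection itself is not visible in this part of the diff. One possible way to do such a check in Python 3 is sketched below. `find_gnuplot` and `plot_if_possible` are hypothetical helpers, not functions from easy.py or grid.py.

```python
import os
import shutil


def find_gnuplot(candidate=None):
    """Return a usable gnuplot path, or None when gnuplot is unavailable.

    `candidate` is an optional explicit path (for example one given with
    a -gnuplot switch); otherwise the PATH is searched.
    """
    if candidate and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return shutil.which("gnuplot")


def plot_if_possible(gnuplot_path, draw):
    """Call draw(gnuplot_path) only when gnuplot was found.

    Returns True if plotting ran and False if it was skipped.
    """
    if gnuplot_path is None:
        return False
    draw(gnuplot_path)
    return True
```

The point of returning None and testing for it is that a missing gnuplot degrades to "no contour plot" and does not abort the parameter search.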
106 py-tools/checkdata.py
@@ -0,0 +1,106 @@
+#!/usr/bin/env python
+
+#
+# A format checker for LIBSVM
+#
+
+#
+# Copyright (c) 2007, Rong-En Fan
+#
+# All rights reserved.
+#
+# This program is distributed under the same license of the LIBSVM package.
+#
+
+from sys import argv, exit
+import os.path
+
+def err(line_no, msg):
+    print("line %d: %s" % (line_no, msg))
+
+# works like float() but does not accept nan and inf
+def my_float(x):
+    if x.lower().find("nan") != -1 or x.lower().find("inf") != -1:
+        raise ValueError
+
+    return float(x)
+
+def main():
+    if len(argv) != 2:
+        print("Usage: %s dataset" % (argv[0]))
+        exit(1)
+
+    dataset = argv[1]
+
+    if not os.path.exists(dataset):
+        print("dataset %s not found" % (dataset))
+        exit(1)
+
+    line_no = 1
+    error_line_count = 0
+    for line in open(dataset, 'r'):
+        line_error = False
+
+        # each line must end with a newline character
+        if line[-1] != '\n':
+            err(line_no, "missing a newline character in the end")
+            line_error = True
+
+        nodes = line.split()
+
+        # check label
+        try:
+            label = nodes.pop(0)
+
+            if label.find(',') != -1:
+                # multi-label format
+                try:
+                    for l in label.split(','):
+                        l = my_float(l)
+                except:
+                    err(line_no, "label %s is not a valid multi-label form" % label)
+                    line_error = True
+            else:
+                try:
+                    label = my_float(label)
+                except:
+                    err(line_no, "label %s is not a number" % label)
+                    line_error = True
+        except:
+            err(line_no, "missing label, perhaps an empty line?")
+            line_error = True
+
+        # check features
+        prev_index = -1
+        for i in range(len(nodes)):
+            try:
+                (index, value) = nodes[i].split(':')
+
+                index = int(index)
+                value = my_float(value)
+
+                # precomputed kernel's index starts from 0 and LIBSVM
+                # checks it. Hence, don't treat index 0 as an error.
+                if index < 0:
+                    err(line_no, "feature index must be positive; wrong feature %s" % nodes[i])
+                    line_error = True
+                elif index < prev_index:
+                    err(line_no, "feature indices must be in an ascending order, previous/current features %s %s" % (nodes[i-1], nodes[i]))
+                    line_error = True
+                prev_index = index
+            except:
+                err(line_no, "feature '%s' not an <index>:<value> pair, <index> integer, <value> real number " % nodes[i])
+                line_error = True
+
+        line_no += 1
+
+        if line_error:
+            error_line_count += 1
+
+    if error_line_count > 0:
+        print("Found %d lines with error." % (error_line_count))
+    else:
+        print("No error.")
+
+main()
+
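
Review note on py-tools/checkdata.py: the checks are tied to printing and to reading a file, which makes them awkward to reuse or test. Below is a hedged Python 3 sketch of the same per-line rules (numeric label or comma-separated labels, `<index>:<value>` pairs, non-negative indices that do not decrease) as a function that returns messages. `check_line` is a hypothetical name, the message wording is simplified, and the trailing-newline check of the script is left out.

```python
def my_float(text):
    """Like float(), but reject nan and inf as the script does."""
    lowered = text.lower()
    if "nan" in lowered or "inf" in lowered:
        raise ValueError(text)
    return float(text)


def check_line(line):
    """Return a list of error messages for one LIBSVM-format line."""
    nodes = line.split()
    if not nodes:
        return ["missing label, perhaps an empty line?"]

    errors = []
    label = nodes.pop(0)
    try:
        # A single label and a comma-separated multi-label are both
        # accepted as long as every part is a finite number.
        for part in label.split(","):
            my_float(part)
    except ValueError:
        errors.append("label %s is not a number" % label)

    prev_index = -1
    prev_node = None
    for node in nodes:
        try:
            index_text, value_text = node.split(":")
            index = int(index_text)
            my_float(value_text)
        except ValueError:
            errors.append("feature '%s' is not an <index>:<value> pair" % node)
            continue
        if index < 0:
            errors.append("feature index must not be negative: %s" % node)
        elif index < prev_index:
            errors.append("feature indices must be in ascending order: %s %s"
                          % (prev_node, node))
        prev_index = index
        prev_node = node
    return errors


print(check_line("1 3:1 2:4\n"))
```

Applied to the `bad_data` line from py-tools/README, the sketch reports one ordering error for the pair 3:1 2:4, matching the kind of message the script prints.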
19 py-tools/dense2sparse.py
@@ -0,0 +1,19 @@
+#!/usr/local/bin/python2.0
+
+import os, sys
+
+from string import *
+
+argv=sys.argv
+argc=len(argv)
+
+raw = map(split, open(argv[1]).readlines())
+
+for line in raw:
+    print line[-1],
+    m=1
+    for token in line[:-1]:
+        if atof(token) != 0:
+            print "%d:%s"%(m,token),
+        m=m+1
+    print
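
Review note on py-tools/dense2sparse.py: the script relies on Python 2 only features (`from string import *` for `split` and `atof`, and print statements), so it will not run under Python 3. The conversion rule itself is small: the last column is the label, the remaining columns are features numbered from 1, and zero-valued features are dropped. A Python 3 sketch of that rule follows; `dense_to_sparse` is a hypothetical name, and returning an empty string for a blank line is an assumption, since the original script does not handle that case.

```python
def dense_to_sparse(line):
    """Convert one dense line ('f1 f2 ... fn label') to LIBSVM sparse format."""
    tokens = line.split()
    if not tokens:
        # Assumption: blank lines map to an empty string.
        return ""
    label, features = tokens[-1], tokens[:-1]
    # Feature indices start at 1; features equal to zero are omitted.
    pairs = ["%d:%s" % (index, token)
             for index, token in enumerate(features, start=1)
             if float(token) != 0]
    return " ".join([label] + pairs)


print(dense_to_sparse("0 1.5 0 2 +1"))
```

The feature values are emitted as the original text tokens, as in the script, so no precision is lost by reformatting the numbers.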
82 py-tools/easy.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+import sys
+import os
+from subprocess import *
+
+if len(sys.argv) <= 1:
+    print('Usage: %s training_file [testing_file]' % sys.argv[0])
+    raise SystemExit
+
+# svm, grid, and gnuplot executable files
+
+is_win32 = (sys.platform == 'win32')
+if not is_win32:
+    svmscale_exe = "../svm-scale"
+    svmtrain_exe = "../svm-train"
+    svmpredict_exe = "../svm-predict"
+    grid_py = "./grid.py"
+    gnuplot_exe = "/usr/bin/gnuplot"
+else:
+    # example for windows
+    svmscale_exe = r"..\windows\svm-scale.exe"
+    svmtrain_exe = r"..\windows\svm-train.exe"
+    svmpredict_exe = r"..\windows\svm-predict.exe"
+    gnuplot_exe = r"c:\tmp\gnuplot\bin\pgnuplot.exe"
+    grid_py = r".\grid.py"
+
+assert os.path.exists(svmscale_exe),"svm-scale executable not found"
+assert os.path.exists(svmtrain_exe),"svm-train executable not found"
+assert os.path.exists(svmpredict_exe),"svm-predict executable not found"
+
+# gnuplot is not necessary for the process
+# assert os.path.exists(gnuplot_exe),"gnuplot executable not found"
+
+assert os.path.exists(grid_py),"grid.py not found"
+
+train_pathname = sys.argv[1]
+assert os.path.exists(train_pathname),"training file not found"
+file_name = os.path.split(train_pathname)[1]
+scaled_file = file_name + ".scale"
+model_file = file_name + ".model"
+range_file = file_name + ".range"
+
+if len(sys.argv) > 2:
+ test_pathname = sys.argv[2]
+ file_name = os.path.split(test_pathname)[1]
+ assert os.path.exists(test_pathname),"testing file not found"
+ scaled_test_file = file_name + ".scale"
+ predict_test_file = file_name + ".predict"
+
+cmd = '%s -s "%s" "%s" > "%s"' % (svmscale_exe, range_file, train_pathname, scaled_file)
+print('Scaling training data...')
+call(cmd, shell = True)
+
+cmd = '%s -svmtrain "%s" -gnuplot "%s" "%s"' % (grid_py, svmtrain_exe, gnuplot_exe, scaled_file)
+print('Cross validation...')
+f = Popen(cmd, shell = True, stdout = PIPE).stdout
+
+line = ''
+while True:
+ last_line = line
+ line = f.readline()
+ if not line: break
+c,g,rate = map(float,last_line.split())
+
+print('Best c=%s, g=%s CV rate=%s' % (c,g,rate))
+
+cmd = '%s -c %s -g %s "%s" "%s"' % (svmtrain_exe,c,g,scaled_file,model_file)
+print('Training...')
+Popen(cmd, shell = True, stdout = PIPE).communicate()
+
+print('Output model: %s' % model_file)
+if len(sys.argv) > 2:
+ cmd = '%s -r "%s" "%s" > "%s"' % (svmscale_exe, range_file, test_pathname, scaled_test_file)
+ print('Scaling testing data...')
+ Popen(cmd, shell = True, stdout = PIPE).communicate()
+
+ cmd = '%s "%s" "%s" "%s"' % (svmpredict_exe, scaled_test_file, model_file, predict_test_file)
+ print('Testing...')
+ Popen(cmd, shell = True).communicate()
+
+ print('Output prediction: %s' % predict_test_file)
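easy.py takes the best parameters from the last line that grid.py writes to stdout, which is expected to hold three numbers: C, gamma and the cross-validation rate. A small sketch of that parsing step follows; the helper name `parse_best` is an assumption, and unlike the loop above it also skips trailing blank lines.

```python
def parse_best(output):
    """Extract (c, g, rate) from the last non-empty line of grid.py output."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("grid search produced no output")
    c, g, rate = map(float, lines[-1].split())
    return c, g, rate
```

Raising an explicit error on empty output makes a failed grid search easier to diagnose than an unpacking error on an empty string.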
371 py-tools/grid.py
@@ -0,0 +1,371 @@
+#!/usr/bin/env python
+
+
+
+import os, sys, traceback
+import getpass
+from threading import Thread
+from subprocess import *
+
+if(sys.hexversion < 0x03000000):
+ import Queue
+else:
+ import queue as Queue
+
+
+# svmtrain and gnuplot executable
+
+is_win32 = (sys.platform == 'win32')
+if not is_win32:
+ svmtrain_exe = "../svm-train"
+ gnuplot_exe = "/usr/bin/gnuplot"
+else:
+ # example for windows
+ svmtrain_exe = r"..\windows\svm-train.exe"
+ gnuplot_exe = r"c:\tmp\gnuplot\bin\pgnuplot.exe"
+
+# global parameters and their default values
+
+fold = 5
+c_begin, c_end, c_step = -5, 15, 2
+g_begin, g_end, g_step = 3, -15, -2
+global dataset_pathname, dataset_title, pass_through_string
+global out_filename, png_filename
+
+# experimental
+
+telnet_workers = []
+ssh_workers = []
+nr_local_worker = 1
+
+# process command line options, set global parameters
+def process_options(argv=sys.argv):
+
+ global fold
+ global c_begin, c_end, c_step
+ global g_begin, g_end, g_step
+ global dataset_pathname, dataset_title, pass_through_string
+ global svmtrain_exe, gnuplot_exe, gnuplot, out_filename, png_filename
+
+ usage = """\
+Usage: grid.py [-log2c begin,end,step] [-log2g begin,end,step] [-v fold]
+[-svmtrain pathname] [-gnuplot pathname] [-out pathname] [-png pathname]
+[additional parameters for svm-train] dataset"""
+
+ if len(argv) < 2:
+ print(usage)
+ sys.exit(1)
+
+ dataset_pathname = argv[-1]
+ dataset_title = os.path.split(dataset_pathname)[1]
+ out_filename = '%s.out' % dataset_title
+ png_filename = '%s.png' % dataset_title
+ pass_through_options = []
+
+ i = 1
+ while i < len(argv) - 1:
+ if argv[i] == "-log2c":
+ i = i + 1
+ (c_begin,c_end,c_step) = map(float,argv[i].split(","))
+ elif argv[i] == "-log2g":
+ i = i + 1
+ (g_begin,g_end,g_step) = map(float,argv[i].split(","))
+ elif argv[i] == "-v":
+ i = i + 1
+ fold = argv[i]
+ elif argv[i] in ('-c','-g'):
+ print("Options -c and -g have been renamed to -log2c and -log2g.")
+ print(usage)
+ sys.exit(1)
+ elif argv[i] == '-svmtrain':
+ i = i + 1
+ svmtrain_exe = argv[i]
+ elif argv[i] == '-gnuplot':
+ i = i + 1
+ gnuplot_exe = argv[i]
+ elif argv[i] == '-out':
+ i = i + 1
+ out_filename = argv[i]
+ elif argv[i] == '-png':
+ i = i + 1
+ png_filename = argv[i]
+ else:
+ pass_through_options.append(argv[i])
+ i = i + 1
+
+ pass_through_string = " ".join(pass_through_options)
+ assert os.path.exists(svmtrain_exe),"svm-train executable not found"
+
+ # gnuplot is not necessary for the process
+ #assert os.path.exists(gnuplot_exe),"gnuplot executable not found"
+
+ assert os.path.exists(dataset_pathname),"dataset not found"
+ gnuplot = None
+ if os.path.exists(gnuplot_exe):
+ gnuplot = Popen(gnuplot_exe,stdin = PIPE).stdin
+
+
+def range_f(begin,end,step):
+ # like range, but also works with non-integer values; end is inclusive
+ seq = []
+ while True:
+ if step > 0 and begin > end: break
+ if step < 0 and begin < end: break
+ seq.append(begin)
+ begin = begin + step
+ return seq
+
+def permute_sequence(seq):
+ n = len(seq)
+ if n <= 1: return seq
+
+ mid = int(n/2)
+ left = permute_sequence(seq[:mid])
+ right = permute_sequence(seq[mid+1:])
+
+ ret = [seq[mid]]
+ while left or right:
+ if left: ret.append(left.pop(0))
+ if right: ret.append(right.pop(0))
+
+ return ret
+
+def redraw(db,best_param,tofile=False):
+ if len(db) == 0: return
+ if gnuplot is None: return # nothing to draw when gnuplot is unavailable
+ begin_level = round(max(x[2] for x in db)) - 3
+ step_size = 0.5
+
+ best_log2c,best_log2g,best_rate = best_param
+
+ if tofile:
+ if gnuplot != None:
+ gnuplot.write("set term png transparent small\n")
+ gnuplot.write("set output \"%s\"\n" % png_filename.replace('\\','\\\\'))
+ #gnuplot.write("set term postscript color solid\n")
+ #gnuplot.write("set output \"%s.ps\"\n" % dataset_title)
+ elif is_win32:
+ if gnuplot != None:
+ gnuplot.write("set term windows\n")
+ else:
+ if gnuplot != None:
+ gnuplot.write("set term x11\n")
+ gnuplot.write("set xlabel \"log2(C)\"\n")
+ gnuplot.write("set ylabel \"log2(gamma)\"\n")
+ gnuplot.write("set xrange [%s:%s]\n" % (c_begin,c_end))
+ gnuplot.write("set yrange [%s:%s]\n" % (g_begin,g_end))
+ gnuplot.write("set contour\n")
+ gnuplot.write("set cntrparam levels incremental %s,%s,100\n" % (begin_level,step_size))
+ gnuplot.write("unset surface\n")
+ gnuplot.write("unset ztics\n")
+ gnuplot.write("set view 0,0\n")
+ gnuplot.write("set title \"%s\"\n" % dataset_title)
+ gnuplot.write("unset label\n")
+ gnuplot.write("set label \"Best log2(C) = %s log2(gamma) = %s accuracy = %s%%\" \
+ at screen 0.5,0.85 center\n" % \
+ (best_log2c, best_log2g, best_rate))
+ gnuplot.write("set label \"C = %s gamma = %s\""
+ " at screen 0.5,0.8 center\n" % (2**best_log2c, 2**best_log2g))
+ gnuplot.write("splot \"-\" with lines\n")
+ # sort by ascending log2(C), then descending log2(gamma);
+ # a key function works on both Python 2 and Python 3
+ db.sort(key = lambda x: (x[0], -x[1]))
+ prevc = db[0][0]
+ for line in db:
+ if prevc != line[0]:
+ prevc = line[0]
+ if gnuplot != None:
+ gnuplot.write("\n")
+ if gnuplot != None:
+ gnuplot.write("%s %s %s\n" % line)
+ if gnuplot != None:
+ gnuplot.write("e\n")
+ gnuplot.write("\n") # force gnuplot back to prompt when term set failure
+ gnuplot.flush()
+
+
+def calculate_jobs():
+ c_seq = permute_sequence(range_f(c_begin,c_end,c_step))
+ g_seq = permute_sequence(range_f(g_begin,g_end,g_step))
+ nr_c = float(len(c_seq))
+ nr_g = float(len(g_seq))
+ i = 0
+ j = 0
+ jobs = []
+
+ while i < nr_c or j < nr_g:
+ if i/nr_c < j/nr_g:
+ # increase C resolution
+ line = []
+ for k in range(0,j):
+ line.append((c_seq[i],g_seq[k]))
+ i = i + 1
+ jobs.append(line)
+ else:
+ # increase g resolution
+ line = []
+ for k in range(0,i):
+ line.append((c_seq[k],g_seq[j]))
+ j = j + 1
+ jobs.append(line)
+ return jobs
+
+class WorkerStopToken: # used to notify the worker to stop
+ pass
+
+class Worker(Thread):
+ def __init__(self,name,job_queue,result_queue):
+ Thread.__init__(self)
+ self.name = name
+ self.job_queue = job_queue
+ self.result_queue = result_queue
+ def run(self):
+ while True:
+ (cexp,gexp) = self.job_queue.get()
+ if cexp is WorkerStopToken:
+ self.job_queue.put((cexp,gexp))
+ # print 'worker %s stop.' % self.name
+ break
+ try:
+ rate = self.run_one(2.0**cexp,2.0**gexp)
+ if rate is None: raise RuntimeError("get no rate")
+ except:
+ # we failed, let others do that and we just quit
+
+ traceback.print_exception(sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2])
+
+ self.job_queue.put((cexp,gexp))
+ print('worker %s quit.' % self.name)
+ break
+ else:
+ self.result_queue.put((self.name,cexp,gexp,rate))
+
+class LocalWorker(Worker):
+ def run_one(self,c,g):
+ cmdline = '%s -c %s -g %s -v %s %s %s' % \
+ (svmtrain_exe,c,g,fold,pass_through_string,dataset_pathname)
+ result = Popen(cmdline,shell=True,stdout=PIPE).stdout
+ for line in result.readlines():
+ if str(line).find("Cross") != -1:
+ return float(line.split()[-1][0:-1])
+
+class SSHWorker(Worker):
+ def __init__(self,name,job_queue,result_queue,host):
+ Worker.__init__(self,name,job_queue,result_queue)
+ self.host = host
+ self.cwd = os.getcwd()
+ def run_one(self,c,g):
+ cmdline = 'ssh -x %s "cd %s; %s -c %s -g %s -v %s %s %s"' % \
+ (self.host,self.cwd,
+ svmtrain_exe,c,g,fold,pass_through_string,dataset_pathname)
+ result = Popen(cmdline,shell=True,stdout=PIPE).stdout
+ for line in result.readlines():
+ if str(line).find("Cross") != -1:
+ return float(line.split()[-1][0:-1])
+
+class TelnetWorker(Worker):
+ def __init__(self,name,job_queue,result_queue,host,username,password):
+ Worker.__init__(self,name,job_queue,result_queue)
+ self.host = host
+ self.username = username
+ self.password = password
+ def run(self):
+ import telnetlib
+ self.tn = tn = telnetlib.Telnet(self.host)
+ tn.read_until("login: ")
+ tn.write(self.username + "\n")
+ tn.read_until("Password: ")
+ tn.write(self.password + "\n")
+
+ # XXX: how to know whether login is successful?
+ tn.read_until(self.username)
+ #
+ print('login ok', self.host)
+ tn.write("cd "+os.getcwd()+"\n")
+ Worker.run(self)
+ tn.write("exit\n")
+ def run_one(self,c,g):
+ cmdline = '%s -c %s -g %s -v %s %s %s' % \
+ (svmtrain_exe,c,g,fold,pass_through_string,dataset_pathname)
+ result = self.tn.write(cmdline+'\n')
+ (idx,matchm,output) = self.tn.expect(['Cross.*\n'])
+ for line in output.split('\n'):
+ if str(line).find("Cross") != -1:
+ return float(line.split()[-1][0:-1])
+
+def main():
+
+ # set parameters
+
+ process_options()
+
+ # put jobs in queue
+
+ jobs = calculate_jobs()
+ job_queue = Queue.Queue(0)
+ result_queue = Queue.Queue(0)
+
+ for line in jobs:
+ for (c,g) in line:
+ job_queue.put((c,g))
+
+ job_queue._put = job_queue.queue.appendleft
+
+
+ # fire telnet workers
+
+ if telnet_workers:
+ nr_telnet_worker = len(telnet_workers)
+ username = getpass.getuser()
+ password = getpass.getpass()
+ for host in telnet_workers:
+ TelnetWorker(host,job_queue,result_queue,
+ host,username,password).start()
+
+ # fire ssh workers
+
+ if ssh_workers:
+ for host in ssh_workers:
+ SSHWorker(host,job_queue,result_queue,host).start()
+
+ # fire local workers
+
+ for i in range(nr_local_worker):
+ LocalWorker('local',job_queue,result_queue).start()
+
+ # gather results
+
+ done_jobs = {}
+
+
+ result_file = open(out_filename, 'w')
+
+
+ db = []
+ best_rate = -1
+ best_c1,best_g1 = None,None
+
+ for line in jobs:
+ for (c,g) in line:
+ while (c, g) not in done_jobs:
+ (worker,c1,g1,rate) = result_queue.get()
+ done_jobs[(c1,g1)] = rate
+ result_file.write('%s %s %s\n' %(c1,g1,rate))
+ result_file.flush()
+ if (rate > best_rate) or (rate==best_rate and g1==best_g1 and c1<best_c1):
+ best_rate = rate
+ best_c1,best_g1=c1,g1
+ best_c = 2.0**c1
+ best_g = 2.0**g1
+ print("[%s] %s %s %s (best c=%s, g=%s, rate=%s)" % \
+ (worker,c1,g1,rate, best_c, best_g, best_rate))
+ db.append((c,g,done_jobs[(c,g)]))
+ redraw(db,[best_c1, best_g1, best_rate])
+ redraw(db,[best_c1, best_g1, best_rate],True)
+
+ job_queue.put((WorkerStopToken,None))
+ print("%s %s %s" % (best_c, best_g, best_rate))
+main()
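The order in which grid.py visits the parameter grid follows from `range_f` and `permute_sequence`: the middle of each range is tried first, then the midpoints of the two halves, and so on, so that coarse coverage of the grid is available early. A standalone sketch of the two helpers, re-indented and using integer division, is:

```python
def range_f(begin, end, step):
    """Like range(), but inclusive of end and usable with floats."""
    seq = []
    while True:
        if step > 0 and begin > end:
            break
        if step < 0 and begin < end:
            break
        seq.append(begin)
        begin = begin + step
    return seq


def permute_sequence(seq):
    """Reorder seq so the middle element comes first, then recurse on halves."""
    n = len(seq)
    if n <= 1:
        return seq
    mid = n // 2
    left = permute_sequence(seq[:mid])
    right = permute_sequence(seq[mid + 1:])
    ret = [seq[mid]]
    # Interleave the two already permuted halves.
    while left or right:
        if left:
            ret.append(left.pop(0))
        if right:
            ret.append(right.pop(0))
    return ret


if __name__ == "__main__":
    # Default log2(C) range from the script: -5 to 15 in steps of 2.
    print(permute_sequence(range_f(-5, 15, 2)))
```

With the default log2(C) range, the first value tried is 5, the middle of the 11 candidates.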
35 py-tools/sparse2dense.py
@@ -0,0 +1,35 @@
+#!/usr/local/bin/python2.0
+
+import os, sys
+
+from string import *
+
+argv=sys.argv
+argc=len(argv)
+
+raw = map(split, open(argv[1]).readlines())
+
+m=-1
+data=[]
+for line in raw:
+ dline = [line[0]]
+ begin=1
+ for token in line[1:]:
+ both=split(token, ":")
+ next=atoi(both[0])
+ if next>m:
+ m=next
+ for i in range(begin, next):
+ dline.append("0")
+ dline.append(both[1])
+ begin=next+1
+ data.append(dline)
+
+for dline in data:
+ for token in dline[1:]:
+ print token,
+ for i in range(len(dline), m+1):
+ print 0,
+ print dline[0]
+
+
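sparse2dense.py is also written for Python 2. A Python 3 sketch of the inverse conversion is shown below. It uses a dictionary per row rather than the script's running `begin` counter, so it is a different technique and does not require ascending indices; the function name `sparse_to_dense` is hypothetical.

```python
def sparse_to_dense(lines):
    """Convert 'label idx:val ...' rows to dense 'v1 ... vm label' rows."""
    rows = []
    max_index = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        values = {}
        for token in tokens[1:]:
            index_str, value = token.split(":")
            index = int(index_str)
            values[index] = value
            max_index = max(max_index, index)
        rows.append((tokens[0], values))

    dense = []
    for label, values in rows:
        # Missing features become "0"; the label goes last, as in the script.
        cells = [values.get(i, "0") for i in range(1, max_index + 1)]
        dense.append(" ".join(cells + [label]))
    return dense
```

As in the script, every row is padded to the largest feature index found anywhere in the file.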
146 py-tools/subset.py
@@ -0,0 +1,146 @@
+#!/usr/bin/env python
+from sys import argv, exit, stdout, stderr
+from random import randint
+
+method = 0
+global n
+global dataset_filename
+subset_filename = ""
+rest_filename = ""
+
+def exit_with_help():
+ print("""\
+Usage: %s [options] dataset number [output1] [output2]
+
+This script selects a subset of the given dataset.
+
+options:
+-s method : method of selection (default 0)
+ 0 -- stratified selection (classification only)
+ 1 -- random selection
+
+output1 : the subset (optional)
+output2 : rest of the data (optional)
+If output1 is omitted, the subset will be printed on the screen.""" % argv[0])
+ exit(1)
+
+def process_options():
+ global method, n
+ global dataset_filename, subset_filename, rest_filename
+
+ argc = len(argv)
+ if argc < 3:
+ exit_with_help()
+
+ i = 1
+ while i < len(argv):
+ if argv[i][0] != "-":
+ break
+ if argv[i] == "-s":
+ i = i + 1
+ method = int(argv[i])
+ if method < 0 or method > 1:
+ print("Unknown selection method %d" % (method))
+ exit_with_help()
+ i = i + 1
+
+ dataset_filename = argv[i]
+ n = int(argv[i+1])
+ if i+2 < argc:
+ subset_filename = argv[i+2]
+ if i+3 < argc:
+ rest_filename = argv[i+3]
+
+def main():
+ class Label:
+ def __init__(self, label, index, selected):
+ self.label = label
+ self.index = index
+ self.selected = selected
+
+ process_options()
+
+ # get labels
+ i = 0
+ labels = []
+ f = open(dataset_filename, 'r')
+ for line in f:
+ labels.append(Label(float((line.split())[0]), i, 0))
+ i = i + 1
+ f.close()
+ l = i
+
+ # determine where to output
+ if subset_filename != "":
+ file1 = open(subset_filename, 'w')
+ else:
+ file1 = stdout
+ split = 0
+ if rest_filename != "":
+ split = 1
+ file2 = open(rest_filename, 'w')
+
+ # select the subset
+ warning = 0
+ if method == 0: # stratified
+ labels.sort(key = lambda x: x.label)
+
+ label_end = labels[l-1].label + 1
+ labels.append(Label(label_end, l, 0))
+
+ begin = 0
+ label = labels[begin].label
+ for i in range(l+1):
+ new_label = labels[i].label
+ if new_label != label:
+ nr_class = i - begin
+ k = i*n//l - begin*n//l
+ # at least one instance per class
+ if k == 0:
+ k = 1
+ warning = warning + 1
+ for j in range(nr_class):
+ if randint(0, nr_class-j-1) < k:
+ labels[begin+j].selected = 1
+ k = k - 1
+ begin = i
+ label = new_label
+ elif method == 1: # random
+ k = n
+ for i in range(l):
+ if randint(0,l-i-1) < k:
+ labels[i].selected = 1
+ k = k - 1
+ i = i + 1
+
+ # output
+ i = 0
+ if method == 0:
+ labels.sort(key = lambda x: int(x.index))
+
+ f = open(dataset_filename, 'r')
+ for line in f:
+ if labels[i].selected == 1:
+ file1.write(line)
+ else:
+ if split == 1:
+ file2.write(line)
+ i = i + 1
+
+ if warning > 0:
+ stderr.write("""\
+Warning:
+1. You may have regression data. Please use -s 1.
+2. Classification data unbalanced or too small. We select at least 1 per class.
+ The subset thus contains %d instances.
+""" % (n+warning))
+
+ # cleanup
+ f.close()
+
+ file1.close()
+
+ if split == 1:
+ file2.close()
+
+main()
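Both selection methods in subset.py rely on the same sequential sampling step: each item is picked with probability (items still needed) / (items still remaining), which yields exactly the requested count. The sketch below isolates that step together with the per-class quota formula used for stratified selection; the names `select_indices` and `class_quota` are assumptions for illustration.

```python
import random


def select_indices(total, k, rng=random):
    """Pick exactly k of the indices 0..total-1, for 0 <= k <= total."""
    selected = []
    for i in range(total):
        # total - i items remain; pick this one with probability k / (total - i).
        if rng.randint(0, total - i - 1) < k:
            selected.append(i)
            k -= 1
    return selected


def class_quota(begin, end, n, total):
    """Share of an n-item subset for the class occupying sorted rows [begin, end)."""
    k = end * n // total - begin * n // total
    return max(k, 1)  # the script keeps at least one instance per class
```

Because the quotas are differences of cumulative floor values, they add up to `n` across classes unless the one-instance minimum applies, which is the case the script's warning reports.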
