Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make tlsh_unittest shared-linkable #111

Open
mapreri opened this issue Oct 12, 2021 · 7 comments
Open

Make tlsh_unittest shared-linkable #111

mapreri opened this issue Oct 12, 2021 · 7 comments

Comments

@mapreri
Copy link
Contributor

mapreri commented Oct 12, 2021

Hi!

I understand that you don't seem to appreciate shared linked binaries, but in Debian we tend to at least have a preference for them.

Currently forcing tlsh_unittest to link libtlsh.so fails with this:

/usr/bin/ld: CMakeFiles/tlsh_unittest.dir/tlsh_unittest.cpp.o: in function `trendLSH_ut(char*, char*, char*, char*, int, int, char*, char*, int, bool, int, int, int, int, int, int, char*, int)':
tlsh_unittest.cpp:(.text+0x459): undefined reference to `set_input_desc(char*, char*, int, int, char*, char*, int, int, char*, InputDescr*, int)'
/usr/bin/ld: tlsh_unittest.cpp:(.text+0x4a6): undefined reference to `read_file_eval_tlsh(char*, Tlsh*, int, int, int)'
/usr/bin/ld: tlsh_unittest.cpp:(.text+0x594): undefined reference to `freeFileName(FileName*, int)'
/usr/bin/ld: tlsh_unittest.cpp:(.text+0x7da): undefined reference to `convert_special_chars(char*, char*, unsigned long, int)'
/usr/bin/ld: tlsh_unittest.cpp:(.text+0x7fb): undefined reference to `convert_special_chars(char*, char*, unsigned long, int)'
/usr/bin/ld: tlsh_unittest.cpp:(.text+0xa30): undefined reference to `convert_special_chars(char*, char*, unsigned long, int)'
/usr/bin/ld: tlsh_unittest.cpp:(.text+0xb01): undefined reference to `convert_special_chars(char*, char*, unsigned long, int)'
/usr/bin/ld: tlsh_unittest.cpp:(.text+0xba3): undefined reference to `convert_special_chars(char*, char*, unsigned long, int)'
/usr/bin/ld: CMakeFiles/tlsh_unittest.dir/tlsh_unittest.cpp.o:tlsh_unittest.cpp:(.text+0xbc4): more undefined references to `convert_special_chars(char*, char*, unsigned long, int)' follow
/usr/bin/ld: CMakeFiles/tlsh_unittest.dir/tlsh_unittest.cpp.o: in function `trendLSH_ut(char*, char*, char*, char*, int, int, char*, char*, int, bool, int, int, int, int, int, int, char*, int)':
tlsh_unittest.cpp:(.text+0xe06): undefined reference to `freeFileName(FileName*, int)'
collect2: error: ld returned 1 exit status

This is a partial result of using -fvisibility=internal (which is great!), but it also means that the tlsh_unittest binary is making use of functions that are not part of the public API nor ABI.

I made it build with this patch, that exposes the missing function in the library:

--- a/src/input_desc.cpp
+++ b/src/input_desc.cpp
@@ -153,6 +153,7 @@
         return (strcmp(r1->full_fname, r2->full_fname));
 }
 
+__attribute__ ((visibility ("default")))
 int set_input_desc(char *dirname, char *listname, int listname_col, int listname_csv,
 	char *fname, char *digestname, int show_details, int fc_cons_option, char *splitlines, struct InputDescr *inputd, int showvers)
 {
--- a/src/shared_file_functions.cpp
+++ b/src/shared_file_functions.cpp
@@ -138,6 +138,7 @@
 	}
 }
 
+__attribute__ ((visibility ("default")))
 const char *convert_special_chars(char *filename, char *buf, size_t bufSize, int output_json)
 {
 	int len = strlen(filename);
@@ -182,6 +183,7 @@
 
 ////////////////////////////////////////////////////////////////////////////////
 
+__attribute__ ((visibility ("default")))
 int read_file_eval_tlsh(char *fname, Tlsh *th, int show_details, int fc_cons_option, int showvers)
 {
 	///////////////////////////////////////
@@ -402,6 +404,7 @@
 	return(err);
 }
 
+__attribute__ ((visibility ("default")))
 void freeFileName(struct FileName *fnames, int count)
 {
     for (int i=0; i<count; i++) {

However, that's clearly not an optimal solution. I think it would be best if one of the followings could happen (this is not exhaustive, just what came to my mind):

  1. move those functions into the official public API/ABI
  2. make those functions static or inline, that should make the compiler pick them up earlier
  3. ... not sure about other options right away ...

Besides this, it would be nice if you could elaborate on what "problems" it was causing when "compiling / testing tools on Linux", as it's mentioned in src/CMakeLists.txt.

Thank you for considering at least! :)

@mapreri
Copy link
Contributor Author

mapreri commented Oct 12, 2021

Oh, right. In my mind another option would be to stop shipping tlsh_unittest, and instead create a new tlsh binary to replace the current symlink.

@jonjoliver
Copy link
Collaborator

There were 2 problems.

(1) When introducing people to the tool, they would sometimes (or quite often) not set the LD_LIBRARY_PATH correctly.
Sometimes this would result in me helping them fix it.
Sometimes they would just assume that the tool did not work, and move on.
This problem went away when I make the default behaviour to statically link the program.

(2) sdhash (another fuzzy hash) had problems with shared libraries / environment.
We discovered this when I went to Taipei and ran a class on fuzzy hashes
We had to include a specific version of a shared library with the course material (a crypto library).
And we also discovered that the hash value for a file would depend on the locale of the computer evaluating the hash (this is a disaster from a security perspective)
The computers in Taipei would evaluate a different hash value than computers in USA or Australia
A horrible fix was to
$ export LC_ALL="en_US.UTF-8"

Our approach to avoid problems like this was that TLSH should be

  • completely self contained (as much as possible) and avoid any dependancies
  • statically linked

@jonjoliver
Copy link
Collaborator

I think lets do your solution
"stop shipping tlsh_unittest, and instead create a new tlsh binary to replace the current symlink."

@mapreri
Copy link
Contributor Author

mapreri commented Oct 12, 2021

Hi! Thanks for following up! :)

(1) When introducing people to the tool, they would sometimes (or quite often) not set the LD_LIBRARY_PATH correctly.

Though this is really due to them probably being new to compiled programs. I think this can easily be a problem for people building the tool themselves indeed.

That said it can't be problems for distributors since they would place the shared library in a standard location anyway.

(2) sdhash (another fuzzy hash) had problems with shared libraries / environment. We discovered this when I went to Taipei and ran a class on fuzzy hashes We had to include a specific version of a shared library with the course material (a crypto library). And we also discovered that the hash value for a file would depend on the locale of the computer evaluating the hash

This is concerning though. But how would this be fixed by statically linking the tool? Was that perhaps a bug in their libc implementation (which went away since you started to use -static-libc) or some other library? Locale settings change all the time so those really are separate bugs.

(incidentally, this could most likely be attributed to LC_COLLATE which changes the behaviour of many sorting algorithms)

@jonjoliver
Copy link
Collaborator

Do you have a preferred way to
"stop shipping tlsh_unittest, and instead create a new tlsh binary to replace the current symlink"
I can do it by a simple copy in CMakefile
But if we do this the preferred Debian way, then I am guessing that will work smoothly for nearly everyone else

@mapreri
Copy link
Contributor Author

mapreri commented Oct 13, 2021

Mh, perhaps it wasn't quite as obvious what I was meaning.

In my mind I was thinking of a tool (which I suppose would be pretty much what tlsh_unittest is today) that makes use only of public API functions, leaving private functions needed for the testsuite into the current binary. Now I actually looked into how tlsh_unittest is done and the flow of it, and I'm not sure if that's actually a proper way forward for this. mh.

@cgull
Copy link
Contributor

cgull commented Nov 24, 2021

I agree with @mapreri that a tlsh program (for users) separate from tlsh_unittest (for testing) would make sense, but I haven't looked closely at its source code.

I also want to point out that working with packaging and distributions has a lot of benefits for you as the original developer. Packaging that installs into standard system locations will address most of the problems you've mentioned, such as shared libraries. It tends to split work between you (the original developer) and groups of users (using different OSes, with different needs, who need to port and package for their environment). It greatly reduces the work needed for new OS versions, and now that tlsh is building in a way that works well with packaging, it also reduces the work to upgrade to new tlsh versions. It reduces duplicated work. It tends to bring wider usage of your software, more bug reports, and more eyeballs. Distros and users will do various sorts of automated fuzzing and static analysis and other testing that will be an unexpected benefit to you. That doesn't necessarily mean less work for you, but it often means you do better, more meaningful work.

mavaddat added a commit to mavaddat/tlsh that referenced this issue Sep 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants