Float16 TFLite models with multiple interpreters don't share weights and bloat memory #48027

@barabanus

Description

System information

Describe the current behavior

When we create multiple interpreters for a single model (e.g. for thread-safety reasons), the expected behavior is that all interpreters share the model weights. That is the case when the model is a float32 model:

$ ./minimal mem-bloat-float32.tflite        
initial mem 1 MB
interpreter #1, mem 139 MB
interpreter #2, mem 141 MB
interpreter #3, mem 144 MB
interpreter #4, mem 146 MB
interpreter #5, mem 148 MB
interpreter #6, mem 150 MB
interpreter #7, mem 152 MB
interpreter #8, mem 154 MB
interpreter #9, mem 156 MB

But when the same model is converted to float16, the weights are no longer shared: memory grows with every interpreter, as if each one had its own copy of the model weights:

$ ./minimal mem-bloat-float16.tflite
initial mem 1 MB
interpreter #1, mem 206 MB
interpreter #2, mem 341 MB
interpreter #3, mem 476 MB
interpreter #4, mem 610 MB
interpreter #5, mem 745 MB
interpreter #6, mem 880 MB
interpreter #7, mem 1015 MB
interpreter #8, mem 1149 MB
interpreter #9, mem 1284 MB

Describe the expected behavior

The expected behavior is that multiple interpreters share the weights of the same model, whether it is a float16 or a float32 model.

Standalone code to reproduce the issue

  • Unzip the TFLite models: mem-bloat-tflite.zip
  • Build the TFLite minimal example with the following code in tensorflow/tensorflow/lite/examples/minimal.cc:
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#include "tensorflow/lite/optional_debug_tools.h"

#if defined(_WIN32)
#include <windows.h>
#include <psapi.h>

#elif defined(__unix__) || defined(__unix) || defined(unix) || (defined(__APPLE__) && defined(__MACH__))
#include <unistd.h>
#include <sys/resource.h>

#if defined(__APPLE__) && defined(__MACH__)
#include <mach/mach.h>

#elif (defined(_AIX) || defined(__TOS__AIX__)) || (defined(__sun__) || defined(__sun) || defined(sun) && (defined(__SVR4) || defined(__svr4__)))
#include <fcntl.h>
#include <procfs.h>

#elif defined(__linux__) || defined(__linux) || defined(linux) || defined(__gnu_linux__)
#include <stdio.h>

#endif

#else
#error "Cannot define getCurrentRSS( ) for an unknown OS."
#endif


/**
 * Returns the current resident set size (physical memory use) measured
 * in bytes, or zero if the value cannot be determined on this OS.
 */
size_t getCurrentRSS( )
{
#if defined(_WIN32)
    /* Windows -------------------------------------------------- */
    PROCESS_MEMORY_COUNTERS info;
    GetProcessMemoryInfo( GetCurrentProcess( ), &info, sizeof(info) );
    return (size_t)info.WorkingSetSize;

#elif defined(__APPLE__) && defined(__MACH__)
    /* OSX ------------------------------------------------------ */
    struct mach_task_basic_info info;
    mach_msg_type_number_t infoCount = MACH_TASK_BASIC_INFO_COUNT;
    if ( task_info( mach_task_self( ), MACH_TASK_BASIC_INFO,
        (task_info_t)&info, &infoCount ) != KERN_SUCCESS )
        return (size_t)0L;      /* Can't access? */
    return (size_t)info.resident_size;

#elif defined(__linux__) || defined(__linux) || defined(linux) || defined(__gnu_linux__)
    /* Linux ---------------------------------------------------- */
    long rss = 0L;
    FILE* fp = NULL;
    if ( (fp = fopen( "/proc/self/statm", "r" )) == NULL )
        return (size_t)0L;      /* Can't open? */
    if ( fscanf( fp, "%*s%ld", &rss ) != 1 )
    {
        fclose( fp );
        return (size_t)0L;      /* Can't read? */
    }
    fclose( fp );
    return (size_t)rss * (size_t)sysconf( _SC_PAGESIZE);

#else
    /* AIX, BSD, Solaris, and Unknown OS ------------------------ */
    return (size_t)0L;          /* Unsupported. */
#endif
}


#define TFLITE_MINIMAL_CHECK(x)                              \
  if (!(x)) {                                                \
    fprintf(stderr, "Error at %s:%d\n", __FILE__, __LINE__); \
    exit(1);                                                 \
  }

int main(int argc, char* argv[])
{
  if (argc != 2) {
    fprintf(stderr, "minimal <tflite model>\n");
    return 1;
  }
  const char* filename = argv[1];

  // getCurrentRSS() returns size_t, so print it with %zu rather than %d.
  printf("initial mem %zu MB\n", getCurrentRSS() >> 20);

  // Load model
  std::unique_ptr<tflite::FlatBufferModel> model = tflite::FlatBufferModel::BuildFromFile(filename);
  TFLITE_MINIMAL_CHECK(model != nullptr);

  tflite::ops::builtin::BuiltinOpResolver resolver;
  tflite::InterpreterBuilder builder(*model, resolver);

  std::unique_ptr<tflite::Interpreter> interpreter_list[9];
  for (auto &interpreter : interpreter_list) {

    builder(&interpreter, 1);
    TFLITE_MINIMAL_CHECK(interpreter != nullptr);
    TFLITE_MINIMAL_CHECK(interpreter->AllocateTensors() == kTfLiteOk);
    interpreter->Invoke();

    // The index is a pointer difference (ptrdiff_t, %td); the RSS is size_t (%zu).
    printf("interpreter #%td, mem %zu MB\n", &interpreter - &interpreter_list[0] + 1, getCurrentRSS() >> 20);
  }

  return 0;
}

Other info / logs
Both versions of the model were converted from this protobuf file: mem-bloat-protobuf.zip
