Memory leak when using the Python wrapper with an input string that is too long #47

Open
ankokumoyashi opened this issue Nov 28, 2018 · 0 comments

A memory leak occurs when all of the following conditions are fulfilled:

  • the Python wrapper is used (the "-C (allocate sentence)" option is ON)
  • the same lattice instance is reused on every loop iteration
  • the input is 5534 bytes or larger, as reported by sys.getsizeof

How to reproduce

  • versions

    • Python 3.5.1
    • MeCab 0.996
  • code

    import os
    import sys

    import MeCab
    import psutil

    # Handle to the current process, used to sample its memory usage.
    pid = os.getpid()
    py = psutil.Process(pid)


    class CheckMemoryLeak():
        def __init__(self):
            # A single Lattice instance is reused for every call.
            self.lattice = MeCab.Lattice()

        def mecab_set_sentence(self, text):
            self.lattice.set_sentence(text)


    if __name__ == '__main__':
        Mecab = CheckMemoryLeak()
        sentence = 'あ' * 2730
        # sys.getsizeof reports the size of the Python str object.
        print('input bytes:', sys.getsizeof(sentence))
        while True:
            Mecab.mecab_set_sentence(sentence)
            memoryUse = py.memory_info()[0]  # resident set size (RSS) in bytes
            print('memory use:', memoryUse)
  • result

    input bytes: 5534
    memory use: 13950976
    ... (after about 10 calls to mecab_set_sentence)
    memory use: 14221312
    ... (after about 10 more calls to mecab_set_sentence)
    memory use: 14491648
    ... (after 30 seconds)
    memory use: 2043158528
    

However, memory use stays constant when the sentence is one character shorter:

    sentence = 'あ' * 2729
  • result
    input bytes: 5532
    memory use: 13950976
    ... (after about 10 calls to mecab_set_sentence)
    memory use: 14155776
    ... (after 30 seconds)
    memory use: 14155776
    ... (after 10 minutes)
    memory use: 14155776
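
A note on the two sizes in play: the "input bytes" values printed above come from sys.getsizeof, which measures the Python str object (and varies with the interpreter version), not the bytes handed to MeCab. The sketch below assumes the wrapper passes the text as UTF-8 and that the `size` given to strdup() in the Probable Cause section is that byte length; neither is confirmed here. Under those assumptions, 2730 characters produce a request of exactly 8192 bytes, the value of BUF_SIZE, while 2729 characters stay below it. `utf8_len` and `requested_bytes` are illustrative names, not part of MeCab.

```python
import sys

BUF_SIZE = 8192  # value of BUF_SIZE in mecab/src/common.h as quoted in this issue


def utf8_len(text):
    """Number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


def requested_bytes(text):
    """Bytes requested from the allocator, assuming strdup() asks for
    size + 1 and alloc() adds one more byte (see the C++ excerpt)."""
    return utf8_len(text) + 2


for n_chars in (2729, 2730):
    sentence = "あ" * n_chars
    print(
        "chars:", n_chars,
        "getsizeof:", sys.getsizeof(sentence),
        "utf-8 bytes:", utf8_len(sentence),
        "requested:", requested_bytes(sentence),
        "reaches BUF_SIZE:", requested_bytes(sentence) >= BUF_SIZE,
    )
```

If these assumptions hold, the threshold observed in the reproduction would line up with BUF_SIZE rather than with the 5534 figure itself.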
    

Probable Cause

  • The byte length of input_str is never checked against BUF_SIZE.
  • The leak appears to occur when a string whose size exceeds BUF_SIZE is allocated after a BUF_SIZE area has already been allocated.
  • BUF_SIZE, MIN_INPUT_BUFFER_SIZE, and MAX_INPUT_BUFFER_SIZE cannot be changed through the configuration file or command-line options; only input-buffer-size can be set.

char *alloc(size_t size) {
  if (!char_freelist_.get()) {
    char_freelist_.reset(new ChunkFreeList<char>(BUF_SIZE));
  }
  return char_freelist_->alloc(size + 1);
}

char *strdup(const char *str, size_t size) {
  char *n = alloc(size + 1);
  std::strncpy(n, str, size + 1);
  return n;
}
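
One possible mechanism can be illustrated with a toy model in Python. This is a hypothetical sketch, not MeCab's actual ChunkFreeList: it assumes that chunks are kept in a list, that the allocator is rewound (not released) before each set_sentence call, and that a request fits in a chunk only if it is strictly smaller than the chunk. Under those assumptions, a request of exactly BUF_SIZE bytes never fits in any existing chunk, so every call appends a new one.

```python
class ChunkFreeListModel:
    """Toy model of a chunk-based free list (hypothetical, not MeCab's code)."""

    def __init__(self, default_size):
        self.default_size = default_size
        self.chunk_sizes = []  # size of every chunk allocated so far
        self.chunk_index = 0   # chunk currently being filled
        self.offset = 0        # bytes already used in the current chunk

    def free(self):
        # Rewind to the first chunk; chunks are kept for reuse, never released.
        self.chunk_index = 0
        self.offset = 0

    def alloc(self, req):
        # Walk the existing chunks looking for room.
        while self.chunk_index < len(self.chunk_sizes):
            # Assumed strict comparison: an exact fit is treated as "no room".
            if self.offset + req < self.chunk_sizes[self.chunk_index]:
                self.offset += req
                return self.chunk_index
            self.chunk_index += 1
            self.offset = 0
        # No existing chunk had room, so a new chunk is appended.
        self.chunk_sizes.append(max(req, self.default_size))
        self.chunk_index = len(self.chunk_sizes) - 1
        self.offset = req
        return self.chunk_index


def chunks_after(calls, req, default_size=8192):
    """Number of chunks held after `calls` rounds of free() + alloc(req)."""
    pool = ChunkFreeListModel(default_size)
    for _ in range(calls):
        pool.free()
        pool.alloc(req)
    return len(pool.chunk_sizes)


if __name__ == "__main__":
    print("request 8189:", chunks_after(100, 8189), "chunk(s)")
    print("request 8192:", chunks_after(100, 8192), "chunk(s)")
```

In this model the number of chunks stays at one for a request of 8189 bytes and grows by one per call for a request of 8192 bytes, which is qualitatively the behaviour reported above. Whether MeCab follows the same logic would need to be confirmed against its source.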

Temporary solution

  1. Edit BUF_SIZE

mecab/mecab/src/common.h

Lines 72 to 74 in 3a07c4e

#define MIN_INPUT_BUFFER_SIZE 8192
#define MAX_INPUT_BUFFER_SIZE (8192*640)
#define BUF_SIZE 8192

  • before

    #define MIN_INPUT_BUFFER_SIZE 8192
    #define MAX_INPUT_BUFFER_SIZE (8192*640)
    #define BUF_SIZE 8192
  • after

    #define MIN_INPUT_BUFFER_SIZE 16384
    #define MAX_INPUT_BUFFER_SIZE (16384*640)
    #define BUF_SIZE 16384
  2. Rebuild and reinstall

    make
    sudo make install

Proposed solution

The underlying problem is that execution continues without any notice even while memory is leaking.

  • Emit a warning when the input string exceeds BUF_SIZE, including when MeCab is called through the Python wrapper.
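
Until such a warning exists in MeCab itself, a caller-side guard along these lines could serve as a stopgap. This is a minimal sketch: `check_sentence_length` is a hypothetical helper, not part of the MeCab Python wrapper, and it assumes the text reaches MeCab as UTF-8 and that two extra bytes are requested on top of the payload, as the quoted strdup() and alloc() suggest.

```python
import warnings

BUF_SIZE = 8192  # assumed to match BUF_SIZE in mecab/src/common.h


def check_sentence_length(text, limit=BUF_SIZE):
    """Return True if text stays below the limit; warn and return False otherwise.

    Hypothetical helper, not part of the MeCab Python wrapper.
    """
    # Assumption: UTF-8 payload plus the two extra bytes added by
    # strdup() (size + 1) and alloc() (size + 1).
    requested = len(text.encode("utf-8")) + 2
    if requested >= limit:
        warnings.warn(
            "input needs %d bytes, which reaches BUF_SIZE (%d)" % (requested, limit),
            RuntimeWarning,
        )
        return False
    return True
```

A caller could run this check before lattice.set_sentence(text) and then split the input, or create a fresh lattice, when it returns False.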
@ankokumoyashi changed the title from "Memoly leak when use python-wrapper and input too long sentence" to "Memoly leak when use python-wrapper and input string is too long" on Nov 28, 2018