
Huge leaks #4

Closed
jakoss opened this issue May 9, 2016 · 8 comments
jakoss commented May 9, 2016

I'm not sure whether this is the C# wrapper's fault or a RocksDB issue itself.

I started a huge bulk seed for our production database. After running for about 5 hours, RocksDB was taking 7.5 GB of RAM. ANTS 8 shows this is unmanaged memory held by librocksdb.dll. The same thing happens when I leave the application running in the background. We do a lot of random access on a database with about 550M keys. The application slowly but steadily consumes all the RAM on the machine. I tried releasing the database, which helps a little, but memory still gets "eaten".

It seems like something isn't being closed, but I can't figure out what.

My (simplified) code of the bulk seed is here:

            ColumnFamilies _columnFamilies;
            DbOptions _dbOptions;

            _dbOptions = new DbOptions()
                .SetCreateIfMissing(true)
                .SetCreateMissingColumnFamilies(true)
                .PrepareForBulkLoad()
                .IncreaseParallelism(4);
            _columnFamilies = new ColumnFamilies
            {
                {"index", new ColumnFamilyOptions().OptimizeForPointLookup(512)},
                {"store", new ColumnFamilyOptions().OptimizeForPointLookup(512)},
                {"metadata", new ColumnFamilyOptions()}
            };

            RocksDb _dbContext = null;
            RocksDBSerializer _rocksDBSerializer = null;
            ColumnFamilyHandle _indexColumn = null;
            ColumnFamilyHandle _storeColumn = null;
            ColumnFamilyHandle _metadataColumn = null;
            int counter = 0;
            Int64 trackId = -1;

            foreach (var path in files)
            {
                // every 100 passes, release and reopen the database handle
                // unfortunately this doesn't help much; memory is leaking somewhere else
                if (counter == 99) counter = 0;
                if (counter == 0) {
                    if (_dbContext != null)
                    {
                        _rocksDBSerializer = null;
                        _indexColumn.Dispose();
                        _storeColumn.Dispose();
                        _metadataColumn.Dispose();
                        _dbContext.Dispose();
                    }
                    _dbContext = RocksDb.Open(_dbOptions, "storeDatabase", _columnFamilies);
                    _indexColumn = _dbContext.GetColumnFamily("index");
                    _storeColumn = _dbContext.GetColumnFamily("store");
                    _metadataColumn = _dbContext.GetColumnFamily("metadata");

                    _rocksDBSerializer = new RocksDBSerializer(_dbContext, _indexColumn, _storeColumn, _metadataColumn);

                    trackId = _rocksDBSerializer.GetLastTrackId();
                }
                counter++;

                var dirs = path.Split('\\');
                var fileName = dirs[dirs.Length - 1];
                Int64 mediaObjectId;
                if (!Int64.TryParse(fileName, out mediaObjectId))
                {
                    Console.WriteLine("Error: bad media object id");
                    Console.WriteLine("File name: {0}", fileName);
                    return;
                }

                // shortCodes is an array of short ints
                // we have 70k files; each one generates about 8000 shortCodes

                foreach (var shortCode in shortCodes)
                {
                    trackId++;
                    Byte[] key = BitConverter.GetBytes(trackId);

                    _rocksDBSerializer.AddToStore(key, shortCode);

                    _rocksDBSerializer.AddToMetadata(mediaObjectId, trackId, key);

                    _rocksDBSerializer.AddToIndex(trackId, shortCode);
                }
            }
            _dbContext.Dispose();
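As an aside, wrapping each open/reopen cycle in try/finally guarantees the native handles are released even if processing a file throws. A sketch against the same RocksDbSharp types used above (not a tested drop-in):

```csharp
// Sketch only: assumes the RocksDbSharp types from the snippet above.
// One batch of files per database handle; the finally block always runs.
RocksDb db = null;
try
{
    db = RocksDb.Open(_dbOptions, "storeDatabase", _columnFamilies);
    var indexColumn = db.GetColumnFamily("index");
    var storeColumn = db.GetColumnFamily("store");
    var metadataColumn = db.GetColumnFamily("metadata");

    // ... process one batch of files here ...
}
finally
{
    db?.Dispose(); // releases the native handle even when an exception escapes
}
```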

RocksDBSerializer is here:

        public void AddToIndex(long trackId, List<CodeSegment> shortCode)
        {
            foreach (var codeSegment in shortCode)
            {
                Byte[] indexKey = BitConverter.GetBytes(codeSegment.Code);
                Byte[] indexValue = _dbContext.Get(indexKey, _indexColumn);
                if (indexValue != null)
                {
                    List<Int64> tracks = Enumerable.Range(0, indexValue.Length / 8)
                        .Select(i => BitConverter.ToInt64(indexValue, i * 8))
                        .ToList();
                    if (!tracks.Contains(trackId))
                    {
                        tracks.Add(trackId);
                        indexValue = tracks.SelectMany(BitConverter.GetBytes).ToArray();
                        _dbContext.Put(indexKey, indexValue, _indexColumn);
                    }
                }
                else
                {
                    List<Int64> tracks = new List<Int64> { trackId };
                    indexValue = tracks.SelectMany(BitConverter.GetBytes).ToArray();
                    _dbContext.Put(indexKey, indexValue, _indexColumn);
                }
            }
        }
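The Int64-list packing that AddToIndex uses can be exercised in isolation; a minimal, self-contained round-trip check (the helper names here are mine, not from the project):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class TrackListPackingDemo
{
    // Pack a list of track ids into a flat byte array, 8 bytes per id.
    static byte[] Pack(List<long> tracks) =>
        tracks.SelectMany(BitConverter.GetBytes).ToArray();

    // Unpack the flat byte array back into track ids.
    static List<long> Unpack(byte[] value) =>
        Enumerable.Range(0, value.Length / 8)
                  .Select(i => BitConverter.ToInt64(value, i * 8))
                  .ToList();

    static void Main()
    {
        var tracks = new List<long> { 1, 2, 550000000 };
        Console.WriteLine(tracks.SequenceEqual(Unpack(Pack(tracks)))); // True
    }
}
```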

        public void AddToMetadata(long mediaObjectId, long trackId, byte[] key)
        {
            Byte[] mediaObjectIdAsByteArray = BitConverter.GetBytes(mediaObjectId);
            Byte[] metadataValue = _dbContext.Get(key, _metadataColumn);
            // note: byte[].Equals is reference equality; SequenceEqual compares contents,
            // and the exception should fire when the existing value is different
            if (metadataValue != null && !metadataValue.SequenceEqual(mediaObjectIdAsByteArray))
            {
                throw new Exception(
                    String.Format("This mapping already exists and has a different value. TrackId: {0}; MediaObjectId: {1}", trackId, mediaObjectId)
                    );
            }

            _dbContext.Put(key, mediaObjectIdAsByteArray, _metadataColumn);
        }
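One thing worth knowing here: on arrays, Equals is reference equality, so two byte[] with identical contents compare as unequal; Enumerable.SequenceEqual compares element by element. A quick self-contained demonstration:

```csharp
using System;
using System.Linq;

class ByteArrayEqualityDemo
{
    static void Main()
    {
        byte[] a = BitConverter.GetBytes(42L);
        byte[] b = BitConverter.GetBytes(42L);

        // Equals on arrays compares references, so identical payloads are "not equal".
        Console.WriteLine(a.Equals(b));        // False

        // SequenceEqual compares the contents.
        Console.WriteLine(a.SequenceEqual(b)); // True
    }
}
```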

        public void AddToStore(Byte[] key, List<CodeSegment> codeSegments)
        {
            Byte[] value;

            using (var ms = new MemoryStream())
            {
                Serializer.Serialize(ms, codeSegments);
                value = ms.ToArray();
            }

            _dbContext.Put(key, value, _storeColumn);
        }

        public Int64 GetLastTrackId()
        {
            // using guarantees the iterator's native handle is released, even on exceptions
            using (Iterator iter = _dbContext.NewIterator(_storeColumn))
            {
                iter.SeekToLast();
                Int64 trackId = -1;
                if (iter.Valid())
                {
                    Byte[] lastKey = iter.Key();
                    trackId = BitConverter.ToInt64(lastKey, 0);
                }
                return trackId;
            }
        }


I tried this code with WriteBatch; performance is better, but the leak is still there :(
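For reference, the batched variant has roughly this shape (a sketch against the RocksDbSharp API used above; SerializeShortCode is a hypothetical helper standing in for the protobuf serialization done in AddToStore):

```csharp
// Sketch: accumulate all puts for one file into a single batch, then write once.
using (var batch = new WriteBatch())
{
    foreach (var shortCode in shortCodes)
    {
        trackId++;
        byte[] key = BitConverter.GetBytes(trackId);

        // SerializeShortCode is hypothetical, not part of the project's code.
        batch.Put(key, SerializeShortCode(shortCode), _storeColumn);
    }
    _dbContext.Write(batch); // one native write call instead of thousands
}
```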

warrenfalk (Owner) commented

The code is too complicated for me to spot an obvious flaw, too incomplete to run, and the comments don't have enough detail to reconstruct the missing parts, which makes it hard to help much. If you can reproduce the problem with simpler code, that would make things a lot easier. I will see if I can reproduce a memory leak with similar operations.

warrenfalk (Owner) commented

How big is the database, on disk, when this runs out of memory?


jakoss commented May 11, 2016

Well, that's the weird part; it seems random to me, or I just can't find the pattern. Sometimes it's 260 MB and sometimes it stops at around 90 MB. The same database takes about 4 GB when I build it on LMDB!

I will try to create small project to reproduce this.


jakoss commented May 11, 2016

I wasn't able to create a small reproduction project, since this whole thing is just a POC and the project structure is a big mess. You can get the whole solution and build RocksDbBuilder to reproduce the problem. Links:

Solution: https://mega.nz/#!pYMgiB6b!-3me6SnpTWTthmhLpZX14EFCbAlofoPAxzOz6ct6gmw
Data (use this directory as application param): https://mega.nz/#!dAdjhZJL!GdfdEPSlwqLI-LEjzK4Dd_Bc-dCIIGXxb1Sb89WdMEk


chester89 commented May 11, 2016

@NekroMancer Have you looked at the memory usage wiki page?


jakoss commented May 11, 2016

@chester89 I looked - honestly, RocksDB is pretty overwhelming for me with all its configuration options. I tried to find an answer to the one question that really interests me: is it even possible to set something like a "max heap" for a RocksDB instance? No luck so far.

@warrenfalk Sorry, my bad. I updated links

warrenfalk (Owner) commented

Indeed, there was a memory leak in the .NET wrapper when doing reads (writes were not affected). This has now been corrected. Sorry it took a while without enough to reproduce; I had tried myself, but focused on the puts when the problem was the gets.

I agree, RocksDB is overwhelming. It's awesome but the documentation has some catching up to do.
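On the "max heap" question above: there is no single knob, but the two big consumers (memtables and the block cache) can each be capped. A hedged sketch, assuming RocksDbSharp exposes the usual options; exact method names may differ by version:

```csharp
// Sketch: cap memtable memory per column family and share a sized LRU block cache.
var tableOptions = new BlockBasedTableOptions()
    .SetBlockCache(Cache.CreateLru(256 * 1024 * 1024)); // ~256 MB block cache

var cfOptions = new ColumnFamilyOptions()
    .SetWriteBufferSize(64 * 1024 * 1024) // ~64 MB per memtable
    .SetMaxWriteBufferNumber(2)           // at most 2 memtables per CF
    .SetBlockBasedTableFactory(tableOptions);
```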

chester89 commented Aug 24, 2016

Awesome news - I wanted to try out RocksDB, but didn't have enough time.
Now I'll try again with an even better client library.

warrenfalk closed this on Aug 24, 2016
