Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tcmalloc large alloc crashes #245

Closed
hosom opened this issue Jan 15, 2019 · 16 comments
Closed

tcmalloc large alloc crashes #245

hosom opened this issue Jan 15, 2019 · 16 comments

Comments

@hosom
Copy link
Contributor

hosom commented Jan 15, 2019

Since 2.6, I've noticed an increase in these crashes in my environments. I'm able to replicate it across several installations monitoring the same traffic. Unfortunately, I'm unable to provide a pcap of the issue.

tcmalloc: large alloc 18446744073661849600 bytes == (nil) @  0x7f9330fb21e1 0x8db1e8 0x8e06be 0x8e0bc7 0x8e1353 0x8e1589 0x8e18e5 0x8e1a5d 0x8da2fc 0x934def 0x93543c 0x935501 0x930ea8 0x88d16f 0x88d3a8 0x88df90 0x88e603 0x88e7a5 0x88e9da 0x88eac5 0x862239 0x96ddb9 0x96ea8e 0x8cafa9 0x8cb2c9 0x8cacff 0x8c917b 0x8c5d9f 0x96dca7 0x5c2c86 0x679159
out of memory in new.
1547475466.125616 fatal error: out of memory in new.

I went ahead and printed all the symbols.

(gdb) info symbol 0x679159
NetSessions::DoNextPacket(double, Packet const*, IP_Hdr const*, EncapsulationStack const*) + 1897 in section .text
(gdb) info symbol 0x5c2c86
Connection::NextPacket(double, int, IP_Hdr const*, int, int, unsigned char const*&, int&, int&, Packet const*) + 166 in section .text
(gdb) info symbol 0x96dca7 
analyzer::Analyzer::NextPacket(int, unsigned char const*, bool, unsigned long, IP_Hdr const*, int) + 103 in section .text
(gdb) info symbol 0x8c5d9f
analyzer::tcp::TCP_Analyzer::DeliverPacket(int, unsigned char const*, bool, unsigned long, IP_Hdr const*, int) + 1935 in section .text
(gdb) info symbol 0x8c917b
analyzer::tcp::TCP_Endpoint::DataSent(double, unsigned long, int, int, unsigned char const*, IP_Hdr const*, tcphdr const*) + 187 in section .text
(gdb) info symbol 0x8cacff
analyzer::tcp::TCP_Reassembler::DataSent(double, unsigned long, int, unsigned char const*, analyzer::tcp::TCP_Flags, bool) + 111 in section .text
(gdb) info symbol 0x8cb2c9
analyzer::tcp::TCP_Reassembler::BlockInserted(DataBlock*) + 105 in section .text
(gdb) info symbol 0x8cafa9
analyzer::tcp::TCP_Reassembler::DeliverBlock(unsigned long, int, unsigned char const*) + 153 in section .text
(gdb) info symbol 0x96ea8e
analyzer::Analyzer::ForwardStream(int, unsigned char const*, bool) + 158 in section .text
(gdb) info symbol 0x96ddb9
analyzer::Analyzer::NextStream(int, unsigned char const*, bool) + 73 in section .text
(gdb) info symbol 0x862239
analyzer::smb::SMB_Analyzer::DeliverStream(int, unsigned char const*, bool) + 89 in section .text
(gdb) info symbol 0x88eac5
binpac::SMB::SMB_Flow::NewData(unsigned char const*, unsigned char const*) + 101 in section .text
(gdb) info symbol 0x88e9da
binpac::SMB::SMB_TCP::ParseBuffer(binpac::FlowBuffer*, binpac::SMB::ContextSMB*) + 458 in section .text
(gdb) info symbol 0x88e7a5
binpac::SMB::SMB_Protocol_Identifier::Parse(unsigned char const*, unsigned char const*, binpac::SMB::ContextSMB*) + 261 in section .text
(gdb) info symbol 0x88e603
binpac::SMB::SMB2_PDU::Parse(unsigned char const*, unsigned char const*, binpac::SMB::ContextSMB*) + 435 in section .text
(gdb) info symbol 0x88df90
binpac::SMB::SMB2_Message::Parse(unsigned char const*, unsigned char const*, binpac::SMB::ContextSMB*) + 816 in section .text
(gdb) info symbol 0x88d3a8
binpac::SMB::SMB2_read_response::Parse(unsigned char const*, unsigned char const*, binpac::SMB::ContextSMB*, int) + 312 in section .text
(gdb) info symbol 0x88d16f
binpac::SMB::SMB_Conn::proc_smb2_read_response(binpac::SMB::SMB2_Header*, binpac::SMB::SMB2_read_response*) + 367 in section .text
(gdb) info symbol 0x930ea8
file_analysis::Manager::DataIn(unsigned char const*, unsigned long, unsigned long, analyzer::Tag, Connection*, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 280 in section .text
(gdb) info symbol 0x935501
file_analysis::File::DataIn(unsigned char const*, unsigned long, unsigned long) + 49 in section .text
(gdb) info symbol 0x93543c 
file_analysis::File::DeliverChunk(unsigned char const*, unsigned long, unsigned long) + 1068 in section .text
(gdb) info symbol 0x934def
file_analysis::File::DeliverStream(unsigned char const*, unsigned long) + 415 in section .text
(gdb) info symbol 0x8da2fc
file_analysis::PE::DeliverStream(unsigned char const*, unsigned long) + 76 in section .text
(gdb) info symbol 0x8e1a5d
binpac::PE::File::NewData(unsigned char const*, unsigned char const*) + 61 in section .text
(gdb) info symbol 0x8e18e5
binpac::PE::PE_File::ParseBuffer(binpac::FlowBuffer*, binpac::PE::ContextPE*) + 597 in section .text
(gdb) info symbol 0x8e1589
binpac::PE::Portable_Executable::ParseBuffer(binpac::FlowBuffer*, binpac::PE::ContextPE*) + 89 in section .text
(gdb) info symbol 0x8e1353
binpac::PE::Headers::ParseBuffer(binpac::FlowBuffer*, binpac::PE::ContextPE*, int) + 291 in section .text
(gdb) info symbol 0x8e0bc7
binpac::PE::NT_Headers::ParseBuffer(binpac::FlowBuffer*, binpac::PE::ContextPE*, int) + 407 in section .text
(gdb) info symbol 0x8e06be
binpac::PE::Optional_Header::Parse(unsigned char const*, unsigned char const*, binpac::PE::ContextPE*, int) + 574 in section .text
(gdb) info symbol 0x8db1e8
binpac::PE::RVAS::Parse(unsigned char const*, unsigned char const*, int) + 136 in section .text

This may not be an issue with 2.6 specifically. I cannot confirm if the traffic pattern this is happening on was taking place prior to my 2.6 upgrade.

@JustinAzoff
Copy link
Contributor

FWIW,

18446744073661849600 == 0xFFFFFFFFFD282000

@jsiwek
Copy link
Contributor

jsiwek commented Jan 16, 2019

Did you mean you have a pcap to reproduce and just can't provide it (but maybe willing to dig on your own with hints) or don't have a pcap at all ?

I'm a bit stumped by the stack, which ends in RVAS::Parse of the PE file analyzer. Here's a snip of generated code from my build/src/file_analysis/analyzer/pe/pe_pac.cc

int RVAS::Parse(const_byteptr const t_begin_of_data, const_byteptr const t_end_of_data, int t_byteorder)
    {                                                                           
    // Parse "rvas"                                                             
    int t_rvas__arraylength;                                                    
    t_rvas__arraylength = 0;                                                    
    t_rvas__arraylength = num();                                                
    if ( t_rvas__arraylength < 0 )                                              
        {                                                                       
        throw binpac::ExceptionOutOfBound("RVAS:rvas",                          
          t_rvas__arraylength, (t_end_of_data) - (t_begin_of_data));            
        }                                                                       
    // Check bounds for static-size array: RVAS:rvas                            
    if ( t_begin_of_data + ((8) * (t_rvas__arraylength)) > t_end_of_data || t_begin_of_data + ((8) * (t_rvas__arraylength)) < t_begin_of_data )
        throw binpac::ExceptionOutOfBound("RVAS:rvas",                          
          ((8) * (t_rvas__arraylength)), (t_end_of_data) - (t_begin_of_data));  
    rvas__elem_ = 0;                                                            
    int t_rvas__elem__it;                                                       
    t_rvas__elem__it = 0;                                                       
    int t_rvas__size;                                                           
    rvas_ = new vector<RVA *>;                                                  
    rvas_->reserve(t_rvas__arraylength);  

The only allocation in this function seems to be that last line doing a std::vector::reserve(size_t). It is given a uint32-converted-to-int value as argument, opening the door to possible integer overflow action on conversion to size_t (as one may expect given the large number in the tcmalloc message), but there's also clearly a check for negative values followed by an additional bounds check that the length is within the bounds of the underlying data buffer. So if I'm reading everything right, the parser should already catch such problems before performing such a large allocation.

I did happen to notice this possibly similar/related report: gperftools/gperftools#497. Not that it's necessarily the same thing, just that there's precedent for possible bugs in tcmalloc causing such an issue and so maybe worth testing if things are more stable without tcmalloc.

@hosom
Copy link
Contributor Author

hosom commented Jan 17, 2019

Large mallocs don't generate a core file, so it is a little difficult for me to identify the stream / file that is causing this issue. I don't actually have a pcap or file that generates this issue.

@sethhall
Copy link
Member

sethhall commented Feb 8, 2019

Have you tried using jemalloc to see if the problem exists there? I've been meaning to make a change for quite some time that switches to jemalloc by default. tcmalloc seems to have bitrotted in the past few years. It would be nice to know if the problem is in the PE analyzer or the allocator as Jon suggested.

@hosom
Copy link
Contributor Author

hosom commented Feb 20, 2019

I rebuilt my development environment to use jemalloc and waited. We saw the same exact crash under jemalloc that we saw with tcmalloc. Unfortunately, jemalloc doesn't provide the nice stack that tcmalloc does, so I don't have much debug info to provide.

@JustinAzoff
Copy link
Contributor

I think this might have been fixed either with 78dcbcc or 6598fe9

@jsiwek
Copy link
Contributor

jsiwek commented May 31, 2019

The commits previously mentioned do address one potential way to end up with large allocations and they're in v2.6.2 now, so would be great to hear feedback if that version helps here.

@hosom
Copy link
Contributor Author

hosom commented Jun 26, 2019

I'm still seeing large alloc crashes in the PE analyzer under 2.6.2...

No core file found.

Bro 2.6.2
Linux 4.4.0-139-generic

Bro plugins:
Bro::Napatech - Packet acquisition via Napatech NTAPI (dynamic, version 1.0)

==== No reporter.log

==== stderr.log

tcmalloc: large alloc 18446744073518718976 bytes == (nil) @  0x7f90c22191e1 0x8e26c8 0x8e7b9a 0x8e8087 0x8e8843 0x8e8a89 0x8e8df5 0x8e8f6d 0x8e17c3 0x93c2ff 0x93c94c 0x93ca11 0x9383b8 0x876c35 0x88bb17 0x8953a5 0x895b03 0x895ca5 0x895ee0 0x895fd5 0x869789 0x9752c9 0x975f9e 0x8d2479 0x8d2799 0x8d21cf 0x8d064b 0x8cd26f 0x9751b7 0x5c4b36 0x67afb9
out of memory in new.
1561503382.124484 fatal error: out of memory in new.
(gdb) info symbol 0x67afb9
NetSessions::DoNextPacket(double, Packet const*, IP_Hdr const*, EncapsulationStack const*) + 1897 in section .text
(gdb) info symbol 0x5c4b36
Connection::NextPacket(double, int, IP_Hdr const*, int, int, unsigned char const*&, int&, int&, Packet const*) + 166 in section .text
(gdb) info symbol 0x9751b7
analyzer::Analyzer::NextPacket(int, unsigned char const*, bool, unsigned long, IP_Hdr const*, int) + 103 in section .text
(gdb) info symbol 0x8cd26f
analyzer::tcp::TCP_Analyzer::DeliverPacket(int, unsigned char const*, bool, unsigned long, IP_Hdr const*, int) + 1935 in section .text
(gdb) info symbol 0x8d064b
analyzer::tcp::TCP_Endpoint::DataSent(double, unsigned long, int, int, unsigned char const*, IP_Hdr const*, tcphdr const*) + 187 in section .text
(gdb) info symbol 0x8d21cf
analyzer::tcp::TCP_Reassembler::DataSent(double, unsigned long, int, unsigned char const*, analyzer::tcp::TCP_Flags, bool) + 111 in section .text
(gdb) info symbol 0x8d2799
analyzer::tcp::TCP_Reassembler::BlockInserted(DataBlock*) + 105 in section .text
(gdb) info symbol 0x8d2479
analyzer::tcp::TCP_Reassembler::DeliverBlock(unsigned long, int, unsigned char const*) + 153 in section .text
(gdb) info symbol 0x975f9e
analyzer::Analyzer::ForwardStream(int, unsigned char const*, bool) + 158 in section .text
(gdb) info symbol 0x9752c9
analyzer::Analyzer::NextStream(int, unsigned char const*, bool) + 73 in section .text
(gdb) info symbol 0x869789
analyzer::smb::SMB_Analyzer::DeliverStream(int, unsigned char const*, bool) + 89 in section .text
(gdb) info symbol 0x895fd5
binpac::SMB::SMB_Flow::NewData(unsigned char const*, unsigned char const*) + 101 in section .text
(gdb) info symbol 0x895ee0
binpac::SMB::SMB_TCP::ParseBuffer(binpac::FlowBuffer*, binpac::SMB::ContextSMB*) + 464 in section .text
(gdb) info symbol 0x895ca5
binpac::SMB::SMB_Protocol_Identifier::Parse(unsigned char const*, unsigned char const*, binpac::SMB::ContextSMB*) + 261 in section .text
(gdb) info symbol 0x895b03
binpac::SMB::SMB2_PDU::Parse(unsigned char const*, unsigned char const*, binpac::SMB::ContextSMB*) + 435 in section .text
(gdb) info symbol 0x8953a5
binpac::SMB::SMB2_Message::Parse(unsigned char const*, unsigned char const*, binpac::SMB::ContextSMB*) + 581 in section .text
(gdb) info symbol 0x88bb17
binpac::SMB::SMB2_write_request::Parse(unsigned char const*, unsigned char const*, binpac::SMB::ContextSMB*, int) + 375 in section .text
(gdb) info symbol 0x876c35
binpac::SMB::SMB_Conn::proc_smb2_write_request(binpac::SMB::SMB2_Header*, binpac::SMB::SMB2_write_request*) + 245 in section .text
(gdb) info symbol 0x9383b8
file_analysis::Manager::DataIn(unsigned char const*, unsigned long, unsigned long, analyzer::Tag, Connection*, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 280 in section .text
(gdb) info symbol 0x93ca11
file_analysis::File::DataIn(unsigned char const*, unsigned long, unsigned long) + 49 in section .text
(gdb) info symbol 0x93c94c
file_analysis::File::DeliverChunk(unsigned char const*, unsigned long, unsigned long) + 1068 in section .text
(gdb) info symbol 0x93c2ff
file_analysis::File::DeliverStream(unsigned char const*, unsigned long) + 415 in section .text
(gdb) info symbol 0x8e17c3
file_analysis::PE::DeliverStream(unsigned char const*, unsigned long) + 67 in section .text
(gdb) info symbol 0x8e8f6d
binpac::PE::File::NewData(unsigned char const*, unsigned char const*) + 61 in section .text
(gdb) info symbol 0x8e8df5
binpac::PE::PE_File::ParseBuffer(binpac::FlowBuffer*, binpac::PE::ContextPE*) + 613 in section .text
(gdb) info symbol 0x8e8a89
binpac::PE::Portable_Executable::ParseBuffer(binpac::FlowBuffer*, binpac::PE::ContextPE*) + 89 in section .text
(gdb) info symbol 0x8e8843
binpac::PE::Headers::ParseBuffer(binpac::FlowBuffer*, binpac::PE::ContextPE*, int) + 291 in section .text
(gdb) info symbol 0x8e8087
binpac::PE::NT_Headers::ParseBuffer(binpac::FlowBuffer*, binpac::PE::ContextPE*, int) + 407 in section .text
(gdb) info symbol 0x8e7b9a
binpac::PE::Optional_Header::Parse(unsigned char const*, unsigned char const*, binpac::PE::ContextPE*, int) + 570 in section .text
(gdb) info symbol 0x8e26c8
binpac::PE::RVAS::Parse(unsigned char const*, unsigned char const*, int) + 136 in section .text

@jsiwek
Copy link
Contributor

jsiwek commented Jun 27, 2019

@hosom if you're able to test, interested to know if this proposed patch has any effect:

zeek/binpac@0eee7ee

Can't say if it's directly related to this problem: a signed integer overflow is Undefined Behavior and hard to reason what sorts of weirdness to expect out of it, so best take it off the table before further troubleshooting.

@jsiwek
Copy link
Contributor

jsiwek commented Jul 12, 2019

@hosom any feedback for whether the last binpac patch (it's also now in zeek/master branch) may help or alter the situation ?

If not, maybe the only idea left to explore is trying to fuzz the PE analyzer to get a repro. In case the specific platform/compiler end up mattering, would be nice if you can give that version info.

@hosom
Copy link
Contributor Author

hosom commented Jul 12, 2019 via email

@hosom
Copy link
Contributor Author

hosom commented Jul 15, 2019

Installed master today. I'll report back on whether or not I see crashes.

@hosom
Copy link
Contributor Author

hosom commented Jul 25, 2019

It's been a while and I have not seen a crash since I updated to master. It looks like the binpac patch may have been the solution here.

@jsiwek
Copy link
Contributor

jsiwek commented Jul 29, 2019

Thanks for the feedback. Previously, what was the frequency of crashes (daily, weekly, etc.) ? Just trying to judge how likely it is to really be fixed.

@hosom
Copy link
Contributor Author

hosom commented Aug 6, 2019

We were seeing these crashes several times a day at the highest frequency. We haven't seen one in 20 days or so.

@jsiwek
Copy link
Contributor

jsiwek commented Aug 9, 2019

The binpac patch is now incorporated in v2.6.3.

@jsiwek jsiwek closed this as completed Aug 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants