Date: Mon, 8 Apr 2013 16:13:02 -0400
From:  <jfoug@....net>
To: john-dev@...ts.openwall.com
Subject: New addition to JtR.  memdbg, self-validating  memory management
 functions

Introducing memdbg for JtR.  This is a set of shell functions around all of the normal memory allocation routines (malloc, calloc, realloc, strdup, free, AND JtR's mem_alloc, mem_calloc and MEM_FREE).  The entire set of code is switched on or off by a single #define, MEMDBG_ON.  It lives in memdbg_defines.h, and is commented out (OFF) by default in JtR releases.  If it is defined, the basic memory debugging is enabled.  If it is not defined (the normal case, unless it is uncommented prior to a build), then malloc == malloc, free == free, etc.  In other words, JtR will behave 100% the same as it does today, with absolutely no debugging code wrapping the memory management.
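As a rough illustration of the single-#define switch, here is a minimal sketch.  The names dbg_malloc, dbg_free and the counters are hypothetical, not the actual memdbg functions, and the #define is placed inline here only to keep the sketch in one file (in memdbg it lives in memdbg_defines.h).

```c
#include <stdio.h>
#include <stdlib.h>

#define MEMDBG_ON   /* comment this out and malloc == malloc, free == free */

#ifdef MEMDBG_ON
/* Hypothetical bookkeeping, only here to show that the wrappers ran. */
static unsigned long dbg_alloc_calls, dbg_free_calls;
static const char *dbg_last_file;
static int dbg_last_line;

static void *dbg_malloc(size_t size, const char *file, int line)
{
    dbg_last_file = file;     /* call site handed over by the macro below */
    dbg_last_line = line;
    dbg_alloc_calls++;
    return malloc(size);      /* macro not defined yet: this is libc malloc */
}

static void dbg_free(void *p, const char *file, int line)
{
    dbg_last_file = file;
    dbg_last_line = line;
    dbg_free_calls++;
    free(p);                  /* likewise the real free */
}

/* From here on, every malloc/free in this file goes through the wrappers. */
#define malloc(n) dbg_malloc((n), __FILE__, __LINE__)
#define free(p)   dbg_free((p), __FILE__, __LINE__)
#endif

/* Client code is written exactly as it would be without any debugging. */
static char *make_buffer(size_t n)
{
    return malloc(n);
}
```

With MEMDBG_ON commented out, the #ifdef block disappears and make_buffer() calls the C library directly, so the client code never has to change.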

What the debugging code does (with MEMDBG_ON defined) is this: every malloc/alloc/strdup/mem_alloc/mem_free is called in such a way that the filename and line number where the call occurs are passed to the allocation function 'wrapper'.  This wrapper allocates the final buffer.  The amount of memory allocated is larger than what was requested by the client code, because a structure is added to the start of the buffer, and another structure is added to the end (after the amount of memory requested).  The trailing structure is just a fence post (4 bytes), for overflow detection.  It is butted tight against the tail end of the memory buffer returned to the client.  The first structure contains pointers for a linked list, a pointer to the trailing structure, the size of the memory allocated, and some other data, including another fence post right against the buffer returned to the client (for underflow detection).

Then within each free, the buffer is 'validated'.  The fence posts both before and after the buffer are checked, then the buffer is unlinked from the linked list, and optionally freed.  Then, upon exit, a single call is made to a function that displays any non-freed memory, including things like the size, allocation count, and the filename and line where it was allocated.  This makes finding memory leaks, or unfreed memory, very easy to do, and a run of JtR almost forces them to be fixed, because they are the last thing shown on screen (stderr).
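The layout and the validate-on-free step described above can be sketched roughly as follows.  This is only an illustration under stated assumptions: the struct fields, the function names (dbg_malloc, dbg_check, dbg_free, dbg_report_leaks), the 0xA5 fence value and the 16 byte alignment are all made up for the sketch and are not taken from the memdbg source.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FENCE_SIZE 4
#define ALIGNMENT  16   /* assumed to be enough for any client type */

/* Assumed fence value; the real constant is not given in the post. */
static const unsigned char FENCE[FENCE_SIZE] = { 0xA5, 0xA5, 0xA5, 0xA5 };

typedef struct dbg_hdr {
    struct dbg_hdr *next, *prev;  /* allocation linked list */
    unsigned char  *tail;         /* points at the trailing fence post */
    size_t          size;         /* bytes the client asked for */
    const char     *file;         /* where the allocation happened */
    int             line;
    unsigned long   count;        /* allocation sequence number */
} dbg_hdr;

/* Header plus leading fence, rounded up so the client buffer stays aligned. */
#define HDR_SPACE \
    ((sizeof(dbg_hdr) + FENCE_SIZE + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT)

static dbg_hdr *alloc_list;
static unsigned long alloc_count;

static void *dbg_malloc(size_t size, const char *file, int line)
{
    dbg_hdr *h = malloc(HDR_SPACE + size + FENCE_SIZE);
    unsigned char *client;

    if (!h)
        return NULL;
    client = (unsigned char *)h + HDR_SPACE;
    h->size = size;
    h->file = file;
    h->line = line;
    h->count = ++alloc_count;
    h->tail = client + size;
    memcpy(client - FENCE_SIZE, FENCE, FENCE_SIZE);  /* tight before buffer */
    memcpy(h->tail, FENCE, FENCE_SIZE);              /* tight after buffer */
    h->prev = NULL;
    h->next = alloc_list;
    if (alloc_list)
        alloc_list->prev = h;
    alloc_list = h;
    return client;
}

/* Returns 0 if intact, bit 0 set for underflow, bit 1 set for overflow. */
static int dbg_check(const void *p)
{
    const unsigned char *client = p;
    const dbg_hdr *h = (const dbg_hdr *)(const void *)(client - HDR_SPACE);
    int bad = 0;

    if (memcmp(client - FENCE_SIZE, FENCE, FENCE_SIZE) != 0)
        bad |= 1;
    if (memcmp(h->tail, FENCE, FENCE_SIZE) != 0)
        bad |= 2;
    return bad;
}

/* Validate, report, unlink, then really free.  Returns dbg_check's result. */
static int dbg_free(void *p)
{
    dbg_hdr *h;
    int bad;

    if (!p)
        return 0;
    bad = dbg_check(p);
    h = (dbg_hdr *)(void *)((unsigned char *)p - HDR_SPACE);
    if (bad)
        fprintf(stderr, "fence damaged (%d), block from %s:%d\n",
                bad, h->file, h->line);
    if (h->prev)
        h->prev->next = h->next;
    else
        alloc_list = h->next;
    if (h->next)
        h->next->prev = h->prev;
    free(h);
    return bad;
}

/* Meant to be called once at exit: lists whatever is still allocated. */
static unsigned long dbg_report_leaks(void)
{
    unsigned long n = 0;
    const dbg_hdr *h;

    for (h = alloc_list; h; h = h->next, n++)
        fprintf(stderr, "leak: %lu bytes, alloc #%lu, %s:%d\n",
                (unsigned long)h->size, h->count, h->file, h->line);
    return n;
}
```

In this sketch a single byte written one past the end of a buffer lands in the trailing fence, so dbg_check() and dbg_free() report it, and the message names the file and line of the original allocation.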

There is a 2nd level (again controlled by a single #define, MEMDBG_EXTRA_CHECKS).  At this level, the free calls smash the fence posts (under/overflow data) and also smash the memory buffer.  They unlink the memory item from the allocation linked list.  However, instead of simply freeing the original buffer, this buffer is added to a 2nd linked list, which is a list of freed objects.  Then, at many points, this linked list is also validated (similar to the real memory list).

During this check, there are different 'levels' of checking.  If MEMDBG_EXTRA_CHECKS (the define that turns on keeping the freed buffers) is not set, only the allocation linked list is checked (for over/underflows).  If MEMDBG_EXTRA_CHECKS is set, then there will be a freed memory list, and the checking function can check data in this list as well.  These checks can be much stronger, since we know a lot more about the freed memory than about the allocated memory.  Within the freed memory, we check the fence posts (they were smashed, but with another constant that we can check).  Also, the entire original client buffer gets smashed with \xCD\xCD\xCD ... bytes.  We can check part of this, or check the entire buffer (looking for wild stray pointers, client code using pointers after they were freed, etc.).  NOTE, this checking function can be called anywhere within client code.  It may easily help track down problems during development.

NOTE, it does slow down runtime, possibly significantly.  Also, on formats which allocate a LOT of memory, this method may not be functional (or may consume a lot of memory).  I have added code so that when an allocation returns NULL, the oldest items from the freed list are fully freed from memory, until the allocation succeeds (or until the freed list is empty).  On my win32 builds (including cygwin), this works, and formats such as rar (which allocate a LOT of memory) work just fine.  However, when I built on a 64 bit BackTrack VM with low memory and a small swap, the code ran until memory was full and the swap was full, and then the alloc simply exited rather than returning NULL.
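The freed-list idea can be sketched like this.  It is a simplified stand-in, not the memdbg code: the names (freed_blk, xd_malloc, xd_free, xd_check_freed, release_oldest) are hypothetical, fence posts are left out, and only the \xCD smash value comes from the description above.

```c
#include <stdlib.h>
#include <string.h>

#define SMASH_BYTE 0xCD   /* fill value for freed client buffers */

/* Small node in front of every buffer; the client data follows directly. */
typedef struct freed_blk {
    struct freed_blk *next;   /* towards newer freed blocks */
    size_t            size;   /* client bytes following this node */
} freed_blk;

static freed_blk *freed_oldest, *freed_newest;

/* Really free the oldest parked block.  Returns 0 if the list was empty. */
static int release_oldest(void)
{
    freed_blk *b = freed_oldest;

    if (!b)
        return 0;
    freed_oldest = b->next;
    if (!freed_oldest)
        freed_newest = NULL;
    free(b);
    return 1;
}

/* On failure, give back parked blocks one at a time and try again. */
static void *xd_malloc(size_t size)
{
    freed_blk *b;

    while ((b = malloc(sizeof *b + size)) == NULL)
        if (!release_oldest())
            return NULL;      /* freed list exhausted as well */
    b->next = NULL;
    b->size = size;
    return b + 1;
}

/* "Free": smash the client data and park the block on the freed list. */
static void xd_free(void *p)
{
    freed_blk *b;

    if (!p)
        return;
    b = (freed_blk *)p - 1;
    memset(p, SMASH_BYTE, b->size);
    b->next = NULL;
    if (freed_newest)
        freed_newest->next = b;
    else
        freed_oldest = b;
    freed_newest = b;
}

/* Count parked blocks whose smashed bytes were modified after the free. */
static unsigned long xd_check_freed(void)
{
    unsigned long damaged = 0;
    const freed_blk *b;

    for (b = freed_oldest; b; b = b->next) {
        const unsigned char *data = (const unsigned char *)(b + 1);
        size_t i;

        for (i = 0; i < b->size; i++)
            if (data[i] != SMASH_BYTE) {
                damaged++;
                break;
            }
    }
    return damaged;
}
```

Because a parked block is still owned by the debugging layer, a write through a stale pointer changes some of the 0xCD bytes, and the next call to xd_check_freed() counts that block as damaged.  On systems that overcommit memory, malloc may never return NULL, so the retry loop in xd_malloc() may never get the chance to run.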

To make all of this work, every .c file simply has a #include "memdbg.h" as the very last include.  It has to be the last include for technical reasons.  If any header tries to declare malloc/free/calloc, etc. after the memdbg.h file has been included, then that declaration will end up 'broken', sort of half declaration and half definition.  The memdbg.h include technically does not have to be the absolute last one, but that is by far the safest.  Then within the .c code, ANY memory management function will be properly handled, depending upon the #define of MEMDBG_ON.
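To see why a declaration that comes after the debugging header ends up broken, the following sketch stringifies a malloc prototype after macro expansion.  The macro body and the name dbg_malloc are hypothetical stand-ins for whatever memdbg.h actually defines.

```c
#include <stdio.h>
#include <stdlib.h>   /* the real prototypes come first, which is safe */
#include <string.h>

/* Hypothetical debug macro, standing in for a memdbg-style header. */
#define malloc(n) dbg_malloc((n), __FILE__, __LINE__)

/* Stringify a fragment AFTER macro expansion so the damage can be seen. */
#define STR2(x) #x
#define STR(x)  STR2(x)

/* What a prototype in a header included after the macro would become. */
static const char late_prototype[] = STR(void *malloc(size_t size););

/* Remove the macro again so the rest of this file sees the real malloc. */
#undef malloc

static void show_late_prototype(void)
{
    puts(late_prototype);
}
```

The printed text is no longer a prototype for malloc: the function name has become dbg_malloc, and the parameter list has gained a file name string and a line number, which no compiler will accept as a declaration.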

So, to wrap things up.  Normally, users would NOT be building with this memory debugging shell code turned on at all.  They would gain no benefit from it, but it would also have no impact on them at all.  The code will be 100% identical to JtR without the memdbg code in it at all.  Normally, a JtR developer would build JtR with at least MEMDBG_ON turned on, and often with MEMDBG_EXTRA_CHECKS also defined.  That way, any memory issue would show up right away, AND be very easy to track down.

It will probably be wise to add memdbg_defines.h to the .gitignore file, so that a user's/developer's local changes to that file are kept local to that user.

I am sure there will be other questions over time.  Just note that the code should be very easy to get benefit from.  Simply put an #include "memdbg.h" in every .c file and uncomment the #define MEMDBG_ON during all development (or just leave it uncommented ALL the time).  All of the work will get done, and memory issues will be shown instantly and automatically.

Jim.
