Memory representation of a structure

When creating a structure, you usually do not think about how it is represented in memory. You might expect the structure's size to be the sum of its members' sizes. Unfortunately, the size of a structure also depends on other parameters.

These parameters are mainly:

  • The CPU architecture (32 or 64 bits)
  • Some optimizations done by the compiler
  • The order in which the components appear in the structure

Let's have a look at the following basic structure:

typedef struct
{
  int i_age;
  char *psz_name;
  int i_level;
} people_t;

For a 64-bit CPU, the structure will look like this in memory:

[Figure: people_t non-packed structure compiled for a 64-bit CPU]

As you can see, the structure is full of holes when compiled for a 64-bit processor: it uses 50% more memory than the sum of the sizes of its members (24 bytes instead of 16). The explanation is simple: the CPU reads aligned memory faster than non-aligned memory. To improve performance, the compiler places each member at an offset that is a multiple of that member's alignment, which is usually its size: 8 bytes for a pointer on a 64-bit CPU (4 bytes on a 32-bit CPU) and 4 bytes for an int. The total size of the structure is also rounded up to a multiple of its largest member alignment, so that every element of an array of structures stays aligned.

Pahole: finding holes in your structures

Pahole is a tool that helps you find holes in your structures. On Debian, you can install Pahole with the package dwarves.

gcc -g -o test test.c
pahole test
typedef struct {
  int i_age; /* 0 4 */
  
  /* XXX 4 bytes hole, try to pack */
  
  char *psz_name; /* 8 8 */
  int i_level; /* 16 4 */
  
  /* size: 24, cachelines: 1 */
  /* sum members: 16, holes: 1, sum holes: 4 */
  /* padding: 4 */
  /* last cacheline: 24 bytes */
} people_t;	/* definitions: 1 */

Pahole analyzes the binary produced by GCC (do not forget the -g switch to enable debug symbols) and lists the structures that contain holes. Pahole shows that:

  • There is a 4-byte (32-bit) hole between i_age and psz_name
  • The compiler adds 4 bytes of padding at the end of the structure
  • The size of the structure is 24 bytes though the sum of its members is only 16 bytes

We can now reorder the members of the structure to reduce its size:

typedef struct
{
  char *psz_name;
  int i_age;
  int i_level;
} people_t;

The structure now looks like this in memory:

[Figure: people_t packed structure compiled for a 64-bit CPU]

Important structures in VLC

Let's have a look at the memory footprint of VLC media player when VLC isn't doing anything. Of course, most of the memory is used by the Qt4 interface. Let's restart VLC without the Qt4 interface to look deeper into the memory footprint of the core.

Most of the memory used by an instance of VLC (without any interface) comes from the module bank. This structure lists the properties of every module the current VLC can launch. There are currently 369 modules in the source tree. Some of these modules depend on the architecture and the operating system, so most VLC instances have approximately 200 modules. For each module, a structure called module_t is created. This structure contains another structure called module_config_t.

Analysis of this structure

With Pahole, we can look at the memory used by one instance of the structure:

pahole --class_name=module_config_t src/modules/.libs/libvlccore_la-entry.o
struct module_config_t {
  int                        i_type;               /*     0     4 */
  
  /* XXX 4 bytes hole, try to pack */
  
  char *                     psz_type;             /*     8     8 */
  char *                     psz_name;             /*    16     8 */
  char                       i_short;              /*    24     1 */
  
  /* XXX 7 bytes hole, try to pack */
  
  char *                     psz_text;             /*    32     8 */
  char *                     psz_longtext;         /*    40     8 */
  module_value_t             value;                /*    48     8 */
  module_value_t             orig;                 /*    56     8 */
  /* --- cacheline 1 boundary (64 bytes) --- */
  module_value_t             saved;                /*    64     8 */
  module_value_t             min;                  /*    72     8 */
  module_value_t             max;                  /*    80     8 */
  vlc_callback_t             pf_callback;          /*    88     8 */
  void *                     p_callback_data;      /*    96     8 */
  char * *                   ppsz_list;            /*   104     8 */
  int *                      pi_list;              /*   112     8 */
  char * *                   ppsz_list_text;       /*   120     8 */
  /* --- cacheline 2 boundary (128 bytes) --- */
  int                        i_list;               /*   128     4 */
  
  /* XXX 4 bytes hole, try to pack */
  
  vlc_callback_t             pf_update_list;       /*   136     8 */
  vlc_callback_t *           ppf_action;           /*   144     8 */
  char * *                   ppsz_action_text;     /*   152     8 */
  int                        i_action;             /*   160     4 */
  _Bool                      b_dirty;              /*   164     1 */
  _Bool                      b_advanced;           /*   165     1 */
  _Bool                      b_internal;           /*   166     1 */
  _Bool                      b_restart;            /*   167     1 */
  char *                     psz_oldname;          /*   168     8 */
  _Bool                      b_removed;            /*   176     1 */
  _Bool                      b_autosave;           /*   177     1 */
  _Bool                      b_unsaveable;         /*   178     1 */
  _Bool                      b_safe;               /*   179     1 */
  
  /* size: 184, cachelines: 3 */
  /* sum members: 165, holes: 3, sum holes: 15 */
  /* padding: 4 */
  /* last cacheline: 56 bytes */
};

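Pahole shows that the structure is 19 bytes bigger than the sum of its members: its size is 184 bytes, while its members only add up to 165 bytes. The three holes account for 15 of these bytes, and the 4 bytes of padding at the end account for the rest.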

Saving some memory

It is now easy to save some memory by repacking the structure. The goal is simple: try to fill the holes. For example, there are two holes of size 4 (just after i_type and i_list). If i_type and i_list are placed side by side, both holes disappear.

The manual packing was done some months ago in this commit. This change saved a few kilobytes of memory just by repacking one structure.