Valgrind and real-time software

Valgrind is a very useful tool that checks for memory leaks, invalid memory accesses and many other kinds of errors. To do so, valgrind scrutinizes every memory-manipulating instruction executed by the tested program. This slows the application down by a factor of 5 to 100.

Most of the time, this slowdown will not change the final behavior and results.

However, some applications, like VLC media player, use time-dependent algorithms that are affected by such a slowdown. For instance, if you run VLC under valgrind while decoding an HD video:

#!/bin/bash
~ valgrind --leak-check=full ./vlc -I dummy -A dummy video.mkv
[...]
VLC media player 2.1.0-git Rincewind (revision 1.3.0-git-3513-g970b2ac)
[0x7115ef8] dummy interface: using the dummy interface module...
[0x7883748] avcodec decoder error: more than 5 seconds of late video -> dropping frame (computer too slow ?)
[0x7883748] avcodec decoder error: more than 5 seconds of late video -> dropping frame (computer too slow ?)
[0x7883748] avcodec decoder error: more than 5 seconds of late video -> dropping frame (computer too slow ?)
[0x7883748] avcodec decoder error: more than 5 seconds of late video -> dropping frame (computer too slow ?)

VLC prints errors and does not display the video because of the slowdown. Its behavior is heavily affected: most pictures are not decoded in time and are dropped without being displayed.

Hiding bugs from valgrind

Most people will think that this is not an issue: valgrind is only used for debugging, and nobody expects the video player to be usable in that case. In fact, the slowdown changes the behavior of VLC and may mask some bugs in the application.

VLC will not follow the normal path and will take shortcuts to drop the video. If a bug is hidden in the last stage of the decoding pipeline, we won't be able to see it under valgrind.

Let's look at this simple program that leaks under normal circumstances but not when run under valgrind:

#define _GNU_SOURCE /* asprintf is a GNU extension */
#include <stdio.h>
#include <stdint.h>
#include <math.h>
#include <time.h>
int main(void)
{
  int i;
  double j = 0.0;
  time_t begin = time(NULL);
  for(i = 0; i < 100000000; i++)
    j += sqrt(i);
  time_t end = time(NULL);
  /* We cannot wait for more than 5 seconds for this computation */
  if(end - begin < 5)
  {
    char *psz_str;
    /* psz_str is allocated by asprintf and never freed: this is the leak */
    if(asprintf(&psz_str, "The computation took: %ju seconds\n\tresult: %f",
                (uintmax_t)(end - begin), j) < 0)
      return 1;
    puts(psz_str);
    return 0;
  }
  else
  {
    fprintf(stderr, "Computer too slow for this task!\n");
    return 1;
  }
}

It's obvious that this program leaks the memory allocated by asprintf, but valgrind will not be able to see it. Because of the slowdown, the application takes the error path and never calls asprintf:

#!/bin/sh
~ valgrind --leak-check=full ./fail
[...]
Computer too slow for this task!
==6941==
==6941== HEAP SUMMARY:
==6941==     in use at exit: 0 bytes in 0 blocks
==6941==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==6941==
==6941== All heap blocks were freed -- no leaks are possible
[...]

Introducing timescaler

Principles

To work around this issue, I developed timescaler. This library hooks the time-dependent functions exported by the libc in order to scale the time seen by the application. For instance, when an application calls sleep(s) to sleep for s seconds, it will in fact sleep for s*scaling seconds.

This scaling compensates for the slowdown introduced by the valgrind checks.
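The arithmetic behind this idea can be sketched in a few lines of C. This is only an illustration and not the timescaler code: the helpers scale_duration and scale_clock are hypothetical names, and the sketch assumes that clock readings are divided by the same factor that multiplies sleep durations, measured from some reference point.

```c
#include <time.h>

/* Hypothetical helper: real duration to wait when the application asks
 * for `requested` and time runs `scale` times slower for it. */
struct timespec scale_duration(struct timespec requested, double scale)
{
    double total = ((double)requested.tv_sec
                    + (double)requested.tv_nsec / 1e9) * scale;
    struct timespec real;

    real.tv_sec = (time_t)total;
    real.tv_nsec = (long)((total - (double)real.tv_sec) * 1e9);
    return real;
}

/* Hypothetical helper: clock value (in seconds) reported to the
 * application, given the real clock reading and the real time at which
 * scaling started. Elapsed time is divided by `scale`. */
double scale_clock(double real_now, double real_start, double scale)
{
    return real_start + (real_now - real_start) / scale;
}
```

With a scale of 10, a request to sleep for 1 second becomes a real wait of 10 seconds, and 50 real seconds of elapsed time appear to the application as 5 seconds.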

Basic examples

In the following basic example, we ask a process to sleep for 1 second while measuring the elapsed time from another process. In the second and third runs, timescaler hooks the time-related functions:

#!/bin/sh
~ time sleep 1
1.022 total
~ time env TIMESCALER_SCALE=2 LD_PRELOAD=timescaler.so sleep 1
2.003 total
~ time env TIMESCALER_SCALE=10 LD_PRELOAD=timescaler.so sleep 1
10.003 total

timescaler uses the LD_PRELOAD mechanism to hook some functions from the libc. The time scaling is controlled by the environment variable TIMESCALER_SCALE. When it is set to 2, time runs 2 times slower for the application than in the real world.
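To give an idea of how such an LD_PRELOAD hook can look, here is a minimal sketch for sleep() only. It is not taken from the timescaler sources: the helper read_scale is a hypothetical name, and the fallback to a scale of 1 when TIMESCALER_SCALE is missing or invalid is an assumption. The hook looks up the real function with dlsym(RTLD_NEXT, ...) and calls it with the scaled duration.

```c
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read the scale from TIMESCALER_SCALE.
 * Assumption: fall back to 1.0 (no scaling) when unset or invalid. */
double read_scale(void)
{
    const char *value = getenv("TIMESCALER_SCALE");
    char *end;
    double scale;

    if (value == NULL)
        return 1.0;
    scale = strtod(value, &end);
    if (end == value || scale <= 0.0)
        return 1.0;
    return scale;
}

/* Replacement for the libc sleep(): same signature, scaled duration.
 * Overflow of seconds * scale is not handled in this sketch. */
unsigned int sleep(unsigned int seconds)
{
    static unsigned int (*real_sleep)(unsigned int) = NULL;
    double scale = read_scale();
    unsigned int left;

    if (real_sleep == NULL)
        real_sleep = (unsigned int (*)(unsigned int))dlsym(RTLD_NEXT, "sleep");
    if (real_sleep == NULL)
        return seconds; /* real function not found: nothing was slept */

    /* Sleep longer in real time, then convert the remaining real
     * seconds back to application time. */
    left = real_sleep((unsigned int)(seconds * scale));
    return (unsigned int)(left / scale);
}
```

Such a file can be built as a shared object with something like gcc -shared -fPIC -o hook.so hook.c -ldl and loaded through LD_PRELOAD. Hooking sleep() alone is not enough in general: many programs wait through other functions such as nanosleep() or read the clock directly, so a real library has to cover several functions.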

timescaler on VLC/Valgrind

Looking back at our first example with VLC media player under valgrind monitoring:

#!/bin/bash
~ export TIMESCALER_SCALE=10 LD_PRELOAD=timescaler.so
~ valgrind --leak-check=full ./vlc -I dummy -A dummy video.mkv
[...]
VLC media player 2.1.0-git Rincewind (revision 1.3.0-git-3513-g970b2ac)
[0x710e7c8] dummy interface: using the dummy interface module...
[no avcodec errors]

VLC is no longer dropping images because of the slowdown. If you look at the video that VLC is playing, you will see it playing really slowly, image by image. But for VLC the video is playing at the right speed, without any picture decoded too late.

Finding our bugs again

We can now try timescaler on our simple example and see if valgrind is now able to find the memory leak:

#!/bin/sh
~ TIMESCALER_SCALE=10 LD_PRELOAD=timescaler.so valgrind --leak-check=full ./fail
[...]
The computation took: 1 seconds
result: 666666661666.567017
==6982==
==6982== HEAP SUMMARY:
==6982==     in use at exit: 93 bytes in 2 blocks
==6982==   total heap usage: 3 allocs, 1 frees, 193 bytes allocated
==6982==
==6982== 61 bytes in 1 blocks are definitely lost in loss record 2 of 2
==6982==    at 0x4C275A2: realloc (vg_replace_malloc.c:525)
==6982==    by 0x5320602: vasprintf (vasprintf.c:86)
==6982==    by 0x5304817: asprintf (asprintf.c:37)
==6982==    by 0x40079B: main (in /tmp/fail)
==6982==
==6982== LEAK SUMMARY:
==6982==    definitely lost: 61 bytes in 1 blocks
==6982==    indirectly lost: 0 bytes in 0 blocks
==6982==      possibly lost: 0 bytes in 0 blocks
==6982==    still reachable: 32 bytes in 1 blocks
==6982==         suppressed: 0 bytes in 0 blocks
==6982== Reachable blocks (those to which a pointer was found) are not shown.

As timescaler compensates for the slowdown of valgrind, the application behaves normally and the leak is detected by valgrind.

Getting timescaler

The timescaler source code can be found on GitHub.

timescaler currently hooks only a subset of the time-dependent libc functions. This subset is enough to scale the time of most applications, including VLC media player. If you want to contribute to timescaler, do not hesitate to send mail, patches, pull requests or bug reports.