A faster YAML loader

Sat. 20 August 2016 by Rémi Duraffort

This is the second issue I ran into while playing with the LAVA log viewer.

In the new versions of LAVA, the logs are formatted in YAML:

- {"dt": "2016-08-18T14:24:01.096308", "lvl": "info", "msg": "start: 1 tftp-deploy (max 300s)"}
- {"dt": "2016-08-18T14:24:01.099413", "lvl": "debug", "msg": "start: 1.1 download_retry (max 300s)"}
- {"dt": "2016-08-18T14:24:01.100674", "lvl": "debug", "msg": "start: 1.1.1 file_download (max 300s)"}
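To make the structure concrete: each line is one item of a YAML sequence, written as a flow mapping with `dt`, `lvl` and `msg` keys. The sketch below is only an illustration, and it assumes every entry has exactly the shape of the excerpt above (a `- ` prefix followed by a mapping that also happens to be valid JSON). It uses the standard `json` module, not a YAML parser, and `parse_entry` is a hypothetical helper name.

```python
import json

# Two of the sample lines from the excerpt above.
SAMPLE = '''\
- {"dt": "2016-08-18T14:24:01.096308", "lvl": "info", "msg": "start: 1 tftp-deploy (max 300s)"}
- {"dt": "2016-08-18T14:24:01.099413", "lvl": "debug", "msg": "start: 1.1 download_retry (max 300s)"}
'''


def parse_entry(line):
    """Parse one log line of the form '- {...}' (hypothetical helper).

    Assumption: the mapping after the '- ' prefix is valid JSON,
    which holds for the excerpt but not for YAML in general.
    """
    if not line.startswith("- "):
        raise ValueError("unexpected log line: %r" % line)
    return json.loads(line[2:])


# The result is a list of dicts with string values.
entries = [parse_entry(line) for line in SAMPLE.splitlines() if line.strip()]
for entry in entries:
    print(entry["dt"], entry["lvl"], entry["msg"])
```

So a fully loaded log is simply a list with one dictionary per line of the file.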

That's really convenient, but when a job generates a lot of logs, loading these YAML files takes longer and longer:

$ time python -c "import yaml; y=yaml.load(open('output.yaml'));"
18,25s user 0,23s system 100% cpu 18,475 total
$ wc -l output.yaml
36817 output.yaml

But 18s to load 36817 lines of text sounds unreasonable. I looked for an explanation and found that, by default, the Python YAML parser uses the pure-Python loader instead of the (way) faster C implementation.

So, to use the faster C implementation, pass the loader explicitly:

$ time python -c "import yaml; y=yaml.load(open('output.yaml'), Loader=yaml.CLoader);"
2,28s user 0,06s system 99% cpu 2,346 total
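
The C loaders are only present when PyYAML was built against libyaml, so a script that hard-codes `yaml.CLoader` fails on installations without it. A minimal sketch of the usual fallback follows. It uses `CSafeLoader` and `SafeLoader` rather than `CLoader` (my choice here, not what was timed above), because the safe variants refuse to construct arbitrary Python objects. `load_log` is a hypothetical helper name.

```python
import yaml

try:
    # Available only when PyYAML was built with libyaml.
    from yaml import CSafeLoader as Loader
except ImportError:
    # Pure-Python fallback: same result, but much slower.
    from yaml import SafeLoader as Loader


def load_log(path):
    """Load a YAML log file with the fastest safe loader available
    (hypothetical helper)."""
    with open(path) as stream:
        return yaml.load(stream, Loader=Loader)
```

Calling `load_log('output.yaml')` would then return the list of log entries whichever loader ends up being used.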

At roughly 2.3s that's still really slow, but it is about eight times faster, and we can live with that for the moment.