Concurrent Patterns and Best Practices
上QQ阅读APP看书,第一时间看更新

Race conditions and heisenbugs

The lost update is an example of a race condition. A race condition means that the correctness of the program (the way it is expected to work) depends on the relative timing of the threads getting scheduled. So sometimes it works right, and sometimes it does not!

This is a situation that is very hard to debug. We need to reproduce a problem to investigate it, possibly running it in a debugger. What makes it hard is that the race condition cannot be reproduced! The sequence of interleaved instructions depends on the relative timing of events that are strongly influenced by the environment. Delays are caused by other running programs, other network traffic, OS scheduling decisions, variations in the processor's clock speed, and so on. A program containing a race condition may exhibit different behavior, at different times.

A heisenbug and race conditions are explained in the diagram:

These are heisenbugsessentially nondeterministic and hard to reproduce. If we try debugging a heisenbug by attaching a debugger, the bug may disappear!  

There is simply no way to debug and fix these. There is some tooling support, such as the tha tool (https://docs.oracle.com/cd/E37069_01/html/E54439/tha-1.html) and helgrind (http://valgrind.org/docs/manual/drd-manual.html); however, these are language or platform specific, and don't necessarily prove the absence of races.

Clearly, we need to avoid race conditions by design, hence the need to study concurrency patterns and this book.