Friday, 15 September 2006

Static Analysis And Scary Headlines

A few days ago Slashdot trumpeted the headline "655 Defects, 71 Vulnerabilities Found In Firefox", based on a post by Adam Harrison, who had applied his company's static code analysis tool to the Firefox code. That's not an unfair summary, since Harrison's post says "The analysis resulted in 655 defects and 71 potential security vulnerabilities."

The problem is that Klocwork, like most other static analysis tools, reports false positives; i.e., it reports problems that are not actually bugs in the code. (More precisely, it may report error behaviours that cannot actually occur in any run of the program.) That in itself is not a problem, but when reporting the results of these tools you must make clear how many error reports the tool produced and how many of those have been verified by humans as corresponding to actual bugs that would affect some run of the program. In this case, it was not clear at all. We're told "655 defects", and then in comments Harrison claims "In this particular analysis we reviewed the entire results to verify the correctness of the defects." But Mozilla developers have been combing through the Klocwork reports, and it turns out that most of them are not real bugs.

Here's one example:

902    compNameInFile = (char *) malloc(sizeof(char) * (stbuf.st_size + 1));
...
908    memset(compNameInFile, 0, (stbuf.st_size + 1));
...
911    rv = fread((void *) compNameInFile, sizeof(char),
912               stbuf.st_size, dlMarkerFD);
...
923    if (strcmp(currComp->GetArchive(), compNameInFile) == 0)

The Klocwork analyzer claims that on line 923, there can be a string overflow reading from compNameInFile because it might not be null terminated. But in fact it's clear that there will always be at least one zero in the buffer: memset zeroes all st_size + 1 bytes, and fread writes at most st_size of them, so the final byte always remains zero.

In fact, people have looked at a lot of the Klocwork reports, and so far only two or three of them have been judged genuine bugs (be sure to read the comments in those bugs...).

I'm sympathetic to static analysis tools; I did my PhD in the area :-). I really want Klocwork, Coverity, Oink and the rest to be successful, to become a standard part of the software development toolchain. But we've got to be honest and avoid the scare headlines.




It seems that even today's best general-purpose static analysis tools have a hard time finding high-value bugs in our code. We're having much more success with testing tools. I hypothesize that our interesting bugs are due to the violation of complex invariants that are not tracked by these general-purpose tools. These are invariants such as "at most one of mFoo and mBar is non-null", or "if frame F has a child C, then C's mParent points to F". Inferring these invariants automatically, or proving things about them from the code, is very hard, not least because they are often violated temporarily. But tools that don't understand these invariants are likely to always have a high level of false positives in our code.

I think hope lies in a few different directions. Tools like Klocwork can be refined to home in on clear errors and throw away the chaff. There are more model-checking-like tools --- really, tools for doing intelligent, directed, systematic testing, incorporating knowledge of the code --- that can find complex error scenarios and guarantee no false positives. And I would like to see frameworks like Oink mature to make it easy to write custom analyses for specific problems in certain applications, probably aided by annotations. I have several ideas for custom analyses in Mozilla.


13 comments:

  1. I think static analysis tools are the way of the future. They have certainly helped me a lot.
    However, like bug counts, the numbers they present are just about meaningless because they don't convey the real severity of the issues. I wish there were a way to prevent anyone from seeing the counts so they have no choice but to look at the actual information.
    Oh, and constant refinement is a must. FxCop is usable because the team is constantly and visibly working on reducing false positives.

    ReplyDelete
  2. As Rob points out, many of the real problems in a program are in the violation of invariants that are at much higher levels of abstraction than the basic language level, whereas a generic checker can only check the layers of invariants that are close to the language level. As Scott McPeak points out, it is the developer who knows how the code is supposed to work. Therefore it is the developer who needs to tell the analysis tool how the code is supposed to work.
    Oink provides a dataflow analysis (and Scott's C++ front-end, Elsa, provides a function-local control-flow analysis). This is enough to check for format-string errors, SQL-injection attacks, etc., but to get the most utility people are really going to write their own analyses. The Oink framework is supposed to 1) minimize the amount of work needed to do that, and 2) by using the same AST and type system, allow analyses to re-use each other's results (by, say, reading each other's annotations left on the AST) to build hybrid analyses.
    For this reason, I think it is an advantage that Oink is Open Source (BSD), so use it to write whatever analysis you need; please note that this is an active area of research, so be prepared for it to be challenging :-). If you write one that helps you, make sure it is reasonably general so that I can ship it with Oink.

    ReplyDelete
  3. That code is pretty troubling in other ways though. What is with this "sizeof(char) *" stuff? Is the implication that sizeof(char) might be different from 1? And if you insist on using it, why not use it in the call to memset too? They're not consistent.
    And then, why cast the first fread parameter to "void *"? Is it because fread is declared to take a void *???? The point of declaring it to take a void * is so you can pass any pointer you like without having to cast it.

    ReplyDelete
  4. what happens if stbuf.st_size is (size_t)(-1)?

    ReplyDelete
  5. Yes, there is a bug memset should be:
    memset(compNameInFile, 0 , sizeof(char)*(stbuf.st_size + 1));
    and not
    memset(compNameInFile, 0 ,(stbuf.st_size + 1));

    ReplyDelete
  6. sizeof(char) is guaranteed to be 1 always in the C standard. A char is one byte, which doesn't have to be 8 bits if you want to be pedantic, and char's signedness is implementation dependent. However, sizeof(char) is always 1.

    ReplyDelete
    I am very skeptical that one can formulate these invariants that some code inherently relies on. This is especially the case if you inherit old code, where you merely understand what the original author had in mind and you only explore the edge cases via the bug reports that flow in. So it's no wonder to me that intelligent fuzz testing has a much higher yield rate: it does not require formulating the invariants. I see it as a good code-coverage tool with real data that tests interdependencies. Furthermore, if the test cases crash, there is no discussion about how bad a code flaw is.
    On the other hand, from what I have seen of the Coverity bug reports, they had a few shiny diamonds and more than enough boring stuff. What these tools provide is feedback against sloppy coding, and one needs to fix this to prevent the bad-code disease from spreading further.
    Keeping these possible-but-never-executed null derefs hidden behind security protection for a long time seems to me an overreach. I think we should open the test results once the initial analysis has been completed. First, this would debunk noisy security claims by tool vendors, as everybody would see that 90% is just non-critical stuff. Second, it would allow newbies to get their feet wet, as a lot of the bugs are just code cosmetics. Third, it would free developers who have serious work to do from wading through that drivel. Our current handling is so against the open-source paradigm of "Given enough eyeballs, all bugs are shallow" that I think we should change it.

    ReplyDelete
    The thing is, trying to fit static analysis around the asm-preprocessor called C is just plainly insane.
    Just use a statically type-checked language. I'm not talking about the so-called statically-typed languages that still require casts and are thereby as safe as a dynamic language. I'm talking about _real_ static type checking, preferably with the possibility to add your own constraints.
    Say Haskell. Say Mercury. Say Curry. Say Dylan.
    And the days when these kinds of languages were any slower than C/C++ are long gone, mostly because of the size of modern applications. Code is not developed to be fast: it's developed to be maintainable.
    So once we stop taking advantage of inlining ASM code into our C/C++, it's time to move to a declarative programming language.

    ReplyDelete
  9. Robert O'Callahan18 September 2006 21:56

    I don't think even Simon Peyton-Jones would claim that Haskell is as fast as C/C++. He certainly wouldn't claim it for a huge modern application, because no-one's ever written one in Haskell.
    I would say the same for the other languages you mention, except that I've never met the designers of those languages so I don't know if they have any sense.

    ReplyDelete
    Reference on the sizeof C standard issue:
    http://anubis.dkuug.dk/jtc1/sc22/open/n2794/n2794.txt
    """
    6.5.3.4 The sizeof operator
    [snip]
    [#3] When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
    """

    ReplyDelete
  11. Folks interested in FxCop might be interested by the tool NDepend:
    http://www.NDepend.com
    NDepend analyses source code and .NET assemblies. It allows controlling the complexity, the internal dependencies and the quality of .NET code.
    NDepend provides a language (CQL, Code Query Language) dedicated to querying and constraining a codebase.
    It also comes with advanced code visualization (dependency matrix, metric treemap, box-and-arrows graphs...), more than 60 metrics, and facilities to generate reports and integrate with mainstream build technologies and development tools.
    NDepend also lets you compare different versions of your codebase precisely.

    ReplyDelete
  12. In regards to the comment about application-specific properties, this is certainly where the next level needs to be. Checking for bogon errors is important in the real world, but we got a lot of value out of using MOPS to do model checking on application-specific control flow invariants. Uno is another interesting tool to look at in this regard.

    ReplyDelete