
The True Cost of "False Positives" in Application Security


Remember the story of the boy who cried wolf? His pranks were "false alarms" - defined as "a mistaken or intentionally misleading alert that something is wrong and needs attention." False alarms from application security tools are certainly annoying, but how do they affect the overall economics of an application security program? As it turns out, they make all the difference.

False alarms prevent you from fixing true vulnerabilities

When the boy cried wolf, the villagers didn't know whether the wolf was real or not. They all had to grab their torches and pitchforks and run out to the field to chase off the perceived threat every time, real or not. It's the same with false alarms from application security tools. Put another way, you can't investigate only the real vulnerabilities... you have to investigate them all!

Figuring out whether a tool-reported vulnerability is real can take anywhere from ten minutes (if you're really good) to many hours. If you're resource-constrained -- and just about every company is -- then you simply can't investigate every single vulnerability that your tools report.

For example, imagine that I use a static or dynamic scanner on an application and the tool generates 400 possible vulnerabilities, of which only 40 are true positive vulnerabilities. If you use these tools, then you know these are very conservative numbers.


Even if I'm really fast, it's going to take me ten minutes to investigate each of these "possibles," which adds up to over eight days to do all 400. But I only have two days, so I can go through just 100 of them -- 25% of the report. That means I'll confirm 10 true positives and miss the other 30 real vulnerabilities in my application.
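The triage economics above can be sketched in a few lines of arithmetic (a minimal sketch using the article's numbers; the per-finding triage time and working-day length are the assumptions stated in the text):

```python
# Triage economics from the article: 400 reported findings, 40 real,
# ten minutes of analyst time per finding, eight-hour working days.
reported = 400            # findings reported by the scanner
true_positives = 40       # of which only 40 are real vulnerabilities
minutes_per_finding = 10  # optimistic per-finding triage time
hours_per_day = 8

# Time required to triage everything the tool reported.
total_days = reported * minutes_per_finding / 60 / hours_per_day  # ~8.3 days

# But the analyst only has two days of budget.
budget_minutes = 2 * hours_per_day * 60             # 960 minutes
triaged = budget_minutes // minutes_per_finding     # 96 findings, ~25% of 400

# If real vulnerabilities are spread evenly through the report,
# only about a quarter of them ever get looked at.
confirmed = round(true_positives * triaged / reported)  # ~10 confirmed
missed = true_positives - confirmed                     # ~30 never examined

print(f"Full triage would take {total_days:.1f} days")
print(f"In 2 days: {triaged}/{reported} findings triaged, "
      f"~{confirmed} of {true_positives} real vulnerabilities confirmed, "
      f"~{missed} missed")
```

The point of the sketch is that the false-positive rate, not the analyst's speed, is what determines how many real vulnerabilities ever get examined.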

So finding true positives is important, but it's of little value if they're buried in a mountain of false alarms. In this case, my tool's false positives prevented me from knowing anything about 75% of the vulnerabilities in my application, much less fixing them.

You have to measure coverage and accuracy in application security

Dynamic and static scanners have been around for over a decade, but they still have serious accuracy problems (and aren't likely to improve). Therefore it's absolutely critical to understand exactly what the strengths and weaknesses of each tool are. Remember, while false positives are expensive and annoying, "false negatives" will kill you. These are true vulnerabilities that are simply not detected by a tool. All you will see is a PDF report, and you'll have a false sense of security -- blissfully unaware of the serious vulnerabilities your tool missed.

Many organizations evaluate application security tools by running them on a few of their applications, or possibly a vulnerable application like OWASP WebGoat, and asking "did it find anything?"  But this is a *terrible* way to measure tools! Because nobody knows exactly what the expected results are supposed to look like, it's impossible to say anything meaningful about the accuracy of the tool.

If you don't know what your tool is good and bad at, try running it on the OWASP Benchmark. It's a collection of thousands of test cases designed to measure whether your application security tools have certain basic capabilities. There are no surprises in the OWASP Benchmark, all the tests are free and open, and anyone can reproduce the results. But I guarantee that you'll be surprised by the results.
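Measuring a tool this way means scoring it on both coverage (how many real flaws it detects) and accuracy (how much safe code it flags anyway). A minimal sketch of that kind of scoring, using made-up counts rather than real Benchmark results, looks like this:

```python
# Sketch of benchmark-style scoring over a labeled test suite, where every
# test case is known in advance to be either a real flaw or safe code.
# (Counts below are invented for illustration, not real tool results.)
def score(tp, fn, fp, tn):
    """True positive rate minus false positive rate.

    tp/fn: real flaws the tool found / missed (fn = false negatives)
    fp/tn: safe cases the tool flagged / correctly passed
    """
    tpr = tp / (tp + fn)  # coverage: share of real flaws detected
    fpr = fp / (fp + tn)  # noise: share of safe code flagged anyway
    return tpr - fpr

# A tool that simply flags everything has perfect coverage but,
# because it also flags all the safe code, scores zero:
print(round(score(tp=100, fn=0, fp=100, tn=0), 2))   # 0.0

# A tool with 60% coverage and a 10% false alarm rate scores higher:
print(round(score(tp=60, fn=40, fp=10, tn=90), 2))   # 0.5
```

This is why a labeled suite matters: without knowing the expected results for every test case, you can't compute the false negative or false positive counts at all, and "did it find anything?" tells you nothing.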

Scaling application security is a people problem

Application security is the leading cause of breaches and the most significant enterprise security challenge. The scale of this problem absolutely demands automation. But automation that produces high levels of false alarms requires human experts to triage the output. That’s not really automation and it doesn’t really scale. Neither does the "tool soup" offered by some vendors. In fact, having more tools exacerbates the people problem.

This is not to say that noisy tools can't be useful. But you have to really understand the use case.  Security researchers might use tools that have very high false alarm rates because they are looking to augment a manual review and they have the skill to triage the results. However, pushing these same tools to developers will end in frustration and minimal (or even negative) benefits.

And I certainly don't mean to imply that you don’t need application security experts. But we don’t need to waste their time "triaging" false positives. We need them doing threat modeling, improving enterprise defenses, application security architecture, monitoring new attack techniques, and other strategic activities.

Bottom line: accuracy matters!



Jeff Williams, Co-Founder, Chief Technology Officer

Jeff brings more than 20 years of security leadership experience as co-founder and Chief Technology Officer of Contrast Security. He recently authored the DZone DevSecOps, IAST, and RASP refcards and speaks frequently at conferences including JavaOne (Java Rockstar), BlackHat, QCon, RSA, OWASP, Velocity, and PivotalOne. Jeff is also a founder and major contributor to OWASP, where he served as Global Chairman for 9 years, and created the OWASP Top 10, OWASP Enterprise Security API, OWASP Application Security Verification Standard, XSS Prevention Cheat Sheet, and many more popular open source projects. Jeff has a BA from Virginia, an MA from George Mason, and a JD from Georgetown.