Prevalence vs. Provenance: The McAfee Incident

April 27, 2010

April 21, 2010 was a bad day for our friends at McAfee.   Let’s call it Black Wednesday for the purposes of this blog.

First some background:

There has been a lot of rhetoric around “Reputation Services.”  Here is the layperson’s, real-world idea behind Reputation Services:

If I have seen you before, and I think I recognize you, then I’ll add you to my “reputation list.”

This is the idea of “prevalence-based” whitelisting.  All of the big guys (Symantec, McAfee, Trend) have some form of this.  These “voting-based” methods are useful, but they are “community-based” and therefore have some major limitations.  Because reputation still falls under the category of “Assumed Trust” rather than “Explicit Trust,” it still carries a significant chance of error (both False Positives and False Negatives).
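To make the limitation concrete, here is a minimal sketch of a prevalence-based decision. Everything in it is an assumption for illustration: the threshold value, the hash labels, and the sighting counts are made up, and no vendor's actual reputation service is being reproduced.

```python
from collections import Counter

# Hypothetical threshold: how many community sightings earn "assumed trust".
PREVALENCE_THRESHOLD = 1000


def assumed_trust(file_hash: str, sightings: Counter) -> bool:
    """Trust a file only because many endpoints report having seen it."""
    return sightings[file_hash] >= PREVALENCE_THRESHOLD


# Made-up sighting counts reported by the community.
sightings = Counter({
    "hash-of-popular-app": 250_000,
    "hash-of-widespread-trojan": 40_000,
    "hash-of-rare-inhouse-tool": 12,
})

for name in sightings:
    print(name, "trusted" if assumed_trust(name, sightings) else "not trusted")
```

In this toy data the widespread trojan passes the vote (a False Negative) while the rare in-house tool fails it. Popularity answers "how often have I seen this?" and not "where did this come from?"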

On Black Wednesday McAfee learned an important lesson the hard way.  My point here is NOT to pile on the effort to vilify McAfee.  It could have happened to any other vendor that relies on Assumed Trust methods.

Here are some of the posts from the McAfee web site:

Let me break this down a bit more:

For those of you that commented in the McAfee Blog “What the heck is a false positive?” – let me provide a quick answer.

In blacklisting methods like AV, a False Positive is incorrectly reporting something that is “good” as “bad.”  Conversely, a False Negative is incorrectly reporting something that is “bad” as “good.”  On Black Wednesday McAfee’s software incorrectly reported a critical Windows executable as bad, and subsequently quarantined the file.
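For the geeks, the four possible outcomes of a blacklist verdict can be written down in a few lines. This is just an illustrative sketch; the function name and its parameters are my own and do not come from any AV product.

```python
def classify(is_actually_bad: bool, reported_bad: bool) -> str:
    """Name the outcome of a detection verdict against the ground truth."""
    if reported_bad and not is_actually_bad:
        return "false positive"   # good file flagged as bad
    if not reported_bad and is_actually_bad:
        return "false negative"   # bad file waved through as good
    return "true positive" if is_actually_bad else "true negative"


# Black Wednesday: a good system file was reported as bad.
print(classify(is_actually_bad=False, reported_bad=True))
```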

Whitelist or blacklist aside for the moment, the issue is really all about signal-to-noise ratio.  How do I amplify the signal, and make it unambiguous?  I can either pump up the “detection” amplitude, or reduce the “noise” – preferably both.

I wrote a three-part blog on this a few months ago for geeks like me to go deeper into this subject. Here are the references to that work:

So what is “Software Provenance” and why does it matter? Think about it this way. All software has DNA.  If we take a swab of that DNA upstream from the community, preferably when the software is being “born” at the actual manufacturing site, then we have a real edge, which is:

  • We have supply chain knowledge (I can tell whether code found in the wild is actually the code that was built and shipped by the named vendor).
  • We have more certainty that the code is “real” and not just assumed real as is often the case with graylist and reputation-based systems.
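One simple way to picture the “DNA swab” is a cryptographic fingerprint recorded at build time and checked later in the field. The sketch below assumes SHA-256 as the fingerprint and a plain dictionary as the vendor manifest; the vendor name, file name, and byte strings are hypothetical placeholders, and a real provenance system would involve far more than a single hash lookup.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a file's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical: bytes of a binary exactly as built at the vendor's site.
shipped = b"bytes of the binary as built by the vendor"

# Hypothetical manifest recorded upstream, at manufacturing time.
vendor_manifest = {
    fingerprint(shipped): ("ExampleVendor", "example.exe", "1.0"),
}


def explicit_trust(data: bytes, manifest: dict) -> bool:
    """Trust code only if its fingerprint matches a build-time record."""
    return fingerprint(data) in manifest


# A copy found in the wild that has been altered after shipping.
tampered = shipped + b" plus injected payload"

print(explicit_trust(shipped, vendor_manifest))
print(explicit_trust(tampered, vendor_manifest))
```

The decision here does not depend on how many people have seen the file. It depends only on whether the code matches what the named vendor actually built and shipped.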

True whitelisting does this.  It provides a mechanism to set and enforce EXPLICIT TRUST with individual software components, packages, and indeed entire business services from power-on through cursor move.  All other methods are just extensions of the old assumed trust methods, and are subject to severe signal-to-noise issues.

End-to-end explicit-trust-based, or high-resolution, reference image management, sourced with known-provenance whitelisting, is on its way.  This transition will mark our graduation to real Security 2.0 methods.

We need Security 2.0 now.   Until we fully grasp this, I can almost guarantee we will see another Black Wednesday.