Why Software Provenance Matters

May 29, 2009

We have announced and talked about the concept of “known provenance” as a crucial software-assurance and IT-lifecycle-management metric for some time, but it struck me today that I haven’t really underscored some of the reasons and use cases that led us to this conclusion.

Firstly, there are multiple dimensions to software integrity assurance that leverage cryptographic validation (hashing) methods, including:

1. Do I know that the software elements that I am loading and running on my platform ARE what they say they are?

2. Security quality assurance – can I couple (1) with a quantitative expression of code vulnerability statements? (Is it the code it purports to be and is it secure?) For example, our recent work with Veracode.

3. And what proof do I have that the code I am using was actually built by the named vendor? (The filesystem may think it is XYZ ISV, but it is the software that is vouching for itself, perhaps with the aid of an installer-embedded certificate. Inconclusive at best, especially after installation.)

So we have taken the position early on that we need PROOF that the code was actually built by the named supplier as a crucial attribute of software and device validation or attestation. We call this Source Origin (AKA Known Provenance).
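To make dimensions (1) and (3) concrete, here is a minimal sketch, in Python, of checking a software element against a vendor-signed manifest. Everything here is illustrative: the manifest format is invented, and an HMAC stands in for what would really be an X.509 code-signing check.

```python
import hashlib
import hmac
import json

def sha256_file(path):
    """Hash a file in chunks so large binaries need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(manifest_bytes, signature, vendor_key):
    """Provenance check: did the named vendor vouch for this manifest?
    A real deployment would use X.509 code-signing certificates;
    an HMAC stands in here only to keep the sketch self-contained."""
    expected = hmac.new(vendor_key, manifest_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

def check_element(path, manifest_bytes, signature, vendor_key):
    """Combine integrity (hash match) with provenance (signed manifest)."""
    if not verify_manifest(manifest_bytes, signature, vendor_key):
        return "unknown provenance"
    entries = json.loads(manifest_bytes)  # invented format: {path: digest}
    if entries.get(path) == sha256_file(path):
        return "known good"
    return "integrity mismatch"
```

A real chain of custody would carry the vendor's signature with the manifest from build time all the way to the end system; the sketch only shows the verification step at the end.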

The road to obtaining provenance and delivering it across various use cases is clearly the harder road when collecting software measurements. It requires a “quality over quantity” dedication, and it means that ISVs and other software producers and integrators need to be involved. After all, true known provenance can only be delivered with a certifiable “chain of custody” that starts at the original software vendor and is managed all the way to the end system.

Open standards in method and schema are key, and the industry has done a decent job at collaborating on these—with additional iterations now pending.

But back to the title of this blog: So What?

Here is a quick snapshot of use cases where provenance is increasingly critical:

1. Software Forensics – The objective here is to identify the problem by definitively separating the “good” from the “bad” (and the “unknown”). This is simply common sense, as our objective with forensics is to spend our time as efficiently as possible while looking for the “needle in the haystack”. Efficiency demands that we make the haystack smaller ASAP in our quest for the needle.

2. Supply Chain Assurance – This one deals with both the supplier and purchaser concern of “Is this the device that I think it is?” (i.e., was it in fact built by the named supplier, and is the h/w and s/w integrity demonstrable?).

3. Service Level Assurance (SLA) Management – This is the classic issue of “Ok, something doesn’t work, and whose fault is it?” (I’m sure you’ve never seen finger pointing on this one.)

4. Compliance – Needless to say, when provenance is clear and trusted, we can improve statements of compliance as well. (I know that I have the right software build in place, i.e., the right software manifest and integrity, and I can prove the software and work product in the build is from the named author.)
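The forensics use case can be sketched as a simple triage pass: hash everything, then partition against known-good and known-bad hash sets so that analyst time goes only to the “unknown” pile. This is an illustrative sketch, not any particular product’s method:

```python
def triage(file_hashes, known_good, known_bad):
    """Partition measured files into good/bad/unknown buckets.
    file_hashes maps path -> content digest (e.g., sha256 hex);
    the analyst only needs to examine the 'unknown' bucket,
    which is how provenance shrinks the haystack."""
    buckets = {"good": [], "bad": [], "unknown": []}
    for path, digest in file_hashes.items():
        if digest in known_bad:
            buckets["bad"].append(path)
        elif digest in known_good:
            buckets["good"].append(path)
        else:
            buckets["unknown"].append(path)
    return buckets
```

The value of known provenance here is that the “good” set can be trusted by construction rather than by reputation.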

So the power to prove, enabled by software and hardware provenance, is not a luxury item. In this age of globalization and outsourcing of design, manufacture, distribution, and systems management, we must establish and maintain “trust chains” to all the devices that we build and supply.

For suppliers of complex hardware and software, provenance is a cradle-to-grave issue. How can we truly and cost-effectively own and support our “Brand” without it?

So, as we enter the next chapter of ubiquitous computing (aka Web 2.0), our ability to design trust into our devices early in their lifecycle and systematically validate and pass that trust through the lifecycle of our “brand” in a non-repudiated manner will become a key market differentiator.

One might even go as far as to say:

Those who do not embrace this view may not survive the next wave of consolidation (which by the way is already well underway).



Enter Configuration-Based Whitelisting

May 27, 2009

This post is going to tie a couple of prior discussions together (I hope).

In August 2008, I posted a blog entitled:

Whitelist Emerges from the Shadows: Re-enforcing the Three-Tier Security and Systems Management Model

And in my most recent post entitled:

The “Whitelist Space” seems to be heating up a bit….

I took a stab at creating a taxonomy for whitelisting methods, as this space is really just taking shape – and clearly not all “code-whitelisting methods” are created equal.

So the “dot-connection” is this:

Effective whitelisting is really about total configuration enforcement, not just blocking individual elements. And as I stressed in the first blog, it is really a THREE-TIER architectural challenge, not a traditional two-tier problem like blacklist solutions.

And interestingly, the “heavy lifting” to make all this work is not at the ends of the architecture (Tier 1 or Tier 3) but in the middle – Tier 2.

(Refresher: IMHV, Tier 1 is the whitelist cloud services; Tier 2 is the domain whitelist caching and reference-configuration management; and Tier 3 is the endpoint measurement and policy enforcement agent/client/OS/hypervisor support.)

We think that the real power, manageability and scalability of the method comes into view when we move from just “Good File” to “Configuration-based Whitelisting”, where we pass more whitelist “intelligence” to the method (things like parent-child relationships of the elements and provenance of the elements being enforced).

Clearly, the cloud and local whitelist agents are needed to collect and pass that information – but the key is supplementing that information with additional domain-specific configuration and element data, and organizing the entire lot into configuration-setting and software stacks that should be present on the platform under management.

And all of this must be platform/device decoupled, must be data-type independent (files, registry, config settings, database fields, etc.), must be mappable in the reference configurations, and must be vendor/platform/software-type neutral.
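As a rough illustration of what passing that extra “intelligence” might look like, here is a sketch in Python of a reference configuration whose elements carry a data type, an expected measurement, provenance, and a parent link. All of the field names and deviation categories are assumptions of mine, not a published schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Element:
    """One managed item: data-type independent (file, registry key,
    config setting, database field), with provenance and a parent link."""
    name: str
    kind: str                      # "file", "registry", "setting", ...
    digest: str                    # expected measurement, e.g. a sha256 hex digest
    provenance: str                # named supplier, or "unknown"
    parent: Optional[str] = None   # parent element in the software stack, if any

def check_configuration(reference, observed):
    """Compare observed measurements (name -> digest) against the
    reference configuration. Deviations cover elements that are
    missing, changed, or present but absent from the reference."""
    deviations = []
    for name, elem in reference.items():
        got = observed.get(name)
        if got is None:
            deviations.append(("missing", name))
        elif got != elem.digest:
            deviations.append(("mismatch", name))
    for name in observed:
        if name not in reference:
            deviations.append(("unexpected", name))
    return deviations
```

The point of the structure is that enforcement operates on the whole configuration, with parent-child and provenance context available for policy, rather than on isolated “good files”.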

Whew. Sorry, that was a mouthful.

Real and immediate use cases for this include requirements like Federal Desktop Core Configuration (FDCC) and other compliance issues.

These are exciting times for the space, IMHV. Stay tuned for more.


The “Whitelist Space” seems to be heating up a bit….

May 21, 2009

These pages have been talking about the bigger issues of “IT in Transition” for a long while. The shift to “defense in depth”, with the AV players adding whitelist methods, has been a persistent theme on these and other blog pages.

Well in the last few weeks, we’ve seen a couple major moves: first, Microsoft endorsing the concept and working with us to provide their signatures to the market, and now a significant move with the imminent acquisition of Solidcore by McAfee (MFE).


It is interesting that MFE will assimilate Solidcore in the Governance, Risk and Compliance Business Unit. It is what I would consider a “bite-size” move to application enforcement based on whitelisting by MFE. Recently, Solidcore has done a good job delivering value to fairly static endpoint devices – largely focused on the embedded device, ATM, and POS market spaces.

There is also mention in the release of SCADA devices, commonly used to control physical infrastructure such as electrical and water control/management systems. This could bolster work that MFE may be targeting in Government, where they have done well with the ePO platform.

Solidcore describes their method as “dynamic whitelisting” – also pretty good marketing, IMHO. So now we have another bullet on the whitelist-method slide. So far we have:

  • Application Whitelisting or Allow Listing (single executable locking/blocking/allowance)
  • Dynamic Whitelisting (aka Self-Referencing – see below)
  • Whitelist Caching (this is what Symantec is doing in their latest Norton offerings so that they don’t have to rescan “known code” again with their malware-detection tools)
  • Comprehensive Whitelisting (a superset of Application Whitelisting where entire applications or software stacks may be “measured”, and, based on the “device health” determined by these broader measurements, certain policies may be invoked, like allowing or denying platform access to other resources)

(These are just the “code signing” methods. There are other whitelisting and reputation services being employed for email and URL filtering, which are another category entirely.)
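The “whitelist caching” entry above can be sketched as a content-hash cache in front of an expensive scan, so that known code is never rescanned. The scanner interface here is a hypothetical stand-in, not Symantec’s actual design:

```python
import hashlib

class ScanCache:
    """Skip expensive malware scans for content already known good.
    Keyed by content hash rather than path, so renaming a file
    does not evade (or invalidate) the cache."""
    def __init__(self, scanner):
        self.scanner = scanner      # expensive scan: bytes -> bool (clean?)
        self.known_clean = set()

    def is_clean(self, content):
        digest = hashlib.sha256(content).hexdigest()
        if digest in self.known_clean:
            return True             # cache hit: no rescan needed
        clean = self.scanner(content)
        if clean:
            self.known_clean.add(digest)
        return clean
```

Note that only clean verdicts are cached; anything suspect gets rescanned every time, which is the conservative choice for this kind of scheme.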

Dynamic whitelisting is basically a synonym for “self-learned or self-referencing” configuration image and integrity models where the “whitelist” is derived from the device(s) themselves. Tripwire has been doing this pretty well for a few years.

(Full disclosure again – I co-founded Tripwire, and Solidcore competes directly with Tripwire in the desktop and server integrity market space)

While Self-Referencing whitelisting can be useful, it has a number of limitations and drawbacks. Scalability, manageability, and noise management are just a few of them. By “noise management”, I mean too many false positives, such as when merely upgrading a version of software generates thousands of “file-changed” hits. Also, what if your reference master was corrupted? Or … ?
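A minimal sketch of the self-referencing model, Tripwire-style: derive the baseline from the device itself, then report every deviation. The sketch also shows why the noise problem is inherent: any change, including a legitimate upgrade, is a “hit”. File layout and function names are illustrative:

```python
import hashlib
import os

def snapshot(root):
    """Self-referencing baseline: the 'whitelist' is derived from
    the device itself by hashing every file under root."""
    baseline = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                baseline[path] = hashlib.sha256(f.read()).hexdigest()
    return baseline

def diff(baseline, current):
    """Every changed or missing file is a 'hit' -- which is exactly
    the noise problem: a routine version upgrade can produce
    thousands of these, with no provenance to triage them by."""
    return sorted(p for p in baseline if current.get(p) != baseline[p])
```

Contrast this with list-based, known-provenance whitelisting, where an upgrade shipped by the named vendor would match the new reference rather than raise alarms.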

So, on the one hand, we are happy to see a major AV player dip a toe into the whitelist waters as another validation for the space. We’ll be even more excited when customers and vendors really stretch their legs – and push the envelope with deep and comprehensive whitelisting and reference configuration management methods.

Let’s move beyond executable-lock-and-block methods, and configuration monitoring based on self-learned methods – and get to full and scalable compute-platform attestation, with both root of trust (Trust PROOF built INTO the platform) and known-provenance, list-based whitelisting (PROOF that the code was built by the named authors).

Connecting these dots is necessary to have true platform-intrinsic, end-to-end trust – not just to validate the “easy devices” like POS – but for the more complex servers and workstations use cases.

Yes, it is hard. But the pain will be worth the gain.

It’s time to build more trust into our systems.