Increasing the Security of Software #

When it comes to using SBOMs to keep software secure, there are two main use cases:

Identifying vulnerable components. By enumerating the components contained in a given software artifact, you can identify which components deployed on your infrastructure are vulnerable to known exploits.
Ensuring the integrity of software components. Likewise, ensuring the integrity of your build artifacts requires keeping records that uniquely identify an asset and its provenance. It’s impossible to ensure that you are only deploying trusted components on your infrastructure without records of all of the trusted components that are available for deployment.

In both of these cases, the SBOM plays the critical role of providing a complete inventory of the components in use by a given software project—without understanding what components comprise your software artifacts, it’s impossible to know whether any of them are vulnerable or have been tampered with. In security applications, for this reason, SBOMs primarily serve as a source of truth for the components comprising a given asset.
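To make the first use case concrete, here is a minimal Python sketch of checking an inventory against a list of known-vulnerable versions. The `Component` type, the `find_vulnerable` helper, the `example-lib` package, and the advisory identifier are all hypothetical, made up for illustration; a real workflow would pull advisories from a vulnerability database rather than a hard-coded dictionary.

```python
from typing import NamedTuple


class Component(NamedTuple):
    name: str
    version: str


def find_vulnerable(inventory, advisories):
    """Return the components whose (name, version) pair appears in an advisory."""
    return [c for c in inventory if (c.name, c.version) in advisories]


# Inventory as an SBOM would list it. "example-lib" is a made-up package.
inventory = [
    Component("alpine-baselayout", "3.2.0-r22"),
    Component("example-lib", "1.0.0"),
]

# HYPOTHETICAL advisory data: not a real vulnerability record.
advisories = {("example-lib", "1.0.0"): "EXAMPLE-ADVISORY-0001"}

for component in find_vulnerable(inventory, advisories):
    advisory_id = advisories[(component.name, component.version)]
    print(component.name, component.version, advisory_id)
```

Without the inventory there is nothing to match the advisories against, which is why the SBOM has to be complete for this check to mean anything.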

Take a look at the following excerpt from a CycloneDX SBOM generated by Codenotary’s CAS tool for the python:3.10-alpine Docker image:

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.3",
  "version": 1,
  "metadata": {
    "tools": [
      {
        "vendor": "Codenotary",
        "name": "cas",
        "version": "v1.0.3"
      }
    ],
    "component": {
      "type": "application",
      "name": "python:3.10-alpine",
      "version": ""
    }
  },
  "components": [
    {
      "bom-ref": "python:3.10-alpine-1",
      "type": "library",
      "name": ".python-rundeps",
      "version": "20220907.223701",
      "hashes": [
        {
          "alg": "SHA-1",
          "content": "f3105e48f2a5caae5d0d2b6cbba5468a06a111c2"
        }
      ],
      "purl": "pkg:generic/.python-rundeps@20220907.223701",
      "properties": [
        {
          "name": "LinkType",
          "value": "Static"
        }
      ]
    },
    {
      "bom-ref": "python:3.10-alpine-2",
      "type": "library",
      "name": "alpine-baselayout",
      "version": "3.2.0-r22",
      "hashes": [
        {
          "alg": "SHA-1",
          "content": "3c6c70ccb77b490fd2663506ae7727a638eda4a6"
        }
      ],
      # # #
    },
    # # #

In this example, the SBOM begins by defining the software artifact that was scanned, as well as the tool that was used to complete the scan. Beyond that, we have a set of components that were discovered in the Docker image, using the native package manager of the image itself (Alpine’s apk). The object defining each component gives the component’s name, the type of component (in this case, library), the specific version of the component in the image, and a cryptographic hash identifying the component (SHA-1 in this case, as that is the algorithm apk uses to generate hashes).

Also included is information describing the runtime environment of each package in the operating system of the image, such as whether it’s a static or dynamic binary.
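Because CycloneDX output is plain JSON, reading the inventory back takes only a few lines. The sketch below assumes a trimmed-down document containing just the fields discussed above; `list_components` is a hypothetical helper, not part of cas or any CycloneDX library.

```python
import json

# A cut-down CycloneDX document using one component from the excerpt above.
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.3",
  "components": [
    {
      "type": "library",
      "name": "alpine-baselayout",
      "version": "3.2.0-r22",
      "hashes": [
        {"alg": "SHA-1", "content": "3c6c70ccb77b490fd2663506ae7727a638eda4a6"}
      ]
    }
  ]
}
"""


def list_components(sbom_text):
    """Return (name, version, {alg: digest}) for each component in the document."""
    bom = json.loads(sbom_text)
    if bom.get("bomFormat") != "CycloneDX":
        raise ValueError("not a CycloneDX document")
    result = []
    for comp in bom.get("components", []):
        hashes = {h["alg"]: h["content"] for h in comp.get("hashes", [])}
        result.append((comp["name"], comp.get("version", ""), hashes))
    return result


for name, version, hashes in list_components(SBOM_JSON):
    print(name, version, hashes.get("SHA-1"))
```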

If we use the same tool to generate an SBOM in a different format, we will see that the scanner returns the same information, but formatted according to a different specification. For example, here is the same SBOM as above in SPDX format:

SPDXVersion: SPDX-2.2
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: python:3.10-alpine
DocumentNamespace: http://spdx.org/spdxdocs/python:3.10-alpine-64ef73cf-f862-4602-b384-42a9803c8098
Creator: Tool: Codenotary cas
Created: 2022-09-30T18:44:21Z

##### Software components

PackageName: .python-rundeps
SPDXID: SPDXRef-Package-1
PackageVersion: 20220907.223701
PackageDownloadLocation: NOASSERTION
FilesAnalyzed: false
PackageChecksum: SHA1: f3105e48f2a5caae5d0d2b6cbba5468a06a111c2
PackageSourceInfo: <text>pkg:generic/.python-rundeps@20220907.223701</text>
PackageLicenseConcluded: NOASSERTION
PackageLicenseDeclared: NOASSERTION
PackageCopyrightText: NOASSERTION
PackageComment: <text>Static, Direct</text>

PackageName: alpine-baselayout
SPDXID: SPDXRef-Package-2
PackageVersion: 3.2.0-r22
PackageDownloadLocation: NOASSERTION
FilesAnalyzed: false
PackageChecksum: SHA1: 3c6c70ccb77b490fd2663506ae7727a638eda4a6
PackageSourceInfo: <text>pkg:generic/alpine-baselayout@3.2.0-r22</text>
PackageLicenseConcluded: GPL-2.0-only
PackageLicenseDeclared: NOASSERTION
PackageCopyrightText: NOASSERTION
PackageComment: <text>Static, Direct</text>

. . .

As before, the SBOM defines the software artifact that was scanned, as well as the packages found within the Docker image. This time, the SBOM is formatted according to the SPDX specification, which is a popular format for software bills of materials. SPDX output is a bit more verbose than CycloneDX output, but it can also carry more metadata describing the provenance of each component, including the license under which each component is distributed.
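The tag-value layout shown above is also easy to read programmatically. The following Python sketch is deliberately simplified: it assumes every field fits on one `Tag: value` line and that each package starts at a `PackageName` tag, which holds for this excerpt but not for every SPDX document. `parse_spdx_packages` is a hypothetical helper written for this example.

```python
SPDX_TEXT = """\
SPDXVersion: SPDX-2.2
SPDXID: SPDXRef-DOCUMENT

PackageName: alpine-baselayout
SPDXID: SPDXRef-Package-2
PackageVersion: 3.2.0-r22
PackageChecksum: SHA1: 3c6c70ccb77b490fd2663506ae7727a638eda4a6
PackageLicenseConcluded: GPL-2.0-only
"""


def parse_spdx_packages(text):
    """Split SPDX tag-value text into one dict per package (simplified)."""
    packages = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines, and anything that is not "Tag: value".
        if not line or line.startswith("#") or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        tag, value = tag.strip(), value.strip()
        if tag == "PackageName":
            current = {}
            packages.append(current)
        # Tags seen before the first PackageName describe the document itself.
        if current is not None:
            current[tag] = value
    return packages


for pkg in parse_spdx_packages(SPDX_TEXT):
    print(pkg["PackageName"], pkg["PackageVersion"], pkg["PackageLicenseConcluded"])
```

Splitting on the first colon only keeps values such as `SHA1: 3c6c…` intact, since the checksum field itself contains a colon.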

If we have a way to identify components, how can we use that information to ensure the integrity of our software artifacts? And more importantly, how can we use that information to ensure we aren’t deploying vulnerable components on our servers?

Ensuring the Integrity of Software Artifacts #

Although identifying and recording assets is critical to ensuring their integrity, an SBOM alone is not an indicator of the trust of the software—only a manifest of the components that comprise the artifact. When you have a record of the hash of each component, it becomes possible to ensure that a software component in use hasn’t been modified since the last time it was scanned.

How can we use these manifests to ensure we are only deploying trusted software? When you are using a tool like Codenotary’s cas, the workflow for keeping software artifacts secure generally has three main steps:

Cryptographically signing the Software Bill of Materials generated for each software asset (codebase, container image, library package, build artifact, etc.).
Storing the signed SBOM in a trusted location, along with any relevant supporting metadata (such as which components are trusted, which are unsupported, etc.).
Verifying the integrity of the software asset before it is deployed, by scanning the asset and comparing the result to the signed SBOM.
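The three steps above can be sketched in a few lines of Python. This is an illustration under loud assumptions, not how cas works internally: an HMAC with a shared demo key stands in for a real digital signature, a plain dictionary stands in for the trusted location, and the SBOM is reduced to a mapping from component to hash.

```python
import hashlib
import hmac
import json


def canonical_bytes(sbom):
    """Serialize the SBOM deterministically so equal content gives equal bytes."""
    return json.dumps(sbom, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_sbom(sbom, key):
    """Step 1: produce a tag over the SBOM (HMAC stands in for a signature)."""
    return hmac.new(key, canonical_bytes(sbom), hashlib.sha256).hexdigest()


def verify_asset(scanned, stored_sbom, stored_tag, key):
    """Step 3: check the stored SBOM's tag, then compare it with a fresh scan."""
    expected = sign_sbom(stored_sbom, key)
    if not hmac.compare_digest(expected, stored_tag):
        return False  # the stored SBOM itself was altered
    return canonical_bytes(scanned) == canonical_bytes(stored_sbom)


key = b"demo-key-not-for-production"  # hypothetical key, for illustration only
sbom = {"alpine-baselayout@3.2.0-r22": "3c6c70ccb77b490fd2663506ae7727a638eda4a6"}

# Step 2: "store" the signed SBOM (a dict stands in for the trusted location).
store = {"sbom": sbom, "tag": sign_sbom(sbom, key)}

# A scan of a modified asset yields a different hash for the same component.
tampered = {"alpine-baselayout@3.2.0-r22": "0" * 40}

print(verify_asset(sbom, store["sbom"], store["tag"], key))      # True
print(verify_asset(tampered, store["sbom"], store["tag"], key))  # False
```

Note that a shared-key HMAC lets anyone who can verify also sign, so a real deployment would use asymmetric signatures instead.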

Each of these three steps, however, raises a number of practical considerations:

How do we sign the SBOM?
How do we indicate our level of trust for each component?
How do we make an SBOM accessible from a trusted location?
How do we even deploy a “trusted location”?
How do we prevent someone from tampering with our “trusted location”?
How do we securely update our asset metadata (e.g. the “trust level” of each component)?
How do we verify a software asset is safe to deploy?

One solution to many of these problems is to use a cryptographically verifiable immutable database to store SBOMs and their associated metadata.

Introduction to immudb, or: What is an immutable database? #

An immutable database stores data in cryptographically-verifiable data structures that make it possible to verify that the data has not been tampered with or corrupted in any way.

In an immutable database such as immudb, the data is stored in such a way that:

Existing data is never overwritten (that is, the database is append-only and individual records are immutable).
All changes to the database—such as record creation (by appending a new record) or modification (by appending a new version of an existing record)—are tracked and auditable.
Any tampering with the data is detectable by the database itself and by any client that interacts with the database (by verifying the hash tree of the data).
Responses from the database can be cryptographically signed by the server and verified by the client (to ensure the source of the data is authentic).

These characteristics of an immutable database offer an answer to most of the practical considerations raised above: an immutable database can be used as a “trusted location,” because you only need to trust well-known cryptographic hash algorithms to verify the integrity and authenticity of the stored data.
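To get a feel for why append-only storage makes tampering detectable, consider the toy Python model below. It is a deliberate simplification: it uses a linear hash chain rather than the hash tree mentioned above, it is not immudb's implementation or API, and the `AppendOnlyLog` class and the trust values written to it are hypothetical.

```python
import hashlib
import json


class AppendOnlyLog:
    """A toy append-only store in which every entry commits to the one before it."""

    def __init__(self):
        self.entries = []  # each entry: {"key", "value", "prev", "digest"}

    @staticmethod
    def _digest(key, value, prev):
        payload = json.dumps([key, value, prev]).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, key, value):
        prev = self.entries[-1]["digest"] if self.entries else ""
        digest = self._digest(key, value, prev)
        self.entries.append({"key": key, "value": value, "prev": prev, "digest": digest})
        return digest

    def get(self, key):
        """Latest version wins; older versions stay in the log."""
        for entry in reversed(self.entries):
            if entry["key"] == key:
                return entry["value"]
        raise KeyError(key)

    def history(self, key):
        return [e["value"] for e in self.entries if e["key"] == key]

    def verify(self):
        """Recompute the chain from the start; any edited entry breaks it."""
        prev = ""
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            if entry["digest"] != self._digest(entry["key"], entry["value"], prev):
                return False
            prev = entry["digest"]
        return True


log = AppendOnlyLog()
log.append("busybox@1.35.0-r17", "trusted")
log.append("busybox@1.35.0-r17", "unsupported")  # an "update" is a new version
print(log.history("busybox@1.35.0-r17"), log.verify())  # both versions, True

log.entries[0]["value"] = "untrusted"  # simulate tampering with an old record
print(log.verify())  # False
```

An auditor only needs the entries and a hash function to run `verify`, which is the sense in which trust shifts from the database operator to the hash algorithm.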

Keeping Deployments Safe with SBOMs #

When security metadata is stored in an immutable database, it becomes possible to create a verifiable chain of custody for each component deployed by your team. When you create an SBOM using one of Codenotary’s tools (cas with Community Attestation Service or vcn with Trustcenter), that manifest can be signed with a key that is associated with an individual developer or CI/CD pipeline.

Signing software assets #

This notarization process creates an immutable record of an asset’s manifest, the level of trust declared for each component, and a cryptographic signature which authenticates the source of the SBOM. This process creates cryptographic proof that an SBOM was created by a trusted source, and that it has not been modified after it was signed.

Now, consider what happens when a Docker image is signed with cas:

❯ cas notarize --bom docker://python:3.9-alpine

Resolving dependencies...

Authenticating dependencies...
 100% |██████████████████████████████████████████████████████| (36/36, 58 it/s)

Notarizing 36 dependencies ...
 100% |██████████████████████████████████████████████████████| (36/36, 39 it/s)

.python-rundeps@20220907.231849   b9bddeccfab9c3d7731f6b39360dcf3cfdeb1b7f Trusted
alpine-baselayout@3.2.0-r22       3c6c70ccb77b490fd2663506ae7727a638eda4a6 Trusted
alpine-baselayout-data@3.2.0-r22  d6554033bbe7f571edc82954fd97e59aa4c7f045 Trusted
. . .

Your assets will not be uploaded. They will be processed locally.

Kind:   docker
Name:   docker://python:3.9-alpine
Hash:   c9b90024bc4d49b1fa0ea4673b6eb1db1058cd1cba4b840d336bedf803a0afcf
Metadata: architecture="arm64"
    docker={
        "Architecture": "arm64",
        "Created": "2022-09-07T23:19:03.452996827Z",
        "DockerVersion": "20.10.12",
        "Id": "sha256:c9b90024bc4d49b1fa0ea4673b6eb1db1058cd1cba4b840d336bedf803a0afcf",
        . . .
    }
    platform="linux"
    version="3.9-alpine"

SignerID: bmlja0Babcdefgh12345==
Apikey revoked: no
Status:   TRUSTED
Dependencies:
    .python-rundeps@20220907.231849   b9bddeccfab9c3d7731f6b39360dcf3cfdeb1b7f
    alpine-baselayout@3.2.0-r22       3c6c70ccb77b490fd2663506ae7727a638eda4a6
    alpine-baselayout-data@3.2.0-r22  d6554033bbe7f571edc82954fd97e59aa4c7f045
    alpine-keys@2.4-r1                cffd2a49107574ba448f4b23b4bfc597676b9054
    apk-tools@2.12.9-r3               bd9d72a8be3f3e5f046759c4e82086b6b7195622
    busybox@1.35.0-r17                31ea3e2c718f4a2dee63d808a2e1156fdcfc15ba
    . . .

We can see from this output the different ways our practical considerations from above are addressed:

How do we sign the SBOM?: The SBOM is signed with a key that is associated with the developer or CI/CD pipeline that created it.
How do we indicate our level of trust for each component? How do we make an SBOM accessible from a trusted location? How do we prevent someone from tampering with our "trusted location"?: We associate a trust status with each component in the SBOM, referenced by the hash of each component. That transaction is signed by the developer or CI/CD pipeline, and is stored in a database that can be cryptographically verified.
How do we even deploy a "trusted location"?: The immutable database (immudb, specifically) backing the Community Attestation Service and Trustcenter product allows clients to act as auditors of the database, alerting any interested parties if the state of the database fails to validate in any way. CAS, for example, is audited by clients operated by Codenotary and by the community.
How do we securely update our asset metadata (e.g. the "trust level" of each component)?: Because an immutable database is append-only, the only way to update a value in a record is to create a new version of that record with the updated value. Not only does this ensure that any changes to past data result in an invalid hash tree, but it also means each update is associated with a signing key and recorded in a transaction log.
How do we verify a software asset is safe to deploy?: We authenticate the asset against a stored SBOM!

That brings us to…

Authenticating software assets #

When you authenticate an asset with cas or vcn, you are verifying that the asset’s manifest matches the signed SBOM data stored in an immutable database.

These tools scan a local asset to build a manifest of the components within, then check the hashes of the asset itself and each component against the trust values stored in the database. Then, if:

all of the hashes are found in the database,
the trust level of each component is at least as high as the level specified by the user (i.e. untrusted, unsupported, unknown, or trusted, with a default minimum of trusted), and
the trusted components were signed by a trusted key (i.e. one that has not been revoked),

then the asset is considered authenticated.
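The three conditions above translate directly into a small decision function. The Python sketch below makes several assumptions: the trust levels are ranked in the order the text lists them, the `records` mapping stands in for whatever the database would return, and the key identifiers and the `authenticate` function are hypothetical rather than part of cas or vcn.

```python
from enum import IntEnum


class Trust(IntEnum):
    # ASSUMPTION: ascending order follows the order the levels are listed above.
    UNTRUSTED = 0
    UNSUPPORTED = 1
    UNKNOWN = 2
    TRUSTED = 3


def authenticate(scanned_hashes, records, revoked_keys, minimum=Trust.TRUSTED):
    """Return True only if every scanned hash passes all three checks.

    records maps hash -> (trust level, signer key id).
    """
    for digest in scanned_hashes:
        record = records.get(digest)
        if record is None:
            return False  # hash not found in the database
        level, signer = record
        if level < minimum:
            return False  # trust level below the requested minimum
        if signer in revoked_keys:
            return False  # signed by a revoked key
    return True


# Hypothetical database contents keyed by component hash.
records = {
    "3c6c70ccb77b490fd2663506ae7727a638eda4a6": (Trust.TRUSTED, "ci-key-1"),
    "31ea3e2c718f4a2dee63d808a2e1156fdcfc15ba": (Trust.UNSUPPORTED, "ci-key-1"),
}
scan = list(records)

print(authenticate(scan[:1], records, revoked_keys=set()))
print(authenticate(scan, records, revoked_keys=set()))
print(authenticate(scan, records, revoked_keys=set(), minimum=Trust.UNSUPPORTED))
```

With the default minimum of trusted, the second call fails because one component is only marked unsupported; lowering the minimum lets the same scan pass.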

Connecting the Pipelines #

Fundamentally, an SBOM-based mechanism for authenticating software assets bridges the gaps in the chain of custody of a software asset, from the developer’s machine to any environment where it is eventually deployed.

At the beginning of any segment of the software supply chain where a software component is received from another party, we can enumerate the components in the asset, authenticate the manifest against an SBOM assembled by the asset’s creator, verify the level of trust for each component in the manifest (failing if any component is untrusted), and then proceed to use that authenticated asset in the next step of our software development pipeline.

At the end of any segment of the software supply chain where we hand off a software component to another custodian or deployment environment, we can generate an SBOM for our final artifact, authenticate the SBOM against our own records (failing if any component included in the final deliverable is untrusted), sign the SBOM to enable recipients of the asset to authenticate it, and then hand that notarized asset off to the next party in the software supply chain.

Of course, with the right tools, these steps can be reduced to a single command at each end of the pipeline. When an asset enters the pipeline:

cas authenticate --bom git://<source-code-for-build>
# or
vcn authenticate --bom git://<source-code-for-build>

This ensures we’re only dealing with the assets we’re expecting to see. Then, when the build pipeline is finished:

cas notarize --bom docker://<image-built-from-source>
# or
vcn notarize --bom image://<image-built-from-source>

Notarizing the SBOM for the final artifact ensures that any downstream consumers of the asset can authenticate it and verify the chain of custody!