Adding Auditing to Pip

westurner · on Aug 10, 2022

Looks like I'm a little late to this article about the mailing-list discussion on adding CVE-style vulnerability search to pip.

The broader trend here is to identify the names, versions, and hashes of all installed software packages in all languages and present them as an SBOM [1][2]. Does/would `pip audit` also look up CVE vulns for extension modules written in other programming languages like C, Go, and Rust, or do existing tools that already look up vulns for Python packages also cover those?

[1] https://westurner.github.io/hnlog/ ; Ctrl-F "SBOM"

[2] "Existing artifact vuln scanners, databases, and specs?" https://github.com/google/osv.dev/issues/55#issue-802542447

FWIW, from the OSV README https://github.com/google/osv.dev:

> This is an ongoing project. We encourage open source ecosystems to adopt the OpenSSF Vulnerability format to enable open source users to easily aggregate and consume vulnerabilities across all ecosystems. See our blog post for more details.

> The following ecosystems have vulnerabilities encoded in this format:

> GitHub Advisory Database (CC-BY 4.0), PyPI Advisory Database (CC-BY 4.0), Go Vulnerability Database (CC-BY 4.0), Rust Advisory Database (CC0 1.0), Global Security Database (CC0 1.0), OSS-Fuzz (CC-BY 4.0)

> Together, these include vulnerabilities from:

> npm, Maven, Go, NuGet, PyPI, RubyGems, crates.io, Packagist, Linux, OSS-Fuzz
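Since the point of the OSV format is to let users aggregate vulnerabilities across ecosystems, here is a minimal sketch of querying the public OSV API for one PyPI package. It assumes the `https://api.osv.dev/v1/query` endpoint with a JSON body of `package.name`, `package.ecosystem`, and `version`, and a `vulns` list in the response; the helper names are my own, and there is no retry or pagination handling.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name: str, version: str, ecosystem: str = "PyPI") -> dict:
    """Build the JSON body for an OSV per-package, per-version query."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}


def vuln_ids(response: dict) -> list[str]:
    """Extract advisory IDs (e.g. PYSEC-..., GHSA-...) from an OSV query response.

    A missing "vulns" key is treated as "no matches".
    """
    return [vuln["id"] for vuln in response.get("vulns", [])]


def query_osv(name: str, version: str, ecosystem: str = "PyPI") -> list[str]:
    """POST a query to OSV and return matching advisory IDs (requires network access)."""
    body = json.dumps(build_osv_query(name, version, ecosystem)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return vuln_ids(json.load(resp))
```

Usage would look like `query_osv("babel", "2.9.0")`; the result depends on the live database at the time you run it.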

woodruffw · on Aug 11, 2022

> Does/would `pip audit` also look up CVE vulns for extension modules written in other programming languages like C, Go, and Rust

For the time being, the plan is to focus `pip-audit` on vulnerabilities in Python packages themselves, not native dependencies of packages that happen to contain binary extensions. The line is, however, blurry: if a Python package is written in C or Rust and has a vulnerability therein, that would be considered a Python ecosystem vulnerability for the purposes of the PYSEC DB. A vulnerability in libssl that happens to be exposed via a Python extension, however, would not be.
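One way to picture that line, assuming advisories in the OSV schema where each `affected` entry names a `package.ecosystem`: an advisory is in scope when it is filed against the PyPI ecosystem, whatever language the package is implemented in, while a libssl bug would be filed under a distro ecosystem instead. This is an illustrative sketch, not pip-audit's actual logic, and the records below are hypothetical.

```python
def is_python_ecosystem_advisory(advisory: dict) -> bool:
    """True if any affected package in an OSV-style record belongs to PyPI.

    Implementation language doesn't matter: a Rust or C extension published
    on PyPI still lives in the PyPI ecosystem.
    """
    return any(
        entry.get("package", {}).get("ecosystem") == "PyPI"
        for entry in advisory.get("affected", [])
    )


# Hypothetical records, trimmed to the fields used above.
rust_extension_bug = {
    "id": "EXAMPLE-1",
    "affected": [{"package": {"ecosystem": "PyPI", "name": "some-rust-extension"}}],
}
libssl_bug = {
    "id": "EXAMPLE-2",
    "affected": [{"package": {"ecosystem": "Debian:11", "name": "openssl"}}],
}

print(is_python_ecosystem_advisory(rust_extension_bug))  # True
print(is_python_ecosystem_advisory(libssl_bug))  # False
```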

westurner · on Aug 11, 2022

So for "[Software] Component Inventory and Risk Assessment", something more comprehensive that covers the complete SBOM across all languages and extension modules is also advisable.
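For the Python slice of such an inventory, the standard library can already enumerate installed distributions. A minimal sketch using `importlib.metadata` (Python 3.8+) that collects names, versions, and the per-file hashes recorded in each distribution's `RECORD`. It also shows the gap: shared libraries vendored inside wheels appear only as opaque files, and system libraries loaded at runtime don't appear at all. The output layout is my own, not a standard SBOM format.

```python
from importlib import metadata


def python_inventory() -> list[dict]:
    """List installed Python distributions with versions and RECORD file hashes."""
    inventory = []
    for dist in metadata.distributions():
        file_hashes = {}
        # dist.files is None when the distribution has no RECORD (or similar) file.
        for path in dist.files or []:
            # Some RECORD entries (e.g. RECORD itself) carry no hash.
            if path.hash is not None:
                file_hashes[str(path)] = f"{path.hash.mode}={path.hash.value}"
        inventory.append(
            {
                "name": dist.metadata["Name"],
                "version": dist.version,
                "file_hashes": file_hashes,
            }
        )
    # Sort for stable, diff-friendly output; a name may be None if metadata is broken.
    return sorted(inventory, key=lambda item: (item["name"] or "").lower())


if __name__ == "__main__":
    for item in python_inventory():
        print(item["name"], item["version"], len(item["file_hashes"]), "hashed files")
```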

From the article:

> The PyPA maintains an advisory database that stores vulnerabilities affecting PyPI packages in YAML format. For example, pip-audit reported that the version of Babel on my system is vulnerable to PYSEC-2021-421, which is a local code-execution flaw. That PYSEC advisory refers to CVE-2021-42771, which is how the flaw is known to the wider world.

> As it turns out, my system is actually not vulnerable to CVE-2021-42771, as the Ubuntu security entry shows. The pip-audit tool looks at the version numbers of the installed PyPI package to decide which are vulnerable, but Linux distributions regularly backport fixes into earlier versions so the PyPI package version number does not tell the whole story—at least for those who get their PyPI packages from distributions rather than via pip.

pypa/advisory-database: https://github.com/pypa/advisory-database
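To make the article's backport point concrete: advisories in that database use the OSV schema, where `affected[].ranges[].events` lists `introduced`/`fixed` versions. Below is a simplified sketch of version-only matching, assuming plain dotted-integer versions (real tools need full PEP 440 comparison) and a hypothetical range. Because it only sees the version string, a distro package with a backported fix but an unchanged version number is still reported as vulnerable.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Simplified parser for dotted-integer versions like "2.9.1" (not full PEP 440)."""
    return tuple(int(part) for part in version.split("."))


def in_ecosystem_range(version: str, events: list[dict]) -> bool:
    """Evaluate OSV-style introduced/fixed events against one version.

    Assumes the events are already sorted by version.
    """
    current = parse_version(version)
    vulnerable = False
    for event in events:
        if "introduced" in event:
            start = event["introduced"]
            # "0" means "vulnerable from the very first release".
            if start == "0" or current >= parse_version(start):
                vulnerable = True
        elif "fixed" in event and current >= parse_version(event["fixed"]):
            vulnerable = False
    return vulnerable


# Hypothetical advisory range: vulnerable from the start, fixed in 2.9.1.
events = [{"introduced": "0"}, {"fixed": "2.9.1"}]

print(in_ecosystem_range("2.8.0", events))  # True
print(in_ecosystem_range("2.9.1", events))  # False

# A distro package that backported the fix but kept version 2.8.0 looks identical
# to an unpatched 2.8.0, so version-only matching still reports it as vulnerable.
```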

woodruffw · on Aug 10, 2022

Oh hello, that’s my project. Happy to answer questions!

nighthawk454 · on Aug 10, 2022

Thanks for your work! Reading the article, it seems like the main debate is whether this should be included in pip. FWIW, as a user experience, including it seems best. I think the experience npm (tries to) provide is a fine goal. However, whether it's a priority for pip, and whether it's worth the added burden on maintainers, is another matter.

In any case, I'm more concerned with not repeating npm audit's mistakes. Abramov's blog post linked in the article [1] came to mind right away.

Can you say any more about how pip-audit won't "cry wolf"? Especially since Python doesn't have great packaging standardization around tools, dev-dependencies, build dependencies, pip-tools vs pipenv vs poetry, etc.

[1] https://overreacted.io/npm-audit-broken-by-design/

woodruffw · on Aug 11, 2022

Sorry for the late response!

Yes, this is a great point: we absolutely do not want either the current instantiation of pip-audit or any potential integration to cause security fatigue. Our proposed integration plan was made with this anti-goal in mind: audits will never be “automatic” the way they are in npm, meaning that users will not be suddenly put on the spot to make security decisions about their dependencies. Instead, auditing will always be an intentional action: users will have to explicitly run `pip audit` to receive notices, which they can address synchronously.

Separately, on the data side: the JS ecosystem suffers from a deluge of low-value vulnerabilities (things like ReDoS weaknesses), which in turn produce more fatigue. The Python ecosystem avoids those kinds of vulnerabilities mostly incidentally, since Python is not (generally) run client-side in the browser the way JS is. But more formally, our hope is that the community vulnerability database will help us balance fatigue concerns with “completeness” and expose sufficient metadata to allow users to make policy-style decisions about which potential vulnerabilities matter to them.
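As a sketch of what such a policy-style decision could look like: suppose each finding carries its advisory ID, any aliases (a PYSEC ID often aliases a CVE, as PYSEC-2021-421 does CVE-2021-42771 in the article), and the versions that fix it. A user policy might ignore specific advisories under any of their IDs and only act on findings that have a fix available. The `Finding` and `Policy` types and their fields are hypothetical, not part of pip-audit or the advisory database.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    """One reported vulnerability for an installed package (hypothetical shape)."""
    package: str
    advisory_id: str
    aliases: tuple[str, ...] = ()
    fix_versions: tuple[str, ...] = ()


@dataclass
class Policy:
    """User-chosen rules for which findings are worth acting on."""
    ignored_ids: set[str] = field(default_factory=set)
    only_fixable: bool = False

    def is_actionable(self, finding: Finding) -> bool:
        # Ignoring a CVE should also suppress a PYSEC/GHSA entry that aliases it.
        all_ids = {finding.advisory_id, *finding.aliases}
        if all_ids & self.ignored_ids:
            return False
        if self.only_fixable and not finding.fix_versions:
            return False
        return True


def actionable(findings: list[Finding], policy: Policy) -> list[Finding]:
    """Keep only the findings the policy says the user should act on."""
    return [f for f in findings if policy.is_actionable(f)]


findings = [
    # IDs from the article; the fix version here is illustrative.
    Finding("babel", "PYSEC-2021-421", aliases=("CVE-2021-42771",), fix_versions=("2.9.1",)),
    Finding("example-pkg", "EXAMPLE-1"),  # hypothetical finding with no fix yet
]

print([f.advisory_id for f in actionable(findings, Policy(ignored_ids={"CVE-2021-42771"}))])  # ['EXAMPLE-1']
print([f.advisory_id for f in actionable(findings, Policy(only_fixable=True))])  # ['PYSEC-2021-421']
```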

nighthawk454 · on Aug 11, 2022

Thanks! I don’t feel as strongly about it needing to be explicitly run only. Realistically, I’m gonna set up dependabot or something anyway. I actually like that part of npm audit: when I’m doing “package management things”, it’s convenient to get those alerts.

To me, the problem of noise is wholly separate from when/how I run it. Solving noise by making it run less frequently is a bit of a “you’re holding it wrong”.

But I see your point that the nature of Python and hopefully the community database are the real solutions.

I did install and try pip-audit though and it had very very few notices and none spurious! So already I like it better than npm audit :)

thenerdhead · on Aug 11, 2022

How do you currently feel about attaching the experience to install, or to when packages are restored? Various ecosystems do this, and some get flak for it because of how many transitive dependencies and known vulnerabilities they have compared to others. I'm mostly curious because I'm working on a similar proposal here:

https://github.com/NuGet/Home/pull/11549

There's definitely a fine balance of noise, but how do you feel about it?

woodruffw · on Aug 11, 2022

IMO, the experience in ecosystems like npm has shown that attaching auditing directly to package management steps causes security fatigue: users who are trying to get something done are suddenly burdened with the potential security repercussions of their action, which frazzles them and causes them to mis-value the audit results.

The worst outcome is the one you see in pretty much every nontrivial JS stack: dozens of “critical” vulnerabilities reported without user interaction, all ignored because the developer has learned that “critical” really means “I can’t silence this warning that something bad might happen in a specific set of circumstances.”

When you make auditing opt-in (like we intend for `pip audit`) even “noisy” results do not cause as much fatigue, since the user has explicitly asked for them.

rrdharan · on Aug 11, 2022

I wish they would bring back pip search.