13 Cybersecurity and Open Source Software

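Open source software, and the particular way it is produced, connects with the security of computer systems (usually called cybersecurity) in a few specific ways: how vulnerabilities are found, how flaws spread through shared dependencies, and how the people who maintain code can themselves become targets.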

13.1 Readings

Instructor setup

Students work in collaborative Google Docs to answer the three discussion exercises below. Before class:

  1. Create one Google Doc per group in UT Mail Google Drive (drive.utmail.utexas.edu). If starting from a template, choose “Convert to Google Docs” on upload.
  2. Set sharing on each to “Anyone with the link can edit” (Share > Change > Anyone with the link > Editor). Note: each copy needs this set individually.
  3. Add the three exercise headings as sections in each doc
  4. Post links via Canvas announcement before class

13.2 How software gets attacked

Before examining the relationship between open source and security, it helps to understand the common types of security vulnerabilities and how attackers exploit them.

Common causes of security flaws and how they are exploited

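Memory safety errors (common in C and C++): Buffer overflows, use-after-free. In a buffer overflow, an attacker sends more data than a program expects, overwriting adjacent memory and potentially gaining control of execution. In a use-after-free, the program keeps using memory it has already released, which an attacker may be able to refill with data of their choosing. Much critical infrastructure software, including OpenSSL and the Linux kernel, is written in C.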

Injection attacks: SQL injection, command injection, cross-site scripting (XSS). User-supplied input is interpreted as code rather than data. Still one of the most prevalent vulnerability classes year after year.
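A minimal sketch of SQL injection and its standard fix, using Python's built-in sqlite3 module. The table, rows, and function names are hypothetical, chosen only for illustration. The unsafe version builds the query by string formatting, so a quote character in the input closes the string literal and the rest is parsed as SQL. The safe version passes the input as a bound parameter, which the database treats as data.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with hypothetical, illustrative data
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: the input is pasted into the SQL text, so it is parsed as code
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # SAFE: the ? placeholder binds the input as a value; it is never parsed as SQL
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # both rows: the WHERE clause is always true
print(find_user_safe(conn, payload))    # []: no user is literally named x' OR '1'='1
```

The same principle (keep code and data in separate channels) underlies the usual fixes for command injection and XSS: passing argument lists instead of shell strings, and escaping output for the context it lands in.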

Authentication and authorization failures: Weak passwords, broken session management, missing access controls. The system fails to check whether you are who you claim to be (authentication), or whether you are allowed to do what you are doing (authorization).
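A common authorization failure is checking that a user is logged in but not checking that they may access the specific object they asked for. The sketch below is hypothetical (a dictionary stands in for a database, and the document contents are invented) and places the broken pattern next to an explicit ownership check.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: int
    owner: str
    body: str


class Forbidden(Exception):
    """Raised when an authenticated user lacks permission for a resource."""


# Hypothetical in-memory store standing in for a database
DOCS = {
    1: Document(1, "alice", "alice's notes"),
    2: Document(2, "bob", "bob's notes"),
}


def read_doc_broken(current_user: str, doc_id: int) -> str:
    # BROKEN: assumes current_user is already authenticated, but never asks
    # whether this user may read this particular document (authorization)
    return DOCS[doc_id].body


def read_doc_checked(current_user: str, doc_id: int) -> str:
    doc = DOCS[doc_id]
    # Authorization check: only the document's owner may read it
    if doc.owner != current_user:
        raise Forbidden(f"{current_user} may not read document {doc_id}")
    return doc.body


print(read_doc_broken("bob", 1))  # bob reads alice's notes
```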

Cryptographic weaknesses: Using outdated algorithms (MD5, SHA-1), misimplementing correct algorithms, or relying on random number generators whose output an attacker can predict.
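The sketch below contrasts Python's random module, a deterministic pseudo-random generator that the Python documentation warns against using for security, with the secrets module, which draws from the operating system's cryptographically secure source. The seed value, token length, and hashed bytes are arbitrary choices for illustration.

```python
import hashlib
import random
import secrets


def weak_token(seed: int) -> str:
    # WEAK: random.Random is a deterministic PRNG (Mersenne Twister); anyone who
    # learns or guesses the seed, such as a timestamp, can regenerate the token
    rng = random.Random(seed)
    return "".join(f"{rng.getrandbits(8):02x}" for _ in range(16))


def strong_token() -> str:
    # BETTER: secrets draws from the operating system's cryptographically secure source
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex characters


# An attacker who guesses the seed reproduces the "secret" exactly
print(weak_token(1_700_000_000) == weak_token(1_700_000_000))  # True

# For integrity checks, prefer SHA-256 over MD5 or SHA-1, which are broken for collisions
print(hashlib.sha256(b"example release contents").hexdigest())
```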

Logic errors: The code does exactly what it was written to do — but the design itself has a flaw. An attacker finds a sequence of valid operations that produces an unintended result.
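Logic errors are easiest to see in a small example. In this hypothetical transfer function, every line does what it was written to do, yet the design never rules out a negative amount, so "sending" money to a victim actually pulls money out of their account. The account names and amounts are invented for illustration.

```python
class InsufficientFunds(Exception):
    pass


def transfer_flawed(balances: dict, src: str, dst: str, amount: int) -> None:
    # Each step is individually correct, but the design never rules out
    # amount < 0, which silently reverses the direction of the transfer
    if balances[src] < amount:
        raise InsufficientFunds(src)
    balances[src] -= amount
    balances[dst] += amount


def transfer_fixed(balances: dict, src: str, dst: str, amount: int) -> None:
    # Fix: enforce the business rule (amounts must be positive) first
    if amount <= 0:
        raise ValueError("amount must be positive")
    if balances[src] < amount:
        raise InsufficientFunds(src)
    balances[src] -= amount
    balances[dst] += amount


# Hypothetical accounts: "sending" -100 passes the balance check (0 < -100 is False)
accounts = {"mallory": 0, "victim": 100}
transfer_flawed(accounts, "mallory", "victim", -100)
print(accounts)  # {'mallory': 100, 'victim': 0}
```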

Supply chain attacks: Malicious code is inserted into a dependency, build tool, or development environment rather than into the target software directly. The attacker compromises something the target trusts.
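One widely used defense against tampered dependencies is to record the cryptographic hash of each artifact when it is first vetted and reject any later download that does not match. pip's --require-hashes mode works on this principle. The sketch below shows only the comparison step; the function names and sample bytes are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    # Accept the download only if its digest equals the one recorded at vetting time;
    # hmac.compare_digest compares in constant time
    return hmac.compare_digest(sha256_hex(data), pinned_sha256.lower())


# Hypothetical artifact bytes; in practice these would be a downloaded package file
vetted = b"example-lib 1.2.3 source archive"
pin = sha256_hex(vetted)  # recorded once, e.g. in a lock file

print(verify_artifact(vetted, pin))                           # True
print(verify_artifact(vetted + b" plus injected code", pin))  # False
```

Pinning only detects changes made after the pin is recorded. It would not have stopped the XZ backdoor, where the malicious code shipped in the official release tarballs published by a trusted maintainer.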

How vulnerabilities are exploited: Researchers and attackers discover flaws through code review, fuzzing (automated random-input testing), and reverse engineering. Publicly disclosed vulnerabilities are assigned a CVE (Common Vulnerabilities and Exposures) identifier, a public ID used to track and communicate about them. A zero-day is a vulnerability that attackers know about before it has been publicly disclosed or patched. Once a CVE is published, a race begins: defenders patch their systems while attackers exploit systems that have not yet been patched. This race is why the time to patch matters enormously.
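Fuzzing can be illustrated with a toy: generate random byte strings, feed them to a parser, and report any input that makes it fail in an unexpected way. Both the parser and the fuzzer below are hypothetical. Production fuzzers such as AFL++ and libFuzzer are coverage-guided, mutating inputs that reach new code paths rather than sampling blindly as this sketch does.

```python
import random


def parse_record(data: bytes) -> bytes:
    # Hypothetical toy parser: byte 0 gives the offset of a one-byte payload.
    # Bug: it never checks that the offset lies inside the input.
    if not data:
        raise ValueError("empty input")  # a clean, intended rejection
    offset = data[0]
    return bytes([data[offset]])  # IndexError when offset >= len(data)


def fuzz(target, trials: int = 1000, max_len: int = 8, seed: int = 0):
    rng = random.Random(seed)  # fixed seed makes the run repeatable
    for _ in range(trials):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except ValueError:
            continue  # input rejected as designed: not a bug
        except Exception:
            return data  # unexpected failure: report the crashing input
    return None


crash = fuzz(parse_record)
print(crash)  # a short byte string whose first byte points past its end
```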

13.3 Many eyes make all bugs shallow?

Eric Raymond coined “Linus’s Law” in The Cathedral and the Bazaar (1999):

“Given enough eyeballs, all bugs are shallow.”

The argument: because open source code is publicly readable, a large community of developers can inspect it, find bugs, and fix them. A vulnerability that might hide for years in proprietary code will quickly be spotted in open source — before attackers can exploit it.

This is the central security argument for open source. But the evidence is more complicated.

13.3.1 Heartbleed (2014)

OpenSSL is the library that encrypts a large fraction of internet traffic. It is open source. A buffer over-read vulnerability in its handling of TLS heartbeat messages, allowing an attacker to read up to 64 kilobytes of server memory per request (potentially including private keys and passwords), was present in the code for two years before it was discovered. It was found not by the community passively reviewing code, but by a security engineer at Google and a researcher at Codenomicon doing active, funded security auditing.
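The core of the bug can be modeled in a few lines. In a heartbeat exchange, the client sends a payload plus a field claiming the payload's length, and the server echoes back that many bytes. The sketch below is a simplified Python model, not OpenSSL's C code: a bytes object stands in for server memory, and the adjacent "secret" is invented for illustration. The vulnerable version trusts the claimed length; the fixed version, roughly like the real patch, discards requests whose claimed length exceeds what was actually sent.

```python
# Simplified model: a bytes object stands in for server memory, and the
# "secret" placed right after the request buffer is invented for illustration.
ADJACENT_MEMORY = b"|session_key=hunter2|"


def heartbeat_vulnerable(payload: bytes, claimed_len: int) -> bytes:
    memory = payload + ADJACENT_MEMORY  # request buffer followed by other data
    # BUG: echoes back claimed_len bytes without comparing it to len(payload)
    return memory[:claimed_len]


def heartbeat_fixed(payload: bytes, claimed_len: int) -> bytes:
    # FIX: silently discard requests that claim more bytes than were sent
    if claimed_len > len(payload):
        return b""
    return payload[:claimed_len]


print(heartbeat_vulnerable(b"bird", 4))   # b'bird'
print(heartbeat_vulnerable(b"bird", 25))  # b'bird|session_key=hunter2|'
print(heartbeat_fixed(b"bird", 25))       # b''
```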

The lesson: many eyes don’t look. Most contributors to a project focus on features, not security review. Critical infrastructure code gets audited only when someone specifically funds that work.

13.3.2 The XZ Utils backdoor (2024)

The XZ incident reveals a different kind of failure: not a bug, but a deliberate, years-long attack on the community trust process itself.

xz is a widely used compression tool whose library, liblzma, is present on most Linux systems. In early 2024, versions 5.6.0 and 5.6.1 were found to contain a backdoor that would have allowed remote code execution via SSH: essentially a hidden key to millions of servers running systemd-based Linux distributions.

The attacker, operating under the name “Jia Tan,” spent approximately two years building credibility in the xz-utils project. They made legitimate, high-quality contributions. They built a relationship of trust with the sole maintainer, who was experiencing burnout and was being pressured by other fake accounts to hand over commit access. Eventually “Jia Tan” gained direct commit access and inserted the backdoor in a highly obfuscated way spread across several commits — designed to be invisible to casual review.

It was caught accidentally by Andres Freund, a Microsoft engineer, who noticed SSH logins on his machine were 500ms slower than expected. He traced the slowdown to liblzma (part of xz) and then found the backdoor. The full timeline is in the pre-reading at https://research.swtch.com/xz-timeline.

What this shows:

  • Sophisticated attackers can operate on timescales of years, not hours
  • Burnout and understaffing make maintainers vulnerable to social engineering
  • “Many eyes” failed: the malicious code was reviewed and merged without detection
  • But the attack was caught before it reached the stable releases of major distributions, partly because of the open nature of the process (the affected versions first shipped in development and testing releases, and the public source code and release tarballs let an outside engineer trace a performance anomaly to its cause)

Exercise 1:

“Many eyes make all bugs shallow.” In what ways do inspection and shared bug-fixing outweigh the advantages an attacker gets by looking directly at the source code? In what ways do they not?

Working in your group’s Google Doc:

  1. List two arguments for Linus’s Law — ways that openness genuinely helps security
  2. List two arguments against — ways it fails, using Heartbleed and/or XZ as evidence
  3. Find one recent security incident (from the last two years, not XZ or Heartbleed) and classify it: Was the vulnerability in open or closed source code? How long had it existed? How was it discovered? Would “more eyes” have helped?

13.4 The supply chain problem

Modern software is assembled from components. A typical application uses hundreds of open source libraries, which themselves depend on other libraries. This creates a dependency graph, and a vulnerability anywhere in the graph can affect every package that depends on the vulnerable one, directly or transitively.
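Finding everything a vulnerable library can affect is a reachability question on this graph: reverse the "depends on" edges and walk upward from the vulnerable package. The package names below are hypothetical. Note that web-app is affected even though it never lists logging-lib directly, which is the transitive-dependency pattern behind Log4Shell.

```python
from collections import deque

# Hypothetical dependency graph: each package maps to the packages it depends on
DEPENDS_ON = {
    "web-app": ["http-framework", "json-lib"],
    "http-framework": ["logging-lib", "tls-lib"],
    "batch-job": ["logging-lib"],
    "json-lib": [],
    "logging-lib": [],
    "tls-lib": [],
}


def affected_by(vulnerable: str, graph: dict[str, list[str]]) -> set[str]:
    # Invert the edges: for each package, which packages depend on it directly?
    dependents: dict[str, list[str]] = {pkg: [] for pkg in graph}
    for pkg, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(pkg)
    # Breadth-first search upward from the vulnerable package
    seen: set[str] = set()
    queue = deque([vulnerable])
    while queue:
        for parent in dependents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(affected_by("logging-lib", DEPENDS_ON)))
# ['batch-job', 'http-framework', 'web-app']
```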

13.4.1 Log4Shell (2021)

Log4j is a Java logging library — software that records what an application is doing. It is used in an enormous fraction of Java-based enterprise software, often as a transitive dependency: applications use it without explicitly listing it as a dependency, because something they depend on depends on it.

In December 2021, a critical vulnerability (CVE-2021-44228) was disclosed: by getting an application to log a specially crafted string containing a lookup such as ${jndi:ldap://...}, an attacker could make a vulnerable Log4j fetch and run code from a server the attacker controlled, achieving arbitrary code execution on the application's server. The vulnerability affected major products from Apple, Amazon, Cloudflare, Twitter, and thousands of other organizations. Many organizations didn’t know they had the dependency at all.

The aftermath was as revealing as the vulnerability itself. Nine months after Log4Shell became global news, 30% of applications using Log4j still used a vulnerable version. Patching requires knowing you have the dependency — and then getting an update deployed across all your systems.

13.4.2 Sustainability as the root problem

The Lawfare reading argues: “open source doesn’t have a security problem, it has a sustainability problem.”

Log4j was maintained by a tiny team of volunteers, largely unfunded. OpenSSL, which encrypts much of the internet, received only about $2,000 a year in donations for years before Heartbleed. The pattern recurs: critical infrastructure maintained by a handful of people, without resources for systematic security auditing. Attackers know which libraries are critical and understaffed. The XZ attack specifically targeted a one-person project with a burned-out maintainer.

Resilience mechanisms the ecosystem has developed in response:

  • Software Bill of Materials (SBOM): A machine-readable list of all components in a software product, making it possible to quickly identify which systems are affected when a vulnerability is disclosed.
  • Reproducible builds: Ensuring the same source code always produces a bit-for-bit identical binary, so independent parties can rebuild a release and confirm it matches, making malicious code inserted during the build process easier to detect.
  • Signed commits and releases: Cryptographic signatures on code changes create an audit trail and make impersonation harder.
  • Dependency auditing tools: npm audit, pip-audit, GitHub’s Dependabot — automated scanning for known vulnerabilities in declared dependencies.
  • Security foundations: The OpenSSF (Open Source Security Foundation), funded by major tech companies, now provides security audits and tooling for critical projects.
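To show how an SBOM turns "are we affected?" into a lookup, the sketch below parses a small SBOM fragment in CycloneDX JSON style and flags components below a fixed version. It assumes, as a simplification, that every log4j-core release below 2.15.0 (the first release to fix CVE-2021-44228) is vulnerable; the real advisory also has a lower bound (2.0-beta9), and follow-up CVEs required later versions, so a real scanner should consult advisory data. The SBOM contents and function names are illustrative.

```python
import json

# A minimal SBOM fragment in CycloneDX JSON style (illustrative, not a complete document)
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {"type": "library", "name": "log4j-core", "version": "2.14.1",
     "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"},
    {"type": "library", "name": "jackson-databind", "version": "2.13.0",
     "purl": "pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.13.0"}
  ]
}
"""


def parse_version(v: str) -> tuple[int, ...]:
    # Simplification: handles only plain numeric versions such as "2.14.1"
    return tuple(int(part) for part in v.split("."))


def find_vulnerable(sbom: dict, name: str, fixed_in: str) -> list[str]:
    # Report each component with the given name whose version precedes the fix
    fixed = parse_version(fixed_in)
    return [
        c["purl"]
        for c in sbom.get("components", [])
        if c.get("name") == name and parse_version(c["version"]) < fixed
    ]


sbom = json.loads(SBOM_JSON)
print(find_vulnerable(sbom, "log4j-core", "2.15.0"))
# ['pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1']
```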

Exercise 2:

Package systems build on existing libraries, so any security flaw can be multiplied across everything that depends on it. Open contributions could enable malicious actors to insert security flaws. In what ways is open source software resilient to these issues? When might resilience fail? What practices help bolster resilience?

Working in your group’s Google Doc:

  1. Choose two resilience mechanisms from the list above. For each, describe: (a) a scenario where it would help catch a vulnerability, and (b) a scenario where it would fail or be bypassed.

  2. AI code generation and security: AI coding tools built on large language models (GitHub Copilot, ChatGPT, Cursor, etc.) are now widely used to write and review code. Discuss:

    • LLMs are trained on large corpora of public code — including code with known vulnerabilities. What does this imply for the security of AI-generated code?
    • If a developer uses an LLM to generate code that calls a library, how might this affect their awareness of their own dependency graph?
    • Could LLMs help with supply chain security? How?

13.6 Resources

Sharma, A., Speed, S., and Howison, J. (2022). The Securing Open Source Software Act Is Good, but Whatever Happened to Legal Liability? Lawfare Blog. https://www.lawfaremedia.org/article/securing-open-source-software-act-good-whatever-happened-legal-liability

Raymond, E. S. (1999). The Cathedral and the Bazaar. http://www.catb.org/~esr/writings/cathedral-bazaar/

Freund, A. (2024). backdoor in upstream xz/liblzma leading to ssh server compromise. oss-security mailing list. https://www.openwall.com/lists/oss-security/2024/03/29/4

OpenSSF. (2024). Open Source Security Foundation. https://openssf.org