13 Cybersecurity and Open Source Software
Open source software, and the particular way it is produced, connects with the security of computer systems (usually called cybersecurity) in several specific ways.
13.1 Readings
Video Presentation: “Is Open Source More Secure?” IBM Technology https://www.youtube.com/watch?v=HcV4u-nemNk
Hacked: The overlooked and under-supported open source projects holding the Internet together (March 2025) https://illumin.usc.edu/hacked-open-source-projects/
- see also the XZ Utils backdoor timeline: https://research.swtch.com/xz-timeline
Sharma, Speed, and Howison (2022) The Securing Open Source Software Act Is Good, but Whatever Happened to Legal Liability? Lawfare Blog https://www.lawfaremedia.org/article/securing-open-source-software-act-good-whatever-happened-legal-liability
- see also EU product liability law changes https://www.taylorwessing.com/en/insights-and-events/insights/2024/03/software-als-produkt
Students work in collaborative Google Docs to answer the three discussion exercises below. Before class:
- Create one Google Doc per group in UT Mail Google Drive (drive.utmail.utexas.edu) — choose “Convert to Google Docs” on upload if starting from a template
- Set sharing on each to “Anyone with the link can edit” (Share > Change > Anyone with the link > Editor). Note: each copy needs this set individually.
- Add the three exercise headings as sections in each doc
- Post links via Canvas announcement before class
13.2 How software gets attacked
Before examining the relationship between open source and security, it helps to understand the common types of security vulnerabilities and how attackers exploit them.
Memory safety errors (common in C and C++): Buffer overflows, use-after-free. An attacker sends more data than a program expects, overwriting adjacent memory and potentially gaining control of execution. Many critical infrastructure libraries (OpenSSL, the Linux kernel) are written in C.
Injection attacks: SQL injection, command injection, cross-site scripting (XSS). User-supplied input is interpreted as code rather than data. Still one of the most prevalent vulnerability classes year after year.
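The "input interpreted as code" failure is easiest to see in miniature. The sketch below uses Python's built-in sqlite3 module with an invented toy table; the vulnerable function pastes user input into the SQL string, while the safe one uses a parameterized query.

```python
import sqlite3

# Toy in-memory database for illustration (schema and data are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a1"), ("bob", "b2")])

def lookup_vulnerable(name):
    # BAD: user input is spliced into the SQL string, so the database
    # interprets it as code rather than data.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def lookup_safe(name):
    # GOOD: a parameterized query keeps the input strictly as data.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

payload = "' OR '1'='1"            # classic injection string
print(lookup_vulnerable(payload))  # dumps every secret in the table
print(lookup_safe(payload))        # empty: no user has that literal name
```

The same pattern (build the command by string concatenation vs. pass arguments through a typed interface) distinguishes vulnerable from safe code in command injection and XSS as well.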
Authentication and authorization failures: Weak passwords, broken session management, missing access controls — the system never checks whether you are who you claim to be, or whether you are allowed to do what you are doing.
Cryptographic weaknesses: Using outdated algorithms (MD5, SHA-1), misimplementing correct algorithms, or relying on random number generators that are not truly random.
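One of these weaknesses, a random number generator that is not truly random, can be shown in a few lines. This is a minimal sketch, assuming a hypothetical token-generation routine; the seed value stands in for something guessable like a timestamp or process ID.

```python
import random
import secrets

# BAD: random is a deterministic PRNG (Mersenne Twister). An attacker
# who learns or guesses the seed can reproduce every "random" token.
rng = random.Random(1234)              # 1234 stands in for a guessable seed
weak_token = "%032x" % rng.getrandbits(128)

attacker_rng = random.Random(1234)     # attacker tries the same seed...
assert weak_token == "%032x" % attacker_rng.getrandbits(128)  # ...and wins

# GOOD: secrets draws from the operating system's CSPRNG and cannot be
# replayed by re-seeding.
strong_token = secrets.token_hex(16)
```

The two tokens look equally random to the eye; the difference is only visible in the threat model, which is exactly why this class of bug survives review.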
Logic errors: The code does exactly what it was written to do — but the design itself has a flaw. An attacker finds a sequence of valid operations that produces an unintended result.
Supply chain attacks: Malicious code is inserted into a dependency, build tool, or development environment rather than into the target software directly. The attacker compromises something the target trusts.
How vulnerabilities are exploited: Researchers and attackers discover flaws through code review, fuzzing (automated random-input testing), and reverse engineering. Serious vulnerabilities receive a CVE (Common Vulnerabilities and Exposures) number — a public identifier used to track and communicate about them. A zero-day is a vulnerability known to attackers but not yet disclosed or patched publicly. Once a CVE is published, a race begins: defenders patching their systems vs. attackers exploiting systems that have not yet been patched. This race is why the time to patch matters enormously.
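Fuzzing, mentioned above, is conceptually simple. The sketch below is a toy, not a real fuzzer like AFL or libFuzzer: the parser and its planted bug are invented, but the loop shows the core idea of throwing random inputs at code and recording which ones trigger a failure.

```python
import random

def parse_record(data: bytes) -> str:
    # Toy binary-format parser with a planted bug: like many real
    # parsers, it trusts a length field taken from the input itself.
    length = data[0]
    body = data[1:1 + length]
    if len(body) < length:                     # declared length > actual input
        raise ValueError("truncated record")   # the failure a fuzzer finds
    return body.decode("latin-1")

def fuzz(runs=10_000, seed=0):
    # Minimal random-input fuzzer: generate random byte strings, feed
    # them to the parser, and collect every input that hits the
    # failure path.
    rng = random.Random(seed)
    crashes = []
    for _ in range(runs):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 8)))
        try:
            parse_record(data)
        except ValueError:
            crashes.append(data)
    return crashes
```

Running `fuzz()` turns up many inputs whose length byte exceeds the data actually present. Real fuzzers add coverage feedback and input mutation, but the discovery loop is the same one that found Heartbleed-class bugs.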
13.3 Many eyes make all bugs shallow?
Eric Raymond coined “Linus’s Law” in The Cathedral and the Bazaar (1999):
“Given enough eyeballs, all bugs are shallow.”
The argument: because open source code is publicly readable, a large community of developers can inspect it, find bugs, and fix them. A vulnerability that might hide for years in proprietary code will quickly be spotted in open source — before attackers can exploit it.
This is the central security argument for open source. But the evidence is more complicated.
13.3.1 Heartbleed (2014)
OpenSSL is the library that encrypts a large fraction of internet traffic. It is open source. A buffer over-read vulnerability — allowing an attacker to read up to 64 kilobytes of memory from a server — was present in the code for two years before it was discovered. It was found not by the community passively reviewing code, but by a security engineer at Google and a researcher at Codenomicon doing active, funded security auditing.
The lesson: many eyes don’t look. Most contributors to a project focus on features, not security review. Critical infrastructure code gets audited only when someone specifically funds that work.
13.3.2 The XZ Utils backdoor (2024)
The XZ incident reveals a different kind of failure: not a bug, but a deliberate, years-long attack on the community trust process itself.
xz is a widely used compression library present on most Linux systems. In early 2024, versions 5.6.0 and 5.6.1 were found to contain a backdoor that would have allowed remote code execution via SSH — essentially a hidden key to millions of servers running systemd-based Linux distributions.
The attacker, operating under the name “Jia Tan,” spent approximately two years building credibility in the xz-utils project. They made legitimate, high-quality contributions. They built a relationship of trust with the sole maintainer, who was experiencing burnout and was being pressured by other fake accounts to hand over commit access. Eventually “Jia Tan” gained direct commit access and inserted the backdoor in a highly obfuscated way spread across several commits — designed to be invisible to casual review.
It was caught accidentally by Andres Freund, a Microsoft engineer, who noticed SSH logins on his machine were 500ms slower than expected. He traced the slowdown to liblzma (part of xz) and then found the backdoor. The full timeline is in the pre-reading at https://research.swtch.com/xz-timeline.
What this shows:
- Sophisticated attackers can operate on timescales of years, not hours
- Burnout and understaffing make maintainers vulnerable to social engineering
- “Many eyes” failed: the malicious code was reviewed and merged without detection
- But: the attack was caught before it reached stable major distributions — partly because of the open nature of the process (build logs were public, making the timing anomaly visible to someone who was paying attention)
Exercise 1:
“Many eyes make all bugs shallow.” In what ways do open inspection and shared bug-fixing outweigh the advantages an attacker gains by being able to read the source code directly? In what ways do they not?
Working in your group’s Google Doc:
- List two arguments for Linus’s Law — ways that openness genuinely helps security
- List two arguments against — ways it fails, using Heartbleed and/or XZ as evidence
- Find one recent security incident (from the last two years, not XZ or Heartbleed) and classify it: Was the vulnerability in open or closed source code? How long had it existed? How was it discovered? Would “more eyes” have helped?
13.4 The supply chain problem
Modern software is assembled from components. A typical application uses hundreds of open source libraries, which themselves depend on other libraries. This creates a dependency graph — and a vulnerability anywhere in the graph can affect everything above it.
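The "anywhere in the graph" point can be made concrete with a toy dependency graph (all package names invented). Finding everything affected by one vulnerable leaf is a graph traversal — Log4Shell was exactly this walk, run at global scale by organizations that mostly lacked the graph to walk.

```python
# Hypothetical dependency graph: each package maps to its direct
# dependencies. "my-app" never lists logging-lib, yet depends on it
# transitively through both of its direct dependencies.
deps = {
    "my-app":        ["web-framework", "http-client"],
    "web-framework": ["logging-lib", "template-lib"],
    "http-client":   ["logging-lib"],
    "logging-lib":   [],
    "template-lib":  [],
}

def affected_by(vulnerable, graph):
    # A package is affected if it reaches the vulnerable one through
    # any chain of dependencies. Depth-first walk per package.
    affected = []
    for pkg in graph:
        stack, seen = list(graph.get(pkg, [])), set()
        while stack:
            d = stack.pop()
            if d == vulnerable:
                affected.append(pkg)
                break
            if d not in seen:
                seen.add(d)
                stack.extend(graph.get(d, []))
    return sorted(affected)

print(affected_by("logging-lib", deps))  # everything above it in the graph
```

Note the asymmetry: a flaw in a leaf like logging-lib propagates upward to every dependent, while a flaw in the application at the top affects only itself.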
13.4.1 Log4Shell (2021)
Log4j is a Java logging library — software that records what an application is doing. It is used in an enormous fraction of Java-based enterprise software, often as a transitive dependency: applications use it without explicitly listing it as a dependency, because something they depend on depends on it.
In December 2021, a critical vulnerability (CVE-2021-44228) was disclosed: by sending a specially crafted string to any application that logs it, an attacker could execute arbitrary code on the server. The vulnerability affected major products from Apple, Amazon, Cloudflare, Twitter, and thousands of other organizations. Many organizations didn’t know they had the dependency at all.
The aftermath was as revealing as the vulnerability itself. Nine months after Log4Shell became global news, 30% of applications using Log4j still used a vulnerable version. Patching requires knowing you have the dependency — and then getting an update deployed across all your systems.
13.4.2 Sustainability as the root problem
The Lawfare reading argues: “open source doesn’t have a security problem, it has a sustainability problem.”
Log4j was maintained by a tiny team of volunteers, largely unfunded. OpenSSL — encrypting much of the internet — operated for years on a budget of around $2,000/year. The pattern recurs: critical infrastructure maintained by a handful of people, without resources for systematic security auditing. Attackers know which libraries are critical and understaffed. The XZ attack specifically targeted a one-person project with a burned-out maintainer.
Resilience mechanisms the ecosystem has developed in response:
- Software Bill of Materials (SBOM): A machine-readable list of all components in a software product, making it possible to quickly identify which systems are affected when a vulnerability is disclosed.
- Reproducible builds: Ensuring the same source code always produces the same binary, making it harder to insert malicious code in the build process.
- Signed commits and releases: Cryptographic signatures on code changes create an audit trail and make impersonation harder.
- Dependency auditing tools: npm audit, pip-audit, and GitHub’s Dependabot provide automated scanning for known vulnerabilities in declared dependencies.
- Open source security foundations: The OpenSSF (Open Source Security Foundation), funded by major tech companies, now provides security audits and tooling for critical projects.
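The value of an SBOM is that "are we affected?" becomes a lookup rather than an archaeology project. The sketch below uses a hand-written, loosely CycloneDX-shaped component list with an invented product name; real SBOMs are emitted by build tooling. The advisory is reduced to just a package name and the first fixed version (2.15.0 was the initial fix for CVE-2021-44228).

```python
# Minimal SBOM-style component list (invented product; real SBOMs are
# generated by build tools, not written by hand).
sbom = {
    "product": "example-payments-service",
    "components": [
        {"name": "log4j-core", "version": "2.14.1"},
        {"name": "jackson-databind", "version": "2.13.0"},
        {"name": "commons-text", "version": "1.10.0"},
    ],
}

# A vulnerability advisory, reduced to package name + first fixed version.
advisory = {"cve": "CVE-2021-44228", "name": "log4j-core",
            "fixed_in": (2, 15, 0)}

def parse_version(v):
    return tuple(int(x) for x in v.split("."))

def is_affected(sbom, advisory):
    # With an SBOM in hand, exposure is a simple comparison per component.
    return [c for c in sbom["components"]
            if c["name"] == advisory["name"]
            and parse_version(c["version"]) < advisory["fixed_in"]]

print(is_affected(sbom, advisory))  # flags the log4j-core 2.14.1 component
```

Organizations without this inventory were left grepping build scripts and unpacking jar files in December 2021; the 30% still unpatched nine months later is partly a measure of how many never finished that search.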
Exercise 2:
Package ecosystems build on existing libraries, so a single security flaw can be multiplied across every downstream dependent. Open contribution processes could also let malicious actors insert flaws deliberately. In what ways is open source software resilient to these issues? When might that resilience fail? What practices help bolster it?
Working in your group’s Google Doc:
Choose two resilience mechanisms from the list above. For each, describe: (a) a scenario where it would help catch a vulnerability, and (b) a scenario where it would fail or be bypassed.
AI code generation and security: Large language models (GitHub Copilot, ChatGPT, Cursor, etc.) are now widely used to write and review code. Discuss:
- LLMs are trained on large corpora of public code — including code with known vulnerabilities. What does this imply for the security of AI-generated code?
- If a developer uses an LLM to generate code that calls a library, how might this affect their awareness of their own dependency graph?
- Could LLMs help with supply chain security? How?
13.5 Legal liability and open source licenses
Open source licenses — the MIT license, the GPL, Apache 2.0 — all contain clauses like this (from MIT):
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND … IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY…
This is an AS-IS clause. It disclaims product liability: if the software is defective and causes harm, the authors are not legally responsible. This makes sense for volunteer contributors giving away code for free. But it applies equally to vendors who sell products built on open source — bundling it into commercial software without auditing or patching it.
13.5.1 Two camps
The Lawfare reading (Sharma, Speed, and Howison, 2022 — note: instructor (James) is a co-author) describes the policy debate around the Securing Open Source Software Act (SOSSA) through two competing camps:
Camp 1: Community investment. The problem is chronic underfunding of open source maintainers. The solution is direct investment — government and industry funding for security audits, better tooling, and paying maintainers. Regulation would burden already-stretched volunteers. SOSSA’s approach (CISA risk frameworks, federal dependency audits) is a reasonable, light-touch start.
Camp 2: Legal liability. Voluntary measures fail because vendors have no incentive to invest in security when they can ship vulnerable software with no legal consequence. Evidence: 30% of applications still running vulnerable Log4j nine months after disclosure. If vendors faced product liability, they would have financial incentive to scan their dependencies, patch promptly, and contribute upstream to the projects they depend on.
13.5.2 The synthesis
The authors argue these camps are not mutually exclusive. Legal liability, applied to vendors rather than volunteer maintainers, could actually increase investment in open source sustainability. If a company is liable for shipping software with a known vulnerability, the cheapest response may be to contribute to fixing the upstream open source project — cheaper than maintaining a private patch indefinitely.
The analogy is automotive recalls: when a defective part is discovered, the manufacturer is responsible for recalling and fixing the vehicle, even if the defective component came from a supplier. The “least-cost avoider” — the party best positioned to prevent harm at lowest cost — should bear legal responsibility.
13.5.3 The EU Product Liability Directive (2024)
The EU has moved ahead of the US on this. The revised Product Liability Directive explicitly includes software as a “product,” making vendors potentially liable for security defects. The EU Cyber Resilience Act adds mandatory security requirements for products with digital elements sold in the EU. The Taylor Wessing article in the readings covers the details.
Exercise 3:
Software licenses exclude product liability. How does this interact with open source? What are the trade-offs of applying product liability law to software and open source in particular?
Working in your group’s Google Doc, argue one of the following positions (your group will be assigned one):
Position A — For liability: Vendors profiting from software that relies on unaudited open source dependencies should bear product liability, as automotive manufacturers bear liability for defective components. The AS-IS clause is a legal artifact that has failed to create adequate security incentives. The EU is already moving in this direction and the sky has not fallen.
Position B — Against liability: Imposing liability on software vendors will be passed downstream to maintainers through contracts and indemnification clauses. It will chill open source contribution, push development to jurisdictions without liability regimes, and ultimately harm the commons that the entire tech industry depends on.
For your assigned position:
- List three supporting arguments
- Identify the strongest counter-argument and explain how you would respond to it
- Note where the Lawfare reading agrees or disagrees with your position
13.6 Resources
Sharma, A., Speed, S., and Howison, J. (2022). The Securing Open Source Software Act Is Good, but Whatever Happened to Legal Liability? Lawfare Blog. https://www.lawfaremedia.org/article/securing-open-source-software-act-good-whatever-happened-legal-liability
Raymond, E. S. (1999). The Cathedral and the Bazaar. http://www.catb.org/~esr/writings/cathedral-bazaar/
Freund, A. (2024). backdoor in upstream xz/liblzma leading to ssh server compromise. oss-security mailing list. https://www.openwall.com/lists/oss-security/2024/03/29/4
OpenSSF. (2024). Open Source Security Foundation. https://openssf.org