Software runs much of the modern business world, and a surprising number of enterprise-level software suites are built on open source software (OSS) components. A study by the Linux Foundation published in November 2025 found OSS components present in 55% of enterprise operating systems and 40% of AI and machine learning technology stacks. Artificial intelligence is particularly compatible with open source, a licensing framework that typically allows end users to modify software. Business executives surveyed by the Linux Foundation reported that AI and machine learning benefit the most from OSS components.
The use of OSS components gives companies the ability to develop software solutions uniquely crafted for their business model. Patent firms aren't immune to that appeal. As AI matures, more are weighing whether to build their own tools rather than adopt enterprise platforms.
However, for firms handling confidential invention disclosures and prosecution strategy, that flexibility comes with licensing encumbrances that can create cybersecurity risks and potential ethical violations, which most firms aren't equipped to manage. Because OSS components are ubiquitous across the software field, they pose threats that law firms cannot ignore given the due diligence their professional responsibilities require.
According to Vlad Teplitskiy, Partner at Knobbe Martens and Co-Chair of the firm’s Electrical, Computer, and Software Group, there are several layers of open source concerns with the implementation of composite AI solutions in any corporate environment, especially one with the confidentiality and other ethical requirements of the legal industry.
Teplitskiy, who has written on open source considerations in the AI context, notes that open source licensing enables community development and access in ways that can create compliance or disclosure risks if not properly managed. “These licenses are specifically intended to keep things free and flowing for all users with an eye toward improvement,” Teplitskiy said. However, license provisions requiring source code disclosure, reciprocal redistribution, or restrictions on commercial use can create compliance risks when software or datasets are incorporated into legal practice tools.
For patent professionals, those risks aren't abstract. The confidentiality obligations that define legal practice are structurally incompatible with the open sharing principles that underpin most OSS frameworks, and the exposure compounds at multiple layers.
Layer 1: Open Source Licenses Embedded Within AI Source Code
AI platforms are themselves composed of software code that may be subject to licenses imposing responsibilities on end users that aren’t compatible with many practice environments. Worse, some modified distributions of OSS programs state which licenses apply only within the software code itself. Patent attorneys come from technical backgrounds in scientific fields, but an attorney without a software background might not appreciate this risk when engaging with available software tools, even AI patent tools that don’t appear to be open source.
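To illustrate how a license notice can sit inside the code itself, the sketch below scans source text for SPDX-License-Identifier tags, a widely used convention for declaring a file's license in a comment. The function name and the sample snippet are hypothetical. A file with no tag can still be subject to a license, so this is a starting point and not a complete audit.

```python
import re

# Matches the common "SPDX-License-Identifier: <expression>" comment convention.
# The "*" is excluded so a closing "*/" in C-style comments is not captured.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([^\r\n*]+)")


def find_spdx_identifiers(source_text: str) -> list[str]:
    """Return every SPDX license expression declared in the given source text."""
    return [match.strip() for match in SPDX_TAG.findall(source_text)]


# Hypothetical source file contents, for illustration only.
sample = """\
# SPDX-License-Identifier: AGPL-3.0-only
def summarize(claims):
    return claims[:1]
"""

print(find_spdx_identifiers(sample))
```

Running a check like this across a vendor's or an internal tool's source tree can surface license declarations that never appear in marketing material or a top-level README.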
According to Teplitskiy, there are three general categories of OSS licenses that could apply to the software code comprising AI platforms:
1. Permissive
The least restrictive form of open source license, often allowing the modification and distribution of source code for commercial purposes. Examples of permissive open source licenses include MIT, BSD and Apache licenses.
2. Reciprocal
Sometimes referred to as “weak copyleft,” reciprocal licenses typically mandate that downstream end users make their software code available on the same terms as the original software. Examples of reciprocal licenses include the public licenses that let developers contribute to operating system code released by Apple or to computing environments maintained by development communities like Mozilla and Eclipse.
3. Restrictive
“Strong copyleft” licenses utilize the legal framework of copyright protection to ensure that works derived from the original software must be made freely available to all, including full disclosure of source code. One popular form of restrictive license is the GNU Project’s General Public License (GPL).
First released in 1989, the GPL is common in modern computing architecture, especially within systems that are built upon the Linux kernel. Although permissive licenses have been gaining in popularity among open source communities in recent years, the GPL is likely to encumber many software packages available on developer platforms like GitHub, potentially requiring any developer to publish source code for derivatives of the original program.
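The three categories above can be sketched as a simple lookup keyed by SPDX license identifiers. The mapping is an illustrative assumption that follows this article's groupings and is not a legal classification. The names `LicenseCategory`, `CATEGORY_BY_SPDX_ID` and `categorize` are hypothetical.

```python
from enum import Enum


class LicenseCategory(Enum):
    PERMISSIVE = "permissive"
    RECIPROCAL = "reciprocal (weak copyleft)"
    RESTRICTIVE = "restrictive (strong copyleft)"
    UNKNOWN = "unknown, needs legal review"


# Illustrative mapping only; verify each license's terms before relying on it.
CATEGORY_BY_SPDX_ID = {
    "MIT": LicenseCategory.PERMISSIVE,
    "BSD-3-Clause": LicenseCategory.PERMISSIVE,
    "Apache-2.0": LicenseCategory.PERMISSIVE,
    "APSL-2.0": LicenseCategory.RECIPROCAL,
    "MPL-2.0": LicenseCategory.RECIPROCAL,
    "EPL-2.0": LicenseCategory.RECIPROCAL,
    "GPL-2.0-only": LicenseCategory.RESTRICTIVE,
    "GPL-3.0-only": LicenseCategory.RESTRICTIVE,
    "AGPL-3.0-only": LicenseCategory.RESTRICTIVE,
}


def categorize(spdx_id: str) -> LicenseCategory:
    """Look up a license category, defaulting to UNKNOWN for unlisted licenses."""
    return CATEGORY_BY_SPDX_ID.get(spdx_id.strip(), LicenseCategory.UNKNOWN)


for license_id in ("Apache-2.0", "MPL-2.0", "AGPL-3.0-only", "Custom-1.0"):
    print(license_id, "->", categorize(license_id).value)
```

Defaulting to an explicit "unknown" category, instead of guessing, keeps unlisted or custom licenses in front of a human reviewer.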
Generally, the obligations in OSS licenses like the GPL are triggered only when software is distributed to third parties. But in Teplitskiy’s view, the line between internal firm usage of an AI product and client distributions of that product is blurring thanks to cloud computing.
Teplitskiy notes that a law firm could make an AI-powered solution available to clients via the cloud in ways that trigger the terms of highly restrictive licenses like the Affero GPL (AGPL), which can apply to software offered over a network and requires any source code modifications made by the firm to be made freely available. That can have unwanted consequences if confidential information is included in the code.
Layer 2: Dataset Inputs Encumbered By Licenses Preventing Modifications, Commercial Usage
IP lawyers are generally well aware of the software licensing frameworks for the code powering AI solutions. The data used to train AI models, however, presents a separate layer of open source issues that some in the legal industry don’t fully appreciate, according to Teplitskiy. Even when the AI solution itself is not built from OSS components, it is important to review any licensing conditions attached to its training data.
There are several categories of open data licenses administered by organizations like Creative Commons or Open Source Initiative that could be attached to publicly available data sets:
- Attribution: Requires that parties developing derivative works give appropriate credit to the author of the original data collection and indicate any changes made to that source.
- Share Alike: Requires that parties modifying or adding material to the original data source when creating a derivative work make the modified content available to the public on the same terms as the original source.
Those licenses introduce conditions on the use of publicly available data that can easily cause issues for a law firm that mixes public data and confidential information when training its AI platform. However, some databases are covered by licenses with terms even more restrictive than these:
- No Derivatives: Prevents parties from distributing modified versions of the original data source.
- No Commercial Use: Prevents parties from using the original data source for any kind of commercial purpose.
One stark example of the issues the legal industry cannot risk is InsightFace, a 2D/3D facial analysis and recognition toolset. The InsightFace software framework, which provides tools for fast and accurate facial detection, is governed by the permissive MIT License, which allows distribution of modified source code for any purpose, including commercial ones.
However, the training datasets are typically only available for academic research purposes, not commercial ones. Implementing an AI solution trained on third-party datasets without confirming commercial use or modification rights is not a chance worth taking.
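As a rough sketch of that confirmation step, the code below expands a Creative Commons style license code into the four conditions described above and flags the two that bear on commercial use and modification. The function names and sample codes are hypothetical. Datasets under custom or research-only terms will not match this pattern and still need to be read in full.

```python
# Condition codes that appear in Creative Commons license names.
CONDITIONS = {
    "BY": "attribution required",
    "SA": "share alike",
    "ND": "no derivatives",
    "NC": "no commercial use",
}

# Conditions that bear directly on commercial use or modification rights.
REVIEW_CODES = ("NC", "ND")


def dataset_conditions(license_code: str) -> list[str]:
    """Expand a code such as 'CC-BY-NC-SA-4.0' into its listed conditions."""
    parts = license_code.upper().split("-")
    return [CONDITIONS[part] for part in parts if part in CONDITIONS]


def flags_for_review(license_code: str) -> list[str]:
    """Return only the conditions that restrict commercial use or modification."""
    parts = license_code.upper().split("-")
    return [CONDITIONS[code] for code in REVIEW_CODES if code in parts]


for code in ("CC-BY-4.0", "CC-BY-NC-SA-4.0", "CC-BY-ND-4.0"):
    print(code, dataset_conditions(code), flags_for_review(code))
```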
Layer 3: AI-Generated Code Outputs Triggering Obligations Through Substantial Similarity
Open source derives its framework from copyright law, which is why many licensing terms become operable once a distribution of the same work or a derivative is made. That copyright regime also comes into play with the outputs of generative AI systems. Many high-profile infringement lawsuits have been filed over potentially infringing text, image or video outputs, but generative AI can also produce code substantially similar to copyrightable software code.
Generative AI developers are incentivized to mitigate outputs that infringe upon copyright, given the legal liability many of them currently face in US district courts. However, they have less incentive to avoid training their models on software code that is freely distributed under an open source license. While a work generated entirely by AI is not copyrightable, software code generated with significant human input could carry forward some of the licensing restrictions found in the original source code.
It would be very detrimental to any legal practice to discover that a software tool it developed in-house is unwittingly covered by a third party’s license. Enforcement of open source licensing terms is rare, but not unheard of.
How to Avoid Open Source Issues in Your Legal Practice
Even if enforcement risks are low for the vast majority of companies, the possibility of unknown licensing conditions on practice tools is a concern given the legal industry’s professional responsibilities.
For firms weighing whether to build on open source components or adopt a purpose-built platform, the practical guidance breaks down by function:
1. Choose Integrated, Purpose-Built Platforms for Patent Prosecution
For patent prosecution specifically, the three layers of open source exposure described above—platform code, training data, and AI-generated outputs—don't operate independently. A firm cobbling together AI tools from multiple components multiplies its licensing exposure at every layer. An integrated, purpose-built patent platform built on proprietary code and data eliminates that exposure by design, rather than requiring ongoing diligence across a fragmented stack.
2. Perform a License Audit on Your Firm’s Entire Software Suite
Even if you turn toward a patent-specific proprietary AI solution, it’s possible that your firm has already implemented some software solutions encumbered with open source licensing terms. Internal documentation that can be quickly referenced helps a firm see whether any of its non-legal service functions could trigger those licensing conditions. Building a digital folder full of end-user license agreements (EULAs) and engaging key vendors in conversations about specific licensing conditions are other ways to assess potential OSS issues that may have escaped a firm’s awareness as it has adopted AI solutions.
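For internally built tools written in Python, one narrow slice of such an audit can be automated. The sketch below uses the standard library's `importlib.metadata` to list the license each installed package declares. It assumes a Python environment and covers only what package authors chose to declare, so vendor software and other languages need separate review. The helper names are hypothetical.

```python
from importlib import metadata


def declared_license(meta) -> str:
    """Return a best-effort license string from one package's metadata."""
    # Newer packages may declare "License-Expression"; older ones use "License".
    for field in ("License-Expression", "License"):
        text = str(meta.get(field) or "").strip()
        if text and text.upper() != "UNKNOWN":
            return text.splitlines()[0]
    # Fall back to a "License ::" trove classifier if one is present.
    for classifier in meta.get_all("Classifier") or []:
        if str(classifier).startswith("License ::"):
            label = str(classifier).split("::")[-1].strip()
            if label:
                return label
    return "UNDECLARED"


def inventory() -> dict[str, str]:
    """Map each installed Python distribution to its declared license."""
    return {
        str(dist.metadata.get("Name") or "unnamed"): declared_license(dist.metadata)
        for dist in metadata.distributions()
    }


for name, license_text in sorted(inventory().items()):
    print(f"{name}: {license_text}")
```

Entries reported as UNDECLARED are the ones most worth raising with a vendor or reviewing by hand.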
3. Write an Open Source Policy Explicitly Excluding the Most Restrictive Licenses
Law firms use software for far more than patent prosecution—CRM, operations, internal tooling—and those functions are where open source components are most likely to have slipped in unexamined. For these areas, firms should have explicit policies on which license types are acceptable. Teplitskiy notes that firms can model their own open source policies on those available through nonprofit organizations like the Linux Foundation or major tech companies like Google.
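A written policy of this kind can also be expressed as a small machine-checkable rule set. The sketch below is a minimal example under stated assumptions: the approved and excluded lists, the tool names and the function name are all hypothetical, and a real policy would be set by the firm.

```python
# Hypothetical policy lists, keyed by SPDX identifier. A real policy would be
# decided by the firm, not copied from this sketch.
APPROVED = {"MIT", "BSD-3-Clause", "Apache-2.0"}
EXCLUDED = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"}


def apply_policy(components: dict[str, str]) -> dict[str, list[str]]:
    """Sort components into approved, excluded, and needs-review buckets."""
    report = {"approved": [], "excluded": [], "needs_review": []}
    for name, license_id in sorted(components.items()):
        if license_id in EXCLUDED:
            report["excluded"].append(name)
        elif license_id in APPROVED:
            report["approved"].append(name)
        else:
            # Anything the policy does not name goes to a human reviewer.
            report["needs_review"].append(name)
    return report


# Hypothetical tools and licenses, for illustration only.
firm_tools = {
    "crm-connector": "MIT",
    "docket-sync": "AGPL-3.0-only",
    "intake-forms": "MPL-2.0",
}
print(apply_policy(firm_tools))
```

Routing unlisted licenses to review instead of silently approving them mirrors the diligence procedures the policy is meant to establish.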
The Architecture of Trust
The underlying technology decisions behind any AI tool are rarely visible to end users. For patent professionals, that opacity is exactly the problem, and the reason why the architecture of the platforms they trust with client matters deserves the same scrutiny they bring to anything else.
FAQ: Open Source AI Risks and Patent Firm Technology Decisions
What open source licensing risks should patent firms be aware of when adopting AI tools?
Patent firms face exposure at three distinct layers: the licenses embedded in the AI platform's source code, the licenses attached to the datasets used to train the model, and the potential for AI-generated code outputs to carry forward restrictions from the original source code. Each layer creates obligations that can conflict directly with a firm's duty of confidentiality.
What is the difference between permissive, reciprocal, and restrictive open source licenses?
Permissive licenses (MIT, BSD, Apache) allow modification and commercial distribution with minimal conditions. Reciprocal licenses, sometimes called "weak copyleft," require downstream users to make their own code available on the same terms as the original. Restrictive licenses, such as the GNU General Public License (GPL), go further, requiring any derivative work to be made freely available, including full source code disclosure. For patent firms, restrictive licenses pose the greatest risk, particularly when client-confidential logic becomes entangled with licensed code.
Can cloud-based AI tools trigger open source licensing obligations for law firms?
Yes. Open source licenses like the GPL technically activate upon distribution to third parties. As Teplitskiy notes, the line between internal firm usage and client-facing distribution is blurring with cloud computing. A firm that makes an AI-powered tool available to clients via the cloud may trigger the Affero GPL (AGPL), which applies to network-based software distributions and requires source code modifications—potentially including proprietary prosecution logic—to be made freely available.
Why are AI training datasets a separate licensing concern from the platform itself?
An AI platform can be built on proprietary code and still carry open source exposure through its training data. Publicly available datasets administered by organizations like Creative Commons or the Open Source Initiative often carry conditions including attribution requirements, share-alike obligations, or outright prohibitions on commercial use. A firm that fine-tunes a model using such datasets—while mixing in confidential client matter—risks a direct conflict between those licensing terms and its professional obligations.
What is the build vs. buy risk for patent firms considering AI tools?
Firms that build their own AI tools using open source components face compounding exposure across all three licensing layers simultaneously. Each additional OSS component introduces its own licensing conditions, and managing diligence across a fragmented stack is an ongoing operational burden most firms aren't resourced to handle. Purpose-built patent AI platforms developed on proprietary code and data eliminate that exposure by design, leaving firms with a single set of licensing terms to understand and monitor rather than dozens.
How should law firms handle open source licensing for non-prosecution business functions?
For business functions outside of patent prosecution—CRM, operations, internal tooling—firms should develop explicit policies specifying which license types are acceptable and establishing diligence procedures for any new software adoption. Teplitskiy recommends modeling these policies on frameworks published by organizations like the Linux Foundation or major technology companies like Google, which have developed mature approaches to managing open source exposure at scale.





