March 28, 2026: Security Analysis

The Cloud AI Security Stack
Is a Sales Pitch, Not a Shield.

For years, cloud AI vendors sold law firms a soothing script. SOC 2. GDPR Compliant. Encryption. BYOK. Zero Training. The industry repeated those phrases until lawyers started mistaking them for real control over their clients' privileged matters. Then March 24 happened. This page cuts through that fog, not to alarm you without cause, but because you were warned, and the warning proved correct.

The argument here is precise: these controls reduce certain risks. They do not eliminate third-party plaintext exposure during live inference. Those are different claims. They should not be sold as though they are the same.

THE LITELLM BREACH: MARCH 24, 2026

95M
Monthly Downloads
~3hrs
Before Detection
1,000+
SaaS Envs Confirmed
5–10K
Projected Final Count

On March 24, 2026, the threat actor group TeamPCP published two poisoned versions of LiteLLM, the routing layer that sits between AI applications and every major model provider. Versions 1.82.7 and 1.82.8 contained malware that harvested environment variables, SSH keys, cloud credentials, Kubernetes tokens, database passwords, and live AI session content in plaintext. The malware installed a persistent backdoor polling for follow-on payloads every 50 minutes. It was discovered only because an attacker coding error crashed host machines. A more careful attacker might never have been caught. Three days later, credentials from the LiteLLM harvest were used to breach Telnyx. Mandiant confirmed 1,000+ enterprise environments compromised and projects 5,000–10,000 as the final count. The infrastructure running LiteLLM carried real compliance certifications. The breach hit the inference layer, the moment every badge was designed to ignore.
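The mechanics are worth pausing on. A poisoned package needs no exploit: any dependency imported into a process runs with that process's privileges and can read every environment variable the process was started with. The sketch below illustrates that visibility; the marker list and function name are illustrative, and nothing here describes the actual malware.

```python
import os

# Any code imported into a Python process, including a poisoned
# dependency, runs with the same privileges as the application and can
# read every environment variable the process was started with.
SENSITIVE_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL")

def visible_secrets(environ=os.environ):
    """Return the names of environment variables that look sensitive."""
    return sorted(
        name for name in environ
        if any(marker in name.upper() for marker in SENSITIVE_MARKERS)
    )

if __name__ == "__main__":
    # On a typical AI application host this prints the provider API keys,
    # cloud credentials, and database passwords the process can see.
    for name in visible_secrets():
        print(name)
```

No privilege escalation, no network exploit: the dependency simply reads what the process already holds. That is why credential harvesting from inside the inference layer requires nothing clever.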

Decision Framework
Not All Matters Carry The Same Risk.
Not All Architectures Are Equal.

ABA Formal Opinion 512 requires lawyers to assess unauthorized-access and disclosure risk before entering client information into generative AI tools. The right architecture depends on the sensitivity of the matter. This matrix is a starting point, not legal advice. Consult your ethics counsel before making deployment decisions for privileged work.

Tier 1: Crown Jewel Matters
Shared SaaS Cloud AI
Multi-tenant cloud platforms where privileged client content transits third-party inference infrastructure. No matter how many compliance badges exist, inference requires plaintext on provider-operated systems. LiteLLM proved what that means when a dependency is compromised. For matters where disclosure would be materially damaging (major litigation strategy, M&A, regulatory exposure, criminal defense), shared cloud inference introduces a third-party runtime dependency that no contract eliminates.
Default: Avoid for privileged crown-jewel matters
Tier 2: Sensitive But Bounded Matters
Attested Confidential Cloud / Private Deployment
Architectures using attested Trusted Execution Environments, customer-dedicated infrastructure, or similar controls that materially reduce provider and operator access to data in use. These narrow the attack surface for certain threat classes and may support a reasonableness argument under ABA Model Rule 1.6 for lower-sensitivity workflows. They do not eliminate third-party runtime dependency. The vendor still controls hardware, deployment, attestation, patching, and orchestration. Use requires documented architecture review, verified builds, pinned dependencies, strict egress controls, and per-matter audit logs.
Conditional: Requires documented architecture review
Tier 3: Highest Sensitivity Matters
Customer-Controlled Isolated Inference
Firm-controlled infrastructure with no external inference call in the privileged processing chain. Verified and pinned artifacts, no live dependency pulls, per-matter segmentation, least-privilege access, strict egress controls, customer-held keys, and immutable audit logs. Air-gapped local deployment is the strongest currently available form of this pattern. It also eliminates network-based exfiltration paths and removes the third-party provider as a separate compulsory-process target for inference-related records. The LiteLLM breach confirms that pinned, offline-verified deployments were not impacted. That is the architecture that held.
Strongest default: For crown-jewel privileged matters
This matrix is a risk-framing tool only. It is not a compliance checklist, an ethics opinion, or legal advice. Architecture selection for privileged AI workflows requires qualified counsel review. Sources: ABA Formal Opinion 512 (July 2024); NIST SP 800-207 Zero Trust Architecture; 18 U.S.C. §§ 2702, 2703, 2713.
The Security Stack: Badge by Badge

Every control below is real. Every control below fails to address the same moment: live inference on third-party infrastructure, where your client's matter must exist in plaintext for the model to process it. These are not useless controls. They are incomplete answers to the wrong question, and they are being sold as though they are complete.

01 SOC 2 COMPLIANT Type I and Type II INCOMPLETE
▸ The Claim
We passed an independent security audit. Our controls have been verified by a third party.
✕ What It Misses
A SOC 2 report documents controls within a defined scope on a specific audit date. The scope does not include whether your client's matter is readable during live AI inference. LiteLLM carried SOC 2 Type I and II and ISO 27001 certifications on March 24. The breach hit anyway. The badge was real. The compromise was real. Both were true simultaneously. An auditor's stamp is not a technical barrier. It is a paperwork record of how a system was configured on a past day, reviewed against criteria that never asked what happens to your client's documents while the model is processing them. SOC 2 is a necessary element of vendor diligence. It is not sufficient to establish inference-time privilege protection.
✓ Customer-Controlled Inference
There is no vendor audit to pass or fail. The hardware is yours. The runtime is yours. The logs are yours. You are the auditor, and you can produce the evidence yourself without a third party's attestation standing between you and the answer.
02 GDPR COMPLIANT Your data is yours INCOMPLETE
▸ The Claim
We are GDPR compliant. Your data is protected under European privacy law.
✕ What It Misses
GDPR governs how a vendor handles your data legally: processing basis, data subject rights, retention policies. It does not govern whether your client's matter is technically readable during AI inference. A vendor can be in full GDPR compliance while your client's documents exist in plaintext on their servers during processing. The compliance framework and the architectural exposure answer different questions; both can be true at once. Additionally: 18 U.S.C. § 2713 requires covered U.S. providers to comply with lawful orders to disclose data within their possession, custody, or control regardless of where it is stored. GDPR compliance does not shield a U.S.-based provider from a valid U.S. federal compulsory process order. The exposure varies by architecture, retention practice, and the specific data involved, but the structural tension between a GDPR promise and a federal statute is not resolved by the compliance badge.
✓ Customer-Controlled Inference
No third-party cloud processor holds the matter. No vendor-side transfer problem. No provider sitting in the middle claiming GDPR compliance while retaining the control that creates the structural tension. The compulsory process question against a third party does not arise when there is no third party holding inference-related records.
03 24/7 ENCRYPTION In transit and at rest INCOMPLETE
▸ The Claim
Your data is encrypted at all times, moving to our servers and while stored.
✕ What It Misses
The AI cannot analyze ciphertext. It must decrypt your documents before it can process them. During inference, your client's matter must be in machine-usable form on the vendor's infrastructure so the model can tokenize, route, and generate output. That is a computational requirement, not a design flaw. The LiteLLM malware did not defeat encryption at rest. According to Datadog Security Labs, it reached the processing layer where the material already had to be readable and harvested it there. Encryption was fully intact. The exposure happened anyway. This control protects data on the way to the server and while it sits in storage. It protects everything except the moment the AI uses it.
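The point can be made concrete in a few lines. The sketch below uses a toy XOR cipher as a stand-in for real encryption (AES behaves the same way for this purpose) and whitespace splitting as a stand-in for a real tokenizer. All names and the sample text are illustrative. The model pipeline starts with tokenization, and tokenization of ciphertext produces garbage, so the runtime must hold plaintext at the moment of inference.

```python
def toy_encrypt(text: str, key: int = 0x5A) -> bytes:
    # Stand-in for AES: real ciphers likewise yield bytes with no
    # linguistic structure left for a tokenizer to find.
    return bytes(b ^ key for b in text.encode())

def toy_decrypt(blob: bytes, key: int = 0x5A) -> str:
    return bytes(b ^ key for b in blob).decode()

def tokenize(text: str) -> list[str]:
    # Stand-in for a real tokenizer.
    return text.split()

matter = "Privileged settlement memo for Acme v. Example"
blob = toy_encrypt(matter)

# Ciphertext yields no usable tokens; the model cannot work on it.
assert tokenize(blob.decode("latin-1")) != tokenize(matter)
# Only after decryption do meaningful tokens exist. That decrypted
# moment is the plaintext window, wherever the runtime happens to be.
assert tokenize(toy_decrypt(blob)) == tokenize(matter)
```

Swapping in AES and a production tokenizer changes nothing about the structure: decryption must precede tokenization, and tokenization must precede inference.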
✓ Customer-Controlled Inference
Your data never transits to anyone else's servers for inference. There is no third-party plaintext window because there is no third-party runtime. The only moment your client's matter exists in plaintext during processing is inside hardware you own, in a facility you control.
04 BYOK Bring Your Own Key INCOMPLETE
▸ The Claim
You hold the encryption key. We cannot read your data.
✕ What It Misses
The key must unlock the data before the AI can process it. Once that happens, the data is plaintext on the vendor's hardware, inside their runtime, under their operational control. BYOK improves key custody and revocation rights for stored data. It does not change who operates the machine doing the inference. Harvey's own published security documentation states directly: "customer data is used at inference time only." That is the vendor confirming the data is readable at the moment it matters most. You hold the key to a door that has to be left open for the AI to work. BYOK is a meaningful governance improvement. It is not a privilege-protection architecture.
✓ Customer-Controlled Inference
You hold the key, the hardware, the runtime, the storage, and the facility. There is no vendor-side environment where your matter ever has to become plaintext for someone else's benefit. Per-matter LUKS AES-256 encryption means each client's data is cryptographically isolated, and the key never leaves your building.
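Per-matter cryptographic isolation is a property that can be sketched directly. The example below uses PBKDF2 from the Python standard library as a conceptual stand-in for LUKS volume-key handling (LUKS has its own key-slot machinery); the function name, identifiers, and iteration count are illustrative, not a description of any product.

```python
import hashlib

def derive_matter_key(master_secret: bytes, matter_id: str, salt: bytes) -> bytes:
    """Derive an independent 256-bit key per matter. Because each key is
    derived separately, disclosing one matter's key reveals nothing
    about any other matter's key."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        master_secret + b"|" + matter_id.encode(),
        salt,
        200_000,  # iteration count; tune for your hardware
    )

# Two matters under the same firm secret: cryptographically unrelated keys.
secret, salt = b"firm-master-secret", b"per-device-salt!"
key_a = derive_matter_key(secret, "matter-2026-0141", salt)
key_b = derive_matter_key(secret, "matter-2026-0142", salt)
assert key_a != key_b and len(key_a) == 32
```

The design point is segmentation: a compromise scoped to one matter's key stays scoped to that matter, and none of the keys ever leave hardware the firm controls.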
05 ZERO TRAINING Your data is never used for training INCOMPLETE
▸ The Claim
We will never use your documents to train our AI models.
✕ What It Misses
This promise addresses downstream reuse after the session ends. The inference-time exposure occurs during the session. A supply chain compromise, credential theft, or runtime access event does not care whether the vendor planned to use your documents for future training. The LiteLLM malware executed during active processing, not after. It harvested live session data. "We do not train on your data" is a model-governance promise. It is orthogonal to inference-time privilege protection. A vendor can fully honor a zero-training commitment while your client's matter is being read, copied, or exfiltrated in the same session. The promise and the exposure were never designed to address the same moment.
✓ Customer-Controlled Inference
The model runs locally and was trained before it arrived. Your matter never reaches anyone who could observe it, log it, or exfiltrate it during processing. The training promise becomes irrelevant because the external access path to the live session does not exist.
06 HOMOMORPHIC ENCRYPTION Advanced protection during processing NOT AVAILABLE
▸ The Claim
Advanced cryptographic techniques can protect data even while it is being processed.
✕ What It Misses
Fully homomorphic encryption, which would allow computation on encrypted data without decryption, does not operate at practical speed for production language model inference workloads. No cloud AI vendor offers it for legal AI use today. No law firm deploys it. It remains a research direction with significant computational overhead that makes real-time document analysis economically and technically unviable at scale. Citing it as a current defense is not a serious argument. If a vendor proposes it as a present solution, ask them to identify the production deployment, the throughput benchmarks, and the document volumes it handles in real matters. The current honest answer is: it does not yet exist as a deployable option for this workload.
✓ Customer-Controlled Inference
No exotic cryptography is required to solve a third-party possession problem. Remove the third party from the inference chain and the problem does not arise. Customer-controlled local inference eliminates the need for any cryptographic workaround during processing.
07 TRUSTED EXECUTION ENVIRONMENTS Confidential compute / Secure enclaves INCOMPLETE
▸ The Claim
Secure enclaves protect your data during inference even on our infrastructure.
✕ What It Misses
TEEs are real technology and they materially reduce certain threat classes. The claim here is precise: they do not eliminate third-party runtime dependency, and the inference-time exposure they address is narrower than the marketing implies. Inside the enclave, the processor still operates on data in machine-usable form. The protection is against certain classes of host-level and operator inspection, not against application-layer compromise. The LiteLLM attack operated at the application layer. It did not need to break through enclave walls. Tighter walls did not matter when the attacker was already inside the application running within them. Additionally: TEEs require trusting the chip manufacturer's attestation integrity, a dependency that cannot be audited or contractually eliminated. Intel SGX has documented vulnerabilities including SGAxe and CrossTalk that extract data from inside enclaves through side-channel attacks. The vendor still controls hardware, deployment, attestation workflow, patching, and orchestration. TEEs narrow and reallocate trust. They are meaningful risk reduction. They are not elimination of third-party runtime dependency.
△ Honest Assessment
For Tier 2 matters, lower-sensitivity workflows with documented architecture review, TEE-backed deployments with verified builds, pinned dependencies, strict egress controls, and per-matter audit logs may support a reasonableness argument under ABA Model Rule 1.6. For crown-jewel privileged matters, the residual trust dependencies are material and should be acknowledged in the confidentiality analysis.
✓ Customer-Controlled Inference
The security boundary is your entire controlled environment. Physical access controls, locked ports, per-matter encryption, offline deployment, and controlled transfer procedures change actual possession and control, not just the packaging around external dependence. Air-gapped deployment is the strongest currently available form of this pattern: it removes the vendor from the inference event entirely and eliminates network-based exfiltration paths.
08 THE FULL STACK All five controls combined INCOMPLETE
▸ The Claim
Together, all of these controls provide comprehensive protection for your client's data.
✕ What It Misses
Stacking five controls that each miss the inference window does not cover the inference window. Every one of these controls can be present and valid while your client's privileged matter is readable on third-party infrastructure during active processing. The infrastructure running LiteLLM carried real compliance certifications on March 24. The breach hit the inference layer anyway. Layered controls can materially reduce the probability of unauthorized access. They cannot eliminate the architectural fact that third-party runtime inference creates a third-party runtime dependency. Risk reduction and risk elimination are different claims. The full stack, as marketed, establishes the former. It does not establish the latter. The brochure is thicker. The exposure is the same.
✓ Customer-Controlled Inference
Remove the third party from the inference equation and the entire stack becomes structurally irrelevant simultaneously. There is no vendor runtime to audit, compel, or compromise. The architectural question resolves at the level of possession and control, not at the level of contractual promises about how a vendor handles what they hold.
09 CONTINUOUS THIRD-PARTY MONITORING Independent auditors verify our security INCOMPLETE
▸ The Claim
Independent auditors continuously verify and monitor our security posture.
✕ What It Misses
Third-party verification is still third-party trust. The auditor does not assume the privilege exposure. The assessor does not become co-counsel if a malpractice claim arises. The audit report does not transfer the risk from your firm to the vendor. Your client's matter still lives inside a system you do not own, operated by people you do not supervise, subject to changes you do not control. A compliance report documents a past state within a defined scope. March 24 compromised LiteLLM through its own security tooling, the very scanner it used to verify its own builds. The auditor's scope did not include transitive CI/CD dependencies. The audit and the attack were not in conflict. The audit simply was not designed to catch what caught it.
✓ Customer-Controlled Inference
Every session produces its own tamper-evident audit log on hardware you control. Every matter. Every inference event. Every output. That is evidence you can hand across a table yourself, not outsourced reassurance from a third party whose scope excluded the moment that mattered.
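Tamper evidence is a property you can implement, not merely assert. A minimal sketch of a hash-chained log, where each record commits to its predecessor's hash so any retroactive edit breaks verification (the format and class name are illustrative, not any vendor's scheme):

```python
import hashlib
import json

class AuditLog:
    """Append-only, hash-chained log: each entry commits to the previous
    entry's hash, so any retroactive edit breaks the chain."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._prev = self.GENESIS

    def _digest(self, event: dict, prev: str) -> str:
        payload = json.dumps({"event": event, "prev": prev}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def append(self, event: dict) -> dict:
        record = {"event": event, "prev": self._prev,
                  "hash": self._digest(event, self._prev)}
        self._prev = record["hash"]
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        prev = self.GENESIS
        for rec in self.entries:
            if rec["prev"] != prev or rec["hash"] != self._digest(rec["event"], prev):
                return False
            prev = rec["hash"]
        return True
```

A firm can hand the chain and a verifier across the table; no third party's attestation is needed to show the record has not been rewritten after the fact.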
10 MALPRACTICE COVERAGE Our security posture protects your firm INCOMPLETE
▸ The Claim
Our security certifications demonstrate the level of protection your firm's risk management requires.
✕ What It Misses
Verisk ISO generative AI exclusions went into effect January 1, 2026. Your insurer cares whether you understood the risk, controlled the exposure, and can prove what happened, not whether your vendor had a compliance badge. If a vendor environment is compromised and you cannot independently demonstrate what processed what, when, and under whose control, you are relying on the vendor's incident narrative to defend your own diligence. That is a structurally weak position. A firm cannot outsource its duty of care to a policy page or a vendor sales engineer. The exclusion asks what the firm knew and what the firm did. The vendor's badge does not answer either question.
✓ Customer-Controlled Inference
When your carrier asks for proof of due diligence, you produce the audit trail from your own hardware. You can demonstrate what processed what and when without waiting for a vendor to explain their own breach. The processing evidence is yours from the moment the session runs.
11 CONTRACTUAL CONFIDENTIALITY Our agreements protect your data from disclosure INCOMPLETE
▸ The Claim
Our enterprise agreements guarantee that your data stays private and cannot be disclosed to third parties.
✕ What It Misses
Contracts govern voluntary conduct. They do not override compulsory process. Under 18 U.S.C. §2703, a provider that stores or maintains communications or related records may be compelled to disclose them. Under 18 U.S.C. §2713, covered providers must comply with such orders regardless of where the data is physically stored. Once your client's matter passes through a third-party cloud AI provider, that provider becomes a legal target for anyone seeking inference-related records. The exposure varies by architecture, retention practice, and the specific data categories involved, not every cloud design creates identical legal-process posture. But the structural point holds: your enterprise agreement governs what the vendor voluntarily does with your data. It does not govern what a federal court can order them to produce. Those are different questions answered by different bodies of law.
✓ Customer-Controlled Inference
No outside vendor holds inference-related records. Anyone seeking the data must come to you directly, under processes that trigger different legal protections, notice requirements, and response rights. You are not collateral in someone else's subpoena because there is no one else holding the matter.
12 ANONYMIZATION We strip identifying information before uploading INCOMPLETE
▸ The Claim
We protect clients by removing names and identifying information from documents before sending them to cloud AI systems.
✕ What It Misses
Stripping names does not strip facts. Legal matters are often identifiable from context: jurisdiction, case type, dates, dollar amounts, procedural posture, and specific fact patterns. Anonymization reduces identifiability in some cases. It does not eliminate the inference-time exposure that motivated it. Beyond the technical limitation: the decision to anonymize reflects a judgment that the underlying architecture requires mitigation before it is safe to use for this matter. That judgment may create a record that deserves scrutiny, and it should prompt a harder question about whether the architecture is appropriate for the matter at all. Anonymization also degrades the quality of the legal analysis by stripping context the model needs to do precise work. The result is a workflow that accepts the risk and weakens the output at the same time.
✓ Customer-Controlled Inference
The full unredacted matter runs locally. You analyze real documents with real context. No degraded inputs. No mitigation workaround embedded in the workflow. The need to anonymize does not arise because the exposure that motivated it does not exist.
13 LOCAL BUT CONNECTED "Local AI still has supply chain risk" TRUE BUT INCOMPLETE
▸ The Counter-Claim
Air-gapped local AI does not solve supply chain risk. Software has to come from somewhere; even local deployments pull packages from the internet.
✕ What It Conflates
This objection is technically correct for internet-connected local deployments. It does not apply to properly implemented air-gapped deployments. The distinction is between the staging process and the production system. If a machine went online to pull packages during setup, that step was the exposure. The production system is not the same as the setup process. LiteLLM confirmed that customers running the official pinned Docker deployment were not impacted because that path did not rely on the compromised PyPI packages. March 24 did not prove air-gap fails. It proved that live network dependency pulls during production operation fail. Any system pulling packages from public repositories during production retains the same supply chain attack surface that hit LiteLLM users. A local but internet-connected deployment carries a different risk profile from cloud inference, but it is not a clean architecture either.
✓ True Air-Gapped Deployment
A properly implemented air-gapped production system does not reach package registries, model hubs, update servers, or vendor APIs during client matter processing. Artifacts are selected, verified, pinned, hashed, transferred on controlled media, and installed offline. The production box is sealed before client data touches it. No live network pulls. No silent updates. No remote callbacks. If you control the artifact chain before deployment and the box stays offline in production, you have closed the ingress path that March 24 exploited. That is the architecture that held on March 24. That is the standard.
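Verified, pinned, hashed, transferred, installed offline: the verification step reduces to comparing each artifact on the transfer media against a hash recorded at staging time on a separate trusted machine. A minimal sketch, with an illustrative manifest format and file names:

```python
import hashlib
from pathlib import Path

def verify_artifacts(manifest: dict[str, str], media_root: Path) -> list[str]:
    """Compare each artifact on the transfer media against its pinned
    SHA-256. An empty return means the media carries exactly what was
    staged; any mismatch or missing file is a reason to halt the install."""
    failures = []
    for name, pinned in manifest.items():
        artifact = media_root / name
        if not artifact.is_file():
            failures.append(name)
            continue
        digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
        if digest != pinned:
            failures.append(name)
    return failures
```

The manifest is generated on the staging machine and carried separately from the artifacts themselves, so a tampered transfer medium cannot also rewrite the hashes it will be checked against.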

Every badge answers a question.
None of them answer the right one.

The right question is not whether the vendor has enough credentials. It is: while the AI is processing your client's privileged matter, who else can reach it?

These controls are real. The claim that they eliminate inference-time third-party exposure is not. The LiteLLM breach did not create that gap. It confirmed what was always structurally true. The model must receive your client's matter in readable form to do its job. Every control above operates on a different moment than that one.

For crown-jewel privileged matters, the defensible architecture is customer-controlled isolated inference with verified builds, per-matter segmentation, least-privilege access, strict egress controls, customer-held keys, and immutable audit logs. Air-gapped local deployment is the strongest currently available form of that pattern. It removes the vendor from the inference event entirely, eliminates network-based exfiltration paths, and removes the third-party provider as a separate compulsory-process target for inference-related records.

That is not a feature upgrade. It is the first architecture that removes the structural dependency instead of managing it.

Infrastructure commentary only. Nothing on this page constitutes legal advice, professional responsibility guidance, or compliance counsel. The three-tier matrix is a risk-framing tool only, not a compliance checklist, ethics opinion, or substitute for qualified counsel review. Analysis referencing ABA Formal Opinion 512 (July 2024), ABA Model Rules 1.1 and 1.6, and NIST SP 800-207 is descriptive only. Statutory references to 18 U.S.C. §§ 2702, 2703, and 2713 are informational. Law firms should consult qualified ethics and technology counsel before making AI deployment decisions for privileged work. LiteLLM breach facts sourced from LiteLLM official security update, Datadog Security Labs, Snyk, Infosecurity Magazine, and CSO Online/Mandiant, accurate as of March 28, 2026. Harvey quotation sourced from harvey.ai/security. Verisk ISO generative AI exclusion reference reflects publicly available endorsement language effective January 1, 2026. CloseVector makes no representations about insurance coverage or insurability. Read your own policy and consult counsel.