ARTICLE
26 March 2026

AI Model Validation In Regulated Financial Firms: Supervisory Expectations And Practical Considerations

Cahill Gordon & Reindel LLP


With a history of legal innovation dating back to the firm’s founding in 1919, Cahill Gordon & Reindel LLP is trusted by market-leading financial institutions, companies and their boards to manage significant litigation, regulatory matters and transactions. The firm is based in New York with offices in London and Washington, D.C.
In this client alert, Frank J. Weigand and Louis Capizzi examine considerations that are becoming more critical for broker-dealers and investment advisers as the use of AI continues to expand.

I. Introduction

Over the past several years, broker-dealers and investment advisers have begun incorporating artificial intelligence ("AI") tools into an expanding range of regulated and operational workflows, including customer communications, trading strategies, investment research, compliance monitoring, and internal administrative processes.1 While the adoption of these technologies continues to accelerate, U.S. securities regulators have consistently emphasized that existing legal and regulatory obligations remain fully applicable to this new technology.2

The SEC, FINRA, and other financial regulators have repeatedly underscored that their rules are technology neutral, meaning that the same supervisory, recordkeeping, disclosure, and investor-protection requirements apply whether a task is performed by a human employee, traditional software, or an AI system.3 This principle has important practical implications. For example, FINRA has reminded member firms that AI tools used in business processes must be subject to supervisory systems and governance structures in the same manner as other technologies used in regulated workflows.4 The consequence of this is straightforward: when firms deploy AI in supervised workflows, regulators expect supervisory systems to account for the integrity, reliability, and accuracy of the models, the sufficiency of policies and procedures surrounding those models, and the protection of client information.5

Against this backdrop, the challenge for regulated firms is not implementing new regulatory obligations, but translating existing requirements into governance frameworks appropriate for AI-enabled systems. In practice, this means reasonably adapting long-standing compliance concepts—such as supervision, model validation, recordkeeping, privacy controls, and vendor oversight—to technologies that function quite differently from legacy systems.

This memorandum examines issues that may become increasingly important for broker-dealers and investment advisers as AI adoption continues to expand, and considers how traditional model-validation concepts may need to evolve when applied to generative AI systems that produce nondeterministic outputs.

II. Why Traditional Validation Concepts Strain under Generative AI

Traditional model validation frameworks in financial services were developed in the context of deterministic systems—models whose outputs can be evaluated against stable benchmarks using repeatable inputs.6 In many established risk and quantitative modeling environments, validation methodologies assume that identical inputs should yield consistent outputs. This assumption underlies widely used validation techniques such as backtesting, benchmarking, sensitivity analysis, and rule-based testing.7 The core assumptions are a bounded input space and a bounded output space with a repeatable true answer. Validation under these conditions is well understood: compare outputs to expectations, test at defined intervals, and document results. Large language models ("LLMs"), however, invert these assumptions. They generate output probabilistically, an identical prompt executed at different times can yield different outputs,8 and there is often no single correct answer against which to test. Conventional backtesting and challenger-model comparisons become unreliable under such circumstances.9 Validating a model whose outputs are expected to vary does not align with existing validation playbooks.
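The shift described above can be illustrated with a short sketch. The functions, acceptance rules, and thresholds below are purely illustrative assumptions, not drawn from any regulatory guidance: a deterministic model can be validated with a single exact-match assertion, whereas a nondeterministic system must be sampled repeatedly and its outputs scored against rule-based acceptance criteria, with a pass *rate* compared to a tolerance.

```python
import random

def deterministic_model(x: float) -> float:
    """A legacy-style model: identical inputs always yield identical outputs."""
    return round(x * 1.05, 2)

def generative_model(prompt: str) -> str:
    """Stand-in for an LLM: the same prompt can yield different outputs.
    (Purely illustrative -- real systems vary via sampling temperature.)"""
    templates = [
        "Our records indicate a balance of $100.",
        "Your current balance is $100.",
        "Balance: $100 (as of today).",
    ]
    return random.choice(templates)

# Legacy validation: one exact-match check suffices for a repeatable answer.
assert deterministic_model(100.0) == 105.0

# Probabilistic validation: no single "correct" string exists, so we sample
# repeatedly, score each output against rule-based checks, and compare the
# pass RATE to a tolerance threshold rather than asserting exact equality.
def passes_checks(output: str) -> bool:
    # Hypothetical acceptance rule: the required fact appears, and no
    # prohibited language (e.g., a performance guarantee) is present.
    return "$100" in output and "guarantee" not in output.lower()

N = 200
pass_rate = sum(passes_checks(generative_model("balance inquiry")) for _ in range(N)) / N
print(f"pass rate over {N} samples: {pass_rate:.2%}")
assert pass_rate >= 0.95, "model fails validation threshold"
```

The design point is that the unit of validation moves from a single output to a distribution of outputs, which is why sampling frameworks built for stable results translate poorly.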

From a compliance and supervisory perspective, this distinction can create challenges that extend beyond technical LLM performance. Where firms deploy generative AI tools within regulated workflows—such as compliance monitoring, client communications, or investment research—variability in model outputs can complicate the ability of validation teams, internal audit, and regulators to apply conventional testing methodologies. Scripts, checklists, and sampling frameworks that assume stable outputs may prove less effective when language-based systems produce a broad range of open-ended responses rather than deterministic calculations.

Regulators have increasingly highlighted these governance challenges. For example, FINRA's 2026 Annual Regulatory Oversight Report identifies a range of governance risks associated with AI deployment, including AI agents operating without human validation or approval, agents acting beyond the user's intended scope, difficulty auditing or explaining automated decisions, the storage or exposure of sensitive data, and misaligned reward functions that may produce unintended outcomes.10 The report also emphasizes well-known model risks such as bias, hallucinations, and related reliability concerns.11 In this environment, the central supervisory concern may not be the probabilistic nature of AI systems themselves, but whether firms can demonstrate that appropriate controls exist around how these tools are used. One practical implication is the growing importance of governance mechanisms designed to manage known limitations of generative models. For example, LLMs may produce so-called "hallucinations," generating information that appears plausible but is inaccurate or unsupported by source material.12 FINRA has highlighted the risk that such outputs could affect compliance reviews, client communications, or supervisory processes if not properly monitored.13

Moreover, there is a human-factors dimension that compliance professionals should take seriously. Automation bias—the tendency for humans to accept machine-generated outputs uncritically, especially when they appear authoritative or technically sophisticated—can exacerbate these risks.14 If an associated person accepts an AI-drafted communication without meaningful review, or a compliance analyst relies on an AI-generated surveillance summary without independent verification, a human checkpoint may become a formality rather than a control.

In sum, the validation problem presents as a governance problem. Legacy testing techniques—scripted tests, checklists tied to stable outputs, narrow sampling—may be ill-suited to the era of LLMs.

III. Emerging Validation Concepts

Although traditional validation methods may not translate well to probabilistic, language-based systems, the answer is not to abandon validation but to rethink how it can be achieved. To address these gaps, new validation approaches have begun to emerge, ranging from basic guidance on responsible supervision of AI systems to more detailed governance frameworks. This section details some of these concepts.

A. Active Monitoring and Human-In-the-Loop Oversight

One fundamental aspect of emerging AI governance frameworks is an emphasis on human-in-the-loop oversight and controls designed to ensure that AI-generated information is independently reviewed before it is relied upon in regulated workflows.15 In the broker-dealer context, human-in-the-loop oversight generally refers to supervisory structures that require a qualified person to review, validate, or approve AI-assisted outputs before those outputs are used in customer communications,16 compliance processes and conclusions, trading decisions, or other regulated activities. This approach is consistent with the long-standing supervisory requirements well understood in the financial services industry.17

While regulators have not mandated any particular governance structure, FINRA and the SEC's technology-neutral approach to regulation suggests that the key question for firms will be whether their supervisory systems clearly allocate responsibility for reviewing and approving outputs generated by AI-enabled processes.18 FINRA Rule 3110 requires broker-dealers to maintain a supervisory system reasonably designed to achieve compliance with applicable securities laws and FINRA rules, and that obligation applies regardless of whether a task is performed by a human employee or an automated system. As such, one supervisory approach would be to treat AI systems as participants within the supervisory chain. Under this model, a designated supervisory principal or other responsible person retains accountability for reviewing or approving AI-generated outputs used in regulated activities, analogous to the oversight applied to human employees.

In practice, human-in-the-loop oversight often operates through monitoring and validation controls embedded in a firm's AI governance framework. These controls may include ongoing monitoring of prompts, responses, and outputs to confirm that a generative AI system continues to perform as expected and produce compliant results, as well as validation and human review of model outputs, including periodic checks for errors or bias.

B. Limiting Model Drift

A related challenge for firms deploying generative AI systems is the phenomenon commonly described as model drift—the gradual deterioration or alteration of model behavior over time as inputs, data environments, or usage patterns evolve. Traditional financial models are also subject to drift, typically arising from changes in market conditions or shifts in the statistical relationships underlying model assumptions. However, generative AI systems introduce additional dimensions of drift because their outputs are shaped not only by training data, but also by prompt design, system instructions, and evolving user behavior.

Moreover, generative AI tools are often deployed within shifting software ecosystems that include retrieval systems, document repositories, application programming interfaces, and third-party model providers. Changes to any component of this ecosystem, or the addition of new data sources, may alter system behavior. In some cases, drift may arise not from changes in the model itself but from how users interact with the system, as employees experiment with new prompts or apply the tool to use cases beyond its original design.

Regulators have begun to emphasize the importance of monitoring these dynamics as part of firms' broader AI governance programs.19 FINRA has noted that firms adopting AI technologies should incorporate life-cycle testing and monitoring into their supervisory frameworks, including maintaining inventories of AI models and evaluating their reliability and accuracy over time.20 The use of AI in compliance, surveillance, or trading processes may therefore require ongoing review to ensure that model outputs remain consistent with the firm's policies, procedures, and risk appetite.


The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.

