Putting AI To The Test: How 5 Tools Performed Across 3 Financial Models

Ankura Consulting Group, LLC is an independent global expert services and advisory firm that delivers services and end-to-end solutions to help clients at critical inflection points related to conflict, crisis, performance, risk, strategy, and transformation. Ankura has more than 2,000 professionals serving 3,000+ clients across 55 countries. Collaborative lateral thinking, hard-earned experience, and multidisciplinary capabilities drive results, and Ankura is unrivalled in its ability to assist clients to Protect, Create, and Recover Value™. For more information, please visit ankura.com.


United States Technology

Article Insights

Ryan Muroff’s articles from Ankura Consulting Group LLC are most popular:

within Technology topic(s)
in United States
with readers working within the Healthcare and Securities & Investment industries

Every finance team is being asked the same question: Should we be using AI to build our models? The Ankura OCFO® team ran identical prompts through Claude for Excel, M365 Copilot (Agent Mode), ChatGPT in Excel, Shortcut (Free Tier – Open Source Model), and Shortcut (Pro Tier) against a fictional company (Fieldstone Retail Holdings, approximately $180 million in revenue, three entities).

The three test scenarios (a Long-Range Plan, a 13-Week Cash Flow Forecast, and a Customer Profitability Analysis) were selected because they mirror common financial modeling exercises in PE-backed finance and operations environments. AI was used to generate the fictional company data and a prompt of more than 500 words for each scenario.

The goal was to establish a baseline comparison of AI financial modeling tools available on the market. What did we find? All five tools produced structurally complete models in 7-10 minutes, but none produced a production-ready model. Every tool either got fundamental formula mechanics wrong or delivered an incomplete, inconsistent structure. The value today is in scaffolding and structure; the formulas still require meaningful human rework. Critically, each prompt was delivered as a single pass with no iteration or follow-up, and identical prompts were copied across all five tools to isolate baseline capability. If a tool asked a follow-up question, the default was to accept its recommendation. With iterative prompting using currently available features, results would likely improve.
