Thought Leadership
Oct 8, 2025

Fraud Is the Only Honest Benchmark for Financial AI

Authored by

Nicole Hemsoth Prickett

State Street’s Anusha Nerella defines financial AI as a systems race of real-time data fusion, sub-millisecond inference, immutable audit trails, and adaptive agent training.

Never mind portfolio optimization or sentiment analysis: if you ask Nerella, fraud is the real proving ground for AI in finance.

This is where, she says, the adversary learns fastest. “Fraudsters were weeks ahead of the humans,” she said, describing how financial systems fell behind in some key lapses in recent years.

At any high-frequency trading firm, latency is everything. Microseconds separate a valid transaction from an exploit. Nerella told us at the AI Infra Summit that the firm’s infrastructure processes hundreds of thousands of orders per second across distributed systems tuned for sub-millisecond settlement. When a data pipeline stalls, or when a fraud detection rule adds just 10 milliseconds of inference delay, the system loses equilibrium. 

That’s the narrow operating window where AI now has to live. “We are in the implementation phase, and we are production ready,” she said. “It’s not about any buzz anymore.”
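One way to live inside that window, sketched below as a general pattern rather than State Street’s actual architecture (the deadline figure and function names are assumptions), is to keep fraud scoring off the critical path and enforce a hard deadline on it, so a slow model can never add tail latency to the order flow.

```python
import concurrent.futures

SCORING_DEADLINE_MS = 5  # assumed budget; real venues tune this far tighter

def score_transaction(txn):
    # Stand-in for a real fraud model; returns a risk score in [0, 1].
    return 0.02

def gated_decision(txn, executor):
    """Never let the fraud check stall the order path past its deadline."""
    future = executor.submit(score_transaction, txn)
    try:
        risk = future.result(timeout=SCORING_DEADLINE_MS / 1000)
    except concurrent.futures.TimeoutError:
        # Fail open (or closed, per policy) rather than adding tail latency.
        return "approve_pending_review"
    return "decline" if risk > 0.9 else "approve"

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    print(gated_decision({"amount": 125.0}, pool))
```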

Nerella points to the 2020 Robinhood outage, which she says showed what happens when systems optimized for throughput fail to model failure itself. 

That day, markets opened to record volatility, with retail order volumes surging past 20 million in a single morning. Robinhood’s backend, which she describes as built on multiple microservices querying a single account aggregation system, saturated its own API endpoints. Event queues filled faster than they could clear. The database’s time synchronization drifted under load, corrupting session tokens and locking out users. 

“If AI was there and proper governing controls was implemented don’t you think that could have been eradicated or avoided prior to meeting this kind of scenario?” Nerella asked.

What she is describing amounts to an AI-driven control plane: an agent trained on historical telemetry that can recognize the precursors to overload, dynamically rebalance traffic, or trip fail-safes before a cascading collapse. The same logic applies to fraud, she says: recognition, anticipation, and preemption.
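A minimal version of that idea might look like the sketch below: a guard that watches queue-depth telemetry and acts on the trend, not just the hard limit. The thresholds and actions here are illustrative assumptions, not a description of Robinhood’s or State Street’s systems.

```python
from collections import deque

class OverloadGuard:
    """Trip a fail-safe when telemetry trends toward saturation."""

    def __init__(self, depth_limit=50_000, growth_limit=0.2, window=5):
        self.depth_limit = depth_limit    # hard ceiling on queue depth
        self.growth_limit = growth_limit  # max tolerated growth across the window
        self.history = deque(maxlen=window)

    def observe(self, queue_depth):
        self.history.append(queue_depth)
        if queue_depth > self.depth_limit:
            return "shed_load"            # limit already breached
        if len(self.history) == self.history.maxlen:
            growth = (self.history[-1] - self.history[0]) / max(self.history[0], 1)
            if growth > self.growth_limit:
                return "rebalance"        # precursor: queue growing too fast
        return "ok"

guard = OverloadGuard()
for depth in [10_000, 12_500, 16_000, 21_000, 28_000]:
    print(guard.observe(depth))           # flags "rebalance" before the ceiling is hit
```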

Nerella also cites India’s Unified Payments Interface (UPI) as a more direct case. By 2020, she says, UPI was clearing over 1.3 billion transactions a month, often exceeding 45 million a day. Each transfer involves multiple message hops (originating bank, clearing switch, destination bank) all required to validate in under two seconds. Fraud actors exploited those windows via scripts that could spoof user confirmations, replay requests across gateways, and inject transactions at the instant before reconciliation. The fraud volume spiked 30–40% above baseline. 

“Fraudsters were weeks ahead of us in doing all sort of fraudulent invocations,” Nerella said.
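The replay piece of that attack surface is the most mechanical to illustrate. One common countermeasure, sketched here as an assumption rather than a description of UPI’s actual switch logic, is to treat each confirmation as an idempotency key and reject anything seen twice inside the reconciliation window.

```python
import time

class ReplayGuard:
    """Reject duplicate transfer confirmations inside a reconciliation window."""

    def __init__(self, window_seconds=2.0):
        self.window = window_seconds
        self.seen = {}  # idempotency key -> first-seen timestamp

    def admit(self, key, now=None):
        now = time.time() if now is None else now
        # Evict entries older than the window to bound memory.
        self.seen = {k: t for k, t in self.seen.items() if now - t < self.window}
        if key in self.seen:
            return False         # replayed confirmation
        self.seen[key] = now
        return True

guard = ReplayGuard()
print(guard.admit("txn-123:hop-2", now=0.0))  # True: first sight
print(guard.admit("txn-123:hop-2", now=0.5))  # False: replay inside the 2-second window
```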

Fraud networks iterate faster than regulated systems can approve updates. Attackers retrain their scripts within hours. Banks deploy new detection rules in weeks. 

Nerella’s prescription is to invert that. “Training agents with specific fraud detection models and giving specific use cases, training them to detect in a sample stage environment will make it more effective.” Her view is that detection must evolve like offense, trained continuously, validated under live loads, and deployed in modular increments without full system restarts.
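One hedged sketch of what “validated under live loads, deployed in modular increments” can mean in practice: run the candidate detector in shadow mode next to the incumbent, compare their calls on real traffic, and promote only once the disagreement rate is understood. The model stand-ins and thresholds below are assumptions.

```python
def shadow_compare(transactions, incumbent, candidate, threshold=0.9):
    """Score live traffic with both models; only the incumbent's call is enforced."""
    disagreements = 0
    for txn in transactions:
        live_call = incumbent(txn) > threshold    # this decision is acted on
        shadow_call = candidate(txn) > threshold  # this one is only logged
        if live_call != shadow_call:
            disagreements += 1
    return disagreements / max(len(transactions), 1)

# Toy stand-ins for real fraud models.
incumbent = lambda txn: 0.95 if txn["amount"] > 5_000 else 0.1
candidate = lambda txn: 0.95 if txn["amount"] > 4_000 else 0.1

traffic = [{"amount": a} for a in (100, 4_500, 6_000, 12_000)]
print(shadow_compare(traffic, incumbent, candidate))  # 0.25: they disagree on one txn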

Nerella also argues that fraud is a data unification challenge disguised as a security one. 

A single transaction can touch six or more systems (card network, merchant gateway, issuing bank, acquirer, settlement processor, and analytics pipeline), each logging in different schemas and time zones.

“It's all about what kind of unified data that we are feeding into the system. It's not about just data is being fed, and there is a missing of reference data that has to make it meaningful.”

To make AI viable here, data has to be consolidated into a time-aware ledger with transaction metadata, behavioral context, risk scoring, and historical reference all mapped within microseconds.
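In code terms, that unification step is essentially an event-time join keyed on transaction id. The sketch below uses toy, assumed schemas; no real institution’s ledger looks quite like this.

```python
from collections import defaultdict

def unify(events):
    """Fold per-system events into one time-ordered view per transaction."""
    ledger = defaultdict(list)
    for e in events:
        ledger[e["txn_id"]].append(e)
    for txn_id in ledger:
        # Order by event time, not arrival time, so cross-system skew stays visible.
        ledger[txn_id].sort(key=lambda e: e["ts_us"])
    return ledger

events = [
    {"txn_id": "T1", "source": "issuer",  "ts_us": 1_000_120, "payload": {"auth": "ok"}},
    {"txn_id": "T1", "source": "gateway", "ts_us": 1_000_020, "payload": {"amount": 42.0}},
    {"txn_id": "T1", "source": "risk",    "ts_us": 1_000_450, "payload": {"score": 0.07}},
]
for e in unify(events)["T1"]:
    print(e["ts_us"], e["source"])
```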

Then comes compliance, the other constraint most discussions treat as bureaucratic. In practice, it’s an engineering problem. Every AI-driven decision, every “decline,” “review,” or “approve” action, has to be mirrored to an immutable audit log.
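One common way to make that log tamper-evident (a generic sketch, not a claim about how State Street implements it) is to hash-chain each decision record to its predecessor, so any after-the-fact edit breaks the chain.

```python
import hashlib, json

class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self.entries = []
        self.last_hash = "0" * 64  # genesis value

    def record(self, decision):
        body = json.dumps({"decision": decision, "prev": self.last_hash}, sort_keys=True)
        digest = hashlib.sha256(body.encode()).hexdigest()
        self.entries.append((body, digest))
        self.last_hash = digest
        return digest

    def verify(self):
        prev = "0" * 64
        for body, digest in self.entries:
            ok = json.loads(body)["prev"] == prev
            ok = ok and hashlib.sha256(body.encode()).hexdigest() == digest
            if not ok:
                return False
            prev = digest
        return True

log = AuditLog()
log.record({"txn": "T1", "action": "decline", "model": "v3.2"})
log.record({"txn": "T2", "action": "review", "model": "v3.2"})
print(log.verify())  # True; altering any stored byte makes this False
```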

And explainability is another element that can’t be overlooked. Models deployed in fraud detection are often ensembles trained on behavioral features that shift daily: velocity of transactions, geolocation variance, device signatures, transaction clustering. Without interpretability, model drift goes unnoticed until entire systems misclassify legitimate users.

“If we don't explain to agents, agents will lack the clarity, and they will not perform up to the mark whatever we are expecting.”
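Catching that drift does not require exotic tooling. The population stability index is one standard, interpretable check on whether a feature’s distribution has moved since training; the bin count and 0.2 alert threshold below are conventional rules of thumb, not figures from Nerella.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population stability index between a baseline and a current feature sample."""
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]  # interior cut points
    e = np.bincount(np.digitize(expected, cuts), minlength=bins) / len(expected)
    a = np.bincount(np.digitize(actual, cuts), minlength=bins) / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)   # e.g. last month's transaction-velocity feature
today = rng.normal(0.4, 1.2, 10_000)      # the behavior has shifted
score = psi(baseline, today)
print(round(score, 3), "drift alert" if score > 0.2 else "stable")
```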

Nerella also argues that AI-driven defense has to be multi-site and multi-zone to match attack topology.

“Multi site operations like even though we are dealing with large corporations, I would say this has to be operated at multi sectors, multi sites, multi zones.” She envisions repeatable modules that can replicate across jurisdictions, sharing learned patterns without centralization bottlenecks.

“Scalability is all about, repeatable, modular systems.” It’s the same logic behind federated learning: agents trained locally on each bank’s sensitive data, sharing only model weights to improve collective defense without violating privacy laws.
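The mechanism behind that is federated averaging: each site trains on its own data and only parameter updates travel. A minimal sketch with toy, assumed weight vectors:

```python
import numpy as np

def federated_average(local_weights, sample_counts):
    """Combine per-site model weights, weighted by how much data each site trained on."""
    total = sum(sample_counts)
    return sum(w * (n / total) for w, n in zip(local_weights, sample_counts))

# Three banks train the same model shape locally; only these vectors leave the premises.
bank_a = np.array([0.12, -0.40, 0.85])
bank_b = np.array([0.10, -0.35, 0.90])
bank_c = np.array([0.20, -0.55, 0.70])

global_model = federated_average([bank_a, bank_b, bank_c], [50_000, 120_000, 30_000])
print(global_model)  # shared back to every site for the next training round
```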

Energy, too, is worth considering, Nerella says. Fraud detection at this scale is computationally brutal. A single global institution can run real-time inference on billions of transactions per day. Even at 100 operations per inference and 10 million inferences per second, that is a sustained load of a billion operations per second, and that figure is a floor: real detection ensembles cost far more per inference.
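The back-of-envelope arithmetic, using those deliberately conservative assumed figures:

```python
ops_per_inference = 100              # assumed, and very conservative for a real ensemble
inferences_per_second = 10_000_000
sustained_ops = ops_per_inference * inferences_per_second
print(f"{sustained_ops:.1e} ops/s")  # 1.0e+09: a gigaflop-scale floor, running nonstop
```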

“Datacenters are using 3% of the global electricity use,” she noted, “and when AI workloads are utilizing the energy cost extensively on top of what we are already using… 25 to 60% energy savings are with neuromorphic processors.”

Nerella believes neuromorphic hardware, modeled on biological neurons, could allow edge systems like ATMs, merchant devices, even POS terminals to make lightweight fraud decisions locally. Her analogy is a traffic signal.

 In centralized control, a single failed node creates a jam. But replace that with neuromorphic chips, and each signal processes context locally, communicating laterally to its neighbors. “It will instantly work locally with local decisioning. It's not a single center communicating to them,” she argues.

In financial networks, that translates to micro-agents at the edge that infer and flag anomalies before data ever reaches the datacenter, cutting both latency and power consumption.
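Whatever the hardware underneath, the software pattern is the familiar one: a tiny local model handles the obvious cases at the terminal and escalates only the ambiguous ones. The features, weights, and thresholds in this sketch are assumptions for illustration.

```python
def edge_score(txn, device_baseline):
    """Cheap local anomaly score; nothing neuromorphic assumed, just the edge pattern."""
    score = 0.0
    if txn["amount"] > 5 * device_baseline["median_amount"]:
        score += 0.5
    if txn["country"] != device_baseline["home_country"]:
        score += 0.3
    if txn["attempts_last_minute"] > 3:
        score += 0.4
    return score

def decide_locally(txn, baseline):
    score = edge_score(txn, baseline)
    if score >= 0.8:
        return "decline"    # handled at the terminal, nothing shipped upstream yet
    if score >= 0.4:
        return "escalate"   # only these reach the datacenter models
    return "approve"

baseline = {"median_amount": 35.0, "home_country": "IN"}
print(decide_locally({"amount": 400.0, "country": "RU", "attempts_last_minute": 5}, baseline))
```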

More broadly, across all these aspects of mitigating fraud, she says roughly 70% of institutions are deploying at least a single component into production, but adds that partial adoption won’t hold. Fraudsters don’t attack one node; they probe the weakest one.
