Chris Shayan × Backbase — Intelligence Layer

Source: LinkedIn post by chris-shayan, Head of AI at Backbase, May 2026. Filed under: cross-domain convergence evidence for the 2026 harness-centric consensus.

The claim

The "hard problems" of banking agents were never language problems. LLMs are the linguistic engine but insufficient. Three walls:

  1. Signal vs noise: "salary just dropped 20% — is that a job loss or a career transition?"
  2. State & memory: "the agent recommended a product last Tuesday. The customer ignored it. Does it try again?"
  3. Consequence: "if the agent nudges a customer to move savings into a term deposit and rates drop the following week, was that good advice?"

Shayan's proposed architecture sitting around the LLM: Signal Catalogue, Digital Twin, Nudge Mesh.

Cross-reference to wiki

Every claim in the post maps to a page already in the wiki, built from dev-tooling, ML-platform, and SRE sources. Shayan restated the same architecture in banking vocabulary.

Meta-claim — "LLM is essential but insufficient; architecture around it is where production lives"

→ This is the central axiom of harness-engineering (ryan-lopopolo): "the LM is the subroutine, not the program", and of control-flow-vs-prompt-flow (dexter-horthy): "don't use prompts for control flow if you can use control flow for control flow." Shayan's "start with the architecture, plug LLMs into the right places" is a direct paraphrase. ^[raw/articles/chris-shayan-backbase-intelligence-layer-2026.md]

Wall 1 — Signal vs noise → Signal Catalogue

  • jagged-intelligence — Shayan's "plausible isn't the same as right" IS Karpathy's verifiability-bias argument: LLMs produce confident answers in exactly the domains where reward signals are weakest.
  • control-flow-vs-prompt-flow — Signal Catalogue ≡ Horthy's pattern: classifier prompt + deterministic if/else + small focused branch prompts. Don't let the LLM decide whether it's a job loss or a career transition; let structured signals decide, then route.
  • do-not-outsource-thinking — the judgement about ambiguous life events is the thinking; the LLM handles the wording.
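Horthy's pattern as applied here can be sketched in a few lines. This is a hypothetical illustration, not Backbase's implementation: the signal names (`salary_drop_pct`, `new_employer_deposit`, `severance_payment`) and the 15% threshold are invented for the example. The point is structural: structured signals and deterministic `if/else` decide the branch; the LLM only receives a small, focused prompt for wording.

```python
from dataclasses import dataclass
from enum import Enum, auto


class LifeEvent(Enum):
    JOB_LOSS = auto()
    CAREER_TRANSITION = auto()
    UNKNOWN = auto()


@dataclass
class Signals:
    """Structured signals drawn from the catalogue, not free text."""
    salary_drop_pct: float
    new_employer_deposit: bool   # payroll from a different employer appeared
    severance_payment: bool


def classify_event(s: Signals) -> LifeEvent:
    """Deterministic routing: the signals decide, not the LLM."""
    if s.salary_drop_pct < 15:           # assumed noise threshold
        return LifeEvent.UNKNOWN
    if s.new_employer_deposit:
        return LifeEvent.CAREER_TRANSITION
    if s.severance_payment:
        return LifeEvent.JOB_LOSS
    return LifeEvent.UNKNOWN             # ambiguous: observe, or escalate


def route(s: Signals) -> str:
    """Each branch gets a small focused prompt; the LLM only words it."""
    branch_prompts = {
        LifeEvent.JOB_LOSS: "empathetic budgeting check-in",
        LifeEvent.CAREER_TRANSITION: "congratulate, review direct-deposit setup",
        LifeEvent.UNKNOWN: "no nudge; keep observing",
    }
    return branch_prompts[classify_event(s)]
```

Note the UNKNOWN branch: "plausible isn't the same as right" means ambiguity routes to silence or a human, never to the most confident-sounding completion.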

Wall 2 — State & memory → Digital Twin

  • learning-agent-loop (erin-ahmed, Cleric) is the closest match, almost verbatim: "Most agents are only ever able to [act]… the missing piece is operational memory. That's what allows you to complete the loop." Shayan's Digital Twin = Ahmed's operational memory at customer scope.
  • correction-must-persist-compound-visible — Ahmed's second lesson: the state has to persist, accumulate, and be inspectable. Maps to the Digital Twin being "persistent, evolving."
  • ambient-vs-directed-learning — customer-ignored-the-Tuesday-nudge is ambient signal; explicit "not interested" would be directed. Both must update the twin.
  • durable-observable-debuggable-agents (niels-bantilan) — the substrate: without durability the twin silently corrupts on infra failures.
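A minimal sketch of what "operational memory at customer scope" implies, assuming an append-only event log (the event names and fields are illustrative, not Backbase's schema). It shows the three properties the wiki pages demand: persist (append-only record), compound (the Tuesday-nudge question is answerable from history), and visible (a human-readable dump), with ambient and directed updates labelled apart.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class DigitalTwin:
    """Per-customer operational memory: persists, accumulates, inspectable."""
    customer_id: str
    events: list = field(default_factory=list)

    def record(self, kind: str, detail: dict, directed: bool) -> None:
        """Append-only write-through. Ambient signals (an ignored nudge) and
        directed ones (an explicit 'not interested') both update the twin,
        labelled so downstream policy can weight them differently."""
        self.events.append({
            "ts": time.time(),
            "kind": kind,
            "directed": directed,
            "detail": detail,
        })

    def ignored(self, product: str) -> bool:
        """Did the customer already let a nudge for `product` lapse?"""
        return any(
            e["kind"] == "nudge_ignored" and e["detail"].get("product") == product
            for e in self.events
        )

    def dump(self) -> str:
        """Inspectable by humans and auditors, not a hidden embedding."""
        return json.dumps(self.events, indent=2)
```

The `ignored()` check is exactly the "does it try again?" question from Wall 2: the answer comes from accumulated state, not from re-prompting the model.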

Wall 3 — Consequence → Nudge Mesh

  • silent-failure-dropoff (maggie-konstanty / Prosus) is the most direct match — Prosus already solved Shayan's exact evaluation problem in food ordering: "we match conversation traces with conversion … which evaluator outcome ended up in conversion." Swap "conversion" for "risk-adjusted portfolio outcome" and you have Shayan's term-deposit question. Silent customer abandonment IS the modal unhappy signal.
  • pipeline-as-verifier — verification is not "did the LLM sound right" but "did the output match its spec, and does provenance check out" — critical in a regulated domain.
  • tokens-need-critique-loop (mikhail-parakhin) — restates Shayan's "plausible ≠ right": a critic agent is the minimum consequence check.
  • levels-of-autonomy-shapiro — Nudge Mesh's "when to stay silent" is governance: at what autonomy level is the agent allowed to push an unprompted nudge? L3 review vs L5 ship-and-tell is the exact decision.
  • driving-into-mud — unsupervised agents compound near-right pieces into jointly-wrong outcomes; an unchallenged nudge stream against a drifting customer model is the banking instance.
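The Prosus pattern transfers as a join between nudge traces and later realised outcomes; a hedged sketch, with invented field names and the "risk-adjusted portfolio outcome" reduced to an opaque outcome string. The key design choice is that a missing outcome is not dropped: it is recorded as `silent_dropoff`, because silent abandonment is the modal unhappy signal and must be countable.

```python
from dataclasses import dataclass


@dataclass
class NudgeTrace:
    trace_id: str
    customer_id: str
    nudge: str  # e.g. "move savings into a term deposit"


def join_outcomes(traces: list, outcomes: dict) -> list:
    """Match each nudge trace with its later outcome (the banking analogue
    of Prosus matching conversation traces with conversion). `outcomes`
    maps trace_id -> realised, risk-adjusted outcome; a missing entry is
    silent drop-off and is kept as an explicit row, never discarded."""
    joined = []
    for t in traces:
        outcome = outcomes.get(t.trace_id)
        joined.append({
            "trace_id": t.trace_id,
            "nudge": t.nudge,
            "outcome": outcome if outcome is not None else "silent_dropoff",
        })
    return joined
```

Evaluation then runs over the joined rows, which is how the term-deposit question ("rates dropped the following week — was that good advice?") becomes measurable rather than vibes.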

Prediction — "18 months, mostly the hard way"

Aligns with agent-orchestration-2026: 2026 is the year orchestration goes from novelty to core infrastructure. Teams that invert the stack (LLM-first, bolt on memory later) are already stuck in the upfront-investment-paradox.

Mapping table

| Shayan (Backbase, banking) | Wiki concept | Original source |
| --- | --- | --- |
| Signal Catalogue | control-flow-vs-prompt-flow + jagged-intelligence | Horthy, Karpathy |
| Digital Twin | learning-agent-loop + correction-must-persist-compound-visible + durable-observable-debuggable-agents | Ahmed, Bantilan |
| Nudge Mesh | silent-failure-dropoff + levels-of-autonomy-shapiro + tokens-need-critique-loop + pipeline-as-verifier | Konstanty, Shapiro, Parakhin, Sanchez |
| "LLM is insufficient; architecture is everything" | harness-engineering | Lopopolo |
| "18 months, the hard way" | agent-orchestration-2026 | Lloyd, Bantilan |

Why this filing matters

Until now, the wiki's harness-centric thesis has been sourced from consumer dev-tooling (Cursor, Warp, Zed), ML-platform (Union, OpenAI Symphony), and SRE (Cleric, Flyte). Shayan is the first regulated-industry practitioner to converge on the same architecture independently. Banking has harder constraints than any of those:

  • Consequence is legally binding — a wrong nudge toward a term deposit has audit-trail and potentially mis-selling implications.
  • Silent drop-off is the default feedback — customers don't downvote a banking app, they just stop using it (Konstanty's thesis is even more load-bearing here).
  • Compliance gates every autonomy-level jump — Shapiro's ladder has to be traversed with regulators in the loop.

That a banking head-of-AI independently proposes Signal Catalogue / Digital Twin / Nudge Mesh and a software-factory engineer independently proposes Harness / Operational Memory / Verifier pipelines is cross-domain evidence the architecture isn't a dev-tooling fad — it's the shape the problem actually has.

Open questions / tensions

  • Shayan's framing is LLM-as-component; Lopopolo's is stronger — Lopopolo treats the agent as the full software engineer (harness-engineering axiom 3). Banking's regulatory posture may never allow that ceiling, which is an interesting constraint to name: the harness thesis has a regulation-capped variant where L5/L6 is structurally inaccessible and L4 is the permanent resting state.
  • Digital Twin vs RAG. Shayan doesn't distinguish; learning-agent-loop does — the twin must accumulate (persist, compound, visible), not merely retrieve. Worth probing Backbase on whether their twin is write-through or read-only-from-data-warehouse.
  • Nudge Mesh silence policy. Not yet named in wiki; could be a new concept page — "silence as an action" — with Shayan as a founding source alongside driving-into-mud's "disable self-verification when it can't verify."
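If "silence as an action" does become a concept page, the core idea is small enough to sketch. This is speculative (the concept is not yet in the wiki), and the confidence bar of 0.8 and the L5 gate are assumptions: silence is an explicit, logged decision rather than an absence, so evaluators can score withheld nudges alongside shipped ones.

```python
from enum import Enum


class Action(Enum):
    NUDGE = "nudge"
    SILENCE = "silence"  # an explicit, audited decision, not an absence


def decide(confidence: float, autonomy_level: int, audit_log: list) -> Action:
    """Silence-as-an-action: below the ship-and-tell autonomy level (L5 on
    Shapiro's ladder) or below an assumed confidence bar, the mesh chooses
    SILENCE, and that choice is logged like any other action."""
    ship = autonomy_level >= 5 and confidence >= 0.8
    action = Action.NUDGE if ship else Action.SILENCE
    audit_log.append(action.value)
    return action
```

Logging the silence is what connects this to driving-into-mud's "disable self-verification when it can't verify": the system knows, and records, when it chose not to act.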

Cross-references