

EPIC AI Hub

A research and engineering blog on agentic AI, systematic trading, and market microstructure.

About

EPIC AI Hub is a research and writing project documenting nine years of work on machine learning trading systems. The focus is on the engineering reality of building autonomous trading architectures: how these systems actually work, where they fail, and how the field is evolving.

This site does not sell products, manage capital, or provide investment advice. It exists as a record of technical thinking and a place to publish work that might be useful to others working on similar problems.

The Last of us

What this work is about:

  • Architecture of agentic AI systems for trading: how autonomous agents perceive market state, coordinate decisions, and execute in live conditions
  • Market microstructure and order flow: the patterns and dynamics that repeatedly arise, and what they reveal about participant behavior
  • Engineering practice for systematic trading: the unglamorous reality of building these systems
  • History and evolution of quantitative methods: how the field has developed and where it appears to be heading

Blog Posts

Resources

From Single-Instrument Prototype to Multi-Asset Agentic Architecture

November 20, 2025 | May 18, 2026 | Brendan Lett | About, Article

The technical evolution of a long-running research project

The work that became EPIC AI began in 2016 with a deliberately brutal testbed: crude oil futures. Few instruments combine extreme volatility, discontinuous liquidity, and microstructure complexity in the same way crude does. Crude oil was never the end goal — it was the engineering filter. The premise was that an architecture which had to survive in that environment would be more honestly stress-tested than one developed against easier instruments.

What follows is a technical timeline of how the project evolved over nine years, from a single-instrument supervised learning prototype into a multi-component agentic architecture. It is written for engineers and researchers working on similar problems, not as a product description. The architecture described here is an internal research project. It is not licensed, sold, or made available to outside parties, and this article does not constitute an offer to do so.

Phase 1 — Supervised learning on crude oil (2016–2019)

The earliest version of the system was narrow by design. It consisted of a single instrument, supervised deep neural networks trained on tick-level data, and an early library of geometric and probabilistic chart pattern models. The first implementations of incremental position scaling and re-pegging logic emerged in this phase, mostly as the byproduct of trying to handle the specific ways discontinuities in crude oil futures punish naive approaches.

The lesson from this period was that single-instrument mastery is necessary but insufficient. The architecture functioned, but it had been shaped against one specific market’s pathologies. Whether the underlying approach would generalize was an open question.

Phase 2 — Componentization and reinforcement learning (2020–2024)

The second phase was structural. Instead of a monolithic system, the architecture decomposed into specialized components: market state perception, order flow analysis, risk evaluation, execution, and self-diagnosis. Each component could be developed, tested, and replaced independently. This made the system harder to reason about as a whole but allowed faster iteration on any individual area.

The other significant shift in this phase was the introduction of reinforcement learning loops that operated intraday rather than overnight. Overnight retraining cycles are common in the literature, but they encode an assumption: that today’s market is similar enough to yesterday’s that adjustments learned from yesterday’s data will still be useful today. In practice, that assumption breaks down precisely when it matters most. Moving to intraday adaptation was an attempt to let the system respond to regime shifts as they happened rather than after the fact.
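The contrast between overnight and intraday adaptation can be pictured with a much simpler quantity than a reinforcement learning policy. The sketch below is only an illustration under stated assumptions: an exponentially weighted variance estimate stands in for the adaptive loop (the article does not describe the actual learning rule), and `OnlineVolatility`, `overnight_variance`, and the `alpha` value are hypothetical names and settings chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class OnlineVolatility:
    """Exponentially weighted estimate of squared returns, updated per observation.

    Illustrative stand-in for an intraday adaptation loop; `alpha` is a
    hypothetical smoothing parameter, not a value from the article.
    """
    alpha: float = 0.1
    variance: float = 0.0

    def update(self, ret: float) -> float:
        # Blend the newest squared return into the running estimate.
        self.variance = (1.0 - self.alpha) * self.variance + self.alpha * ret * ret
        return self.variance


def overnight_variance(yesterday_returns: list) -> float:
    """Batch estimate computed once from the previous session, then frozen."""
    return sum(r * r for r in yesterday_returns) / len(yesterday_returns)


# Synthetic data: yesterday was calm; today starts calm, then turns volatile.
yesterday = [0.001, -0.001] * 50
today = [0.001, -0.001] * 25 + [0.01, -0.01] * 25

frozen = overnight_variance(yesterday)          # unchanged during today's session
online = OnlineVolatility(alpha=0.1, variance=frozen)
for r in today:
    online.update(r)

true_late_variance = 0.01 ** 2                  # variance of the volatile half
```

By the end of the synthetic session the online estimate has moved most of the way to the new regime, while the frozen overnight estimate still describes yesterday's market.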

Expansion beyond crude began in this phase with Nasdaq-100 futures. The selection criteria were deliberate: another high-volume, machine-driven market with different microstructure characteristics. The goal was to test whether the architecture had genuinely generalized or whether it had simply learned crude oil very well.

Phase 3 — Agentic autonomy (2024–2025)

The third phase was about removing human intervention from the optimization loop. The earlier phases retained human-in-the-loop steps for parameter tuning, regime classification, and certain decision points. Phase 3 work focused on replacing these with autonomous agents — software components with goal-directed behavior and continuous self-retraining on live microstructure data.

This required substantial infrastructure work: isolated per-instrument server clusters, containerized execution environments, and cross-asset correlation agents that could pass context between instrument-specific systems. The architectural principle was strict separation between perception (what is the market doing), reasoning (what should we do about it), and action (how do we execute). Mixing these concerns creates systems that are difficult to debug and dangerous to deploy.

Architectural principles that survived every phase

A few design commitments held throughout the evolution:

No reliance on static backtesting for live decision logic. Backtesting is useful for validating that an idea is not obviously broken, but the assumption that historical data predicts live performance is a load-bearing claim that often fails silently. The system was designed to learn from live data rather than to act on patterns extracted from historical data.

Hard separation between perception, reasoning, and action. This is a discipline that has to be maintained against constant pressure to short-circuit it. When perception is uncertain, the temptation is to let the reasoning layer “help” by making assumptions. When action is slow, the temptation is to let the reasoning layer cache decisions. Both temptations corrupt the architecture and have to be resisted.

Continuous intraday evolution using data that cannot be historically recreated. This is the deepest commitment of the project and the one most at odds with conventional ML practice. Most of the interesting signal in microstructure data is information that exists only in the live moment — the sequence of order arrivals, the specific patterns of cancellation and replacement, the latency fingerprints of automated participants. None of this can be recreated from historical snapshots. A system that depends on it has to be trained against live data, which creates its own engineering challenges around evaluation and validation.
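The separation between perception, reasoning, and action described above can be expressed as three narrow interfaces, each of which sees only the previous layer's output. This is a minimal sketch, not the project's code: every class, field, and method name here is hypothetical, and the toy implementations exist only to show the wiring and carry no trading meaning.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class MarketState:
    """Output of perception: a description of the market, never a decision."""
    regime: str
    absorption_confidence: float


@dataclass(frozen=True)
class Intent:
    """Output of reasoning: what to do, with no execution detail."""
    side: str   # "buy", "sell", or "hold"
    size: int


class Perception(Protocol):
    def observe(self) -> MarketState: ...


class Reasoning(Protocol):
    def decide(self, state: MarketState) -> Intent: ...


class Action(Protocol):
    def execute(self, intent: Intent) -> str: ...


def step(perception: Perception, reasoning: Reasoning, action: Action) -> str:
    """One pass through the pipeline; no layer reaches past its neighbour."""
    state = perception.observe()
    intent = reasoning.decide(state)
    return action.execute(intent)


# Toy implementations, purely to exercise the interfaces.
class FixedPerception:
    def observe(self) -> MarketState:
        return MarketState(regime="range_bound", absorption_confidence=0.8)


class ThresholdReasoning:
    def decide(self, state: MarketState) -> Intent:
        if state.absorption_confidence > 0.7:
            return Intent(side="buy", size=1)
        return Intent(side="hold", size=0)


class PaperAction:
    def execute(self, intent: Intent) -> str:
        return f"{intent.side}:{intent.size}"
```

Because `Reasoning.decide` receives only a `MarketState` and `Action.execute` receives only an `Intent`, the shortcuts the text warns about (reasoning filling in for perception, or caching on behalf of action) have no place to live in the signatures.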

What the architecture is and is not

The result of nine years of this work is a multi-component agentic system designed to perceive market microstructure, coordinate decisions across specialized agents, and adapt continuously to changing conditions. It is not a magic alpha generator or a finished product. It is an architecture — a way of organizing the engineering problem — designed to handle kinds of complexity that defeat simpler approaches.

The interesting open questions in this work are not about whether agentic architectures can be built to trade; the engineering problem is tractable. The interesting questions are about what fails when these systems encounter conditions they were not designed for, how to detect that failure quickly, and how to architect systems that degrade gracefully rather than catastrophically. Those are the topics that future writing on this site will focus on.

EPIC IDENT — Architecture for Real-Time Order Flow Perception

November 20, 2025 | May 18, 2026 | Brendan Lett | Article

One component of an agentic trading system, focused on the problem of perceiving market microstructure in live conditions

In a multi-component agentic trading architecture, perception is the foundational layer. Before any reasoning or action can happen, the system needs to maintain an accurate, high-resolution model of what the market is currently doing. EPIC IDENT is the component of the architecture responsible for this — specifically, for processing order flow and microstructure data into a representation that downstream agents can reason about.

This article describes the design of IDENT as a research component. It is written for engineers working on related problems and for readers interested in how this layer of an agentic trading architecture is structured.

IDENT is an internal research component. It is not licensed, sold, or made available to outside parties, and this article does not constitute an offer to do so.

The problem IDENT is designed to solve

Order books are not just static lists of resting orders. They are continuously changing surfaces shaped by the behavior of every participant in the market — humans, slow algorithms, fast algorithms, market makers, directional traders, and the various species of automated systems that have come to dominate modern markets. The information that matters is rarely in any single snapshot of the book; it’s in the patterns of how the book changes over time.

A system trying to perceive what is happening in a market has to extract several distinct kinds of information from this stream: where genuine liquidity is concentrated (as opposed to liquidity that will be pulled the moment it’s pressured), whether aggressive flow is being absorbed or is moving the market, what regime the participants are operating in, and whether the pattern of activity matches any of the recognizable behavioral signatures that recur in microstructure data.

These are different problems with different time horizons and different signal characteristics. IDENT is designed as a perception layer that addresses them in parallel and presents the results in a form that reasoning agents can use.

Architectural overview

IDENT operates as a context provider for the rest of the system. Reasoning agents query IDENT for the current state of the market; IDENT does not itself make trading decisions. This separation is important. Mixing perception and reasoning creates systems that are difficult to debug because incorrect actions can stem from either layer, and the diagnostic process becomes ambiguous. Keeping the layers separate means perception failures and reasoning failures can be diagnosed independently.

The component handles several distinct analytical functions:

Liquidity absorption analysis. When aggressive flow encounters resting liquidity, the behavior of that liquidity reveals information. Does it absorb the flow and replenish, suggesting genuine size? Does it pull immediately, suggesting iceberg detection or stop-running setups? Does the replenishment rate decay over the course of an absorption sequence, suggesting exhaustion? These patterns are recognizable but require continuous tracking across the order book to identify reliably.

Microstructure pattern recognition. Certain patterns recur across markets and time periods — not in their specific numerical form, but in their structural shape. The shape of a spoofing pattern, the shape of liquidity walk-up, the shape of a coordinated cancellation sequence. A pattern recognition layer that maintains a library of these structural patterns and matches against them in real time provides downstream agents with categorized context rather than raw data.

These are patterns the system is designed to recognize in order to inform downstream decisions about resting and aggressive participation in the market. The system does not generate or replicate these patterns.

Participant behavior fingerprinting. Different automated participants in a market have characteristic behavioral signatures — recognizable patterns of order placement, cancellation timing, response to specific market conditions. These signatures evolve as the participants update their systems, but the rate of evolution is slow enough that maintaining a dynamic library of recognized behavioral patterns is tractable. When the pattern of activity matches a known signature, that context is meaningful for downstream decisions.

Regime classification. Markets move through different regimes — trending, range-bound, volatile, compressed, liquid, illiquid — and the appropriate response to any given pattern depends on which regime the market is currently in. A perception layer that classifies the current regime and detects transitions between regimes provides essential context for reasoning agents.
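As an illustration of the liquidity absorption analysis listed above, the following sketch tracks how much of the consumed size is replenished at a level across a sequence of aggressive hits, and checks whether that replenishment ratio is decaying. It is a simplified assumption of how such tracking might look, not IDENT's method: `AbsorptionEvent`, the labels, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbsorptionEvent:
    consumed: float      # size taken by aggressive flow at the level
    replenished: float   # size added back at the level shortly afterwards


def replenishment_ratios(events):
    """Fraction of consumed size that came back, one value per event."""
    return [e.replenished / e.consumed for e in events if e.consumed > 0]


def trend_slope(values):
    """Ordinary least-squares slope of the values against their index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def classify_absorption(events, pull_threshold=0.2, decay_slope=-0.05):
    """Label a sequence as 'pulled', 'exhausting', 'absorbing', or 'no_data'.

    Both thresholds are placeholder values for the example.
    """
    ratios = replenishment_ratios(events)
    if not ratios:
        return "no_data"
    if ratios[0] < pull_threshold:
        return "pulled"        # liquidity vanished as soon as it was pressured
    if trend_slope(ratios) < decay_slope:
        return "exhausting"    # replenishment is fading over the sequence
    return "absorbing"         # liquidity keeps coming back


fading = [AbsorptionEvent(100, r) for r in (100, 80, 60, 40)]
steady = [AbsorptionEvent(100, 100) for _ in range(4)]
pulled = [AbsorptionEvent(100, 0), AbsorptionEvent(50, 0)]
```

A production version would need per-level state across the whole book and a time window for what counts as "shortly afterwards"; both are left out here to keep the idea visible.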

Integration with the broader architecture

IDENT outputs probabilistic assessments rather than binary signals. The output is not “buy” or “sell” or even “liquidity is being absorbed”; it’s something closer to “the pattern of activity in the last N seconds matches the absorption signature with confidence X, conditional on the current regime being Y.” This format is more useful for downstream reasoning because it preserves uncertainty and allows the reasoning layer to weight perception against other inputs rather than treating perception as ground truth.
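The output format described above maps naturally onto a small record type. The sketch below is one possible shape, with hypothetical field names; the linear blend at the end is a stand-in for however a reasoning layer might weight perception against other inputs, which the article does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    signature: str            # e.g. "absorption"
    confidence: float         # P(signature | recent activity, regime)
    regime: str               # regime the confidence is conditioned on
    regime_confidence: float  # P(regime)
    window_seconds: float     # the "last N seconds" the assessment covers

    def __post_init__(self):
        # Reject values that cannot be probabilities.
        for name in ("confidence", "regime_confidence"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    def joint_confidence(self) -> float:
        """P(signature and regime) = P(signature | regime) * P(regime)."""
        return self.confidence * self.regime_confidence


def blend(perception_p: float, other_p: float, perception_weight: float) -> float:
    """Weighted average of two probability estimates (a linear opinion pool)."""
    if not 0.0 <= perception_weight <= 1.0:
        raise ValueError("perception_weight must be in [0, 1]")
    return perception_weight * perception_p + (1.0 - perception_weight) * other_p


assessment = Assessment(
    signature="absorption",
    confidence=0.8,
    regime="range_bound",
    regime_confidence=0.5,
    window_seconds=5.0,
)
```

Keeping the regime and its own confidence in the record is what lets a downstream consumer discount an otherwise strong match when the regime call itself is shaky.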

The integration discipline is that reasoning agents are required to query perception fresh rather than caching previous outputs. This is more expensive computationally but it prevents a class of bugs where the reasoning layer acts on stale perception. In a system where market state can change meaningfully in milliseconds, stale perception is worse than no perception.
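One way to enforce the fresh-query discipline is to timestamp every perception snapshot and refuse to hand out a payload that has aged past a limit. This is a sketch under assumptions: the class names, the 5 ms default, and the injected `clock` parameter are all hypothetical choices made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable


class StalePerceptionError(RuntimeError):
    """Raised when a reasoning step tries to act on an expired snapshot."""


@dataclass(frozen=True)
class Snapshot:
    payload: dict
    taken_at: float   # clock reading in seconds when the snapshot was produced


class FreshPerception:
    def __init__(self, source: Callable[[], dict], max_age_s: float = 0.005,
                 clock: Callable[[], float] = time.monotonic):
        self._source = source        # callable that computes current market state
        self._max_age_s = max_age_s  # placeholder staleness limit
        self._clock = clock          # injectable so the guard can be tested

    def query(self) -> Snapshot:
        """Always recompute; nothing is cached between calls."""
        return Snapshot(payload=self._source(), taken_at=self._clock())

    def require_fresh(self, snapshot: Snapshot) -> dict:
        """Return the payload only if the snapshot is still within the age limit."""
        age = self._clock() - snapshot.taken_at
        if age > self._max_age_s:
            raise StalePerceptionError(f"snapshot is {age:.4f}s old")
        return snapshot.payload
```

Failing loudly on a stale snapshot matches the text's position that stale perception is worse than none: the reasoning step gets an error it must handle instead of a plausible-looking but outdated state.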

What this is not

A few clarifications about what IDENT is and is not, because the order flow analysis space includes both genuine technical work and a substantial amount of vendor marketing; this article is the former and should not be read as the latter.

IDENT is not a signal generator that predicts price movement. It is a perception layer that describes market state. What downstream agents do with that description is a separate problem solved by separate components.

IDENT does not depend on any particular data feed or vendor relationship. The architecture is designed around processing order flow data; the specific source of that data is an implementation detail that can vary across deployments.

IDENT is not a finished product. It is a component of a research architecture that has been developed and refined over nine years. The interesting questions about it are about its failure modes and limits — when does the pattern library miss patterns that turn out to matter, when do behavioral signatures shift faster than the system can adapt, when does regime classification lag behind actual regime change. These are the topics future writing will address.

Why perception matters more than people think

A common pattern in systematic trading is to focus enormous engineering effort on the reasoning and execution layers while treating perception as a solved problem — assuming that good data plus standard processing equals adequate market state representation. In practice, this is wrong. The hardest problems in this space are perception problems: figuring out what is actually happening in a market is much harder than figuring out what to do once you know.

The argument for investing in perception architecture is that good perception makes mediocre reasoning effective, while bad perception makes excellent reasoning useless. A reasoning layer acting on misperceived market state is going to make decisions that are wrong in ways that are difficult to diagnose, because the diagnostic process will keep coming back to “the reasoning seems sound, the action was correct given the perception, so why did the outcome fail?” The answer is often that perception was the failure point all along.

This is the conviction behind the IDENT design and the reason it gets disproportionate engineering attention within the broader research architecture this work is part of.

How Trading Is Evolving Toward Its Next Stage

June 4, 2025 | May 18, 2026 | Brendan Lett | Article

A note on the long arc of exchange, from biological mutualism to autonomous intelligence

The act of identifying a need and offering a solution through exchange is one of the older patterns in human life — older than writing, older than cities, arguably older than the species itself if you take a broad enough view of what exchange means. Long before spreadsheets and order books, living systems from plant root networks to ancient settlements operated through exchanges that propelled survival, adaptation, and growth. Trade is not an invention. It is an extension of a much older rhythm.

This piece is a brief look at that long arc — where exchange comes from, how it has accelerated over recorded history, and where the current trajectory of autonomous intelligence appears to be taking it.

The deep roots of exchange

Trade in the human sense is a recent expression of a much older pattern. Mutualistic relationships in ecosystems, fundamental forces in physics, reciprocity in human societies — these all follow the same structural shape. Entities survive and evolve by exchanging value with other entities. The specific currencies vary (nutrients, energy, information, goods, money, attention) but the underlying pattern is consistent across scales.

This is worth keeping in mind when thinking about markets specifically. Markets are not a human invention imposed on neutral material. They are a particularly intricate and recent expression of a pattern that runs through living systems generally. Understanding markets as one instance of a broader phenomenon is more useful than treating them as a peculiar feature of modern economies.

The historical arc

Human history is in many ways the story of trade. Several of the most consequential moments in the recorded past happened at trading crossroads or were driven by trade dynamics:

Around 3500 BCE, Sumerian writing emerged in part to record commercial transactions. The Code of Hammurabi formalized commercial law around 1750 BCE. The Silk Road era moved not just goods but ideas and technologies across continents. The seventeenth century saw the emergence of public markets through institutions like the Amsterdam Stock Exchange. The twentieth and twenty-first centuries have seen electronic networks collapse time and distance in trading to the point where geography is barely a constraint.

Each of these moments altered the broader trajectory of the societies involved. Trade is not downstream of civilization; it is one of the substrates from which civilization is built.

The compression of cycles

Patterns of growth, peak, and decline are woven into the fabric of human societies, and these cycles often parallel the natural environment. The Medieval Warm Period and the Little Ice Age dovetailed with eras of prosperity and hardship. Empires from Rome to the Dutch and British rose on tides of commerce and declined when conditions shifted.

What is distinctive about the last century is the rate at which these cycles have compressed. The time between expansion, dominance, and renewal has fallen from centuries to decades, and now in some domains to years. This compression is driven by instant global communication, market integration, exponential technological progress, and the increasing share of automated and algorithmic activity in market structure.

A useful data point: research on S&P 500 tenure suggests the average lifespan of a constituent company has fallen from roughly six decades in the mid-twentieth century to under two decades today. The institutions that occupy the commanding heights of any given era are not the institutions that occupied those positions a generation earlier, and the rate at which these positions turn over continues to accelerate.

For anyone working in markets, this compression is the dominant feature of the environment. Strategies that worked for years now decay in months. Edges that were structural become contested within quarters. The implication is not that nothing works, but that durability looks different than it did a generation ago.

The current stage

Computers, the internet, and now artificial intelligence have transformed trading from a human-driven activity to a primarily system-driven one. Modern markets are arenas of data flow, where the edge belongs to architectures that can perceive, reason, and act faster and more accurately than human cognition allows on the time scales that matter.

Agentic AI systems represent the most recent development in this trajectory. These are autonomous, goal-directed software components that operate continuously and adapt their behavior based on what they observe. In market contexts, agentic systems do not replace the function that traders have always performed; they extend the evolutionary arc of exchange itself by enabling participation at time scales and complexity levels that human cognition cannot reach directly.

This is not a claim that agentic systems are the end state of market evolution. It is an observation that they are the current state, and that the trajectory they are part of has been running for a long time and shows no signs of stopping.

What this implies

From mycorrhizal networks to high-frequency order flow, the story of trade is a story of increasing speed, complexity, and embedded intelligence. The interesting question is not whether this trajectory continues but what it produces at each successive stage.

The current stage appears to be one in which markets themselves are becoming adaptive substrates — coordinated less by individual decisions and more by distributed autonomous systems that perceive, reason, and act faster than the participants who built them. Whether this development is net positive depends on questions that are well beyond the scope of this piece. What seems clear is that the trajectory is real and that engaging with it seriously requires understanding what kind of phenomenon it is.

The future of markets is unlikely to be human versus machine. It is more likely to be human and machine, evolving together along a trajectory that is itself far older than either.

Training Agentic Systems for Live Markets

May 20, 2025 | May 18, 2026 | Brendan Lett | About

Notes on the iterative development process behind a long-running research project

Most public writing on training AI systems for trading falls into one of two registers. There’s the academic register, which describes formal training methodology in terms that don’t survive contact with live markets. And there’s the marketing register, which describes triumphant results without engaging seriously with what the training process actually involves. Neither register is particularly useful for engineers working on the problem.

This article is an attempt at a third register: an honest description of what iterative training of agentic systems looks like in practice, drawn from nine years of work on the problem. The framing here is general enough to be useful across systematic trading contexts; specifics about any particular implementation are kept deliberately abstract. The work described is part of an internal research project. Nothing in this article describes a product, service, or investment vehicle available to outside parties, and the article does not constitute an offer of any kind.

The basic shape of the work

At a high level, the iterative training cycle for an agentic trading system has four phases that repeat continuously: pushing agents to their limits, identifying failure points, retraining against those failures, and validating that the retraining actually addressed the underlying problem rather than just masking it.

Stated that simply, this sounds straightforward. In practice each phase contains failure modes that defeat most attempts. The reason this work takes years rather than months is that the failure modes are individually subtle, collectively numerous, and only become visible through extended operation in changing conditions.

Pushing agents to their limits

The first phase is about finding out where the system breaks. This is harder than it sounds because the obvious approach — testing against historical extreme conditions — produces results that don’t generalize. Agents trained against historical volatility events learn to recognize the specific shape of those events, not the general phenomenon of extreme conditions. When new extreme conditions arrive with a different shape, the recognition fails.

A more productive approach is to construct synthetic stress conditions that are structurally extreme without being historically specific. This means scenarios with extreme participant behavior, extreme liquidity discontinuities, extreme regime shifts — but not literal replays of 1987 or 2008 or 2020. The goal is to find the boundaries of the agents’ competence in a way that generalizes.
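A structurally extreme but historically unspecific scenario can be generated from a handful of parameters. The sketch below is an assumption about what such a generator might look like, not the project's tooling: it produces a price path with a volatility regime shift at a random time and one discontinuous gap, and all names and default magnitudes are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class StressScenario:
    prices: list
    shift_index: int   # step at which the volatility regime changes
    gap_index: int     # step at which a discontinuous jump occurs


def make_scenario(seed, steps=500, start=100.0, calm_vol=0.0005,
                  stress_multiplier=10.0, gap_fraction=0.03):
    """Build one synthetic path; the same seed always yields the same path."""
    rng = random.Random(seed)
    shift_index = rng.randrange(steps // 4, steps // 2)
    gap_index = rng.randrange(shift_index, steps)
    gap_sign = rng.choice((-1.0, 1.0))

    prices = [start]
    for i in range(steps):
        # Volatility jumps by `stress_multiplier` once the regime shifts.
        vol = calm_vol * (stress_multiplier if i >= shift_index else 1.0)
        ret = rng.gauss(0.0, vol)
        if i == gap_index:
            ret += gap_sign * gap_fraction   # liquidity gap: one outsized move
        prices.append(prices[-1] * (1.0 + ret))
    return StressScenario(prices=prices, shift_index=shift_index, gap_index=gap_index)


scenario = make_scenario(seed=7)
```

Varying the seed, the multiplier, and the gap size gives a family of scenarios that share the structure of extreme conditions without replaying the shape of any particular historical event.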

The discipline that has to be maintained here is to actually run these stress tests rather than just simulating them. There is a strong temptation to declare an agent robust because it passed simulated stress tests, when what those tests actually demonstrated is that the simulation faithfully reproduced the agent’s expected behavior in scenarios the agent was trained against. Real stress testing requires conditions the agent was not specifically trained for, and the willingness to discover that the agent fails in those conditions.

Identifying failure points

When agents fail under stress, the failures themselves are usually obvious. The question is what caused them. The answer is rarely the most visible factor.

A common pattern: an agent fails to respond appropriately to a volatility spike. The visible failure is in the execution layer. The actual cause is in the perception layer, which misclassified the regime and provided wrong context to the reasoning layer, which made the wrong decision, which the execution layer faithfully executed. Tracing failures to their actual source requires careful logging of every layer’s inputs and outputs, and the discipline to actually trace through the chain rather than fixing the visible failure and moving on.
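The tracing discipline in this example can be sketched as a per-layer log plus a walk from the front of the chain. The code below is illustrative only: `LayerRecord`, the layer names, and the idea of comparing against a hindsight reference (`expected`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class LayerRecord:
    layer: str
    inputs: Any
    output: Any


def first_divergence(trace: Sequence[LayerRecord],
                     expected: Mapping[str, Any]) -> Optional[str]:
    """Earliest layer whose logged output differs from the hindsight reference."""
    for record in trace:
        if record.layer in expected and record.output != expected[record.layer]:
            return record.layer
    return None


def was_faithful(record: LayerRecord, fn: Callable[[Any], Any]) -> bool:
    """True if the layer's logged output is what `fn` produces for its logged inputs."""
    return fn(record.inputs) == record.output


# A volatility spike misread as a quiet range, then handled "correctly" downstream.
trace = [
    LayerRecord("perception", inputs="tick_window_0417", output="range_bound"),
    LayerRecord("reasoning", inputs="range_bound", output="fade_move"),
    LayerRecord("execution", inputs="fade_move", output="order_sent"),
]
expected = {"perception": "volatile", "reasoning": "stand_aside", "execution": "no_order"}


def toy_reasoning(regime: str) -> str:
    # Hypothetical policy, used only to replay the reasoning layer.
    return "fade_move" if regime == "range_bound" else "stand_aside"
```

Here every layer's output differs from the reference, but the walk stops at perception, and replaying the reasoning layer on its logged input shows it behaved as designed given what it was told.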

A second common pattern: an agent fails in a specific market condition and the diagnosis points to the training data not covering that condition. The fix appears obvious — add training examples for that condition and retrain. This is usually wrong. The actual problem is more often that the agent was trained against the wrong representation of the condition, or that the condition exposes a deeper architectural limitation that additional training is unlikely to resolve. Throwing more data at the visible symptom can mask the underlying problem in ways that produce confidence without competence.

The disciplined approach is to treat every failure as potentially diagnostic of architecture rather than data, and to resist the temptation to declare the diagnosis complete once a plausible cause has been identified. The plausible cause is usually the surface; the actual cause is usually a layer or two deeper.

Retraining against failures

Retraining is the phase where the temptation to make the system look better is strongest, and therefore where the most damage can be done.

The technically correct approach is to update the system in ways that address the diagnosed failure without overfitting to the specific scenario that surfaced it. This is harder than it sounds because the natural feedback loop — the agent now performs well in the scenario that previously caused failure — provides immediate positive reinforcement for changes that may actually narrow the agent’s competence rather than broadening it. A retraining update that improves the agent in one specific scenario while subtly degrading its behavior in others looks like a win in the moment and becomes a problem later.

The discipline that addresses this is comprehensive validation across the full range of conditions the agent is expected to handle, not just the scenario that prompted the retraining. This is expensive — it means running validation against everything you can construct, not just the failure case — but it’s the only way to detect regression.

A second discipline is to be willing to back out changes when validation reveals problems, even when the changes addressed the original failure. The natural inclination is to keep the fix and try to address the regression with additional fixes. This rapidly produces systems that are accretions of patches rather than coherent architectures. Sometimes the right answer is to recognize that the original failure cannot be addressed within the current architecture and that more fundamental work is needed.
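The two disciplines above, validating across everything and backing out changes that regress, reduce to a simple comparison of per-scenario scores before and after an update. This is a minimal sketch with hypothetical scenario names and a hypothetical scoring scale where higher is better.

```python
def find_regressions(before: dict, after: dict, tolerance: float = 0.0) -> dict:
    """Scenarios whose score fell by more than `tolerance`, mapped to the change."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] < before[name] - tolerance
    }


def should_keep_update(before: dict, after: dict, target: str,
                       tolerance: float = 0.0) -> bool:
    """Keep the update only if it fixed the target scenario and regressed nothing."""
    missing = set(before) - set(after)
    if missing:
        # Refuse to judge an update that was not validated on every scenario.
        raise ValueError(f"scenarios not re-validated: {sorted(missing)}")
    improved = after[target] > before[target]
    return improved and not find_regressions(before, after, tolerance)


before = {"gap_open": 0.2, "trend": 0.7, "thin_book": 0.6}
narrow_fix = {"gap_open": 0.8, "trend": 0.7, "thin_book": 0.4}
clean_fix = {"gap_open": 0.8, "trend": 0.7, "thin_book": 0.6}
```

In this toy data `narrow_fix` solves the scenario that prompted the retraining but degrades `thin_book`, so it is rejected; the check also errors out if any previously validated scenario was skipped.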

Validation

The last phase is about confirming that the work actually accomplished something. This sounds trivial and is in fact one of the hardest parts of the cycle.

The challenge is that the conditions an agent will encounter in live operation are not the same as the conditions used to validate it. Validation against known conditions can only describe behavior against those conditions; live operation is where the agent encounters conditions that don’t match anything in the validation set. A system that validates well but behaves poorly in live operation is one of the most demoralizing failure modes in this work, partly because the cause is often that validation was too closely matched to training rather than reflecting the actual generalization the system needs to demonstrate.

The discipline that addresses this is to maintain validation conditions that are deliberately not derived from training data, and to update validation conditions less frequently than training data so that successive training cycles can be compared against a stable reference. This is operationally painful — it means validation results sometimes look worse than training results suggest they should — but it provides actual information about whether the system is improving.

A second discipline is to weight live behavior more heavily than validation results when they disagree, and to be skeptical of any pattern where validation steadily improves while live behavior does not. This pattern usually indicates that training is moving toward validation rather than toward live generalization, which is a failure mode that gets harder to detect the longer it continues.
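The warning pattern described here, validation improving while live behavior does not, can be checked mechanically once both are scored per training cycle. The sketch below assumes one score per cycle on a higher-is-better scale and uses a plain least-squares trend; the function names and sample numbers are hypothetical.

```python
def trend_slope(values):
    """Ordinary least-squares slope of the values against their index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def validation_outrunning_live(validation_scores, live_scores,
                               min_validation_slope=0.0):
    """True when validation trends upward while live performance is flat or falling."""
    return (trend_slope(validation_scores) > min_validation_slope
            and trend_slope(live_scores) <= 0.0)


# One score per training cycle, oldest first.
validation = [0.60, 0.65, 0.70, 0.75]
live_flat = [0.55, 0.55, 0.54, 0.53]
live_improving = [0.50, 0.54, 0.58, 0.61]
```

The comparison is only meaningful if the validation conditions stayed fixed across the cycles being compared, which is the reason the text gives for updating them less often than the training data.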

What this cycle is not

A few things this iterative training process is not, because the description above can sound cleaner than the underlying work actually is:

It is not a process that produces a finished system. The system at any given moment is the current state of a continuous development effort. The framing of “training complete, deploy to production” doesn’t apply because the conditions the system operates in continue to evolve and the system has to evolve with them.

It is not a process where each iteration is straightforwardly an improvement on the last. Many iterations make the system worse in ways that aren’t immediately visible. Recognizing and reverting bad iterations is a substantial part of the work.

It is not a process that scales naively with computational resources. More compute applied to the same architecture produces marginal returns once the architecture has been adequately exercised. The bottleneck is usually conceptual rather than computational — figuring out what to do, not running more iterations of what you already know how to do.

Why this work takes years

The cycle described above takes years to execute well for one fundamental reason: many of the failure modes only become visible through extended operation in changing conditions. A system can pass every validation test for months and then encounter a market condition that exposes a limitation no one anticipated. The only way to find these limitations is to run the system through enough varied conditions for them to surface, and there is no way to compress this timeline through more aggressive testing.

This is the part of systematic trading work that is least appreciated by people who haven’t done it. The intellectual content of the architecture can be developed relatively quickly. The reduction to working code can be done in months. The actual maturation of the system into something that handles real conditions reliably takes years, and the years cannot be skipped.

This is also the reason that public claims about systems built in months should be treated with skepticism. Whatever exists at that point is a prototype. Whether it survives extended live operation is an open question that only time can answer.


EPIC AI Incorporated, EIRL publishes research and engineering writing on systematic trading, agentic AI, and market structure. This site does not offer investment advice, solicit investments, or describe any specific investment product, fund, or vehicle. Nothing on this site constitutes an offer to sell or a solicitation of an offer to buy any security, commodity interest, or investment product. Content is provided for informational and educational purposes only and reflects the personal views of the author.

© 2026 EPIC AI Incorporated, EIRL All rights reserved.
