The technical evolution of a long-running research project
The work that became EPIC AI began in 2016 with a deliberately brutal testbed: crude oil futures. Few instruments combine extreme volatility, discontinuous liquidity, and microstructure complexity in the same way crude does. Crude oil was never the end goal — it was the engineering filter. The premise was that an architecture which had to survive in that environment would be more honestly stress-tested than one developed against easier instruments.
What follows is a technical timeline of how the project evolved over nine years, from a single-instrument supervised learning prototype into a multi-component agentic architecture. It is written for engineers and researchers working on similar problems, not as a product description. The architecture described here is an internal research project. It is not licensed, sold, or made available to outside parties, and this article does not constitute an offer to do so.
Phase 1 — Supervised learning on crude oil (2016–2019)
The earliest version of the system was narrow by design: a single instrument, supervised deep neural networks trained on tick-level data, and an early library of geometric and probabilistic chart pattern models. The first implementations of incremental position scaling and re-pegging logic emerged in this phase, mostly as a byproduct of handling the specific ways that discontinuities in crude oil futures punish naive approaches.
The lesson from this period was that single-instrument mastery is necessary but insufficient. The architecture functioned, but it had been shaped against one specific market’s pathologies. Whether the underlying approach would generalize was an open question.
Phase 2 — Componentization and reinforcement learning (2020–2024)
The second phase was structural. The monolithic system was decomposed into specialized components: market state perception, order flow analysis, risk evaluation, execution, and self-diagnosis. Each component could be developed, tested, and replaced independently. This made the system as a whole harder to reason about, but it allowed faster iteration on any individual area.
The other significant shift in this phase was the introduction of reinforcement learning loops that operated intraday rather than overnight. Overnight retraining cycles are common in the literature, but they encode an assumption: that today’s market is similar enough to yesterday’s that a model adjusted on yesterday’s data will still be useful today. In practice, that assumption breaks down precisely when it matters most, during abrupt changes in market regime. Moving to intraday adaptation was an attempt to let the system respond to regime shifts as they happened rather than after the fact.
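The difference between the two update schedules can be illustrated with a deliberately simplified sketch. It does not implement reinforcement learning; it swaps in an exponentially weighted moving average as a stand-in for any adaptive parameter, and the names `IntradayEstimator`, `overnight_estimate`, and `alpha` are hypothetical. The input series are synthetic. The frozen batch estimate keeps reflecting the previous session’s regime, while the incrementally updated estimate moves toward the new level within the session.

```python
from dataclasses import dataclass


@dataclass
class IntradayEstimator:
    """Hypothetical adaptive parameter, updated on every new observation.

    Uses an exponentially weighted moving average as a stand-in for a
    model that adapts intraday; `alpha` sets how fast old data is forgotten.
    """
    alpha: float
    value: float

    def update(self, x: float) -> float:
        # Move the estimate a fraction `alpha` of the way toward the new observation.
        self.value += self.alpha * (x - self.value)
        return self.value


def overnight_estimate(yesterday: list[float]) -> float:
    """Hypothetical batch estimate: fitted once on the previous session, then frozen."""
    return sum(yesterday) / len(yesterday)


if __name__ == "__main__":
    yesterday = [1.0] * 100  # synthetic: previous session centered on 1.0
    today = [3.0] * 50       # synthetic: regime shift to 3.0 at the open

    frozen = overnight_estimate(yesterday)
    adaptive = IntradayEstimator(alpha=0.1, value=frozen)
    for x in today:
        adaptive.update(x)

    print(f"frozen overnight estimate: {frozen:.3f}")
    print(f"intraday estimate after {len(today)} updates: {adaptive.value:.3f}")
```

The design trade-off sits in `alpha`: a larger value reacts faster to a genuine shift but also chases noise, which is one reason intraday adaptation brings its own evaluation and validation problems.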
Expansion beyond crude began in this phase with Nasdaq-100 futures. The choice was deliberate: another high-volume, machine-driven market, but one with different microstructure characteristics. The goal was to test whether the architecture had genuinely generalized or had simply learned crude oil very well.
Phase 3 — Agentic autonomy (2024–2025)
The third phase was about removing human intervention from the optimization loop. The earlier phases retained human-in-the-loop steps for parameter tuning, regime classification, and certain decision points. Phase 3 work focused on replacing these with autonomous agents — software components with goal-directed behavior and continuous self-retraining on live microstructure data.
This required substantial infrastructure work: isolated per-instrument server clusters, containerized execution environments, and cross-asset correlation agents that could pass context between instrument-specific systems. The architectural principle was strict separation between perception (what is the market doing), reasoning (what should we do about it), and action (how do we execute). Mixing these concerns creates systems that are difficult to debug and dangerous to deploy.
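A minimal sketch of that separation, assuming a simple one-way pipeline: each layer gets a narrow, typed interface, perception reports its own uncertainty as data, reasoning abstains rather than guessing when that uncertainty is high, and action executes only the decision it is handed. Everything here is hypothetical and deliberately toy-sized: `perceive`, `reason`, `act`, `MarketState`, the regime labels, the confidence heuristic, and the `min_confidence` threshold are illustrative assumptions, not the project’s components, and the decision rule is not a trading rule.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class MarketState:
    """Perception output (hypothetical): an estimate plus an explicit confidence."""
    regime: str        # toy label: "trending_up", "trending_down", "flat", "unknown"
    confidence: float  # 0.0 to 1.0; perception reports doubt instead of hiding it


class Action(Enum):
    HOLD = "hold"
    BUY = "buy"
    SELL = "sell"


@dataclass(frozen=True)
class Decision:
    action: Action
    reason: str


def perceive(mid_prices: list[float]) -> MarketState:
    """Perception only describes the market; it never decides anything."""
    if len(mid_prices) < 2:
        return MarketState(regime="unknown", confidence=0.0)
    move = mid_prices[-1] - mid_prices[0]
    regime = "trending_up" if move > 0 else "trending_down" if move < 0 else "flat"
    # Toy confidence: fraction of steps that moved in the direction of the net move.
    steps = [b - a for a, b in zip(mid_prices, mid_prices[1:])]
    agreeing = sum(1 for s in steps if s * move > 0)
    return MarketState(regime=regime, confidence=agreeing / len(steps))


def reason(state: MarketState, min_confidence: float = 0.7) -> Decision:
    """Reasoning takes the state as given; low confidence means abstain, not assume."""
    if state.confidence < min_confidence:
        return Decision(Action.HOLD, f"perception confidence {state.confidence:.2f} too low")
    if state.regime == "trending_up":
        return Decision(Action.BUY, "confident uptrend")
    if state.regime == "trending_down":
        return Decision(Action.SELL, "confident downtrend")
    return Decision(Action.HOLD, "no directional regime")


def act(decision: Decision) -> str:
    """Action executes exactly the decision it receives; it caches nothing."""
    return f"execute {decision.action.value}: {decision.reason}"


if __name__ == "__main__":
    for prices in ([100.0, 100.5, 101.0, 101.5], [100.0, 101.0, 100.0, 100.5]):
        print(act(reason(perceive(prices))))
```

Because each function depends only on the output type of the layer before it, any layer can be replaced or tested in isolation, and an uncertain perception result surfaces as an explicit abstention rather than as an assumption smuggled into the reasoning layer.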
Architectural principles that survived every phase
A few design commitments held throughout the evolution:
No reliance on static backtesting for live decision logic. Backtesting is useful for validating that an idea is not obviously broken, but the assumption that historical data predicts live performance is a load-bearing claim that often fails silently. The system was designed to learn from live data rather than to act on patterns extracted from historical data.
Hard separation between perception, reasoning, and action. This is a discipline that has to be maintained against constant pressure to short-circuit it. When perception is uncertain, the temptation is to let the reasoning layer “help” by making assumptions. When action is slow, the temptation is to let the reasoning layer cache decisions. Both temptations corrupt the architecture and have to be resisted.
Continuous intraday evolution using data that cannot be historically recreated. This is the deepest commitment of the project and the one most at odds with conventional ML practice. Most of the interesting signal in microstructure data is information that exists only in the live moment — the sequence of order arrivals, the specific patterns of cancellation and replacement, the latency fingerprints of automated participants. None of this can be recreated from historical snapshots. A system that depends on it has to be trained against live data, which creates its own engineering challenges around evaluation and validation.
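When a model is trained on a live stream, there is no held-out historical test set to fall back on. One standard technique from the online-learning literature is prequential, or test-then-train, evaluation: each incoming observation is first used to score the model’s current prediction and only then used to update the model, so every recorded error is out-of-sample at the moment it is measured. The sketch below is an assumption, not the project’s method: `EwmaPredictor` is a hypothetical one-step-ahead forecaster, and `prequential_errors` reports a fading-factor mean absolute error so that recent mistakes weigh more than old ones. The input stream in the usage block is synthetic.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class EwmaPredictor:
    """Hypothetical one-step-ahead predictor: forecasts the next value as an EWMA."""
    alpha: float = 0.2
    level: Optional[float] = None

    def predict(self) -> Optional[float]:
        return self.level

    def learn(self, x: float) -> None:
        # Incremental update; the first observation initializes the level.
        self.level = x if self.level is None else self.level + self.alpha * (x - self.level)


def prequential_errors(stream: Iterable[float], model: EwmaPredictor,
                       fading: float = 0.99) -> list[float]:
    """Test-then-train evaluation over a stream.

    Returns the fading-factor mean absolute error after each scored step;
    `fading` < 1 gradually forgets older errors.
    """
    history: list[float] = []
    weighted_sum = 0.0
    weight = 0.0
    for x in stream:
        forecast = model.predict()
        if forecast is not None:
            # 1) Test: score the forecast on data the model has not learned from yet.
            weighted_sum = fading * weighted_sum + abs(x - forecast)
            weight = fading * weight + 1.0
            history.append(weighted_sum / weight)
        # 2) Train: only now does the observation update the model.
        model.learn(x)
    return history


if __name__ == "__main__":
    stream = [1.0] * 20 + [3.0] * 20  # synthetic level shift halfway through
    errors = prequential_errors(stream, EwmaPredictor(alpha=0.2), fading=0.9)
    print(f"before shift: {errors[18]:.3f}, at shift: {errors[19]:.3f}, end: {errors[-1]:.3f}")
```

The `fading` factor trades stability for responsiveness: values near 1 approximate a cumulative average, while smaller values make a sudden rise in error, such as one following a regime shift, visible within a few steps.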
What the architecture is and is not
The result of nine years of this work is a multi-component agentic system designed to perceive market microstructure, coordinate decisions across specialized agents, and adapt continuously to changing conditions. It is not a magic alpha generator or a finished product. It is an architecture — a way of organizing the engineering problem — designed to handle kinds of complexity that defeat simpler approaches.
The interesting open questions in this work are not about whether agentic architectures can be built to trade; the engineering problem is tractable. The interesting questions are about what fails when these systems encounter conditions they were not designed for, how to detect that failure quickly, and how to architect systems that degrade gracefully rather than catastrophically. Those are the topics that future writing on this site will focus on.
