Levelbrook Labs

Building AI-Powered Financial Insights: Notes on Artificial Intelligence


The domain of finance presents a fascinating duality. On one hand, it's a world of absolute structure: ledgers that must balance, time-series data of market ticks, and rigorously defined financial statements. On the other, it's driven by unstructured narrative: news reports, analyst commentary, central bank minutes, and the sentiment buried in a CEO's conference call transcript. The genuinely interesting engineering problem in applying AI to this space isn't just optimizing a quantitative model or summarizing text. It's about building a cohesive system that can reason across both worlds—fusing hard numbers with soft context to synthesize something that approximates genuine insight.

This isn't a solved problem. It's a complex systems design challenge involving data pipelines, asynchronous job processing, real-time user interfaces, and—most critically—robust guardrails against machine error. Here are some notes on architecting such a system.

The Architectural Blueprint

A monolithic application won't cut it: the workloads (GPU-heavy numerical jobs, text retrieval and generation, and interactive user sessions) are too different to scale together. We need a service-oriented architecture where specialized components handle distinct tasks and communicate via well-defined APIs, likely using JSON as the data interchange format.
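
To make "well-defined APIs" concrete, here is a minimal sketch of a JSON message contract between the orchestrator and the quantitative service. The `AnalysisRequest` type and its fields are hypothetical, not a prescribed schema; the point is that each service checks the shape of what it receives instead of trusting arbitrary JSON.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AnalysisRequest:
    """Hypothetical message the orchestrator sends to the quantitative service."""

    job_id: str
    ticker: str
    query: str
    metrics: list[str] = field(default_factory=list)


def encode(request: AnalysisRequest) -> str:
    # sort_keys makes the payload deterministic, which helps logging and caching.
    return json.dumps(asdict(request), sort_keys=True)


def decode(payload: str) -> AnalysisRequest:
    # Missing or unexpected keys raise TypeError in the constructor,
    # so a malformed message fails loudly at the service boundary.
    data = json.loads(payload)
    return AnalysisRequest(**data)


request = AnalysisRequest(
    job_id="job-123",
    ticker="ACME",  # hypothetical ticker
    query="Summarize Q3 performance and risks",
    metrics=["revenue"],
)
payload = encode(request)
```

This only checks field names, not value types; if two services drift apart, the mismatch at least surfaces as an error at the boundary rather than as a silently wrong analysis.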

1. The Data Foundation: A Dual-Model Approach

First, we have to model the two worlds. Structured, quantitative data (company fundamentals, market prices, economic indicators) is best stored in a relational database. PostgreSQL with an extension like TimescaleDB is an excellent choice for handling time-series data efficiently. Unstructured text data (10-K filings, news articles, earnings call transcripts) is better stored as objects in a service like AWS S3 or Azure Blob Storage. The crucial link is the metadata: which company, filing, and date each document belongs to, so that text can be joined back to the structured records. We also need a vector database (or a Postgres instance with pgvector) to store embeddings of these documents, enabling semantic search rather than plain keyword matching.
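
The sketch below illustrates why the metadata matters: semantic search ranks text chunks by cosine similarity between embeddings, but a metadata filter (here, the ticker) first restricts the candidates to the right company. The `Chunk` type, the tiny hand-written vectors, and the ticker filter are illustrative assumptions. Real embeddings come from an embedding model and have hundreds or thousands of dimensions, and with pgvector the ranking would happen in SQL (its `<=>` operator computes cosine distance).

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str   # key of the source object, e.g. an object-storage path (hypothetical)
    ticker: str   # metadata linking the text back to the structured data
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 for a zero vector instead of dividing by zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks: list[Chunk], query_vec: list[float], ticker: str, k: int = 2) -> list[Chunk]:
    # Filter on metadata first, then rank the survivors by semantic similarity.
    candidates = [c for c in chunks if c.ticker == ticker]
    candidates.sort(key=lambda c: cosine(c.embedding, query_vec), reverse=True)
    return candidates[:k]
```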

2. The Quantitative Engine: Python at the Core

For heavy numerical computation and machine learning, Python is the de facto standard, with libraries like PyTorch and TensorFlow. This service would be responsible for the numerical work, such as forecasting metrics like quarterly revenue.

These are not quick, synchronous requests. A user's query should trigger an asynchronous job via a task queue like Celery. The service itself would run on dedicated compute, likely GPU-enabled instances on AWS or Azure, scaled independently of the main web application.
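
The essential property of the asynchronous job pattern is that submitting work returns an identifier immediately, and the caller checks status later instead of holding a request open. The sketch below uses Python's standard-library `ThreadPoolExecutor` as a stand-in for Celery so it runs without a broker. `JobQueue` and the naive trend-extension `project_next_quarter` are hypothetical placeholders, not a real forecasting model.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional


class JobQueue:
    """Hypothetical stand-in for a task queue such as Celery."""

    def __init__(self, workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: dict[str, Future] = {}

    def submit(self, fn: Callable[..., Any], *args: Any) -> str:
        # Return an id immediately; the work runs on a background thread.
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._pool.submit(fn, *args)
        return job_id

    def status(self, job_id: str) -> str:
        # Non-blocking check, suitable for polling or for pushing progress to the client.
        future = self._jobs[job_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() is not None else "done"

    def result(self, job_id: str, timeout: Optional[float] = None) -> Any:
        # Blocks until the job finishes; re-raises the job's exception if it failed.
        return self._jobs[job_id].result(timeout=timeout)


def project_next_quarter(history: list[float]) -> float:
    # Hypothetical placeholder model: extend the average quarter-over-quarter change.
    # Needs at least two data points; fewer raises ZeroDivisionError.
    deltas = [b - a for a, b in zip(history, history[1:])]
    return history[-1] + sum(deltas) / len(deltas)
```

In a Celery deployment, the equivalent is a task's `delay()` call, which returns an `AsyncResult` whose `id` and `state` the orchestrator can poll.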

3. The Narrative Engine: NLP and Report Generation

This component bridges the gap. Its job is to take the structured output from the quantitative engine (e.g., "Projected Q4 revenue: $1.2B, a 5% YoY increase") and contextualize it with insights from the unstructured data. The workflow, often called Retrieval-Augmented Generation (RAG), looks like this:

  1. Receive numerical results and a high-level query (e.g., "Summarize Q3 performance and risks").
  2. Convert the query into a vector embedding and find the most relevant text chunks from the vector database (e.g., sections of the latest 10-K discussing "market risk," recent news about supply chain issues).
  3. Construct a detailed prompt for a Large Language Model (LLM). This prompt includes the quantitative results and the retrieved text snippets as context.
  4. Send the prompt to an LLM (via a managed service or a self-hosted model on Azure ML / AWS SageMaker) to generate a human-readable, narrative report.
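
Step 3 is where most of the design decisions live. Below is a minimal sketch of prompt assembly under a few assumptions: retrieval (step 2) has already returned `Snippet` objects, the quantitative results arrive as pre-formatted strings, and the LLM call itself (step 4) is left out because it depends on the provider. Numbering the sources so the model can cite them is one way to make the output traceable later; the instruction wording is illustrative, not a tested prompt. The `yoy_growth` helper shows the arithmetic behind a figure like "a 5% YoY increase": (current - prior) / prior.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    source_id: str  # e.g. a 10-K section or news article id (hypothetical scheme)
    text: str


def yoy_growth(current: float, prior: float) -> float:
    # Year-over-year growth as a fraction: (current - prior) / prior.
    return (current - prior) / prior


def build_prompt(query: str, facts: dict[str, str], snippets: list[Snippet]) -> str:
    """Combine quantitative results and retrieved context into one prompt.

    Sources are numbered [S1], [S2], ... so the model can cite them, which
    lets the UI link each generated sentence back to its source.
    """
    fact_lines = "\n".join(f"- {name}: {value}" for name, value in facts.items())
    source_lines = "\n".join(
        f"[S{i}] ({s.source_id}) {s.text}" for i, s in enumerate(snippets, 1)
    )
    return (
        "You are drafting a financial report for review by a human analyst.\n"
        "Use only the facts and sources below. Cite sources as [S1], [S2], ...\n"
        "If the sources do not support a claim, say so instead of guessing.\n\n"
        f"Quantitative results:\n{fact_lines}\n\n"
        f"Sources:\n{source_lines}\n\n"
        f"Task: {query}\n"
    )
```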

4. The Orchestrator and UX: Real-Time Feedback

The user interacts with a web front-end built in JavaScript (perhaps with a framework like React). This client communicates with a primary back-end application, which could be built on any mainstream web stack, such as PHP or Ruby on Rails. This back-end acts as an orchestrator: it doesn't do the heavy lifting but instead dispatches jobs to the Python services and manages the user session.

Because the analysis can take seconds or even minutes, a responsive UX is non-negotiable. A simple request/response cycle with a loading spinner is a poor experience. This is a perfect use case for Server-Sent Events (SSE) or WebSockets. The orchestrator can push status updates to the client in real time: {"status": "Fetching market data..."}, then {"status": "Running quantitative analysis..."}, and finally stream the generated report token by token as it comes back from the LLM. This creates a much more engaging and transparent experience.
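
On the wire, SSE is simple: the server responds with `Content-Type: text/event-stream` and writes messages made of `field: value` lines, each message terminated by a blank line. The sketch below only produces those frames; `progress_stream` is a hypothetical generator, and wiring it into a particular web framework's streaming response is left out. Payloads are encoded with `json.dumps`, which escapes newlines inside strings, so every message fits on a single `data:` line.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_frame(data: dict, event: Optional[str] = None) -> str:
    # One SSE message: an optional "event:" field, a single "data:" field,
    # and a blank line that tells the browser the message is complete.
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def progress_stream(tokens: Iterable[str]) -> Iterator[str]:
    # Status updates first, then the report token by token, then a terminal event.
    yield sse_frame({"status": "Fetching market data..."})
    yield sse_frame({"status": "Running quantitative analysis..."})
    for token in tokens:
        yield sse_frame({"token": token}, event="token")
    yield sse_frame({"status": "done"}, event="done")
```

On the client, `new EventSource(url)` delivers the unnamed status messages to `onmessage`, while the named `token` and `done` events need `addEventListener("token", ...)` and `addEventListener("done", ...)`.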

Where It Breaks at Scale (And How We Fix It)

A proof-of-concept is one thing; a production system is another. The failure modes in a system like this are numerous.

The firehose of financial data never stops. A system that works with a curated dataset will fall over when faced with the real-time torrent of global market data, news feeds, and regulatory filings.

The Pragmatic Engineer’s Guardrail: Human-in-the-Loop

This is the most critical point. In finance, an error is not just a bug; it can cause a catastrophic loss. LLMs are notorious for "hallucination": confidently stating falsehoods. We cannot, under any circumstances, treat the AI's output as ground truth.

The only responsible way to build such a system is with a human-in-the-loop (HITL) architecture. The AI is a powerful associate, not an autonomous analyst. It generates a draft, not a final report.

The user interface must be designed for this workflow. Every claim in the generated text should be traceable to its source. If the report says, "The company cited supply chain pressures as a primary headwind," the user must be able to click that sentence and see the exact quote from the 10-K it was derived from. The human analyst must be able to edit, override, and ultimately be the one to approve and sign off on the final output. The system's goal is to augment human expertise, not replace it.
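
One way to enforce this in the data model is sketched below: every generated sentence carries the source it came from and the exact quote the UI would display, and a draft cannot be marked approved until a named analyst signs off and every quote is actually found in its cited source. The `Claim` and `DraftReport` types are hypothetical, and exact substring matching is a deliberately strict simplifying assumption. It catches fabricated quotes, but not a sentence that misrepresents a real one, which is why the human review step remains.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    sentence: str   # sentence in the generated draft
    source_id: str  # document the claim was derived from (hypothetical id scheme)
    quote: str      # exact supporting text shown when the sentence is clicked


@dataclass
class DraftReport:
    claims: list[Claim]
    approved_by: Optional[str] = None

    def unsupported(self, sources: dict[str, str]) -> list[Claim]:
        # A claim is unsupported if its quote is empty or does not appear
        # verbatim in the text of the source it cites.
        return [
            c for c in self.claims
            if not c.quote or c.quote not in sources.get(c.source_id, "")
        ]

    def approve(self, analyst: str, sources: dict[str, str]) -> None:
        # Only a named human can finalize, and only once every claim is traceable.
        bad = self.unsupported(sources)
        if bad:
            raise ValueError(f"{len(bad)} claim(s) not traceable to their sources")
        self.approved_by = analyst
```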

Closing Reflection

The core challenge here is less about inventing a novel algorithm and more about systems engineering. It's about orchestrating a complex ballet of data pipelines, distributed services, machine learning models, and real-time communication. The most elegant PyTorch model is useless if it's fed stale data or if its results can't be delivered to a user in a coherent, verifiable way. The true task is to build a "centaur"—a system where human intelligence and machine processing work together, each leveraging the other's strengths to produce an output that is more timely, comprehensive, and ultimately more valuable than either could achieve alone.