Building an AI-Powered Fraud Investigation Dashboard: Notes on Financial Crime Prevention

Patrick Donahue · Levelbrook Consulting

Financial crime prevention is one of those domains where the engineering challenges are as complex and dynamic as the problem itself. It's a high-stakes, adversarial game played at millisecond latencies and petabyte scale. The goal isn't just to build a system that works today, but one that can evolve faster than the opposition. This makes it a fascinating problem to decompose from a full-stack perspective.

I recently spent some time architecting a proof-of-concept for a real-time fraud alert and investigation dashboard. This write-up captures some notes on the domain, the architecture, and the pragmatic tradeoffs required to build something robust and useful.


The Domain: A High-Stakes Game of Cat and Mouse

At its core, the problem is about identifying suspicious patterns in a massive stream of transactions. The technical interest comes from several constraints:

Latency: A decision to block or flag a transaction must often be made in under 100ms. This rules out many complex, batch-oriented architectures for the real-time path.
Scale: A large payment processor might see millions of transactions per hour. The system must handle these high write loads while simultaneously serving low-latency reads for feature lookups.
Data Complexity: A single transaction is almost meaningless in isolation. Its risk is a function of its relationship to historical data: user behavior, device reputation, merchant risk, geographic anomalies, and network effects (e.g., card testing rings).
Adversarial Nature: In most engineering problems the system you model is indifferent to you. Here it is actively trying to deceive you. Fraudsters constantly change their tactics, so a fixed set of rules goes stale quickly.

This intersection of real-time processing, big data, and an adaptive threat model is what makes it a compelling engineering challenge. You can't just throw a model at it; you need a resilient, observable, and human-centric system.

System Architecture: A Polyglot, Purpose-Built Stack

No single language or database is the right tool for every part of this problem. A pragmatic architecture embraces a polyglot approach, choosing technologies for their specific strengths.


// Conceptual Data Flow

[Transaction Event] -> Kafka Topic
       |
       v
[Go/Java Scoring Service] --reads--> [Bigtable: User History]
       |                  --reads--> [Postgres: User Profile]
       |
       +--> [Python ML Model Endpoint] for inference
       |
       v
[Risk Score + Features] -> Kafka Topic
       |
       +----------------------------------> [BigQuery: Analytics & Model Training]
       |
       v
[Node.js WebSocket Service] --pushes--> [React Frontend Dashboard]
       |
       v
[Postgres: Cases/Alerts Table]

The Investigation Hub: React & TypeScript

The dashboard is the human interface to the machine's decisions. It needs to be fast, dense with information, and, above all, real-time: a new high-risk alert should reach an investigator's screen as soon as it is scored. For this, React with TypeScript is a solid choice. The component model suits a complex UI of transaction lists, detail panes, user history timelines, and network graphs. Real-time updates would be pushed from the backend via WebSockets or Server-Sent Events (SSE). My experience with Turbo Streams in the Rails world reinforces the value of server-pushed updates. SSE only carries messages from server to client, so if actions initiated from the UI should travel over the same connection, WebSockets provide the necessary bidirectional channel.

The Glue and Real-Time Layer: Node.js

A Node.js service acting as a Backend-for-Frontend (BFF) is ideal. Its non-blocking I/O model excels at managing thousands of persistent WebSocket connections and proxying requests to downstream services. It can listen to a Kafka topic for new alerts and immediately push them to the relevant investigator clients. It's the switchboard of the system.
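
The switchboard role can be sketched independently of Node.js. The snippet below is a minimal illustration in Python, not the service itself: `AlertRouter`, the queue names, and the per-client inbox are all hypothetical stand-ins, with an `asyncio.Queue` playing the part of a WebSocket connection. A Kafka consumer loop (not shown) would call `publish` for each alert it reads.

```python
import asyncio


class AlertRouter:
    """Routes alerts to every investigator subscribed to a named queue.

    Hypothetical sketch: each inbox stands in for one WebSocket connection.
    """

    def __init__(self):
        self._subscribers = {}  # queue name -> set of client inboxes

    def connect(self, queue_name, maxsize=100):
        """Register a client and return its bounded inbox."""
        inbox = asyncio.Queue(maxsize=maxsize)
        self._subscribers.setdefault(queue_name, set()).add(inbox)
        return inbox

    def disconnect(self, queue_name, inbox):
        """Forget a client, for example when its socket closes."""
        self._subscribers.get(queue_name, set()).discard(inbox)

    def publish(self, queue_name, alert):
        """Push an alert to all subscribers; return how many received it."""
        delivered = 0
        for inbox in self._subscribers.get(queue_name, ()):
            try:
                inbox.put_nowait(alert)
                delivered += 1
            except asyncio.QueueFull:
                # A slow client must not stall the others; skip it here.
                pass
        return delivered
```

Bounding each inbox is a deliberate choice in this sketch: one stalled browser tab should not hold up delivery to everyone else. What to do with skipped alerts (retry, or let the client re-fetch from the cases table) is left open.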

The Core Logic: Golang & High-Performance Runtimes

The real-time transaction scoring engine is the critical path. It has to fit inside the per-transaction latency budget while handling many transactions concurrently. This is where a language like Go (or Java/Rust) shines. Upon receiving a transaction, this service would perform a series of rapid, parallel lookups: fetch the user's last 10 transactions, check their account tenure, look up the device fingerprint's reputation, and so on. After enriching the transaction with these features, it calls the ML model for a score and applies any hard-coded business rules. Go's concurrency primitives (goroutines, channels) are a natural fit for this fan-out/fan-in data retrieval pattern.
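
To make the fan-out/fan-in shape concrete, here is a minimal sketch. It is written in Python with `asyncio` purely for brevity; a production service would more likely use goroutines as described above. The three `fetch_*` functions are hypothetical stubs that simulate store lookups with a short sleep and return made-up values.

```python
import asyncio


async def fetch_recent_transactions(user_id):
    # Hypothetical stub for a Bigtable read of the user's latest events.
    await asyncio.sleep(0.01)
    return [{"amount": 20.0}, {"amount": 35.0}]


async def fetch_account_tenure_days(user_id):
    # Hypothetical stub for a Postgres profile lookup.
    await asyncio.sleep(0.01)
    return 420


async def fetch_device_reputation(device_id):
    # Hypothetical stub for a device-fingerprint reputation lookup.
    await asyncio.sleep(0.01)
    return 0.9


async def enrich(txn):
    """Fan out the three lookups concurrently, then fan in to one feature dict."""
    recent, tenure, reputation = await asyncio.gather(
        fetch_recent_transactions(txn["user_id"]),
        fetch_account_tenure_days(txn["user_id"]),
        fetch_device_reputation(txn["device_id"]),
    )
    amounts = [t["amount"] for t in recent]
    avg_amount = sum(amounts) / len(amounts) if amounts else 0.0
    return {
        "amount": txn["amount"],
        "avg_recent_amount": avg_amount,
        "account_tenure_days": tenure,
        "device_reputation": reputation,
    }


features = asyncio.run(
    enrich({"user_id": "u1", "device_id": "d1", "amount": 100.0})
)
```

Because the lookups run concurrently, the enrichment step costs roughly as much as the slowest lookup rather than the sum of all three, which is the point of the pattern.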

Data Storage: A Three-Tiered Approach

A single database can't efficiently serve all needs.

PostgreSQL: The source of truth for structured, relational data. It holds user accounts, case management details (the notes and decisions made by investigators), audit logs, and merchant information. Its transactional integrity is non-negotiable here.
Bigtable/Cassandra: For high-throughput, time-series data. This is where you store raw transaction logs and pre-computed user features (e.g., `transaction_count_last_1h`). The data model is key: in Bigtable, rows are sorted by key, so a row key like `user_id#reverse_timestamp` puts a user's newest events first and turns "most recent events for this user" into a short prefix scan. (In Cassandra the equivalent would be a partition key on the user with a descending timestamp clustering column.) This is the "hot" data store for the real-time scoring engine.
BigQuery/Snowflake: The analytical warehouse. All transactions, enriched features, and investigator decisions are streamed here. This is where data scientists explore patterns, build new features, and train the next generation of ML models without impacting the production transaction path.
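
The `user_id#reverse_timestamp` row key mentioned for the hot store can be sketched in a few lines. This is an illustration under assumptions, not a schema from a real deployment: the millisecond resolution, the 13-digit ceiling `MAX_MILLIS`, and the function name are all choices made for the example. The idea is that subtracting the event time from a fixed maximum and zero-padding the result makes newer events sort first under lexicographic key ordering.

```python
# Assumed ceiling: the largest 13-digit millisecond timestamp.
MAX_MILLIS = 10**13 - 1


def recent_first_row_key(user_id, event_millis):
    """Build a key whose lexicographic order lists a user's newest events first."""
    if "#" in user_id:
        raise ValueError("user_id must not contain the '#' separator")
    if not 0 <= event_millis <= MAX_MILLIS:
        raise ValueError("event_millis out of range")
    reverse = MAX_MILLIS - event_millis
    # Zero-pad to a fixed width so string order matches numeric order.
    return f"{user_id}#{reverse:013d}"


older = recent_first_row_key("user42", 1_700_000_000_000)
newer = recent_first_row_key("user42", 1_700_000_060_000)
# Sorting the keys as strings puts the newer event ahead of the older one.
ordered = sorted([older, newer])
```

With keys like these, reading the first few rows under the prefix `user42#` returns that user's latest events without scanning their full history.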

ML & Infrastructure

Model training and experimentation would live in the Python ecosystem (PyTorch, scikit-learn). Trained models are then exported to a format like ONNX for high-performance inference, whether that runs behind the model endpoint shown in the data flow or in-process in the Go/Java service to save a network hop. All of this infrastructure (databases, services, networking rules) should be defined declaratively using Terraform for reproducibility.

Pragmatism, Tradeoffs, and the Human in the Loop

Building a system like this is an exercise in managing tradeoffs. A senior engineer's role is not just to pick the "best" tech but to make the right compromises.

The central challenge is not just detecting fraud, but making the detection explainable and actionable for a human investigator. A risk score of 99.8% gives an investigator little to act on without the "why."

Explainability is a Feature: The system must not return a simple score. It must return the score *and* the top contributing features. The React dashboard shouldn't just say "High Risk." It should say "High Risk because: transaction amount is 50x user average, shipping address is new, and device IP is from a high-risk region." This empowers the investigator to make a faster, more accurate decision.
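
One simple way to produce "the score and the top contributing features" is sketched below. It assumes a linear model, where each feature's contribution is its weight times its deviation from a baseline value; that assumption, the feature names, the weights, and the baselines are all invented for illustration. A tree ensemble or neural network would need a dedicated attribution method instead.

```python
def top_contributors(weights, features, baseline, k=3):
    """Return the k features that push the score up the most.

    Assumes a linear model: contribution = weight * (value - baseline).
    """
    contributions = {
        name: weights[name] * (features[name] - baseline[name])
        for name in weights
    }
    # Keep only risk-increasing features, largest contribution first.
    positive = [(n, c) for n, c in contributions.items() if c > 0]
    positive.sort(key=lambda item: item[1], reverse=True)
    return positive[:k]


# Hypothetical weights and values, chosen only to exercise the function.
weights = {
    "amount_vs_user_avg": 0.08,
    "shipping_address_is_new": 1.2,
    "ip_region_risk": 2.0,
    "account_tenure_years": -0.3,
}
features = {
    "amount_vs_user_avg": 50.0,
    "shipping_address_is_new": 1.0,
    "ip_region_risk": 0.9,
    "account_tenure_years": 4.0,
}
baseline = {
    "amount_vs_user_avg": 1.0,
    "shipping_address_is_new": 0.1,
    "ip_region_risk": 0.2,
    "account_tenure_years": 2.0,
}

reasons = [name for name, _ in top_contributors(weights, features, baseline)]
```

The dashboard would then map each returned feature name to a human-readable sentence, which keeps the wording under the control of the UI rather than the model.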

The Human Feedback Loop: This is the most critical part of the architecture. When an investigator clicks "Confirm Fraud" or "Not Fraud" in the dashboard, that action must trigger an event. This event, a piece of high-quality, human-labeled data, is the most valuable asset the system can generate. It gets fed back into BigQuery, where it's used to retrain and validate the models. A system without this feedback loop is static and will inevitably be defeated.
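
As a rough sketch of what that event might carry, the snippet below builds a decision record and serializes it for a message topic. Every field name here is an assumption made for illustration. The one design point worth noting is that the record keeps the model version and score the investigator saw, so a later training job can tell which model the label is correcting.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

ALLOWED_LABELS = {"fraud", "not_fraud"}


@dataclass(frozen=True)
class InvestigatorDecision:
    """Hypothetical shape of a human-labeled feedback event."""
    transaction_id: str
    investigator_id: str
    label: str
    model_version: str
    risk_score: float
    decided_at: str  # ISO 8601 timestamp in UTC


def build_decision_event(transaction_id, investigator_id, label,
                         model_version, risk_score, now=None):
    """Validate the label and stamp the decision time."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    decided_at = (now or datetime.now(timezone.utc)).isoformat()
    return InvestigatorDecision(
        transaction_id, investigator_id, label,
        model_version, risk_score, decided_at,
    )


def to_message(event):
    """Serialize the event as UTF-8 JSON, ready to publish to a topic."""
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")
```

Rejecting unknown labels at the point of creation is a small guard, but it matters here: a mislabeled row in the training set is harder to find later than a rejected click is to retry.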

Where It Breaks at Scale:

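Database Hotspots: Rows for one user or merchant sit next to each other under a key like `user_id#reverse_timestamp`, so a single high-volume entity could concentrate traffic on one hot partition in Bigtable. Careful row key design is essential, for example spreading a known-hot entity across several key prefixes.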
Model Latency: A complex model can slow down the entire transaction pipeline. There's a constant tension between model accuracy and inference speed. Sometimes a simpler, faster model is better.
State Management: On the frontend, managing the state of hundreds of streaming alerts without performance degradation is a challenge. Virtualized lists and efficient state management libraries (like Zustand or Jotai) are crucial.
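
One way to handle the model-latency tension listed above is to give the complex model a time budget and fall back to something cheaper when it overruns. The sketch below shows that idea with `asyncio.wait_for`; the function name, the budget value, and the fallback scorer are hypothetical, and whether a fallback score is acceptable at all is a business decision rather than a technical one.

```python
import asyncio


async def score_with_budget(features, primary, fallback, budget_seconds):
    """Try the primary model within the budget; otherwise use the fallback.

    `primary` is an async callable, `fallback` a fast synchronous one.
    Returns (score, source) so downstream consumers know which path ran.
    """
    try:
        score = await asyncio.wait_for(primary(features), timeout=budget_seconds)
        return score, "primary"
    except asyncio.TimeoutError:
        # The primary call is cancelled by wait_for; answer with the cheap path.
        return fallback(features), "fallback"
```

Returning the source alongside the score is worth the extra field: it lets the analytics side measure how often the budget is blown, which is the data needed to decide whether a simpler, faster model would serve better.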

Closing Reflection

Engineering for fraud prevention is a systems problem that extends beyond pure code. It's about creating a tight, symbiotic loop between automated analysis and human expertise. The goal isn't to build a perfect, all-seeing AI that obviates the need for people. Instead, the goal is to build a system that acts as a force multiplier for human investigators, allowing them to focus their expertise on the most ambiguous and complex cases. The architecture must serve this primary goal: to augment, not replace, human intelligence in a domain where the context and consequences are profoundly human.