Building a Personalized E-commerce Search Demo

Patrick Donahue · Levelbrook Consulting

E-commerce search seems like a solved problem. A text field, a database, a LIKE query, and you're done. But this view dissolves upon inspection. The gap between a trivial implementation and one that meaningfully connects users to products is a fascinating, multi-disciplinary engineering challenge. It sits at the intersection of information retrieval, machine learning, distributed systems, and user experience design.

This is a write-up of notes and architectural thoughts that went into building a proof-of-concept for a personalized e-commerce search results page. The goal was to explore the core components and their interactions, from the data model to the real-time feedback loops that drive relevance.

The Domain: More Than String Matching

The core problem is intent translation. A user types "summer dress," and the system must translate this ambiguous, high-level concept into a ranked list of specific products. This is hard for several reasons:

Vocabulary Mismatch: Users don't use the same terminology as product catalogs. They search for "warm jacket," but the product data has attributes like `material: "down"`, `insulation_rating: "800-fill"`. The system must bridge this semantic gap.
Intent Ambiguity: A query can be navigational ("brand_name shoes"), informational ("how to clean leather boots"), or transactional ("buy men's running shoes size 11"). A search system must primarily serve the transactional intent while gracefully handling the others.
The Cold Start Problem: How do you personalize results for a first-time visitor? What's the best ranking for a newly listed product with no interaction data?
Balancing Act: The final ranking is never purely about relevance. It's a weighted function of relevance, personalization, business priorities (e.g., promoting high-margin items), and diversity (avoiding a page full of near-identical products).

This complexity makes it an enduringly interesting space. It’s not about finding a single "correct" algorithm, but about building a flexible system that can balance these competing concerns.
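To make the vocabulary mismatch concrete, here is a minimal sketch of one way to bridge it: a hand-maintained table that rewrites shopper phrases into catalog attribute filters. The table contents, the attribute names, and the `expand_query` function are all assumptions for illustration, not part of the proof-of-concept. A real system would more likely learn these mappings from click data or use embeddings.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from shopper vocabulary to catalog attribute filters.
PHRASE_TO_FILTERS: Dict[str, Dict[str, List[str]]] = {
    "warm": {"material": ["down", "wool"]},
    "waterproof": {"material": ["gore-tex"]},
}


def expand_query(query: str) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split a query into free-text terms and structured attribute filters."""
    text_terms: List[str] = []
    filters: Dict[str, List[str]] = {}
    for token in query.lower().split():
        mapped = PHRASE_TO_FILTERS.get(token)
        if mapped is None:
            text_terms.append(token)  # no mapping: keep as a plain text term
            continue
        for attribute, values in mapped.items():
            # Build a fresh list per query so the shared table is never mutated.
            filters.setdefault(attribute, []).extend(values)
    return text_terms, filters


terms, filters = expand_query("warm jacket")
print(terms, filters)
```

Here "jacket" stays a text term while "warm" becomes a filter on the `material` attribute, which the retrieval layer can apply as a filter or a soft boost.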

An Architectural Sketch

To deliver a personalized search experience in under 100 ms, a single monolith is hard to tune. A service-oriented architecture is a natural fit, because each specialized component can be scaled and optimized for its own task. Here’s a high-level breakdown.

Data Model and Flow

At the heart of the system is the data. We have three primary streams:

Product Catalog: The source of truth for products, attributes, and inventory. Often lives in a relational database like Postgres, but is denormalized and pushed into a dedicated search index.
User Events: The clickstream. Every view, click, add-to-cart, and purchase. This is high-volume, real-time data, well suited to a streaming pipeline such as Amazon Kinesis or Apache Kafka.
User Profile: A slower-moving dataset containing user segments, historical purchases, and derived embeddings that represent a user's taste profile.
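The three streams could be modeled roughly as follows. This is a sketch under assumptions: the field names, the `attr_` prefix, and the `to_search_document` helper are hypothetical and only illustrate the denormalization step, not the actual schema used in the proof-of-concept.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Product:
    product_id: str
    title: str
    category: str
    attributes: Dict[str, str] = field(default_factory=dict)
    in_stock: bool = True


@dataclass
class UserEvent:
    user_id: str
    product_id: str
    event_type: str  # "view", "click", "add_to_cart", or "purchase"
    timestamp_ms: int


@dataclass
class UserProfile:
    user_id: str
    segments: List[str] = field(default_factory=list)
    purchased_product_ids: List[str] = field(default_factory=list)
    embedding: List[float] = field(default_factory=list)


def to_search_document(product: Product) -> Dict[str, object]:
    """Flatten a catalog row into the denormalized document pushed to the index."""
    doc: Dict[str, object] = {
        "product_id": product.product_id,
        "title": product.title,
        "category": product.category,
        "in_stock": product.in_stock,
    }
    # Prefix attributes so they cannot collide with the top-level fields.
    for key, value in product.attributes.items():
        doc[f"attr_{key}"] = value
    return doc


shoe = Product("p1", "Trail Runner", "shoes", {"color": "blue"})
print(to_search_document(shoe))
```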

When a user executes a search for "blue running shoes":


Client ----> API Gateway
              |
              +-- (user_id, "blue running shoes") --> Search Orchestrator
                                |
          +---------------------+---------------------+
          | (query)             | (user_id)           | (business_rules)
          v                     v                     v
    Retrieval Service     Personalization Service   Business Logic
    (Elasticsearch/Rust)  (Python/ML Model)         (Promotions, etc.)
          |                     |
          | (candidate_set)     | (boost_scores)
          +-----------> Re-ranking Logic <------------+
                                |
                                v (final ranked list)
                              Client
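The fan-out in the diagram can be sketched with `asyncio.gather`, so that total latency is roughly that of the slowest downstream call rather than the sum of all three. The three service functions below are stand-ins with hard-coded results, and their names and return shapes are assumptions. A variant in which personalization scores only the retrieved candidates would have to run after retrieval instead of alongside it.

```python
import asyncio
from typing import Dict, List


async def retrieve_candidates(query: str) -> List[str]:
    """Stand-in for the Retrieval Service call."""
    await asyncio.sleep(0.01)  # simulated network latency
    return ["p1", "p2", "p3"]


async def personalization_boosts(user_id: str) -> Dict[str, float]:
    """Stand-in for the Personalization Service call."""
    await asyncio.sleep(0.01)
    return {"p3": 0.5}


async def business_boosts() -> Dict[str, float]:
    """Stand-in for the Business Logic call (promotions, etc.)."""
    await asyncio.sleep(0.01)
    return {"p2": 0.2}


async def search(user_id: str, query: str) -> List[str]:
    # Run the three calls concurrently instead of one after another.
    candidates, personal, business = await asyncio.gather(
        retrieve_candidates(query),
        personalization_boosts(user_id),
        business_boosts(),
    )

    def score(product_id: str) -> float:
        return personal.get(product_id, 0.0) + business.get(product_id, 0.0)

    # sorted() is stable, so unboosted items keep their retrieval order.
    return sorted(candidates, key=score, reverse=True)


print(asyncio.run(search("u1", "blue running shoes")))
```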

Core Services

Retrieval Service: The first stage. Its job is to quickly find all potentially relevant documents. This is where tools like Elasticsearch or OpenSearch excel. For extreme performance or custom logic, a core retrieval engine could be built directly on a search library: Tantivy if the service is written in Rust, or Apache Lucene (a Java library, and the engine underneath Elasticsearch and OpenSearch) on the JVM. This service handles tokenization, filtering (e.g., `category: "shoes"`), and faceting. It returns a candidate set of, say, the top 500 most relevant product IDs, not the final ranked list.
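As a rough illustration of the retrieval stage, the function below builds an Elasticsearch-style search request body that combines a full-text match, a category filter, and one facet. The index field names (`title`, `category`, `brand`) and the `build_candidate_query` helper are assumptions, and the sketch presumes `category` and `brand` are mapped as keyword fields.

```python
from typing import Any, Dict


def build_candidate_query(text: str, category: str, size: int = 500) -> Dict[str, Any]:
    """Build a search request body that returns candidate IDs plus one facet."""
    return {
        "size": size,        # candidate set size, not the final page size
        "_source": False,    # only document IDs are needed at this stage
        "query": {
            "bool": {
                # Scored full-text match on the (assumed) title field.
                "must": [{"match": {"title": {"query": text}}}],
                # Filters restrict the result set without affecting the score.
                "filter": [{"term": {"category": category}}],
            }
        },
        # A terms aggregation provides the counts shown in a brand facet.
        "aggs": {"brands": {"terms": {"field": "brand"}}},
    }


body = build_candidate_query("blue running shoes", "shoes")
print(body["query"])
```

Putting the category constraint in `filter` rather than `must` keeps it out of relevance scoring and lets the engine cache it.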

Personalization Service: This is where the magic happens, and it's a natural home for Python with its rich ML ecosystem. This service takes a `user_id` and the candidate set of `product_ids`. It might:

Fetch the user's pre-computed embedding from a fast key-value store like Redis.
Calculate a similarity score between the user's embedding and the embeddings of the candidate products.
Return a set of boost scores or an ordered list of product IDs.

The models powering this are trained offline on the event stream data with frameworks like PyTorch or TensorFlow, and the training jobs can be triggered from CI tools like Jenkins or GitHub Actions.
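The similarity step could look something like the sketch below, which uses cosine similarity as the scoring function. That choice is an assumption, since the notes above do not fix a particular measure. The in-memory dictionary stands in for a key-value store such as Redis, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, or 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero embedding as "no signal"
    return dot / (norm_a * norm_b)


def boost_scores(
    user_embedding: Sequence[float],
    product_embeddings: Dict[str, Sequence[float]],
    candidate_ids: List[str],
) -> Dict[str, float]:
    """Score each candidate against the user's taste vector."""
    boosts: Dict[str, float] = {}
    for product_id in candidate_ids:
        embedding = product_embeddings.get(product_id)
        # Products without an embedding (e.g. new listings) get a neutral boost.
        boosts[product_id] = (
            cosine_similarity(user_embedding, embedding) if embedding is not None else 0.0
        )
    return boosts


# Stand-in for embeddings fetched from a key-value store.
store = {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}
print(boost_scores([1.0, 0.0], store, ["p1", "p2", "p3"]))
```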

Search Orchestrator / Re-ranker: This service, which could be part of a larger API Gateway, fuses the signals. It takes the candidate set from Retrieval and the personalization scores from Personalization, applies business rules (e.g., "boost items on sale"), and produces the final, ordered list of products sent to the user. This is a critical component for latency; it must be lightweight and fast.
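One simple way to fuse the signals is a weighted sum, sketched below. The weights, the flat sale boost, and the `rerank` signature are illustrative assumptions. In practice the weights would be tuned through experiments, and a learned ranking model could replace the linear blend.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Weights:
    relevance: float = 1.0
    personalization: float = 0.5
    sale_boost: float = 0.2  # flat bonus for the "boost items on sale" rule


def rerank(
    relevance: Dict[str, float],
    personalization: Dict[str, float],
    on_sale: Set[str],
    weights: Weights = Weights(),
    top_k: int = 10,
) -> List[str]:
    """Blend retrieval relevance, personalization, and a business rule."""

    def final_score(product_id: str) -> float:
        score = weights.relevance * relevance[product_id]
        # Candidates with no personalization signal simply get no boost.
        score += weights.personalization * personalization.get(product_id, 0.0)
        if product_id in on_sale:
            score += weights.sale_boost
        return score

    ranked = sorted(relevance, key=final_score, reverse=True)
    return ranked[:top_k]


print(
    rerank(
        relevance={"p1": 0.9, "p2": 0.8, "p3": 0.7},
        personalization={"p3": 0.8},
        on_sale={"p2"},
    )
)
```

In this example the most textually relevant product, p1, ends up last: p3 is lifted by personalization and p2 by the sale rule.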

The User Experience

On the front end, a modern framework like React (with a build tool such as Vite) provides the reactivity needed for features like instant facet updates without full page reloads. While the initial search result is a standard request-response, subsequent interactions can be enhanced. For instance, a "trending searches" component could be powered by Server-Sent Events (SSE), or real-time inventory updates could be pushed via WebSockets. My work on real-time UIs with Hotwire and Turbo Streams in the Rails world informs this thinking: the goal is to make the page feel alive and responsive to both user actions and underlying data changes.

Pragmatism, Scale, and the Human in the Loop

An architecture diagram is a clean abstraction. Reality is messy. Here’s where things get difficult at scale:

The "Head vs. Tail": Personalization models work well for popular products with lots of data (the "head"). They struggle with new or niche items (the "long tail"). The system must gracefully degrade, perhaps by falling back to content-based similarity (matching product attributes) when interaction data is sparse.
Feedback Loops: A model that promotes popular items will see those items get more clicks, generating more data that reinforces their popularity. This can create a "rich get richer" problem, starving new products of visibility. Solving this requires injecting randomness or explicitly boosting for "exploration."
Observability is Non-Negotiable: You can't fly blind. A robust monitoring stack using Prometheus for metrics and Grafana for dashboards is essential. Key metrics to track include P95 query latency, click-through rate (CTR) by rank position, and zero-result search rates. Alarms on these metrics are the system's immune response.
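The "injecting randomness" idea from the feedback-loop point can be sketched as an epsilon-greedy exploration slot: with a small probability, one under-exposed product is inserted at a fixed position in the results. The function name, the fixed-slot policy, and the parameters are assumptions. Production systems often use more principled bandit methods.

```python
import random
from typing import List, Optional


def inject_exploration(
    ranked: List[str],
    exploration_pool: List[str],
    slot: int,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """With probability epsilon, insert one under-exposed item at `slot`."""
    rng = rng or random.Random()
    # Only consider items that are not already on the page.
    candidates = [pid for pid in exploration_pool if pid not in ranked]
    if not candidates or rng.random() >= epsilon:
        return list(ranked)  # exploit: leave the ranking untouched
    chosen = rng.choice(candidates)
    result = list(ranked)
    result.insert(min(slot, len(result)), chosen)
    return result


page = inject_exploration(["p1", "p2"], ["new1"], slot=1, epsilon=1.0)
print(page)
```

Logging which impressions came from exploration matters here, so that click-through on those slots can be analyzed separately from organic results.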

The most critical lesson from shipping complex systems is that full automation is a myth. The most effective systems are not black boxes; they are tool-assisted workflows.

This leads to the most important pragmatic tradeoff: building a human-in-the-loop system. Merchandisers and business analysts have invaluable domain knowledge, and the search system should not be imposed on them as an opaque artifact. It should provide levers for them to pull: a dashboard to manually boost certain products, a rules engine to curate results for high-value queries (e.g., "Mother's Day gifts"), and an A/B testing framework to rigorously evaluate changes to the ranking algorithm. Building these internal tools is just as important as optimizing the core ranking function.
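For the A/B testing lever, a common building block is deterministic bucketing: hashing the user ID together with an experiment name so that a user always sees the same variant without any stored state. The sketch below assumes equal-sized variants. The `assign_variant` function and the experiment name are hypothetical.

```python
import hashlib
from typing import Sequence


def assign_variant(
    user_id: str,
    experiment: str,
    variants: Sequence[str] = ("control", "treatment"),
) -> str:
    """Map a user to a variant deterministically and (roughly) uniformly."""
    # Including the experiment name gives each experiment an independent split.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


variant = assign_variant("user-42", "ranker_v2")
print(variant)
```

The orchestrator can then pick ranking weights per variant and tag logged events with the variant name for later analysis.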

Closing Reflection

Building a great search experience is not a one-off project but a continuous process of refinement. It’s a microcosm of modern product engineering, requiring a tight integration of data science, low-latency backend systems, and thoughtful frontend design. The challenge lies not just in the individual components, but in orchestrating them to create a system that is simultaneously fast, relevant, and steerable. The problem space is deep, and for an engineer, that's the best kind of problem to have.