Agentic Product Catalog

Voice + Natural Language Product Search + AI Chat Assistant

Powered by Gemini 2.0 Flash, OpenAI Whisper & GPT, FastAPI + In-Memory Vector Store, Model Context Protocol, and RAG (Retrieval-Augmented Generation)

Architecture Flow

1. User Input: Voice or Text Query
2. Next.js Host: Whisper Transcription + MCP Host
3. Gemini 2.0 Flash: Function Calling + Tool Selection
4. FastAPI Backend: MCP Tool Server + Vector Search + RAG

Key Features

🎤 Voice & Text Input

Search for products with either spoken commands (transcribed by OpenAI Whisper) or typed text.

🤖 Gemini 2.0 Integration

Uses the Gemini 2.0 Flash model for intelligent function calling and natural language understanding.

🔍 In-Memory Vector Search

Fast in-memory semantic search using OpenAI embeddings with cosine similarity for natural language product discovery.
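The core of such an in-memory store is plain cosine similarity over embedding vectors. A minimal sketch (the toy 3-dimensional vectors and product names below are illustrative; the real store would hold 1536-dimensional ada-002 embeddings):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def semantic_search(query_vec, index, top_k=3):
    """Rank (product, vector) pairs in `index` by similarity to `query_vec`."""
    scored = [(cosine_similarity(query_vec, vec), product) for product, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional "embeddings" standing in for real OpenAI embeddings.
index = [
    ("red hoodie", [0.9, 0.1, 0.0]),
    ("wool scarf", [0.1, 0.9, 0.2]),
    ("blue jeans", [0.2, 0.2, 0.9]),
]
results = semantic_search([0.8, 0.2, 0.1], index, top_k=2)
```

Because the whole index lives in a Python list, a linear scan like this stays in the tens of milliseconds for a catalog-sized dataset, with no external database.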

📡 Model Context Protocol

JSON-RPC 2.0 over HTTP with automatic tool discovery via the /.well-known/mcp.json endpoint for robust backend integration.

🧠 RAG-Powered AI Assistant

Conversational AI assistant combining semantic search with GPT models for natural language product recommendations and advice.
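The retrieve-then-generate pattern behind the assistant can be sketched as follows. The `retrieve` and `llm` callables are injected so the sketch stays self-contained and testable; in this project they would be the in-memory vector store and an OpenAI chat model, and all names here are illustrative:

```python
def rag_answer(query, retrieve, llm, top_k=3):
    """Retrieve-then-generate: ground the LLM's answer in retrieved products.

    `retrieve` returns (score, product_text) pairs; `llm` maps a prompt
    string to a completion string.
    """
    hits = retrieve(query, top_k)
    context = "\n".join(f"- {text} (score {score:.2f})" for score, text in hits)
    prompt = (
        "You are a product assistant. Answer using only this catalog context:\n"
        f"{context}\n\nCustomer question: {query}"
    )
    return llm(prompt), hits

# Stubs standing in for the vector store and GPT:
fake_retrieve = lambda q, k: [(0.91, "fleece hoodie, $39"), (0.84, "knit sweater, $55")][:k]
fake_llm = lambda prompt: "The fleece hoodie is the most comfortable option under $40."
answer, sources = rag_answer("comfortable weekend wear", fake_retrieve, fake_llm)
```

Returning the retrieved hits alongside the generated answer is what lets the UI show relevant products next to the AI explanation.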

🚀 Vercel Optimized

Lightweight ~25MB bundle with no external vector database, fitting comfortably under Vercel's 250MB serverless function limit.

Available MCP Tools

📦 list_products

Returns all products in the catalog with complete metadata.

Triggers: "show all products", "what's available"

🔍 search_products

Filters products by specific attributes: color, city, and price range.

Triggers: "red items under $50", "products in Portland"

🤖 semantic_product_search

In-memory vector similarity search for natural language queries with scoring.

Triggers: "comfortable hoodies", "warm winter clothing"

💬 rag_query

AI-powered conversational assistant with product recommendations and explanations.

Triggers: "I need help choosing", "recommend something"
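Each of these tools is invoked through a JSON-RPC 2.0 envelope POSTed to the backend's /mcp endpoint. A hedged sketch of such a request, assuming the conventional MCP `tools/call` method (the exact method name and argument shape depend on the protocol revision the server implements):

```python
import json

def mcp_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 envelope for invoking an MCP tool.

    Assumes the conventional MCP `tools/call` method; field names may
    differ for other MCP protocol revisions.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

payload = mcp_call("search_products", {"color": "red", "max_price": 50})
body = json.dumps(payload)  # this string would be POSTed to /mcp
```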

RAG vs Semantic Search

🔍 Semantic Search

Best For:

  • Direct product discovery
  • Fast browsing experience
  • Simple similarity matching
  • Cost-sensitive applications

Response: Ranked product list with similarity scores

Speed: ~50ms

💬 RAG Assistant

Best For:

  • Conversational experience
  • Complex queries requiring reasoning
  • Personalized recommendations
  • Customer support scenarios

Response: AI-generated explanations with product context

Speed: ~1-3s

API Endpoints

🐍 Backend (FastAPI Tool Server)

⚛️ Frontend (Next.js MCP Host)

  • POST /api/chat (Gemini Orchestration)
  • POST /api/transcribe (Whisper)
  • POST /api/rag (RAG Proxy)
Note: Frontend API routes are internal serverless functions that handle LLM orchestration, speech transcription, and RAG proxying.
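For instance, a client-side caller would hit /api/chat with a JSON body. The request shape below is illustrative only; the actual schema is defined by the project's frontend route handlers:

```python
import json
import urllib.request

def build_chat_request(base_url, user_message):
    """Build (but don't send) a POST to the Next.js /api/chat route.

    The JSON body shape here is an assumption for illustration; check
    the route handler for the real schema.
    """
    data = json.dumps({"message": user_message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:3000", "show me red hoodies under $40")
# urllib.request.urlopen(req) would dispatch it; omitted here.
```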

How It Works

  1. User Input: The user speaks or types a product search query. Voice input is transcribed by OpenAI Whisper via /api/transcribe.

  2. MCP Discovery: The frontend fetches the available tools from the backend's /.well-known/mcp.json and converts them to Gemini function declarations.

  3. Gemini Function Calling: Via /api/chat, Gemini 2.0 Flash analyzes query intent and automatically selects the best tool: structured filtering, semantic search, RAG conversation, or product listing.

  4. Backend Tool Execution: The backend receives a JSON-RPC 2.0 request at /mcp and executes the chosen function: attribute filtering, in-memory vector search, RAG conversation, or simple listing.

  5. Result Synthesis: The frontend sends the tool results back to Gemini for natural language summarization, then displays products with similarity scores, tool indicators, and rich metadata. RAG responses include AI-generated explanations alongside relevant products.
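Step 2 above, the discovery-to-declaration conversion, can be sketched as a plain mapping. Field names on both sides are illustrative: real /.well-known/mcp.json entries and the @google/genai declaration schema may differ in detail, but the mapping of name, description, and JSON-schema parameters is the core of it:

```python
def to_gemini_declarations(mcp_tools):
    """Convert MCP tool descriptors into Gemini-style function declarations.

    Assumes MCP-style `name`/`description`/`inputSchema` fields; adjust
    to the actual discovery document's shape.
    """
    return [
        {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        }
        for tool in mcp_tools
    ]

mcp_tools = [
    {
        "name": "search_products",
        "description": "Filter products by color, city, or price range.",
        "inputSchema": {
            "type": "object",
            "properties": {"color": {"type": "string"}, "max_price": {"type": "number"}},
        },
    }
]
declarations = to_gemini_declarations(mcp_tools)
```

Because the tool list is fetched at runtime, adding a tool to the backend makes it available to Gemini without any frontend code change.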

Technology Stack

Frontend (MCP Host)

  • ⚛️ Next.js 14: React Framework + API Routes
  • TypeScript 5: Type Safety
  • 🤖 Gemini 2.0 Flash: @google/genai
  • 🎨 Tailwind CSS: Utility-First Styling

Backend (MCP Tool Server)

  • FastAPI: Async Python Web Framework
  • 🐍 Python 3.13: Runtime Environment
  • 🔗 LangChain: OpenAI Embeddings Integration
  • 🧠 In-Memory Store: Pure Python Vector Search
  • 💬 RAG System: OpenAI GPT + Vector Context

AI & ML Services

  • 🎤 OpenAI Whisper: whisper-1 Speech-to-Text
  • 🔍 OpenAI Embeddings: text-embedding-ada-002
  • 🤖 Google Gemini: 2.0 Flash Function Calling
  • 💬 OpenAI GPT: gpt-3.5-turbo RAG Generation

DevOps & Infrastructure

  • 🐳 Docker: Containerization
  • 📦 Poetry: Python Dependencies
  • 🌐 JSON-RPC 2.0: Tool Invocation Protocol
  • 📡 MCP: Model Context Protocol

Performance Metrics

  • Vector Search: ~50ms (pure Python cosine similarity response time)
  • Voice Transcription: ~1-3s (OpenAI Whisper speech-to-text processing)
  • RAG Response: ~1-3s (end-to-end retrieval + generation time)
  • Bundle Size: ~25MB (Vercel-optimized serverless deployment)

Use Cases & Examples

🔍 Quick Product Search

Query: "Show me red hoodies under $40"

Response: Filtered product list with exact matches

💬 Conversational Assistance

Query: "I need something comfortable for weekend casual wear"

Response: AI-generated recommendations with explanations