Agentic Product Catalog

Voice + Natural Language Product Search + AI Chat Assistant

Powered by Gemini 2.0 Flash, OpenAI Whisper & GPT, FastAPI + In-Memory Vector Store, Model Context Protocol, and RAG (Retrieval-Augmented Generation)

Architecture Flow

1. User Input: Voice or Text Query
2. Next.js Host: Whisper Transcription + MCP Host
3. Gemini 2.0 Flash: Function Calling + Tool Selection
4. FastAPI Backend: MCP Tool Server + Vector Search + RAG

Key Features

🎤 Voice & Text Input

Search for products with either spoken commands (transcribed by OpenAI Whisper) or typed text.

🤖 Gemini 2.0 Integration

Uses the Gemini 2.0 Flash model for intelligent function calling and natural language understanding.

🔍 In-Memory Vector Search

Fast in-memory semantic search using OpenAI embeddings with cosine similarity for natural language product discovery.
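The core of such an in-memory store is plain cosine similarity over embedding vectors. A minimal sketch (the toy 3-dimensional vectors and product names below are illustrative; the real store would hold 1536-dimensional ada-002 embeddings):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def semantic_search(query_vec, index, top_k=3):
    """Rank (product, vector) pairs in `index` by similarity to `query_vec`."""
    scored = [(cosine_similarity(query_vec, vec), product) for product, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional "embeddings" standing in for real OpenAI embeddings.
index = [
    ("red hoodie", [0.9, 0.1, 0.0]),
    ("wool scarf", [0.1, 0.9, 0.2]),
    ("blue jeans", [0.2, 0.2, 0.9]),
]
results = semantic_search([0.8, 0.2, 0.1], index, top_k=2)
```

Because the whole index lives in a Python list, a linear scan like this stays in the tens of milliseconds for a catalog-sized dataset, with no external database.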

📡 Model Context Protocol

JSON-RPC 2.0 over HTTP with automatic tool discovery via the /.well-known/mcp.json endpoint for robust backend integration.

🧠 RAG-Powered AI Assistant

Conversational AI assistant combining semantic search with GPT models for natural language product recommendations and advice.
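The retrieve-then-generate pattern behind the assistant can be sketched as follows. The `retrieve` and `llm` callables are injected so the sketch stays self-contained and testable; in this project they would be the in-memory vector store and an OpenAI chat model, and all names here are illustrative:

```python
def rag_answer(query, retrieve, llm, top_k=3):
    """Retrieve-then-generate: ground the LLM's answer in retrieved products.

    `retrieve` returns (score, product_text) pairs; `llm` maps a prompt
    string to a completion string.
    """
    hits = retrieve(query, top_k)
    context = "\n".join(f"- {text} (score {score:.2f})" for score, text in hits)
    prompt = (
        "You are a product assistant. Answer using only this catalog context:\n"
        f"{context}\n\nCustomer question: {query}"
    )
    return llm(prompt), hits

# Stubs standing in for the vector store and GPT:
fake_retrieve = lambda q, k: [(0.91, "fleece hoodie, $39"), (0.84, "knit sweater, $55")][:k]
fake_llm = lambda prompt: "The fleece hoodie is the most comfortable option under $40."
answer, sources = rag_answer("comfortable weekend wear", fake_retrieve, fake_llm)
```

Returning the retrieved hits alongside the generated answer is what lets the UI show relevant products next to the AI explanation.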

🚀 Vercel Optimized

Lightweight ~25MB bundle with no external vector database, fitting comfortably under Vercel's 250MB serverless function limit.

Available MCP Tools

📦 list_products

Returns all products in the catalog with complete metadata.

Triggers: "show all products", "what's available"

🔍 search_products

Filters products by specific attributes: color, city, and price range.

Triggers: "red items under $50", "products in Portland"

🤖 semantic_product_search

In-memory vector similarity search for natural language queries with scoring.

Triggers: "comfortable hoodies", "warm winter clothing"

💬 rag_query

AI-powered conversational assistant with product recommendations and explanations.

Triggers: "I need help choosing", "recommend something"
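Each of these tools is invoked through a JSON-RPC 2.0 envelope POSTed to the backend's /mcp endpoint. A hedged sketch of such a request, assuming the conventional MCP `tools/call` method (the exact method name and argument shape depend on the protocol revision the server implements):

```python
import json

def mcp_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 envelope for invoking an MCP tool.

    Assumes the conventional MCP `tools/call` method; field names may
    differ for other MCP protocol revisions.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

payload = mcp_call("search_products", {"color": "red", "max_price": 50})
body = json.dumps(payload)  # this string would be POSTed to /mcp
```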

RAG vs Semantic Search

🔍 Semantic Search

Best For:

  • Direct product discovery
  • Fast browsing experience
  • Simple similarity matching
  • Cost-sensitive applications

Response: Ranked product list with similarity scores

Speed: ~50ms

💬 RAG Assistant

Best For:

  • Conversational experience
  • Complex queries requiring reasoning
  • Personalized recommendations
  • Customer support scenarios

Response: AI-generated explanations with product context

Speed: ~1-3s

API Endpoints

🐍 Backend (FastAPI Tool Server)

⚛️ Frontend (Next.js MCP Host)

  • POST /api/chat (Gemini Orchestration)
  • POST /api/transcribe (Whisper)
  • POST /api/rag (RAG Proxy)
Note: Frontend API routes are internal serverless functions that handle LLM orchestration, speech transcription, and RAG proxying.
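For instance, a client-side caller would hit /api/chat with a JSON body. The request shape below is illustrative only; the actual schema is defined by the project's frontend route handlers:

```python
import json
import urllib.request

def build_chat_request(base_url, user_message):
    """Build (but don't send) a POST to the Next.js /api/chat route.

    The JSON body shape here is an assumption for illustration; check
    the route handler for the real schema.
    """
    data = json.dumps({"message": user_message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:3000", "show me red hoodies under $40")
# urllib.request.urlopen(req) would dispatch it; omitted here.
```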

How It Works

  1. User Input: The user speaks or types a product search query. Voice input is transcribed by OpenAI Whisper via /api/transcribe.

  2. MCP Discovery: The frontend fetches the available tools from the backend's /.well-known/mcp.json and converts them to Gemini function declarations.

  3. Gemini Function Calling: Via /api/chat, Gemini 2.0 Flash analyzes query intent and automatically selects the best tool: structured filtering, semantic search, RAG conversation, or product listing.

  4. Backend Tool Execution: The backend receives a JSON-RPC 2.0 request at /mcp and executes the chosen function: attribute filtering, in-memory vector search, RAG conversation, or simple listing.

  5. Result Synthesis: The frontend sends the tool results back to Gemini for natural language summarization, then displays products with similarity scores, tool indicators, and rich metadata. RAG responses include AI-generated explanations alongside relevant products.
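Step 2 above, the discovery-to-declaration conversion, can be sketched as a plain mapping. Field names on both sides are illustrative: real /.well-known/mcp.json entries and the @google/genai declaration schema may differ in detail, but the mapping of name, description, and JSON-schema parameters is the core of it:

```python
def to_gemini_declarations(mcp_tools):
    """Convert MCP tool descriptors into Gemini-style function declarations.

    Assumes MCP-style `name`/`description`/`inputSchema` fields; adjust
    to the actual discovery document's shape.
    """
    return [
        {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        }
        for tool in mcp_tools
    ]

mcp_tools = [
    {
        "name": "search_products",
        "description": "Filter products by color, city, or price range.",
        "inputSchema": {
            "type": "object",
            "properties": {"color": {"type": "string"}, "max_price": {"type": "number"}},
        },
    }
]
declarations = to_gemini_declarations(mcp_tools)
```

Because the tool list is fetched at runtime, adding a tool to the backend makes it available to Gemini without any frontend code change.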

Technology Stack

Frontend (MCP Host)

  • ⚛️ Next.js 14: React Framework + API Routes
  • TypeScript 5: Type Safety
  • 🤖 Gemini 2.0 Flash: @google/genai
  • 🎨 Tailwind CSS: Utility-First Styling

Backend (MCP Tool Server)

  • FastAPI: Async Python Web Framework
  • 🐍 Python 3.13: Runtime Environment
  • 🔗 LangChain: OpenAI Embeddings Integration
  • 🧠 In-Memory Store: Pure Python Vector Search
  • 💬 RAG System: OpenAI GPT + Vector Context

AI & ML Services

  • 🎤 OpenAI Whisper: whisper-1 Speech-to-Text
  • 🔍 OpenAI Embeddings: text-embedding-ada-002
  • 🤖 Google Gemini: 2.0 Flash Function Calling
  • 💬 OpenAI GPT: gpt-3.5-turbo RAG Generation

DevOps & Infrastructure

  • 🐳 Docker: Containerization
  • 📦 Poetry: Python Dependencies
  • 🌐 JSON-RPC 2.0: Tool Invocation Protocol
  • 📡 MCP: Model Context Protocol

Performance Metrics

  • Vector Search: ~50ms (pure Python cosine similarity response time)
  • Voice Transcription: ~1-3s (OpenAI Whisper speech-to-text processing)
  • RAG Response: ~1-3s (end-to-end retrieval + generation time)
  • Bundle Size: ~25MB (Vercel-optimized serverless deployment)

Use Cases & Examples

🔍 Quick Product Search

Query: "Show me red hoodies under $40"

Response: Filtered product list with exact matches

💬 Conversational Assistance

Query: "I need something comfortable for weekend casual wear"

Response: AI-generated recommendations with explanations