Agentic Product Catalog
Voice + Natural Language Product Search + AI Chat Assistant
Powered by Gemini 2.0 Flash, OpenAI Whisper & GPT, FastAPI + In-Memory Vector Store, Model Context Protocol, and RAG (Retrieval-Augmented Generation)
Architecture Flow
User Input
Voice or Text Query
Next.js Host
Whisper Transcription + MCP Host
Gemini 2.0 Flash
Function Calling + Tool Selection
FastAPI Backend
MCP Tool Server + Vector Search + RAG
Key Features
🎤 Voice & Text Input
Search for products with either spoken commands or typed text; voice input is transcribed by OpenAI Whisper.
🤖 Gemini 2.0 Integration
Uses Gemini 2.0 Flash model for intelligent function calling and natural language understanding.
🔍 In-Memory Vector Search
Fast in-memory semantic search using OpenAI embeddings with cosine similarity for natural language product discovery.
📡 Model Context Protocol
JSON-RPC 2.0 over HTTP with automatic tool discovery via /.well-known/mcp.json endpoint for robust backend integration.
🧠 RAG-Powered AI Assistant
Conversational AI assistant combining semantic search with GPT models for natural language product recommendations and advice.
🚀 Vercel Optimized
Lightweight ~25MB bundle with no external vector database, well within Vercel's 250MB serverless limit.
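The in-memory vector search described above can be sketched in pure Python, as a cosine-similarity scan over precomputed embeddings. The class and method names here are illustrative, not the project's actual API:

```python
import math

class InMemoryVectorStore:
    """Minimal pure-Python vector store: no external database required."""

    def __init__(self):
        self._items = []  # list of (product_id, embedding) pairs

    def add(self, product_id, embedding):
        self._items.append((product_id, embedding))

    @staticmethod
    def _cosine(a, b):
        # Cosine similarity between two equal-length vectors.
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def search(self, query_embedding, top_k=3):
        # Score every stored item and return the top_k best matches.
        scored = [(pid, self._cosine(query_embedding, emb)) for pid, emb in self._items]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]
```

In the real system the embeddings would come from OpenAI's embedding API; a brute-force scan like this stays fast for catalog-sized datasets.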
Available MCP Tools
📦 list_products
Returns all products in the catalog with complete metadata.
🔍 search_products
Filters products by specific attributes: color, city, and price range.
🤖 semantic_product_search
In-memory vector similarity search for natural language queries with scoring.
💬 rag_query
AI-powered conversational assistant with product recommendations and explanations.
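A tool invocation over JSON-RPC 2.0 might be shaped like the following sketch. The `tools/call` method name and the argument schema are assumptions modeled on common MCP conventions; the backend's /.well-known/mcp.json discovery document is the source of truth for the real contract:

```python
import json

def make_tool_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 envelope for an MCP tool call (illustrative shape)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Example: invoke semantic_product_search; field names are assumptions.
payload = make_tool_call(
    "semantic_product_search",
    {"query": "cozy winter jacket", "top_k": 5},
)
body = json.dumps(payload)  # POST this to the backend's /mcp endpoint
```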
RAG vs Semantic Search
🔍 Semantic Search
Best For:
- Direct product discovery
- Fast browsing experience
- Simple similarity matching
- Cost-sensitive applications
Response:
Returns ranked product list with similarity scores
Speed:
~50ms
💬 RAG Assistant
Best For:
- Conversational experience
- Complex queries requiring reasoning
- Personalized recommendations
- Customer support scenarios
Response:
AI-generated explanations with product context
Speed:
~1-3s
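The RAG path compared above can be illustrated with a minimal retrieve-then-prompt sketch. `build_rag_prompt` and the product fields are hypothetical stand-ins, not the backend's actual code; in the real flow the prompt would be sent to a GPT model for generation:

```python
def build_rag_prompt(query, retrieved_products):
    """Assemble a grounded prompt from semantically retrieved products."""
    context = "\n".join(
        f"- {p['name']}: {p['description']} (${p['price']})" for p in retrieved_products
    )
    return (
        "You are a helpful product assistant. Use only the catalog context below.\n"
        f"Catalog context:\n{context}\n\n"
        f"Customer question: {query}"
    )

# Example with an invented product record:
products = [{"name": "Red Hoodie", "description": "Soft fleece hoodie", "price": 35}]
prompt = build_rag_prompt("Show me red hoodies under $40", products)
```

Grounding the model in retrieved context is what makes RAG answers explainable: the ~1-3s latency buys reasoning over the products instead of a bare ranked list.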
API Endpoints
🐍 Backend (FastAPI Tool Server)
- GET /.well-known/mcp.json
- GET /products
- POST /products/search
- POST /products/semantic-search
- POST /products/rag
- POST /mcp (JSON-RPC 2.0)
- GET /docs (Swagger UI)
- GET /healthz
⚛️ Frontend (Next.js MCP Host)
- POST /api/chat (Gemini Orchestration)
- POST /api/transcribe (Whisper)
- POST /api/rag (RAG Proxy)
How It Works
1. User Input
User speaks or types a product search query. Voice input is transcribed using OpenAI Whisper via /api/transcribe.
2. MCP Discovery
Frontend fetches available tools from the backend's /.well-known/mcp.json and converts them to Gemini function declarations.
3. Gemini Function Calling
Gemini 2.0 Flash analyzes query intent and automatically selects the best tool (structured filtering, semantic search, RAG conversation, or product listing) via /api/chat.
4. Backend Tool Execution
Backend receives a JSON-RPC 2.0 request at /mcp and executes the chosen function: attribute filtering, in-memory vector search, RAG conversation, or simple listing.
5. Result Synthesis
Frontend sends tool results back to Gemini for natural-language summarization, then displays products with similarity scores, tool indicators, and rich metadata. RAG responses include AI-generated explanations alongside the relevant products.
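The discovery step above (converting MCP tool metadata into Gemini function declarations) can be sketched as a simple schema mapping. The field names here are simplified assumptions; consult the MCP and Gemini function-calling specs for the exact shapes:

```python
def to_gemini_declaration(mcp_tool):
    """Map one discovered MCP tool entry to a Gemini-style function declaration."""
    return {
        "name": mcp_tool["name"],
        "description": mcp_tool.get("description", ""),
        # Fall back to an empty object schema if the tool declares no inputs.
        "parameters": mcp_tool.get("inputSchema", {"type": "object", "properties": {}}),
    }

# Example with an invented tool entry:
mcp_tool = {
    "name": "search_products",
    "description": "Filter products by color, city, and price range",
    "inputSchema": {
        "type": "object",
        "properties": {"color": {"type": "string"}, "max_price": {"type": "number"}},
    },
}
declaration = to_gemini_declaration(mcp_tool)
```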
Technology Stack
Frontend (MCP Host)
Next.js 14
React Framework + API Routes
TypeScript 5
Type Safety
Gemini 2.0 Flash
@google/genai
Tailwind CSS
Utility-First Styling
Backend (MCP Tool Server)
FastAPI
Async Python Web Framework
Python 3.13
Runtime Environment
LangChain
OpenAI Embeddings Integration
In-Memory Store
Pure Python Vector Search
RAG System
OpenAI GPT + Vector Context
AI & ML Services
OpenAI Whisper
whisper-1 Speech-to-Text
OpenAI Embeddings
text-embedding-ada-002
Google Gemini
2.0 Flash Function Calling
OpenAI GPT
gpt-3.5-turbo RAG Generation
DevOps & Infrastructure
Docker
Containerization
Poetry
Python Dependencies
JSON-RPC 2.0
Tool Invocation Protocol
MCP
Model Context Protocol
Performance Metrics
Vector Search
~50ms pure-Python cosine similarity search response time
Voice Transcription
OpenAI Whisper speech-to-text processing
RAG Response
~1-3s end-to-end retrieval + generation time
Bundle Size
~25MB Vercel-optimized serverless deployment
Use Cases & Examples
🔍 Quick Product Search
Query:
"Show me red hoodies under $40"
Response:
Filtered product list with exact matches
💬 Conversational Assistance
Query:
"I need something comfortable for weekend casual wear"
Response:
AI-generated recommendations with explanations