Global E-commerce Platform
AI customer service automation
LLM-based support automation for an e-commerce platform: intent classification, retrieval over policy and order data, CRM and ticketing integration.
The challenge
A global e-commerce platform serving customers in more than 50 countries was taking in over 50,000 support inquiries a day across email, chat, web, and social channels, in 12 languages. Average first response time sat at 24 hours. Support costs were growing past $10M a year, and quality varied widely from agent to agent because every inquiry, however routine, landed in the same human queue. The volume problem was structural: most tickets were repetitive questions about orders, returns, and policy, but the team had no reliable way to separate those from the cases that genuinely needed a person.
What we built
The system is a triage pipeline, not a chatbot. Every inbound message, regardless of channel, flows through the same classification and routing stages before anything answers it.
Intent classification and triage
All channels feed a single intent classifier that categorizes and prioritizes each inquiry. We trained it on more than 500,000 historical support tickets, building domain-specific datasets so it understood the client’s product names, policy vocabulary, and the way real customers phrase problems in each of the 12 supported languages. Language is auto-detected at ingestion. The classifier reached 98% accuracy, which mattered because everything downstream depends on the routing decision being right: a misrouted complex case means a frustrated customer arguing with a machine.
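A single classifier across channels implies that every message is first reduced to one common shape. The sketch below illustrates that idea only; the `InboundMessage` fields, the `normalize` helper, and the payload keys are all hypothetical and are not taken from the production system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundMessage:
    """Channel-independent shape handed to the classifier (hypothetical)."""
    channel: str
    customer_id: str
    text: str


def normalize(channel: str, payload: dict) -> InboundMessage:
    """Convert a channel-specific payload into an InboundMessage.

    The payload keys used here are assumptions for illustration.
    """
    if channel == "email":
        # Email carries a subject and a body; keep both for the classifier.
        text = f"{payload['subject']}\n{payload['body']}"
    elif channel in {"chat", "web", "social"}:
        text = payload["message"]
    else:
        raise ValueError(f"unsupported channel: {channel}")
    return InboundMessage(
        channel=channel,
        customer_id=str(payload["customer_id"]),
        text=text.strip(),
    )
```

With this in place, language detection and intent classification only ever see `InboundMessage.text`, whichever channel the message arrived on.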
Response engine and knowledge retrieval
Routine inquiries go to an AI response engine built on GPT-4 and LangChain. Rather than baking answers into prompts, the engine retrieves from a live knowledge layer covering policy documents, order data, and internal documentation, updated in real time as products and policies change. This was a deliberate decision: support content at this client changed weekly, and a static model would drift out of date within a sprint. The engine maintains conversation context across turns, so follow-up questions resolve against what was already said. Common queries return in under a second, and this path absorbs roughly 70% of total volume.
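The retrieve-then-answer pattern can be sketched in a few lines. This is a toy illustration: keyword overlap stands in for whatever retrieval the production knowledge layer uses, no GPT-4 or LangChain calls are shown, and the function names and sample documents are invented for the example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k document ids ranked by token overlap with the query."""
    query_tokens = tokenize(query)
    scored = [
        (len(query_tokens & tokenize(body)), doc_id)
        for doc_id, body in documents.items()
    ]
    # Highest overlap first; ties are broken by id so results are deterministic.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked[:k] if score > 0]


def build_prompt(query: str, documents: dict[str, str], history: list[str]) -> str:
    """Assemble a prompt from retrieved context, prior turns, and the new question."""
    doc_ids = retrieve(query, documents)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    past = "\n".join(history)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Conversation so far:\n{past}\n"
        f"Customer: {query}"
    )
```

Because the documents are looked up at request time, editing a policy entry changes the next answer without retraining or re-prompting anything.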
Human escalation
The classifier routes complex cases directly to a queue of specialized human agents. A sentiment analysis stage runs alongside intent classification and can override the routing: if a customer is angry or distressed, the conversation escalates to a person even when the intent looks routine. Handoff protocols carry the full conversation history across, so customers never repeat themselves. Keeping humans in the loop for these cases was the design choice that made the automation trustworthy rather than a wall to get past.
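The override logic described above can be illustrated with a minimal routing function. The intent labels, the sentiment scale, and the threshold below are assumptions made for the example, not values from the deployed system.

```python
from dataclasses import dataclass

# Hypothetical labels and threshold, chosen for illustration only.
ROUTINE_INTENTS = {"order_status", "return_policy", "shipping_info"}
NEGATIVE_SENTIMENT_THRESHOLD = -0.5  # scores assumed to lie in [-1.0, 1.0]


@dataclass
class Triage:
    intent: str
    sentiment: float  # -1.0 (very negative) to 1.0 (very positive)


def route(triage: Triage) -> str:
    """Return "human" or "ai" for a triaged message."""
    # Sentiment is checked first so it can override a routine-looking intent.
    if triage.sentiment <= NEGATIVE_SENTIMENT_THRESHOLD:
        return "human"
    if triage.intent in ROUTINE_INTENTS:
        return "ai"
    # Anything not known to be routine goes to a person.
    return "human"
```

Checking sentiment before intent is the point of the sketch: a routine question from an upset customer still reaches a person.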
Integrations
The pipeline writes into the client’s existing stack rather than replacing it: Zendesk for ticketing, Salesforce for CRM, and internal APIs for order and account data. Tickets and CRM records stay in sync whether the AI or a human resolves the case, so reporting and audit trails keep working unchanged.
How it was delivered
We delivered in three phases over 16 weeks with a team of six.
Weeks 1 to 4 were foundation work: analyzing the 500,000+ historical tickets, building the training datasets, and establishing integration points with the existing systems. Weeks 5 to 12 covered core development: the conversation engine, real-time translation, and the fallback and human-handoff mechanics. Weeks 13 to 16 were integration and rollout: wiring up Zendesk, Salesforce, and the internal APIs, then A/B testing against 10% of live traffic and tuning response quality from real customer feedback before expanding. The gradual rollout meant we could refine the system on production conversations without ever putting the full support operation at risk.
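One common way to hold a rollout at a fixed share of traffic is deterministic hash bucketing, sketched below. This is an assumption about how such a split could be done, not a description of the mechanism used on this project; the function name and salt are hypothetical.

```python
import hashlib


def in_treatment(customer_id: str, rollout_percent: int,
                 salt: str = "ai-support-rollout") -> bool:
    """Decide whether a customer falls inside the rollout.

    The same customer always lands in the same bucket, so their experience
    stays consistent across conversations while the rollout is partial.
    """
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent
```

Raising `rollout_percent` from 10 only adds customers to the treatment group; nobody who was already in it drops out, which keeps before-and-after comparisons clean while expanding.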
What shipped
- Over 1 million customer inquiries processed in the first quarter
- Average resolution time cut from 24 hours to 7 hours
- 98% accuracy in intent classification
- Sub-second responses for common queries across 12 languages
- Sentiment-triggered escalation with full-context human handoff
- Live knowledge retrieval that tracks product and policy changes in real time
- Bidirectional sync with Zendesk, Salesforce, and internal APIs
- Headroom for 10x peak load without degradation
What kept it in production was discipline around the boundary between automated and human handling: the system only answers what it can answer well, escalates early on sentiment, and feeds every human correction back into the loop. Customers got fast answers when fast answers were possible, and a person when they were not.