RAG isn't “embed the docs and chat.” Real RAG is engineering at every step.

Production RAG chatbots, in-product copilots, and agent workflows. Real engineering on chunking, hybrid retrieval, reranking, grounding, and eval, built by engineers who ship the production layer.

Demo conversation, anonymised

grounded, cited

How do I configure SSO with our enterprise tier?

SSO is available on Enterprise. Configure it under Settings > Identity. SAML is supported with Okta, Azure AD, and OneLogin.

docs.example/sso (chunk 4 to 6)
release-notes/v4.2 (chunk 1)

Does it work with Okta SCIM?

Yes, SCIM is available via the Okta integration. Provisioning flows are documented under Settings > SCIM.

docs.example/scim-okta (chunk 2)

KB-grounded chatbot, in-product copilot, or an agent workflow for ops? We architect the one that fits.

The RAG pipeline

How we build production-ready RAG chatbots

01 Chunk

02 Embed

03 Retrieve

04 Rerank

05 Ground

06 Answer

Chunking strategy

How you split the source documents matters more than which model you embed with. The details that decide it: sentence overlap, header preservation, and code-block handling.
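As a minimal sketch of those three concerns, assuming Markdown-style sources with `#` headings and triple-backtick fences: the chunker below keeps each fenced code block whole, tags every chunk with its nearest heading, and overlaps sentence windows. The names `Chunk`, `chunk_markdown`, `max_sentences` and `overlap` are illustrative, and the sentence splitter is deliberately naive.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    header: str  # nearest preceding heading, kept as context for retrieval
    text: str


def split_sentences(paragraph: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def _window(sentences: list[str], header: str, max_sentences: int, overlap: int) -> list[Chunk]:
    # Slide a window of max_sentences, stepping so that `overlap` sentences repeat.
    step = max_sentences - overlap
    out = []
    for start in range(0, len(sentences), step):
        out.append(Chunk(header, " ".join(sentences[start:start + max_sentences])))
        if start + max_sentences >= len(sentences):
            break
    return out


def chunk_markdown(doc: str, max_sentences: int = 3, overlap: int = 1) -> list[Chunk]:
    if not 0 <= overlap < max_sentences:
        raise ValueError("overlap must be >= 0 and smaller than max_sentences")
    chunks: list[Chunk] = []
    header = ""
    # Split out fenced code blocks first so they are never cut in half.
    for part in re.split(r"(```.*?```)", doc, flags=re.DOTALL):
        if part.startswith("```"):
            chunks.append(Chunk(header, part))
            continue
        prose: list[str] = []
        for line in part.splitlines():
            if line.startswith("#"):
                # A new heading closes the prose collected under the previous one.
                chunks += _window(split_sentences(" ".join(prose)), header, max_sentences, overlap)
                prose = []
                header = line.lstrip("#").strip()
            elif line.strip():
                prose.append(line.strip())
        chunks += _window(split_sentences(" ".join(prose)), header, max_sentences, overlap)
    return chunks
```

Window size and overlap are the knobs worth tuning against an eval set; the defaults here are arbitrary.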

Embedding model choice

OpenAI text-embedding-3-small vs Voyage vs Cohere: cost-versus-quality tradeoffs at scale. We model the bill before we lock the schema.
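Modelling the bill is mostly arithmetic on token counts. The sketch below is one way to frame it; `EmbeddingPlan`, `monthly_churn` and the other names are hypothetical, and the per-million-token price is an input you take from the vendor's current pricing page, not a figure supplied here.

```python
from dataclasses import dataclass


@dataclass
class EmbeddingPlan:
    name: str
    price_per_million_tokens: float  # placeholder: fill in from the vendor's price list


def initial_index_cost(plan: EmbeddingPlan, corpus_tokens: int) -> float:
    # One-off cost of embedding the whole corpus.
    return corpus_tokens / 1_000_000 * plan.price_per_million_tokens


def monthly_cost(
    plan: EmbeddingPlan,
    corpus_tokens: int,
    monthly_churn: float,          # fraction of the corpus re-embedded each month
    query_tokens_per_month: int,   # user queries are embedded too
) -> float:
    tokens = corpus_tokens * monthly_churn + query_tokens_per_month
    return tokens / 1_000_000 * plan.price_per_million_tokens
```

Running the same corpus and traffic numbers through each candidate plan gives a like-for-like comparison before any schema decision depends on the embedding dimension.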

Hybrid retrieval

Vector search plus BM25 keyword search. Pure vector search misses exact-match queries (SKU codes, error codes, proper nouns). Hybrid is the default.
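One common way to merge the two result lists is reciprocal rank fusion (RRF). This page does not say which fusion method is used, so treat the sketch below as an illustration of the idea: it takes ranked lists of document IDs from each retriever and needs no score normalisation. The function name and the `k=60` default are assumptions (60 is the value commonly used with RRF).

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    # Each document earns 1 / (k + rank) from every list it appears in.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]
```

A document that only the keyword index finds, such as an exact error code, still makes the fused list, which is the failure mode pure vector search has.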

Reranking

A second-pass model that reorders the top-K results before the LLM sees them. A sharp quality jump for marginal cost.
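The shape of a second pass can be sketched with a pluggable scoring function. In production the scorer would be a cross-encoder or a hosted reranker; the `overlap_score` function here is only a stand-in so the example runs on its own, and both function names are hypothetical.

```python
from typing import Callable


def overlap_score(query: str, text: str) -> float:
    # Stand-in scorer: fraction of query terms that appear in the candidate.
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rerank(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float] = overlap_score,
    keep: int = 3,
) -> list[str]:
    # Score every first-pass candidate against the query, then keep the best few.
    scored = [(score(query, text), i, text) for i, text in enumerate(candidates)]
    scored.sort(key=lambda item: (-item[0], item[1]))  # ties keep first-pass order
    return [text for _, _, text in scored[:keep]]
```

Because the reranker only sees the top-K candidates, its cost scales with K per query rather than with corpus size.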

Grounding and citations

Answers cite source chunks. The user can verify them, and the system is auditable. No grounding means no production deployment.
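A minimal sketch of that contract, assuming the model is asked to cite numbered sources as `[1]`, `[2]` and so on: one function builds the prompt, the other rejects answers that cite nothing or cite a source that was never retrieved. The prompt wording, the chunk dictionary keys and both function names are illustrative assumptions.

```python
import re


def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    # Number each retrieved chunk so the model can refer to it as [n].
    sources = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the sources below and cite them as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def check_citations(answer: str, chunks: list[dict]) -> dict:
    # Collect every [n] marker and confirm it points at a retrieved chunk.
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    valid = {n for n in cited if 1 <= n <= len(chunks)}
    return {
        "grounded": bool(cited) and cited == valid,
        "sources": [chunks[n - 1]["source"] for n in sorted(valid)],
    }
```

This checks that citations exist and resolve; whether the cited chunk actually supports the claim is a separate question for the eval suite.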

Eval

Without a regression suite, every model swap is a guess. We ship eval before we ship the chatbot.
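A regression suite can start very small. The sketch below assumes each test case lists keywords a correct answer must contain, which is a crude but cheap grader; `keyword_score`, `run_suite`, `regressions` and the case format are hypothetical names for illustration.

```python
from typing import Callable


def keyword_score(answer: str, expected: list[str]) -> float:
    # Fraction of expected keywords found in the answer (case-insensitive).
    if not expected:
        return 1.0
    text = answer.lower()
    return sum(kw.lower() in text for kw in expected) / len(expected)


def run_suite(answer_fn: Callable[[str], str], cases: list[dict]) -> dict[str, float]:
    # Score every question with the pipeline under test.
    return {c["id"]: keyword_score(answer_fn(c["question"]), c["expect"]) for c in cases}


def regressions(
    baseline: dict[str, float], candidate: dict[str, float], tolerance: float = 0.0
) -> list[str]:
    # Questions where the candidate pipeline scores worse than the baseline.
    return sorted(
        qid for qid, base in baseline.items() if candidate.get(qid, 0.0) < base - tolerance
    )
```

Run the suite once on the current pipeline to record a baseline, then again after a model or prompt swap; a non-empty `regressions` list is the signal to stop and look before shipping.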

What you get

AI chatbot and RAG development services we provide

Custom RAG Chatbot Development

Over customer KBs, product docs, internal wikis. Proper chunking, hybrid retrieval, reranking, citations. Eval suite included.

AI Customer Support Chatbot Development

Integrated with Intercom, Zendesk, Help Scout or custom. Hand-off to a human agent when confidence falls below threshold.
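The hand-off rule itself is a small routing decision. In the sketch below, `confidence` is assumed to be a 0-to-1 score produced elsewhere in the pipeline (for example, the top reranker relevance score); the `Reply` type, the `0.6` default and the hand-off message are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    action: str  # "answer" or "handoff"
    text: str


def route(draft_answer: str, confidence: float, threshold: float = 0.6) -> Reply:
    # Below the threshold the bot steps aside instead of guessing.
    if confidence < threshold:
        return Reply("handoff", "I'm connecting you with a support agent.")
    return Reply("answer", draft_answer)
```

The threshold is best set from the eval data: pick the value at which wrong answers would otherwise start reaching users.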

In-product AI assistants

Copilots that operate inside the app's own data, not generic web search. The assistant knows your tenants, your data, your permissions.
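One way to make an assistant respect tenants and permissions is to filter candidate chunks before anything is ranked or shown to the model. The sketch below assumes every chunk is stored with a `tenant_id` and an optional set of `allowed_roles`; those field names and the function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    tenant_id: str
    allowed_roles: frozenset = frozenset()  # empty means any role within the tenant


def visible_chunks(chunks: list[Chunk], tenant_id: str, user_roles: set[str]) -> list[Chunk]:
    # Drop other tenants' data first, then apply role restrictions.
    return [
        c
        for c in chunks
        if c.tenant_id == tenant_id
        and (not c.allowed_roles or c.allowed_roles & user_roles)
    ]
```

Filtering before retrieval, rather than after generation, means restricted text never enters the prompt in the first place.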

Agent workflows (where it fits)

Research summarisation, lead enrichment, content drafting with approval gates. Not payments. Not irreversible actions. We draw the line.
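An approval gate can be sketched as a proposal object that refuses to run until a human signs it off, plus a hard block on action kinds that should never reach an agent. The kinds listed in `BLOCKED_KINDS` and all names below are hypothetical examples, not a fixed list.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical action kinds that are never handed to an agent.
BLOCKED_KINDS = {"payment", "refund", "delete_record"}


@dataclass
class Proposal:
    kind: str
    payload: str
    approved_by: Optional[str] = None  # set only when a human approves


def propose(kind: str, payload: str) -> Proposal:
    # The agent may only draft actions outside the blocked set.
    if kind in BLOCKED_KINDS:
        raise PermissionError(f"{kind!r} is not an agent-eligible action")
    return Proposal(kind, payload)


def approve(proposal: Proposal, reviewer: str) -> Proposal:
    proposal.approved_by = reviewer
    return proposal


def execute(proposal: Proposal, run: Callable[[str], str]) -> str:
    # Nothing runs without a recorded human approval.
    if proposal.approved_by is None:
        raise RuntimeError("proposal has not been approved by a human")
    return run(proposal.payload)
```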

Model selection and cost modelling

OpenAI, Anthropic, Gemini or open-weights. Realistic monthly cost projections before the architecture is locked.

On-device AI for mobile

CoreML and MLKit where latency or privacy demand it. Pairs with React Native engagements.

RAG quality vs cost curve

RAG quality, cost and performance optimisation

Monthly cost against answer quality (eval score). The sweet spot is where the curve bends: hybrid retrieval plus reranking is where most teams get the biggest quality lift per dollar.

Chart: monthly cost against eval score, by configuration

$50 / mo: Cheap embed, no rerank
$200 / mo: Better embed, no rerank
$500 / mo: Hybrid and rerank (sweet spot)
$900 / mo: Hybrid, rerank, GPT-4-class
$1,800 / mo: More-of-everything

Sweet spot ($500 / mo, 72 quality): hybrid retrieval plus reranking. Past this, you spend a lot more for a little more.
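"Where the curve bends" can be made concrete as marginal eval-score gain per extra dollar. The sketch below assumes you already have (monthly cost, eval score) pairs from your own eval runs, with distinct costs; the function names and the `min_gain_per_dollar` cut-off are assumptions to tune, and no scores are supplied here.

```python
Point = tuple[float, float]  # (monthly cost, eval score)


def gain_per_dollar(points: list[Point]) -> list[tuple[Point, float]]:
    # For each step up in cost, how much eval score each extra dollar buys.
    ordered = sorted(points)
    return [
        ((c1, s1), (s1 - s0) / (c1 - c0))
        for (c0, s0), (c1, s1) in zip(ordered, ordered[1:])
    ]


def sweet_spot(points: list[Point], min_gain_per_dollar: float) -> Point:
    # Walk up the cost ladder until the next step stops paying for itself.
    if not points:
        raise ValueError("need at least one (cost, score) point")
    ordered = sorted(points)
    best = ordered[0]
    for point, gain in gain_per_dollar(ordered):
        if gain < min_gain_per_dollar:
            break
        best = point
    return best
```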

Evidence

Production AI integrations and the authority to teach the stack

Named portfolio

Indian AI client alongside Rajasthan Royals, CultFit, and KhataBook

We build production AI integrations for clients in this tier: RAG over product surfaces, evaluation harnesses, model selection with cost projections. Engagement details disclosed under NDA.

Authority signal

We train professionals on the stack we ship with

Live training sessions for practising Chartered Accountants on AI-assisted app building. Stack taught: Lovable, Supabase, Vercel, Cloudflare and Make.com. We aren't just integrating AI; we're teaching it to professional audiences.

An honest framing on agents

Agents (autonomous tool-using AI loops) are exciting and unreliable. We build them where the use case tolerates non-determinism: internal-tool automations, research summarisers, draft generators with human approval. We don't build them where reliability matters more than autonomy: payments, irreversible actions, compliance-sensitive workflows. Buyers respect that line.

Pairs well with

AI SaaS product development: for consumer and B2B AI products
AI app completion: Lovable and Bolt graduation
Supabase: pgvector is a common RAG store
Workflow automation: AI-in-the-loop scenarios
API and integration: for tying the chatbot to ticket systems

Good fit if

When RAG is real production work

SaaS products with a KB or docs estate and a defined support-volume problem
Teams who have prototyped a chatbot and need the production layer: grounding, eval, monitoring
Operators who want in-product AI without re-architecting the rest of the application
Buyers comfortable being told an agent isn't yet the right tool for their use case

Probably not a fit

We'll be honest if AI is the wrong call

Hobby chatbots: start with ChatGPT custom GPTs instead
Use cases that need deterministic guarantees (payments, compliance-sensitive workflows)
Teams expecting AI to write itself; we don't ship without eval and grounding

Stack we ship on

Models, vectors, retrieval, orchestration, eval.

Models

OpenAI, Anthropic Claude, Gemini, open-weights via Together or Replicate

Vector DB

Supabase pgvector, Pinecone, Weaviate, Qdrant

Retrieval

Hybrid (BM25 and vector), Cohere Rerank, custom rerankers when the latency budget allows

Orchestration

LangChain (sparingly), custom Node or Python, streaming token responses

Eval

Per-question regression set, BLEU and ROUGE where applicable, human-graded golden set

If you have docs and a support volume problem

Bring the corpus, the eval questions, and a monthly budget. We'll cost it before we build it.

Talk to an AI engineer hello@appycodes.com