AI & Business · 13 min read

AI integration in business — from PoC to production

90% of AI projects die after the demo. The methodology that takes you from Proof of Concept to a production system without burning budget.

  • #ai
  • #integration
  • #architecture
  • #roi

Our portfolio: 14 AI projects, 11 running in production, three killed. That is the real ratio, not the marketing slogan about “90% of AI projects dying”. The typical failure: a demo on 300 cherry-picked records, then production on 50k dirty ones. Those 11 work because each one passed through the same filter: a specific process, a specific metric, a specific fallback.

At Do More Soft we have deployed AI commercially for 14 clients over the last two years. The three killed cost us more lessons than the eleven successes. Below is the methodology that grew out of them.

Why 90% of AI projects die after the demo

The VP of Operations thinks AI will replace a person. Reality: 70% of cases handled correctly, 20% human review, 10% escalation. The org chart needs a rethink before the SOW gets signed. Without that, “we’re deploying AI” is a decision about restructuring the team that nobody has actually made yet.

Second reason: demo on clean data, deployment on dirty data. A model that hits 94% accuracy on the test set falls apart on real data filled with typos, gaps, and inconsistencies. Third: AI as a separate system with its own login. Nobody uses it.

AI PoC to Production: methodology step by step

Days 1-2: diagnosis, not brainstorming

First question: which metric moves? A 50% time reduction or a 95% error reduction? Different problem. Different model. Different budget. The client says “we want AI in customer service”. That is not a problem, that is a wish. The problem: “web form inquiries wait an average of 14h for a first response, and we lose 22% of leads to ghosting”. That is a project.

This conversation eliminates half of the misfired ideas at the start and lets us assign a real KPI to the PoC, the one we will use to measure go/no-go.

Days 3-5: data audit and proof of feasibility

Take 200-500 real cases from production. Not cleaned, not curated: real, with all their flaws. Run them through the model (start with off-the-shelf APIs, do not train your own). Measure accuracy, response time, and cost per request. If accuracy lands below 70%, this use case will not work with the current data.
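The measurement in this step can be sketched in a few lines of Python. This is an illustrative sketch, not the tooling used in these projects: `CaseResult`, `summarize` and the field names are hypothetical, and it assumes every real case already has a human-assigned label to compare against. The 70% cut-off is the one from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    expected: str      # label a human assigned to the real case
    predicted: str     # label the off-the-shelf model returned
    latency_s: float   # response time of the request, in seconds
    cost_pln: float    # API cost of the request, in PLN


def summarize(results: list[CaseResult], min_accuracy: float = 0.70) -> dict:
    """Aggregate accuracy, response time and cost per request for a PoC sample."""
    if not results:
        raise ValueError("need at least one evaluated case")
    n = len(results)
    accuracy = sum(r.expected == r.predicted for r in results) / n
    return {
        "cases": n,
        "accuracy": accuracy,
        "avg_latency_s": sum(r.latency_s for r in results) / n,
        "cost_per_request_pln": sum(r.cost_pln for r in results) / n,
        # Below the cut-off the use case is treated as not feasible
        # with the current data.
        "feasible": accuracy >= min_accuracy,
    }


# Hypothetical sample: in a real audit this list holds 200-500 entries.
sample = [
    CaseResult("complaint", "complaint", 1.0, 0.5),
    CaseResult("lead", "lead", 2.0, 0.5),
    CaseResult("lead", "spam", 3.0, 0.5),
    CaseResult("spam", "spam", 2.0, 0.5),
]
print(summarize(sample))
```

The point of returning all three numbers together is that the go/no-go decision rarely hinges on accuracy alone: a model that clears 70% but costs too much per request or answers too slowly fails the same audit.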

Days 6-8: integration prototype

Build a minimal integration with the existing system. Not a landing page with a text field, a real integration. AI as a layer in the existing flow: data comes in from the system, gets processed by the model, the result goes back to the system. The user does not need to know AI is involved.

Days 9-10: user testing and decision report

Hand the prototype to 3-5 real users. Collect feedback not about the technology, but about the process: is it faster? Are the results trustworthy? Would they rely on it in their daily work? On that basis we write the go/no-go/pivot report, with numbers, not opinions.

Integration architecture: AI as a layer, not a replacement

Figure: AI integration architecture, an AI layer with graceful degradation fallback.

In Textio, the first version had Claude as a single point of failure (SPOF). API timeout, the editor stares at an empty draft, the client writes an email. It took two weeks of outage to engineer it properly. Today: a cache with template fallback, a queue when the API is slow, and the editor always has a draft. Gemini 2.0 Flash generates the first version, the LangChain + RAG layer adds the tone of voice from the brand profile, and the editorial workflow forces approval before anything ships to Facebook, WooCommerce or WordPress. When the LLM goes down, the editor gets a draft from cache or a template shaped by the data structure. Writing never stops.

This is an architectural rule, not decoration. The AI endpoint is optional. No response within 3 seconds, fallback to a business rule. Confidence score below the threshold, escalation to a human. The system stays resilient, users trust it.
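The rule above can be sketched as a small wrapper around the model call. This is a minimal sketch under stated assumptions: `decide`, `Decision`, `call_model` and `business_rule` are hypothetical names, the model callable is assumed to return a `(value, confidence)` pair, and the 0.8 confidence threshold is a placeholder, since the text does not state the actual value. Only the 3-second timeout comes from the paragraph above.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Decision:
    value: Any
    source: str  # "model", "business_rule" or "human_review"


# Shared worker pool so a slow model call never blocks the caller.
_pool = ThreadPoolExecutor(max_workers=4)


def decide(
    payload: Any,
    call_model: Callable[[Any], tuple],      # assumed to return (value, confidence)
    business_rule: Callable[[Any], Any],     # deterministic fallback
    timeout_s: float = 3.0,                  # from the text: 3 seconds
    min_confidence: float = 0.8,             # assumption: threshold not given in the text
) -> Decision:
    future = _pool.submit(call_model, payload)
    try:
        value, confidence = future.result(timeout=timeout_s)
    except FutureTimeout:
        # No response in time: the AI endpoint is optional, use the rule.
        future.cancel()
        return Decision(business_rule(payload), "business_rule")
    except Exception:
        # API error: degrade the same way instead of failing the request.
        return Decision(business_rule(payload), "business_rule")
    if confidence < min_confidence:
        # Low confidence: keep the suggestion but route it to a human.
        return Decision(value, "human_review")
    return Decision(value, "model")
```

One caveat of this sketch: a timed-out call keeps running in its worker thread until the underlying request returns, so the HTTP client used inside `call_model` should carry its own timeout as well.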

One use case broken down to first principles

Five use cases with clean ROI numbers sounds attractive, but teaches nothing. See how it looks in practice on a single deployment.

FMCG invoice classification. 450k invoices per year, manual routing at 2-3h per 100 documents. Three full-time people in accounting. GPT-4o with prompt engineering on the chart of accounts: 87% accuracy in the first iteration. Not enough: a 13% error rate in accounting is an auditor’s nightmare. Reframed: the model pre-screens and suggests an account, the human does the final routing with a ready recommendation. Time dropped to 45 min per 100 invoices. Return of PLN 45k/year against a deployment cost of PLN 110k, payback in about 2.5 years. The real win, though, is the reduction in billing errors and accountants’ time shifted to controlling.
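The arithmetic behind this case is simple enough to check by hand, or with a few lines of Python. A minimal sketch using only the figures quoted above; the function names are hypothetical, and the payback is a plain cost-over-annual-gain ratio that ignores maintenance and API fees.

```python
def payback_years(deployment_cost_pln: float, annual_gain_pln: float) -> float:
    """Simple payback period: deployment cost divided by the yearly gain."""
    if annual_gain_pln <= 0:
        raise ValueError("annual gain must be positive")
    return deployment_cost_pln / annual_gain_pln


def time_reduction(before_min: float, after_min: float) -> float:
    """Share of handling time saved per batch of 100 documents."""
    return 1 - after_min / before_min


# PLN 110k deployment, PLN 45k per year returned.
print(round(payback_years(110_000, 45_000), 2))  # 2.44

# Manual routing took 2-3h (120-180 min) per 100 invoices, now 45 min.
print(time_reduction(120, 45))  # 0.625
print(time_reduction(180, 45))  # 0.75
```

So the quoted payback is roughly 2.4 years, rounded to 2.5 in the text, and the handling time per batch fell by about 62-75%.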

Analogous patterns repeat in the remaining four use cases we run in production:

  • RAG chatbot on the company knowledge base: support load reduction 40-60%, cost PLN 50-120k
  • Churn prediction in SaaS with >1,000 clients: ROI 200-400%, cost PLN 100-200k
  • Computer vision quality control on the production line: defect reduction 30-70%, cost PLN 150-350k
  • Content generation and personalization (reference: Textio): savings of 15-25h per week, cost PLN 40-80k

Cost matrix

  • PoC (2 weeks): PLN 15-40k
  • MVP (6-8 weeks): PLN 60-180k
  • Production deployment (3-6 months): PLN 120-500k
  • Annual maintenance: 15-25% of deployment cost
  • API costs (OpenAI/Anthropic/Azure): PLN 500-5,000/month depending on volume

These numbers cover SMBs; enterprise starts at 3x.
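To turn the matrix into a budget range, the figures can be added up as below. This is a rough sketch resting on two assumptions that the text does not confirm: that the PoC, MVP and production phases are billed separately and therefore sum, and that the maintenance percentage applies to the production deployment figure. All names are hypothetical.

```python
# (low, high) in PLN, copied from the cost matrix.
PHASES_PLN = {
    "poc": (15_000, 40_000),
    "mvp": (60_000, 180_000),
    "production": (120_000, 500_000),
}
MAINTENANCE_PCT = (15, 25)        # percent of deployment cost per year
API_PLN_PER_MONTH = (500, 5_000)


def build_cost() -> tuple[int, int]:
    """One-off cost, assuming the three phases are billed separately."""
    low = sum(lo for lo, _ in PHASES_PLN.values())
    high = sum(hi for _, hi in PHASES_PLN.values())
    return low, high


def annual_run_cost() -> tuple[int, int]:
    """Yearly cost: maintenance (assumed on the production figure) plus API fees."""
    prod_low, prod_high = PHASES_PLN["production"]
    low = prod_low * MAINTENANCE_PCT[0] // 100 + 12 * API_PLN_PER_MONTH[0]
    high = prod_high * MAINTENANCE_PCT[1] // 100 + 12 * API_PLN_PER_MONTH[1]
    return low, high


print(build_cost())       # (195000, 720000)
print(annual_run_cost())  # (24000, 185000)
```

Under those assumptions an SMB project lands between PLN 195k and PLN 720k to build, plus PLN 24-185k per year to run.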

Common pitfalls — from our killed projects

We tried a custom model for an e-commerce client. We had 8k labeled records; the model needed 50k to make any sense. We lost 3 weeks and PLN 40k before someone called stop. An OpenAI fine-tune on the same dataset: 2 days, 92% accuracy.

Second killed project: the VP Sales wanted AI across the whole company at once. Four departments, four different processes, one budget. After half a year of pilots, each was at 60% readiness, none in production. Pivot to one process, one department, one measurable KPI — running to this day.

Third: “we’ll sort out the data later”. The client had a CRM with 12 years of data, 40% of fields filled with “ASAP”, “TBD” or empty. The model learns chaos. Six weeks of data cleaning before the first prompt.
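A check like this is cheap to run before any prompt is written. The sketch below is illustrative only: `placeholder_share` and the `PLACEHOLDERS` set are hypothetical, the records are assumed to be plain dictionaries exported from the CRM, and the placeholder values are the ones named in the paragraph above.

```python
# Values that look filled in but carry no information.
PLACEHOLDERS = {"", "asap", "tbd"}


def placeholder_share(records: list[dict], fields: list[str]) -> dict[str, float]:
    """For each field, the fraction of records that are empty or a placeholder."""
    shares = {}
    for field in fields:
        bad = 0
        for record in records:
            value = record.get(field)
            # None counts as empty; real values such as 0 are kept.
            text = "" if value is None else str(value).strip().lower()
            if text in PLACEHOLDERS:
                bad += 1
        shares[field] = bad / len(records) if records else 0.0
    return shares


# Hypothetical CRM export.
crm = [
    {"deadline": "ASAP", "budget": 1000},
    {"deadline": "2024-05-01", "budget": None},
    {"deadline": " tbd ", "budget": 0},
    {"deadline": "", "budget": 500},
]
print(placeholder_share(crm, ["deadline", "budget"]))
# {'deadline': 0.75, 'budget': 0.25}
```

Fields where the share is high are candidates for cleaning, or for exclusion from the model input, before the PoC starts.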

Seen it. The CTO decides in Q1: “We’re an AI company”. Q2: three pilots, burned-out team, technical debt growing. Q3: the board kills all AI projects along with the budget. Pick one use case with numbers. Deliver. Then scale.

FAQ

What separates an AI PoC from a production deployment? A PoC runs on a sample of 200-500 real records for 2 weeks. Production handles 50k+ dirty records per month and has an SLA, observability, fallback, and human-in-the-loop. Different architecture, different cost, different success metric.

How long does it take to move from AI prototype to production? PoC 2 weeks, MVP 6-8 weeks, production deployment 3-6 months. Total 4-9 months from first conversation to a system that earns. A shortcut below 4 months usually means a skipped data audit — and costs twice as much later.

What are the top 5 ROI-positive AI use cases? Document classification (ROI 120-180%), RAG chatbot on a knowledge base (80-150%), churn prediction (200-400%), computer vision quality control (150-300%), content generation with editorial workflow (100-200%). Numbers from our 11 production deployments.

Why do most AI projects die after the demo? Demo on 300 cherry-picked records, production on 50k dirty ones. No clear business problem, no integration plan with the existing stack, an architectural decision of “AI instead of” rather than “AI as a layer”. The three killed projects in our portfolio failed on exactly this.
