The 48-Hour Demo: Prototyping for a Customer
OpenAI's FDE engagement model opens with a scoping phase measured in days, not weeks. Here is what fits into that window and why speed is the actual deliverable.
The cadence is in the job posting
When OpenAI describes its FDE engagement model on its own Deployment Company page, the first phase is explicitly 'a couple of days onsite with the customer.' Anthropic's posted 'Forward Deployed Engineer, Applied AI' role on Greenhouse requires 'production LLM experience' across 'prompt engineering, agents, evaluation, deployment at scale.' The job descriptions are doing useful disambiguation: the artifact the FDE produces in the first week is not a slide deck, not a Figma, not a roadmap. It is something runnable, against something resembling the customer's data, that does something they actually want. Speed is the deliverable. A working artifact two days after the kickoff call updates the customer's internal model of you ('these people ship') in a way no roadmap can. That belief is what unlocks the data access, the second meeting, and eventually the contract.
What to cut
A 48-hour demo is not a small version of the production system; it is a different artifact altogether. Cut authentication: hardcode a user. Cut multi-tenancy: there is one tenant, them. Cut error handling beyond the happy path. Cut tests. Cut configuration: every value is a literal in the code. Cut deploy: it runs on your laptop, or a single Vercel or Cloud Run service with a public URL. Cut the database if you can; a JSON file or in-memory store is often enough. The instinct to 'do it right' is the instinct that kills the timeline. The customer does not see your skipped tests; they see a thing that works.
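As a rough illustration of how little scaffolding that leaves, here is a minimal sketch of a demo backend with a hardcoded user, literal values instead of configuration, and a JSON file instead of a database. The names (`DEMO_USER`, `STORE_PATH`, `save_ticket`, `load_tickets`) and the ticket shape are hypothetical, chosen only for this example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical demo values: every setting is a literal, on purpose.
DEMO_USER = {"id": "u-1", "name": "Demo User"}  # hardcoded user instead of authentication
STORE_PATH = Path(tempfile.gettempdir()) / "demo_store.json"  # JSON file instead of a database


def load_tickets() -> list[dict]:
    """Read every record from the JSON file; a missing file means an empty store."""
    if not STORE_PATH.exists():
        return []
    return json.loads(STORE_PATH.read_text(encoding="utf-8"))


def save_ticket(title: str) -> dict:
    """Append one record owned by the hardcoded user. Happy path only: no locking, no validation."""
    tickets = load_tickets()
    ticket = {"id": len(tickets) + 1, "title": title, "owner": DEMO_USER["id"]}
    tickets.append(ticket)
    STORE_PATH.write_text(json.dumps(tickets, indent=2), encoding="utf-8")
    return ticket
```

Calling `save_ticket("Reconcile Tuesday invoices")` and then `load_tickets()` is the whole persistence layer. Nothing here would survive concurrent users, which is acceptable for an artifact that is meant to be thrown away.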
What to fake, and what not to
Faking is allowed and often necessary. Hardcoded model outputs for slow paths, mocked third-party APIs, seeded data instead of a real pipeline: all are fine, provided two rules hold. First: the part of the demo the customer is judging must be real. If you're selling them a search system, the search must work; the user management can be faked. Second: you must be able to articulate exactly what is real and what is not, because they will ask, and the moment you fudge that answer you lose the trust the demo was supposed to build. The honest framing, 'this part is real, this part is stubbed, here is what it takes to make it real', is more credible than a polished demo that's secretly mostly smoke.
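One way to keep the second rule cheap to follow is to record, next to each component, whether it is real and what it would take to make it real. The sketch below assumes a hypothetical `Component` record and a toy substring search standing in for the judged path; neither comes from any particular framework.

```python
from dataclasses import dataclass
from typing import Callable

# Seeded documents instead of a real ingestion pipeline (hypothetical sample data).
SEEDED_DOCS = [
    "Refund policy for enterprise accounts",
    "How to rotate API keys",
    "Enterprise onboarding checklist",
]


def search(query: str) -> str:
    """The judged path: actually filters the seeded documents by the query."""
    hits = [doc for doc in SEEDED_DOCS if query.lower() in doc.lower()]
    return "; ".join(hits) if hits else "no results"


def stubbed_user_lookup(_user_id: str) -> str:
    """Not judged: returns a canned answer regardless of input."""
    return "Demo User (admin)"


@dataclass(frozen=True)
class Component:
    name: str
    run: Callable[[str], str]
    real: bool
    to_make_real: str = ""  # what it would take to replace the stub


COMPONENTS = [
    Component("search", search, real=True),
    Component(
        "user management",
        stubbed_user_lookup,
        real=False,
        to_make_real="connect the customer's identity provider",
    ),
]


def honesty_report(components: list[Component]) -> str:
    """Render the 'real vs. stubbed' answer you will be asked for in the meeting."""
    lines = []
    for c in components:
        if c.real:
            lines.append(f"{c.name}: real")
        else:
            lines.append(f"{c.name}: stubbed; to make real: {c.to_make_real}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(search("enterprise"))
    print(honesty_report(COMPONENTS))
```

Because the list is written while building, the answer to 'which parts are real?' is already on hand when the customer asks, instead of being reconstructed under pressure.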
Real customer data, even a little
If you can get a sample of the customer's real data before the demo, take it. Even fifty rows from their production system, properly anonymized, beats a thousand rows of synthetic data. Real data has the specific weirdnesses (the misspelled categories, the timestamps in three formats, the dangling foreign keys) that make the customer go 'oh, you understand our world.' Synthetic data, no matter how well generated, has the smoothness of fiction. If real data is impossible to get in 48 hours, get a screenshot of their actual UI and reskin your demo to match. Visual fidelity is a cheap proxy for the same recognition.
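A quick profile of the sample shows which of those weirdnesses the demo has to survive. The sketch below is illustrative only: the field names (`created`, `category`, `customer_id`), the three timestamp formats, and the sample rows are assumptions, not anything from a real customer.

```python
from collections import Counter
from datetime import datetime

# Hypothetical formats; extend the list with whatever the real sample contains.
TIMESTAMP_FORMATS = ["%Y-%m-%dT%H:%M:%S", "%m/%d/%Y %H:%M", "%Y%m%d"]


def detect_format(value: str) -> str:
    """Return the first format that parses the value, or 'unknown'."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return "unknown"


def profile(orders: list[dict], customer_ids: set[str]) -> dict:
    """Summarize timestamp formats, category spellings, and dangling customer references."""
    return {
        "timestamp_formats": dict(Counter(detect_format(o["created"]) for o in orders)),
        "categories": dict(Counter(o["category"] for o in orders)),
        "dangling_customer_refs": [o["id"] for o in orders if o["customer_id"] not in customer_ids],
    }


if __name__ == "__main__":
    sample_orders = [
        {"id": 1, "created": "2024-03-05T14:30:00", "category": "Hardware", "customer_id": "c1"},
        {"id": 2, "created": "03/06/2024 09:15", "category": "Hardwear", "customer_id": "c2"},
        {"id": 3, "created": "20240307", "category": "hardware", "customer_id": "c9"},
    ]
    print(profile(sample_orders, customer_ids={"c1", "c2"}))
```

On the sample rows, the profile reports three timestamp formats, three spellings of one category, and one order pointing at a customer that does not exist. Those are the cases worth handling on the happy path before the meeting.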
Scoping the demo itself
Pick one workflow, end to end, that is painful for them today. Not three. Not a tour of the platform. One. The reaction you're hoping for is 'this is the thing I do every Tuesday and it's terrible' followed by your demo doing that thing in eight seconds. Breadth-first demos ('here are all our features') consistently lose to a depth-first demo of one workflow they recognize. After the demo lands, you can talk about the other things; before it lands, every minute spent on other things dilutes the moment of recognition. This is also why OpenAI's engagement model puts evals as Phase 2, not Phase 1: there is no point evaluating quality before you've shown them the thing whose quality they care about.
The day-after work
The artifact you ship in 48 hours is throwaway by design, but the conversation it generates is not. The customer will reveal, in the demo meeting, three things you couldn't have learned otherwise: what they assumed your system did that it doesn't, what they want next, and what's politically contentious about the workflow you picked. Write those down in the meeting, not after. Within 24 hours, send a one-page summary back with: what you showed, what's real vs. stubbed, what they said they need next, and what the smallest second step would be. That document, not the code, is what turns the demo into momentum.
Key takeaways
- OpenAI's engagement model opens with a scoping phase of a couple of days onsite. The 48-hour artifact follows that cadence; it is not a personal preference.
- Cut hard: auth, multi-tenancy, error handling, tests, configuration, deploy, even the database if possible.
- Fake what isn't being judged. Be explicit and honest about what's real vs. stubbed. Credibility outlives polish.
- Real customer data, even fifty anonymized rows, beats any volume of synthetic data.
- Pick one workflow, depth-first. Breadth-first demos consistently lose to a single moment of recognition.
- The follow-up summary the day after is what turns the demo into momentum.
Exercise
Take a problem you've solved at work and imagine a customer asked for it cold. Write a one-page plan for a 48-hour demo: which workflow you'd pick, what you'd cut, what you'd fake, what data you'd need, and what the single 'aha' moment is. Then ask: if you had to ship this in two days from a hotel conference room, with only a laptop and no access to your normal infra, what would you actually do?
Self-check
1. Why is honest stubbing more credible than a polished demo that hides its limitations?
2. What's the failure mode of a breadth-first demo, and why does a depth-first one beat it?
3. Why does OpenAI's engagement model put evals (Phase 2) after building a first artifact (Phase 1) instead of before?
4. What three pieces of information does the demo meeting reveal that nothing earlier could?