Research Project

Measure time-to-aha.
Optimize what matters.

ROIBench builds AI apps and measures how quickly a new user gets real, verified value. Not model accuracy. Not feature count. Activation ROI: value delivered per unit of user effort.

What is Activation ROI?

Most AI benchmarks measure model answers. ROIBench measures something different: can a new user reach a verified "aha moment" quickly and without friction?

The Investment

User's first-session effort

Time spent before getting value. Steps taken (clicks, uploads, form fills). Friction endured (errors, confusion, dead ends).

The Return

Verified aha moment

The user actually got what the app promises, confirmed by a deterministic validator. Not vibes — proof.

score = activated × e^(−time/τ) × e^(−friction/φ)

where activated is 0 or 1 (did they get value?), τ = 60 s is the time decay constant, and φ = 3 friction events is the friction decay constant.

Reading the score: 0 = never activated; ~1 = instant value, zero friction; low = got there, but it was painful.
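As a reference, the formula can be computed directly. A minimal sketch: the τ and φ values come from the definition above, while the 0–5 scale factor is an assumption (the raw formula yields 0–1, but the published scores appear to be on a 0–5 scale).

```python
import math

TAU = 60.0  # time decay constant, seconds (from the definition above)
PHI = 3.0   # friction decay constant, friction events (from the definition above)

def activation_roi(activated: bool, time_s: float, friction: int,
                   scale: float = 5.0) -> float:
    """Activation ROI: zero unless activated, then decays
    exponentially with time spent and friction endured.
    `scale=5.0` is an assumption, not part of the stated formula."""
    if not activated:
        return 0.0
    return scale * math.exp(-time_s / TAU) * math.exp(-friction / PHI)
```

Note how activation gates everything: a fast, frictionless session that never reaches verified value still scores exactly 0.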

Agentic build-test-iterate loop

An agentic conveyor builds apps, tests them with synthetic personas, and iterates until activation quality hits a target — or proves it can't.

Discovery
Build
Test
Iterate
Ship or Kill
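The loop above can be sketched as follows. This is an illustrative skeleton, not the actual pipeline API: the callables, the round cap, and the 4.5 target (read off the trajectory chart) are assumptions.

```python
def run_conveyor(build_app, evaluate, improve,
                 target: float = 4.5, max_rounds: int = 25):
    """Build -> Test -> Iterate until the activation target is hit
    ("ship") or the round budget is exhausted ("kill").
    `evaluate` returns (score, change_requests)."""
    app = build_app()                     # Discovery + Build
    history = []
    for round_no in range(1, max_rounds + 1):
        score, change_requests = evaluate(app)   # Test with personas
        history.append(score)
        if score >= target:
            return "ship", round_no, history
        app = improve(app, change_requests)      # Iterate
    return "kill", max_rounds, history
```

A real pipeline would also bail out early when the score trajectory proves an architecture can't reach the target, as happened with TailorCV.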
Quantitative

Activation ROI Score

Computed from instrumented action traces. Every click, upload, and wait is logged with timestamps. The headline number that tracks improvement.
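Deriving the two score inputs from a trace can be sketched as below. The trace format (timestamp, event_type) and the set of event types counted as friction are assumptions, not the project's actual instrumentation schema.

```python
# Hypothetical: which logged event types count as friction.
FRICTION_EVENTS = {"error", "dead_end", "retry"}

def trace_metrics(trace: list[tuple[float, str]]) -> tuple[float, int]:
    """From a timestamped action trace, derive:
    - time-to-value: seconds from first event to the final (aha) event
    - friction: count of friction-type events endured along the way."""
    if not trace:
        return 0.0, 0
    time_to_value = trace[-1][0] - trace[0][0]
    friction = sum(1 for _, kind in trace if kind in FRICTION_EVENTS)
    return time_to_value, friction
```

These two numbers, plus the validator's activated bit, are all the score formula needs.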

Qualitative

Persona Narratives

Synthetic users write free-text experience reports. These explain why the score is what it is and generate ranked change requests for the builder.

The score measures progress. The feedback drives it. A score without diagnostics is a number you can't act on. Diagnostics without a score is improvement you can't measure.

5 apps through the conveyor

4 out of 5 hit the activation target. The one that didn't revealed an architectural ceiling — which is itself a useful result.

4/5 hit target · 51 eval rounds · 17 personas · 4.86 best score
| App | Description | Rounds | Score | Status | Key finding |
|---|---|---|---|---|---|
| DocBench | Agentic document workspace with citations | 23 | 4.75 | Target hit (R6) | Weakest persona 1.8 → 5.0 over 10 rounds. Citation grounding was the bottleneck. |
| Night Desk | AI detective game with generated scenes | 10 | 4.86 | Target hit (R5) | Image quality upgrade drove visual impact to 5.0 across all personas. |
| ClawTrade Ops | Trading automation with receipts | 4 | 4.65 | Target hit (R4) | Fastest to target. NLP-first UX was the breakthrough. |
| Annotate | AI annotation platform for students | 4 | 4.53 | Target hit (R2) | Deterministic fallback: 3.07. Real LLM: +1.46 in one round. |
| TailorCV | Resume tailoring tool | 10 | 3.43 | Target missed | Architecture ceiling ~3.5. Single-LLM-pass can't solve the accuracy tension. |

Score trajectory across rounds

[Chart: Activation ROI per round, R1–R10, target line at 4.5. Series: DocBench, Night Desk, ClawTrade, Annotate, TailorCV (missed).]

DocBench

Upload documents. Ask questions. Get answers with grounded citations you can verify. An agentic document workspace — not a chatbot wrapper.

4.75 activation score · 23 iteration rounds · 4 personas tested
  • Upload PDFs, markdown, or text documents
  • AI agent reads and indexes your documents
  • Ask questions in natural language
  • Every answer includes grounded citations with verbatim quotes
  • Click any citation to see the source in context
  • Persistent workspaces — pick up where you left off (coming soon)
Demo: sources loaded are policy_retention.pdf, compliance_guide.md, and data_handling.txt.

Q: What is the data retention policy for EU customers?
A: According to the retention policy, EU customer data must be deleted within 90 days of account closure [1]. The compliance guide specifies that this applies to all PII including derived analytics [2].
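A deterministic grounding check for answers like the one above might look like the sketch below. The citation-map format and the verbatim-containment rule are assumptions based on the product description, not DocBench's actual validator.

```python
import re

def citations_grounded(answer: str,
                       citations: dict[int, tuple[str, str]],
                       sources: dict[str, str]) -> bool:
    """Every [n] marker in the answer must map to a (source_name, quote)
    pair whose quote appears verbatim in the named source document."""
    for marker in re.findall(r"\[(\d+)\]", answer):
        entry = citations.get(int(marker))
        if entry is None:
            return False                     # dangling citation marker
        source_name, quote = entry
        if quote not in sources.get(source_name, ""):
            return False                     # quote not verbatim in source
    return True
```

Because the check is pure string containment, it is deterministic and cheap: no model call is needed to verify that an answer is grounded.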

Persona scores (Round 19)

  • Alex (undergrad): 5.0
  • Maya (grad student): 4.4
  • James (researcher): 4.6
  • Sarah (compliance): 5.0

Research directions

ROIBench is an ongoing research project exploring the intersection of synthetic evaluation, agentic building, and product activation.

  • Can synthetic persona evaluation replace human expert judgment for UX quality?
  • How well do synthetic Activation ROI scores predict real user activation?
  • What is the minimum number of iteration rounds needed to hit an activation target, and can we predict early whether an architecture will plateau?
  • How should quantitative activation metrics and qualitative persona feedback be combined to maximize improvement per iteration round?

Early evidence: TailorCV's flat trajectory (10 rounds, no improvement past R2) suggests architectural ceilings are detectable early. Annotate's +1.46 jump when switching from a deterministic fallback to a real LLM suggests that infrastructure choices can dominate UX polish.
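A ceiling detector of the kind this evidence points at can be very simple. The sketch below is illustrative: the window size and improvement threshold are assumptions, not values used by the project.

```python
def plateaued(history: list[float], window: int = 3, eps: float = 0.05) -> bool:
    """True if the best score in the last `window` rounds improves on the
    best earlier score by no more than `eps` — a cheap early signal
    that an architecture has hit its ceiling."""
    if len(history) <= window:
        return False                         # not enough rounds to judge
    recent_best = max(history[-window:])
    earlier_best = max(history[:-window])
    return recent_best - earlier_best <= eps
```

On a TailorCV-shaped trajectory this fires within a few rounds of the plateau, long before the round budget runs out.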

A solo research project

ROIBench is built by Anton as a side project alongside a full-time role. The entire pipeline — discovery, building, persona testing, iteration — is agentic, built with Claude Code (Anthropic) and tested via Playwright.

All code, Value Contracts, persona definitions, and round-by-round results are available in the repository.

GitHub · Follow on X