App · API → Router → GPU worker → Token stream
01 · Client
User sends request
The app or API submits the prompt, model, and session metadata.
02 · Orchestrator
Route job
The router selects eligible GPU supply by model, availability, and measured speed.
03 · Worker
Run inference
The worker executes the model and streams tokens back to the orchestrator.
04 · Settlement
Credit work
Usage is counted, earnings are credited, and prompt context is discarded.
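The routing step can be illustrated with a minimal sketch. It assumes "eligible" means a worker serves the requested model and is idle, and that "measured speed" is a tokens-per-second figure. The Worker fields and the select_worker function are hypothetical names for illustration, not the platform's actual API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Worker:
    """Hypothetical view of one GPU worker as the router might see it."""
    worker_id: str
    models: frozenset          # model names this worker can serve
    idle: bool                 # availability flag
    tokens_per_second: float   # assumed form of "measured speed"


def select_worker(workers: Iterable[Worker], model: str) -> Optional[Worker]:
    """Return the fastest idle worker that serves `model`, or None."""
    eligible = [w for w in workers if model in w.models and w.idle]
    if not eligible:
        return None
    return max(eligible, key=lambda w: w.tokens_per_second)
```

Picking the single fastest worker is only one possible policy; a real router might also weigh load, region, or fairness between suppliers.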
Trace
User request → queue by model and tier → select idle worker → stream tokens → record usage → discard prompt context
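The trace can be walked through in a few lines of code. This is a sketch under stated assumptions: fake_inference stands in for the model and emits one token per word, process and its dict-based request shape are hypothetical, and tiers are used only as a queue key because the source does not say how tiers are prioritized.

```python
from collections import defaultdict, deque


def fake_inference(prompt):
    """Stand-in for the model: yields one 'token' per whitespace-separated word."""
    for word in prompt.split():
        yield word


def process(requests, idle_workers):
    """Walk each request through the trace and return (streams, usage).

    requests: list of dicts with 'id', 'model', 'tier', 'prompt'.
    idle_workers: dict mapping model name -> list of idle worker ids.
    """
    # Queue by model and tier.
    queues = defaultdict(deque)
    for req in requests:
        queues[(req["model"], req["tier"])].append(req)

    streams = {}               # request id -> tokens streamed back
    usage = defaultdict(int)   # worker id -> tokens credited

    for (model, _tier), queue in queues.items():
        # Requests for a model with no idle worker simply stay queued.
        while queue and idle_workers.get(model):
            req = queue.popleft()
            worker = idle_workers[model].pop(0)      # select idle worker
            tokens = list(fake_inference(req["prompt"]))
            streams[req["id"]] = tokens              # stream tokens
            usage[worker] += len(tokens)             # record usage
            req["prompt"] = None                     # discard prompt context
            idle_workers[model].append(worker)       # worker is idle again
    return streams, dict(usage)
```

For example, a single three-word prompt routed to worker "w1" yields three streamed tokens and credits three tokens of usage to "w1", after which the prompt field on the request is cleared.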


