A mesh of GPU workers, with requests routed in real time.

Every worker that joins advertises which models it can serve and at what speed. The router holds that state and picks a path for each request as it arrives.
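As a rough illustration of the state the router holds, here is a minimal sketch in Python. The names `WorkerAd`, `Registry`, `join`, `leave`, and `eligible` are hypothetical, and it assumes speed is advertised as tokens per second for each model; the real wire format and data model are not described here.

```python
from dataclasses import dataclass


@dataclass
class WorkerAd:
    """What a worker advertises when it joins (hypothetical shape)."""
    worker_id: str
    # Assumption: model name -> advertised speed in tokens per second.
    models: dict


class Registry:
    """In-memory view of the mesh, as a router might hold it."""

    def __init__(self):
        self._workers = {}

    def join(self, ad):
        # A repeat join from the same worker replaces its earlier advertisement.
        self._workers[ad.worker_id] = ad

    def leave(self, worker_id):
        # Removing an unknown worker is a no-op.
        self._workers.pop(worker_id, None)

    def eligible(self, model):
        # Every worker that advertised support for the requested model.
        return [w for w in self._workers.values() if model in w.models]


registry = Registry()
registry.join(WorkerAd("gpu-a", {"model-x": 42.0}))
registry.join(WorkerAd("gpu-b", {"model-x": 65.0, "model-y": 30.0}))
```

With both workers joined, `registry.eligible("model-y")` would contain only `gpu-b`, since `gpu-a` never advertised that model.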

[Diagram: worker mesh, eligible routes]

How a request is routed.

01
Submit

The app or API client sends the model name, the prompt, and session metadata to the router.

02
Route

The router ranks eligible workers by model support, availability, and measured speed.

03
Execute

The selected worker loads the model and begins generating tokens.

04
Stream

Tokens stream back over the open connection as they are produced.
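The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual router: `Worker`, `route`, and `handle` are hypothetical names, model support and availability are treated as filters with measured speed as the only ranking key, and `generate` is a stand-in that echoes the prompt word by word instead of running a model.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: str
    models: frozenset       # models this worker can serve
    available: bool         # assumption: a simple up/down flag
    tokens_per_sec: float   # assumption: one measured speed per worker

    def generate(self, model, prompt):
        # Stand-in for inference: yield the prompt back one word at a time.
        for word in prompt.split():
            yield word


def route(workers, model):
    """Step 02: keep eligible workers, then pick the fastest one."""
    eligible = [w for w in workers if model in w.models and w.available]
    return max(eligible, key=lambda w: w.tokens_per_sec, default=None)


def handle(workers, model, prompt, session=None):
    """Steps 01 to 04: accept a request, route it, execute, stream tokens."""
    worker = route(workers, model)
    if worker is None:
        raise LookupError(f"no eligible worker for model {model!r}")
    # A generator yields each token as soon as the worker produces it.
    yield from worker.generate(model, prompt)


workers = [
    Worker("gpu-a", frozenset({"model-x"}), True, 42.0),
    Worker("gpu-b", frozenset({"model-x"}), True, 65.0),
    Worker("gpu-c", frozenset({"model-x"}), False, 90.0),
]

for token in handle(workers, "model-x", "hello from the mesh"):
    print(token)
```

Here `gpu-c` is the fastest worker but is skipped because it is unavailable, so the request goes to `gpu-b`. A production router would likely weigh more signals than a single speed figure, such as queue depth or whether the model is already loaded.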