A mesh of GPU workers, routed in real time.
Every worker that joins advertises which models it can serve and at what speed. The router holds that state and picks a path for each request as it arrives.
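As a rough illustration of the state the router might hold, here is a minimal sketch. The names `WorkerAd`, `Registry`, `tokens_per_sec`, and the worker and model identifiers are all hypothetical, chosen for this example rather than taken from the actual system.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerAd:
    """What a worker might advertise when it joins (field names are assumptions)."""
    worker_id: str
    tokens_per_sec: dict[str, float]  # model name -> advertised speed
    available: bool = True


@dataclass
class Registry:
    """Router-side state: the latest advertisement from each worker."""
    workers: dict[str, WorkerAd] = field(default_factory=dict)

    def join(self, ad: WorkerAd) -> None:
        # A re-join from the same worker simply overwrites its old entry.
        self.workers[ad.worker_id] = ad

    def leave(self, worker_id: str) -> None:
        self.workers.pop(worker_id, None)

    def serving(self, model: str) -> list[WorkerAd]:
        # Every known worker that advertises the requested model.
        return [w for w in self.workers.values() if model in w.tokens_per_sec]


registry = Registry()
registry.join(WorkerAd("gpu-a", {"model-x": 40.0, "model-y": 25.0}))
registry.join(WorkerAd("gpu-b", {"model-x": 55.0}))
```

Keeping the registry keyed by worker ID means a join, a re-advertisement, and a departure are each a single dictionary operation, which suits state that changes as workers come and go.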
Diagram labels: Worker mesh, Eligible routes
How a request is routed.
01
Submit
The app or API sends a model, a prompt, and session metadata to the router.
02
Route
The router ranks eligible workers by model support, availability, and measured speed.
03
Execute
The selected worker loads the model and begins generating tokens.
04
Stream
Tokens stream back over the open connection as they are produced.
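The four steps above can be sketched in a few lines. This is an illustrative toy, not the actual router: `Worker`, `Request`, `route`, and `handle` are hypothetical names, the "fastest eligible worker wins" rule is an assumed ranking policy, and `generate` merely echoes the prompt in place of real inference.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Worker:
    worker_id: str
    speeds: dict[str, float]  # model name -> measured tokens/sec (assumed metric)
    available: bool = True

    def generate(self, model: str, prompt: str) -> Iterator[str]:
        # Step 03, Execute: stand-in for inference that echoes prompt words.
        for word in prompt.split():
            yield word


@dataclass
class Request:
    # Step 01, Submit: a model, a prompt, and session metadata.
    model: str
    prompt: str
    session: dict[str, str]


def route(req: Request, workers: list[Worker]) -> Worker:
    # Step 02, Route: keep workers that are available and serve the model,
    # then pick the one with the highest measured speed for that model.
    eligible = [w for w in workers if w.available and req.model in w.speeds]
    if not eligible:
        raise LookupError(f"no available worker serves {req.model!r}")
    return max(eligible, key=lambda w: w.speeds[req.model])


def handle(req: Request, workers: list[Worker]) -> Iterator[str]:
    # Step 04, Stream: pass tokens to the caller as the worker produces them.
    worker = route(req, workers)
    yield from worker.generate(req.model, req.prompt)


workers = [
    Worker("gpu-a", {"model-x": 40.0}),
    Worker("gpu-b", {"model-x": 55.0}),
    Worker("gpu-c", {"model-x": 90.0}, available=False),
]
request = Request("model-x", "hello mesh", {"session_id": "s1"})
tokens = list(handle(request, workers))
```

Because `handle` is a generator, a caller can forward each token over an open connection as soon as it is yielded instead of waiting for the full response.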


