Worker lifecycle
The native worker is the first serious capacity target because it can use CUDA, Metal, or Vulkan through a local model runtime.
Worker slots
1. Install the worker client and authenticate a provider account.
2. Benchmark GPU capability, memory, supported runtimes, and tokens per second.
3. List the models the worker can serve in the marketplace.
4. Receive jobs from the orchestrator while the worker is online and idle.
5. Stream tokens back and receive credit for completed work.
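The lifecycle above can be read as a small state machine. A worker only becomes eligible for jobs after it has authenticated, benchmarked, and listed its models. It earns credit only when a job completes. The sketch below is a minimal illustration under that reading. The state names, event names, and the `Worker` class are hypothetical and are not the project's actual API.

```typescript
// Hypothetical lifecycle states and events; names are illustrative only.
type WorkerState =
  | "installed"
  | "authenticated"
  | "benchmarked"
  | "listed"
  | "idle"
  | "busy"
  | "offline";

type WorkerEvent =
  | "authenticate"
  | "benchmark"
  | "list"
  | "goOnline"
  | "assignJob"
  | "completeJob"
  | "goOffline";

// Allowed transitions: each lifecycle step unlocks the next one.
const transitions: Record<WorkerState, Partial<Record<WorkerEvent, WorkerState>>> = {
  installed: { authenticate: "authenticated" },
  authenticated: { benchmark: "benchmarked" },
  benchmarked: { list: "listed" },
  listed: { goOnline: "idle" },
  idle: { assignJob: "busy", goOffline: "offline" },
  busy: { completeJob: "idle" },
  offline: { goOnline: "idle" },
};

function step(state: WorkerState, event: WorkerEvent): WorkerState {
  const next = transitions[state][event];
  if (next === undefined) {
    throw new Error(`invalid event "${event}" in state "${state}"`);
  }
  return next;
}

class Worker {
  state: WorkerState = "installed";
  credits = 0;

  // Credit is added only on the busy -> idle transition (job completed).
  apply(event: WorkerEvent, creditForJob = 0): void {
    const prev = this.state;
    this.state = step(prev, event);
    if (prev === "busy" && event === "completeJob") {
      this.credits += creditForJob;
    }
  }

  // The orchestrator only hands jobs to workers that are online and idle.
  canReceiveJob(): boolean {
    return this.state === "idle";
  }
}
```

This sketch deliberately omits failure paths, such as a worker dropping offline mid-job. A real orchestrator would also need to handle reassignment and partial credit.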
Worker types
| Type | Best for | Tradeoff |
|---|---|---|
| Browser worker | Low-friction onboarding through WebGPU. | Smaller models and less predictable sustained performance. |
| Native worker | High-capacity GPUs, larger models, serious providers. | Requires installation, runtime setup, and stronger provider support. |
| Bootstrap worker | Baseline product availability while marketplace supply grows. | Costs scale with usage until replaced by marketplace capacity. |
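The table implies a routing preference. Marketplace workers (browser or native) should serve a request when one has enough memory, and bootstrap capacity fills the gap. The sketch below shows one possible policy under that assumption. The `WorkerOffer` shape, the memory-in-GiB field, and the best-fit ordering are all hypothetical choices, not something the source specifies.

```typescript
type WorkerType = "browser" | "native" | "bootstrap";

// Hypothetical offer record, e.g. built from benchmark results.
interface WorkerOffer {
  id: string;
  type: WorkerType;
  memoryGiB: number; // usable GPU memory reported by the benchmark
  idle: boolean;
}

// Pick an idle worker whose memory fits the model. Marketplace workers are
// preferred; bootstrap capacity is used only when none of them fits.
function pickWorker(offers: WorkerOffer[], modelMemoryGiB: number): WorkerOffer | undefined {
  const fits = offers.filter((o) => o.idle && o.memoryGiB >= modelMemoryGiB);
  // Best fit: choose the smallest marketplace worker that fits, so large
  // native GPUs stay free for models only they can hold.
  const marketplace = fits
    .filter((o) => o.type !== "bootstrap")
    .sort((a, b) => a.memoryGiB - b.memoryGiB);
  if (marketplace.length > 0) return marketplace[0];
  return fits.find((o) => o.type === "bootstrap");
}
```

Best fit is only one option. A production router would probably also weigh benchmarked tokens per second and reliability, which matter for the browser worker's less predictable sustained performance.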
Pools
The term "pool" has two meanings in this project. They are kept separate so that implementation work is not scoped against the wrong feature.
Reward pool
A grouping model that smooths provider payouts according to each provider's contribution. It is easier to build and becomes useful earlier in the project.
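One simple way to split a reward pool by contribution is to divide the pool proportionally to each provider's share. The sketch uses the largest-remainder method so payouts stay whole credits and add up exactly to the pool. It assumes contribution is measured as a single number per provider (for example, tokens served in a payout period) and that provider IDs are unique. Both the metric and the `Contribution` type are hypothetical.

```typescript
// Hypothetical contribution record for one payout period.
interface Contribution {
  providerId: string; // assumed unique within one payout period
  units: number; // e.g. tokens served (assumed metric)
}

// Split an integer pool of credits proportionally to contribution using the
// largest-remainder method: payouts are whole credits that sum to the pool.
function splitRewardPool(poolCredits: number, contributions: Contribution[]): Map<string, number> {
  const total = contributions.reduce((sum, c) => sum + c.units, 0);
  const payouts = new Map<string, number>();
  if (total === 0) {
    for (const c of contributions) payouts.set(c.providerId, 0);
    return payouts;
  }
  const shares = contributions.map((c) => {
    const exact = (poolCredits * c.units) / total;
    const floor = Math.floor(exact);
    return { id: c.providerId, floor, remainder: exact - floor };
  });
  let leftover = poolCredits - shares.reduce((sum, s) => sum + s.floor, 0);
  // Hand the leftover credits to the largest fractional remainders first.
  shares.sort((a, b) => b.remainder - a.remainder);
  for (const s of shares) {
    const extra = leftover > 0 ? 1 : 0;
    leftover -= extra;
    payouts.set(s.id, s.floor + extra);
  }
  return payouts;
}
```

Smoothing over time, such as averaging contribution across several periods, could be layered on top by changing how `units` is computed. The split itself stays the same.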
Sharding pool
A group of workers that jointly serve one model too large for a single GPU. A later feature, since it adds latency and coordination risk.
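The source does not say how a sharding pool would split a model. One common approach is layer-wise (pipeline) partitioning: each worker holds a contiguous block of layers sized to its memory, and activations pass from one worker to the next. The sketch below assumes that approach. The `ShardMember` and `LayerRange` types are hypothetical. Each boundary between workers adds a network hop, which is where the extra latency and coordination risk come from.

```typescript
// Hypothetical member of a sharding pool.
interface ShardMember {
  workerId: string;
  memoryGiB: number;
}

// Layers [start, end) assigned to one worker.
interface LayerRange {
  workerId: string;
  start: number; // first layer index (inclusive)
  end: number; // last layer index (exclusive)
}

// Assign contiguous layer blocks in proportion to each worker's memory.
// Boundaries are rounded from cumulative memory, so they never go backwards;
// a very small worker may receive an empty range.
function partitionLayers(numLayers: number, members: ShardMember[]): LayerRange[] {
  const totalMem = members.reduce((s, m) => s + m.memoryGiB, 0);
  if (members.length === 0 || totalMem <= 0) throw new Error("no capacity in pool");
  const ranges: LayerRange[] = [];
  let start = 0;
  let cumulativeMem = 0;
  members.forEach((m, i) => {
    cumulativeMem += m.memoryGiB;
    // The last member takes whatever remains so every layer is assigned.
    const end =
      i === members.length - 1 ? numLayers : Math.round((numLayers * cumulativeMem) / totalMem);
    ranges.push({ workerId: m.workerId, start, end });
    start = end;
  });
  return ranges;
}
```

For example, a 32-layer model split across a 24 GiB worker and an 8 GiB worker gives layers 0 to 23 to the first and 24 to 31 to the second. This sketch ignores memory for the KV cache and activations, which a real planner would have to reserve.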


