Worker guide

A worker is a machine that connects to Kunagi, advertises model capacity, accepts routed inference jobs, and streams responses back.

Worker lifecycle

The native worker is the first target for serious capacity because it can reach the GPU through CUDA, Metal, or Vulkan via a local model runtime.
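A native worker has to find out which GPU backend its local runtime can use before it can advertise capacity. The guide names the three backends but does not rank them, so the preference order and the `pick_backend` helper below are hypothetical; this is a minimal sketch, not the worker client's actual detection logic.

```python
from typing import Iterable, Optional

# Hypothetical preference order; the guide does not rank the backends.
BACKEND_PREFERENCE = ("cuda", "metal", "vulkan")


def pick_backend(available: Iterable[str]) -> Optional[str]:
    """Return the first preferred GPU backend the local runtime reports, or None."""
    found = {name.lower() for name in available}
    for backend in BACKEND_PREFERENCE:
        if backend in found:
            return backend
    # No supported GPU backend: the machine is not a native-worker candidate.
    return None


print(pick_backend(["Vulkan", "Metal"]))
print(pick_backend(["opencl"]))
```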

Worker slots
  1. Install the worker client and authenticate a provider account.
  2. Benchmark GPU capability, memory, supported runtimes, and tokens per second.
  3. List supported models in the marketplace.
  4. Receive jobs from the orchestrator while the worker is online and idle.
  5. Stream tokens back and receive credit for completed work.
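The lifecycle above can be sketched as a small state machine: a worker goes online, accepts only jobs for models it lists while it is idle, streams tokens back, and is credited once the job completes. The `Worker` class, its states, the `send` callback standing in for the connection to the orchestrator, and the one-credit-per-token rule are all assumptions for illustration, not the worker client's real API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class State(Enum):
    OFFLINE = "offline"
    IDLE = "idle"
    BUSY = "busy"


@dataclass
class Worker:
    worker_id: str
    models: List[str]          # step 3: models the worker lists in the marketplace
    tokens_per_second: float   # step 2: measured during benchmarking
    state: State = State.OFFLINE
    credits: int = 0           # hypothetical unit: one credit per streamed token

    def go_online(self) -> None:
        # Step 1 (install and authenticate) is assumed to have happened already.
        self.state = State.IDLE

    def can_accept(self, model: str) -> bool:
        # Step 4: only online, idle workers that list the model get jobs.
        return self.state is State.IDLE and model in self.models

    def run_job(self, model: str, tokens: Iterable[str],
                send: Callable[[str], None]) -> int:
        if not self.can_accept(model):
            raise RuntimeError(f"{self.worker_id} cannot accept a {model} job")
        self.state = State.BUSY
        try:
            sent = 0
            for token in tokens:    # step 5: stream each token as it is produced
                send(token)
                sent += 1
            self.credits += sent    # credit only once the job completes
            return sent
        finally:
            self.state = State.IDLE  # back to idle even if streaming fails


worker = Worker("gpu-box-1", models=["example-7b"], tokens_per_second=42.0)
worker.go_online()
received: List[str] = []
worker.run_job("example-7b", ["Hello", ",", " world"], received.append)
print(received, worker.credits, worker.state)
```

Crediting after the loop, rather than per token, reflects the guide's wording that credit is for completed work; a job that fails mid-stream returns the worker to idle without paying it.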

Worker types

Type | Best for | Tradeoff
--- | --- | ---
Browser worker | Low-friction onboarding through WebGPU. | Smaller models and less predictable sustained performance.
Native worker | High-capacity GPUs, larger models, serious providers. | Requires installation, runtime setup, and stronger provider support.
Bootstrap worker | Baseline product availability while marketplace supply grows. | Costs scale with usage until marketplace capacity replaces it.
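The table implies a routing order: marketplace workers (browser or native) serve a job when one can hold the model, and the bootstrap worker covers the gap otherwise. The sketch below assumes that reading; the `Candidate` fields, the memory-based fit check, and the best-fit tie-break are hypothetical choices, not the orchestrator's documented policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    worker_id: str
    kind: str              # "browser" or "native": marketplace supply
    free_memory_gb: float
    idle: bool


def route(model_memory_gb: float, marketplace: List[Candidate],
          bootstrap_id: str = "bootstrap") -> str:
    # Marketplace workers that are idle and have room for the model.
    fits = [c for c in marketplace
            if c.idle and c.free_memory_gb >= model_memory_gb]
    if fits:
        # Best fit: the smallest worker that can hold the model, which keeps
        # large native GPUs free for models only they can serve.
        return min(fits, key=lambda c: c.free_memory_gb).worker_id
    # No marketplace capacity: fall back to the bootstrap worker.
    return bootstrap_id


pool = [
    Candidate("browser-1", "browser", 4.0, idle=True),
    Candidate("native-1", "native", 24.0, idle=True),
    Candidate("native-2", "native", 48.0, idle=False),
]
print(route(2.0, pool), route(10.0, pool), route(30.0, pool))
```

Sending small models to browser workers first matches the table's note that browser workers suit smaller models, and the fallback is what lets bootstrap cost shrink as marketplace supply grows.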

Pools

The term "pool" has two meanings in this project. This guide keeps them separate so implementation work is not scoped against the wrong feature.

Reward pool

A grouping of providers whose payouts are smoothed according to each provider's contribution. It is easier to build than a sharding pool and useful earlier.
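The guide does not define the payout rule. One simple reading is that a pool's earnings for a period are shared in proportion to each member's contribution (measured in some hypothetical unit such as tokens served or capacity-hours online), so a provider that happened to get few jobs still earns steadily. The sketch below, with a hypothetical `split_rewards` helper, splits integer credits proportionally and uses the largest-remainder method so the payouts add up exactly to the pool total.

```python
from typing import Dict


def split_rewards(pool_credits: int, contribution: Dict[str, int]) -> Dict[str, int]:
    """Split pool_credits in proportion to each provider's contribution.

    Uses the largest-remainder method so integer payouts sum to pool_credits.
    """
    total = sum(contribution.values())
    if total == 0:
        # No contribution recorded: nothing is paid out this period.
        return {p: 0 for p in contribution}
    shares = {p: pool_credits * c // total for p, c in contribution.items()}
    remainders = {p: pool_credits * c % total for p, c in contribution.items()}
    leftover = pool_credits - sum(shares.values())
    # Hand the leftover units to the largest fractional remainders
    # (ties broken by provider name so the result is deterministic).
    for p in sorted(remainders, key=lambda p: (-remainders[p], p))[:leftover]:
        shares[p] += 1
    return shares


print(split_rewards(100, {"alice": 5, "bob": 3, "carol": 1}))
```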

Sharding pool

A group of workers that jointly serves one model too large for a single GPU. This is a later feature because splitting a model across machines adds latency and coordination risk.
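The guide does not say how a sharding pool would split a model. One common approach is a pipeline-style split, where each worker holds a contiguous block of layers. The sketch below assumes that approach and sizes each block in proportion to the worker's memory; `partition_layers` and its inputs are hypothetical.

```python
from typing import Dict, List, Tuple


def partition_layers(num_layers: int,
                     memory_gb: Dict[str, float]) -> List[Tuple[str, range]]:
    """Assign contiguous layer ranges to workers in proportion to their memory."""
    total = sum(memory_gb.values())
    if not memory_gb or total <= 0:
        raise ValueError("need at least one worker with memory")
    workers = list(memory_gb)
    plan: List[Tuple[str, range]] = []
    start = 0
    cumulative = 0.0
    for i, worker in enumerate(workers):
        cumulative += memory_gb[worker]
        # Round the cumulative boundary so ranges stay contiguous; the last
        # worker always ends at num_layers so every layer is assigned.
        end = num_layers if i == len(workers) - 1 else round(num_layers * cumulative / total)
        plan.append((worker, range(start, end)))
        start = end
    return plan


print(partition_layers(32, {"gpu-a": 24.0, "gpu-b": 8.0}))
```

Under this kind of split, each generated token's activations pass through every shard in order, so a pool of N workers adds N - 1 network hops per token. That is one concrete source of the latency cost mentioned above.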