Kunagi Systems: Decentralized GPU inference network

A worker is a machine that connects to Kunagi, advertises model capacity, accepts routed inference jobs, and streams responses back.
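The four steps above (connect, advertise capacity, accept a routed job, stream the response) can be modeled as a small state machine. This is a minimal sketch under stated assumptions: the state names, the event strings, and the rule that a worker returns to the advertised state after each stream are hypothetical and are not taken from Kunagi's actual protocol.

```python
from enum import Enum, auto


class WorkerState(Enum):
    """Hypothetical lifecycle states for a Kunagi worker (illustrative only)."""
    DISCONNECTED = auto()
    CONNECTED = auto()
    ADVERTISED = auto()  # capacity has been announced to the network
    SERVING = auto()     # a routed job is running and its output is streaming back


# Assumed transitions: connect -> advertise -> accept job -> finish stream -> advertised again.
TRANSITIONS = {
    (WorkerState.DISCONNECTED, "connect"): WorkerState.CONNECTED,
    (WorkerState.CONNECTED, "advertise"): WorkerState.ADVERTISED,
    (WorkerState.ADVERTISED, "accept_job"): WorkerState.SERVING,
    (WorkerState.SERVING, "finish_stream"): WorkerState.ADVERTISED,
}


def step(state: WorkerState, event: str) -> WorkerState:
    """Return the next state, or raise ValueError if the event is invalid here."""
    # Assumption: a worker may disconnect from any state.
    if event == "disconnect":
        return WorkerState.DISCONNECTED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}") from None


if __name__ == "__main__":
    s = WorkerState.DISCONNECTED
    for ev in ["connect", "advertise", "accept_job", "finish_stream"]:
        s = step(s, ev)
        print(ev, "->", s.name)
```

Rejecting out-of-order events (for example, accepting a job before advertising) keeps a worker from receiving routed work it never offered to serve.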

Worker lifecycle

The native worker is the first serious capacity target because it can use CUDA, Metal, or Vulkan through a local model runtime.
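Because the native worker can target CUDA, Metal, or Vulkan, it has to pick one at startup. The sketch below shows one plausible selection order. The `HostInfo` fields, the preference order, and the decision to fail rather than fall back to CPU are all assumptions for illustration; the document does not specify how the local model runtime chooses a backend.

```python
from dataclasses import dataclass


@dataclass
class HostInfo:
    """Hypothetical summary of what a native worker detects on its host."""
    os_name: str          # e.g. "darwin", "linux", "windows"
    has_nvidia_gpu: bool  # an NVIDIA GPU with a working CUDA driver was found
    has_vulkan: bool      # a Vulkan-capable driver was found


def choose_backend(host: HostInfo) -> str:
    """Pick an accelerator backend using an assumed preference order."""
    if host.os_name == "darwin":
        return "metal"   # Apple GPUs are exposed through Metal
    if host.has_nvidia_gpu:
        return "cuda"    # prefer CUDA on NVIDIA hardware
    if host.has_vulkan:
        return "vulkan"  # portable option for AMD, Intel, and other GPUs
    # Assumption: a host with no supported GPU backend does not join as a native worker.
    raise RuntimeError("no supported GPU backend found")
```

Preferring the vendor-specific backend first and keeping Vulkan as the portable option is a common tradeoff, but the right order depends on which runtime the worker actually embeds.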

Worker slots

Type	Best for	Tradeoff
Browser worker	Low-friction onboarding through WebGPU.	Smaller models and less predictable sustained performance.
Native worker	High-capacity GPUs, larger models, and serious providers.	Requires installation, runtime setup, and stronger provider support.
Bootstrap worker	Baseline product availability while marketplace supply grows.	Costs scale with usage until replaced by marketplace capacity.
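The table implies a routing preference: use marketplace capacity (browser and native workers) when it can serve a job, and fall back to bootstrap workers otherwise, since bootstrap costs scale with usage. The sketch below encodes that reading. The `Worker` fields, the use of model size in GB as the only fit criterion, and the best-fit tie-breaking are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    worker_id: str
    kind: str            # "browser", "native", or "bootstrap"
    max_model_gb: float  # hypothetical: largest model (GB of weights) the worker can hold
    busy: bool = False


def pick_worker(workers: List[Worker], model_gb: float) -> Optional[Worker]:
    """Prefer idle marketplace capacity; use bootstrap workers only as a fallback."""
    fits = [w for w in workers if not w.busy and w.max_model_gb >= model_gb]
    marketplace = [w for w in fits if w.kind in ("browser", "native")]
    if marketplace:
        # Best fit: the smallest worker that can hold the model, keeping big GPUs free.
        return min(marketplace, key=lambda w: w.max_model_gb)
    bootstrap = [w for w in fits if w.kind == "bootstrap"]
    return bootstrap[0] if bootstrap else None
```

For example, a 2 GB model would land on a browser worker, a 20 GB model on a native worker, and a model that no marketplace worker can hold on a bootstrap worker. As marketplace supply grows, fewer jobs reach the fallback branch.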

The term "pool" has two meanings in this project. This document keeps them separate so that implementation work is not scoped against the wrong feature.

First meaning: a grouping model that smooths provider payouts according to each provider's contribution. It is easier to build and useful earlier.
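A payout pool of this kind can be sketched as a proportional split of pooled earnings. This assumes, hypothetically, that the pool collects earnings over some period and divides them by each provider's contribution score; how contribution is measured is not specified in the source. The largest-remainder method is used here so that shares are whole cents that add up exactly to the pooled total.

```python
from typing import Dict


def split_payout(total_cents: int, contributions: Dict[str, float]) -> Dict[str, int]:
    """Split a pooled payout in proportion to each provider's (hypothetical) contribution score.

    Uses the largest-remainder method so shares are whole cents summing to total_cents.
    """
    weight = sum(contributions.values())
    if weight <= 0:
        raise ValueError("total contribution must be positive")
    exact = {p: total_cents * c / weight for p, c in contributions.items()}
    shares = {p: int(v) for p, v in exact.items()}  # floor of each exact share
    leftover = total_cents - sum(shares.values())
    # Give the remaining cents to the providers with the largest fractional parts.
    for p in sorted(exact, key=lambda p: exact[p] - shares[p], reverse=True)[:leftover]:
        shares[p] += 1
    return shares
```

With three equal contributors and 100 cents, the result is 34, 33, and 33 cents; ties on the fractional part go to the provider listed first. Providers receive a share of the pool's steady total instead of depending on which individual jobs happened to reach them.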

Second meaning: a group of workers that jointly serve one model too large for a single GPU. This is a later feature, because splitting a model across machines adds latency and coordination risk.
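One common way for a group of workers to serve a model too large for any single GPU is pipeline-style sharding: each worker holds a contiguous block of layers and passes activations to the next. The sketch below assumes every layer has the same memory footprint (`layer_gb`, a hypothetical parameter) and ignores activation and KV-cache memory; it is an illustration of the technique, not Kunagi's planned design.

```python
from typing import List


def assign_layers(num_layers: int, worker_mem_gb: List[float], layer_gb: float) -> List[range]:
    """Assign contiguous layer ranges to workers for pipeline-style sharding.

    Fills each worker in order until its memory is used, so the model spans as few
    workers as possible. Raises ValueError if combined memory cannot hold every layer.
    """
    ranges = []
    start = 0
    for mem in worker_mem_gb:
        if start >= num_layers:
            break
        capacity = int(mem // layer_gb)  # how many whole layers this worker can hold
        end = min(num_layers, start + capacity)
        ranges.append(range(start, end))
        start = end
    if start < num_layers:
        raise ValueError("workers do not have enough memory for the model")
    return ranges
```

Filling workers greedily keeps the number of pipeline stages low, and every boundary between stages is a network hop on each token's path. That hop cost is the latency risk noted above, and losing any one stage stalls the whole model, which is the coordination risk.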