llama-swap

llama-swap is a lightweight, transparent proxy server that provides automatic model swapping for llama.cpp's server.

When a request is made to an OpenAI-compatible endpoint, llama-swap extracts the model value from the request and loads the appropriate server configuration to serve it. If the wrong upstream server is running, it is replaced with the correct one. This is the "swap" in the name: the upstream server is swapped automatically so that the correct one serves the request.
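
The swap decision described above can be sketched in a few lines of Python. This is only an illustrative model, not llama-swap's actual implementation: the class name SwapProxy, the start and stop callbacks, the model names, and the launch-command strings are all hypothetical placeholders. The only assumption about the request is that its JSON body carries a "model" field, as OpenAI-compatible requests do.

```python
import json
from typing import Callable, Dict, Optional


class SwapProxy:
    """Toy model of a swapping proxy: at most one upstream runs at a time."""

    def __init__(
        self,
        configs: Dict[str, str],
        start: Callable[[str], None],
        stop: Callable[[str], None],
    ) -> None:
        self.configs = configs  # model name -> hypothetical launch command
        self.start = start      # callback that would launch an upstream server
        self.stop = stop        # callback that would terminate an upstream server
        self.running: Optional[str] = None  # name of the model currently loaded

    def route(self, body: bytes) -> str:
        """Make sure the upstream for the requested model is the one running."""
        model = json.loads(body)["model"]
        if model not in self.configs:
            raise KeyError(f"no configuration for model {model!r}")
        if self.running != model:
            # Wrong (or no) upstream is running: swap it for the right one.
            if self.running is not None:
                self.stop(self.running)
            self.start(self.configs[model])
            self.running = model
        return model


# Record what the proxy would do instead of launching real processes.
events = []
proxy = SwapProxy(
    {"model-a": "launch-command-a", "model-b": "launch-command-b"},
    start=lambda cmd: events.append(("start", cmd)),
    stop=lambda name: events.append(("stop", name)),
)
proxy.route(b'{"model": "model-a"}')  # nothing running yet: start model-a
proxy.route(b'{"model": "model-a"}')  # already running: no swap
proxy.route(b'{"model": "model-b"}')  # different model: stop model-a, start model-b
print(events)
```

In this sketch a repeated request for the same model causes no restart, and only a request for a different model triggers a stop followed by a start, which is the one-model-at-a-time behavior of the basic configuration.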

In the most basic configuration, llama-swap handles one model at a time. For more advanced use cases, the groups feature allows multiple models to be loaded at the same time, giving you complete control over how your system resources are used.

Name
llama-swap
Main Program
llama-swap
Programs
  • llama-swap
  • wol-proxy
Homepage
Version
183
License
Maintainers
Platforms
  • x86_64-darwin
  • aarch64-darwin
  • aarch64-linux
  • armv5tel-linux
  • armv6l-linux
  • armv7a-linux
  • armv7l-linux
  • i686-linux
  • loongarch64-linux
  • m68k-linux
  • microblaze-linux
  • microblazeel-linux
  • mips-linux
  • mips64-linux
  • mips64el-linux
  • mipsel-linux
  • powerpc-linux
  • powerpc64-linux
  • powerpc64le-linux
  • riscv32-linux
  • riscv64-linux
  • s390-linux
  • s390x-linux
  • x86_64-linux
  • wasm64-wasi
  • wasm32-wasi
  • i686-freebsd
  • x86_64-freebsd
  • aarch64-freebsd
Defined
Source