llama-swap ‐ Nix Packages ‐ Searchix

llama-swap

llama-swap is a lightweight, transparent proxy server that provides automatic model swapping for llama.cpp's server.

When a request is made to an OpenAI-compatible endpoint, llama-swap extracts the model value from the request and loads the appropriate server configuration to serve it. If the wrong upstream server is running, it is replaced with the correct one before the request is served. This is the "swap" in the name.
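The swap step described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not llama-swap's actual implementation: the class name `SwapProxy`, the model names, and the placeholder command strings are all hypothetical, and no real process is started or stopped.

```python
import json


class SwapProxy:
    """Toy model of the swap step: at most one upstream runs at a time."""

    def __init__(self, configs):
        # configs maps a model name to a placeholder upstream command
        # (hypothetical; a real setup would launch a server process).
        self.configs = configs
        self.running = None  # model whose upstream is currently up
        self.log = []        # records the simulated start/stop actions

    def handle(self, body: bytes) -> str:
        # Extract the "model" field from the JSON request body.
        model = json.loads(body)["model"]
        if model not in self.configs:
            raise KeyError(f"no configuration for model {model!r}")
        # Swap only when the wrong upstream (or none) is running.
        if self.running != model:
            if self.running is not None:
                self.log.append(f"stop {self.running}")
            self.log.append(f"start {model}")
            self.running = model
        return f"forwarded to {model}"


proxy = SwapProxy({"model-a": "cmd-a", "model-b": "cmd-b"})
proxy.handle(b'{"model": "model-a"}')  # starts model-a
proxy.handle(b'{"model": "model-a"}')  # already running, no swap
proxy.handle(b'{"model": "model-b"}')  # stops model-a, starts model-b
print(proxy.log)
```

The point of the sketch is that a repeated request for the same model causes no restart; only a change in the requested model triggers a swap.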

In the most basic configuration, llama-swap handles one model at a time. For more advanced use cases, the groups feature allows multiple models to be loaded at the same time, giving you control over how your system resources are used.
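To make the idea of groups concrete, here is a deliberately simplified sketch of one possible policy: models in the same group may stay loaded together, and requesting a model from another group unloads everything else. This policy, the class name `GroupedLoader`, and the group and model names are assumptions made for illustration; llama-swap's real group options and behavior may differ, so consult its documentation for the actual configuration.

```python
class GroupedLoader:
    """Toy policy: models of one group can be loaded side by side."""

    def __init__(self, groups):
        # groups maps a group name to its member model names
        # (hypothetical layout, not llama-swap's config format).
        self.group_of = {
            model: group
            for group, members in groups.items()
            for model in members
        }
        self.loaded = set()

    def request(self, model):
        group = self.group_of[model]
        # Keep only already-loaded models from the requested model's group.
        self.loaded = {m for m in self.loaded if self.group_of[m] == group}
        self.loaded.add(model)
        return sorted(self.loaded)


loader = GroupedLoader({"chat": ["model-a", "model-b"], "embed": ["model-c"]})
loader.request("model-a")  # only model-a is loaded
loader.request("model-b")  # model-a and model-b share a group, both stay
loader.request("model-c")  # different group: model-a and model-b are unloaded
```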

Name

llama-swap

Main Program

llama-swap

Programs

llama-swap
wol-proxy

Homepage

https://github.com/mostlygeek/llama-swap

Version

183

License

MIT License

Platforms

x86_64-darwin
aarch64-darwin
aarch64-linux
armv5tel-linux
armv6l-linux
armv7a-linux
armv7l-linux
i686-linux
loongarch64-linux
m68k-linux
microblaze-linux
microblazeel-linux
mips-linux
mips64-linux
mips64el-linux
mipsel-linux
powerpc-linux
powerpc64-linux
powerpc64le-linux
riscv32-linux
riscv64-linux
s390-linux
s390x-linux
x86_64-linux
wasm64-wasi
wasm32-wasi
i686-freebsd
x86_64-freebsd
aarch64-freebsd
