llama-swap
llama-swap is a lightweight, transparent proxy server that provides automatic model swapping for llama.cpp's server.
When a request is made to an OpenAI-compatible endpoint, llama-swap
extracts the model value and loads the appropriate server configuration to
serve it. If the wrong upstream server is running, it is automatically
replaced with the correct one to serve the request; this is where the
"swap" comes in.
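To illustrate the routing described above, here is a minimal sketch of the kind of OpenAI-style chat completion request a client would send to the proxy. The model name and endpoint path are illustrative assumptions, not taken from a specific llama-swap configuration; llama-swap's decision is driven by the "model" field in the request body.

```python
import json

# An OpenAI-compatible chat completion payload, as a client would POST
# to the proxy (e.g. at /v1/chat/completions). llama-swap reads the
# "model" value to pick which upstream llama.cpp server to run.
payload = {
    "model": "qwen2.5-7b",  # hypothetical model name from the proxy config
    "messages": [{"role": "user", "content": "Hello"}],
}

# Serialize as the request body; the routing key survives the round trip.
body = json.dumps(payload)
print(json.loads(body)["model"])
```

If the server currently loaded upstream does not match this model name, the proxy stops it and starts the matching one before forwarding the request.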
In the most basic configuration, llama-swap handles one model at a time.
For more advanced use cases, the groups feature allows multiple models
to be loaded at the same time. You have complete control over how your
system resources are used.
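As a rough sketch of how this looks in practice, llama-swap is driven by a YAML configuration that maps model names to upstream server commands, with an optional groups section for models that should stay loaded together. The keys and values below are illustrative assumptions; consult the project's own documentation for the exact schema.

```yaml
# Illustrative llama-swap configuration sketch (paths and names are
# placeholders, and keys may differ from the actual schema).
models:
  "qwen2.5-7b":
    # Command used to start the upstream llama.cpp server for this model;
    # ${PORT} is assumed to be filled in by the proxy.
    cmd: |
      llama-server --port ${PORT} -m /models/qwen2.5-7b.gguf

  "llama-3.1-8b":
    cmd: |
      llama-server --port ${PORT} -m /models/llama-3.1-8b.gguf

# Optional: keep several models resident at once instead of swapping
# one-for-one, trading memory for avoiding reload latency.
groups:
  "always-on":
    swap: false
    members:
      - "qwen2.5-7b"
      - "llama-3.1-8b"
```

Without a group, requesting a different model causes the running server to be stopped and the new one started; models in a non-swapping group are served side by side.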
- Name
- llama-swap
- Main Program
- llama-swap
- Programs
- llama-swap
- wol-proxy
- Homepage
- Version
- 183
- License
- Maintainers
- Platforms
- x86_64-darwin
- aarch64-darwin
- aarch64-linux
- armv5tel-linux
- armv6l-linux
- armv7a-linux
- armv7l-linux
- i686-linux
- loongarch64-linux
- m68k-linux
- microblaze-linux
- microblazeel-linux
- mips-linux
- mips64-linux
- mips64el-linux
- mipsel-linux
- powerpc-linux
- powerpc64-linux
- powerpc64le-linux
- riscv32-linux
- riscv64-linux
- s390-linux
- s390x-linux
- x86_64-linux
- wasm64-wasi
- wasm32-wasi
- i686-freebsd
- x86_64-freebsd
- aarch64-freebsd