python314Packages.flashinfer

FlashInfer is a library and kernel generator for Large Language Models that provides high-performance implementation of LLM GPU kernels such as FlashAttention, PageAttention and LoRA. FlashInfer focus on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.

Name
flashinfer
Homepage
Version
0.3.1
License
Maintainers
Platforms
  • aarch64-linux
  • armv5tel-linux
  • armv6l-linux
  • armv7a-linux
  • armv7l-linux
  • i686-linux
  • loongarch64-linux
  • m68k-linux
  • microblaze-linux
  • microblazeel-linux
  • mips-linux
  • mips64-linux
  • mips64el-linux
  • mipsel-linux
  • powerpc-linux
  • powerpc64-linux
  • powerpc64le-linux
  • riscv32-linux
  • riscv64-linux
  • s390-linux
  • s390x-linux
  • x86_64-linux
  • x86_64-darwin
  • aarch64-darwin
  • aarch64-windows
  • x86_64-windows
  • i686-windows
  • i686-freebsd
  • x86_64-freebsd
  • aarch64-freebsd
Defined
Source