cld2
CLD2 probabilistically detects over 80 languages in Unicode UTF-8 text, either plain text or HTML/XML. Legacy encodings must be converted to valid UTF-8 by the caller. For mixed-language input, CLD2 returns the top three languages found and their approximate percentages of the total text bytes (e.g. 80% English and 20% French out of 1000 bytes of text means about 800 bytes of English and 200 bytes of French). Optionally, it also returns a vector of text spans, each labeled with its detected language. These spans can be useful for applying a different spelling-correction dictionary or a different machine-translation request to each span. The design target is web pages of at least 200 characters (about two sentences); CLD2 is not designed to do well on very short text, lists of proper names, part numbers, etc.
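Two points in the description can be illustrated with a short sketch: the caller's responsibility to hand CLD2 valid UTF-8, and the way percentages map to approximate byte counts. The Python below does not call CLD2 at all; `to_utf8` and `approx_bytes` are hypothetical helper names invented for this illustration, and the Latin-1 sample string is an assumption.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8.

    Hypothetical helper: this is what "converted by the caller" could look
    like before the text is handed to a detector such as CLD2.
    """
    return raw.decode(source_encoding).encode("utf-8")


def approx_bytes(total_text_bytes: int, percents: dict) -> dict:
    """Turn per-language percentages into approximate byte counts.

    Hypothetical helper: mirrors the arithmetic in the description, it is
    not part of CLD2.
    """
    return {lang: total_text_bytes * pct // 100 for lang, pct in percents.items()}


# Sample text stored in a legacy single-byte encoding (assumed: Latin-1).
legacy = "Café crème".encode("latin-1")
utf8_text = to_utf8(legacy, "latin-1")

# The 80% / 20% split of 1000 text bytes from the description.
shares = approx_bytes(1000, {"English": 80, "French": 20})
print(len(legacy), len(utf8_text), shares)
```

The accented characters take one byte each in Latin-1 and two bytes each in UTF-8, which is why the byte length changes after conversion; percentages reported over text bytes refer to the UTF-8 form.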
- Name
- cld2
- Homepage
- Version
- 0-unstable-2015-08-21
- License
- Maintainers
- Platforms
- i686-cygwin
- x86_64-cygwin
- x86_64-darwin
- aarch64-darwin
- i686-freebsd
- x86_64-freebsd
- aarch64-freebsd
- aarch64-genode
- i686-genode
- x86_64-genode
- x86_64-solaris
- javascript-ghcjs
- aarch64-linux
- armv5tel-linux
- armv6l-linux
- armv7a-linux
- armv7l-linux
- i686-linux
- loongarch64-linux
- m68k-linux
- microblaze-linux
- microblazeel-linux
- mips-linux
- mips64-linux
- mips64el-linux
- mipsel-linux
- powerpc-linux
- powerpc64-linux
- powerpc64le-linux
- riscv32-linux
- riscv64-linux
- s390-linux
- s390x-linux
- x86_64-linux
- mmix-mmixware
- aarch64-netbsd
- armv6l-netbsd
- armv7a-netbsd
- armv7l-netbsd
- i686-netbsd
- m68k-netbsd
- mipsel-netbsd
- powerpc-netbsd
- riscv32-netbsd
- riscv64-netbsd
- x86_64-netbsd
- aarch64_be-none
- aarch64-none
- arm-none
- armv6l-none
- avr-none
- i686-none
- microblaze-none
- microblazeel-none
- mips-none
- mips64-none
- msp430-none
- or1k-none
- m68k-none
- powerpc-none
- powerpcle-none
- riscv32-none
- riscv64-none
- rx-none
- s390-none
- s390x-none
- vc4-none
- x86_64-none
- i686-openbsd
- x86_64-openbsd
- x86_64-redox
- wasm64-wasi
- wasm32-wasi
- aarch64-windows
- x86_64-windows
- i686-windows
- Defined
- Source