fastports

sentencepiece 0.2.1

textproc/py-sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation

Category: textproc
Maintainer: yuri@FreeBSD.org
WWW: https://github.com/google/sentencepiece
License: APACHE20
USES: compiler:c++17-lang pkgconfig python

Description

SentencePiece is an unsupervised text tokenizer and detokenizer, mainly for
neural network-based text generation systems where the vocabulary size is
fixed before the neural model is trained. SentencePiece implements subword
units (e.g., byte-pair encoding (BPE) and the unigram language model), with
the extension of direct training from raw sentences. This makes it possible
to build a purely end-to-end system that does not depend on
language-specific pre- or postprocessing.
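
To make the idea of subword units with a predetermined vocabulary size concrete, here is a toy byte-pair encoding sketch in plain Python. It is an illustration only and is not SentencePiece's implementation or API: the function names (train_bpe, apply_merge, encode, decode) are hypothetical, and the greedy merge loop is a simplified textbook version. The one detail borrowed from SentencePiece is that whitespace is kept as an ordinary symbol (U+2581), so raw sentences can be used directly and decoding is lossless.

```python
from collections import Counter

SPACE = "\u2581"  # visible whitespace marker, so spaces are ordinary symbols


def apply_merge(seq, pair):
    """Replace every non-overlapping occurrence of `pair` in `seq` with one merged symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_bpe(sentences, vocab_size):
    """Learn merges from raw sentences until the vocabulary reaches `vocab_size`."""
    # No pre-tokenization: each raw sentence is a single sequence of characters.
    corpus = [list(SPACE + s.replace(" ", SPACE)) for s in sentences]
    vocab = {sym for seq in corpus for sym in seq}
    merges = []
    while len(vocab) < vocab_size:
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break  # nothing left to merge
        # Most frequent adjacent pair; ties are broken by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab.add(best[0] + best[1])
        corpus = [apply_merge(seq, best) for seq in corpus]
    return merges


def encode(text, merges):
    """Split `text` into subword pieces by replaying the learned merges in order."""
    seq = list(SPACE + text.replace(" ", SPACE))
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


def decode(pieces):
    """Invert `encode`: join the pieces and turn the marker back into spaces."""
    return "".join(pieces).replace(SPACE, " ")[1:]


merges = train_bpe(["low lower lowest", "new newer newest"], vocab_size=20)
pieces = encode("lowest newer", merges)
print(pieces)
print(decode(pieces))
```

Because no language-specific splitting happens before training, the same code works on any text in which U+2581 does not already occur. The real library additionally provides the unigram language model, normalization, and a far more efficient trainer.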

Dependencies

Commit History
