sentencepiece  

Text Tokenization using Byte Pair Encoding and Unigram Modelling


Download and install the sentencepiece package from within the R console
Install from CRAN:
install.packages("sentencepiece")

Install from GitHub:
library("remotes")
install_github("cran/sentencepiece")

Install by package version:
library("remotes")
install_version("sentencepiece", "0.2.3")



Attach the package and use:
library("sentencepiece")
Maintained by
Jan Wijffels
First Published: 2020-06-04
Latest Update: 2022-11-13
Description:
An unsupervised text tokenizer that performs byte pair encoding and unigram modelling. Wraps the 'sentencepiece' library, which provides a language-independent tokenizer that splits text into words and smaller subword units. The techniques are explained in the paper "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing" by Taku Kudo and John Richardson (2018). Also provides straightforward access to pretrained byte pair encoding models and subword embeddings trained on Wikipedia using 'word2vec', as described in "BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages" by Benjamin Heinzerling and Michael Strube (2018).
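As a rough illustration of the tokenizer described above, the sketch below trains a small byte pair encoding model on a made-up toy corpus, then encodes and decodes one sentence. The corpus, the file name traindata.txt and vocab_size = 60 are assumptions chosen only to keep the example small; a real corpus would normally use a much larger vocabulary, and the exact subwords produced depend on the training data.

```r
library(sentencepiece)

# Toy corpus, invented for illustration. sentencepiece() trains from text
# files, so the sentences are written to a temporary file first.
corpus <- rep(c(
  "the quick brown fox jumps over the lazy dog",
  "a language independent tokenizer splits text into subword units",
  "byte pair encoding merges frequent pairs of symbols",
  "unigram modelling keeps the most likely subword pieces"
), times = 50)
train_file <- file.path(tempdir(), "traindata.txt")
writeLines(corpus, con = train_file)

# Train a byte pair encoding model. vocab_size = 60 is an assumption that
# suits this tiny corpus; use type = "unigram" for unigram modelling.
model <- sentencepiece(train_file, type = "bpe", vocab_size = 60,
                       model_dir = tempdir())

txt <- "the lazy fox splits text"

# Encode to subword strings and to integer ids, then decode the ids again.
subwords <- sentencepiece_encode(model, x = txt, type = "subwords")
ids      <- sentencepiece_encode(model, x = txt, type = "ids")
decoded  <- sentencepiece_decode(model, ids)

print(subwords)
print(decoded)
```

Both encode calls return one element per input text, so passing a character vector of several sentences tokenizes them all in one call.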
How to cite:
Jan Wijffels (2020). sentencepiece: Text Tokenization using Byte Pair Encoding and Unigram Modelling. R package version 0.2.3, https://cran.r-project.org/web/packages/sentencepiece. Accessed 28 Feb. 2025.
Previous versions and publish date:
0.1.1 (2020-06-04 12:10), 0.1.2 (2020-06-08 23:40), 0.2 (2021-12-15 00:00), 0.2.1 (2021-12-21 17:00), 0.2.2 (2022-11-09 09:00)
Downloads during the last 30 days
[Chart: daily downloads for sentencepiece, 01/29 to 02/26]
