
sentencepiece  

Text Tokenization using Byte Pair Encoding and Unigram Modelling
View on CRAN: https://cran.r-project.org/web/packages/sentencepiece


Download and install the sentencepiece package from within the R console
Install from CRAN:
install.packages("sentencepiece")

Install from GitHub:
library("remotes")
install_github("cran/sentencepiece")

Install by package version:
library("remotes")
install_version("sentencepiece", "0.2.4")



Attach the package and use:
library("sentencepiece")
Maintained by
Jan Wijffels
First Published: 2020-06-04
Latest Update: 2025-11-27
Description:
Unsupervised text tokenizer that performs byte pair encoding and unigram modelling. Wraps the 'sentencepiece' library, which provides a language-independent tokenizer to split text into words and smaller subword units. The techniques are explained in the paper "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing" by Taku Kudo and John Richardson (2018). Also provides straightforward access to pretrained byte pair encoding models and subword embeddings trained on Wikipedia using 'word2vec', as described in "BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages" by Benjamin Heinzerling and Michael Strube (2018).
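To give a feel for what byte pair encoding does, here is a toy sketch in base R (no packages needed). It learns BPE merge rules from a small word list by repeatedly merging the most frequent adjacent pair of symbols. This illustrates the general technique only. It is not the sentencepiece package's API, nor the algorithm used by the underlying C++ library, which trains on raw sentences, treats whitespace as an ordinary symbol and also offers unigram modelling. The helper names `merge_pair`, `learn_bpe` and `bpe_encode` and the word counts are hypothetical.

```r
# Toy byte pair encoding (BPE) in base R: an illustration of the idea only,
# not the sentencepiece implementation. Helper names are hypothetical.

# Replace every adjacent occurrence of symbols a, b with the merged symbol "ab"
merge_pair <- function(symbols, a, b) {
  out <- character(0)
  i <- 1
  n <- length(symbols)
  while (i <= n) {
    if (i < n && symbols[i] == a && symbols[i + 1] == b) {
      out <- c(out, paste0(a, b))
      i <- i + 2
    } else {
      out <- c(out, symbols[i])
      i <- i + 1
    }
  }
  out
}

# Learn up to n_merges merge rules from a vector of words (repeats count)
learn_bpe <- function(words, n_merges) {
  corpus <- lapply(words, function(w) strsplit(w, "")[[1]])  # start from characters
  merges <- character(0)
  for (k in seq_len(n_merges)) {
    # Count every adjacent symbol pair, stored as "left right"
    pairs <- unlist(lapply(corpus, function(s) {
      if (length(s) < 2) return(character(0))
      paste(s[-length(s)], s[-1])
    }))
    if (length(pairs) == 0) break
    counts <- table(pairs)
    best <- names(counts)[which.max(counts)]  # most frequent pair (first one on ties)
    ab <- strsplit(best, " ", fixed = TRUE)[[1]]
    corpus <- lapply(corpus, merge_pair, a = ab[1], b = ab[2])
    merges <- c(merges, best)
  }
  merges
}

# Segment a new word by replaying the learned merges in order
bpe_encode <- function(word, merges) {
  symbols <- strsplit(word, "")[[1]]
  for (m in merges) {
    ab <- strsplit(m, " ", fixed = TRUE)[[1]]
    symbols <- merge_pair(symbols, ab[1], ab[2])
  }
  symbols
}

# Made-up toy corpus: each word repeated according to its frequency
words <- rep(c("hug", "pug", "pun", "bun", "hugs"), times = c(10, 5, 12, 4, 5))
merges <- learn_bpe(words, n_merges = 3)
print(merges)
print(bpe_encode("hugs", merges))
```

With these counts the three learned merges are `u g`, `u n` and `h ug`, so `hugs` is segmented into `hug` + `s`. Ties between equally frequent pairs are broken by whichever pair `table()` sorts first; a real tokenizer would handle this, and much larger vocabularies, more carefully.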
How to cite:
Jan Wijffels (2020). sentencepiece: Text Tokenization using Byte Pair Encoding and Unigram Modelling. R package version 0.2.4, https://cran.r-project.org/web/packages/sentencepiece. Accessed 04 Jun. 2026.
Previous versions and publication dates:
0.1.1 (2020-06-04 12:10), 0.1.2 (2020-06-08 23:40), 0.2 (2021-12-15 00:00), 0.2.1 (2021-12-21 17:00), 0.2.2 (2022-11-09 09:00), 0.2.3 (2022-11-13 10:30), 0.2.4 (2025-11-27 21:20)



© Copyright since 2022. All rights reserved, rpkg.net. Based in Cambridge, Massachusetts, USA