
sentencepiece  

Text Tokenization using Byte Pair Encoding and Unigram Modelling


Download and install sentencepiece package within the R console
Install from CRAN:
install.packages("sentencepiece")

Install from GitHub:
library("remotes")
install_github("cran/sentencepiece")

Install by package version:
library("remotes")
install_version("sentencepiece", "0.2.3")



Attach the package and use:
library("sentencepiece")
Maintained by
Jan Wijffels
First Published: 2020-06-04
Latest Update: 2022-11-13
Description:
Unsupervised text tokenizer that performs byte pair encoding and unigram modelling. Wraps the 'sentencepiece' library, which provides a language-independent tokenizer to split text into words and smaller subword units. The techniques are explained in the paper "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing" by Taku Kudo and John Richardson (2018). Also provides straightforward access to pretrained byte pair encoding models and subword embeddings trained on Wikipedia using 'word2vec', as described in "BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages" by Benjamin Heinzerling and Michael Strube (2018).
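The tokenization workflow in the description can be sketched roughly as follows. This is a minimal illustration, not the maintainer's reference example. It assumes the 'tokenizers.bpe' package is installed (used here only for its example corpus `belgium_parliament`), and the vocabulary size and coverage values are arbitrary choices for a small corpus.

```r
# Sketch only: train a small byte pair encoding model and round-trip one sentence.
library(sentencepiece)

# Assumption: 'tokenizers.bpe' is installed; it only supplies example text here.
# Any plain-text file with one sentence or document per line can be used instead.
data(belgium_parliament, package = "tokenizers.bpe")
corpus_file <- file.path(tempdir(), "traindata.txt")
writeLines(belgium_parliament$text, con = corpus_file)

# Train the model; type = "unigram" would select unigram modelling instead.
model <- sentencepiece(corpus_file, type = "bpe", vocab_size = 5000,
                       coverage = 0.999, model_dir = tempdir())

txt <- "De eigendomsoverdracht aan de deelstaten is ingewikkeld."

# Encode to subword pieces and to integer ids, then decode the ids back to text.
pieces  <- sentencepiece_encode(model, x = txt, type = "subwords")
ids     <- sentencepiece_encode(model, x = txt, type = "ids")
decoded <- sentencepiece_decode(model, ids)
```

Training writes the model files into `model_dir`. A previously saved model can be reloaded with `sentencepiece_load_model()` rather than retrained.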
How to cite:
Jan Wijffels (2020). sentencepiece: Text Tokenization using Byte Pair Encoding and Unigram Modelling. R package version 0.2.3, https://cran.r-project.org/web/packages/sentencepiece
Previous versions and publish date:
0.1.1 (2020-06-04 12:10), 0.1.2 (2020-06-08 23:40), 0.2 (2021-12-15 00:00), 0.2.1 (2021-12-21 17:00), 0.2.2 (2022-11-09 09:00)
