
tokenizers  

Fast, Consistent Tokenization of Natural Language Text


Download and install the tokenizers package from within the R console:
Install from CRAN:
install.packages("tokenizers")

Install from GitHub:
library("remotes")
install_github("cran/tokenizers")

Install by package version:
library("remotes")
install_version("tokenizers", "0.3.0")



Attach the package and use:
library("tokenizers")
Maintained by
Lincoln Mullen
First Published: 2016-04-02
Latest Update: 2022-12-22
Description:
Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
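A minimal sketch of that consistent interface, assuming tokenizers 0.3.0 is installed: each tokenize_* function takes a character vector and returns a list with one element per input document, the count_* helpers return one number per document, and chunk_text() splits a long text into pieces of roughly chunk_size words. The sample documents, the names "first", "second" and "fox", and the chunk size of 25 are illustrative choices, not part of the package.

```r
# Sketch only: sample texts and parameter values are illustrative.
library("tokenizers")

docs <- c(
  first  = "The quick brown fox jumps over the lazy dog. It barked.",
  second = "Tokenizers return one list element per input document."
)

# Every tokenizer returns a list with one element per document,
# named after the input vector.
words     <- tokenize_words(docs)          # lowercased, punctuation stripped by default
sentences <- tokenize_sentences(docs)      # sentence boundaries via 'stringi'
bigrams   <- tokenize_ngrams(docs, n = 2)  # shingled n-grams joined with a space
chars     <- tokenize_characters(docs)     # single characters, non-alphanumerics dropped

# The counting helpers return one value per document.
n_words     <- count_words(docs)
n_sentences <- count_sentences(docs)

# Split a longer text into chunks of about 25 words each.
long_text <- paste(rep(docs[["first"]], 10), collapse = " ")
chunks    <- chunk_text(long_text, chunk_size = 25, doc_id = "fox")

str(words)
print(n_words)
print(length(chunks))
```

Because every tokenizer shares the same input and output shape, one tokenizer can be swapped for another (for example tokenize_word_stems() in place of tokenize_words()) without changing the surrounding code.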
How to cite:
Lincoln Mullen (2016). tokenizers: Fast, Consistent Tokenization of Natural Language Text. R package version 0.3.0, https://cran.r-project.org/web/packages/tokenizers. Accessed 25 Jun. 2026.
Previous versions and publish date:
0.1.0 (2016-04-02 23:34), 0.1.1 (2016-04-04 08:37), 0.1.2 (2016-04-14 18:19), 0.1.3 (2016-08-18 23:27), 0.1.4 (2016-08-29 22:59), 0.2.0 (2018-03-21 15:43), 0.2.1 (2018-03-29 22:07), 0.2.3 (2022-09-23 22:00)

© Copyright since 2022. All rights reserved, rpkg.net. Based in Cambridge, Massachusetts, USA