clean.text | R Documentation |
Converts multiline documents to a single line, strips extra whitespace and punctuation, replaces digits with 'X', and converts non-alphabetic characters to spaces.
clean.text(bigcorp)
bigcorp: A tm Corpus object.
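library(tm)

# Two sample documents with punctuation, digits, and an embedded newline
txt <- c("thhis s! and bonkus 4:33pm and Jan 3, 2015. ",
         " big space\n dawg-ness?")

# Wrap the character vector in a tm corpus and clean it
a <- clean.text(VCorpus(VectorSource(txt)))

# Inspect the first cleaned document
a[[1]]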
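
To make the steps listed in the description concrete, the following is a minimal base-R sketch that works on a plain character vector. It is not the source of clean.text: the helper name clean_string_sketch, the regular expressions, and the order of the steps are all assumptions made for illustration, and the real function may differ (for example in how it treats case or in how it walks the corpus).

```r
# Hypothetical helper, NOT part of the documented package: a base-R
# approximation of the cleaning steps named in the description.
clean_string_sketch <- function(x) {
  x <- gsub("[\r\n]+", " ", x)     # join multiline text into a single line
  x <- gsub("[0-9]", "X", x)       # replace each digit with 'X'
  x <- gsub("[^A-Za-z ]", " ", x)  # turn remaining non-letters into spaces
  x <- gsub(" +", " ", x)          # collapse runs of spaces
  trimws(x)                        # drop leading and trailing whitespace
}

cleaned <- clean_string_sketch(c("4:33pm and Jan 3, 2015. ",
                                 " big space\n dawg-ness?"))
print(cleaned)
```

Under these assumptions the first string becomes "X XXpm and Jan X XXXX" and the second becomes "big space dawg ness". Digits are replaced before the non-letter step so that the 'X' placeholders survive it.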