Tokenize the texts in the `text_cols` columns of the CSV file `fname` in parallel, using `n_workers` workers.

tokenize_csv(
  fname,
  text_cols,
  outname = NULL,
  n_workers = 4,
  rules = NULL,
  mark_fields = NULL,
  tok = NULL,
  header = "infer",
  chunksize = 50000
)

Arguments

fname

path to the CSV file to tokenize

text_cols

names of the columns that contain the texts to tokenize

outname

name of the output file (`NULL` by default)

n_workers

number of workers to run in parallel

rules

tokenization rules to apply to the texts (`NULL` by default)

mark_fields

whether to mark the individual text fields (`NULL` by default)

tok

tokenizer to use (`NULL` by default)

header

how to treat the header row of the CSV file (`"infer"` by default)

chunksize

number of rows to process per chunk (50000 by default)

Value

None
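
Examples

To illustrate what `n_workers` and `chunksize` control, here is a minimal base-R sketch of chunked, parallel tokenization. It is not the package's implementation: `sketch_tokenize_csv` and `tokenize_chunk` are hypothetical helpers, the `text_tok` output column is an assumed name, and lowercasing plus whitespace splitting stands in for a real tokenizer. For simplicity, the sketch reads the whole file and then splits the rows into chunks.

```r
library(parallel)

# Hypothetical helper: tokenize the text columns of one chunk of rows.
# Lowercasing and whitespace splitting are stand-ins for a real tokenizer.
tokenize_chunk <- function(chunk, text_cols) {
  # Join the text columns of each row into a single string.
  joined <- do.call(paste, c(unname(as.list(chunk[text_cols])), sep = " "))
  tokens <- strsplit(trimws(tolower(joined)), "[[:space:]]+")
  # Store each row's tokens as one space-separated string.
  vapply(tokens, paste, character(1), collapse = " ")
}

# Hypothetical sketch of the chunk-and-worker pattern (not the fastai code).
sketch_tokenize_csv <- function(fname, text_cols, outname,
                                n_workers = 2, chunksize = 2) {
  df <- read.csv(fname, stringsAsFactors = FALSE)
  rows <- seq_len(nrow(df))
  # Consecutive groups of at most `chunksize` row indices.
  idx <- split(rows, ceiling(rows / chunksize))
  # mclapply forks processes, which is unavailable on Windows; use one core there.
  cores <- if (.Platform$OS.type == "windows") 1L else n_workers
  toks <- mclapply(
    idx,
    function(i) tokenize_chunk(df[i, , drop = FALSE], text_cols),
    mc.cores = cores
  )
  df$text_tok <- unlist(toks, use.names = FALSE)
  write.csv(df, outname, row.names = FALSE)
  invisible(NULL)  # results go to `outname`; nothing useful is returned
}

# Small demonstration on a temporary file.
infile <- tempfile(fileext = ".csv")
outfile <- tempfile(fileext = ".csv")
write.csv(
  data.frame(text = c("Hello World", "Tokenize  this text", "One more")),
  infile, row.names = FALSE
)
sketch_tokenize_csv(infile, "text", outfile, n_workers = 2, chunksize = 2)
res <- read.csv(outfile, stringsAsFactors = FALSE)
```

With the package loaded, a corresponding call using the documented arguments would look like `tokenize_csv(infile, text_cols = "text", n_workers = 2, chunksize = 1000)`. The tokens it produces depend on the tokenizer and rules in use, so they will differ from this sketch.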