R/blurr_hugging_face.R
HF_TokenClassBeforeBatchTransform.Rd
Handles everything you need to assemble a mini-batch of inputs and targets, as well as decode the dictionary produced during tokenization.
HF_TokenClassBeforeBatchTransform(
  hf_arch,
  hf_tokenizer,
  ignore_token_id = -100,
  max_length = NULL,
  padding = TRUE,
  truncation = TRUE,
  is_split_into_words = TRUE,
  n_tok_inps = 1,
  ...
)
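hf_arch: architecture
hf_tokenizer: tokenizer
ignore_token_id: id of the token to ignore
max_length: maximum length
padding: whether to pad
truncation: whether to truncate
is_split_into_words: whether the input is already split into words
n_tok_inps: number of tokenized inputs
...: additional arguments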
Returns: None
The dictionary is produced as a byproduct of the tokenization process in the `encodes` method.
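
The role of `ignore_token_id` can be illustrated with a small base R sketch. It assumes the common token-classification convention in which only the first subword token of each word keeps the word's label, and special tokens and remaining subword tokens receive the ignore id so that they can be skipped by the loss. The helper `align_labels()` and the `word_ids` vector are hypothetical names used for illustration only; they are not part of this package, and the transform's actual internals may differ.

```r
# Hypothetical helper (not part of the package): expand word-level labels
# to token-level labels, marking every non-first subword with the ignore id.
#   word_ids    : for each token, the 1-based index of the word it came from
#                 (NA for special tokens such as [CLS] / [SEP])
#   word_labels : one label id per word
align_labels <- function(word_ids, word_labels, ignore_token_id = -100) {
  # Word id of the previous token (NA for the first token)
  prev <- c(NA, head(word_ids, -1))
  # A token starts a word if it belongs to a word and that word differs
  # from the previous token's word
  is_first <- !is.na(word_ids) & (is.na(prev) | word_ids != prev)
  # Start with every position ignored, then fill in the first subwords
  out <- rep(ignore_token_id, length(word_ids))
  out[is_first] <- word_labels[word_ids[is_first]]
  out
}

# Three words; the first and third are split into two subword tokens each
word_ids    <- c(NA, 1, 1, 2, 3, 3, NA)
word_labels <- c(1, 0, 2)

aligned <- align_labels(word_ids, word_labels)
print(aligned)
# values: -100, 1, -100, 0, 2, -100, -100
```

With `is_split_into_words = TRUE`, the input is expected to be a list of words rather than a raw string, which is what makes a word-to-token mapping like `word_ids` meaningful in the first place.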