Skip to content

Module bastionlab.tokenizers.remote_tokenizers

Classes

RemoteTokenizer()

Represents a tokenizer on the server.

Instance variables

decoder :

model :

normalizer :

padding :

post_processor :

pre_tokenizer :

truncation :

Methods

enable_padding(_self, *args, **kwargs) :

enable_truncation(_self, *args, **kwargs) :

encode(self, rdf:ย RemoteLazyFrame, add_special_tokens:ย boolย =ย True) โ€‘> Tuple[bastionlab.polars.RemoteArray,ย bastionlab.polars.RemoteArray]

Encodes a RemoteLazyFrame as tokenized RemoteArray.

Args: rdf: RemoteLazyFrame The RemoteDataframe containing string sequences to be tokenized. add_special_tokens: bool Whether to add the special tokens

Returns: Tuple[RemoteArray, RemoteArray] Returns a tuple of the tokenized entries (first RemoteArray contains input_ids and the other, attention_mask)

get_vocab(_self, *args, **kwargs) :

get_vocab_size(_self, *args, **kwargs) :

no_padding(_self, *args, **kwargs) :