Module bastionlab.tokenizers.remote_tokenizers
Classes
RemoteTokenizer()-
Represents a tokenizer on the server.
Instance variables
decoder:model:normalizer:padding:post_processor:pre_tokenizer:truncation:Methods
enable_padding(_self, *args, **kwargs):enable_truncation(_self, *args, **kwargs):encode(self, rdf:ย RemoteLazyFrame, add_special_tokens:ย boolย =ย True) โ> Tuple[bastionlab.polars.RemoteArray,ย bastionlab.polars.RemoteArray]-
Encodes a RemoteLazyFrame as tokenized RemoteArray.
Args: rdf: RemoteLazyFrame The RemoteDataframe containing string sequences to be tokenized. add_special_tokens: bool Whether to add the special tokens
Returns: Tuple[RemoteArray, RemoteArray] Returns a tuple of the tokenized entries (first RemoteArray contains input_ids and the other, attention_mask)
get_vocab(_self, *args, **kwargs):get_vocab_size(_self, *args, **kwargs):no_padding(_self, *args, **kwargs):