Basic classes and helpers for modularized training

class ParentLabeller[source]

ParentLabeller(level=1)

Extracts a class label from a filename's parent folder at `level`

Parameters:

  • level : <class 'int'>, optional

    The level up from `fname` to find the label
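
For instance, a minimal sketch of expected usage, assuming the labeller is called directly on a file path (the folder layout below is purely illustrative):

from pathlib import Path

labeller = ParentLabeller(level=1)
labeller(Path('data/train/cats/001.jpg')) # expected to return 'cats', the parent folder one level up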

class ColReader[source]

ColReader(cols, pref:str='', suff:str='', label_delim:str=None)

Reads `cols` in a row, with an optional `pref` and `suff`. Based on the fastai class

Parameters:

  • cols : <class 'inspect._empty'>

    Some column names to use

  • pref : <class 'str'>, optional

    A prefix

  • suff : <class 'str'>, optional

    A suffix

  • label_delim : <class 'str'>, optional

    A label delimiter
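
A minimal sketch of expected usage, assuming the reader is called on a row-like object such as a pandas Series (the column names are illustrative):

import pandas as pd

row = pd.Series({'text': 'A great film', 'label': 'pos'})
reader = ColReader('label')
reader(row) # expected to return 'pos'; pref and suff would be prepended/appended to the value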

class Categorize[source]

Categorize(names, sort=True)

Collection of categories with a reverse mapping in `o2i`. Based on the fastai class

Parameters:

  • names : <class 'inspect._empty'>

    An iterable collection of items to create a vocab from

  • sort : <class 'bool'>, optional

    Whether to sort the items

Categorize.map_objs[source]

Categorize.map_objs(objs)

Map objs to IDs

Parameters:

  • objs : <class 'inspect._empty'>

    Some iterable collection

Returns:

  • <class 'fastcore.foundation.L'>

Categorize.map_ids[source]

Categorize.map_ids(ids)

Map ids to objects in vocab

Parameters:

  • ids : <class 'inspect._empty'>

    Some ids correlating to `self.classes`

Returns:

  • <class 'fastcore.foundation.L'>

Categorize.decode[source]

Categorize.decode(o)

Decodes o by looking in self.classes

Parameters:

  • o : <class 'inspect._empty'>

    A key in self.classes
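
Putting the pieces together, a small sketch of the expected behaviour (the exact ids depend on `sort`):

cat = Categorize(['pos', 'neg', 'pos'], sort=True)
cat.classes # the vocab, e.g. ['neg', 'pos'] once sorted and deduplicated
cat.map_objs(['pos', 'neg']) # objects to ids, e.g. [1, 0]
cat.map_ids([1, 0]) # ids back to objects, e.g. ['pos', 'neg']
cat.decode(1) # a single id to its object, e.g. 'pos'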

class MultiCategorize[source]

MultiCategorize(names) :: Categorize

Collection of multi-categories with a reverse mapping in `o2i`. Based on the fastai class

Parameters:

  • names : <class 'inspect._empty'>

    An iterable collection of items to create a vocab from

MultiCategorize.decode[source]

MultiCategorize.decode(o)

Decodes o by looking in self.classes

Parameters:

  • o : <class 'inspect._empty'>

    A list of keys in self.classes

Returns:

  • <class 'list'>
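
A short sketch of multi-label decoding; the exact outputs are assumptions, since they depend on the sorted vocab:

mcat = MultiCategorize(['red', 'green', 'blue'])
mcat.decode([0, 2]) # expected to return a list of objects, e.g. ['blue', 'red'] with a sorted vocab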

Splitters

Functions designed for splitting your data

To write your own, you should make a function that returns two `L`s of indices (plain lists work as well).

For example, if we have a dataset of 5 items, we start with the indices [0,1,2,3,4]. To split the first three items into the training set and the last two into validation, we could write:

from fastcore.foundation import L
def split_func(idxs): return L(idxs[:3]), L(idxs[3:])

And we can see it work:

split_func([0,1,2,3,4])
((#3) [0,1,2], (#2) [3,4])

RandomSplitter[source]

RandomSplitter(valid_pct=0.2, seed=None)

Creates a function that randomly splits items between train and validation with `valid_pct`. Based on the fastai class

Parameters:

  • valid_pct : <class 'float'>, optional

  • seed : <class 'NoneType'>, optional
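
For example, a sketch of what the returned split function might produce on ten items:

splitter = RandomSplitter(valid_pct=0.2, seed=42)
splitter(list(range(10))) # expected to return two L's of indices, roughly (#8) train and (#2) valid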

class TaskDatasets[source]

TaskDatasets(train_dset, valid_dset, tokenizer_name:str=None, tokenize:bool=True, tokenize_func:callable=None, tokenize_kwargs:dict={}, auto_kwargs:dict={}, remove_cols:Union[str, List[str]]=None, label_keys:list=['labels'])

A set of datasets for a particular task, with a simple API.

Note: This is the base API; items should be raw text paired with model-ready labels, with label encoding or one-hot encoding already applied.

Parameters:

  • train_dset : <class 'inspect._empty'>

    A train `Dataset` object

  • valid_dset : <class 'inspect._empty'>

    A validation `Dataset` object

  • tokenizer_name : <class 'str'>, optional

    The string name of a `HuggingFace` tokenizer or model. If `None`, will not tokenize the dataset.

  • tokenize : <class 'bool'>, optional

    Whether to tokenize the dataset immediately

  • tokenize_func : <built-in function callable>, optional

    A function to tokenize an item with

  • tokenize_kwargs : <class 'dict'>, optional

    Some kwargs for when we call the tokenizer

  • auto_kwargs : <class 'dict'>, optional

    Some kwargs when calling `AutoTokenizer.from_pretrained`

  • remove_cols : typing.Union[str, typing.List[str]], optional

    What columns to remove

  • label_keys : <class 'list'>, optional

    The keys in each item that relate to the label (such as `labels`)
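
As an illustration, one plausible way to build a TaskDatasets from two HuggingFace Dataset objects; the dataset, tokenizer name, and tokenize_kwargs below are illustrative assumptions, not prescribed values:

from datasets import load_dataset

raw = load_dataset('imdb') # any dataset with text and labels works; 'imdb' is just an example
dsets = TaskDatasets(
    raw['train'], raw['test'],
    tokenizer_name='bert-base-uncased',
    tokenize=True,
    tokenize_kwargs={'max_length': 128, 'truncation': True, 'padding': 'max_length'},
    remove_cols='text',
)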

TaskDatasets.set_tokenizer[source]

TaskDatasets.set_tokenizer(tokenizer_name:str, override_existing:bool=False)

Sets a new AutoTokenizer as self.tokenizer

Parameters:

  • tokenizer_name : <class 'str'>

    A string name of a `HuggingFace` tokenizer or model

  • override_existing : <class 'bool'>, optional

    Whether to override an existing tokenizer

TaskDatasets.dataloaders[source]

TaskDatasets.dataloaders(batch_size:int=8, shuffle_train:bool=True, collate_fn:callable=None, path='.', device=None)

Creates DataLoaders from the dataset

Parameters:

  • batch_size : <class 'int'>, optional

    A batch size

  • shuffle_train : <class 'bool'>, optional

    Whether to shuffle the training dataset

  • collate_fn : <built-in function callable>, optional

    A custom collation function

  • path : <class 'str'>, optional

  • device : <class 'NoneType'>, optional
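
Continuing the sketch above, building the DataLoaders might look like this (the batch size is illustrative):

dls = dsets.dataloaders(batch_size=16, shuffle_train=True)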

class AdaptiveDataLoaders[source]

AdaptiveDataLoaders(*loaders, tokenizer=None, label_keys:list=['labels'], path='.', device=None) :: DataLoaders

A set of DataLoaders that keeps track of a tokenizer

Parameters:

  • loaders : <class 'inspect._empty'>

  • tokenizer : <class 'NoneType'>, optional

  • label_keys : <class 'list'>, optional

  • path : <class 'str'>, optional

  • device : <class 'NoneType'>, optional

AdaptiveDataLoaders.show_batch[source]

AdaptiveDataLoaders.show_batch(ds_idx:int=0, n:int=5, raw:bool=False)

Show a batch of data
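
A minimal sketch, assuming `dls` is an AdaptiveDataLoaders built as above:

dls.show_batch(n=3) # decode and display three items; raw=True presumably shows the un-decoded tokenized inputs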

class Strategy[source]

Strategy(*args, **kwargs)

Class for fitting strategies with typo-proofing

Parameters:

  • args : <class 'inspect._empty'>

  • kwargs : <class 'inspect._empty'>

Supported strategies:
* CosineAnnealing
* OneCycle
* SGDR
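
A short sketch of the intent: members of Strategy can be passed wherever a fitting strategy is expected, so a typo raises an AttributeError instead of failing silently (the tuner call below is illustrative):

tuner.tune(epochs=3, strategy=Strategy.OneCycle) # attribute access guards against typos
# Strategy.OneCycel would raise an AttributeError, catching the typo immediately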

class AdaptiveTuner[source]

AdaptiveTuner(expose_fastai:bool=False, tokenizer=None, label_keys:list=['labels'], **kwargs)

A base Tuner that interfaces with AdaptiveLearner with specific exposed functions

Parameters:

  • expose_fastai : <class 'bool'>, optional

    Whether to expose the entire API in `self`

  • tokenizer : <class 'NoneType'>, optional

    A HuggingFace tokenizer

  • label_keys : <class 'list'>, optional

    A list of keys correlating to the labels in the batch

  • kwargs : <class 'inspect._empty'>

Since fastai is a lightweight, approachable framework that incorporates state-of-the-art ideas, AdaptNLP bridges the gap between HuggingFace and fastai, letting you train HuggingFace models with the fastai framework through the *Tuner classes.

The constructor of the AdaptiveTuner class has an optional expose_fastai parameter. When set to True, the Tuner exposes the entire API of fastai's Learner, so every attribute of the Learner is available to you. This is only recommended for those very familiar with the fastai API.

Otherwise, you have access to the following functions in each class:

  • tune
  • lr_find
  • predict
  • save
  • load
  • export

All task fine-tuners should inherit from AdaptiveTuner, provide good defaults, and override anything the specific task dictates.
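
A sketch of the overall workflow; the keyword arguments forwarded to the underlying fastai Learner (dls, model, loss_func) and the names used here are illustrative assumptions rather than the documented API:

tuner = AdaptiveTuner(tokenizer=tokenizer, dls=dls, model=model, loss_func=loss_func) # extra kwargs assumed to reach the Learner
tuner.lr_find() # inspect the loss curve over candidate learning rates
tuner.tune(epochs=3, lr=2e-5) # fine tune with the default fit_one_cycle strategy
tuner.predict('Some text to run inference on') # implemented by each task-specific tuner
tuner.export('my_model') # write the model and tokenizer to disk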

AdaptiveTuner.tune[source]

AdaptiveTuner.tune(epochs:int, lr:float=None, strategy:Strategy='fit_one_cycle', callbacks:list=[], **kwargs)

Fine tune self.model for epochs with an lr and strategy

Parameters:

  • epochs : <class 'int'>

    Number of epochs to train for

  • lr : <class 'float'>, optional

    If None, finds a new learning rate and uses suggestion_method

  • strategy : <class 'fastcore.basics.Strategy'>, optional

    A fitting method

  • callbacks : <class 'list'>, optional

    Extra fastai Callbacks

  • kwargs : <class 'inspect._empty'>
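
For instance, a sketch of passing extra fastai callbacks (the callback choice is illustrative):

from fastai.callback.tracker import EarlyStoppingCallback

tuner.tune(epochs=5, lr=3e-5, strategy='fit_one_cycle', callbacks=[EarlyStoppingCallback(patience=2)])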

AdaptiveTuner.lr_find[source]

AdaptiveTuner.lr_find(start_lr=1e-07, end_lr=10, num_it=100, stop_div=True, show_plot=True, suggest_funcs=valley)

Runs fastai's LR Finder

Parameters:

  • start_lr : <class 'float'>, optional

  • end_lr : <class 'int'>, optional

  • num_it : <class 'int'>, optional

  • stop_div : <class 'bool'>, optional

  • show_plot : <class 'bool'>, optional

  • suggest_funcs : <class 'function'>, optional

AdaptiveTuner.save[source]

AdaptiveTuner.save(save_directory)

Save a pretrained model to a save_directory

Parameters:

  • save_directory : <class 'inspect._empty'>

    A folder to save our model to

AdaptiveTuner.load[source]

AdaptiveTuner.load(path:Union[Path, str], device=None)

Loads a pretrained model from path with AutoModel.from_pretrained and moves it to device

Parameters:

  • path : typing.Union[pathlib.Path, str]

    A location to load a tokenizer and weights from

  • device : <class 'NoneType'>, optional

    A valid device such as `cpu` or `cuda:0`

AdaptiveTuner.predict[source]

AdaptiveTuner.predict(text:Union[List[str], str])

Predict some text with the current model. Needs to be implemented for each task separately

Parameters:

  • text : typing.Union[typing.List[str], str]

    Some text or list of texts to inference with

AdaptiveTuner.export[source]

AdaptiveTuner.export(save_directory:Union[Path, str])

Exports the current model and tokenizer information to save_directory

Parameters:

  • save_directory : typing.Union[pathlib.Path, str]

    A folder to export our model to
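
Putting save, load, and export together, a sketch of a round trip (directory names are illustrative):

tuner.export('exported_model') # writes the model and tokenizer files
tuner.load('exported_model', device='cpu') # reloads the weights via AutoModel.from_pretrained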