class SequenceClassificationDatasets
SequenceClassificationDatasets(train_dset: Dataset, valid_dset: Dataset, tokenizer_name: str, tokenize: bool, tokenize_kwargs: dict, auto_kwargs: dict, remove_columns: list, categorize: [Categorize, MultiCategorize]) :: TaskDatasets
A set of datasets designed for sequence classification
Parameters:
train_dset
:<class 'datasets.arrow_dataset.Dataset'>
A training dataset
valid_dset
:<class 'datasets.arrow_dataset.Dataset'>
A validation dataset
tokenizer_name
:<class 'str'>
The name of a tokenizer
tokenize
:<class 'bool'>
Whether to tokenize immediately
tokenize_kwargs
:<class 'dict'>
kwargs for the tokenize function
auto_kwargs
:<class 'dict'>
AutoTokenizer.from_pretrained kwargs
remove_columns
:<class 'list'>
The columns to remove when tokenizing
categorize
:[<class 'adaptnlp.training.core.Categorize'>, <class 'adaptnlp.training.core.MultiCategorize'>]
A `Categorize` (single-label) or `MultiCategorize` (multi-label) instance
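To illustrate the difference the `categorize` argument makes, here is a plain-Python sketch of the two encodings. It is not adaptnlp's or fastai's implementation; `build_vocab`, `encode_single`, and `encode_multi` are hypothetical helpers, and the vocabulary is sorted only to keep the output deterministic. A single-label transform turns each label into one class index, while a multi-label transform turns a list of labels into a multi-hot vector:

```python
from typing import Dict, List, Sequence


def build_vocab(labels: Sequence[str]) -> Dict[str, int]:
    """Map each distinct label to an index (sorted here for determinism)."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def encode_single(label: str, vocab: Dict[str, int]) -> int:
    """Single-label case: exactly one class index per item."""
    return vocab[label]


def encode_multi(labels: Sequence[str], vocab: Dict[str, int]) -> List[float]:
    """Multi-label case: a multi-hot vector with 1.0 at every present label."""
    vec = [0.0] * len(vocab)
    for label in labels:
        vec[vocab[label]] = 1.0
    return vec


vocab = build_vocab(["sports", "politics", "tech"])
print(vocab)                                    # {'politics': 0, 'sports': 1, 'tech': 2}
print(encode_single("tech", vocab))             # 2
print(encode_multi(["sports", "tech"], vocab))  # [0.0, 1.0, 1.0]
```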
SequenceClassificationDatasets.from_dfs
SequenceClassificationDatasets.from_dfs(train_df: DataFrame, text_col: str, label_col: str, tokenizer_name: str, tokenize: bool = True, is_multicategory: bool = False, label_delim=' ', valid_df=None, split_func=None, split_pct=0.2, tokenize_kwargs: dict = {}, auto_kwargs: dict = {})
Builds SequenceClassificationDatasets from a DataFrame or set of DataFrames
Parameters:
train_df
:<class 'pandas.core.frame.DataFrame'>
A training dataframe
text_col
:<class 'str'>
The name of the text column
label_col
:<class 'str'>
The name of the label column
tokenizer_name
:<class 'str'>
The name of the tokenizer
tokenize
:<class 'bool'>, optional
Whether to tokenize immediately
is_multicategory
:<class 'bool'>, optional
Whether each item has a single label or multiple labels
label_delim
:<class 'str'>, optional
If `is_multicategory`, how to separate the labels
valid_df
:<class 'NoneType'>, optional
An optional validation dataframe
split_func
:<class 'NoneType'>, optional
Optionally a splitting function similar to `RandomSplitter`
split_pct
:<class 'float'>, optional
What fraction of `train_df` to split off for validation
tokenize_kwargs
:<class 'dict'>, optional
kwargs for the tokenize function
auto_kwargs
:<class 'dict'>, optional
kwargs for the AutoTokenizer.from_pretrained constructor
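Presumably, when no `valid_df` is given, `split_pct` sets how much of `train_df` is held out. The sketch below shows a seeded random split in the spirit of fastai's `RandomSplitter`; `random_split_indices` is a hypothetical helper written for illustration, not the code `from_dfs` runs:

```python
import random
from typing import List, Tuple


def random_split_indices(
    n: int, split_pct: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and hold out `split_pct` of them for validation."""
    if not 0.0 <= split_pct <= 1.0:
        raise ValueError("split_pct must be between 0 and 1")
    idxs = list(range(n))
    random.Random(seed).shuffle(idxs)  # seeded, so the split is reproducible
    cut = int(split_pct * n)
    # The first `cut` shuffled indices become validation rows, the rest training rows
    return idxs[cut:], idxs[:cut]


train_idx, valid_idx = random_split_indices(10, split_pct=0.2)
print(len(train_idx), len(valid_idx))  # 8 2
```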
SequenceClassificationDatasets.from_csvs
SequenceClassificationDatasets.from_csvs(train_csv: Path, text_col: str, label_col: str, tokenizer_name: str, tokenize: bool = True, is_multicategory: bool = False, label_delim=' ', valid_csv: Path = None, split_func=None, split_pct=0.2, tokenize_kwargs: dict = {}, auto_kwargs: dict = {}, **kwargs)
Builds SequenceClassificationDatasets from a single csv or set of csvs. A convenience constructor for `from_dfs`
Parameters:
train_csv
:<class 'pathlib.Path'>
A training csv file
text_col
:<class 'str'>
The name of the text column
label_col
:<class 'str'>
The name of the label column
tokenizer_name
:<class 'str'>
The name of the tokenizer
tokenize
:<class 'bool'>, optional
Whether to tokenize immediately
is_multicategory
:<class 'bool'>, optional
Whether each item has a single label or multiple labels
label_delim
:<class 'str'>, optional
If `is_multicategory`, how to separate the labels
valid_csv
:<class 'pathlib.Path'>, optional
An optional validation csv
split_func
:<class 'NoneType'>, optional
Optionally a splitting function similar to `RandomSplitter`
split_pct
:<class 'float'>, optional
What fraction of `train_csv` to split off for validation
tokenize_kwargs
:<class 'dict'>, optional
kwargs for the tokenize function
auto_kwargs
:<class 'dict'>, optional
kwargs for the AutoTokenizer.from_pretrained constructor
kwargs
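Since `from_csvs` is a convenience constructor for `from_dfs`, its main extra job is reading the files. This stdlib-only sketch shows what pulling `text_col` and `label_col` out of a CSV and applying `label_delim` amounts to; `read_text_and_labels` is hypothetical, and the real method works with DataFrames rather than Python lists:

```python
import csv
import io
from typing import Iterable, List, Tuple, Union

Label = Union[str, List[str]]


def read_text_and_labels(
    source: Iterable[str],
    text_col: str,
    label_col: str,
    is_multicategory: bool = False,
    label_delim: str = " ",
) -> Tuple[List[str], List[Label]]:
    """Collect the text and label columns, splitting labels if multi-category."""
    texts: List[str] = []
    labels: List[Label] = []
    for row in csv.DictReader(source):
        texts.append(row[text_col])
        raw = row[label_col]
        # Multi-category rows hold several labels joined by label_delim
        labels.append(raw.split(label_delim) if is_multicategory else raw)
    return texts, labels


data = io.StringIO("text,label\nGreat match last night,sports\nNew GPU launch,tech finance\n")
texts, labels = read_text_and_labels(data, "text", "label", is_multicategory=True)
print(labels)  # [['sports'], ['tech', 'finance']]
```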
SequenceClassificationDatasets.from_folders
SequenceClassificationDatasets.from_folders(train_path: Path, get_label: callable, tokenizer_name: str, tokenize: bool = True, is_multicategory: bool = False, label_delim='_', valid_path: Path = None, split_func=None, split_pct=0.2, tokenize_kwargs: dict = {}, auto_kwargs: dict = {})
Builds SequenceClassificationDatasets from a folder or groups of folders
Parameters:
train_path
:<class 'pathlib.Path'>
The path to the training data
get_label
:<built-in function callable>
A function which grabs the label(s) given a text file's `Path`
tokenizer_name
:<class 'str'>
The name of the tokenizer
tokenize
:<class 'bool'>, optional
Whether to tokenize immediately
is_multicategory
:<class 'bool'>, optional
Whether each item has a single label or multiple labels
label_delim
:<class 'str'>, optional
If `is_multicategory`, how to separate the labels
valid_path
:<class 'pathlib.Path'>, optional
The path to the validation data
split_func
:<class 'NoneType'>, optional
Optionally a splitting function similar to `RandomSplitter`
split_pct
:<class 'float'>, optional
What fraction of the items in `train_path` to split off for validation
tokenize_kwargs
:<class 'dict'>, optional
kwargs for the tokenize function
auto_kwargs
:<class 'dict'>, optional
kwargs for the AutoTokenizer.from_pretrained constructor
When passing in kwargs, anything meant for the tokenize function should go in `tokenize_kwargs`, and anything meant for the `AutoTokenizer.from_pretrained` constructor should go in `auto_kwargs`
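`get_label` receives each text file's `Path` and returns its label(s). The layout assumed below, one folder per class, is only a common convention, and `label_from_parent` is a hypothetical example. `PurePosixPath` keeps the demo off the filesystem, but a real `pathlib.Path` exposes the same `.parent.name`:

```python
from pathlib import PurePosixPath
from typing import List


def label_from_parent(path: PurePosixPath) -> str:
    """Hypothetical get_label: use the name of the file's parent folder."""
    return path.parent.name


train_files = [
    PurePosixPath("data/train/sports/001.txt"),
    PurePosixPath("data/train/sports_tech/002.txt"),
]
labels = [label_from_parent(p) for p in train_files]
print(labels)  # ['sports', 'sports_tech']

# Assumption: with is_multicategory=True, a label string such as 'sports_tech'
# would be split on label_delim ('_' by default for from_folders)
multi: List[List[str]] = [lbl.split("_") for lbl in labels]
print(multi)  # [['sports'], ['sports', 'tech']]
```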
class SequenceClassificationTuner
SequenceClassificationTuner(dls: DataLoaders, model_name: str, tokenizer=None, loss_func=CrossEntropyLossFlat(), metrics=[accuracy, <fastai.metrics.AccumMetric object>], opt_func=Adam, additional_cbs=None, expose_fastai_api=False, num_classes: int = None, **kwargs) :: AdaptiveTuner
An AdaptiveTuner with good defaults for Sequence Classification tasks
Valid kwargs and defaults:
lr: float = 0.001
splitter: function = trainable_params
cbs: list = None
path: Path = None
model_dir: Path = 'models'
wd: float = None
wd_bn_bias: bool = False
train_bn: bool = True
moms: tuple(float) = (0.95, 0.85, 0.95)
Parameters:
dls
:<class 'fastai.data.core.DataLoaders'>
A set of DataLoaders
model_name
:<class 'str'>
The name of a HuggingFace model
tokenizer
:<class 'NoneType'>, optional
A HuggingFace tokenizer
loss_func
:<class 'fastai.losses.CrossEntropyLossFlat'>, optional
A loss function
metrics
:<class 'list'>, optional
Metrics to monitor the training with
opt_func
:<class 'function'>, optional
A fastai or torch Optimizer
additional_cbs
:<class 'NoneType'>, optional
Additional callbacks to always attach to the Tuner
expose_fastai_api
:<class 'bool'>, optional
Whether to expose the fastai API
num_classes
:<class 'int'>, optional
The number of classes
kwargs
Additional keyword arguments (see "Valid kwargs and defaults" above)
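For single-label classification the model emits `num_classes` logits per text, and the default `accuracy` metric scores the fraction of rows whose highest logit matches the target class. fastai's `accuracy` does this on tensors; the plain-Python sketch below, with hypothetical `argmax` and `accuracy` helpers, only makes the arithmetic concrete:

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest logit (the first one wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits: List[Sequence[float]], targets: Sequence[int]) -> float:
    """Fraction of rows whose argmax matches the target class index."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    hits = sum(argmax(row) == t for row, t in zip(logits, targets))
    return hits / len(targets)


# Three texts scored over num_classes=3; the third prediction is wrong
logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 2.2, 0.1]]
print(accuracy(logits, [0, 2, 0]))  # 0.6666666666666666
```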
SequenceClassificationTuner.predict
SequenceClassificationTuner.predict(text: Union[List[str], str], bs: int = 64, detail_level: DetailLevel = 'low', class_names: list = None)
Predict some text for sequence classification with the currently loaded model
Parameters:
text
:typing.Union[typing.List[str], str]
Some text or list of texts to do inference with
bs
:<class 'int'>, optional
A batch size to use for multiple texts
detail_level
:<class 'fastcore.basics.DetailLevel'>, optional
A detail level to return on the predictions
class_names
:<class 'list'>, optional
A list of labels
Returns:
<class 'dict'>
A dictionary of filtered predictions
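A rough sketch of the steps `predict` implies: accept one string or a list, score the texts in chunks of `bs`, and map each row's most probable class to a name from `class_names`. Everything here is hypothetical, including `predict_sketch`, the `toy_scorer` stand-in for the model, and the `predictions`/`probs` keys. It ignores `detail_level`, and adaptnlp's real return dictionary may be shaped differently:

```python
import math
from typing import Callable, Dict, List, Sequence, Union


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_sketch(
    text: Union[List[str], str],
    score_batch: Callable[[List[str]], List[List[float]]],
    class_names: Sequence[str],
    bs: int = 64,
) -> Dict[str, list]:
    """Hypothetical stand-in for predict: batch, score, and name the top class."""
    texts = [text] if isinstance(text, str) else list(text)  # accept str or list
    preds: List[str] = []
    probs: List[List[float]] = []
    for start in range(0, len(texts), bs):  # one scoring call per chunk of bs texts
        for row in score_batch(texts[start:start + bs]):
            p = softmax(row)
            best = max(range(len(p)), key=lambda i: p[i])
            preds.append(class_names[best])
            probs.append(p)
    return {"predictions": preds, "probs": probs}


def toy_scorer(batch: List[str]) -> List[List[float]]:
    """Fake model: favors class 0 for short texts, class 1 otherwise."""
    return [[1.0, 0.0] if len(t) < 10 else [0.0, 1.0] for t in batch]


out = predict_sketch(["hi", "a much longer sentence"], toy_scorer, ["short", "long"], bs=1)
print(out["predictions"])  # ['short', 'long']
```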