encode_tags
encode_tags(tags, encodings)
Parameters:
tags : (no annotation)
encodings : (no annotation)
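The page gives no description for encode_tags. As a rough illustration of what a helper with this signature commonly does in token classification pipelines, the sketch below aligns word-level tag ids with subword tokens: the first subword of each word keeps the tag and every other position gets -100, the value that PyTorch's CrossEntropyLoss ignores by default. The name align_tags, the word_ids input, and the sample data are assumptions made for illustration; this is not the library's implementation.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_tags(tag_ids: List[int], word_ids: List[Optional[int]]) -> List[int]:
    """Hypothetical helper: spread word-level tag ids over subword tokens.

    word_ids[i] is the index of the word that token i came from, or None for
    special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special token: never contributes to the loss
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # First subword of a word carries the word's tag
            aligned.append(tag_ids[word_id])
        else:
            # Later subwords of the same word are ignored
            aligned.append(IGNORE_INDEX)
        previous_word = word_id
    return aligned


# Three words, the second of which was split into two subwords
word_ids = [None, 0, 1, 1, 2, None]
tag_ids = [1, 0, 2]
aligned = align_tags(tag_ids, word_ids)
```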
class TokenClassificationDatasets
TokenClassificationDatasets(train_dset:Dataset, valid_dset:Dataset, tokenizer_name:str, tokenize:bool, tokenize_kwargs:dict, auto_kwargs:dict, remove_columns:list, entity_mapping:dict) :: TaskDatasets
A set of datasets designed for token classification
Parameters:
train_dset : datasets.arrow_dataset.Dataset
    A training dataset
valid_dset : datasets.arrow_dataset.Dataset
    A validation dataset
tokenizer_name : str
    The name of a tokenizer
tokenize : bool
    Whether to tokenize immediately
tokenize_kwargs : dict
    kwargs for the tokenize function
auto_kwargs : dict
    AutoTokenizer.from_pretrained kwargs
remove_columns : list
    The columns to remove when tokenizing
entity_mapping : dict
    A mapping of entity names to encoded labels
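As a minimal sketch of what an entity_mapping might look like, assuming BIO-style tag names mapped to consecutive integer labels (the tag set here is hypothetical and not taken from the library):

```python
# Hypothetical BIO tag set; replace with the tags present in your data
tags = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]

# Entity name -> encoded label, the direction described for entity_mapping
entity_mapping = {tag: idx for idx, tag in enumerate(tags)}

# Inverse lookup, handy when turning predicted label ids back into names
id_to_tag = {idx: tag for tag, idx in entity_mapping.items()}
```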
TokenClassificationDatasets.from_dfs
TokenClassificationDatasets.from_dfs(train_df:DataFrame, token_col:str, tag_col:str, entity_mapping:dict, tokenizer_name:str, tokenize:bool=True, valid_df=None, split_func=None, split_pct=0.2, tokenize_kwargs:dict={}, auto_kwargs:dict={})
Builds TokenClassificationDatasets from a DataFrame or set of DataFrames
Parameters:
train_df : pandas.core.frame.DataFrame
    A training dataframe
token_col : str
    The name of the token column
tag_col : str
    The name of the tag column
entity_mapping : dict
    A mapping of entity names to encoded labels
tokenizer_name : str
    The name of the tokenizer
tokenize : bool, optional
    Whether to tokenize immediately
valid_df : NoneType, optional
    An optional validation dataframe
split_func : NoneType, optional
    Optionally a splitting function similar to RandomSplitter
split_pct : float, optional
    What % to split the train_df
tokenize_kwargs : dict, optional
    kwargs for the tokenize function
auto_kwargs : dict, optional
    kwargs for the AutoTokenizer.from_pretrained constructor
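To make the split_func and split_pct parameters more concrete, here is a minimal sketch of a splitting function in the spirit of fastai's RandomSplitter: a factory that returns a callable producing (train indices, validation indices). The name make_random_split is hypothetical, and whether from_dfs calls split_func with exactly this signature is an assumption.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def make_random_split(valid_pct: float = 0.2, seed: Optional[int] = None) -> Callable:
    """Hypothetical splitter factory, loosely modelled on RandomSplitter."""

    def _split(items: Sequence) -> Tuple[List[int], List[int]]:
        idxs = list(range(len(items)))
        # A dedicated Random instance keeps the split reproducible per seed
        random.Random(seed).shuffle(idxs)
        cut = int(valid_pct * len(items))
        # The first `cut` shuffled indices become the validation set
        return idxs[cut:], idxs[:cut]

    return _split


rows = list(range(10))  # stand-in for the rows of a training dataframe
train_idxs, valid_idxs = make_random_split(valid_pct=0.2, seed=42)(rows)
```

With ten rows and valid_pct=0.2, two indices land in the validation set and the remaining eight in the training set.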
TokenClassificationDatasets.from_csvs
TokenClassificationDatasets.from_csvs(train_csv:Path, token_col:str, tag_col:str, entity_mapping:dict, tokenizer_name:str, tokenize:bool=True, valid_csv:Path=None, split_func=None, split_pct=0.2, tokenize_kwargs:dict={}, auto_kwargs:dict={}, **kwargs)
Builds TokenClassificationDatasets from a single csv or set of csvs. A convenience constructor for from_dfs
Parameters:
train_csv : pathlib.Path
    A training csv file
token_col : str
    The name of the token column
tag_col : str
    The name of the tag column
entity_mapping : dict
    A mapping of entity names to encoded labels
tokenizer_name : str
    The name of the tokenizer
tokenize : bool, optional
    Whether to tokenize immediately
valid_csv : pathlib.Path, optional
    An optional validation csv
split_func : NoneType, optional
    Optionally a splitting function similar to RandomSplitter
split_pct : float, optional
    What % to split the train df
tokenize_kwargs : dict, optional
    kwargs for the tokenize function
auto_kwargs : dict, optional
    kwargs for the AutoTokenizer.from_pretrained constructor
kwargs : (no annotation)
When passing in kwargs, anything intended for the tokenize function should go in tokenize_kwargs, and anything intended for the Auto class constructor should go in auto_kwargs.
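The routing described above can be sketched as two separate dictionaries. The file name, column names, entity mapping, and option values below are hypothetical; truncation and max_length are standard tokenizer call options and use_fast is a standard AutoTokenizer.from_pretrained option, but which options suit your data is for you to decide.

```python
from pathlib import Path

# Options meant for the tokenize function
tokenize_kwargs = {"truncation": True, "max_length": 128}

# Options meant for AutoTokenizer.from_pretrained
auto_kwargs = {"use_fast": True}

# Hypothetical argument set for from_csvs
call_args = dict(
    train_csv=Path("train.csv"),
    token_col="tokens",
    tag_col="tags",
    entity_mapping={"O": 0, "B-PER": 1, "I-PER": 2},
    tokenizer_name="bert-base-cased",
    tokenize_kwargs=tokenize_kwargs,
    auto_kwargs=auto_kwargs,
)

# With the library installed, the call would then look like:
# dsets = TokenClassificationDatasets.from_csvs(**call_args)
```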
class SeqEvalMetrics
SeqEvalMetrics(entity_mapping:dict)
Multi-label classification metrics for NER, using the seqeval metric from HuggingFace
Parameters:
entity_mapping : dict
    A mapping of entity names to encoded labels
class NERMetric
NERMetric(*args, **kwargs)
Class for all valid NER metrics usable during fine-tuning with typo-proofing
Parameters:
args : (no annotation)
kwargs : (no annotation)
Supported metrics:
* Accuracy
* F1
* Precision
* Recall
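The idea of "typo-proofing" metric names can be illustrated with a small validator that accepts only the four supported names and suggests the closest match otherwise. This is a sketch of the general technique using the standard library's difflib; check_metric and SUPPORTED_METRICS are hypothetical names, and NERMetric itself may work quite differently.

```python
from difflib import get_close_matches

# The four metrics listed as supported, in lowercase
SUPPORTED_METRICS = ("accuracy", "f1", "precision", "recall")


def check_metric(name: str) -> str:
    """Return the normalized metric name, or raise with a suggestion."""
    key = name.lower()
    if key in SUPPORTED_METRICS:
        return key
    # Look for the single closest supported name to offer as a hint
    suggestion = get_close_matches(key, SUPPORTED_METRICS, n=1)
    hint = f" Did you mean '{suggestion[0]}'?" if suggestion else ""
    raise ValueError(f"Unsupported metric '{name}'.{hint}")


metrics = [check_metric(m) for m in ["Accuracy", "F1"]]
```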
class TokenClassificationTuner
TokenClassificationTuner(dls:DataLoaders, model_name:str, tokenizer=None, loss_func=CrossEntropyLoss(), metrics:List[NERMetric]=['accuracy', 'f1'], opt_func=Adam, additional_cbs=None, expose_fastai_api=False, num_classes:int=None, entity_mapping:dict=None, **kwargs) :: AdaptiveTuner
An AdaptiveTuner with good defaults for Token Classification tasks
Valid kwargs and defaults:
* lr:float = 0.001
* splitter:function = trainable_params
* cbs:list = None
* path:Path = None
* model_dir:Path = 'models'
* wd:float = None
* wd_bn_bias:bool = False
* train_bn:bool = True
* moms:tuple(float) = (0.95, 0.85, 0.95)
Parameters:
dls : fastai.data.core.DataLoaders
    A set of DataLoaders
model_name : str
    A HuggingFace model
tokenizer : NoneType, optional
    A HuggingFace tokenizer
loss_func : fastai.losses.CrossEntropyLossFlat, optional
    A loss function
metrics : typing.List[fastcore.basics.NERMetric], optional
    Metrics to monitor the training with
opt_func : function, optional
    A fastai or torch Optimizer
additional_cbs : NoneType, optional
    Additional Callbacks to have always tied to the Tuner
expose_fastai_api : bool, optional
    Whether to expose the fastai API
num_classes : int, optional
    The number of classes
entity_mapping : dict, optional
    A mapping of entity names to encoded labels
kwargs : (no annotation)
TokenClassificationTuner.predict
TokenClassificationTuner.predict(text:Union[List[str], str], bs:int=64, grouped_entities:bool=True, detail_level:DetailLevel='low')
Predict some text for token classification with the currently loaded model
Parameters:
text : typing.Union[typing.List[str], str]
    Some text or list of texts to do inference with
bs : int, optional
    A batch size to use for multiple texts
grouped_entities : bool, optional
    Return whole entity span strings
detail_level : fastcore.basics.DetailLevel, optional
    A detail level to return on the predictions
Returns:
dict
    A dictionary of filtered predictions
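To illustrate what "whole entity span strings" means for grouped_entities, the sketch below merges per-token BIO tags into entity spans. It assumes BIO-style tags and whitespace-joined tokens; group_entities and the output keys are hypothetical, and the structure of the dictionary that predict actually returns is not specified here.

```python
from typing import Dict, List


def group_entities(tokens: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: merge B-/I- tagged tokens into entity spans."""
    spans = []
    current = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new span, closing any open one
            if current:
                spans.append(current)
            current = {"entity": tag[2:], "text": token}
        elif tag.startswith("I-") and current and current["entity"] == tag[2:]:
            # An I- tag of the same type extends the open span
            current["text"] += " " + token
        else:
            # "O" or a stray I- tag closes the open span; the token is dropped
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return spans


tokens = ["Barack", "Obama", "visited", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
spans = group_entities(tokens, tags)
```

Without grouping, the same input would yield one entry per tagged token ("Barack" and "Obama" separately) instead of a single "Barack Obama" span.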