
Token Classification

The Task

The token classification task is similar to text classification, except each token within the text receives a prediction. A common use of this task is Named Entity Recognition (NER). Use this task if you require your data to be classified at the token level.
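
For intuition, the sketch below (plain Python; the BIO-style tags are illustrative, not output from this library) shows what a token-level prediction looks like: every token gets its own label.

tokens = ["The", "European", "Commission", "said", "on", "Thursday"]
labels = ["O", "B-ORG", "I-ORG", "O", "O", "O"]  # illustrative BIO tags, not produced by this library

# Token classification assigns exactly one label to each token.
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")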

Datasets

Currently supports the CoNLL dataset or custom input files.

Training

python train.py task=nlp/token_classification dataset=nlp/token_classification/conll

Swap to the GPT-2 backbone:

python train.py task=nlp/token_classification dataset=nlp/token_classification/conll backbone.pretrained_model_name_or_path=gpt2

We report Precision, Recall, Accuracy and Cross Entropy Loss for validation. To see all options available for the task, see here.

Token Classification Using Your Own Files

To use custom text files, the files should contain newline-delimited JSON objects, one per line. Each object should contain a list of tokens and, for each token, an associated label.

{
    "label_tags": [11, 12, 12, 21, 13, 11, 11, 21, 13, 11, 12, 13, 11, 21, 22, 11, 12, 17, 11, 21, 17, 11, 12, 12, 21, 22, 22, 13, 11, 0],
    "tokens": ["The", "European", "Commission", "said", "on", "Thursday", "it", "disagreed", "with", "German", "advice", "to", "consumers"]
}

python train.py task=nlp/token_classification dataset.cfg.train_file=train.json dataset.cfg.validation_file=valid.json
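
A minimal sketch of how such files could be produced with Python's standard json module. The example rows and label ids are placeholders; only the "tokens"/"label_tags" keys and the one-object-per-line layout follow the format shown above.

import json

# Placeholder examples; in practice these come from your own annotated data.
examples = [
    {"label_tags": [5, 0, 0, 0], "tokens": ["London", "is", "a", "city"]},
]

# Write one JSON object per line (repeat for valid.json).
with open("train.json", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")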

Token Classification Inference Pipeline (experimental)

By default we use the NER pipeline, which requires an input sequence string and the number of labels.

For Hydra to correctly parse your input argument, if your input contains any special characters you must either wrap the entire override in single quotes, like '+x="my, sentence"', or escape the special characters. See "Escaped characters in unquoted values" in the Hydra documentation.

python predict.py task=nlp/token_classification +checkpoint_path=/path/to/model.ckpt +x="London is the capital of the United Kingdom." +model_data_kwargs='{labels: 2}'

You can also run prediction using a default HuggingFace pre-trained model:

python predict.py task=nlp/token_classification +x="London is the capital of the United Kingdom." +model_data_kwargs='{labels: 2}'

Or run prediction on a specified HuggingFace pre-trained model:

python predict.py task=nlp/token_classification backbone.pretrained_model_name_or_path=bert-base-cased +x="London is the capital of the United Kingdom." +model_data_kwargs='{labels: 2}'
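
As an alternative to predict.py, per-token predictions can also be obtained directly from the Hugging Face transformers NER pipeline. This is a sketch of the generic transformers API, not this project's inference pipeline; the default pipeline downloads a pre-trained NER checkpoint on first use.

from transformers import pipeline

# Generic Hugging Face NER pipeline (downloads a default pre-trained NER model).
ner = pipeline("ner")
predictions = ner("London is the capital of the United Kingdom.")

# Each prediction is a dict with the predicted entity tag, confidence score and token text.
for prediction in predictions:
    print(prediction["word"], prediction["entity"], prediction["score"])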