Multiple Choice¶

The Task¶

The Multiple Choice task requires the model to choose among a set of options, given a question with optional context.

Similar to the text classification task, the model is fine-tuned on multi-class classification to provide probabilities across all possible answers. This is useful when the data you’d like the model to predict on requires selecting from a set of answers based on context or questions, and the answers vary from example to example. In contrast, use the text classification task if the answers remain static and do not need to be included during training.
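To make "probabilities across all possible answers" concrete, the minimal sketch below assumes the model has produced one raw score per answer option and converts those scores with a softmax. The function name and the scores are illustrative and are not part of the library.

```python
import math


def softmax(scores):
    """Convert one raw score per answer option into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable without changing the result.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three answer options (A, B, C).
option_scores = [2.0, 0.5, -1.0]
probabilities = softmax(option_scores)

# The predicted answer is the option with the highest probability.
predicted_index = probabilities.index(max(probabilities))
```

Because the probabilities are computed over whatever options accompany each question, the number and content of the options can change from one example to the next.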

Datasets¶

The task currently supports the RACE and SWAG datasets, as well as custom input files. A single example looks like this:

Question: What color is the sky?
Answers:
    A: Blue
    B: Green
    C: Red

Model answer: A
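One common way to present such an example to a transformer is to pair the question with each candidate answer, so that every option becomes its own sequence and the model scores the pairs against each other. The sketch below shows that pairing in plain Python. The helper name is hypothetical, and the data modules in this library may preprocess the data differently.

```python
def build_choice_pairs(question, options, context=""):
    """Return one (prompt, option) pair per candidate answer."""
    # The optional context is prepended to the question when present.
    prompt = f"{context} {question}".strip()
    return [(prompt, option) for option in options]


pairs = build_choice_pairs("What color is the sky?", ["Blue", "Green", "Red"])
for prompt, option in pairs:
    print(f"{prompt} -> {option}")
```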

Training¶

import pytorch_lightning as pl
from transformers import AutoTokenizer

from lightning_transformers.task.nlp.multiple_choice import (
    MultipleChoiceTransformer,
    SwagMultipleChoiceDataModule,
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="bert-base-uncased")
model = MultipleChoiceTransformer(pretrained_model_name_or_path="bert-base-uncased")
dm = SwagMultipleChoiceDataModule(
    batch_size=1,
    dataset_config_name="regular",
    padding=False,
    tokenizer=tokenizer,
)
trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=1)

trainer.fit(model, dm)

We report Cross Entropy Loss, Precision, Recall and Accuracy for validation.
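As a rough illustration of two of these metrics, the sketch below computes cross entropy and accuracy from per-option probabilities in plain Python. The function names and numbers are made up for the example. Precision and recall are left out because the text does not state how they are averaged across options.

```python
import math


def cross_entropy(probabilities, labels):
    """Mean negative log-probability assigned to the correct option."""
    losses = [-math.log(p[y]) for p, y in zip(probabilities, labels)]
    return sum(losses) / len(losses)


def accuracy(probabilities, labels):
    """Fraction of examples whose highest-probability option is the correct one."""
    hits = sum(1 for p, y in zip(probabilities, labels) if p.index(max(p)) == y)
    return hits / len(labels)


# Hypothetical predictions for two questions with three options each.
probs = [[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]]
gold = [0, 1]  # index of the correct option for each question

loss = cross_entropy(probs, gold)
acc = accuracy(probs, gold)
```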

Multiple Choice Using Your Own Files¶

To use custom text files, the files should contain the data you want to train and validate on and be in CSV or JSON format as described below.

The format varies from dataset to dataset, since both the input columns and the pre-processing may differ. To keep things simple, we reuse the RACE dataset format and override the files that are loaded.

Below we define a JSON file to use as our input data.

{
    "article": "The man walked into the red house but couldn't see where the light was.",
    "question": "What colour is the house?",
    "options": ["White", "Red", "Blue"],
    "answer": "Red"
}
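Hand-written JSON is easy to get wrong, so it can help to build records in Python and serialize them with the standard library. The sketch below checks a record against the four fields shown above before serializing it. The helper name and the checks are illustrative assumptions, not validation performed by the library.

```python
import json

# Field names taken from the sample record above.
REQUIRED_KEYS = ("article", "question", "options", "answer")


def find_problems(record):
    """Return a list of human-readable problems with one record (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in record]
    options = record.get("options")
    if options is not None and (not isinstance(options, list) or not options):
        problems.append("options must be a non-empty list")
    return problems


record = {
    "article": "The man walked into the red house but couldn't see where the light was.",
    "question": "What colour is the house?",
    "options": ["White", "Red", "Blue"],
    "answer": "Red",
}

problems = find_problems(record)

# json.dumps takes care of commas, quoting and escaping.
serialized = json.dumps(record)
```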

We override the dataset files, allowing us to still use the data transforms defined with the RACE dataset.

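from transformers import AutoTokenizer

from lightning_transformers.task.nlp.multiple_choice import (
    RaceMultipleChoiceDataModule,
)

# Same tokenizer as in the training example above.
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="bert-base-uncased")

# Point the RACE data module at the custom files instead of the hosted dataset.
dm = RaceMultipleChoiceDataModule(
    batch_size=1,
    dataset_config_name="all",
    padding=False,
    train_file="path/train.json",
    validation_file="/path/valid.json",
    tokenizer=tokenizer,
)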

Multiple Choice Inference¶

Currently there is no Hugging Face pipeline available for this model. Feel free to open an issue or PR if you require this functionality.
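In the meantime, one possible workaround is to run the forward pass by hand with the Hugging Face `AutoModelForMultipleChoice` class. The sketch below assumes fine-tuned weights are available in Hugging Face format at a hypothetical `checkpoint_dir`. Exporting such a checkpoint from `MultipleChoiceTransformer` is not covered here, and both function names are illustrative.

```python
def pick_answer(scores):
    """Return the index of the highest-scoring option."""
    return max(range(len(scores)), key=lambda i: scores[i])


def predict(checkpoint_dir, context, options):
    """Score each option against the context with a fine-tuned checkpoint."""
    # Imported lazily so pick_answer can be used without these packages installed.
    import torch
    from transformers import AutoModelForMultipleChoice, AutoTokenizer

    # checkpoint_dir is a hypothetical path to weights saved in Hugging Face format.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForMultipleChoice.from_pretrained(checkpoint_dir)
    model.eval()

    # Pair the context with every option, then add a batch dimension so each
    # tensor has shape (1, num_options, sequence_length).
    encoded = tokenizer([context] * len(options), options, return_tensors="pt", padding=True)
    inputs = {name: tensor.unsqueeze(0) for name, tensor in encoded.items()}

    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, num_options)
    return pick_answer(logits[0].tolist())
```

Calling `predict` with a base checkpoint that has not been fine-tuned for multiple choice would score the options with a randomly initialized head, so the result is only meaningful after training.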