
DataModule API Reference

class lightning_transformers.task.nlp.language_modeling.LanguageModelingDataModule(*args, cfg=LanguageModelingDataConfig(batch_size=32, num_workers=0, dataset_name=None, dataset_config_name=None, train_val_split=None, train_file=None, test_file=None, validation_file=None, padding='max_length', truncation='only_first', max_length=128, preprocessing_num_workers=8, load_from_cache_file=True, cache_dir=None, limit_train_samples=None, limit_val_samples=None, limit_test_samples=None, block_size=128), **kwargs)

Defines the LightningDataModule for Language Modeling Datasets.

Parameters
  • *args – HFDataModule specific arguments.

  • cfg (LanguageModelingDataConfig) – Contains data-specific parameters used when processing/loading the dataset (default: LanguageModelingDataConfig).

  • **kwargs – HFDataModule specific arguments.
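
A minimal construction sketch is shown below. The dataset name ("wikitext") is purely illustrative, and the sketch assumes that LanguageModelingDataConfig is importable from the same task package and that the tokenizer is forwarded to HFDataModule:

    from transformers import AutoTokenizer

    from lightning_transformers.task.nlp.language_modeling import (
        LanguageModelingDataConfig,
        LanguageModelingDataModule,
    )

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    dm = LanguageModelingDataModule(
        tokenizer,
        cfg=LanguageModelingDataConfig(
            batch_size=32,
            dataset_name="wikitext",                  # illustrative HuggingFace dataset
            dataset_config_name="wikitext-2-raw-v1",  # illustrative dataset config
            block_size=128,                           # length of the text chunks fed to the model
        ),
    )

The resulting DataModule can then be passed to a pytorch_lightning.Trainer together with the matching task LightningModule via trainer.fit(model, datamodule=dm).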

class lightning_transformers.task.nlp.multiple_choice.MultipleChoiceDataModule(tokenizer, cfg=HFTransformerDataConfig(batch_size=32, num_workers=0, dataset_name=None, dataset_config_name=None, train_val_split=None, train_file=None, test_file=None, validation_file=None, padding='max_length', truncation='only_first', max_length=128, preprocessing_num_workers=8, load_from_cache_file=True, cache_dir=None, limit_train_samples=None, limit_val_samples=None, limit_test_samples=None))

Defines the LightningDataModule for Multiple Choice Datasets.
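
A hedged sketch of the constructor shape follows; a dataset-specific subclass may be required in practice to preprocess a given multiple-choice corpus, and the dataset name and the HFTransformerDataConfig import path are assumptions:

    from transformers import AutoTokenizer

    from lightning_transformers.core.nlp import HFTransformerDataConfig
    from lightning_transformers.task.nlp.multiple_choice import MultipleChoiceDataModule

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    dm = MultipleChoiceDataModule(
        tokenizer,
        cfg=HFTransformerDataConfig(
            batch_size=32,
            dataset_name="swag",     # illustrative multiple-choice dataset
            padding="max_length",
            max_length=128,
        ),
    )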

class lightning_transformers.task.nlp.question_answering.QuestionAnsweringDataModule(*args, cfg=QuestionAnsweringDataConfig(batch_size=32, num_workers=0, dataset_name=None, dataset_config_name=None, train_val_split=None, train_file=None, test_file=None, validation_file=None, padding='max_length', truncation='only_first', max_length=384, preprocessing_num_workers=8, load_from_cache_file=True, cache_dir=None, limit_train_samples=None, limit_val_samples=None, limit_test_samples=None, version_2_with_negative=False, null_score_diff_threshold=0.0, doc_stride=128, n_best_size=20, max_answer_length=30), **kwargs)

Defines the LightningDataModule for Question Answering Datasets.

Parameters
  • *args – HFDataModule specific arguments.

  • cfg (QuestionAnsweringDataConfig) – Contains data-specific parameters used when processing/loading the dataset (default: QuestionAnsweringDataConfig).

  • **kwargs – HFDataModule specific arguments.
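
A construction sketch, with "squad" as an illustrative dataset and the config import path assumed to mirror the class path:

    from transformers import AutoTokenizer

    from lightning_transformers.task.nlp.question_answering import (
        QuestionAnsweringDataConfig,
        QuestionAnsweringDataModule,
    )

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    dm = QuestionAnsweringDataModule(
        tokenizer,
        cfg=QuestionAnsweringDataConfig(
            batch_size=32,
            dataset_name="squad",           # illustrative QA dataset
            max_length=384,
            doc_stride=128,                 # overlap between consecutive document spans
            version_2_with_negative=False,  # set True for datasets with unanswerable questions
            max_answer_length=30,
        ),
    )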

class lightning_transformers.task.nlp.summarization.SummarizationDataModule(*args, cfg=Seq2SeqDataConfig(batch_size=32, num_workers=0, dataset_name=None, dataset_config_name=None, train_val_split=None, train_file=None, test_file=None, validation_file=None, padding='longest', truncation='only_first', max_length=128, preprocessing_num_workers=8, load_from_cache_file=True, cache_dir=None, limit_train_samples=None, limit_val_samples=None, limit_test_samples=None, max_target_length=128, max_source_length=1024), **kwargs)

Defines the LightningDataModule for Summarization Datasets.
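
A sketch using the shared Seq2SeqDataConfig; the dataset name is illustrative and the config's import path from lightning_transformers.core.nlp.seq2seq is an assumption:

    from transformers import AutoTokenizer

    from lightning_transformers.core.nlp.seq2seq import Seq2SeqDataConfig
    from lightning_transformers.task.nlp.summarization import SummarizationDataModule

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    dm = SummarizationDataModule(
        tokenizer,
        cfg=Seq2SeqDataConfig(
            batch_size=32,
            dataset_name="xsum",      # illustrative summarization dataset
            max_source_length=1024,   # maximum length of the input documents
            max_target_length=128,    # maximum length of the target summaries
        ),
    )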

class lightning_transformers.task.nlp.text_classification.TextClassificationDataModule(tokenizer, cfg=HFTransformerDataConfig(batch_size=32, num_workers=0, dataset_name=None, dataset_config_name=None, train_val_split=None, train_file=None, test_file=None, validation_file=None, padding='max_length', truncation='only_first', max_length=128, preprocessing_num_workers=8, load_from_cache_file=True, cache_dir=None, limit_train_samples=None, limit_val_samples=None, limit_test_samples=None))

Defines the LightningDataModule for Text Classification Datasets.
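
A minimal sketch; the dataset name is illustrative and the HFTransformerDataConfig import path is assumed:

    from transformers import AutoTokenizer

    from lightning_transformers.core.nlp import HFTransformerDataConfig
    from lightning_transformers.task.nlp.text_classification import TextClassificationDataModule

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    dm = TextClassificationDataModule(
        tokenizer,
        cfg=HFTransformerDataConfig(
            batch_size=32,
            dataset_name="emotion",   # illustrative text-classification dataset
            max_length=128,
        ),
    )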

class lightning_transformers.task.nlp.token_classification.TokenClassificationDataModule(*args, cfg=TokenClassificationDataConfig(batch_size=32, num_workers=0, dataset_name=None, dataset_config_name=None, train_val_split=None, train_file=None, test_file=None, validation_file=None, padding='max_length', truncation='only_first', max_length=128, preprocessing_num_workers=8, load_from_cache_file=True, cache_dir=None, limit_train_samples=None, limit_val_samples=None, limit_test_samples=None, task_name='ner', label_all_tokens=False), **kwargs)

Defines the LightningDataModule for Token Classification Datasets.

Parameters
  • *args – HFDataModule specific arguments.

  • cfg (TokenClassificationDataConfig) – Contains data-specific parameters used when processing/loading the dataset (default: TokenClassificationDataConfig).

  • **kwargs – HFDataModule specific arguments.
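
A construction sketch; "conll2003" is illustrative and the config import path is assumed:

    from transformers import AutoTokenizer

    from lightning_transformers.task.nlp.token_classification import (
        TokenClassificationDataConfig,
        TokenClassificationDataModule,
    )

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    dm = TokenClassificationDataModule(
        tokenizer,
        cfg=TokenClassificationDataConfig(
            batch_size=32,
            dataset_name="conll2003",  # illustrative NER dataset
            task_name="ner",           # label column prefix, e.g. "ner"
            label_all_tokens=False,    # label only the first sub-token of each word
        ),
    )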

class lightning_transformers.task.nlp.translation.TranslationDataModule(*args, cfg=TranslationDataConfig(batch_size=32, num_workers=0, dataset_name=None, dataset_config_name=None, train_val_split=None, train_file=None, test_file=None, validation_file=None, padding='longest', truncation='only_first', max_length=128, preprocessing_num_workers=8, load_from_cache_file=True, cache_dir=None, limit_train_samples=None, limit_val_samples=None, limit_test_samples=None, max_target_length=128, max_source_length=1024, source_language='', target_language=''), **kwargs)

Defines the LightningDataModule for Translation Datasets.

Parameters
  • *args – Seq2SeqDataModule specific arguments.

  • cfg (TranslationDataConfig) – Contains data-specific parameters used when processing/loading the dataset (default: TranslationDataConfig).

  • **kwargs – Seq2SeqDataModule specific arguments.
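
A construction sketch for an English-to-Romanian setup; the dataset name, dataset config, and language codes are illustrative:

    from transformers import AutoTokenizer

    from lightning_transformers.task.nlp.translation import (
        TranslationDataConfig,
        TranslationDataModule,
    )

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    dm = TranslationDataModule(
        tokenizer,
        cfg=TranslationDataConfig(
            batch_size=32,
            dataset_name="wmt16",         # illustrative translation dataset
            dataset_config_name="ro-en",
            source_language="en",
            target_language="ro",
            max_source_length=1024,
            max_target_length=128,
        ),
    )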

class lightning_transformers.core.nlp.seq2seq.Seq2SeqDataModule(*args, cfg=Seq2SeqDataConfig(batch_size=32, num_workers=0, dataset_name=None, dataset_config_name=None, train_val_split=None, train_file=None, test_file=None, validation_file=None, padding='longest', truncation='only_first', max_length=128, preprocessing_num_workers=8, load_from_cache_file=True, cache_dir=None, limit_train_samples=None, limit_val_samples=None, limit_test_samples=None, max_target_length=128, max_source_length=1024), **kwargs)

Defines the LightningDataModule for Seq2Seq Datasets, such as Summarization and Translation.

Parameters
  • *args – HFDataModule specific arguments.

  • cfg (Seq2SeqDataConfig) – Contains data-specific parameters used when processing/loading the dataset (default: Seq2SeqDataConfig).

  • **kwargs – HFDataModule specific arguments.

class lightning_transformers.core.nlp.HFDataModule(tokenizer, cfg=HFTransformerDataConfig(batch_size=32, num_workers=0, dataset_name=None, dataset_config_name=None, train_val_split=None, train_file=None, test_file=None, validation_file=None, padding='max_length', truncation='only_first', max_length=128, preprocessing_num_workers=8, load_from_cache_file=True, cache_dir=None, limit_train_samples=None, limit_val_samples=None, limit_test_samples=None))

Base LightningDataModule for HuggingFace Datasets. Provides helper functions and boilerplate logic to load/process datasets.

Parameters
  • tokenizer (PreTrainedTokenizerBase) – The pretrained tokenizer used to tokenize the data.

  • cfg (HFTransformerDataConfig) – Contains data-specific parameters used when processing/loading the dataset (default: HFTransformerDataConfig).
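
Because every task DataModule above builds on this base class, the shared HFTransformerDataConfig fields (train_file, validation_file, limit_train_samples/limit_val_samples, cache_dir, and so on) can be used with any of them. A sketch loading local files through a task subclass, with all file names and paths illustrative:

    from transformers import AutoTokenizer

    from lightning_transformers.core.nlp import HFTransformerDataConfig
    from lightning_transformers.task.nlp.text_classification import TextClassificationDataModule

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    dm = TextClassificationDataModule(
        tokenizer,
        cfg=HFTransformerDataConfig(
            train_file="data/train.csv",       # illustrative local files
            validation_file="data/valid.csv",
            limit_train_samples=1000,          # optionally subsample for quick experiments
            cache_dir="./hf_cache",
        ),
    )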
