Load a dataset

HF_load_dataset(
  path,
  name = NULL,
  data_dir = NULL,
  data_files = NULL,
  split = NULL,
  cache_dir = NULL,
  features = NULL,
  download_config = NULL,
  download_mode = NULL,
  ignore_verifications = FALSE,
  save_infos = FALSE,
  script_version = NULL,
  ...
)

Arguments

path

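Path to a dataset processing script, or the identifier of a dataset hosted in the Hugging Face datasets repository (for example "squad" or "glue").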

name

Name of the dataset configuration to load.

data_dir

Directory containing the data files for the dataset configuration.

data_files

Path(s) to the source data file(s) for the dataset configuration.

split

Which split of the data to load. If NULL, all splits are returned.

cache_dir

Directory in which cached data is read and written.

features

Feature types to use for the dataset.

download_config

Specific download configuration parameters.

download_mode

Download/generate mode, which controls whether existing downloads and cached datasets are reused.

ignore_verifications

Logical. If TRUE, skip verification of the downloaded and processed dataset information (checksums, sizes, splits).

save_infos

Logical. If TRUE, save the dataset information (checksums, sizes, splits).

script_version

Version of the dataset loading script to use.

...

additional arguments

Value

data frame

Details

This function does the following under the hood:

1. Downloads and imports the dataset loading script from ``path`` if it is not already cached inside the library. Processing scripts are small Python scripts that define the citation, info and format of the dataset, contain the URL to the original data files, and contain the code to load examples from those files. You can find some of the scripts here: https://github.com/huggingface/datasets/datasets and easily upload your own to share them using the CLI ``datasets-cli``.

2. Runs the dataset loading script, which will:
   * Download the dataset file from the original URL (see the script) if it is not already downloaded and cached.
   * Process and cache the dataset in typed Arrow tables. Arrow tables are arbitrarily long, typed tables that can store nested objects and be mapped to numpy/pandas/Python standard types. They can be accessed directly from disk, loaded into RAM, or even streamed over the web.

3. Returns a dataset built from the requested splits in ``split`` (default: all).
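
The flow described above (build once, cache, then return the requested split) can be illustrated with a small base R sketch. This is not the library's implementation: `load_with_cache()`, `build_toy()` and the `.rds` cache file are hypothetical stand-ins for the loading script and the Arrow cache, chosen only to show the control flow.

```r
# Hypothetical helper: illustrates the cache-or-build logic only.
load_with_cache <- function(key, build, cache_dir = tempdir(), split = NULL) {
  cache_file <- file.path(cache_dir, paste0(key, ".rds"))
  if (!file.exists(cache_file)) {
    # Cache miss: run the expensive build step once and store the result.
    saveRDS(build(), cache_file)
  }
  splits <- readRDS(cache_file)
  # Return every split by default, otherwise only the requested one.
  if (is.null(split)) splits else splits[[split]]
}

# Toy "loading script" that counts how often it is actually run.
calls <- 0
build_toy <- function() {
  calls <<- calls + 1
  list(
    train = data.frame(text = c("a", "b"), label = c(0L, 1L)),
    test  = data.frame(text = "c", label = 1L)
  )
}

# Use a fresh cache directory so the first call is a guaranteed miss.
cache <- tempfile("hf_cache_")
dir.create(cache)

train      <- load_with_cache("toy", build_toy, cache_dir = cache, split = "train")
all_splits <- load_with_cache("toy", build_toy, cache_dir = cache)
```

The second call reads the cached file instead of rebuilding, so `build_toy()` runs only once. With the real function, the analogous call would pass a dataset identifier as `path` and, for example, `split = "train"`; leaving `split` as NULL requests all splits.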