Create a `DataLoaders` suitable for collaborative filtering from `ratings`.
CollabDataLoaders_from_df(
  ratings,
  valid_pct = 0.2,
  user_name = NULL,
  item_name = NULL,
  rating_name = NULL,
  seed = NULL,
  path = ".",
  bs = 64,
  val_bs = NULL,
  shuffle_train = TRUE,
  device = NULL
)
`ratings`: The data frame of ratings.
`valid_pct`: The fraction of the dataset randomly set aside for validation (reproducible when `seed` is given).
`user_name`: The name of the column containing the user (defaults to the first column).
`item_name`: The name of the column containing the item (defaults to the second column).
`rating_name`: The name of the column containing the rating (defaults to the third column).
`seed`: The random seed used for the validation split.
`path`: The working folder.
`bs`: The batch size.
`val_bs`: The batch size for the validation `DataLoader` (defaults to `bs`).
`shuffle_train`: Whether to shuffle the training `DataLoader`.
`device`: The device to use, e.g. `cpu` or `cuda`.
A `DataLoaders` object for collaborative filtering.
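The default column resolution and the seeded random validation split described in the arguments can be illustrated in base R. This is a minimal sketch on a toy data frame, not the code fastai runs internally: the helper `split_rows` and the toy values are hypothetical, and the rounding rule (`floor`) is an assumption.

```r
# Illustrative only: base R, not fastai's internal implementation.
# Toy ratings table; column order matters when the *_name arguments are NULL.
ratings <- data.frame(
  userId  = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  movieId = c(10, 20, 10, 30, 20, 30, 10, 40, 20, 40),
  rating  = c(4, 5, 3, 2, 5, 4, 1, 3, 4, 5)
)

# Default column resolution: first = user, second = item, third = rating.
user_name   <- names(ratings)[1]
item_name   <- names(ratings)[2]
rating_name <- names(ratings)[3]

# Hypothetical helper: hold out a random fraction of rows for validation.
split_rows <- function(n, valid_pct = 0.2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # a fixed seed makes the split repeatable
  n_valid <- floor(n * valid_pct)     # assumed rounding rule
  valid_idx <- sort(sample.int(n, n_valid))
  list(train = setdiff(seq_len(n), valid_idx), valid = valid_idx)
}

idx <- split_rows(nrow(ratings), valid_pct = 0.2, seed = 42)
train_df <- ratings[idx$train, ]
valid_df <- ratings[idx$valid, ]
```

Calling `split_rows` again with the same `seed` yields the same validation rows, which is the practical reason to pass `seed` when results need to be comparable across runs.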
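if (FALSE) {
  library(fastai)
  library(data.table)  # fread()
  library(zeallot)     # %<-%
  # Download the MovieLens 100k dataset (read from ./ml-100k below)
  URLs_MOVIE_LENS_ML_100k()
  c(user, item, title) %<-% list('userId', 'movieId', 'title')
  ratings = fread('ml-100k/u.data', col.names = c(user, item, 'rating', 'timestamp'))
  movies = fread('ml-100k/u.item', col.names = c(item, 'title', 'date', 'N', 'url',
                                                 paste('g', 1:19, sep = '')))
  # Attach the movie title to every rating
  rating_movie = ratings[movies[, .SD, .SDcols = c(item, title)], on = item]
  # Hold out 10% for validation; use the title column as the item
  dls = CollabDataLoaders_from_df(rating_movie, seed = 42, valid_pct = 0.1, bs = 64,
                                  item_name = title, path = 'ml-100k')
}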