Intro

First, we need to install the fastaudio Python module via reticulate:

reticulate::py_install('fastaudio', pip = TRUE)
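
Then load the R packages used throughout the post. This is a minimal sketch; it assumes the fastai R package and reticulate are already installed:

library(fastai)
library(magrittr)  # provides the %>% pipe used below

# optional sanity check: can the Python environment see fastaudio?
reticulate::py_module_available("fastaudio")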

Dataset

Grab the data from the TensorFlow Speech Commands dataset (2.3 GB) and list the audio files:

commands_path = "SPEECHCOMMANDS"   # folder where the dataset was extracted
audio_files = get_audio_files(commands_path)
length(audio_files$items)
# [1] 105835
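
Each WAV file lives in a folder named after the spoken word, which is exactly what parent_label will use as the target later. A quick, purely illustrative check of the layout (the clip name below is hypothetical):

head(list.files(commands_path))  # top-level folders are the class labels
# a clip such as SPEECHCOMMANDS/bed/some_clip.wav gets the label "bed"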

Preprocess

Prepare the dataset and put it into a data loader. Each clip is cropped or padded to 4000 ms and converted into a decibel-scaled mel spectrogram:

# mel spectrogram transform on the decibel scale
DBMelSpec = SpectrogramTransformer(mel = TRUE, to_db = TRUE)
a2s = DBMelSpec()
# crop or pad every clip to 4000 ms
crop_4000ms = ResizeSignal(4000)
tfms = list(crop_4000ms, a2s)

auds = DataBlock(blocks = list(AudioBlock(), CategoryBlock()),
                 get_items = get_audio_files,
                 splitter = RandomSplitter(),
                 item_tfms = tfms,
                 get_y = parent_label)

audio_dbunch = auds %>% dataloaders(commands_path, item_tfms = tfms, bs = 20)
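
Before plotting anything, it can be worth poking at the dataloaders object itself; attribute access passes straight through to the underlying Python object, so the following small sketch only reads what fastai already stores:

audio_dbunch$vocab      # class labels discovered from the folder names
audio_dbunch$train$bs   # batch size of the training loader (20)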

Let's look at a batch:

audio_dbunch %>% show_batch(figsize = c(15, 8.5), nrows = 3, ncols = 3, max_n = 9, dpi = 180)

Model

Before fitting, we have to change the first convolutional layer from 3 input channels to 1, because xresnet18 expects RGB images while our spectrograms have a single channel:

torch = torch()   # handle to the underlying torch Python module
nn = nn()         # handle to torch.nn

learn = Learner(audio_dbunch, xresnet18(pretrained = FALSE), nn$CrossEntropyLoss(), metrics = accuracy)

# set the expected number of input channels from 3 to 1
learn$model[0][0][['in_channels']] %f% 1L
# keep a single channel of the original weight and restore the channel dimension
new_weight_shape <- torch$nn$parameter$Parameter(
  (learn$model[0][0]$weight %>% narrow('[:,1,:,:]'))$unsqueeze(1L))

# assign the reshaped weight with %f%
learn$model[0][0][['weight']] %f% new_weight_shape
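
To double-check the surgery, print the first layer again; it should now report a single input channel (the commented line is an assumption about how xresnet18's stem prints, the leading 1 is the point):

learn$model[0][0]
# e.g. Conv2d(1, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)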

Add callbacks

Model weights and training metrics can be logged and visualized on wandb.ai:

# log in the first time, then remove this line
login("API_key_from_wandb_dot_ai")
init(project = 'R')
wandb: Currently logged in as: henry090 (use `wandb login --relogin` to force relogin)
wandb: Tracking run with wandb version 0.10.8
wandb: Syncing run macabre-zombie-2
wandb: ⭐️ View project at https://wandb.ai/henry090/speech_recognition_from_R
wandb: 🚀 View run at https://wandb.ai/henry090/speech_recognition_from_R/runs/2sjw3juv
wandb: Run data is saved locally in wandb/run-20201030_224503-2sjw3juv
wandb: Run `wandb off` to turn off syncing.

Conclusion

Now we can train our model:

learn %>% fit_one_cycle(3, lr_max = slice(1e-2), cbs = list(WandbCallback()))
epoch   train_loss   valid_loss   accuracy   time 
------  -----------  -----------  ---------  -----
WandbCallback requires use of "SaveModelCallback" to log best model
0       0.590236     0.728817     0.787121   04:18 
WandbCallback was not able to get prediction samples -> wandb.log must be passed a dictionary
1       0.288492     0.310335     0.908490   04:19 
2       0.182899     0.196792     0.941088   04:10 
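
With roughly 94% accuracy after three epochs, the learner can be exported and tried on new clips. A minimal sketch (the file names are hypothetical; export() and predict() are the standard fastai Learner methods reached through reticulate):

# save the trained learner for later use
learn$export('speech_model.pkl')

# classify a single clip by its file path
learn$predict('SPEECHCOMMANDS/bed/some_clip.wav')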

The resulting dashboard can be viewed here:

https://wandb.ai/henry090/speech_recognition_from_R/runs/2sjw3juv?workspace=user-henry090