Machine Learning¶
PyTorch CNNs¶
classes for pytorch machine learning models in opensoundscape
For tutorials, see notebooks on opensoundscape.org
-
class
opensoundscape.torch.models.cnn.
CnnResampleLoss
(architecture, classes, single_target=False)¶ Subclass of PytorchModel with ResampleLoss.
ResampleLoss may perform better than BCE Loss for multitarget problems in some scenarios.
Parameters: - architecture – a model architecture object, for example one generated with the torch.architectures.cnn_architectures module
- classes – list of class names. Must match with training dataset classes.
- single_target –
- True: model expects exactly one positive class per sample
- False: samples can have any number of positive classes
[default: False]
-
class
opensoundscape.torch.models.cnn.
InceptionV3
(classes, freeze_feature_extractor=False, use_pretrained=True, single_target=False)¶ -
train_epoch
()¶ perform forward pass, loss, backpropagation for one epoch
need to override parent because Inception returns different outputs from the forward pass (final and auxiliary layers)
Returns: (targets, predictions, scores) on training files
-
-
class
opensoundscape.torch.models.cnn.
InceptionV3ResampleLoss
(classes, freeze_feature_extractor=False, use_pretrained=True, single_target=False)¶
-
class
opensoundscape.torch.models.cnn.
PytorchModel
(architecture, classes, single_target=False)¶ Generic Pytorch Model with .train() and .predict()
flexible architecture, optimizer, loss function, parameters
for tutorials and examples see opensoundscape.org
methods include train(), predict(), save(), and load()
Parameters: - architecture – a model architecture object, for example one generated with the torch.architectures.cnn_architectures module
- classes – list of class names. Must match with training dataset classes.
- single_target –
- True: model expects exactly one positive class per sample
- False: samples can have any number of positive classes
[default: False]
-
load
(path, load_weights=True, load_classifier_weights=True, load_optimizer_state_dict=True, verbose=False)¶ load model and optimizer state_dict from disk
the object should be saved with model.save() which uses torch.save with keys for ‘model_state_dict’ and ‘optimizer_state_dict’
Parameters: - path – where the file is saved
- load_weights – if False, ignore network weights [default:True]
- load_classifier_weights – if False, ignore classifier layer weights. Use False to load only the feature weights, eg to re-use a trained cnn’s feature extractor for a new class [default: True]
- load_optimizer_state_dict – if False, ignore saved parameters for optimizer’s state [default: True]
- verbose – if True, print missing and unused keys for model weights
-
predict
(prediction_dataset, batch_size=1, num_workers=0, activation_layer=None, binary_preds=None, threshold=0.5, error_log=None)¶ Generate predictions on a dataset
Choose to return any combination of scores, labels, and single-target or multi-target binary predictions. Also choose activation layer for scores (softmax, sigmoid, softmax then logit, or None).
Note: the order of returned dataframes is (scores, preds, labels)
Parameters: - prediction_dataset – a Preprocessor or Dataset object that returns tensors, such as AudioToSpectrogramPreprocessor (no augmentation) or CnnPreprocessor (w/augmentation) from opensoundscape.datasets
- batch_size – Number of files to load simultaneously [default: 1]
- num_workers – parallelization (ie cpus or cores), use 0 for current process [default: 0]
- activation_layer – Optionally apply an activation layer such as sigmoid or softmax to the raw outputs of the model. Options:
- None: no activation, return raw scores (ie logit, [-inf:inf])
- ‘softmax’: scores across all classes sum to 1
- ‘sigmoid’: all scores in [0,1] but don’t sum to 1
- ‘softmax_and_logit’: applies softmax first then logit
[default: None]
- binary_preds – Optionally return binary (thresholded 0/1) predictions. Options:
- ‘single_target’: max scoring class = 1, others = 0
- ‘multi_target’: scores above threshold = 1, others = 0
- None: do not create or return binary predictions
[default: None]
- threshold – prediction threshold for sigmoid scores. Only relevant when binary_preds == ‘multi_target’
- error_log – if not None, saves a list of files that raised errors to the specified file location [default: None]
Returns: 3 DataFrames (or Nones), with index matching prediction_dataset.df
- scores: post-activation_layer scores
- predictions: 0/1 preds for each class
- labels: labels from dataset (if available)
Note: if loading an audio file raises a PreprocessingError, the scores and predictions for that sample will be np.nan
Note: if no return type is selected for labels/scores/preds, returns None instead of a DataFrame in the returned tuple
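The activation_layer options described above can be illustrated on a single row of raw scores. This is a minimal pure-Python sketch of the underlying math, not the library's implementation; the helper name apply_activation is hypothetical and does not exist in opensoundscape.

```python
import math


def apply_activation(logits, activation_layer=None):
    """Illustrate the activation_layer options on one row of raw scores.

    Hypothetical helper for illustration only.
    """
    if activation_layer is None:
        return list(logits)  # raw, unbounded scores
    if activation_layer == "sigmoid":
        # each score is squashed independently into (0, 1)
        return [1.0 / (1.0 + math.exp(-x)) for x in logits]
    if activation_layer in ("softmax", "softmax_and_logit"):
        shift = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - shift) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        if activation_layer == "softmax":
            return probs
        # logit is the inverse of sigmoid: log(p / (1 - p))
        return [math.log(p / (1.0 - p)) for p in probs]
    raise ValueError(f"unknown activation_layer: {activation_layer!r}")


raw = [2.0, 0.5, -1.0]
softmax_scores = apply_activation(raw, "softmax")
sigmoid_scores = apply_activation(raw, "sigmoid")
logit_scores = apply_activation(raw, "softmax_and_logit")
```

Softmax scores compete across classes, which suits single-target problems; sigmoid scores are independent per class, which suits multi-target problems.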
-
save
(path=None, save_weights=True, save_optimizer=True, extras={})¶ save model with weights (default location is self.save_path)
Parameters: - path – destination for saved model. if None, uses self.save_path
- save_weights – if False, only save metadata/metrics [default: True]
- save_optimizer – if False, don’t save self.optim.state_dict()
- extras – arbitrary dictionary of things to save, eg valid-preds
-
train
(train_dataset, valid_dataset, epochs=1, batch_size=1, num_workers=0, save_path='.', save_interval=1, log_interval=10, unsafe_sample_log='./unsafe_samples.log')¶ train the model on samples from train_dataset
If customized loss functions, networks, optimizers, or schedulers are desired, modify the respective attributes before calling .train().
Parameters: - train_dataset – a Preprocessor that loads sample (audio file + label) to Tensor in batches (see docs/tutorials for details)
- valid_dataset – a Preprocessor for evaluating performance
- epochs – number of epochs to train for [default=1] (1 epoch constitutes 1 view of each training sample)
- batch_size – number of training files to load/process before re-calculating the loss function and backpropagation
- num_workers – parallelization (ie, cores or cpus) Note: use 0 for single (root) process (not 1)
- save_path – location to save intermediate and best model objects [default=”.”, ie current location of script]
- save_interval – interval in epochs to save model object with weights [default:1] Note: the best model is always saved to best.model in addition to other saved epochs.
- log_interval – interval in epochs to evaluate model with validation dataset and print metrics to the log
- unsafe_sample_log – file path: log all samples that failed in preprocessing (file written when training completes) - if None, does not write a file
-
train_epoch
()¶ perform forward pass, loss, backpropagation for one epoch
Returns: (targets, predictions, scores) on training files
-
class
opensoundscape.torch.models.cnn.
Resnet18Binary
(classes)¶ Subclass of PytorchModel with Resnet18 architecture
This subclass allows separate training parameters for the feature extractor and classifier
Parameters: - classes – list of class names. Must match with training dataset classes.
- single_target –
- True: model expects exactly one positive class per sample
- False: samples can have any number of positive classes
[default: False]
-
class
opensoundscape.torch.models.cnn.
Resnet18Multiclass
(classes, single_target=False)¶ Multi-class model with resnet18 architecture and ResampleLoss.
Can be single or multi-target.
Parameters: - classes – list of class names. Must match with training dataset classes.
- single_target –
- True: model expects exactly one positive class per sample
- False: samples can have any number of positive classes
[default: False]
Notes:
- Allows separate parameters for feature & classifier blocks via self.optimizer_params’s keys: “feature” and “classifier” (by using hand-built architecture)
- Uses ResampleLoss which requires class counts as an input.
-
class
opensoundscape.torch.models.utils.
BaseModule
¶ Base class for a pytorch model pipeline class.
All child classes should define load, save, etc
-
opensoundscape.torch.models.utils.
cas_dataloader
(dataset, batch_size, num_workers)¶ Return a dataloader that uses the class aware sampler
Class aware sampler tries to balance the examples per class in each batch. It selects just a few classes to be present in each batch, then samples those classes for even representation in the batch.
Parameters: - dataset – a pytorch dataset type object
- batch_size – see DataLoader
- num_workers – see DataLoader
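The class-aware sampling idea described above (pick a few classes per batch, then represent each of them evenly) can be sketched in pure Python. This is an illustration under stated assumptions, not the library's sampler: the function name class_aware_batch is hypothetical, labels are assumed to be one integer class per sample, and samples are drawn with replacement.

```python
import random
from collections import defaultdict


def class_aware_batch(labels, batch_size, num_samples_cls=1, rng=None):
    """Return sample indices for one class-balanced batch (illustrative only)."""
    rng = rng or random.Random()

    # group sample indices by their class label
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # choose just enough classes to fill the batch
    n_classes = max(1, batch_size // num_samples_cls)
    classes = list(by_class)
    chosen = rng.sample(classes, min(n_classes, len(classes)))

    # draw the same number of samples (with replacement) from each chosen class
    batch = []
    for cls in chosen:
        batch.extend(rng.choices(by_class[cls], k=num_samples_cls))
    return batch


labels = [0] * 90 + [1] * 8 + [2] * 2  # heavily imbalanced toy labels
batch = class_aware_batch(labels, batch_size=4, num_samples_cls=2,
                          rng=random.Random(0))
```

Because every chosen class contributes the same number of samples, rare classes appear in batches far more often than they would under uniform sampling.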
-
opensoundscape.torch.models.utils.
get_dataloader
(dataset, batch_size=64, num_workers=1, shuffle=False, sampler='')¶ Create a DataLoader from a Dataset, choosing between the normal pytorch DataLoader and ImbalancedDatasetSampler.
Parameters: - sampler – None: default DataLoader; ‘imbalanced’: ImbalancedDatasetSampler
Module to initialize PyTorch CNN architectures with custom output shape
This module allows the use of several built-in CNN architectures from PyTorch. The architecture refers to the specific layers and layer input/output shapes (including convolution sizes and strides, etc) - such as the ResNet18 or Inception V3 architecture.
We provide wrappers which modify the output layer to the desired shape (to match the number of classes). The way to change the output layer shape depends on the architecture, which is why we need a wrapper for each one. This code is based on pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
To use these wrappers, for example, if your model has 10 output classes, write
my_arch=resnet18(10)
Then you can initialize a model object from opensoundscape.torch.models.cnn with your architecture:
model=PytorchModel(my_arch,classes)
or override an existing model’s architecture:
model.network = my_arch
Note: the InceptionV3 architecture must be used differently than other architectures - the easiest way is to simply use the InceptionV3 class in opensoundscape.torch.models.cnn.
-
opensoundscape.torch.architectures.cnn_architectures.
alexnet
(num_classes, freeze_feature_extractor=False, use_pretrained=True)¶ Wrapper for AlexNet architecture
input size = 224
Parameters: - num_classes – number of output nodes for the final layer
- freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained
- use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
-
opensoundscape.torch.architectures.cnn_architectures.
densenet121
(num_classes, freeze_feature_extractor=False, use_pretrained=True)¶ Wrapper for densenet121 architecture
input size = 224
Parameters: - num_classes – number of output nodes for the final layer
- freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained
- use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
-
opensoundscape.torch.architectures.cnn_architectures.
inception_v3
(num_classes, freeze_feature_extractor=False, use_pretrained=True)¶ Wrapper for Inception v3 architecture
Input: 299x299
WARNING: expects (299,299) sized images and has auxiliary output. See InceptionV3 class in opensoundscape.torch.models.cnn for use.
Parameters: - num_classes – number of output nodes for the final layer
- freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained
- use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
-
opensoundscape.torch.architectures.cnn_architectures.
resnet101
(num_classes, freeze_feature_extractor=False, use_pretrained=True)¶ Wrapper for ResNet101 architecture
input_size = 224
Parameters: - num_classes – number of output nodes for the final layer
- freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained
- use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
-
opensoundscape.torch.architectures.cnn_architectures.
resnet152
(num_classes, freeze_feature_extractor=False, use_pretrained=True)¶ Wrapper for ResNet152 architecture
input_size = 224
Parameters: - num_classes – number of output nodes for the final layer
- freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained
- use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
-
opensoundscape.torch.architectures.cnn_architectures.
resnet18
(num_classes, freeze_feature_extractor=False, use_pretrained=True)¶ Wrapper for ResNet18 architecture
input_size = 224
Parameters: - num_classes – number of output nodes for the final layer
- freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained
- use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
-
opensoundscape.torch.architectures.cnn_architectures.
resnet34
(num_classes, freeze_feature_extractor=False, use_pretrained=True)¶ Wrapper for ResNet34 architecture
input_size = 224
Parameters: - num_classes – number of output nodes for the final layer
- freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained
- use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
-
opensoundscape.torch.architectures.cnn_architectures.
resnet50
(num_classes, freeze_feature_extractor=False, use_pretrained=True)¶ Wrapper for ResNet50 architecture
input_size = 224
Parameters: - num_classes – number of output nodes for the final layer
- freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained
- use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
-
opensoundscape.torch.architectures.cnn_architectures.
set_parameter_requires_grad
(model, freeze_feature_extractor)¶ if necessary, remove gradients of all model parameters
if freeze_feature_extractor is True, we set requires_grad=False for all features in the feature extraction block. We would do this if we have a pre-trained CNN and only want to change the shape of the final layer, then train only that final classification layer without modifying the weights of the rest of the network.
-
opensoundscape.torch.architectures.cnn_architectures.
squeezenet1_0
(num_classes, freeze_feature_extractor=False, use_pretrained=True)¶ Wrapper for squeezenet architecture
input size = 224
Parameters: - num_classes – number of output nodes for the final layer
- freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained
- use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
-
opensoundscape.torch.architectures.cnn_architectures.
vgg11_bn
(num_classes, freeze_feature_extractor=False, use_pretrained=True)¶ Wrapper for vgg11 architecture
input size = 224
Parameters: - num_classes – number of output nodes for the final layer
- freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained
- use_pretrained – if True, uses pre-trained ImageNet features from Pytorch’s model zoo.
defines feature extractor and Architecture class for ResNet CNN
This implementation of the ResNet18 architecture allows for separate access to the feature extraction and classification blocks. This can be useful, for instance, to freeze the feature extractor and only train the classifier layer; or to specify different learning rates for the two blocks.
This implementation is used in the Resnet18Binary and Resnet18Multiclass classes of opensoundscape.torch.models.cnn.
-
class
opensoundscape.torch.architectures.resnet.
ResNetArchitecture
(num_cls, weights_init='ImageNet', num_layers=18, init_classifier_weights=False)¶ ResNet architecture with 18 or 50 layers
This implementation enables separate access to feature and classification blocks.
Parameters: - num_cls – number of classes (int)
- weights_init –
- “ImageNet”: load the pre-trained weights for ImageNet dataset
- path: load weights from a path on your computer or a url
- None: initialize with random weights
- num_layers – 18 for Resnet18 or 50 for Resnet50
- init_classifier_weights –
- if True, load the weights of the classification layer as well as feature extraction layers
- if False (default), only load the weights of the feature extraction layers
-
load
(init_path, init_classifier_weights=True, verbose=False)¶ load state dict (weights) of the feature+classifier; optionally load only feature weights, not classifier weights
Parameters: - init_path –
- url containing “http”: download weights from web
- path: load weights from local path
- init_classifier_weights –
- if True (default), load the weights of the classification layer as well as feature extraction layers
- if False, only load the weights of the feature extraction layers
- verbose – if True, print missing/unused keys [default: False]
-
class
opensoundscape.torch.architectures.resnet.
ResNetFeature
(block, layers, zero_init_residual=False, groups=1, width_per_group=64, replace_stride_with_dilation=None, norm_layer=None)¶
-
class
opensoundscape.torch.architectures.utils.
BaseArchitecture
¶ Base architecture for reference.
Loss Functions¶
loss function classes to use with opensoundscape models
-
class
opensoundscape.torch.loss.
BCEWithLogitsLoss_hot
¶ use pytorch’s nn.BCEWithLogitsLoss for one-hot labels by simply converting y from long to float
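The conversion matters because PyTorch's nn.BCEWithLogitsLoss expects floating-point targets, while one-hot labels are often stored as integers. The following pure-Python sketch shows the same computation on plain lists; it is an illustration of the math rather than the class's code, and the function name bce_with_logits is hypothetical.

```python
import math


def bce_with_logits(logits, one_hot_labels):
    """Mean binary cross-entropy computed directly on raw logits."""
    targets = [float(y) for y in one_hot_labels]  # the long -> float conversion
    losses = [
        # numerically stable form of -[y*log(s(x)) + (1-y)*log(1-s(x))],
        # where s is the sigmoid function
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


uncertain = bce_with_logits([0.0, 0.0], [1, 0])   # logits of 0 mean p = 0.5
confident = bce_with_logits([5.0, -5.0], [1, 0])  # correct and confident
wrong = bce_with_logits([-5.0, 5.0], [1, 0])      # confident but incorrect
```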
-
class
opensoundscape.torch.loss.
CrossEntropyLoss_hot
¶ use pytorch’s nn.CrossEntropyLoss for one-hot labels by converting labels from 1-hot to integer labels
throws a ValueError if labels are not one-hot
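As a rough illustration of the label conversion described above, the sketch below turns one-hot rows into integer class indices and raises a ValueError for rows that are not one-hot. It uses plain Python lists rather than tensors, and the function name one_hot_to_index is hypothetical.

```python
def one_hot_to_index(one_hot_rows):
    """Convert one-hot label rows to integer class indices.

    Raises ValueError if any row is not one-hot (exactly one 1, rest 0).
    """
    indices = []
    for row in one_hot_rows:
        # a one-hot row, once sorted, is all zeros followed by a single one
        if sorted(row) != [0] * (len(row) - 1) + [1]:
            raise ValueError(f"label row is not one-hot: {row}")
        indices.append(list(row).index(1))
    return indices


class_indices = one_hot_to_index([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
```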
-
class
opensoundscape.torch.loss.
ResampleLoss
(class_freq, reduction='mean', loss_weight=1.0)¶
-
opensoundscape.torch.loss.
reduce_loss
(loss, reduction)¶ Reduce loss as specified.
Parameters: - loss (Tensor) – Elementwise loss tensor.
- reduction (str) – Options are “none”, “mean” and “sum”.
Returns: Reduced loss tensor.
Return type: Tensor
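The three reduction options can be illustrated on a plain list of element-wise losses. This is a sketch of the behavior the docstring describes, using lists instead of Tensors; reduce_loss_sketch is a hypothetical name, not the library function.

```python
def reduce_loss_sketch(loss, reduction):
    """Reduce a list of element-wise losses as "none", "mean" or "sum"."""
    if reduction == "none":
        return list(loss)  # keep one loss value per element
    if reduction == "mean":
        return sum(loss) / len(loss)
    if reduction == "sum":
        return sum(loss)
    raise ValueError(f"unknown reduction: {reduction!r}")


elementwise = [0.5, 1.5, 4.0]
unreduced = reduce_loss_sketch(elementwise, "none")
mean_loss = reduce_loss_sketch(elementwise, "mean")
sum_loss = reduce_loss_sketch(elementwise, "sum")
```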
-
opensoundscape.torch.loss.
weight_reduce_loss
(loss, weight=None, reduction='mean', avg_factor=None)¶ Apply element-wise weight and reduce loss.
Parameters: - loss (Tensor) – Element-wise loss.
- weight (Tensor) – Element-wise weights.
- reduction (str) – Same as built-in losses of PyTorch.
- avg_factor (float) – Average factor when computing the mean of losses.
Returns: Processed loss values.
Return type: Tensor
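A possible reading of the weighting and averaging steps is sketched below on plain lists. It rests on an assumption that the docstring does not confirm: when avg_factor is given with reduction='mean', it replaces the element count as the denominator. The name weight_reduce_loss_sketch is hypothetical.

```python
def weight_reduce_loss_sketch(loss, weight=None, reduction="mean",
                              avg_factor=None):
    """Apply element-wise weights to a list of losses, then reduce."""
    if weight is not None:
        loss = [l * w for l, w in zip(loss, weight)]
    if reduction == "none":
        return list(loss)
    total = sum(loss)
    if reduction == "sum":
        return total
    if reduction == "mean":
        # assumption: avg_factor, when given, replaces len(loss) as denominator
        denominator = len(loss) if avg_factor is None else avg_factor
        return total / denominator
    raise ValueError(f"unknown reduction: {reduction!r}")


losses = [1.0, 2.0, 3.0]
mask = [1.0, 0.0, 1.0]  # a weight of 0 removes an element's contribution
plain_mean = weight_reduce_loss_sketch(losses, mask)
masked_mean = weight_reduce_loss_sketch(losses, mask, avg_factor=2)
```

With a 0/1 mask, passing the number of unmasked elements as avg_factor gives the mean over only the elements that count.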
Safe Dataloading¶
Dataset wrapper to handle errors gracefully in Preprocessor classes
A SafeDataset handles errors in a potentially misleading way: If an error is raised while trying to load a sample, the SafeDataset will instead load a different sample. The indices of any samples that failed to load will be stored in ._unsafe_indices.
This behavior may be desirable for training a model, but could cause silent errors when predicting with a model (replacing a bad file with a different file), and you should always be careful to check for ._unsafe_indices after using a SafeDataset.
implemented by @msamogh in nonechucks (github.com/msamogh/nonechucks/)
-
class
opensoundscape.torch.safe_dataset.
SafeDataset
(dataset, eager_eval=False)¶ A wrapper for a Dataset that handles errors when loading samples
WARNING: When iterating, will skip the failed sample, but when using within a DataLoader, finds the next good sample and uses it for the current index (see __getitem__).
Parameters: - dataset – a torch Dataset instance or child such as a Preprocessor
- eager_eval – If True, checks if every file is able to be loaded during initialization (logs _safe_indices and _unsafe_indices)
Attributes: _safe_indices and _unsafe_indices can be accessed later to check which samples threw errors.
-
_build_index
()¶ tries to load each sample, logs _safe_indices and _unsafe_indices
-
__getitem__
(index)¶ If loading an index fails, keeps trying the next index until success
-
_safe_get_item
()¶ Tries to load a sample, returns None if error occurs
-
is_index_built
¶ Returns True if all indices of the original dataset have been classified into _safe_indices or _unsafe_indices.
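The fallback behavior described in this section can be reproduced with a small pure-Python wrapper. This is a simplified sketch rather than the nonechucks or opensoundscape code: SafeListDataset and FlakyDataset are hypothetical classes, and wrapping around to index 0 after the last sample is an assumption.

```python
class SafeListDataset:
    """Illustrative wrapper that substitutes the next good sample on error."""

    def __init__(self, dataset):
        self.dataset = dataset
        self._safe_indices = []
        self._unsafe_indices = []

    def __len__(self):
        return len(self.dataset)

    def _safe_get_item(self, index):
        """Try to load one sample; record the outcome, return None on error."""
        try:
            sample = self.dataset[index]
        except Exception:
            if index not in self._unsafe_indices:
                self._unsafe_indices.append(index)
            return None
        if index not in self._safe_indices:
            self._safe_indices.append(index)
        return sample

    def __getitem__(self, index):
        """If loading fails, keep trying the following indices until success."""
        for offset in range(len(self.dataset)):
            sample = self._safe_get_item((index + offset) % len(self.dataset))
            if sample is not None:
                return sample
        raise IndexError("no sample in the dataset could be loaded")


class FlakyDataset:
    """Toy dataset whose sample 1 always fails to load."""

    def __len__(self):
        return 3

    def __getitem__(self, index):
        if index == 1:
            raise IOError("corrupt file")
        return f"sample-{index}"


safe = SafeListDataset(FlakyDataset())
loaded = [safe[i] for i in range(3)]
```

Here index 1 silently yields the sample from index 2, which is why _unsafe_indices should be checked before trusting per-file predictions.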
Sampling¶
classes for strategically sampling within a DataLoader
-
class
opensoundscape.torch.sampling.
ClassAwareSampler
(labels, num_samples_cls=1)¶ In each batch of samples, pick a limited number of classes to include and give even representation to each class
-
class
opensoundscape.torch.sampling.
ImbalancedDatasetSampler
(dataset, indices=None, num_samples=None, callback_get_label=None)¶ Samples elements randomly from a given list of indices for imbalanced dataset
Parameters: - indices (list, optional) – a list of indices
- num_samples (int, optional) – number of samples to draw
- callback_get_label (func) – a callback-like function which takes two arguments - dataset and index
Data Selection¶
-
opensoundscape.data_selection.
upsample
(input_df, label_column='Labels', random_state=None)¶ Given an input DataFrame, upsample each label to the maximum count
Upsampling removes the class imbalance in your dataset. Rows for each label are repeated up to max_count // rows. Then, we randomly sample the rows to fill up to max_count.
Parameters: - input_df – A DataFrame to upsample
- label_column – The column to draw unique labels from
- random_state – Set the random_state during sampling
Returns: An upsampled DataFrame
Return type: df
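The two-step procedure described above (repeat whole groups, then randomly fill the remainder) can be sketched without pandas. This illustration uses a list of dicts in place of a DataFrame and is not the library's implementation; upsample_rows is a hypothetical name, and filling the remainder without replacement is an assumption.

```python
import random
from collections import defaultdict


def upsample_rows(rows, label_key="Labels", random_state=None):
    """Repeat rows so every label appears as often as the most common label."""
    rng = random.Random(random_state)

    # group the rows by their label
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    max_count = max(len(group) for group in by_label.values())

    upsampled = []
    for group in by_label.values():
        repeats = max_count // len(group)          # whole copies of the group
        remainder = max_count - repeats * len(group)
        upsampled.extend(group * repeats)
        # randomly sample (without replacement) to fill up to max_count
        upsampled.extend(rng.sample(group, remainder))
    return upsampled


rows = ([{"file": f"a{i}.wav", "Labels": "a"} for i in range(5)]
        + [{"file": f"b{i}.wav", "Labels": "b"} for i in range(2)])
balanced = upsample_rows(rows, random_state=0)
```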
Performance Metrics¶
-
opensoundscape.metrics.
binary_metrics
(targets, preds, class_names=[0, 1])¶ labels should be single-target
-
opensoundscape.metrics.
multiclass_metrics
(targets, preds, class_names)¶ provide a list or np.array of 0,1 targets and predictions
-
opensoundscape.metrics.
predict
(scores, single_target=False, threshold=0.5)¶ convert numeric scores to binary predictions
return 0/1 for an array of scores: samples (rows) x classes (columns)
Parameters: - scores – a 2-d list or np.array. row=sample, columns=classes
- single_target – if True, predict 1 for highest scoring class per sample, 0 for other classes. If False, predict 1 for all scores > threshold [default: False]
- threshold – Predict 1 for score > threshold. only used if single_target = False. [default: 0.5]
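The two prediction modes can be illustrated with nested lists. This is a sketch of the documented behavior, not the library function; scores_to_binary is a hypothetical name, and breaking ties in single-target mode by taking the first maximum is an assumption.

```python
def scores_to_binary(scores, single_target=False, threshold=0.5):
    """Convert rows of class scores to 0/1 predictions."""
    binary = []
    for row in scores:
        if single_target:
            # predict 1 only for the highest scoring class (first one on ties)
            best = max(range(len(row)), key=row.__getitem__)
            binary.append([1 if i == best else 0 for i in range(len(row))])
        else:
            # predict 1 for every class whose score exceeds the threshold
            binary.append([1 if s > threshold else 0 for s in row])
    return binary


scores = [[0.9, 0.6, 0.1], [0.2, 0.3, 0.4]]
multi = scores_to_binary(scores)
single = scores_to_binary(scores, single_target=True)
```

Note that multi-target mode can predict no classes at all for a sample (second row), while single-target mode always predicts exactly one.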
Grad Cam¶
GradCAM is a method of visualizing the activation of the network on parts of an image
# Author: Kazuto Nakashima # URL: http://kazuto1011.github.io # Created: 2017-05-26