Data loader module

data augmentation: standard deviation of the Gaussian used to blur the image during high-pass filtering (the blurred image is subtracted from the original) (dtype=int)

--SubtractGaussSigma 5
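The high-pass filtering behind --SubtractGaussSigma can be sketched with scipy; this is a minimal illustration of the technique, not the module's actual implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def highpass(image, sigma=5):
    """Subtract a Gaussian-blurred copy to keep only high-frequency content."""
    return image - gaussian_filter(image.astype(np.float64), sigma)

# A constant image has no high-frequency content, so the result is all zeros.
flat = np.full((32, 32), 7.0)
residual = highpass(flat)
```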

data augmentation: use only Gauss-Sigma filtered images (dtype=bool)

--nooriginal False

data augmentation: deformation grid spacing in pixels (dtype=int); if zero: no deformation will be applied

--deform 0

data augmentation: given a deformation grid spacing, this determines the standard deviations for each dimension of the random deformation vectors (dtype=float)

--deformSigma 0.0
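The --deform/--deformSigma pair describes a smooth random deformation field: a coarse grid of random displacement vectors is upsampled to the image resolution and used to warp the image. A rough sketch of that idea with scipy (the grid-handling details are assumptions, not mdgru's exact code):

```python
import numpy as np
from scipy.ndimage import map_coordinates, zoom

def random_deform(image, spacing=10, sigma=2.0, seed=1234):
    """Warp `image` with a smooth random deformation field: displacement
    vectors are drawn on a coarse grid (`spacing` pixels apart, per-component
    standard deviation `sigma`), upsampled to full resolution, and applied
    via spline interpolation."""
    rng = np.random.RandomState(seed)
    coarse_shape = [max(s // spacing, 2) for s in image.shape]
    coords = np.meshgrid(*[np.arange(s) for s in image.shape], indexing="ij")
    warped = []
    for grid in coords:
        coarse = rng.normal(0.0, sigma, coarse_shape)
        factors = [full / c for full, c in zip(image.shape, coarse_shape)]
        warped.append(grid + zoom(coarse, factors, order=3))
    return map_coordinates(image, warped, order=1, mode="nearest")

img = np.arange(100, dtype=np.float64).reshape(10, 10)
warped_img = random_deform(img, spacing=5, sigma=0.5)
```

With sigma set to zero the displacement field vanishes and the image is returned unchanged, matching the documented "if zero: no deformation" behaviour of --deform.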

data augmentation: activate random mirroring along the specified axes during training (dtype=bool)

--mirror [0]

data augmentation: random multiplicative Gaussian noise with unit mean and unit variance (dtype=bool)

--gaussiannoise False

data augmentation: amount of randomly scaling images per dimension as a factor (dtype=float)

--scaling 0.0

data augmentation: amount in radians to randomly rotate the input around a randomly drawn vector (dtype=float)

--rotation 0.0

amount of shift applied to the sampling coordinates, allowing sampling outside of discrete (integer) coordinates (dtype=float)

--shift 0.0

always apply interpolation, even when no deformation grid is used (dtype=bool)

--interpolate_always False

define random seed for deformation variables (dtype=int)

--deformseed 1234

spline interpolation order: 0 (nearest neighbour), 1 (linear), 2 (quadratic), 3 (cubic) (dtype=int)

--interpolation_order 3

rule for filling values sampled outside the image boundaries (“constant”, “nearest”, “reflect”, “wrap”) (dtype=str)

--padding_rule constant
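--interpolation_order and --padding_rule correspond directly to the order and mode arguments of the scipy.ndimage interpolation routines; for example, with a subpixel shift (as enabled by --shift):

```python
import numpy as np
from scipy.ndimage import shift

arr = np.array([0.0, 1.0, 2.0, 3.0])

# order=0 (nearest neighbour): a 0.4-pixel shift snaps back to the
# original sample positions, leaving the array unchanged.
nearest = shift(arr, 0.4, order=0, mode="nearest")

# order=1 (linear): each output value blends two neighbours; positions
# that fall outside the input are filled according to the padding rule
# ("constant" fills with cval).
linear = shift(arr, 0.4, order=1, mode="constant", cval=0.0)
```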

whiten image data to mean 0 and unit variance (dtype=bool)

--whiten True
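Whitening is plain intensity normalization to zero mean and unit variance; a minimal sketch (the small epsilon guarding against division by zero is an assumption):

```python
import numpy as np

def whiten(image, eps=1e-8):
    """Normalize image intensities to zero mean and unit variance."""
    image = image.astype(np.float64)
    return (image - image.mean()) / (image.std() + eps)

data = np.random.RandomState(0).uniform(0.0, 255.0, size=(16, 16))
w = whiten(data)
```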

force each n-th sample to contain labelled data (dtype=int)

--each_with_labels 0

whether channels appear first (PyTorch) or last (TensorFlow) (dtype=bool)

--channels_first False

if multiple masks are provided, we select one at random for each sample (dtype=bool)

--choose_mask_at_random False

perform one-hot-encoding from probability distributions (dtype=bool)

--perform_one_hot_encoding True

ignore missing masks (dtype=bool)

--ignore_missing_mask False

correct nifti orientation (dtype=bool)

--correct_orientation True

DataCollection

class mdgru.data.DataCollection(kw)[source]

Bases: object

Abstract class for all data handling classes.

Parameters: kw (dict) – dict containing the following options:
  • seed [default: 1234] Seed to be used for deterministic random sampling, given no threading is used
  • nclasses [default: None]
_defaults = {'nclasses': None, 'seed': {'help': 'Seed to be used for deterministic random sampling, given no threading is used', 'value': 1234}}
_one_hot_vectorize(indexlabels, nclasses=None, zero_out_label=None)[source]

Simplified one-hot label method. Using interpolated labels is discouraged anyway, hence this only allows integer values in indexlabels.

Parameters:
  • indexlabels (ndarray) – array containing labels or indices for each class, starting at 0 until nclasses-1
  • nclasses (int) – number of classes
  • zero_out_label (int) – label to assign probability of zero for the whole probability distribution
Returns:

ndarray – Per-pixel probability distributions, where the value at position indexlabels is set to 1 and all other values to 0
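Assuming the behaviour described above, the conversion can be sketched with numpy (the function name and the zero_out_label handling are illustrative, not the class's actual code):

```python
import numpy as np

def one_hot_vectorize(indexlabels, nclasses=None, zero_out_label=None):
    """Turn integer labels in 0..nclasses-1 into per-pixel probability
    distributions; optionally zero out one label's whole distribution."""
    if nclasses is None:
        nclasses = int(indexlabels.max()) + 1
    # Fancy indexing into the identity matrix appends a one-hot axis.
    onehot = np.eye(nclasses, dtype=np.float64)[indexlabels]
    if zero_out_label is not None:
        onehot[indexlabels == zero_out_label] = 0.0
    return onehot

labels = np.array([[0, 1], [2, 1]])
probs = one_hot_vectorize(labels)  # shape (2, 2, 3)
```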

static get_all_tps(folder, featurefiles, maskfiles)[source]

Computes a list of all subfolders of folder that contain all provided featurefiles and maskfiles.

Parameters:
  • folder (str) – location at which timepoints are searched
  • featurefiles (list of str) – necessary featurefiles to be contained in a timepoint
  • maskfiles (list of str) – necessary maskfiles to be contained in a timepoint
Returns:

sorted list – valid timepoints in string format
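A self-contained sketch of such a folder scan (the demo file names are hypothetical, and the scan logic is an illustration of the described behaviour, not the method's actual code):

```python
import os
import tempfile

def get_all_tps(folder, featurefiles, maskfiles):
    """Return the sorted subfolders of `folder` that contain every file
    listed in `featurefiles` and `maskfiles`."""
    required = list(featurefiles) + list(maskfiles)
    tps = []
    for name in sorted(os.listdir(folder)):
        sub = os.path.join(folder, name)
        if os.path.isdir(sub) and all(
            os.path.isfile(os.path.join(sub, f)) for f in required
        ):
            tps.append(sub)
    return tps

# Demo layout: two timepoint folders, one of which is missing its mask.
root = tempfile.mkdtemp()
for tp, files in [("tp1", ["flair.nii", "mask.nii"]), ("tp2", ["flair.nii"])]:
    d = os.path.join(root, tp)
    os.makedirs(d)
    for f in files:
        open(os.path.join(d, f), "w").close()

valid = get_all_tps(root, ["flair.nii"], ["mask.nii"])  # only tp1 qualifies
```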

get_data_dims()[source]

Returns the dimensionality of the whole collection (even if samples are returned or computed on the fly, the theoretical size is returned). The result has two or three entries, depending on the type of data: a dataset with sequences of vectors has three, a dataset with sequences of indices has two, etc.

Returns:list – A shape array of the dimensionality of the data.
get_shape()[source]
get_states()[source]

Get states of this data collection

random_sample(**kw)[source]

Randomly samples from our dataset. If the implementation knows different datasets, the dataset string can be used to choose one; if not, it is ignored.

Parameters:**kw (keyword args) – batch_size can be set, amongst other parameters. See implementing methods for more detail.
Returns:array – A random sample of length batch_size.
reset_seed(seed=12345678)[source]

Reset the main random number generator with the given seed.

set_states(state)[source]

Reset the random state generators to the given state.

Parameters: state (object) – Random generator state

GridDataCollection