Evaluation module

iterations after which to evaluate on validation set (dtype=int)

--test_each 2500

iterations after which to create a checkpoint (dtype=int)

--save_each None

iterations after which to create a plot (dtype=int)

--plot_each 2500

test size (dtype=int)

--test_size 1

whether to perform validation on the full images (dtype=bool)

--perform_full_image_validation True

save only labels and no probability distributions (dtype=bool)

--only_save_label False

whether to validate on the same random samples every time; set to False to pick new random samples for each validation (dtype=bool)

--validate_same True

number of times we want to evaluate one volume; this only makes sense with a keep rate of less than 1 during evaluation (dropout_during_evaluation less than 1) (dtype=int)

--evaluate_uncertainty_times 1

keep rate of weights during evaluation; useful to visualize uncertainty in conjunction with multiple evaluation samples per volume (dtype=float)

--evaluate_uncertainty_dropout 1.0

Save each evaluation sample per volume. Without this flag, only the standard deviation and mean over all samples are kept. (dtype=bool)

--evaluate_uncertainty_saveall False
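
As an illustration only (not part of the option listing above), these evaluation options could be combined on the command line roughly as follows; the entry-point script name RUN_mdgru.py, the omitted data and model options, and the chosen values are assumptions (ten evaluation samples are only meaningful together with a keep rate below 1, as stated above):

python RUN_mdgru.py <data and model options> \
    --test_each 2500 --save_each 2500 --plot_each 2500 \
    --evaluate_uncertainty_times 10 --evaluate_uncertainty_dropout 0.5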

Supervised evaluation

class mdgru.eval.SupervisedEvaluation(modelcls, datacls, kw)[source]

Bases: object

Handler for the evaluation of the model defined in modelcls using data provided by datacls; a minimal construction sketch follows the parameter list below.

Parameters:
  • kw (dict containing the following options.) –
    • dropout_rate [default: 0.5] “keep rate” for weights using dropconnect. The higher the value, the closer the sampled models are to the full model.
    • namespace [default: default] override default model name (if no ckpt is provided). Probably not a good idea!
    • only_save_labels [default: False] save only labels and no probability distributions
    • validate_same [default: True] whether to validate on the same random samples every time; set to False to pick new random samples for each validation
    • evaluate_uncertainty_times [default: 1] Number of times we want to evaluate one volume. This only makes sense with a keep rate of less than 1 during evaluation (dropout_during_evaluation less than 1)
    • evaluate_uncertainty_dropout [default: 1.0] Keep rate of weights during evaluation. Useful to visualize uncertainty in conjunction with multiple samples per volume
    • evaluate_uncertainty_saveall [default: False] Save each evaluation sample per volume. Without this flag, only the standard deviation and mean over all samples are kept.
    • show_f05 [default: True]
    • show_f1 [default: True]
    • show_f2 [default: True]
    • show_l2 [default: True]
    • show_cross_entropy [default: True]
    • print_each [default: 1] print execution time and losses each # iterations
    • batch_size [default: 1] Minibatchsize
    • datapath path where the training, validation and testing folders lie. Can also be some other path, as long as the other locations are provided as absolute paths. An experiments folder will be created in this folder, where all runs and checkpoint files will be saved.
    • locationtraining [default: None] path to the training data, either absolute or relative to datapath. Either a list of paths to the sample folders or a single path to a folder from which samples are determined automatically.
    • locationtesting [default: None] path to the testing data, either absolute or relative to datapath. Either a list of paths to the sample folders or a single path to a folder from which samples are determined automatically.
    • locationvalidation [default: None] path to the validation data, either absolute or relative to datapath. Either a list of paths to the sample folders or a single path to a folder from which samples are determined automatically.
    • output_dims number of output channels, i.e. the number of classes over which the model creates a probability distribution.
    • windowsize window size to be used during training, validation and testing, if not specified otherwise
    • padding [default: [0]] padding to be used during training, validation and testing, if not specified otherwise. During training, the padding specifies how far a patch may reach outside of the image along all dimensions; during testing, it also specifies the amount of overlap needed between patches.
    • windowsizetesting [default: None] override windowsize for testing
    • windowsizevalidation [default: None] override windowsize for validation
    • paddingtesting [default: None] override padding for testing
    • paddingvalidation [default: None] override padding for validation
    • testbatchsize [default: 1] batchsize for testing
  • modelcls (cls) – Python class defining the model to evaluate
  • datacls (cls) – Python class implementing the data loading and storing
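
A minimal construction sketch, not taken from the documentation above; the concrete model and data classes, their import paths, and all paths and sizes below are assumptions that have to be adapted to the actual setup.

from mdgru.eval import SupervisedEvaluation
# the following import paths and class names are assumptions:
from mdgru.model_pytorch.mdgru_classification import MDGRUClassification
from mdgru.data.grid_collection import GridDataCollection

kw = {
    "datapath": "/data/experiment",      # assumed dataset root containing the sample folders
    "locationtraining": ["train"],       # relative to datapath
    "locationvalidation": ["val"],
    "locationtesting": ["test"],
    "output_dims": 2,                    # e.g. background vs. foreground
    "windowsize": [64, 64, 64],
    "padding": [5],
    "batch_size": 1,
}
evaluation = SupervisedEvaluation(MDGRUClassification, GridDataCollection, kw)
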
_defaults = {'batch_size': {'help': 'Minibatchsize', 'name': 'batchsize', 'short': 'b', 'type': <class 'int'>, 'value': 1}, 'datapath': {'help': 'path where training, validation and testing folders lie. Can also be some other path, as long as the other locations are provided as absolute paths. An experimentsfolder will be created in this folder, where all runs and checkpoint files will be saved.'}, 'dropout_rate': {'help': '"keep rate" for weights using dropconnect. The higher the value, the closer the sampled models to the full model.', 'value': 0.5}, 'evaluate_uncertainty_dropout': {'help': 'Keeprate of weights during evaluation. Useful to visualize uncertainty in conjunction with a number of samples per volume', 'name': 'dropout_during_evaluation', 'type': <class 'float'>, 'value': 1.0}, 'evaluate_uncertainty_saveall': {'help': 'Save each evaluation sample per volume. Without this flag, only the standard deviation and mean over all samples is kept.', 'name': 'save_individual_evaluations', 'value': False}, 'evaluate_uncertainty_times': {'help': 'Number times we want to evaluate one volume. This only makes sense using a keep rate of less than 1 during evaluation (dropout_during_evaluation less than 1)', 'name': 'number_of_evaluation_samples', 'type': <class 'int'>, 'value': 1}, 'locationtesting': {'help': 'absolute or relative path to datapath to the testing data. Either a list of paths to the sample folders or one path to a folder where samples should be automatically determined.', 'nargs': '+', 'value': None}, 'locationtraining': {'help': 'absolute or relative path to datapath to the training data. Either a list of paths to the sample folders or one path to a folder where samples should be automatically determined.', 'nargs': '+', 'value': None}, 'locationvalidation': {'help': 'absolute or relative path to datapath to the validation data. Either a list of paths to the sample folders or one path to a folder where samples should be automatically determined.', 'nargs': '+', 'value': None}, 'namespace': {'alt': ['modelname'], 'help': 'override default model name (if no ckpt is provided). Probably not a good idea!', 'value': 'default'}, 'only_save_labels': {'help': 'save only labels and no probability distributions', 'value': False}, 'output_dims': {'alt': ['nclasses'], 'help': 'number of output channels, e.g. number of classes the model needs to create a probability distribution over.', 'type': <class 'int'>}, 'padding': {'help': 'padding to be used during training, validation and testing, if not specified otherwise. 
During training, the padding specifies the amount a patch is allowed to reach outside of the image along all dimensions, during testing, it specifies also the amount of overlap needed between patches.', 'nargs': '+', 'short': 'p', 'type': <class 'int'>, 'value': [0]}, 'paddingtesting': {'help': 'override padding for testing', 'nargs': '+', 'type': <class 'int'>, 'value': None}, 'paddingvalidation': None, 'print_each': {'help': 'print execution time and losses each # iterations', 'type': <class 'int'>, 'value': 1}, 'show_cross_entropy': True, 'show_f05': True, 'show_f1': True, 'show_f2': True, 'show_l2': True, 'testbatchsize': {'help': 'batchsize for testing', 'value': 1}, 'validate_same': {'help': 'always pick other random samples for validation!', 'invert_meaning': 'dont_', 'value': True}, 'windowsize': {'help': 'window size to be used during training, validation and testing, if not specified otherwise', 'nargs': '+', 'short': 'w', 'type': <class 'int'>}, 'windowsizetesting': {'help': 'override windowsize for testing', 'nargs': '+', 'type': <class 'int'>, 'value': None}, 'windowsizevalidation': None}
_load(f)[source]

Load model in current framework from f

Parameters:f (location of stored model) –
_predict(batch, dropout, testing)[source]

Predict given batch and keeprate dropout.

Parameters:
  • batch (ndarray) –
  • dropout (float) – Keeprate for dropconnect
  • testing
Returns:

ndarray (Prediction based on data batch)

_predict_with_loss(batch, batchlabs)[source]

Predict for given batch and return loss compared to labels in batchlabs

Parameters:
  • batch (image data) –
  • batchlabs (corresponding label data) –
Returns:

tuple of ndarray prediction and losses

_save(f)[source]

Save to file f in current framework

Parameters:f (location to save model at) –
_set_session(sess, cachefolder)[source]
_train()[source]

Performs one training iteration in the respective framework and returns the loss(es)

add_summary_simple_value(text, value)[source]
get_globalstep()[source]

Return the number of iterations this model has been trained for

Returns:int (iteration count)
load(f)[source]

loads model at location f from disk

Parameters:f (str) – location of stored model
save(f)[source]

saves model to disk at location f

Parameters:f (str) – location to save model to
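
A hypothetical checkpoint round trip using save and load with the evaluation instance from the construction sketch above; the path is an assumption.

ckpt = "/data/experiment/experiments/run/cache/model.ckpt"  # assumed checkpoint location
evaluation.save(ckpt)  # persist the current weights
evaluation.load(ckpt)  # restore them later, e.g. before testing
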
set_session(sess, cachefolder, train=False)[source]
test_all_available(batch_size=None, dc=None, return_results=False, dropout=None, testing=False)[source]

Completely evaluates each full image in tps using grid sampling.

Parameters:
  • batch_size (int) – minibatch size to compute on
  • dc (datacollection instance, optional) – datacollection to sample from
  • return_results (bool) – should results be returned or stored right away?
  • dropout (float) – keeprate of dropconnect for inference
  • testing
Returns:

either tuple of predictions and errors or only errors, depending on return_results flag
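
A sketch of a full-image evaluation pass, assuming the evaluation instance from the construction sketch above; the keyword values are illustrative only.

errors = evaluation.test_all_available(batch_size=1, testing=True)
# with return_results=True, the predictions are returned alongside the errors:
predictions, errors = evaluation.test_all_available(return_results=True, testing=True)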

test_all_random(batch_size=None, dc=None, resample=True)[source]

Test random samples

Parameters:
  • batch_size (int) – minibatch size to compute on
  • dc (datacollection instance, optional) – datacollection to sample from
  • resample (bool) – indicates if we need to sample before evaluating
Returns:

tuple of loss and prediction ndarray
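
A sketch of a quick check on random samples, again assuming the evaluation instance from above.

loss, prediction = evaluation.test_all_random(batch_size=1)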

test_scores(pred, ref)[source]

Evaluates all selected scores between reference data ref and prediction pred.

Parameters:
  • pred (ndarray) – prediction, as probability distributions per pixel / voxel
  • ref (ndarray) – reference data, either as probability distributions per pixel / voxel or as a label map
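
A hypothetical call with fabricated arrays, only to illustrate the arguments; the channel-last layout of the probability map is an assumption.

import numpy as np

pred = np.random.rand(32, 32, 2)            # per-pixel class probabilities (layout assumed channel-last)
pred /= pred.sum(axis=-1, keepdims=True)    # normalize to a distribution per pixel
ref = np.random.randint(0, 2, (32, 32))     # reference label map
evaluation.test_scores(pred, ref)           # evaluates the selected scores (f0.5, f1, f2, l2, cross entropy)
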
train()[source]

Measures and logs the time spent on data sampling and on the training iteration.
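
A minimal loop sketch combining train() with periodic validation via test_all_random(); the iteration count and interval mirror the defaults above and are purely illustrative.

for iteration in range(1, 100001):
    evaluation.train()                           # one data sampling + training iteration
    if iteration % 2500 == 0:                    # e.g. every --test_each iterations
        val_loss, _ = evaluation.test_all_random()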

TensorFlow backend

PyTorch backend