force-magic-parameters

You may find yourself in a situation, where you want to iterate over one or more parameters (in any FORCE module). As an example: for training machine learning models, you probably have several feature sets, which you want to train and validate. For this, you would need to generate one parameter file for each feature set.

force-magic-parameters introduces a convenient, and more automatic way to accomplish this. Based on a replacement variable defined in a main paramaterfile, multiple new parameterfiles are generated according to a vector that holds the replacement values. If multiple replacement variables are defined, all possible combinations of the vector elements are generated.

Usage

force-magic-parameters [-h] [-c {all,paired}] [-o] parameter-file

optional:
-h = show this help
-c = combination type
     all:    all combinations (default)
     paired: pairwise combinations
-o = output directory, defaults to directory of parameter-file

mandatory:
parameter-file: base parameter-file that includes replacement vectors
  • parameter-file

    Any FORCE parameterfile can be used.
  • combination type

    If this argument is not given, we will use all combinations of all replacement vectors.
    This is the same as -c all.
    If -c paired, pairwise combinations are used.
    In this case, the replacement vectors must be of the same length.

Syntax

  • The replacement variables need to be defined at the top of a main parameterfile - before the starting line, e.g. ++PARAM_TRAIN_START++

  • Multiple replacement variables can be defined

  • Each replacement variable is defined in a separate line

  • A variable is defined like this: %VAR%: 001 002 003 005 010 100 VAR can be any variable name. The values can be integers, text, filenames, etc.

  • The replacement variable is used in the main body like this: FILE_FEATURES = /data/FEATURESET_{%VAR%}.txt

Example

In force-train, let’s assume we have 6 feature sets, i.e. 6 files.

  • data/FEATURESET_001.txt

  • data/FEATURESET_002.txt

  • data/FEATURESET_003.txt

  • data/FEATURESET_005.txt

  • data/FEATURESET_010.txt

  • data/FEATURESET_100.txt

We want to perform a Random Forest Classification, and want to test different tree sizes, e.g. 100, 250, 500 and 1000.

We additionally want to test different tree depths, e.g. 5, 10 and maximal depth (0).

Thus we define following in main.prm:

%SET%:   001 002 003 005 010 100
%NTREE%: 100 250 500 1000
%DEPTH%: 5 10 0

++PARAM_TRAIN_START++

FILE_FEATURES = /data/FEATURESET_{%SET%}.txt

# other parameters omitted

FILE_MODEL = /data/FEATURESET_{%SET%}_NTREE_{%NTREE%}_DEPTH_{%DEPTH%}.xml
FILE_LOG   = /data/FEATURESET_{%SET%}_NTREE_{%NTREE%}_DEPTH_{%DEPTH%}.log

# other parameters omitted

RF_NTREE = {%NTREE%}
RF_DT_MAXDEPTH = {%DEPTH%}

# other parameters omitted

++PARAM_TRAIN_END++

Then, we use force-magic-parameters to replace the variables and generate all possible parameterfiles.

force-magic-parameters main.prm

3 replacement vectors detected
72 parameter files were generated

72 new parameterfiles were generated (6*4*3 combinations). You can now run these parameterfiles, either sequentially or parallely (if this makes sense).

# example for sequential execution
for p in *.prm; do force-train $p; done

# example for parallel execution
ls *.prm | parallel force-train {}