This page describes the organization of the simrisc configuration files. These
files are formatted like standard unix configuration files. Lines are
interpreted after removing initial white-space (blanks and tabs) and after
removing all characters from the first # character onward: that text is
considered a comment and is ignored. If a line (not containing a # character)
ends in a backslash (\), then the next line (initial white-space removed) is
appended to the current line.
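The line handling described above can be sketched in Python. This is only an illustration of the stated rules, not simrisc's actual reader; the function name logical_lines is hypothetical.

```python
def logical_lines(raw_lines):
    """Yield configuration lines after comment removal and continuation joining."""
    pending = ""                                  # text of a continued line
    for raw in raw_lines:
        line = raw.rstrip("\n").lstrip(" \t")     # drop initial blanks and tabs
        has_comment = "#" in line
        if has_comment:
            line = line[:line.index("#")]         # drop the comment
        line = pending + line
        pending = ""
        # only lines without a # character can be continued
        if not has_comment and line.endswith("\\"):
            pending = line[:-1]
            continue
        yield line
    if pending:                                   # dangling continuation
        yield pending
```

For example, the two physical lines "round: 50 Mammo \" and "    52 Mammo" are joined into the single line "round: 50 Mammo 52 Mammo".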
Note that all parameter identifiers are interpreted case sensitively. E.g.,
Costs: is a different parameter than costs:. The numeric values used in this
man-page are for illustration purposes only. Some restrictions apply though:
standard deviations cannot be negative; proportions and probabilities must lie
in the range 0..1; multiple probabilities (like the ones used for breast
densities) must add up to 1; etc. If restrictions apply then they are
mentioned at the various parameter descriptions below.
The configuration file provided in the simrisc distribution is
/usr/share/doc/simrisc/simrisc.gz. Usually this file is unzipped by the user
to the user's ~/.config directory:
    gunzip < /usr/share/doc/simrisc/simrisc.gz > ~/.config/simrisc
whereafter ~/.config/simrisc can be edited to contain local modifications.
Various parameters specify probability distributions. Usually the Normal
distribution is specified. The program also recognizes the LogNormal and
Uniform distributions, and uses the Beta distribution when handling parameter
variations of the beta parameter used for lung cancer simulations (note that
the similarity of the names beta (the parameter) and Beta (the distribution)
is purely coincidental).
Parameter specifications start with keywords, followed by a colon. All keywords are covered below. The format of the specifications is fixed, but empty lines and white space may be used to improve their readability.
Parameter specifications starting with uppercase letters (like Scenario:)
specify (sub)sections and contain no additional specifications. Specifications
starting with lowercase letters (like ageGroup:) are followed by actual
parameter values.
The configuration file must define all parameters of all configuration sections, but configuration parameters can be modified using a separate analysis file or they can be modified by command-line parameters.
Several section names are optional, e.g., `Scenario:'. In this manual page the
label `(opt.)' is appended to the names of those sections. In actual
configuration specifications they can still be used, but they can also be
omitted completely.
This section may start with a line containing Scenario: and specifies some
general parameters of the simulation process. The default configuration file
contains the following specifications:
spread: false
when set to true, parameter spreading is used;
iterations: 1
generator: fixed
instead of fixed the modes random and increasing are available. This
parameter determines how simrisc's random number generators are initialized.
When mode fixed is used the random number generators are initialized with
seed's value; mode random results in the random number generators being
initialized by randomly selected seeds and seed (below) is not used; mode
increasing results in incrementing the seeds of the random number generators
by a fixed increment at each iteration;
seed: 1
the value used to initialize the random number generators; it is not used if
generator: random was specified;
cases: 1000
death: ...
the death: parameter may either be followed by the path to a file (if the
path's initial character is a tilde (~) it is replaced by the user's home
directory; if it is a plus (+) it is replaced by the base directory (specified
with the --base option, see the simrisc(1) man-page)), or it must be followed
by 101 cumulative death proportions, where each line starts with <nr>:, where
<nr> is the order number of the next proportion to read. The default
configuration file specifies:
    death:   1: .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
            11: .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
            21: .00000 .00014 .00028 .00042 .00056 .00070 .00106 .00142 .00178 .00214
            31: .00250 .00382 .00514 .00646 .00778 .00910 .01118 .01326 .01534 .01742
            41: .01950 .02312 .02674 .03036 .03398 .03760 .04414 .05068 .05722 .06376
            51: .07030 .07776 .08522 .09268 .10014 .10760 .11564 .12368 .13172 .13976
            61: .14780 .15718 .16656 .17594 .18532 .19470 .20658 .21846 .23034 .24222
            71: .25410 .27560 .29710 .31860 .34010 .36160 .39368 .42576 .45784 .48992
            81: .52200 .56240 .60280 .64320 .68360 .72400 .75826 .79252 .82678 .86104
            91: .89530 .90962 .92394 .93826 .95258 .96690 .97104 .97518 .97932 .98346
           101: .98760
which are the 101 cumulative death proportions used for breast cancer simulations.
If the 101 cumulative death proportions are available in, e.g., the user's
.config/ directory as the file cumdeath then the specification could have
been:
    death: ~/.config/cumdeath
which might be convenient when using different values in Analysis:
specifications (see section ANALYSES in the simrisc(1) man-page).
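Before handing a list of death proportions to simrisc, it can be checked against the restrictions mentioned on this page. The sketch below is a hypothetical helper (check_death_proportions is not part of simrisc). The non-decreasing check is an assumption derived from the word "cumulative"; this page does not state it explicitly.

```python
def check_death_proportions(values):
    """Return a list of problems found in the cumulative death proportions."""
    problems = []
    if len(values) != 101:
        problems.append(f"expected 101 values, got {len(values)}")
    for nr, value in enumerate(values, start=1):
        if not 0.0 <= value <= 1.0:              # proportions lie in 0..1
            problems.append(f"value {nr} ({value}) is outside 0..1")
    for idx in range(1, len(values)):
        # assumption: cumulative proportions never decrease
        if values[idx] < values[idx - 1]:
            problems.append(f"value {idx + 1} is smaller than value {idx}")
    return problems
```

An empty result means that no problem was found by these checks.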
This section may start with a line containing Costs: and specifies several
parameters used for cost-calculations. Modality-specific cost parameters are
specified in Section Modalities: (see below). The default configuration file
specifies:
biop: 176
diameters: 0: 6438 20: 7128 50: 7701
pairs of diameter: cost values specifying the treatment cost starting at the
specified tumor diameter, up to the next pair's diameter (if specified), or
for all diameters starting at the diameter specified at the last pair. The
first diameter must be 0. The second value of each pair specifies the
(non-negative) treatment costs for that diameter range.
Discount: (opt.)
The line proportion: contains two values specifying, respectively, the
discount proportions for breast- and lung-cancer simulations. When only one
proportion is specified it represents the proportion of the actually used
simulation type (i.e., breast- or lung-cancer); the line age: specifies the
discount's starting age (for both simulation types):
    Discount:
        #            breast  lung
        proportion:    0     .04
        age:          50
This section starts with a line containing BreastDensities: and is used with
breast-cancer simulations. It defines breast density values for various age
groups, covering ages 0 through the maximum age for simulated cases. The
default configuration file contains the following specifications:
    #  bi-rad:           a     b     c     d
    ageGroup:   0 - 40  0.05  0.30  0.48  0.17
               40 - 50  0.06  0.34  0.47  0.13
               50 - 60  0.08  0.50  0.37  0.05
               60 - 70  0.15  0.53  0.29  0.03
               70 - *   0.18  0.54  0.26  0.02
Optionally, each line may start with an ageGroup: specification, which is
required for the first specification line. As the end age of the last age
group * can be used, indicating that all ages at or above the last age group's
begin age are handled by that group.
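The restrictions on this section (age groups starting at 0 and connecting, probabilities per row adding up to 1) can be checked with a few lines of Python. This is an illustrative sketch only: the function check_density_rows and its row representation are hypothetical, and the tolerance is an assumption.

```python
import math

def check_density_rows(rows, tolerance=1e-6):
    """rows: list of (begin, end, [a, b, c, d]); end is None for '*'."""
    problems = []
    if rows and rows[0][0] != 0:
        problems.append("the first age group must start at age 0")
    for idx, (begin, end, probs) in enumerate(rows):
        if not math.isclose(sum(probs), 1.0, abs_tol=tolerance):
            problems.append(f"row {idx}: probabilities sum to {sum(probs):.4f}")
        last = idx == len(rows) - 1
        if end is None and not last:
            problems.append(f"row {idx}: '*' is only allowed in the last group")
        # half-open ranges: the next group starts where this one ends
        if not last and end is not None and rows[idx + 1][0] != end:
            problems.append(f"row {idx}: next group does not start at {end}")
    return problems
```

Applied to the default BreastDensities: values shown above the function returns an empty list.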
This section may start with a line containing Modalities: and specifies
cancer-scanning modalities. Currently three modalities are supported for
breast-cancer simulations: Mammo, Tomo, MRI, and one modality is supported for
lung-cancer simulations: CT.
Some modalities specify age groups, which are (like the age ranges used for
BreastDensities) half-open ranges: they start at their first ages, and end at
(not including) their second ages, while subsequent age ranges must connect.
Also, the last age group may use the end-age specification *.
The default configuration file contains (below the line Modalities:) the
following specifications (if modalities aren't used their specifications are
optional):
CT:
The CT modality is used when performing lung cancer simulations. The screening
costs are used instead of the value configured at the Costs: biop:
specification. The diagnosis: costs specify the costs of performing a CT-scan.
The M0 and M1 values specify the costs when, respectively, no metastasis or a
metastasis has been detected.
The radiation dose of CT scans is configured at the CT: dose: specification.
The sensitivity depends on the tumor diameter. For tumor sizes between 3 and 5
mm the sensitivity is computed using the formula
(.5 * diameter - 1.5) * 100 (e.g., 50% for tumors of 4 mm).
The default configuration file contains the following CT specifications:
    CT:
        #        screening  diagnosis    M0      M1    (M0, M1: Table S3)
        costs:      176       1908     37909   56556
        dose:  1
        #               diam.   value (must be integral 0..100 or -1)
        sensitivity:    0 - 3:   0
                        3 - 5:  -1    # formula: (.5 * diam - 1.5) * 100
                        5 - *:  100
        #               mean   stddev  dist
        specificity:    .992   .076    Normal
Optionally, each sensitivity specification line may start with a sensitivity:
specification, which is required for the first specification line.
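As an illustration, the default CT sensitivity specification (0 below 3 mm, the formula for diameters from 3 up to 5 mm, 100 from 5 mm) can be written as a small Python function. The function name ct_sensitivity is hypothetical, and the half-open handling of the range borders follows the age-group convention described above, which is an assumption for diameter ranges.

```python
def ct_sensitivity(diameter_mm):
    """Return the CT sensitivity (in percent) for a tumor diameter in mm."""
    if diameter_mm < 3:
        return 0.0
    if diameter_mm < 5:
        # the value -1 in the configuration selects this formula
        return (0.5 * diameter_mm - 1.5) * 100
    return 100.0
```

For a tumor of 4 mm the function returns 50.0, matching the example given in the text.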
Mammo:
For the Mammo modality the costs, radiation doses and m: parameter
specifications per bi-rad category, specificity probabilities for age groups,
the parameters of the beta-function, and the systematic error probability must
be specified.
Mammo sensitivity is computed using the beta-function published by Isheden and
Humphreys (2017, Statistical Methods in Medical Research, 28(3), 681-702).
From a randomly generated probability and the case's age the case's bi-rad
category is determined and that category is then used to select the
m-parameter that is used in the beta-function.
The default configuration file contains the following Mammo specifications:
    Mammo:
        costs:  64
        #      bi-rad:  a     b     c     d
        dose:           3     3     3     3
        m:             .136  .136  .136  .136
        #             ageGroups:
        specificity:  0 - 40:  .961   40 - *: .965
        #        1      2      3      4
        beta:  -4.38   .49   -1.34  -7.18
        systematicError:  0.1
Tomo:
For the Tomo modality the costs, radiation doses per bi-rad category,
sensitivity probabilities per bi-rad category, and specificity probabilities
for age groups must be specified.
The default configuration file contains the following Tomo specifications:
    Tomo:
        costs:  64
        #      bi-rad:   a     b     c     d
        dose:            3     3     3     3
        sensitivity:    .87   .84   .73   .65
        #             ageGroups:
        specificity:  0 - 40:  .961   40 - *: .965
MRI:
For the MRI modality the costs, and the sensitivity and specificity
probabilities must be specified.
The default configuration file contains (below the line MRI:) the following
specifications:
    costs:        280
    #             proportion:
    sensitivity:  .94
    specificity:  .95
This section may start with a line containing Screening: and it specifies the
ages at which screenings are performed, the screening modality or modalities
used at each of the screening ages, and the probabilities that screening
rounds are attended. If no screening rounds should be used then specify a
single round-specification line:
    round: none
Otherwise, the first screening round must start with the keyword round:
followed by an age, which in turn is followed by a list of one or more
space-delimited modality specifications. Subsequent screening round
specifications may optionally start with the keyword round:. Currently Mammo,
Tomo, MRI and CT are available. Mammo, Tomo, and MRI can be specified when
performing breast-cancer simulations, CT can be specified when performing
lung-cancer simulations. The default configuration file contains the following
round specifications:
    # round:  50  CT
    #         52  CT
    #         54  CT
    #         56  CT
    #         58  CT
    #         60  CT
    #         62  CT
    #         64  CT
    #         66  CT
    #         68  CT
    #         70  CT
    #         72  CT
    #         74  CT
    round:    50  Mammo
              52  Mammo
              54  Mammo
              56  Mammo
              58  Mammo
              60  Mammo
              62  Mammo
              64  Mammo
              66  Mammo
              68  Mammo
              70  Mammo
              72  Mammo
              74  Mammo
The probability that a case will attend a screening round is specified by the
attendanceRate: parameter:
    #                probability:
    attendanceRate:  .8
This section may start with a line containing Tumor: and it specifies the
characteristics of tumors. Several of the parameters in this section can be
varied by specifying spread: true in the section Scenario:, in which case
statistical variations are applied to these parameters.
Supported distributions are Normal, Uniform, LogNormal, and (for the
lung-cancer's Beir7 parameters) the Beta distribution. If value is the
specified parameter value, and spread is the specified spread parameter, then
the values that are actually used during the simulations are:
Normal distribution N(mean, stddev):
    N(value, spread)
Uniform distribution U(begin, end):
    U(value - spread / 2, value + spread / 2)
LogNormal distribution L(mean, stddev):
    L(value, spread)
The Beta distribution is used when requesting lung cancer simulations. For
male cases the 95% confidence interval for the beta parameter ranges from .15
to .70, for female cases it ranges from .94 to 2.10, and values drawn from
these distributions are used when spread: true has been specified (see also
section BETA DISTRIBUTIONS).
The spread parameters may not be negative. If spread values are configured
then their distributions must also be specified. If spread is not specified,
then the value parameter won't vary if spread: true is specified in the
Scenario section. The same holds true for the Beta distribution: if no
spreading should be applied, even though spread: true was specified, then the
Beta distribution's specifications should be omitted.
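The rules for the Normal and Uniform distributions can be sketched as follows. The function varied_value is hypothetical and not simrisc's implementation. The LogNormal case is deliberately left out because this page does not state how L(mean, stddev) is parameterized.

```python
import random

def varied_value(value, spread=None, dist=None, rng=random):
    """Return the value used in a simulation when spread: true is active."""
    if spread is None:
        return value                        # no spread specified: no variation
    if spread < 0:
        raise ValueError("spread parameters may not be negative")
    if dist == "Normal":
        return rng.gauss(value, spread)     # N(value, spread)
    if dist == "Uniform":
        # U(value - spread / 2, value + spread / 2)
        return rng.uniform(value - spread / 2, value + spread / 2)
    raise ValueError(f"distribution {dist!r} is not covered by this sketch")
```

Passing a seeded random.Random object as rng makes the drawn values reproducible, comparable to using generator: fixed.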
The Tumor: section has five subsections: Beir7:, Growth:, Incidence:,
Survival:, and S3:. They contain the following parameter specifications:
Beir7:
BEIR (tumor induction) parameters: only tumor induction type 7 (i.e., beir7) is used. The default configuration file contains specifications for breast cancer simulations and for male and female lung cancer simulations:
    #           eta   beta  spread  dist.
    breast:    -2.0   0.51   0.32   Normal
    # Beta-distribution parameters:
    # LC:       eta   beta   dist   constant   factor    aParam    bParam
    male:      -1.4    .32   Beta   .234091   1.72727   2.664237  5.184883
    female:    -1.4   1.40   Beta   .744828   .818966   3.366115  4.813548
If spread: true is specified then the actually used beta parameters are drawn
from their respective distributions.
See also National Research Council. 2006. Health Risks from Exposure to Low Levels of Ionizing Radiation: BEIR VII Phase 2. Washington, DC: The National Academies Press (https://doi.org/10.17226/11340).
Growth:
Tumor growth specifications consist of three elements: start diameters, self-detect parameters and doubling time specifications.
The start parameters define the start diameters (in millimeters) of emerging
tumors used with, respectively, breast and lung cancer simulations. The
default configuration file specifies
    #        breast  lung
    start:     5      3
The default configuration file contains the following specifications of the self-detect parameters for breast and lung cancer simulations:
    #selfDetect:
    #          stdev     mean     spread   dist
    breast:   2.01375  18.5413   2.31637  Normal
    lung:     1.0141   20.8426   1.84043  Normal
Four parameters are used when determining the diameter at which self-detection is possible. These parameters are:
stdev: the standard deviation used by the lognormal distribution to compute
the diameter at which self-detection occurs. This parameter is required and
cannot be negative;
mean: the mean (see below) used by the lognormal distribution. This parameter
is required and cannot be negative. Its value will vary using the following
two parameters if spread: true was specified;
spread: the spread applied to the mean if spread: true was specified. It can
be omitted in which case the mean won't vary;
dist: the distribution used to vary the mean; it must be specified if spread
is specified.
The actually used self-detect diameter is computed using:
diameter = L(mean, stdev)
Finally, the Growth:
subsection also defines tumor doubling times for
various age groups when using breast cancer simulations and for all ages
when using lung cancer simulations.
Doubling times are computed like the self-detect diameters, i.e., using
millimeters and lognormal distributions. Thus, for each age group and for the
lung cancer simulation four parameters are specified (of which the final two
are optional): the standard deviation of the lognormal distribution, the mean
value of the lognormal distribution, and the spread and name of the
distribution that is used when spread: true was specified.
The age groups (used with breast cancer simulations) must cover ages 0 through
the maximum age for simulated cases, and are specified as described at section
BreastDensities:. The default configuration file contains the following
specifications:
    DoublingTime:
        #                      stdev     mean     spread   dist.
        ageGroup:   1 - 50:   1.84043   79.838   1.53726  Normal
                   50 - 70:   1.29693  157.591   1.1853   Normal
                   70 - * :   1.56831  188.67    1.2586   Normal
        # all ages             stdev     mean     spread   dist.
        lung:                 1.23368   98.4944  2.09594  Normal
Optionally, each ageGroup line may start with an ageGroup: specification,
which is required for the first specification line.
Incidence: (opt.)
For breast cancer simulations three carrier types are supported: Normal, BRCA1
and BRCA2, each having a probability of occurrence. The probabilities of these
carriers must add up to 1. In the default configuration file BRCA1 and BRCA2
are specified, but their probabilities are set to 0, in which case their
specifications can also be removed from configuration files.
Each carrier is identified by name (i.e., when performing breast cancer
simulations Breast:, BRCA1:, and BRCA2:; when performing lung cancer
simulations: Male: and Female:) followed by their parameter specifications.
The lifetime risk, mean age and standard deviation parameters may optionally
be followed by the standard deviation (spread) and distribution used to vary
the probability when spread: true is specified.
The default configuration file contains these specifications:
    Male:
        #              value   spread  distr.
        lifetimeRisk:   .22    .005    Normal
        meanAge:       72.48   1.08    Normal
        stdDev:         9.28   1.62    Normal
    Female:
        #              value   spread  distr.
        lifetimeRisk:   .20    .004    Normal
        meanAge:       69.62   1.49    Normal
        stdDev:         9.73   1.83    Normal
    Breast:
        probability:    1
        #              value   spread  distr.
        lifetimeRisk:   .226   .0053   Normal
        meanAge:       72.9    .552    Normal
        stdDev:        21.1
    BRCA1:
        probability:    0
        #              value   spread  distr.
        lifetimeRisk:   .96
        meanAge:       53.9
        stdDev:        16.51
    BRCA2:
        probability:    0
        #              value   spread  distr.
        lifetimeRisk:   .96
        meanAge:       53.9
        stdDev:        16.51
Instead of specifying the parameters of a distribution, a riskTable can also
be used. If a riskTable is specified at a category (i.e., Male: .. BRCA2:)
then the lifetimeRisk, meanAge, and stdDev parameters are ignored. A riskTable
specification contains pairs of values. The first value of a pair specifies an
age, the second value the probability of a tumor developing until that age.
Both ages and probabilities must be cumulative and at least two pairs must be
specified.
Unless specified in the riskTable specification itself simrisc adds a pair
0, .00 at the beginning, and an age specification 100 at the end. Its
cumulative tumor probability is computed using linear extrapolation of the two
age values before age 100, using a maximum value of 1.00. For ages in between
pairs of age values linear interpolation is used, using the surrounding age
specifications. Here is a (fictitious) example of a riskTable specification in
the Male: category:
    Male:
        riskTable: 40 .01 50 .1 55 .15 60 .22 65 .55 70 .62 75 .67
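The completion and interpolation rules for a riskTable can be illustrated with a short Python sketch. Both functions are hypothetical helpers written for this page; they follow the description above and are not taken from simrisc's sources.

```python
def complete_risk_table(pairs):
    """Add the (0, 0) start pair and the age-100 end pair when missing."""
    table = sorted(pairs)
    if table[0][0] != 0:
        table.insert(0, (0, 0.0))
    if table[-1][0] != 100:
        # linear extrapolation from the last two pairs, capped at 1.00
        (age1, p1), (age2, p2) = table[-2], table[-1]
        slope = (p2 - p1) / (age2 - age1)
        table.append((100, min(1.0, p2 + slope * (100 - age2))))
    return table

def cumulative_risk(table, age):
    """Linearly interpolate the cumulative tumor probability at age."""
    for (age1, p1), (age2, p2) in zip(table, table[1:]):
        if age1 <= age <= age2:
            return p1 + (p2 - p1) * (age - age1) / (age2 - age1)
    raise ValueError("age must lie in the range 0..100")
```

With the fictitious Male: table shown above, the pair added for age 100 receives the probability .67 + 25 * .01 = .92, and the cumulative risk at age 45 is .055.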
Survival: (opt.)
For breast cancer simulations four types of survival parameters must be
specified. Each type (a..d) specifies a mean, and (optionally) a spread and
distribution (which are used when spread: true has been specified). The
default configuration file specifies:
    #          value       spread       dist:
    type:  a   .00004475   .000004392   Normal
           b   1.85867     .0420        Normal
           c   -.271       .0101        Normal
           d   2.0167      .0366        Normal
           e   1.00        .01          Normal
Optionally, each line may start with a type: specification, which is required
for the first specification line.
The e parameters can be used to estimate relative survival if not enough data
are available (and can reflect the quality of care). The value 1 indicates
that enough data are available to estimate survival using the a to d
parameters. If not enough data are available the e parameter can be used to
adjust the survival estimate: it corrects the a to d parameters for the
quality level of the medical care relative to a country for which the a to d
parameters are available. E.g., in Dutch breast cancer research studying the
survival in Indonesia the value 0.9 was used, since combining that factor with
the provided a to d parameter values resulted in the correct survival
probabilities for Indonesia.
When performing breast cancer simulations TNM indices (cf. the description of
the S3 table below) are also determined. With breast cancer simulations the
second TNM value is always 0, and the first TNM value is, as with table S3,
determined by the tumor's diameter. The default configuration file contains
the following bc: specification (see also option --tnm):
    # BC TNM categories thru (<=) diameters (mm):
    bc:      20  50  *
    # TNM:   T1  T2  T3
(cf. https://www.cancerresearchuk.org/about-cancer/breast-cancer/stages-types-grades/tnm-staging).
For lung cancer simulations table S4 is used to determine the a..e parameters.
Table S4 contains four categories (lung0..lung3) defining these a..e
parameters, where (for a known cancer's diameter) the category is randomly
determined using table S3 (see below).
Table S4 is appended to the breast cancer specifications. The default
configuration file contains the following specifications of table S4:
    # table S4: 4 columns per a..d parameter
    # lungX: X is table S4's column index
    lung0:  a    .00143   .00095  Normal
            b   1.84559   .33748  Normal
            c   -.22794   .07823  Normal
            d   1.06799   .16226  Normal
            e   1.00      .01     Normal
    lung1:  a    .01530   .00381  Normal
            b   1.69434   .10979  Normal
            c   -.19358   .02105  Normal
            d    .66690   .03869  Normal
            e   1.00      .01     Normal
    lung2:  a    .78600   .29815  Normal
            b    .69791   .05425  Normal
            c    .0       .0      Normal
            d    .0       .0      Normal
            e   1.00      .01     Normal
    lung3:  a   1.25148   .32305  Normal
            b    .77852   .34149  Normal
            c    .0       .0      Normal
            d    .0       .0      Normal
            e   1.00      .01     Normal
Optionally, each subsequent line following the first lung0: .. lung3: line may
repeat its lung1: .. lung3: label.
S3: (opt.)
With lung cancer simulations tables S3 and S4 are used to determine the
survival parameters. The tumor's diameter determines the row of table S3, and
then its column is randomly determined using the probabilities listed in S3's
rows. For each row the probabilities must sum to 1. Once the S3 column has
been determined the column index determines which of the lungX: specifications
is used. The row and column indices are 0-based. E.g., if a tumor diameter is
24, then row 2 (diameter <= 30) is selected. Then, if the random value is
.630, column 1 is used (column N1-3,M0). Whenever a tumor is present these
pairs of indices are reported in the comma-separated data file in the column
marked as TNM, using an entry like 2,1.
The default configuration file contains the following table S3:
    S3:
        #      diameter (mm)
        # T-row    <=    N0,M0  N1-3,M0  N1-3,M1a-b  N0-3M1c
        prob:      10:   .756    .157      .048       .039    # T1a,b
                   20:   .703    .197      .055       .045    # T1b
                   30:   .559    .267      .095       .078    # T1c
                   50:   .345    .351      .167       .137    # T2a,b
                   70:   .196    .408      .218       .178    # T3
                    *:   .187    .347      .256       .210    # T4
Optionally, each line may start with a prob: specification, which is required
for the first specification line.
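The lookup in table S3 can be illustrated in Python. The names S3_ROWS, s3_row and s3_column are hypothetical. Selecting the column by comparing the random value with the cumulative row probabilities is an assumption of this sketch; it reproduces the example given above (diameter 24, random value .630).

```python
# (upper diameter limit in mm, or None for '*', followed by the probabilities)
S3_ROWS = [
    (10,   [.756, .157, .048, .039]),
    (20,   [.703, .197, .055, .045]),
    (30,   [.559, .267, .095, .078]),
    (50,   [.345, .351, .167, .137]),
    (70,   [.196, .408, .218, .178]),
    (None, [.187, .347, .256, .210]),
]

def s3_row(diameter):
    """Return the 0-based index of the first row whose limit >= diameter."""
    for row, (limit, _) in enumerate(S3_ROWS):
        if limit is None or diameter <= limit:
            return row
    raise ValueError("table S3 has no '*' row")

def s3_column(row, random_value):
    """Return the 0-based column selected by a random value in 0..1."""
    probs = S3_ROWS[row][1]
    cumulative = 0.0
    for column, prob in enumerate(probs):
        cumulative += prob
        if random_value < cumulative:
            return column
    return len(probs) - 1        # guard against rounding in the row's sum
```

The pair (s3_row(24), s3_column(2, .630)) corresponds to the TNM entry 2,1, and column index 1 selects the lung1: specifications of table S4.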
Values generated from Beta distributions range between 0 and 1 (cf. https://en.wikipedia.org/wiki/Beta_distribution). The Beta distribution is computed using two Gamma distributions (cf. https://www.fmrib.ox.ac.uk/datasets/techrep/tr03tb1/tr03tb1/node24.html, https://stats.stackexchange.com/questions/502146/how-does-numpy-generate-samples-from-a-beta-distribution):
    gamma1 = Gamma(aParam, 1)
    Beta(aParam, bParam) = gamma1 / (gamma1 + Gamma(bParam, 1))
When using lung cancer simulations the 95% confidence interval (CI) for male cases ranges from .15 to .70, with a mean value of .32, and for female cases it ranges from .94 to 2.10, with a mean value of 1.40.
The male and female CI ranges are transformed to .025 to .975 ranges using
linear transformations. To transform values x from the male range to the .025
to .975 range the transformation y = 1.72727 * x - .234091 is used, and to
transform back the transformation (y + .234091) / 1.72727 is used. For the
female CI the transformations are y = .818966 * x - .744828 and
(y + .744828) / .818966.
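The following Python sketch combines the Gamma-based Beta sampling with the linear transformations. The constants are those of the default Beir7: specifications. The function names are hypothetical, and the assumption that a drawn Beta value is mapped back to the beta parameter's scale with the inverse transformation is this sketch's reading of the text, not a statement about simrisc's sources.

```python
import random

BETA_SPECS = {   # constant, factor, aParam, bParam of the default Beir7: lines
    "male":   {"constant": .234091, "factor": 1.72727,
               "a": 2.664237, "b": 5.184883},
    "female": {"constant": .744828, "factor": .818966,
               "a": 3.366115, "b": 4.813548},
}

def draw_beta(a_param, b_param, rng):
    """Draw a Beta(a_param, b_param) value using two Gamma variates."""
    gamma1 = rng.gammavariate(a_param, 1)
    return gamma1 / (gamma1 + rng.gammavariate(b_param, 1))

def to_unit_range(x, spec):
    """Map a beta parameter value to the .025 .. .975 scale."""
    return spec["factor"] * x - spec["constant"]

def from_unit_range(y, spec):
    """Map a value on the 0 .. 1 scale back to the beta parameter's scale."""
    return (y + spec["constant"]) / spec["factor"]

def draw_beir7_beta(sex, rng):
    """Draw a varied beta parameter for 'male' or 'female' cases."""
    spec = BETA_SPECS[sex]
    return from_unit_range(draw_beta(spec["a"], spec["b"], rng), spec)
```

As a check of the transformations: to_unit_range(.15, BETA_SPECS["male"]) is approximately .025 and to_unit_range(.70, BETA_SPECS["male"]) is approximately .975.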
The aParam and bParam values are determined by first generating 1000 values so
that their CI spans the range .025 to .975, with a mean value of 0.318635 (for
male cases) and 0.401724 (for female cases). Next the parameters of the
corresponding beta distribution were estimated using maximum likelihood
fitting, resulting in aParam = 2.664237 and bParam = 5.184883 for the
distribution used with male lung cancer simulations, and aParam = 3.366115 and
bParam = 4.813548 for the distribution used with female lung cancer
simulations.
The default configuration file shows these values at the Beir7 beta parameters.
Parameters can be respecified by defining a separate parameter configuration
file, by providing alternative parameter specifications in Analysis: sections
of the program's input file, or by providing alternative parameter
specifications as command-line arguments (cf. the simrisc(1) man-page).
Configuration files
~/.config/simrisc: the default location of the program's configuration file;
simrisc-VERSION/stdconfig/simrisc, where VERSION is replaced by simrisc's
actual release version;
when installed from packages (e.g., .deb files) the default configuration file
is commonly available as /usr/share/doc/simrisc/simrisc.gz
simrisc(1)
Versions before version 15.03.00 should not be used for lung cancer simulations.