Configuring KIM#
The configurations are written as Python dictionaries and passed into the related Python classes.
Configuring data class#
Below are the parameters used to configure and initialize the `Data` class:
data_params = {
    "xscaler_type": "minmax",  # scaler for the x (input) data
    "yscaler_type": "minmax",  # scaler for the y (output) data
}
`xscaler_type` (str, default=''): The type of x data scaler, either `minmax`, `normalize`, `standard`, `log`, or `''` (the default; no scaling).

`yscaler_type` (str, default=''): The type of y data scaler, either `minmax`, `normalize`, `standard`, `log`, or `''` (the default; no scaling).
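To illustrate what two of the scaler options do, the sketch below gives the standard definitions of min-max and z-score (standard) scaling. These are the textbook formulas, not necessarily KIM's exact implementation:

```python
def minmax_scale(xs):
    """Min-max scaling: map values linearly onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standard_scale(xs):
    """Z-score scaling: zero mean, unit (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / std for x in xs]
```

Scaling the inputs and outputs this way puts all variables on comparable ranges before the sensitivity analysis and mapping steps.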
Configuring preliminary analysis#
Below are the parameters that configure the preliminary analysis; they are passed into the `Data.calculate_sensitivity` method:
sensitivity_params = {
    "method": "pc",
    "metric": "it-knn",
    "sst": True,
    "ntest": 100,
    "alpha": 0.05,
    "bins": 10,
    "k": 3,
    "n_jobs": 100,
    "seed_shuffle": 1234,
    "verbose": 1,
}
`method` (str, default='gsa'): The preliminary analysis method, including:
- `gsa`: the pairwise global sensitivity analysis;
- `pc`: a modified PC algorithm that includes a conditional independence test for redundancy checking/filtering after `gsa`.

`metric` (str, default='it-bins'): The metric used to calculate the sensitivity, including:
- `it-bins`: the information-theoretic measures (MI and CMI) using the binning approach;
- `it-knn`: the information-theoretic measures (MI and CMI) using the k-nearest-neighbor approach;
- `corr`: the correlation coefficient.
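The `it-bins` metric rests on a histogram (plug-in) estimate of mutual information. The sketch below shows that idea for two variables using equal-width bins; it is the textbook estimator, not KIM's internal code:

```python
import math
from collections import Counter

def binned_mi(x, y, bins=10):
    """Plug-in mutual-information estimate (in nats) from a 2-D histogram."""
    def to_bin(v, lo, hi):
        # Map v into bin index 0..bins-1; the maximum value falls in the last bin.
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    bx = [to_bin(v, min(x), max(x)) for v in x]
    by = [to_bin(v, min(y), max(y)) for v in y]
    n = len(x)
    pxy = Counter(zip(bx, by))           # joint bin counts
    px, py = Counter(bx), Counter(by)    # marginal bin counts
    # MI = sum p(i,j) * log( p(i,j) / (p(i) p(j)) ) over occupied bins
    return sum(c / n * math.log((c / n) / (px[i] / n * py[j] / n))
               for (i, j), c in pxy.items())
```

For a perfectly dependent pair the estimate approaches `log(bins)`; for independent variables it approaches 0 as the sample grows. The `bins` parameter above plays the same role as the `bins` entry in `sensitivity_params`.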
`sst` (bool, default=False): Whether to perform the statistical significance test (the shuffle test).

`ntest` (int, default=100): The number of shuffled samples in `sst`.

`alpha` (float, default=0.05): The significance level used in the shuffle test.

`bins` (int, default=10): The number of bins for each dimension when `"metric" == "it-bins"`.

`k` (int, default=3): The number of nearest neighbors when `"metric" == "it-knn"`.

`n_jobs` (int, default=-1): The number of processors/threads used by `joblib.Parallel`.

`seed_shuffle` (int, default=1234): The random seed for the shuffle test.

`verbose` (int, default=0): The verbosity level (0: normal; 1: debug).
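The shuffle test governed by `sst`, `ntest`, `alpha`, and `seed_shuffle` follows a generic permutation-test recipe: shuffle one variable to break any dependence, recompute the statistic, and check how often the shuffled statistic reaches the observed one. The sketch below uses a plain correlation statistic for illustration; KIM applies whichever `metric` is configured:

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def shuffle_test(x, y, ntest=100, alpha=0.05, seed=1234):
    """Permutation test: is the observed |correlation| significant at level alpha?"""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    ys = list(y)
    exceed = 0
    for _ in range(ntest):
        rng.shuffle(ys)  # break the x-y pairing
        if abs(pearson(x, ys)) >= observed:
            exceed += 1
    # Empirical p-value = fraction of shuffles at least as extreme as observed.
    return exceed / ntest < alpha

x_demo = [float(i) for i in range(50)]
significant = shuffle_test(x_demo, [2.0 * v + 1.0 for v in x_demo])
```

A strongly dependent pair, as in the demo above, passes the test; pure noise typically does not, because many shuffled statistics match or exceed the observed one.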
Configuring ensemble learning#
Below are the parameters that configure the ensemble learning; they are passed in when initializing a `KIM` class instance:
kim_configs = {  # named kim_configs here to avoid shadowing the inner "map_configs"
    "map_option": "many2one",
    "mask_option": "cond_sensitivity",
    "map_configs": map_configs,  # the mapping configurations, defined below
}
`map_option` (str, default='many2one'): The option selecting the type of mapping, including:
- `many2one`: the knowledge-informed mapping using the preliminary analysis result;
- `many2many`: the original mapping without being knowledge-informed.

`mask_option` (str, default='cond_sensitivity'): The option of which preliminary analysis result is used to mask the critical inputs used to estimate the outputs, including:
- `sensitivity`: using the global sensitivity analysis result, \(\mathbf{X}^{S_1}_j\) (using `Data.sensitivity_mask`);
- `cond_sensitivity`: using both the global sensitivity analysis and the redundancy filtering, \(\mathbf{X}^{S}_j\) (using `Data.cond_sensitivity_mask`).
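Conceptually, both mask options boil down to a boolean mask that selects, for each output, the subset of inputs used to estimate it. The sketch below uses hypothetical mask values for illustration; in KIM the mask itself would come from `Data.sensitivity_mask` or `Data.cond_sensitivity_mask`:

```python
def masked_inputs(x_sample, mask_row):
    """Keep only the inputs flagged as sensitive for one output."""
    return [v for v, keep in zip(x_sample, mask_row) if keep]

# Hypothetical mask for 2 outputs over 4 inputs (rows: outputs, columns: inputs).
cond_sensitivity_mask = [
    [True, False, True, False],   # output 0 depends on inputs 0 and 2
    [False, True, True, True],    # output 1 depends on inputs 1, 2, and 3
]

x = [0.1, 0.2, 0.3, 0.4]
per_output = [masked_inputs(x, row) for row in cond_sensitivity_mask]
```

Under `many2one`, each output gets its own reduced input set like `per_output[j]`; under `many2many`, every output is estimated from all inputs.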
`map_configs` (dict): The configurations for the mapping, including all the arguments of the `kim.Map` class except `x` and `y`. See below:
map_configs = {
    "model_type": MLP,
    "n_model": 100,
    "ensemble_type": "ens_random",
    "model_hp_choices": {
        "depth": [1, 3, 5, 6],
        "width_size": [3, 6, 10],
    },
    "model_hp_fixed": {
        "hidden_activation": "sigmoid",
        "final_activation": "leaky_relu",
        "model_seed": 100,
    },
    "optax_hp_choices": {
        "learning_rate": [0.01, 0.005, 0.003],
    },
    "optax_hp_fixed": {
        "nsteps": 300,
        "optimizer_type": "adam",
    },
    "dl_hp_choices": {},
    "dl_hp_fixed": {
        "dl_seed": 10,
        "num_train_sample": 400,
        "num_val_sample": 50,
        "batch_size": 64,
    },
    "ens_seed": 1024,
    "training_parallel": True,
    "parallel_config": {
        "n_jobs": 2,
        "backend": "loky",
        "verbose": 1,
    },
}
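To make the `ens_random` and `ens_grid` options concrete, here is a minimal sketch of how model configurations could be drawn from the `*_hp_choices` dictionaries. This is illustrative only, not KIM's internal code:

```python
import itertools
import random

def grid_configs(hp_choices):
    """ens_grid-style: every combination of the listed hyperparameter values."""
    keys = sorted(hp_choices)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(hp_choices[k] for k in keys))]

def random_configs(hp_choices, n_model, ens_seed=100):
    """ens_random-style: n_model combinations drawn at random (with a seed)."""
    rng = random.Random(ens_seed)
    keys = sorted(hp_choices)
    return [{k: rng.choice(hp_choices[k]) for k in keys} for _ in range(n_model)]

choices = {"depth": [1, 3, 5, 6], "width_size": [3, 6, 10]}
grid = grid_configs(choices)                  # 4 * 3 = 12 configurations
rand = random_configs(choices, n_model=5)     # 5 randomly drawn configurations
```

With `ens_grid` the ensemble size is fixed by the product of the choice lists, whereas `ens_random` trains exactly `n_model` models sampled with `ens_seed`.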
`model_type` (type, default=`kim.mapping_model.MLP`): The type of the mapping, an `equinox.Module` class.

`n_model` (int, default=1): The number of ensemble models or mappings.

`ensemble_type` (str, default='single'): The type of ensemble learning, including:
- `single`: no ensemble; only one neural network is trained;
- `ens_random`: generate the ensemble by performing a randomized search over the hyperparameter configurations defined in `model_hp_choices`, `optax_hp_choices`, and `dl_hp_choices`;
- `ens_grid`: generate the ensemble by performing a grid search over the hyperparameter configurations defined in `model_hp_choices`, `optax_hp_choices`, and `dl_hp_choices`.

`ens_seed` (int, default=100): The random seed for generating ensemble configurations when `ensemble_type` is set to `ens_random`.

`training_parallel` (bool, default=True): Whether to perform parallel training.

`parallel_config` (dict, default=None): The parallel training configurations, following the arguments of `joblib.Parallel`.

`model_hp_choices` (dict, default={}): The tunable model hyperparameters, in dictionary format `{key: [value1, value2, ...]}`. The model hyperparameters must follow the arguments of the specified `model_type`.

`model_hp_fixed` (dict, default={}): The fixed model hyperparameters, in dictionary format `{key: value}`. The model hyperparameters must follow the arguments of the specified `model_type`.

`optax_hp_choices` (dict, default={}): The tunable optimizer hyperparameters, in dictionary format `{key: [value1, value2, ...]}`. The optimizer hyperparameters must follow the arguments of the specified `optax` optimizer. Hyperparameters that must be provided are `optimizer_type` (str), `nsteps` (int), and `loss_func` (callable), unless they are provided in `optax_hp_fixed`.

`optax_hp_fixed` (dict, default={}): The fixed optimizer hyperparameters, in dictionary format `{key: value}`. The optimizer hyperparameters must follow the arguments of the specified `optax` optimizer. Hyperparameters that must be provided are `optimizer_type` (str), `nsteps` (int), and `loss_func` (callable), unless they are provided in `optax_hp_choices`.

`dl_hp_choices` (dict, default={}): The tunable dataloader hyperparameters, in dictionary format `{key: [value1, value2, ...]}`. The dataloader hyperparameters must follow the arguments of `kim.mapping_model.dataloader_torch.make_big_data_loader`. Hyperparameters that must be provided are `batch_size` (int) and `num_train_sample` (int), unless they are provided in `dl_hp_fixed`.

`dl_hp_fixed` (dict, default={}): The fixed dataloader hyperparameters, in dictionary format `{key: value}`. The dataloader hyperparameters must follow the arguments of `kim.mapping_model.dataloader_torch.make_big_data_loader`. Hyperparameters that must be provided are `batch_size` (int) and `num_train_sample` (int), unless they are provided in `dl_hp_choices`.