API Reference

kim.pre_analysis Modules

pairwise_analysis(xdata: Array, ydata: Array, metric_calculator: MetricBase, sst: bool = False, ntest: int = 100, alpha: float = 0.05, n_jobs: int = -1, seed_shuffle: int = 1234, verbose: int = 0)[source]#

Perform the pairwise analysis using either mutual information or correlation coefficient.

Parameters:
  • xdata (array-like) – the predictors with shape (Ns, Nx)

  • ydata (array-like) – the predictands with shape (Ns, Ny)

  • metric_calculator (class) – the metric calculator

  • sst (bool) – whether to perform statistical significance test. Defaults to False.

  • ntest (int) – number of shuffled samples in sst. Defaults to 100.

  • alpha (float) – the significance level. Defaults to 0.05.

  • n_jobs (int) – the number of processors/threads used by joblib. Defaults to -1.

  • seed_shuffle (int) – the random seed number for doing shuffle test. Defaults to 1234.

  • verbose (int) – the verbosity level (0: normal, 1: debug). Defaults to 0.

Returns:

the sensitivity, the sensitivity mask

Return type:

(array, array)
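The shuffle test behind sst, ntest, and alpha can be illustrated with a minimal, standard-library sketch. It is not the package's implementation: it assumes the Pearson correlation as the metric and an empirical (1 - alpha) quantile of the shuffled scores as the significance threshold, and the helper names pearson and pairwise_shuffle_test are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def pairwise_shuffle_test(xdata, ydata, ntest=100, alpha=0.05, seed_shuffle=1234):
    """Return (sensitivity, mask) as nested lists with shape (Nx, Ny).

    Hypothetical helper, not part of kim: the threshold rule is an assumption.
    """
    rng = random.Random(seed_shuffle)
    nx, ny = len(xdata[0]), len(ydata[0])
    sensitivity = [[0.0] * ny for _ in range(nx)]
    mask = [[False] * ny for _ in range(nx)]
    for i in range(nx):
        xi = [row[i] for row in xdata]
        for j in range(ny):
            yj = [row[j] for row in ydata]
            score = abs(pearson(xi, yj))
            # Null distribution: break the x-y pairing by shuffling x.
            null = []
            for _ in range(ntest):
                shuffled = xi[:]
                rng.shuffle(shuffled)
                null.append(abs(pearson(shuffled, yj)))
            null.sort()
            # Empirical (1 - alpha) quantile of the shuffled scores.
            threshold = null[min(ntest - 1, math.ceil((1 - alpha) * ntest) - 1)]
            sensitivity[i][j] = score
            mask[i][j] = score > threshold
    return sensitivity, mask


# Synthetic data with Ns = 200, Nx = 2, Ny = 1: y depends on the first predictor only.
data_rng = random.Random(0)
xdata = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
ydata = [[2.0 * x0 + 0.1 * data_rng.gauss(0, 1)] for x0, _ in xdata]
sensitivity, mask = pairwise_shuffle_test(xdata, ydata)
```

In this sketch the mask entry for a predictor is True only when its score exceeds the threshold built from the shuffled samples, so a larger ntest gives a finer-grained threshold at a higher cost.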

pc(xdata: Array, ydata: Array, metric_calculator: MetricBase, cond_metric_calculator: MetricBase, ntest: int = 100, alpha: float = 0.05, Ncond_max: int = 3, n_jobs: int = -1, seed_shuffle: int = 1234, verbose: int = 0)[source]#

The modified PC algorithm adapted to the X → Y mapping problem.

Parameters:
  • xdata (array-like) – the predictors with shape (Ns, Nx)

  • ydata (array-like) – the predictands with shape (Ns, Ny)

  • metric_calculator (class) – the metric calculator for unconditional relation

  • cond_metric_calculator (class) – the metric calculator for conditional relation

  • ntest (int) – number of shuffled samples in sst. Defaults to 100.

  • alpha (float) – the significance level. Defaults to 0.05.

  • Ncond_max (int) – the maximum number of conditions used by cond_metric_calculator. Defaults to 3.

  • n_jobs (int) – the number of processors/threads used by joblib. Defaults to -1.

  • seed_shuffle (int) – the random seed number for doing shuffle test. Defaults to 1234.

  • verbose (int) – the verbosity level (0: normal, 1: debug). Defaults to 0.

Returns:

the sensitivity, the sensitivity mask, the conditional sensitivity mask

Return type:

(array, array, array)
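To see why a conditional test can remove predictors that pass the pairwise test, consider the following sketch. It swaps in the partial correlation with a single conditioning variable as a stand-in for cond_metric_calculator, which is an assumption made for illustration; the functions pearson and partial_corr are hypothetical and not part of kim.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = math.sqrt((1.0 - rxz ** 2) * (1.0 - ryz ** 2))
    return (rxy - rxz * ryz) / denom if denom > 0.0 else 0.0


rng = random.Random(0)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
# x2 is a noisy copy of x1, while y depends on x1 only.
x2 = [a + 0.5 * rng.gauss(0, 1) for a in x1]
y = [a + 0.5 * rng.gauss(0, 1) for a in x1]

unconditional = pearson(x2, y)         # large: x2 looks informative on its own
conditional = partial_corr(x2, y, x1)  # close to zero once x1 is accounted for
```

Under these assumptions, x2 would be kept by a pairwise mask but dropped by a conditional one, which is the kind of difference the two returned masks are meant to capture.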

kim.data.Data Class

class Data(xdata: Array | None = None, ydata: Array | None = None, fdata: PosixPath | None = None, xscaler_type: str = '', yscaler_type: str = '')[source]#

The Data object.

xdata#

the copy of xdata

Type:

array-like

ydata#

the copy of ydata

Type:

array-like

Ns#

the number of samples

Type:

int

Nx#

the number of predictors

Type:

int

Ny#

the number of predictands

Type:

int

xscaler_type#

the type of xdata scaler, either ‘minmax’, ‘normalize’, ‘standard’, or ‘log’

Type:

str

yscaler_type#

the type of ydata scaler, either ‘minmax’, ‘normalize’, ‘standard’, or ‘log’

Type:

str

xscaler#

the xdata scaler

Type:

str

yscaler#

the ydata scaler

Type:

str

sensitivity_config#

the sensitivity analysis configuration

Type:

dict

sensitivity_done#

whether the sensitivity analysis is performed

Type:

bool

sensitivity#

the calculated sensitivity with shape (Nx, Ny)

Type:

array-like

sensitivity_mask#

the calculated sensitivity mask with shape (Nx, Ny)

Type:

array-like

cond_sensitivity_mask#

the calculated conditional sensitivity mask with shape (Nx, Ny)

Type:

array-like

__init__(xdata: Array | None = None, ydata: Array | None = None, fdata: PosixPath | None = None, xscaler_type: str = '', yscaler_type: str = '')[source]#

Initialization function.

Parameters:
  • xdata (array-like) – the predictors with shape (Ns, Nx)

  • fdata (PosixPath) – the root path where an existing data instance will be loaded

  • ydata (array-like) – the predictands with shape (Ns, Ny)

  • xscaler_type (str) – the type of xdata scaler, either minmax, normalize, standard, log, or '' (the empty string, which is the default)

  • yscaler_type (str) – the type of ydata scaler, either minmax, normalize, standard, log, or '' (the empty string, which is the default)

calculate_sensitivity(method: str = 'gsa', metric: str = 'it-bins', sst: bool = False, ntest: int = 100, alpha: float = 0.05, bins: int = 10, k: int = 5, n_jobs=-1, seed_shuffle: int = 1234, verbose: int = 0)[source]#
Calculate the sensitivity between self.xdata and self.ydata using either the pairwise_analysis or the pc method.

The results are updated in self.sensitivity_done, self.sensitivity, self.sensitivity_mask, and self.cond_sensitivity_mask.

Parameters:
  • method (str) – The preliminary analysis method. Defaults to gsa. Options:
      • gsa: the pairwise global sensitivity analysis
      • pc: a modified PC algorithm that includes conditional independence tests after gsa

  • metric (str) – The metric used to calculate the sensitivity. Defaults to it-bins. Options:
      • it-bins: the information-theoretic measures (MI and CMI) using a binning approach
      • it-knn: the information-theoretic measures (MI and CMI) using a k-nearest-neighbor (knn) approach
      • corr: the correlation coefficient

  • sst (bool) – Whether to perform the statistical significance test or the shuffle test. Defaults to False.

  • ntest (int) – The number of shuffled samples in sst. Defaults to 100.

  • alpha (float) – The significance level. Defaults to 0.05.

  • bins (int) – The number of bins for each dimension when metric == “it-bins”. Defaults to 10.

  • k (int) – The number of nearest neighbors when metric == “it-knn”. Defaults to 5.

  • n_jobs (int) – The number of processors/threads used by joblib.Parallel. Defaults to -1.

  • seed_shuffle (int) – The random seed number for doing shuffle test. Defaults to 1234.

  • verbose (int) – The verbosity level (0: normal, 1: debug). Defaults to 0.
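The role of bins in the it-bins metric can be sketched with a plug-in mutual information estimate from an equal-width 2D histogram. This is an illustrative sketch under stated assumptions (equal-width bins, natural logarithm, no bias correction) and may differ from the estimator kim uses; binned_mutual_information is a hypothetical name.

```python
import math
import random


def binned_mutual_information(x, y, bins=10):
    """Mutual information (in nats) estimated from an equal-width 2D histogram."""

    def bin_index(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant input
        # Clamp so that the maximum value falls into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = bin_index(x), bin_index(y)
    n = len(x)
    joint, px, py = {}, [0] * bins, [0] * bins
    for i, j in zip(bx, by):
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    mi = 0.0
    for (i, j), count in joint.items():
        # p(i, j) * log(p(i, j) / (p(i) * p(j))), written with raw counts.
        mi += (count / n) * math.log(count * n / (px[i] * py[j]))
    return mi


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
y_dependent = [v + 0.1 * rng.gauss(0, 1) for v in x]
y_independent = [rng.gauss(0, 1) for _ in range(2000)]

mi_dependent = binned_mutual_information(x, y_dependent, bins=10)
mi_independent = binned_mutual_information(x, y_independent, bins=10)
```

A plug-in estimate like this one is biased upward for independent variables, and the bias grows with the number of bins, which is one reason a shuffle test (sst=True) is useful alongside the raw sensitivity values.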

load(rootpath: PosixPath = PosixPath('.'), check_xy: bool = True, overwrite: bool = False)[source]#
Load data and sensitivity analysis results from specified location, including:
  • data (x, y) and scaler

  • sensitivity analysis configuration

  • sensitivity analysis results

Parameters:

rootpath (PosixPath) – the root path where data will be loaded

save(rootpath: PosixPath = PosixPath('.'))[source]#
Save data and sensitivity analysis results to specified location, including:
  • data (x, y) and scaler

  • sensitivity analysis configuration

  • sensitivity analysis results

Parameters:

rootpath (PosixPath) – the root path where data will be saved

property xdata_scaled#

Perform normalization on self.xdata based on the given normalization type self.xscaler_type.

Returns:

the scaled self.xdata

Return type:

array-like

property ydata_scaled#

Perform normalization on self.ydata based on the given normalization type self.yscaler_type.

Returns:

the scaled self.ydata

Return type:

array-like
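The column-wise scaling performed by xdata_scaled and ydata_scaled can be pictured with the sketch below. It covers only the minmax, standard, and log types using their conventional definitions; the normalize type is left out because its definition is not stated here, and treating '' as "no scaling" is an assumption. All function names are hypothetical.

```python
import math


def minmax_scale(column):
    """Map a column linearly onto [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def standard_scale(column):
    """Shift a column to zero mean and scale it to unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std if std else 0.0 for v in column]


def log_scale(column):
    """Natural logarithm of each (strictly positive) value."""
    return [math.log(v) for v in column]


SCALERS = {"minmax": minmax_scale, "standard": standard_scale, "log": log_scale}


def scale_columns(data, scaler_type):
    """Scale each column of a (Ns, Nx) nested list.

    Assumption: an empty scaler_type returns an unscaled copy.
    """
    if scaler_type == "":
        return [row[:] for row in data]
    scale = SCALERS[scaler_type]
    columns = [scale(list(col)) for col in zip(*data)]
    return [list(row) for row in zip(*columns)]


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled_minmax = scale_columns(data, "minmax")
scaled_standard = scale_columns(data, "standard")
scaled_log = scale_columns(data, "log")
unscaled = scale_columns(data, "")
```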

kim.map.KIM Class

class KIM(data: Data, map_configs: dict, mask_option: str = 'cond_sensitivity', map_option: str = 'many2one', other_mask: Array | None = None, name: str = 'kim')[source]#

The class for knowledge-informed mapping training, prediction, saving and loading.

Attributes:
  • data (Data) – the copy of the __init__ argument

  • map_configs (dict) – the copy of the __init__ argument

  • map_option (str) – the copy of the __init__ argument

  • mask_option (str) – the copy of the __init__ argument

  • trained (bool) – whether KIM has been trained

  • loaded_from_other_sources (bool) – whether KIM is loaded from other sources

  • Ns (int) – the number of ensemble members (from data.Ns)

  • Nx (int) – the number of input features (from data.Nx)

  • Ny (int) – the number of output features (from data.Ny)

  • mask (Array) – the masked array with shape (Nx, Ny)

  • _n_maps (int) – the number of maps

  • _maps (int) – the trained maps

__init__(data: Data, map_configs: dict, mask_option: str = 'cond_sensitivity', map_option: str = 'many2one', other_mask: Array | None = None, name: str = 'kim')[source]#

Initialization function.

Parameters:
  • data (Data) – the Data object containing the ensemble data and sensitivity analysis result.

  • map_configs (dict) – the mapping configuration, including all the arguments of Map class except x and y.

  • mask_option (str) – the masking option including “sensitivity” (using data.sensitivity_mask), and “cond_sensitivity” (using data.cond_sensitivity_mask).

  • map_option (str) – the map option including “many2one”: knowledge-informed mapping using sensitivity analysis result as filter, and “many2many”: normal mapping without being knowledge-informed

  • other_mask (List) – the additional mask to be assigned to self.mask with size Nx. Defaults to None.

  • name (str) – the name of the KIM object
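One way to read the many2one option is that each output gets its own map, trained only on the inputs that the (Nx, Ny) mask marks as relevant for it. The sketch below illustrates that reading with plain nested lists; it is an assumption about how the mask acts as a filter, not a description of KIM internals, and both helper functions are hypothetical.

```python
def inputs_per_output(mask):
    """For a boolean mask with shape (Nx, Ny), list the selected input indices per output."""
    nx, ny = len(mask), len(mask[0])
    return [[i for i in range(nx) if mask[i][j]] for j in range(ny)]


def select_columns(x, columns):
    """Keep only the given input columns of a (Ns, Nx) nested list."""
    return [[row[i] for i in columns] for row in x]


# Mask with Nx = 3 inputs and Ny = 2 outputs.
mask = [
    [True, False],
    [False, True],
    [True, True],
]
selected = inputs_per_output(mask)

# Inputs with Ns = 2 samples, filtered for the first output.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x_for_output_0 = select_columns(x, selected[0])
```

Under the same reading, many2many would correspond to a single map that sees all Nx inputs and predicts all Ny outputs at once.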

evaluate_maps_on_givendata()[source]#

Perform predictions on the given dataset

load(rootpath: PosixPath = PosixPath('.'))[source]#

Load the trained KIM from specified location.

Parameters:

rootpath (PosixPath) – the root path where KIM will be loaded

property maps#
property n_maps#
predict(x: Array | None = None)[source]#

Prediction using the trained KIM.

Parameters:

x (Array) – predictors with shape (Ns,…,Nx)

save(rootpath: PosixPath = PosixPath('.'))[source]#
Save the KIM, including:
  • the data object

  • all the mappings

  • the remaining configurations

Parameters:

rootpath (PosixPath) – the root path where data will be saved

train(verbose: int = 0)[source]#

kim.map.Map Class

class Map(x: ~jax.Array, y: ~jax.Array, model_type: type = <class 'kim.mapping_model.mlp.MLP'>, n_model: int = 0, ensemble_type: str = 'single', model_hp_choices: dict = {}, model_hp_fixed: dict = {}, optax_hp_choices: dict = {}, optax_hp_fixed: dict = {}, dl_hp_choices: dict = {}, dl_hp_fixed: dict = {}, training_parallel: bool = True, ens_seed: int | None = None, parallel_config: dict | None = None, device: ~jaxlib._jax.Device | None = None)[source]#

The class for training, prediction, saving, and loading of a single mapping. Ensemble training is supported either serially or in parallel, using joblib.

x#

the copy of the __init__ argument

Type:

array_like

y#

the copy of the __init__ argument

Type:

array_like

n_model#

the copy of the __init__ argument

Type:

int

training_parallel#

the copy of the __init__ argument

Type:

bool

model_type#

the copy of the __init__ argument

Type:

type

ensemble_type#

the copy of the __init__ argument

Type:

str

model_hp_choices#

the copy of the __init__ argument

Type:

dict

model_hp_fixed#

the copy of the __init__ argument

Type:

dict

optax_hp_choices#

the copy of the __init__ argument

Type:

dict

optax_hp_fixed#

the copy of the __init__ argument

Type:

dict

dl_hp_choices#

the copy of the __init__ argument

Type:

dict

dl_hp_fixed#

the copy of the __init__ argument

Type:

dict

ens_seed#

the copy of the __init__ argument

Type:

Optional[int], optional

parallel_config#

the copy of the __init__ argument

Type:

Optional[dict], optional

device#

the copy of the __init__ argument

Type:

Optional[Device], optional

trained#

whether the mapping has been trained

Type:

bool

loaded_from_other_sources#

whether the mapping is loaded from other sources.

Type:

bool

Ns#

number of samples

Type:

int

Nx#

number of input features

Type:

int

Ny#

number of output features

Type:

int

model_configs#

model hyperparameters for all ensemble models

Type:

list

optax_configs#

optimizer hyperparameters for all ensemble models

Type:

list

dl_configs#

dataloader hyperparameters for all ensemble models

Type:

list

model_ens#

list of trained model ensemble

Type:

list

loss_train_ens#

list of the training losses over steps

Type:

list

loss_val_ens#

list of the validation losses over steps

Type:

list

__init__(x: ~jax.Array, y: ~jax.Array, model_type: type = <class 'kim.mapping_model.mlp.MLP'>, n_model: int = 0, ensemble_type: str = 'single', model_hp_choices: dict = {}, model_hp_fixed: dict = {}, optax_hp_choices: dict = {}, optax_hp_fixed: dict = {}, dl_hp_choices: dict = {}, dl_hp_fixed: dict = {}, training_parallel: bool = True, ens_seed: int | None = None, parallel_config: dict | None = None, device: ~jaxlib._jax.Device | None = None)[source]#

Initialization function.

Parameters:
  • x (array-like) – the predictors with shape (Ns, Nx)

  • y (array-like) – the predictands with shape (Ns, Ny)

  • model_type (type) – the equinox model class

  • n_model (int) – the number of ensemble models

  • ensemble_type (str) – the ensemble type, either ‘single’, ‘serial’ or ‘parallel’.

  • model_hp_choices (dict) – the tunable model hyperparameters, in dictionary format {key: [value1, value2,…]}. The model hyperparameters must follow the arguments of the specified model_type

  • model_hp_fixed (dict) – the fixed model hyperparameters, in dictionary format {key: value}. The model hyperparameters must follow the arguments of the specified model_type

  • optax_hp_choices (dict) – the tunable optimizer hyperparameters, in dictionary format {key: [value1, value2,…]}. The optimizer hyperparameters must follow the arguments of the specified optax optimizer. Key hyperparameters: ‘optimizer_type’ (str), ‘nsteps’ (int), and ‘loss_func’ (callable)

  • optax_hp_fixed (dict) – the fixed optimizer hyperparameters, in dictionary format {key: value}. The optimizer hyperparameters must follow the arguments of the specified optax optimizer. Key hyperparameters: ‘optimizer_type’ (str), ‘nsteps’ (int), and ‘loss_func’ (callable)

  • dl_hp_choices (dict) – the tunable dataloader hyperparameters, in dictionary format {key: [value1, value2,…]}. The dataloader hyperparameters must follow the arguments of make_big_data_loader. Key hyperparameters: ‘batch_size’ (int) and ‘num_train_sample’ (int)

  • dl_hp_fixed (dict) – the fixed dataloader hyperparameters, in dictionary format {key: value}. The dataloader hyperparameters must follow the arguments of make_big_data_loader. Key hyperparameters: ‘batch_size’ (int) and ‘num_train_sample’ (int)

  • training_parallel (bool) – whether to perform parallel training

  • ens_seed (Optional[int], optional) – the random seed for generating ensemble configurations.

  • parallel_config (Optional[dict], optional) – the parallel training configurations following the arguments of joblib.Parallel

  • device (Optional[Device], optional) – the computing device to be set
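The split between the *_hp_choices and *_hp_fixed dictionaries can be illustrated by drawing one configuration per ensemble member. The sketch assumes that tunable values are sampled independently and uniformly with a seeded generator, which is only a guess at what ens_seed controls; draw_configs and the hyperparameter keys depth, width_size, and activation are hypothetical.

```python
import random


def draw_configs(hp_choices, hp_fixed, n_model, ens_seed=None):
    """Build n_model configurations: fixed values plus one sampled value per tunable key."""
    rng = random.Random(ens_seed)
    configs = []
    for _ in range(n_model):
        config = dict(hp_fixed)  # start from the fixed hyperparameters
        for key, values in hp_choices.items():
            config[key] = rng.choice(values)  # pick one candidate per tunable key
        configs.append(config)
    return configs


# Hypothetical hyperparameter names, chosen for illustration only.
model_hp_choices = {"depth": [1, 2, 3], "width_size": [16, 32]}
model_hp_fixed = {"activation": "relu"}

configs = draw_configs(model_hp_choices, model_hp_fixed, n_model=4, ens_seed=0)
configs_again = draw_configs(model_hp_choices, model_hp_fixed, n_model=4, ens_seed=0)
```

With a fixed seed the two calls produce the same list, which is the reproducibility property a seed such as ens_seed is normally expected to provide.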

property dl_configs#
load(rootpath: PosixPath = PosixPath('.'))[source]#

Load the trained mapping from specified location.

Parameters:

rootpath (PosixPath) – the root path where mappings will be loaded

property loss_train_ens#
property loss_val_ens#
property model_configs#
property model_ens#
property n_model#
property optax_configs#
predict(x: Array)[source]#

Prediction using the trained mapping.

Parameters:

x (Array) – predictors with shape (Ns,…,Nx)

save(rootpath: PosixPath = PosixPath('.'))[source]#
Save the trained mapping to specified location, including:
  • trained models

  • model/optax/dl configurations

  • loss values for both training and validation sets

Parameters:

rootpath (PosixPath) – the root path where mappings will be saved

train(verbose: int = 0)[source]#

Mapping training.

Parameters:

verbose (int) – the verbosity level (0: normal, 1: debug)