This notebook shows how to use KIM to perform inverse modeling for a cloud chamber model.


In [1]:
# Libraries
from pathlib import Path
import pandas as pd

from kim.map import KIM
from kim.data import Data
from kim.mapping_model import MLP

%load_ext autoreload
%autoreload 2


# Read the data
- The `Output.csv` file includes the ensemble of the two parameters to be estimated, represented by $\mathbf{Y}$.
- The `Input_np_holodec.csv` file includes the ensemble of the simulations generated by the cloud chamber model, represented by $\mathbf{X}$.

In [2]:
# File and folder paths
f_para = Path("./data-holodec-PoissonPertb/Output.csv")
f_state = Path("./data-holodec-PoissonPertb/Input_np_holodec.csv")


In [3]:
# Read the data
df_para, df_state = pd.read_csv(f_para), pd.read_csv(f_state)
y_keys, x_keys = df_para.keys(), df_state.keys()
y, x = df_para.values, df_state.values
x.shape, y.shape

((513, 1458), (513, 2))

# Configurations

## Preliminary analysis configuration

In [4]:
# The random seed used in the statistical significance test
seed_shuffle = 1234

# The folder where the data analysis results will be saved
f_data_save = Path("./results-holodec/data")


In [5]:
# Data configuration
data_params = {
    "xscaler_type": "minmax",  # scaler for the x (input) data
    "yscaler_type": "minmax",  # scaler for the y (output) data
}

# Sensitivity analysis configuration
sensitivity_params = {
    "method": "pc", "metric": "it-knn",
    "sst": True, "ntest": 100, "alpha": 0.05, "k": 3,
    "n_jobs": 100, "seed_shuffle": seed_shuffle,
    "verbose": 1
}
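
The `"minmax"` setting for `xscaler_type` and `yscaler_type` suggests that both the inputs and the outputs are rescaled before training. The sketch below assumes the conventional min-max scaling to the range [0, 1]; the helper names are hypothetical and KIM's own scaler may differ in detail.

```python
def minmax_scale(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # A constant column has no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / span for v in column]


def minmax_unscale(scaled, lo, hi):
    """Invert the scaling so that values return to their original units."""
    return [lo + s * (hi - lo) for s in scaled]


values = [2.0, 4.0, 10.0]  # hypothetical samples of one parameter
scaled = minmax_scale(values)
restored = minmax_unscale(scaled, min(values), max(values))
```

Scaling the outputs as well as the inputs means that predictions from a trained mapping would need the inverse transform before they can be read in physical units.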


## Ensemble learning configuration

In [6]:
# Basic ensemble learning configuration
Ns_train = 400
Ns_val = 50
hidden_activation = 'sigmoid'
final_activation = 'leaky_relu'
seed_ens = 1024
seed_predict = 3636
seed_dl = 10
seed_model = 100
training_verbose = 1
n_models = 100
n_jobs = 20  # Number of parallel jobs used in joblib

# Locations where the ensemble learning results will be saved
f_kim_save1 = Path("./results-holodec/map_many2many")
f_kim_save2 = Path("./results-holodec/map_many2one")
f_kim_save3 = Path("./results-holodec/map_many2one_cond")


In [7]:
# Mapping parameters for each test below
map_configs = {
    "model_type": MLP,
    'n_model': n_models,
    'ensemble_type': 'ens_random',
    'model_hp_choices': {
        "depth": [1,3,5,6],
        "width_size": [3,6,10]
    },
    'model_hp_fixed': {
        "hidden_activation": hidden_activation,
        "final_activation": final_activation,
        "model_seed": seed_model
    },
    'optax_hp_choices': {
        'learning_rate': [0.01, 0.005, 0.003],
    },
    'optax_hp_fixed': {
        'nsteps': 300,
        'optimizer_type': 'adam',
    },
    'dl_hp_choices': {
    },
    'dl_hp_fixed': {
        'dl_seed': seed_dl,
        'num_train_sample': Ns_train,
        'num_val_sample': Ns_val,
        'batch_size': 64
    },
    'ens_seed': seed_ens,
    'training_parallel': True,
    'parallel_config': {
        'n_jobs': n_jobs, 
        'backend': 'loky',
        'verbose': 1
    },
    'device': None,
}
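
The `'ens_random'` ensemble type, together with the `*_hp_choices` entries, suggests that each ensemble member draws its hyperparameters from the listed options. The sketch below illustrates that idea under the assumption of independent, uniform draws; `sample_ensemble_configs` is a hypothetical helper, not part of KIM, and KIM's actual sampling may differ.

```python
import random

model_hp_choices = {"depth": [1, 3, 5, 6], "width_size": [3, 6, 10]}
optax_hp_choices = {"learning_rate": [0.01, 0.005, 0.003]}


def sample_ensemble_configs(choices, n_model, seed):
    """Draw n_model configurations, picking each value uniformly at random."""
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    return [
        {name: rng.choice(options) for name, options in choices.items()}
        for _ in range(n_model)
    ]


all_choices = {**model_hp_choices, **optax_hp_choices}
configs = sample_ensemble_configs(all_choices, n_model=100, seed=1024)

# Number of distinct combinations: 4 depths x 3 widths x 3 learning rates.
n_combinations = 1
for options in all_choices.values():
    n_combinations *= len(options)
```

The choices above give 36 distinct combinations, so under this assumption an ensemble of 100 models necessarily repeats some of them.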

# Perform preliminary data analysis
The analysis includes both the sensitivity analysis and the redundancy filtering check.

In [8]:
data = Data(x, y, **data_params)
data.calculate_sensitivity(**sensitivity_params)


Using the kNN-based information theoretic metrics ...
Performing pairwise analysis to remove insensitive inputs ...


100%|██████████| 1458/1458 [10:28<00:00,  2.32it/s]  


Performing conditional independence testing to remove redundant inputs ...


# Train the inverse mapping

Now, let's train the inverse mappings via ensemble learning. We train three types of inverse mappings:
- `kim1`: The naive inverse mapping from all $\mathbf{X}$ to all $\mathbf{Y}$
- `kim2`: The knowledge-informed inverse mapping from the sensitive $\mathbf{X}$ to each of $\mathbf{Y}$, using global sensitivity analysis
- `kim3`: The knowledge-informed inverse mapping from the sensitive $\mathbf{X}$ to each of $\mathbf{Y}$, using global sensitivity analysis and the redundancy filtering check
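
As a rough illustration of how the `many2many` and `many2one` options differ, the sketch below selects input columns from a boolean mask. The mask values and the helper name are made up for illustration; in the notebook the masks come from the preliminary analysis, and KIM's internal representation may differ.

```python
# Hypothetical mask: rows are input columns, columns are outputs.
# True means the input was judged informative for that output.
mask = [
    [True, False],
    [False, False],
    [True, True],
    [False, True],
]
n_inputs, n_outputs = len(mask), len(mask[0])


def inputs_for_output(mask, j):
    """Indices of the inputs kept for output j (many-to-one mapping)."""
    return [i for i, row in enumerate(mask) if row[j]]


# many2many: a single mapping that uses every input for all outputs.
many2many_inputs = list(range(n_inputs))

# many2one: one mapping per output, each restricted to its own inputs.
many2one_inputs = {j: inputs_for_output(mask, j) for j in range(n_outputs)}
```

With two outputs, a `many2one` mapping would train one ensemble per output. This would be consistent with the training log, which reports five ensemble training runs for the three KIM objects (one for `kim1`, two each for `kim2` and `kim3`).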


In [9]:
# Initialize three different KIMs
kim1 = KIM(data, map_configs, map_option='many2many')
kim2 = KIM(data, map_configs, mask_option="sensitivity", map_option='many2one')
kim3 = KIM(data, map_configs, mask_option="cond_sensitivity", map_option='many2one')

# Train the mappings
kim1.train()
kim2.train()
kim3.train()



 Performing ensemble training in parallel with 100 model configurations...



[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
[Parallel(n_jobs=20)]: Done  10 tasks      | elapsed:   34.4s

Training completes.

 Performing ensemble training in parallel with 100 model configurations...



[Parallel(n_jobs=20)]: Done  10 tasks      | elapsed:    4.7s

Training completes.

 Performing ensemble training in parallel with 100 model configurations...



[Parallel(n_jobs=20)]: Done  10 tasks      | elapsed:    4.1s

Training completes.

 Performing ensemble training in parallel with 100 model configurations...



[Parallel(n_jobs=20)]: Done  10 tasks      | elapsed:    4.3s

Training completes.

 Performing ensemble training in parallel with 100 model configurations...



[Parallel(n_jobs=20)]: Done 100 out of 100 | elapsed:   44.8s finished
[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
[Parallel(n_jobs=20)]: Done  10 tasks      | elapsed:    7.1s

Training completes.


[Parallel(n_jobs=20)]: Done 100 out of 100 | elapsed:   47.7s finished


# Save all the results to disk

In [10]:
# Preliminary analysis results
data.save(f_data_save)

# Inverse mapping results
kim1.save(f_kim_save1)
kim2.save(f_kim_save2)
kim3.save(f_kim_save3)
