Explainability | KoraSafe™ developers

Human oversight of consequential AI decisions is meaningless without explainability. A reviewer staring at a black-box output cannot effectively override it. Explainability tooling bridges the gap: it surfaces which inputs drove the decision and what would have flipped it, giving reviewers the context to act. Requirements for explainability appear across multiple frameworks, including GDPR Article 22, FCRA adverse-action notice obligations, and human oversight requirements that apply to consequential AI.

Pick SHAP for tabular models

Fast and exact for tree-based models (GBMs, XGBoost, LightGBM, random forests). Also exact for linear models. Use TreeExplainer or LinearExplainer depending on model type.

Pick LIME for non-tree models

Model-agnostic: works on neural nets, SVMs, text classifiers, and image classifiers. Slower than SHAP, but handles model families SHAP does not support efficiently.

Pick counterfactuals for adverse actions

The preferred format for customer-facing explanations. Answers "what would have flipped the decision?" rather than "which feature contributed most?" Aligns with FCRA adverse-action notice structure.

All three scripts output the same CSV format: subgroup_category, subgroup_value, metric, value. Import the file at Compliance > Bias Testing > Import CSV.
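
Before importing, it can help to sanity-check a file against the four-column layout described above. The sketch below is one way to do that. `validate_bias_csv` is a hypothetical helper, not part of any KoraSafe tooling, and the specific checks (exact header order, numeric `value`, no NaN) are assumptions about what a well-formed file looks like rather than documented import rules.

```python
import csv
import io
import math

EXPECTED_HEADER = ["subgroup_category", "subgroup_value", "metric", "value"]


def validate_bias_csv(text):
    """Return a list of human-readable problems found in the CSV text (empty if none)."""
    problems = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        problems.append(f"header is {header}, expected {EXPECTED_HEADER}")
        return problems
    # Row numbering starts at 2 because the header is row 1
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"line {line_no}: expected 4 fields, got {len(row)}")
            continue
        try:
            value = float(row[3])
        except ValueError:
            problems.append(f"line {line_no}: value {row[3]!r} is not numeric")
            continue
        if math.isnan(value):
            problems.append(f"line {line_no}: value is NaN (empty subgroup?)")
    return problems


# Made-up rows, for illustration only
good = "subgroup_category,subgroup_value,metric,value\nage_group,18-34,credit_score,0.412\n"
bad = "subgroup_category,subgroup_value,metric,value\nage_group,55+,credit_score,nan\n"
print(validate_bias_csv(good))
print(validate_bias_csv(bad))
```

A NaN typically appears when a subgroup has no rows in the test set, so a check like this catches it before the file is uploaded.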

Tabular models

SHAP for tabular classification

Decomposes a model's output into per-feature contributions using Shapley values. Positive values push the model output up, toward the positive class; negative values push it down. The script below aggregates contributions by subgroup to surface disparate feature influence across protected attributes.
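
To make the decomposition concrete, here is a minimal brute-force sketch of what a Shapley value is. It is not how the `shap` library computes them (TreeExplainer uses a much faster tree-specific algorithm), and replacing "absent" features with a baseline value is only one common convention. The scoring function, applicant, and baseline are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every feature coalition.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Marginal contribution of feature i when added to this coalition
                total += weight * (value(set(subset) | {i}) - value(set(subset)))
        phi.append(total)
    return phi


# Toy score (hypothetical): inputs are credit_score, debt_to_income, employment_length_years
def toy_score(x):
    credit_score, dti, years = x
    return 0.01 * credit_score - 2.0 * dti + 0.1 * years + 0.05 * dti * years


applicant = [720, 0.45, 2]
baseline = [680, 0.30, 5]  # assumed reference point, e.g. dataset averages

phi = shapley_values(toy_score, applicant, baseline)
print([round(p, 4) for p in phi])
# Additivity: contributions sum to the gap between this score and the baseline score
gap = toy_score(applicant) - toy_score(baseline)
print(abs(sum(phi) - gap) < 1e-9)
```

The additivity check at the end is the property that makes per-feature contributions comparable across subgroups: every prediction's gap from the baseline is fully accounted for.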

pip install shap scikit-learn pandas numpy

shap-quickstart.py

import shap
import pandas as pd
import numpy as np
import csv, pathlib
from sklearn.ensemble import GradientBoostingClassifier

# --- Replace with your model and data ---
FEATURE_NAMES = ["credit_score", "debt_to_income", "employment_length_years"]
PROTECTED_ATTRIBUTE = "age_group"
SUBGROUP_VALUES = ["18-34", "35-54", "55+"]
# model = your_trained_model
# X_test = your_test_dataframe[FEATURE_NAMES]
# X with PROTECTED_ATTRIBUTE column must be available

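# Compute SHAP values
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Depending on the model and SHAP version, classifiers may return one array per class
# (a list, or a 3-D array). Keep the positive class (index 1) so the frame below
# has shape (n_samples, n_features).
if isinstance(shap_values, list):
    shap_values = shap_values[1]
elif np.ndim(shap_values) == 3:
    shap_values = shap_values[:, :, 1]
shap_df = pd.DataFrame(shap_values, columns=FEATURE_NAMES)
shap_df[PROTECTED_ATTRIBUTE] = X.loc[X_test.index, PROTECTED_ATTRIBUTE].values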

# Aggregate by subgroup
rows = []
for subgroup_value in SUBGROUP_VALUES:
    mask = shap_df[PROTECTED_ATTRIBUTE] == subgroup_value
    subset = shap_df[mask][FEATURE_NAMES]
    if subset.empty:
        continue  # no test rows for this subgroup; skip it rather than write NaN
    for feature in FEATURE_NAMES:
        rows.append({
            "subgroup_category": PROTECTED_ATTRIBUTE,
            "subgroup_value": subgroup_value,
            "metric": feature,
            "value": round(float(subset[feature].abs().mean()), 6),
        })

# Write CSV
output_path = pathlib.Path("shap_bias_testing.csv")
with output_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["subgroup_category", "subgroup_value", "metric", "value"])
    writer.writeheader()
    writer.writerows(rows)
print(f"Written {len(rows)} rows to {output_path}")

Output format

subgroup_category, subgroup_value, metric, value

Mean absolute SHAP value per (subgroup, feature) pair. Higher values indicate that feature drove the decision more strongly for that subgroup. Import at Compliance > Bias Testing > Import CSV.

Loan scoring | Fraud detection | Credit decisions | Claims classification | scikit-learn, XGBoost, LightGBM

View full script on GitHub. Includes synthetic data setup, train/test split, and a runnable end-to-end example.

Non-tree models

LIME for model-agnostic explanations

LIME fits a simple surrogate model locally around each prediction. Model-agnostic: works on neural nets, SVMs, text classifiers, and image classifiers where SHAP TreeExplainer does not apply. Slower than SHAP, so sample a representative subset for aggregation.
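
The idea of a local surrogate can be sketched in a few lines of plain Python: perturb the instance, weight each perturbed point by its closeness to the original, and fit a weighted linear model. This is a simplified illustration, not the `lime` library's implementation, which also discretizes features, samples from training-data statistics, and uses regularized regression. `local_surrogate`, the kernel, and the toy black box are all assumptions made for this example.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def local_surrogate(predict, instance, scale=0.5, kernel_width=1.0, n_samples=500, seed=0):
    """Fit a proximity-weighted linear model around `instance`; return per-feature slopes."""
    rng = random.Random(seed)
    d = len(instance)
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xty = [0.0] * (d + 1)
    for _ in range(n_samples):
        point = [v + rng.gauss(0.0, scale) for v in instance]
        dist_sq = sum((p - v) ** 2 for p, v in zip(point, instance))
        w = math.exp(-dist_sq / kernel_width ** 2)  # closer samples count more
        row = [1.0] + point                         # leading 1.0 is the intercept term
        y = predict(point)
        # Accumulate the weighted normal equations (X^T W X) beta = X^T W y
        for i in range(d + 1):
            xty[i] += w * row[i] * y
            for j in range(d + 1):
                xtx[i][j] += w * row[i] * row[j]
    coefs = solve_linear_system(xtx, xty)
    return coefs[1:]  # drop the intercept


# Toy black box (hypothetical): a smooth, nonlinear approval probability
def black_box(x):
    return 1.0 / (1.0 + math.exp(-(1.5 * x[0] - 2.0 * x[1] ** 2)))


weights = local_surrogate(black_box, instance=[0.2, 1.0])
print(weights)
```

The returned slopes describe the model only in the neighborhood of this one instance, which is why the script that follows explains many sampled instances and averages the absolute weights per subgroup.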

pip install lime scikit-learn pandas numpy

lime-alternative.py

from lime.lime_tabular import LimeTabularExplainer
import numpy as np
import csv, pathlib

# --- Replace with your model, training data, and feature names ---
FEATURE_NAMES = ["credit_score", "debt_to_income", "employment_length_years"]
PROTECTED_ATTRIBUTE = "age_group"
SUBGROUP_VALUES = ["18-34", "35-54", "55+"]
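# model = your_trained_model (must expose predict_proba)
# X_train_scaled, X_test_scaled = scaled arrays
# X_test_with_attr = test dataframe that includes PROTECTED_ATTRIBUTE,
#                    with rows in the same order as X_test_scaled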

explainer = LimeTabularExplainer(
    training_data=X_train_scaled,
    feature_names=FEATURE_NAMES,
    class_names=["denied", "approved"],
    mode="classification",
)

MAX_SAMPLES = min(200, len(X_test_scaled))
sample_indices = np.random.default_rng(seed=0).choice(len(X_test_scaled), MAX_SAMPLES, replace=False)

feature_weights = {}
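for idx in sample_indices:
    exp = explainer.explain_instance(X_test_scaled[idx], model.predict_proba, num_features=len(FEATURE_NAMES))
    # idx is a row position in X_test_scaled, so look up the subgroup by position too
    subgroup_value = X_test_with_attr.iloc[idx][PROTECTED_ATTRIBUTE]
    # as_map() keys weights by feature index. as_list() returns bin descriptions
    # (e.g. "credit_score > 0.50") that can also begin with a number, so splitting
    # them on spaces does not reliably recover the feature name.
    for feature_idx, weight in exp.as_map()[1]:  # label 1 = "approved"
        key = (PROTECTED_ATTRIBUTE, subgroup_value, FEATURE_NAMES[feature_idx])
        feature_weights.setdefault(key, []).append(abs(weight))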

rows = [
    {"subgroup_category": sc, "subgroup_value": sv, "metric": m, "value": round(float(np.mean(w)), 6)}
    for (sc, sv, m), w in feature_weights.items()
]

output_path = pathlib.Path("lime_bias_testing.csv")
with output_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["subgroup_category", "subgroup_value", "metric", "value"])
    writer.writeheader()
    writer.writerows(rows)
print(f"Written {len(rows)} rows to {output_path}")

Output format

subgroup_category, subgroup_value, metric, value

Mean absolute LIME weight per (subgroup, feature) pair across the sampled instances. Higher values indicate stronger local influence for that subgroup.

Neural networks | SVMs | Text classifiers | Image classifiers | Any model with predict_proba

View full script on GitHub. Includes MLP example, scaler setup, and protected attribute subgroup aggregation.

LLM-style decisions

Counterfactual explanations for adverse actions

Counterfactuals answer "what is the minimum change that would have flipped the decision?" This format is preferred for customer-facing adverse-action explanations because it gives the affected person an actionable path, not just a list of feature weights. Use DiCE (Diverse Counterfactual Explanations) from Microsoft.
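
As a rough illustration of the question a counterfactual answers, the sketch below searches one feature at a time for the smallest step count that flips a toy decision. It is not DiCE's algorithm, which varies several features at once and returns a diverse set of alternatives. The decision rule, threshold, applicant, and step directions are all invented for this example.

```python
def single_feature_counterfactuals(predict, instance, steps, max_steps=200):
    """For each feature, walk in one direction until the decision flips.

    `steps` maps feature name -> signed step size (the direction assumed to help).
    Returns {feature: change needed}, omitting features that never flip the decision.
    """
    original = predict(instance)
    found = {}
    for feature, step in steps.items():
        candidate = dict(instance)
        for k in range(1, max_steps + 1):
            candidate[feature] = instance[feature] + k * step
            if predict(candidate) != original:
                found[feature] = k * step
                break
    return found


# Toy decision rule (hypothetical): 1 = approved, 0 = denied
def toy_model(x):
    score = (0.01 * x["credit_score"]
             - 4.0 * x["debt_to_income"]
             + 0.1 * x["employment_length_years"])
    return 1 if score >= 5.03 else 0


applicant = {"credit_score": 640, "debt_to_income": 0.45, "employment_length_years": 2}
steps = {"credit_score": 5, "debt_to_income": -0.01, "employment_length_years": 1}

found = single_feature_counterfactuals(toy_model, applicant, steps)
for feature, change in found.items():
    print(f"{feature}: change by {change:+g} to flip the decision")
```

Each printed line maps directly onto an adverse-action statement of the form "the decision would have been different if this value had changed by this amount", which is what makes the format actionable for the affected person.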

pip install dice-ml scikit-learn pandas numpy

counterfactuals-dice.py

import dice_ml
import numpy as np
import csv, pathlib

# --- Replace with your model, dataframe, and feature names ---
FEATURE_NAMES = ["credit_score", "debt_to_income", "employment_length_years"]
TARGET_COLUMN = "approved"
PROTECTED_ATTRIBUTE = "age_group"
SUBGROUP_VALUES = ["18-34", "35-54", "55+"]
# df, X_train_df, X_test_df = your dataframes
# model = your sklearn-compatible model

data_interface = dice_ml.Data(
    dataframe=X_train_df[FEATURE_NAMES + [TARGET_COLUMN]],
    continuous_features=FEATURE_NAMES,
    outcome_name=TARGET_COLUMN,
)
model_interface = dice_ml.Model(model=model, backend="sklearn")
explainer = dice_ml.Dice(data_interface, model_interface, method="random")

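# Generate counterfactuals for instances the model denies (predicted class 0),
# not instances whose ground-truth label happens to be 0
predicted = model.predict(X_test_df[FEATURE_NAMES])
denied_test = X_test_df[predicted == 0].head(50)
delta_records = {}

for i in range(len(denied_test)):
    row = denied_test.iloc[i]
    # Slice the frame rather than the row so feature columns keep their numeric dtypes
    instance = denied_test.iloc[[i]][FEATURE_NAMES].reset_index(drop=True)
    subgroup_value = row[PROTECTED_ATTRIBUTE]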
    try:
        cfs = explainer.generate_counterfactuals(instance, total_CFs=3, desired_class="opposite")
        cf_df = cfs.cf_examples_list[0].final_cfs_df
        if cf_df is None:
            continue
        for _, cf_row in cf_df.iterrows():
            for feature in FEATURE_NAMES:
                delta = abs(float(cf_row[feature]) - float(row[feature]))
                key = (PROTECTED_ATTRIBUTE, subgroup_value, f"cf_delta_{feature}")
                delta_records.setdefault(key, []).append(delta)
    except Exception:
        # DiCE raised for this instance (for example, no counterfactual found); skip it
        continue

rows = [
    {"subgroup_category": sc, "subgroup_value": sv, "metric": m, "value": round(float(np.mean(d)), 6)}
    for (sc, sv, m), d in delta_records.items()
]

output_path = pathlib.Path("counterfactuals_bias_testing.csv")
with output_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["subgroup_category", "subgroup_value", "metric", "value"])
    writer.writeheader()
    writer.writerows(rows)
print(f"Written {len(rows)} rows to {output_path}")

Output format

subgroup_category, subgroup_value, metric, value (metric is cf_delta_<feature>)

Mean absolute delta per (subgroup, feature) needed to flip a denial. Lower values mean smaller changes were required for that subgroup, which may indicate disparate ease of recourse across groups.
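
One way to read the output for recourse disparity is to compare, per metric, the subgroup needing the largest change with the one needing the smallest. The sketch below does that on made-up rows. `subgroup_ratios` is a hypothetical helper, and the ratio is just one possible summary; no particular threshold is implied.

```python
import csv
import io
from collections import defaultdict


def subgroup_ratios(csv_text):
    """For each metric, return (largest mean / smallest mean, largest group, smallest group)."""
    by_metric = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_metric[row["metric"]][row["subgroup_value"]] = float(row["value"])
    result = {}
    for metric, groups in by_metric.items():
        high = max(groups, key=groups.get)
        low = min(groups, key=groups.get)
        if groups[low] > 0:  # skip metrics where the ratio would be undefined
            result[metric] = (groups[high] / groups[low], high, low)
    return result


# Made-up numbers, for illustration only
sample = """subgroup_category,subgroup_value,metric,value
age_group,18-34,cf_delta_credit_score,42.0
age_group,35-54,cf_delta_credit_score,30.0
age_group,55+,cf_delta_credit_score,21.0
"""

for metric, (ratio, high, low) in subgroup_ratios(sample).items():
    print(f"{metric}: {high} needs {ratio:.2f}x the change that {low} does")
```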

Loan denial explanations | FCRA adverse-action notices | GDPR Art. 22 explanations | Credit decisions | Employment screening

View full script on GitHub. Includes synthetic data setup, denied-instance filtering, and full CSV output.

Questions about the right tool for your model? Contact us at Contact-us@korasafe.ai
