Explain your AI model's decisions.
Copy-paste Python scripts for SHAP, LIME, and counterfactual explanations. Each script outputs the KoraSafe bias testing CSV format, ready for import into your compliance package.
Human oversight of consequential AI decisions is meaningless without explainability. A reviewer staring at a black-box output cannot effectively override it. Explainability tooling bridges the gap: it surfaces which inputs drove the decision and what would have flipped it, giving reviewers the context to act. Requirements for explainability appear across multiple frameworks, including GDPR Article 22, FCRA adverse-action notice obligations, and human oversight requirements that apply to consequential AI.
SHAP: Fastest and most accurate for tree-based models (GBMs, XGBoost, LightGBM, random forests). Exact for linear models. Use TreeExplainer or LinearExplainer depending on model type.
LIME: Model-agnostic. Works on neural nets, SVMs, text classifiers, and image classifiers. Slower than SHAP, but handles model families SHAP does not support efficiently.
Counterfactuals: The preferred format for customer-facing explanations. Answers "what would have flipped the decision?" rather than "which feature contributed most?" Aligns with FCRA adverse-action notice structure.
All three scripts output the same CSV format: subgroup_category, subgroup_value, metric, value. Import the file at Compliance > Bias Testing > Import CSV.
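Before importing, it can help to sanity-check a generated file against the four-column layout described above. The following is a minimal sketch using only the Python standard library; `validate_bias_csv` is a hypothetical helper, and the checks it applies (exact header, four columns, finite numeric value) are assumptions rather than the importer's documented validation rules.

```python
import csv
import io
import math

EXPECTED_HEADER = ["subgroup_category", "subgroup_value", "metric", "value"]


def validate_bias_csv(text):
    """Return a list of problems found in the CSV text (empty list = none found)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return [f"unexpected header: {header}"]
    problems = []
    # Row 1 is the header, so data rows are numbered from 2
    for row_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"row {row_no}: expected 4 columns, got {len(row)}")
            continue
        if not all(cell.strip() for cell in row[:3]):
            problems.append(f"row {row_no}: empty subgroup or metric field")
        try:
            value = float(row[3])
        except ValueError:
            problems.append(f"row {row_no}: value {row[3]!r} is not a number")
            continue
        if not math.isfinite(value):
            problems.append(f"row {row_no}: value {row[3]!r} is not finite")
    return problems


# Illustrative data only: the second data row carries a NaN, which can appear
# when a subgroup has no rows in the test set
sample = (
    "subgroup_category,subgroup_value,metric,value\n"
    "age_group,18-34,credit_score,0.412\n"
    "age_group,55+,credit_score,nan\n"
)
for problem in validate_bias_csv(sample):
    print(problem)
```

To check a file written by one of the scripts, pass its contents, for example `validate_bias_csv(pathlib.Path("shap_bias_testing.csv").read_text())`.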
SHAP decomposes a model's output into per-feature contributions using Shapley values. Positive values push the output above the model's baseline (its expected output over the background data); negative values push it below. The script aggregates the values by subgroup to surface disparate feature influence across protected attributes.
pip install shap scikit-learn pandas numpy
import csv
import pathlib

import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier  # example model class

# --- Replace with your model and data ---
FEATURE_NAMES = ["credit_score", "debt_to_income", "employment_length_years"]
PROTECTED_ATTRIBUTE = "age_group"
SUBGROUP_VALUES = ["18-34", "35-54", "55+"]
# model = your_trained_model
# X = your_full_dataframe                      # must include the PROTECTED_ATTRIBUTE column
# X_test = your_test_dataframe[FEATURE_NAMES]  # index must match rows in X

# Compute SHAP values
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Some model types return one set of values per class; keep the positive class
if isinstance(shap_values, list):
    shap_values = shap_values[1]
elif np.ndim(shap_values) == 3:
    shap_values = shap_values[:, :, 1]
shap_df = pd.DataFrame(shap_values, columns=FEATURE_NAMES)
shap_df[PROTECTED_ATTRIBUTE] = X.loc[X_test.index, PROTECTED_ATTRIBUTE].values

# Aggregate by subgroup
rows = []
for subgroup_value in SUBGROUP_VALUES:
    mask = shap_df[PROTECTED_ATTRIBUTE] == subgroup_value
    subset = shap_df.loc[mask, FEATURE_NAMES]
    if subset.empty:
        continue  # no test rows for this subgroup; skip rather than write NaN
    for feature in FEATURE_NAMES:
        rows.append({
            "subgroup_category": PROTECTED_ATTRIBUTE,
            "subgroup_value": subgroup_value,
            "metric": feature,
            "value": round(float(subset[feature].abs().mean()), 6),
        })

# Write CSV
output_path = pathlib.Path("shap_bias_testing.csv")
with output_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["subgroup_category", "subgroup_value", "metric", "value"])
    writer.writeheader()
    writer.writerows(rows)
print(f"Written {len(rows)} rows to {output_path}")
subgroup_category, subgroup_value, metric, value
Mean absolute SHAP value per (subgroup, feature) pair. Higher values indicate that feature drove the decision more strongly for that subgroup. Import at Compliance > Bias Testing > Import CSV.
View full script on GitHub. Includes synthetic data setup, train/test split, and runnable end-to-end example.
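To make the Shapley decomposition described above concrete, here is a brute-force sketch that computes exact Shapley values for a single instance by enumerating every feature coalition. It is not how TreeExplainer works internally (this enumeration is exponential in the number of features), and `toy_score`, the instance, and the baseline are hypothetical values chosen for illustration. Features left out of a coalition are set to a single baseline value, which is one common simplification.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance by enumerating all feature coalitions.

    Features outside a coalition take their value from `baseline`.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(x):
    # Hypothetical scoring function with an interaction term
    credit_score, debt_to_income, employment_years = x
    return (0.01 * credit_score
            - 2.0 * debt_to_income
            + 0.1 * employment_years * (1 - debt_to_income))


instance = [720, 0.35, 6]
baseline = [650, 0.40, 4]
phi = shapley_values(toy_score, instance, baseline)

# Additivity: the contributions sum to the gap between the instance's
# score and the baseline's score
gap = toy_score(instance) - toy_score(baseline)
assert abs(sum(phi) - gap) < 1e-9
print([round(p, 4) for p in phi])
```

The additivity check at the end is the property that makes SHAP values readable as a decomposition: every unit of difference from the baseline is attributed to exactly one feature.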
LIME fits a simple surrogate model locally around each prediction. Model-agnostic: works on neural nets, SVMs, text classifiers, and image classifiers where SHAP TreeExplainer does not apply. Slower than SHAP, so sample a representative subset for aggregation.
pip install lime scikit-learn pandas numpy
import csv
import pathlib

import numpy as np
from lime.lime_tabular import LimeTabularExplainer

# --- Replace with your model, training data, and feature names ---
FEATURE_NAMES = ["credit_score", "debt_to_income", "employment_length_years"]
PROTECTED_ATTRIBUTE = "age_group"
SUBGROUP_VALUES = ["18-34", "35-54", "55+"]
# model = your_trained_model (must expose predict_proba)
# X_train_scaled, X_test_scaled = scaled arrays
# X_test_with_attr = test dataframe in the same row order as X_test_scaled,
#                    including the PROTECTED_ATTRIBUTE column

explainer = LimeTabularExplainer(
    training_data=X_train_scaled,
    feature_names=FEATURE_NAMES,
    class_names=["denied", "approved"],
    mode="classification",
)

MAX_SAMPLES = min(200, len(X_test_scaled))
sample_indices = np.random.default_rng(seed=0).choice(len(X_test_scaled), MAX_SAMPLES, replace=False)

feature_weights = {}
for idx in sample_indices:
    exp = explainer.explain_instance(X_test_scaled[idx], model.predict_proba, num_features=len(FEATURE_NAMES))
    # idx is a row position, so look up the subgroup with iloc rather than loc
    subgroup_value = X_test_with_attr[PROTECTED_ATTRIBUTE].iloc[idx]
    # as_map() gives (feature_index, weight) pairs for class 1 ("approved"), which avoids
    # parsing as_list() strings such as "0.20 < credit_score <= 0.50"
    for feature_idx, weight in exp.as_map()[1]:
        key = (PROTECTED_ATTRIBUTE, subgroup_value, FEATURE_NAMES[feature_idx])
        feature_weights.setdefault(key, []).append(abs(weight))

rows = [
    {"subgroup_category": sc, "subgroup_value": sv, "metric": m, "value": round(float(np.mean(w)), 6)}
    for (sc, sv, m), w in feature_weights.items()
]

output_path = pathlib.Path("lime_bias_testing.csv")
with output_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["subgroup_category", "subgroup_value", "metric", "value"])
    writer.writeheader()
    writer.writerows(rows)
print(f"Written {len(rows)} rows to {output_path}")
subgroup_category, subgroup_value, metric, value
Mean absolute LIME weight per (subgroup, feature) pair across the sampled instances. Higher values indicate stronger local influence for that subgroup.
View full script on GitHub. Includes MLP example, scaler setup, and protected attribute subgroup aggregation.
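The local-surrogate idea behind LIME can be sketched in plain Python: perturb the instance, weight each perturbed sample by its proximity to the original, and fit a weighted linear model whose coefficients serve as the explanation. This is a simplified illustration, not the lime library's exact procedure (which, among other things, discretizes tabular features by default and fits a regularized model). The names `local_surrogate` and `toy_model`, the Gaussian perturbation scale, and the kernel width are all assumptions made for the sketch.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def local_surrogate(predict, instance, scale=1.0, kernel_width=0.75, n_samples=500, seed=0):
    """Fit a proximity-weighted linear model around `instance`; return per-feature weights."""
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the normal equations of weighted least squares: (Z'WZ) beta = Z'Wy
    ztwz = [[0.0] * (d + 1) for _ in range(d + 1)]
    ztwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        offset = [rng.gauss(0.0, scale) for _ in range(d)]
        sample = [xi + oi for xi, oi in zip(instance, offset)]
        sq_dist = sum(o * o for o in offset)
        w = math.exp(-sq_dist / kernel_width ** 2)  # closer samples count more
        z = [1.0] + offset  # intercept plus features centred on the instance
        y = predict(sample)
        for i in range(d + 1):
            ztwy[i] += w * z[i] * y
            for j in range(d + 1):
                ztwz[i][j] += w * z[i] * z[j]
    beta = solve_linear_system(ztwz, ztwy)
    return beta[1:]  # drop the intercept


def toy_model(x):
    # Hypothetical nonlinear black box: approval probability from two scaled features
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 3.0 * x[1])))


weights = local_surrogate(toy_model, [0.5, -0.2])
print([round(w, 3) for w in weights])
```

The returned weights describe the model only near the chosen instance, which is why the script above averages absolute weights over many sampled instances before comparing subgroups.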
Counterfactuals answer "what is the minimum change that would have flipped the decision?" This format is preferred for customer-facing adverse-action explanations because it gives the affected person an actionable path, not just a list of feature weights. Use DiCE (Diverse Counterfactual Explanations) from Microsoft.
pip install dice-ml scikit-learn pandas numpy
import csv
import pathlib

import dice_ml
import numpy as np

# --- Replace with your model, dataframe, and feature names ---
FEATURE_NAMES = ["credit_score", "debt_to_income", "employment_length_years"]
TARGET_COLUMN = "approved"
PROTECTED_ATTRIBUTE = "age_group"
SUBGROUP_VALUES = ["18-34", "35-54", "55+"]
# df, X_train_df, X_test_df = your dataframes
#   (X_train_df needs TARGET_COLUMN; X_test_df needs PROTECTED_ATTRIBUTE)
# model = your sklearn-compatible model

data_interface = dice_ml.Data(
    dataframe=X_train_df[FEATURE_NAMES + [TARGET_COLUMN]],
    continuous_features=FEATURE_NAMES,
    outcome_name=TARGET_COLUMN,
)
model_interface = dice_ml.Model(model=model, backend="sklearn")
explainer = dice_ml.Dice(data_interface, model_interface, method="random")

# Generate counterfactuals for instances the model itself denies (predicted class 0),
# so that desired_class="opposite" always means flipping a denial to an approval
predicted = model.predict(X_test_df[FEATURE_NAMES])
denied_test = X_test_df[predicted == 0].head(50)

delta_records = {}
skipped = 0
for pos in range(len(denied_test)):
    row = denied_test.iloc[pos]
    # Slice the frame (not the row Series) so the feature dtypes stay numeric
    instance = denied_test.iloc[[pos]][FEATURE_NAMES].reset_index(drop=True)
    subgroup_value = row[PROTECTED_ATTRIBUTE]
    try:
        cfs = explainer.generate_counterfactuals(instance, total_CFs=3, desired_class="opposite")
        cf_df = cfs.cf_examples_list[0].final_cfs_df
        if cf_df is None:
            skipped += 1
            continue
        for _, cf_row in cf_df.iterrows():
            for feature in FEATURE_NAMES:
                delta = abs(float(cf_row[feature]) - float(row[feature]))
                key = (PROTECTED_ATTRIBUTE, subgroup_value, f"cf_delta_{feature}")
                delta_records.setdefault(key, []).append(delta)
    except Exception:
        # DiCE can raise when no counterfactual is found; count the instance and move on
        skipped += 1
        continue

rows = [
    {"subgroup_category": sc, "subgroup_value": sv, "metric": m, "value": round(float(np.mean(d)), 6)}
    for (sc, sv, m), d in delta_records.items()
]

output_path = pathlib.Path("counterfactuals_bias_testing.csv")
with output_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["subgroup_category", "subgroup_value", "metric", "value"])
    writer.writeheader()
    writer.writerows(rows)
print(f"Written {len(rows)} rows ({skipped} instances skipped)")
subgroup_category, subgroup_value, metric (cf_delta_<feature>), value
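Mean absolute delta per (subgroup, feature) needed to flip a denial. Lower values mean smaller changes were required for that subgroup; a large gap between subgroups on the same feature may indicate disparate ease of recourse. Deltas are in each feature's own units, so compare across subgroups, not across features.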
View full script on GitHub. Includes synthetic data setup, denied-instance filtering, and full CSV output.
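To illustrate the "minimum change that would have flipped the decision" idea without any library, the sketch below searches one feature at a time for the smallest step-sized change that flips a decision function. It is a brute-force illustration, not DiCE's algorithm: DiCE can change several features at once and returns diverse alternatives. The decision rule `toy_decision`, the applicant, the step sizes, and the bounds are hypothetical.

```python
def single_feature_counterfactuals(predict, instance, steps, bounds, max_steps=200):
    """For each feature, find the smallest change (in multiples of its step) that flips predict().

    Returns {feature: (new_value, n_steps)} for every feature where a flip was found
    within `bounds` and `max_steps`.
    """
    original = predict(instance)
    found = {}
    for feature, step in steps.items():
        low, high = bounds[feature]
        for k in range(1, max_steps + 1):
            hit = None
            for direction in (+1, -1):
                new_value = instance[feature] + direction * k * step
                if not low <= new_value <= high:
                    continue  # keep candidates within a plausible range
                candidate = dict(instance)
                candidate[feature] = new_value
                if predict(candidate) != original:
                    hit = new_value
                    break
            if hit is not None:
                found[feature] = (hit, k)
                break
    return found


def toy_decision(applicant):
    # Hypothetical approval rule: 1 = approved, 0 = denied
    score = (0.01 * applicant["credit_score"]
             - 4.0 * applicant["debt_to_income"]
             + 0.1 * applicant["employment_length_years"])
    return int(score >= 5.45)


applicant = {"credit_score": 640, "debt_to_income": 0.45, "employment_length_years": 3}
steps = {"credit_score": 10, "debt_to_income": 0.01, "employment_length_years": 1}
bounds = {"credit_score": (300, 850), "debt_to_income": (0.0, 1.0), "employment_length_years": (0, 50)}

for feature, (new_value, n_steps) in single_feature_counterfactuals(
        toy_decision, applicant, steps, bounds).items():
    print(f"{feature}: {applicant[feature]} -> {round(new_value, 2)} ({n_steps} steps)")
```

Each result reads as an actionable statement of the form "the decision would have been different if this feature had been at this value", which is the structure that makes counterfactuals suitable for adverse-action explanations. In practice, features a person cannot change should be excluded from `steps`.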
Questions about the right tool for your model? Contact us at Contact-us@korasafe.ai