Statistical Testing · Independence

Adaptive Multiscale Binary Expansion Tests for Independence

CoBET, dCoBET, and wa-dCoBET — a family of nonparametric tests for independence in high-dimensional data, built on dyadic binary expansion features and SNR-guided adaptive weighting. Benchmarked against HSIC and dCov across Clayton-copula simulation settings.

CoBET dCoBET wa-dCoBET Clayton Copula BH-FDR
▶ Interactive Demo View Code

Overview

This project develops a suite of nonparametric independence tests that exploit the dyadic (binary expansion) structure of the probability integral transform. The tests are designed to be powerful against diverse alternatives — including trigonometric, exponential-quadratic, linear, and log-quadratic dependence structures — while maintaining valid Type I error control.

Key Contributions

The core statistic decomposes cross-dependence via centered indicator functions on dyadic cells. Three test statistics are presented: CoBET (univariate baseline), dCoBET (multivariate extension with identity or J-weighted matrices), and wa-dCoBET (weight-adaptive, using 10-fold SNR blending to select between weight matrices on each coordinate pair).

The pairwise heatmap visualization allows simultaneous testing of all d × d coordinate dependencies with BH-FDR multiple testing correction, providing an interpretable discovery map across high-dimensional random vectors.

Methods

Proposed
CoBET
Copula-based Binary Expansion Test. Univariate independence test using dyadic features of the probability integral transform with plug-in variance normalization.
Proposed
dCoBET
Multivariate extension of CoBET. Supports identity and spectral (J) weight matrices with full U-statistic decomposition T₁ − 2T₂ + T₃.
Proposed
wa-dCoBET
Weight-Adaptive dCoBET. 10-fold cross-validation selects the optimal blend of identity vs. J-weighting per coordinate pair, with pairwise heatmap and BH-FDR output.
Baseline
HSIC
Hilbert-Schmidt Independence Criterion with Gaussian kernels and the median heuristic. Permutation test via hyppo. Baseline comparator.
Baseline
dCov / dCor
Distance covariance-based independence test via hyppo.Dcorr, using permutation calibration. Second baseline comparator across all transforms.

Test Statistic

The full statistic combines three U-statistic terms for an unbiased estimator of the dependence measure:

eq. (1) T = T₁ − 2T₂ + T₃

T₁ = (1/n(n−1)) Σᵢ≠ⱼ Kₐ(i,j) Kᵦ(i,j)
T₂ = (1/n(n−1)(n−2)) [ΣS₂ − ΣS₃ terms for KC, KA, KB]
T₃ = (1/n(n−1)(n−2)(n−3)) [S₁² − 4(S₂−S₃) − 2S₃ terms]

The plug-in variance estimate is:

eq. (2) Var̂(T̃₁) = (2 / n(n−1)) · tr(Wₐ Ŝₐ Wₐ Ŝₐ) · tr(Wᵦ Ŝᵦ Wᵦ Ŝᵦ)

Transform Families (Simulation DGPs)

trigU
Trigonometric nonlinearity via standard normal quantile transform.
X = sin(Φ⁻¹(u))
Y = cos(b·X + v)
expquad
Exponential-quadratic bump at unit amplitude.
X = exp(−Z²)
Y = exp(−b(X−1)² + v)
linear
Simple linear regression model with additive noise.
X = u
Y = b·X + v
logquad
Phase + amplitude modulation with log-quadratic feature compression.
X = log(1+Z²)/(1+log(1+Z²))
Y = cos(b·X+v)·exp(−b(X−0.7)²)

wa-dCoBET: 10-Fold Adaptive Weighting

1
Split into 10 Folds
For each coordinate pair (r, s), randomly partition the n observations into 10 equal folds using a fixed random seed.
2
SNR Comparison per Fold
On each fold, compute the Z-statistic under identity weights (W = I) and under J-weights (W = J). Pick the weight that yields higher SNR: argmax{Z_id, Z_J}.
3
Blend Weights
Let w_id = (# folds choosing I)/10 and w_J = 1 − w_id. Form the blended weight matrix W_blend = w_id · I + w_J · J.
4
Full-Data Test with Blended W
Apply the blended weight matrix to compute T and Var̂(T) on the full n observations. Obtain Z = T / √Var̂(T) for pair (r, s).
5
BH-FDR Correction
Collect all d² p-values P_{rs} = 1 − Φ(Z_{rs}). Apply Benjamini-Hochberg at level q = 0.05. Stars (★) mark discoveries in the pairwise heatmap.

Interactive Pairwise Heatmap

Simulate the wa-dCoBET pairwise Z-statistic heatmap. Adjust parameters and click Run Simulation to generate a new heatmap. Stars mark BH-FDR significant pairs (q = 0.05).

wa-dCoBET · Pairwise Z-Statistic Heatmap
Transform
Signal b b = 0.40
Dims d
Theta θ
Low Z
High Z
Discoveries: — Mean w_id: — Mean w_J: —

Simulation Results

Power comparison at α = 0.05 across methods, transforms, and sample sizes. Data generated under Clayton(θ=2) copula in dimension d = 5. R = 1000 replications.

Method Transform n = 250 n = 500 n = 1000
wa-dCoBETtrigU 0.720.910.99
dCoBET (J)trigU 0.680.880.97
dCoBET (id)trigU 0.550.750.90
HSICtrigU 0.510.720.89
dCovtrigU 0.430.660.84
wa-dCoBETlogquad 0.650.850.97
dCoBET (J)logquad 0.580.790.93
HSIClogquad 0.420.630.82
dCovlogquad 0.380.580.77
wa-dCoBETlinear 0.810.961.00
HSIClinear 0.790.951.00
dCovlinear 0.800.950.99

↑ Green ≥ 0.8 · Yellow 0.5–0.8 · Red < 0.5. Type I error uniformly ≤ 0.06 for all methods.

Code Snippets

wa-dCoBET: 10-Fold Blended Weight

def blended_weight_from_10fold(A_pair, B_pair, W_id, W_J, rng): folds = ten_folds_indices(A_pair.shape[1], rng) picks = [] for fidx in folds: A_f, B_f = A_pair[:, fidx], B_pair[:, fidx] Z_id = Z_for_pair(A_f, B_f, W_id) Z_J = Z_for_pair(A_f, B_f, W_J) picks.append("identity" if Z_id >= Z_J else "J") w_id = sum(p == "identity" for p in picks) / 10.0 W_blend = w_id * W_id + (1 - w_id) * W_J return W_blend, w_id, 1 - w_id

BH-FDR Correction

def bh_fdr_mask(pvals_2d, q=0.05): p = pvals_2d.flatten(); m = p.size order = np.argsort(p); p_sorted = p[order] thresh = q * (np.arange(1, m + 1) / m) below = p_sorted <= thresh reject = np.zeros(m, dtype=bool) if np.any(below): kmax = np.max(np.where(below)) reject[order[:kmax + 1]] = True return reject.reshape(pvals_2d.shape)

Run Pairwise Heatmap

# Reproduce the pairwise heatmap (requires the BET module) out = pairwise_heatmap_new_stat( n=500, d_coords=10, theta=2, K=4, b=0.4, transform_key="logquad", q_fdr=0.05, seed_data=123, unbiased_plugin=True, J_reuse=True ) # out keys: Z, T, Var, p, sig_bh, w_id, w_J