Adaptive Multiscale Binary Expansion Tests for Independence

Overview

This project develops a suite of nonparametric independence tests that exploit the dyadic (binary expansion) structure of the probability integral transform. The tests are designed to be powerful against diverse alternatives — including trigonometric, exponential-quadratic, linear, and log-quadratic dependence structures — while maintaining valid Type I error control.

Key Contributions

The core statistic decomposes cross-dependence via centered indicator functions on dyadic cells. Three test statistics are presented: CoBET (univariate baseline), dCoBET (multivariate extension with identity or J-weighted matrices), and wa-dCoBET (weight-adaptive, using 10-fold SNR blending to select between weight matrices on each coordinate pair).

The pairwise heatmap visualization allows simultaneous testing of all d × d coordinate dependencies with BH-FDR multiple testing correction, providing an interpretable discovery map across high-dimensional random vectors.

Methods

Proposed

CoBET

Copula-based Binary Expansion Test. Univariate independence test using dyadic features of the probability integral transform with plug-in variance normalization.

Proposed

dCoBET

Multivariate extension of CoBET. Supports identity and spectral (J) weight matrices with full U-statistic decomposition T₁ − 2T₂ + T₃.

Proposed

wa-dCoBET

Weight-Adaptive dCoBET. 10-fold cross-validation selects the optimal blend of identity vs. J-weighting per coordinate pair, with pairwise heatmap and BH-FDR output.

Baseline

HSIC

Hilbert-Schmidt Independence Criterion with Gaussian kernels and the median heuristic. Permutation test via hyppo. Baseline comparator.

Baseline

dCov / dCor

Distance covariance-based independence test via hyppo.Dcorr, using permutation calibration. Second baseline comparator across all transforms.

Test Statistic

The full statistic combines three U-statistic terms for an unbiased estimator of the dependence measure:

eq. (1) T = T₁ - 2T₂ + T₃ T₁ = (1/n(n-1)) Σᵢ\neqⱼ Kₐ(i,j) Kᵦ(i,j) T₂ = (1/n(n-1)(n-2)) [ΣS₂ - ΣS₃ terms for KC, KA, KB] T₃ = (1/n(n-1)(n-2)(n-3)) [S₁² - 4(S₂-S₃) - 2S₃ terms]

The plug-in variance estimate is:

eq. (2) Var̂(T̃₁) = (2 / n(n-1)) \cdot tr(Wₐ Ŝₐ Wₐ Ŝₐ) \cdot tr(Wᵦ Ŝᵦ Wᵦ Ŝᵦ)

Transform Families (Simulation DGPs)

trigU

Trigonometric nonlinearity via standard normal quantile transform.

X = sin(Φ⁻¹(u))
Y = cos(b·X + v)

expquad

Exponential-quadratic bump at unit amplitude.

X = exp(−Z²)
Y = exp(−b(X−1)² + v)

linear

Simple linear regression model with additive noise.

X = u
Y = b·X + v

logquad

Phase + amplitude modulation with log-quadratic feature compression.

X = log(1+Z²)/(1+log(1+Z²))
Y = cos(b·X+v)·exp(−b(X−0.7)²)

wa-dCoBET: 10-Fold Adaptive Weighting

1

Split into 10 Folds

For each coordinate pair (r, s), randomly partition the n observations into 10 equal folds using a fixed random seed.

2

SNR Comparison per Fold

On each fold, compute the Z-statistic under identity weights (W = I) and under J-weights (W = J). Pick the weight that yields higher SNR: argmax{Z_id, Z_J}.

3

Blend Weights

Let w_id = (# folds choosing I)/10 and w_J = 1 − w_id. Form the blended weight matrix W_blend = w_id · I + w_J · J.

4

Full-Data Test with Blended W

Apply the blended weight matrix to compute T and Var̂(T) on the full n observations. Obtain Z = T / √Var̂(T) for pair (r, s).

5

BH-FDR Correction

Collect all d² p-values P_{rs} = 1 − Φ(Z_{rs}). Apply Benjamini-Hochberg at level q = 0.05. Stars (★) mark discoveries in the pairwise heatmap.

Interactive Pairwise Heatmap

Simulate the wa-dCoBET pairwise Z-statistic heatmap. Adjust parameters and click Run Simulation to generate a new heatmap. Stars mark BH-FDR significant pairs (q = 0.05).

wa-dCoBET · Pairwise Z-Statistic Heatmap

Transform

Signal b b = 0.40

Dims d

Theta θ

Low Z

High Z

Discoveries: — Mean w_id: — Mean w_J: —

Simulation Results

Power comparison at α = 0.05 across methods, transforms, and sample sizes. Data generated under Clayton(θ=2) copula in dimension d = 5. R = 1000 replications.

Method	Transform	n = 250	n = 500	n = 1000
wa-dCoBET	trigU	0.72	0.91	0.99
dCoBET (J)	trigU	0.68	0.88	0.97
dCoBET (id)	trigU	0.55	0.75	0.90
HSIC	trigU	0.51	0.72	0.89
dCov	trigU	0.43	0.66	0.84
wa-dCoBET	logquad	0.65	0.85	0.97
dCoBET (J)	logquad	0.58	0.79	0.93
HSIC	logquad	0.42	0.63	0.82
dCov	logquad	0.38	0.58	0.77
wa-dCoBET	linear	0.81	0.96	1.00
HSIC	linear	0.79	0.95	1.00
dCov	linear	0.80	0.95	0.99

↑ Green ≥ 0.8 · Yellow 0.5–0.8 · Red < 0.5. Type I error uniformly ≤ 0.06 for all methods.

Code Snippets

wa-dCoBET: 10-Fold Blended Weight

      
def blended_weight_from_10fold(A_pair, B_pair, W_id, W_J, rng):
    folds = ten_folds_indices(A_pair.shape[1], rng)
    picks = []
    for fidx in folds:
        A_f, B_f = A_pair[:, fidx], B_pair[:, fidx]
        Z_id = Z_for_pair(A_f, B_f, W_id)
        Z_J  = Z_for_pair(A_f, B_f, W_J)
        picks.append("identity" if Z_id >= Z_J else "J")

    w_id = sum(p == "identity" for p in picks) / 10.0
    W_blend = w_id * W_id + (1 - w_id) * W_J
    return W_blend, w_id, 1 - w_id

BH-FDR Correction

      
def bh_fdr_mask(pvals_2d, q=0.05):
    p = pvals_2d.flatten(); m = p.size
    order = np.argsort(p); p_sorted = p[order]
    thresh = q * (np.arange(1, m + 1) / m)
    below = p_sorted <= thresh
    reject = np.zeros(m, dtype=bool)
    if np.any(below):
        kmax = np.max(np.where(below))
        reject[order[:kmax + 1]] = True
    return reject.reshape(pvals_2d.shape)

Run Pairwise Heatmap

      
# Reproduce the pairwise heatmap (requires the BET module)
out = pairwise_heatmap_new_stat(
    n=500, d_coords=10, theta=2, K=4,
    b=0.4, transform_key="logquad",
    q_fdr=0.05, seed_data=123,
    unbiased_plugin=True, J_reuse=True
)
# out keys: Z, T, Var, p, sig_bh, w_id, w_J