Adaptive Multiscale Binary Expansion Tests for Independence
CoBET, dCoBET, and wa-dCoBET — a family of nonparametric tests for independence in high-dimensional data, built on dyadic binary expansion features and SNR-guided adaptive weighting. Benchmarked against HSIC and dCov across Clayton-copula simulation settings.
This project develops a suite of nonparametric independence tests that exploit the dyadic (binary expansion) structure of the probability integral transform. The tests are designed to be powerful against diverse alternatives — including trigonometric, exponential-quadratic, linear, and log-quadratic dependence structures — while maintaining valid Type I error control.
Key Contributions
The core statistic decomposes cross-dependence via centered indicator functions on dyadic cells. Three test statistics are presented:
CoBET (univariate baseline), dCoBET (multivariate extension with identity or J-weighted matrices), and wa-dCoBET (weight-adaptive, using 10-fold SNR comparisons to blend the two weight matrices for each coordinate pair).
The pairwise heatmap visualization allows simultaneous testing of all d × d coordinate dependencies with BH-FDR multiple testing correction, providing an interpretable discovery map across high-dimensional random vectors.
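To make "centered indicator functions on dyadic cells" concrete, here is a minimal sketch of how such features could be built. It assumes a rank-based probability integral transform and a fixed expansion depth; the names `pit_ranks` and `dyadic_bits` are hypothetical and the project's actual feature construction may differ.

```python
def pit_ranks(x):
    """Empirical probability integral transform: rank / (n + 1), inside (0, 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def dyadic_bits(u, depth):
    """Centered binary-expansion features of a single u in (0, 1).

    Feature k (k = 1..depth) is the k-th binary digit of u minus 1/2.
    Each digit of a Uniform(0, 1) variable is a fair coin, so subtracting
    1/2 centers the indicator of the dyadic cells where that digit is 1.
    """
    feats = []
    for k in range(1, depth + 1):
        digit = int(u * 2**k) % 2  # k-th binary digit of u
        feats.append(digit - 0.5)
    return feats
```

For example, u = 0.75 has binary expansion 0.110..., so `dyadic_bits(0.75, 3)` gives `[0.5, 0.5, -0.5]`.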
Methods
CoBET (proposed)
Copula-based Binary Expansion Test. Univariate independence test using dyadic features of the probability integral transform with plug-in variance normalization.
dCoBET (proposed)
Multivariate extension of CoBET. Supports identity and spectral (J) weight matrices with the full U-statistic decomposition T₁ − 2T₂ + T₃.
wa-dCoBET (proposed)
Weight-Adaptive dCoBET. 10-fold cross-validation selects the blend of identity vs. J-weighting per coordinate pair, with pairwise heatmap and BH-FDR output.
HSIC (baseline)
Hilbert-Schmidt Independence Criterion with Gaussian kernels and the median heuristic, calibrated by a permutation test via hyppo.
dCov / dCor (baseline)
Distance covariance-based independence test via hyppo.independence.Dcorr, calibrated by permutation. Second baseline comparator across all transforms.
Test Statistic
The full statistic combines three U-statistic terms into an unbiased estimator of the dependence measure:
T = T₁ − 2T₂ + T₃
Transforms
trigU
Trigonometric nonlinearity via standard normal quantile transform.
X = sin(Φ⁻¹(u)),  Y = cos(b·X + v)
expquad
Exponential-quadratic bump at unit amplitude.
X = exp(−Z²),  Y = exp(−b(X−1)² + v)
linear
Simple linear regression model with additive noise.
X = u,  Y = b·X + v
logquad
Phase + amplitude modulation with log-quadratic feature compression.
X = log(1+Z²)/(1+log(1+Z²)),  Y = cos(b·X+v)·exp(−b(X−0.7)²)
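The four transforms can be written as plain functions of (u, v, b), which makes the role of the signal strength b easy to inspect. This is a sketch under stated assumptions: u is taken to be a Uniform(0, 1) coordinate, v a noise term, and Z is assumed to equal Φ⁻¹(u), which the text does not state explicitly. The function names simply mirror the transform labels.

```python
import math
from statistics import NormalDist

_PHI_INV = NormalDist().inv_cdf  # standard normal quantile function


def trigU(u, v, b):
    x = math.sin(_PHI_INV(u))
    return x, math.cos(b * x + v)


def expquad(u, v, b):
    z = _PHI_INV(u)  # assumption: Z = Phi^{-1}(u)
    x = math.exp(-z * z)
    return x, math.exp(-b * (x - 1) ** 2 + v)


def linear(u, v, b):
    return u, b * u + v


def logquad(u, v, b):
    z = _PHI_INV(u)  # assumption: Z = Phi^{-1}(u)
    s = math.log1p(z * z)
    x = s / (1 + s)
    return x, math.cos(b * x + v) * math.exp(-b * (x - 0.7) ** 2)
```

In every case, setting b = 0 leaves Y a function of v alone, so b acts as the knob that moves the model away from the null.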
wa-dCoBET: 10-Fold Adaptive Weighting
1. Split into 10 Folds
For each coordinate pair (r, s), randomly partition the n observations into 10 equal-sized folds using a fixed random seed.
2. SNR Comparison per Fold
On each fold, compute the Z-statistic under identity weights (W = I) and under J-weights (W = J), giving Z_id and Z_J. The fold votes for the weight matrix with the higher SNR, i.e. the one attaining max{Z_id, Z_J}.
3. Blend Weights
Let w_id = (# folds choosing I)/10 and w_J = 1 − w_id. Form the blended weight matrix W_blend = w_id · I + w_J · J.
4. Full-Data Test with Blended W
Apply the blended weight matrix to compute T and Var̂(T) on the full n observations. Obtain Z = T / √Var̂(T) for pair (r, s).
5. BH-FDR Correction
Collect all d² p-values P_{rs} = 1 − Φ(Z_{rs}). Apply the Benjamini-Hochberg procedure at level q = 0.05. Stars (★) mark discoveries in the pairwise heatmap.
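The bookkeeping in the steps above (fold voting, weight blending, one-sided p-values, and the BH step-up rule) can be sketched as follows. The Z-statistic itself is not reproduced here: `z_stat(indices, weight)` is a hypothetical caller-supplied function, ties are broken in favor of identity weights by assumption, and matrices are plain nested lists for illustration.

```python
import random
from statistics import NormalDist


def blend_weight(n, z_stat, n_folds=10, seed=0):
    """Return (w_id, w_J) for one coordinate pair from per-fold votes."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed gives reproducible folds
    folds = [idx[k::n_folds] for k in range(n_folds)]
    # Each fold votes for the weight with the larger Z (ties go to identity).
    wins_id = sum(z_stat(f, "id") >= z_stat(f, "J") for f in folds)
    w_id = wins_id / n_folds
    return w_id, 1.0 - w_id


def blend_matrix(w_id, eye, J):
    """W_blend = w_id * I + (1 - w_id) * J, entry by entry."""
    return [[w_id * a + (1.0 - w_id) * b for a, b in zip(row_i, row_j)]
            for row_i, row_j in zip(eye, J)]


def upper_tail_pvalue(z):
    """One-sided p-value P = 1 - Phi(z)."""
    return 1.0 - NormalDist().cdf(z)


def benjamini_hochberg(pvals, q=0.05):
    """Return booleans marking the hypotheses BH rejects at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k_max = rank  # largest rank whose p-value clears its threshold
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

Note the step-up behavior: with p-values 0.02, 0.024, 0.5, 0.9 and q = 0.05, the smallest p-value misses its own threshold (0.0125), but the second clears 0.025, so both are declared discoveries.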
Interactive Pairwise Heatmap
Simulate the wa-dCoBET pairwise Z-statistic heatmap. Adjust parameters and click Run Simulation to generate a new heatmap. Stars mark BH-FDR significant pairs (q = 0.05).
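[Interactive figure: wa-dCoBET pairwise Z-statistic heatmap. Controls: transform, signal b (shown at b = 0.40), dimension d, and θ. The color scale runs from low Z to high Z; readouts report the number of discoveries and the mean w_id and w_J.]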
Simulation Results
Power comparison at α = 0.05 across methods, transforms, and sample sizes. Data generated under Clayton(θ=2) copula in dimension d = 5. R = 1000 replications.
| Method | Transform | n = 250 | n = 500 | n = 1000 |
|---|---|---|---|---|
| wa-dCoBET | trigU | 0.72 | 0.91 | 0.99 |
| dCoBET (J) | trigU | 0.68 | 0.88 | 0.97 |
| dCoBET (id) | trigU | 0.55 | 0.75 | 0.90 |
| HSIC | trigU | 0.51 | 0.72 | 0.89 |
| dCov | trigU | 0.43 | 0.66 | 0.84 |
| wa-dCoBET | logquad | 0.65 | 0.85 | 0.97 |
| dCoBET (J) | logquad | 0.58 | 0.79 | 0.93 |
| HSIC | logquad | 0.42 | 0.63 | 0.82 |
| dCov | logquad | 0.38 | 0.58 | 0.77 |
| wa-dCoBET | linear | 0.81 | 0.96 | 1.00 |
| HSIC | linear | 0.79 | 0.95 | 1.00 |
| dCov | linear | 0.80 | 0.95 | 0.99 |
Color coding of the power values in the table above: green ≥ 0.8, yellow 0.5–0.8, red < 0.5. Empirical Type I error is at most 0.06 for all methods.
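The text does not say how the Clayton(θ = 2) samples in d = 5 were drawn. One standard option is the Marshall-Olkin frailty construction, sketched below as an illustration rather than as the project's actual simulation code; `sample_clayton` is a hypothetical name.

```python
import random


def sample_clayton(n, d, theta, seed=0):
    """Draw n points from a d-dimensional Clayton(theta) copula, theta > 0.

    Marshall-Olkin construction: with V ~ Gamma(shape=1/theta, scale=1)
    and E_1..E_d i.i.d. Exp(1), set U_j = (1 + E_j / V) ** (-1 / theta).
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        v = rng.gammavariate(1.0 / theta, 1.0)  # shared frailty for the row
        rows.append([(1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta)
                     for _ in range(d)])
    return rows
```

For a Clayton copula, Kendall's tau between any two coordinates equals θ/(θ + 2), so θ = 2 corresponds to tau = 0.5, a fairly strong positive dependence among the uniform coordinates.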