Benchmarks

Run with:

mkdir -p docs/benchmark_reports
uv run pytest -n0 -m benchmark --benchmark-enable \
  --benchmark-json=docs/benchmark_reports/benchmark_latest.json tests/benchmarks/
Rscript scripts/benchmark_realistic_sd_r.R \
  --repeats=3 --max-repeats=1 \
  --output=docs/benchmark_reports/realistic_sd_r_latest.csv
uv run python scripts/update_benchmarks_doc.py docs/benchmark_reports/benchmark_latest.json \
  --realistic-sd-r-csv=docs/benchmark_reports/realistic_sd_r_latest.csv

Results

R baselines were measured with the installed R NNS package, version 13.0.

Python speed vs R is computed as R baseline / Python mean. Values above 1.00x mean Python is faster; values below 1.00x mean Python is slower.
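As a quick check of this ratio convention, here is a minimal sketch. The function name `speed_vs_r` is hypothetical and not part of the repository. The inputs are the rounded means printed in the results table, so other rows may differ in the last digit from the published ratios if those were computed from unrounded means.

```python
def speed_vs_r(python_mean_ms: float, r_baseline_ms: float) -> float:
    """Return R baseline / Python mean; values above 1.0 mean Python is faster."""
    if python_mean_ms <= 0:
        raise ValueError("python_mean_ms must be positive")
    return r_baseline_ms / python_mean_ms


# Rounded values from the "nns boost 50x3" row: Python 197.738 ms, R 3548.000 ms.
faster = speed_vs_r(197.738, 3548.000)

# Rounded values from the "sd efficient set degree 2 scale" row.
slower = speed_vs_r(24.352, 4.400)

print(f"{faster:.2f}x", f"{slower:.2f}x")
```

The first ratio is above 1.00x (Python faster) and the second is below 1.00x (Python slower).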

| Benchmark | Python mean | R baseline | Python speed vs R |
| --- | --- | --- | --- |
| lpm small | 0.011 ms | 0.090 ms | 8.00x |
| pm matrix scale, 10 | 0.085 ms | 3.600 ms | 42.11x |
| pm matrix scale, 50 | 0.487 ms | 7.200 ms | 14.79x |
| pm matrix scale, 100 | 13.430 ms | 21.200 ms | 1.58x |
| sd efficient set degree 2 scale | 24.352 ms | 4.400 ms | 0.18x |
| nns sd cluster 252x50 degree2 | 46.482 ms | 16.600 ms | 0.36x |
| nns sd cluster 252x50 degree2 dendrogram | 44.213 ms | 18.667 ms | 0.42x |
| nns cdf 1000 degree0 | 0.036 ms | 1.100 ms | 30.43x |
| nns cdf 1000 degree2 | 0.113 ms | 1.250 ms | 11.09x |
| nns cdf 500x3 degree1 | 47.558 ms | 58.000 ms | 1.22x |
| nns dep 1000 | 7.409 ms | 8.700 ms | 1.17x |
| nns dep asym 1000 | 7.342 ms | 9.100 ms | 1.24x |
| nns copula 1000 | 0.393 ms | 1.900 ms | 4.84x |
| nns causation 1000 | 15.763 ms | 34.200 ms | 2.17x |
| nns norm 1000x3 | 0.122 ms | 0.620 ms | 5.09x |
| nns distance 1000x3 | 0.712 ms | 0.700 ms | 0.98x |
| nns distance bulk 1000x3 100 | 5.684 ms | 5.950 ms | 1.05x |
| nns distance class 500x3 | 0.602 ms | 0.570 ms | 0.95x |
| nns distance bulk class 500x3 50 | 1.387 ms | 1.900 ms | 1.37x |
| nns diff sin | 1.258 ms | 3.050 ms | 2.42x |
| dy dx numeric eval points | 25.271 ms | 37.350 ms | 1.48x |
| dy_d, scalar wrt=1, eval_points=mean, N=2, T_obs=100 | 87.198 ms | 274.800 ms | 3.15x |
| dy_d, scalar wrt=1, eval_points=median, N=2, T_obs=100 | 86.289 ms | 260.000 ms | 3.01x |
| dy_d, scalar wrt=1, eval_points=last, N=2, T_obs=100 | 93.609 ms | 265.800 ms | 2.84x |
| dy_d, scalar wrt=1, eval_points=obs, N=2, T_obs=100 | 89.956 ms | 279.600 ms | 3.11x |
| dy_d, scalar wrt=1, eval_points=apd, N=2, T_obs=100 | 700.579 ms | 1117.600 ms | 1.60x |
| nns anova 100x2 | 7.713 ms | 3.500 ms | 0.45x |
| nns part 500 | 0.726 ms | 2.450 ms | 3.37x |
| nns reg 500 | 84.662 ms | 30.400 ms | 0.36x |
| nns reg 200 confidence interval | 98.202 ms | 85.200 ms | 0.87x |
| nns reg 200 smooth | 18.054 ms | 43.200 ms | 2.39x |
| nns reg factor predictor 200 | 27.413 ms | 415.400 ms | 15.15x |
| nns reg factor predictor dimred 120 | 59.522 ms | 35.400 ms | 0.59x |
| nns reg class 200 | 19.457 ms | 29.800 ms | 1.53x |
| nns reg class 200 confidence interval | 34.230 ms | 48.200 ms | 1.41x |
| nns reg dimred 200x3 | 44.167 ms | 34.400 ms | 0.78x |
| nns m reg 200x3 | 114.786 ms | 88.600 ms | 0.77x |
| nns m reg 200x3 confidence interval | 102.176 ms | 125.000 ms | 1.22x |
| nns m reg class 200x3 | 54.130 ms | 114.800 ms | 2.12x |
| nns m reg class 200x3 confidence interval | 54.759 ms | 123.000 ms | 2.25x |
| nns stack 100x3 | 300.295 ms | 360.333 ms | 1.20x |
| nns stack factor predictor 60 method1 | 45.079 ms | 207.333 ms | 4.60x |
| nns stack mixed factor predictor 60 method2 | 37.416 ms | 118.000 ms | 3.15x |
| nns stack mixed factor predictor 100x3 method12 | 376.170 ms | 332.333 ms | 0.88x |
| nns stack 100x3 pred int | 180.852 ms | 304.000 ms | 1.68x |
| nns stack 100x3 ts test | 316.159 ms | 285.333 ms | 0.90x |
| nns stack class 100x3 | 131.847 ms | 261.000 ms | 1.98x |
| nns stack class 100x3 pred int | 139.910 ms | 333.333 ms | 2.38x |
| nns stack class balance 150x3 | 194.450 ms | 311.667 ms | 1.60x |
| nns boost 50x3 | 197.738 ms | 3548.000 ms | 17.94x |
| nns boost 50x3 pred int | 144.660 ms | 3844.500 ms | 26.58x |
| nns boost 50x3 ts test | 152.450 ms | 3510.000 ms | 23.02x |
| nns boost stochastic 64x11 | 269.218 ms | 3219.500 ms | 11.96x |
| nns boost stochastic ts test 64x11 | 248.128 ms | 3956.000 ms | 15.94x |
| nns boost factor predictor 50x2 | 157.353 ms | 3738.000 ms | 23.76x |
| nns boost multi factor predictor 50x3 | 202.788 ms | 4429.000 ms | 21.84x |
| nns boost class 50x3 | 176.135 ms | 4333.000 ms | 24.60x |
| nns boost class 50x3 pred int | 263.675 ms | 4183.000 ms | 15.86x |
| nns boost class balance 80x3 | 401.302 ms | 4508.500 ms | 11.23x |
| nns mode continuous 1000 | 0.467 ms | 0.090 ms | 0.19x |
| nns seas 1000 | 0.012 ms | 1.250 ms | 104.57x |
| nns seas 5000 | 0.026 ms | 5.900 ms | 230.05x |
| nns arma 500 auto nonlin | 20.021 ms | 334.333 ms | 16.70x |
| nns arma 500 explicit12 nonlin | 70.419 ms | 350.333 ms | 4.97x |
| nns arma 200 explicit4 lin predint | 169.046 ms | 213.400 ms | 1.26x |
| nns arma 200 auto nonlin predint | 181.461 ms | 373.800 ms | 2.06x |
| nns arma optim 80 small | 35.850 ms | 544.333 ms | 15.18x |
| nns_var, dim_red_method=cor, N=3, T_obs=80, h=3, tau=2 | 834.707 ms | 3778.667 ms | 4.53x |
| nns_var, dim_red_method=NNS.dep, N=3, T_obs=80, h=3, tau=2 | 1572.523 ms | 6381.667 ms | 4.06x |
| nns_var, dim_red_method=NNS.caus, N=3, T_obs=80, h=3, tau=2 | 3394.805 ms | 9718.667 ms | 2.86x |
| nns_var, dim_red_method=all, N=3, T_obs=80, h=3, tau=2 | 4087.817 ms | 9976.333 ms | 2.44x |
| nns meboot 500 reps100 | 71.581 ms | 98.333 ms | 1.37x |
| nns meboot 1000 reps100 | 102.197 ms | 147.667 ms | 1.44x |
| nns mc 500 reps30 by02 | 301.693 ms | 638.000 ms | 2.11x |
| nns mc 500 reps30 by01 | 631.741 ms | 1334.333 ms | 2.11x |
| nns ss 1000 | 0.051 ms | 0.260 ms | 5.08x |
| nns ss 200 ci reps100 | 161.687 ms | 173.667 ms | 1.07x |

Realistic Finance SD North Stars

These benchmarks use the static daily-return fixture at tests/fixtures/finance/sp500_daily_returns_2019_2023.csv. That finance fixture is local-only and not tracked in git; the latest recorded run used 1257 daily return rows and 480 clean return columns after dropping tickers with missing or non-finite returns. Constituent-universe benchmarks exclude SPY and GSPC, leaving 478 columns. Market-relative workflows prefer GSPC and fall back to SPY; tradable-proxy examples use SPY.
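The column handling described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the repository's loader: the function names, the dict-of-lists representation, and the toy tickers `AAA`, `BBB`, and `CCC` are all hypothetical, and the real fixture's CSV layout is not shown in this document.

```python
import math
from typing import Optional

# Benchmark columns in the preference order the text describes: GSPC first, then SPY.
BENCHMARK_COLUMNS = ("GSPC", "SPY")

Returns = dict[str, list[Optional[float]]]


def clean_return_columns(columns: Returns) -> Returns:
    """Keep only columns whose returns are all present and finite."""
    return {
        name: values
        for name, values in columns.items()
        if all(v is not None and math.isfinite(v) for v in values)
    }


def constituent_universe(columns: Returns) -> Returns:
    """Drop the benchmark columns, leaving only constituents."""
    return {n: v for n, v in columns.items() if n not in BENCHMARK_COLUMNS}


def market_benchmark(columns: Returns) -> str:
    """Prefer GSPC and fall back to SPY."""
    for name in BENCHMARK_COLUMNS:
        if name in columns:
            return name
    raise KeyError("neither GSPC nor SPY is available")


# Toy data standing in for the local-only fixture.
raw = {
    "AAA": [0.010, -0.020, 0.003],
    "BBB": [0.000, float("nan"), 0.010],  # dropped: non-finite return
    "CCC": [0.020, None, 0.010],          # dropped: missing return
    "SPY": [0.005, -0.010, 0.002],
}
clean = clean_return_columns(raw)
universe = constituent_universe(clean)
benchmark = market_benchmark(clean)  # no GSPC in the toy data, so SPY is used
```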

Benchmark-column sanity metadata:

  • SPY/GSPC correlation: 0.998873
  • Mean absolute daily return difference: 0.000372
  • Max absolute daily return difference: 0.010417
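For readers who want to reproduce this kind of sanity check on their own data, a minimal sketch follows. It assumes the correlation is a plain Pearson correlation of the two daily return series, which the document does not state explicitly. The function names and toy values are hypothetical.

```python
import math


def pearson_correlation(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def benchmark_sanity(spy: list[float], gspc: list[float]) -> dict[str, float]:
    """Correlation plus mean and max absolute daily return difference."""
    diffs = [abs(a - b) for a, b in zip(spy, gspc)]
    return {
        "correlation": pearson_correlation(spy, gspc),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "max_abs_diff": max(diffs),
    }


# Toy daily returns, not taken from the fixture.
spy = [0.010, -0.020, 0.005, 0.000]
gspc = [0.011, -0.019, 0.004, 0.000]
metrics = benchmark_sanity(spy, gspc)
```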

Python timings come from pytest-benchmark. R timings come from scripts/benchmark_realistic_sd_r.R when --realistic-sd-r-csv is supplied to the updater (scripts/update_benchmarks_doc.py). Rows whose R source is "manual placeholder" use the last manually recorded R baseline, so Python/R comparisons remain visible when R has not been rerun.
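To inspect the Python timings directly, the pytest-benchmark JSON report can be read with the standard library. The sketch below is not the logic of scripts/update_benchmarks_doc.py, which is not shown here. It relies on the report layout that pytest-benchmark writes (a top-level `benchmarks` list whose entries carry `name` and `stats.mean`, with times in seconds). The helper name `python_means_ms` and the sample benchmark name are hypothetical.

```python
import json


def python_means_ms(report: dict) -> dict[str, float]:
    """Map each benchmark name to its mean runtime in milliseconds."""
    return {
        entry["name"]: entry["stats"]["mean"] * 1000.0  # seconds -> milliseconds
        for entry in report["benchmarks"]
    }


# In-memory stand-in for a real report; a real run would instead use
# json.load() on docs/benchmark_reports/benchmark_latest.json.
sample = json.loads(
    '{"benchmarks": [{"name": "test_lpm_small", "stats": {"mean": 1.1e-05}}]}'
)
means = python_means_ms(sample)
print({name: f"{ms:.3f} ms" for name, ms in means.items()})
```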

Run only the realistic Python benchmarks with:

PYNNS_OFFLINE=1 uv run pytest -q -n0 -m benchmark --benchmark-enable \
  --benchmark-json=docs/benchmark_reports/realistic_sd_python_latest.json \
  tests/benchmarks/test_stochastic_dominance_realistic.py \
  tests/benchmarks/test_finance_sd_rolling.py \
  tests/benchmarks/test_finance_partial_moment_workflows.py

Run matching R baselines with:

Rscript scripts/benchmark_realistic_sd_r.R \
  --repeats=3 --max-repeats=1 \
  --output=docs/benchmark_reports/realistic_sd_r_latest.csv

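Python/R slowdown is computed as Python mean / R mean, the inverse of the "Python speed vs R" ratio used in the Results table. Values above 1.00x mean Python is slower than R; values below 1.00x mean Python is faster.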
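Because some rows in these tables report seconds rather than milliseconds, both timings must be converted to a common unit before dividing. A minimal sketch, with hypothetical helper names and using the rounded values printed in the tables:

```python
# Conversion factors to milliseconds for the two units that appear in the tables.
_UNIT_TO_MS = {"ms": 1.0, "s": 1000.0}


def to_ms(text: str) -> float:
    """Parse a timing such as '30.235 ms' or '1.587 s' into milliseconds."""
    value, unit = text.split()
    return float(value) * _UNIT_TO_MS[unit]


def python_r_slowdown(python_mean: str, r_mean: str) -> float:
    """Return Python mean / R mean; values above 1.0 mean Python is slower."""
    return to_ms(python_mean) / to_ms(r_mean)


# nns_sd_cluster, degree=1, N=50, T_obs=252
cluster = python_r_slowdown("32.096 ms", "3.000 ms")

# Partial-moment covariance workflow, 1257d-degree1-mean (mixed units)
covariance = python_r_slowdown("30.235 ms", "1.587 s")

print(f"{cluster:.2f}x", f"{covariance:.2f}x")
```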

| Realistic benchmark | Python mean | R mean | R source | Python/R slowdown |
| --- | --- | --- | --- | --- |
| nns_sd_cluster, degree=1, N=50, T_obs=252 | 32.096 ms | 3.000 ms | measured | 10.70x |
| sd_efficient_set, degree=1, N=50, T_obs=252 | 27.599 ms | 2.667 ms | measured | 10.35x |
| nns_sd_cluster, degree=2, N=50, T_obs=252 | 27.428 ms | 6.000 ms | measured | 4.57x |
| sd_efficient_set, degree=2, N=50, T_obs=252 | 17.809 ms | 2.000 ms | measured | 8.90x |
| nns_sd_cluster, degree=1, N=100, T_obs=252 | 4.535 ms | 6.333 ms | measured | 0.72x |
| sd_efficient_set, degree=1, N=100, T_obs=252 | 9.349 ms | 6.000 ms | measured | 1.56x |
| nns_sd_cluster, degree=2, N=100, T_obs=252 | 19.124 ms | 19.000 ms | measured | 1.01x |
| sd_efficient_set, degree=2, N=100, T_obs=252 | 9.151 ms | 4.667 ms | measured | 1.96x |
| nns_sd_cluster, degree=2, N=250, T_obs=252 | 76.885 ms | 59.333 ms | measured | 1.30x |
| sd_efficient_set, degree=2, N=250, T_obs=252 | 31.741 ms | 15.000 ms | measured | 2.12x |
| nns_sd_cluster, degree=2, N=478, T_obs=252 | 326.310 ms | 194.333 ms | measured | 1.68x |
| sd_efficient_set, degree=2, N=478, T_obs=252 | 106.190 ms | 37.000 ms | measured | 2.87x |
| sd_efficient_set, degree=2, N=100, T_obs=1257 | 32.352 ms | 21.333 ms | measured | 1.52x |
| nns_sd_cluster, degree=2, N=250, T_obs=1257 | 296.694 ms | 209.333 ms | measured | 1.42x |
| sd_efficient_set, degree=2, N=250, T_obs=1257 | 192.960 ms | 70.667 ms | measured | 2.73x |
| nns_sd_cluster, degree=2, N=478, T_obs=1257 | 992.481 ms | 663.000 ms | measured | 1.50x |
| sd_efficient_set, degree=2, N=478, T_obs=1257 | 979.089 ms | 194.000 ms | measured | 5.05x |

Additional realistic finance workflow benchmarks:

| Benchmark | Python mean | R mean | R source | Python/R slowdown | Summary metadata |
| --- | --- | --- | --- | --- | --- |
| Lower/upper constituent dispersion ratio, N=100, T_obs=252 | 0.138 ms | n/a | n/a | n/a | n/a |
| Magnificent Seven downside stress components with SPY | 0.406 ms | n/a | n/a | n/a | n/a |
| Magnificent Seven market-downside stress components | 10.824 ms | 47.000 ms | measured | 0.23x | downside obs: 172; stress R2: 0.7852; SPY/GSPC corr: 0.9989; mean abs diff: 0.0003716; max abs diff: 0.01042 |
| Market-relative daily dispersion, full fixture | 11.549 ms | 37.667 ms | measured | 0.31x | signal len: 1257; finite: 1257; next-day corr: 0.06635; SPY/GSPC corr: 0.9989; mean abs diff: 0.0003716; max abs diff: 0.01042 |
| Market-relative rolling dispersion signal, 252d | 11.956 ms | 37.667 ms | measured | 0.32x | signal len: 1006; finite: 1006; next-day corr: 0.03746; SPY/GSPC corr: 0.9989; mean abs diff: 0.0003716; max abs diff: 0.01042 |
| Market-relative rolling dispersion signal, 63d | 9.073 ms | 39.333 ms | measured | 0.23x | signal len: 1195; finite: 1195; next-day corr: 0.02139; SPY/GSPC corr: 0.9989; mean abs diff: 0.0003716; max abs diff: 0.01042 |
| Partial-moment covariance workflow, 1257d-degree1-mean | 30.235 ms | 1.587 s | measured | 0.02x | rows: 1257; cols: 478; matrix N: 478 |
| Partial-moment covariance workflow, 252d-degree1-mean | 17.619 ms | 296.333 ms | measured | 0.06x | rows: 252; cols: 478; matrix N: 478 |
| Partial-moment covariance workflow, 252d-degree2-zero | 25.084 ms | 302.000 ms | measured | 0.08x | rows: 252; cols: 478; matrix N: 478 |
| Rolling SD cluster, 252-day monthly, degree=2, n100 | 785.532 ms | 787.000 ms | measured | 1.00x | windows: 48; avg set: 14.29; avg clusters: 8.375 |
| Rolling SD cluster, 252-day monthly, degree=2, nmax | 10.759 s | 9.873 s | measured | 1.09x | windows: 48; avg set: 29.48; avg clusters: 13.65 |
| Rolling SD cluster, 252-day quarterly, degree=1 | 1.809 s | 1.226 s | measured | 1.48x | windows: 16; avg set: 468.5; avg clusters: 1.812 |
| Rolling SD cluster, 756-day quarterly, degree=2 | 4.327 s | 4.182 s | measured | 1.03x | windows: 9; avg set: 33.11; avg clusters: 11.89 |
| Rolling SD efficient set, 252-day monthly, degree=2, n100 | 296.756 ms | 259.667 ms | measured | 1.14x | windows: 48; avg set: 14.29; avg turnover: 0.4598 |
| Rolling SD efficient set, 252-day monthly, degree=2, nmax | 3.701 s | 2.400 s | measured | 1.54x | windows: 48; avg set: 29.48; avg turnover: 0.5228 |
| Rolling SD efficient set, 252-day quarterly, degree 1 vs 2 | 3.100 s | 1.931 s | measured | 1.61x | windows: 16; avg d1 set: 468.5; avg d2 set: 29.56 |
| Rolling SD efficient set, 252-day quarterly, degree=1 | 1.722 s | 1.259 s | measured | 1.37x | windows: 16; avg set: 468.5; avg turnover: 0.03102 |

Interpretation:

  • Large degree-1 discrete SD uses an exact order-statistic dominance matrix: one empirical sample FSD-dominates another iff every sorted order statistic is at least as large, with at least one strict improvement.
  • Guarded prefix-pair evaluation skips curve work for pairs that the minimum, mean, or identical-column guards already rule out as impossible, and the standalone efficient-set path only checks already-kept candidates for degree 2/3 and degree-1 continuous cases.
  • The implementation deliberately follows R's C++ SD algorithmic structure: sorted columns, prefix sums, pair-threshold dominance checks, exact guards, and no tolerance-based shortcuts.
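  • Full-fixture NNS Python runs are feasible for research iteration, but R's C++ SD core remains materially faster on the largest cluster cases (Python is about 1.5x to 1.7x slower for nns_sd_cluster at N=478), and the gap reaches 5.05x for sd_efficient_set at N=478, T_obs=1257.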
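
The degree-1 order-statistic rule from the first bullet, together with cheap minimum, mean, and identical-sample guards, can be sketched as follows. This is an illustrative all-pairs version for equal-length samples. It is neither the repository's implementation nor R's C++ code, and the function names and toy data are hypothetical.

```python
from statistics import fmean


def fsd_dominates(a: list[float], b: list[float]) -> bool:
    """True if sample a first-degree dominates sample b (equal-length samples)."""
    if len(a) != len(b):
        raise ValueError("samples must have equal length")
    # Cheap exact guards: a dominator cannot have a smaller minimum or mean.
    if min(a) < min(b) or fmean(a) < fmean(b):
        return False
    sorted_a, sorted_b = sorted(a), sorted(b)
    # Identical samples never dominate each other (no strict improvement).
    if sorted_a == sorted_b:
        return False
    # Every order statistic of a must be at least as large as that of b.
    return all(x >= y for x, y in zip(sorted_a, sorted_b))


def fsd_efficient_set(columns: dict[str, list[float]]) -> list[str]:
    """Names of columns that no other column FSD-dominates."""
    names = list(columns)
    return [
        n
        for n in names
        if not any(fsd_dominates(columns[m], columns[n]) for m in names if m != n)
    ]


# Toy return samples, not taken from the fixture.
columns = {
    "A": [0.01, 0.02, 0.03],
    "B": [0.00, 0.02, 0.03],   # dominated by A
    "C": [-0.01, 0.02, 0.05],  # lower minimum but higher maximum than A
}
efficient = fsd_efficient_set(columns)
```

In the toy data, `A` dominates `B`, while `A` and `C` are mutually non-dominated, so the efficient set keeps `A` and `C`. The guards return early without sorting, and no tolerance is applied anywhere in the comparison.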