Benchmarks
Run with:
```bash
mkdir -p docs/benchmark_reports
uv run pytest -n0 -m benchmark --benchmark-enable \
  --benchmark-json=docs/benchmark_reports/benchmark_latest.json tests/benchmarks/
Rscript scripts/benchmark_realistic_sd_r.R \
  --repeats=3 --max-repeats=1 \
  --output=docs/benchmark_reports/realistic_sd_r_latest.csv
uv run python scripts/update_benchmarks_doc.py docs/benchmark_reports/benchmark_latest.json \
  --realistic-sd-r-csv=docs/benchmark_reports/realistic_sd_r_latest.csv
```
Results
R baselines use the installed R package NNS, version 13.0.
Python speed vs R is computed as R baseline / Python mean. Values above 1.00x mean Python is faster; values below 1.00x mean Python is slower.
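The ratio in the last column is plain arithmetic. Below is a minimal sketch that recomputes it from a row's two timings. The helper names are hypothetical and do not come from the project's scripts. The printed means are rounded to 0.001 ms, so recomputing a very fast row from the displayed values may not reproduce the published ratio exactly.

```python
def speed_vs_r(python_mean_ms: float, r_baseline_ms: float) -> float:
    """R baseline / Python mean: above 1.0 means Python is faster than R."""
    if python_mean_ms <= 0:
        raise ValueError("Python mean must be positive")
    return r_baseline_ms / python_mean_ms


def format_ratio(ratio: float) -> str:
    """Render a ratio with two decimals and an 'x' suffix, as in the table."""
    return f"{ratio:.2f}x"


# "nns boost 50x3": 197.738 ms in Python vs a 3548.000 ms R baseline.
print(format_ratio(speed_vs_r(197.738, 3548.000)))  # 17.94x
# "nns mode continuous 1000": 0.467 ms vs 0.090 ms, so Python is slower.
print(format_ratio(speed_vs_r(0.467, 0.090)))       # 0.19x
```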
| Benchmark | Python mean | R baseline | Python speed vs R |
|---|---|---|---|
| lpm small | 0.011 ms | 0.090 ms | 8.00x |
| pm matrix scale, 10 | 0.085 ms | 3.600 ms | 42.11x |
| pm matrix scale, 50 | 0.487 ms | 7.200 ms | 14.79x |
| pm matrix scale, 100 | 13.430 ms | 21.200 ms | 1.58x |
| sd efficient set degree 2 scale | 24.352 ms | 4.400 ms | 0.18x |
| nns sd cluster 252x50 degree2 | 46.482 ms | 16.600 ms | 0.36x |
| nns sd cluster 252x50 degree2 dendrogram | 44.213 ms | 18.667 ms | 0.42x |
| nns cdf 1000 degree0 | 0.036 ms | 1.100 ms | 30.43x |
| nns cdf 1000 degree2 | 0.113 ms | 1.250 ms | 11.09x |
| nns cdf 500x3 degree1 | 47.558 ms | 58.000 ms | 1.22x |
| nns dep 1000 | 7.409 ms | 8.700 ms | 1.17x |
| nns dep asym 1000 | 7.342 ms | 9.100 ms | 1.24x |
| nns copula 1000 | 0.393 ms | 1.900 ms | 4.84x |
| nns causation 1000 | 15.763 ms | 34.200 ms | 2.17x |
| nns norm 1000x3 | 0.122 ms | 0.620 ms | 5.09x |
| nns distance 1000x3 | 0.712 ms | 0.700 ms | 0.98x |
| nns distance bulk 1000x3 100 | 5.684 ms | 5.950 ms | 1.05x |
| nns distance class 500x3 | 0.602 ms | 0.570 ms | 0.95x |
| nns distance bulk class 500x3 50 | 1.387 ms | 1.900 ms | 1.37x |
| nns diff sin | 1.258 ms | 3.050 ms | 2.42x |
| dy dx numeric eval points | 25.271 ms | 37.350 ms | 1.48x |
| dy_d, scalar wrt=1, eval_points=mean, N=2, T_obs=100 | 87.198 ms | 274.800 ms | 3.15x |
| dy_d, scalar wrt=1, eval_points=median, N=2, T_obs=100 | 86.289 ms | 260.000 ms | 3.01x |
| dy_d, scalar wrt=1, eval_points=last, N=2, T_obs=100 | 93.609 ms | 265.800 ms | 2.84x |
| dy_d, scalar wrt=1, eval_points=obs, N=2, T_obs=100 | 89.956 ms | 279.600 ms | 3.11x |
| dy_d, scalar wrt=1, eval_points=apd, N=2, T_obs=100 | 700.579 ms | 1117.600 ms | 1.60x |
| nns anova 100x2 | 7.713 ms | 3.500 ms | 0.45x |
| nns part 500 | 0.726 ms | 2.450 ms | 3.37x |
| nns reg 500 | 84.662 ms | 30.400 ms | 0.36x |
| nns reg 200 confidence interval | 98.202 ms | 85.200 ms | 0.87x |
| nns reg 200 smooth | 18.054 ms | 43.200 ms | 2.39x |
| nns reg factor predictor 200 | 27.413 ms | 415.400 ms | 15.15x |
| nns reg factor predictor dimred 120 | 59.522 ms | 35.400 ms | 0.59x |
| nns reg class 200 | 19.457 ms | 29.800 ms | 1.53x |
| nns reg class 200 confidence interval | 34.230 ms | 48.200 ms | 1.41x |
| nns reg dimred 200x3 | 44.167 ms | 34.400 ms | 0.78x |
| nns m reg 200x3 | 114.786 ms | 88.600 ms | 0.77x |
| nns m reg 200x3 confidence interval | 102.176 ms | 125.000 ms | 1.22x |
| nns m reg class 200x3 | 54.130 ms | 114.800 ms | 2.12x |
| nns m reg class 200x3 confidence interval | 54.759 ms | 123.000 ms | 2.25x |
| nns stack 100x3 | 300.295 ms | 360.333 ms | 1.20x |
| nns stack factor predictor 60 method1 | 45.079 ms | 207.333 ms | 4.60x |
| nns stack mixed factor predictor 60 method2 | 37.416 ms | 118.000 ms | 3.15x |
| nns stack mixed factor predictor 100x3 method12 | 376.170 ms | 332.333 ms | 0.88x |
| nns stack 100x3 pred int | 180.852 ms | 304.000 ms | 1.68x |
| nns stack 100x3 ts test | 316.159 ms | 285.333 ms | 0.90x |
| nns stack class 100x3 | 131.847 ms | 261.000 ms | 1.98x |
| nns stack class 100x3 pred int | 139.910 ms | 333.333 ms | 2.38x |
| nns stack class balance 150x3 | 194.450 ms | 311.667 ms | 1.60x |
| nns boost 50x3 | 197.738 ms | 3548.000 ms | 17.94x |
| nns boost 50x3 pred int | 144.660 ms | 3844.500 ms | 26.58x |
| nns boost 50x3 ts test | 152.450 ms | 3510.000 ms | 23.02x |
| nns boost stochastic 64x11 | 269.218 ms | 3219.500 ms | 11.96x |
| nns boost stochastic ts test 64x11 | 248.128 ms | 3956.000 ms | 15.94x |
| nns boost factor predictor 50x2 | 157.353 ms | 3738.000 ms | 23.76x |
| nns boost multi factor predictor 50x3 | 202.788 ms | 4429.000 ms | 21.84x |
| nns boost class 50x3 | 176.135 ms | 4333.000 ms | 24.60x |
| nns boost class 50x3 pred int | 263.675 ms | 4183.000 ms | 15.86x |
| nns boost class balance 80x3 | 401.302 ms | 4508.500 ms | 11.23x |
| nns mode continuous 1000 | 0.467 ms | 0.090 ms | 0.19x |
| nns seas 1000 | 0.012 ms | 1.250 ms | 104.57x |
| nns seas 5000 | 0.026 ms | 5.900 ms | 230.05x |
| nns arma 500 auto nonlin | 20.021 ms | 334.333 ms | 16.70x |
| nns arma 500 explicit12 nonlin | 70.419 ms | 350.333 ms | 4.97x |
| nns arma 200 explicit4 lin predint | 169.046 ms | 213.400 ms | 1.26x |
| nns arma 200 auto nonlin predint | 181.461 ms | 373.800 ms | 2.06x |
| nns arma optim 80 small | 35.850 ms | 544.333 ms | 15.18x |
| nns_var, dim_red_method=cor, N=3, T_obs=80, h=3, tau=2 | 834.707 ms | 3778.667 ms | 4.53x |
| nns_var, dim_red_method=NNS.dep, N=3, T_obs=80, h=3, tau=2 | 1572.523 ms | 6381.667 ms | 4.06x |
| nns_var, dim_red_method=NNS.caus, N=3, T_obs=80, h=3, tau=2 | 3394.805 ms | 9718.667 ms | 2.86x |
| nns_var, dim_red_method=all, N=3, T_obs=80, h=3, tau=2 | 4087.817 ms | 9976.333 ms | 2.44x |
| nns meboot 500 reps100 | 71.581 ms | 98.333 ms | 1.37x |
| nns meboot 1000 reps100 | 102.197 ms | 147.667 ms | 1.44x |
| nns mc 500 reps30 by02 | 301.693 ms | 638.000 ms | 2.11x |
| nns mc 500 reps30 by01 | 631.741 ms | 1334.333 ms | 2.11x |
| nns ss 1000 | 0.051 ms | 0.260 ms | 5.08x |
| nns ss 200 ci reps100 | 161.687 ms | 173.667 ms | 1.07x |
Realistic Finance SD North Stars
These benchmarks use the static daily-return fixture at
`tests/fixtures/finance/sp500_daily_returns_2019_2023.csv`. The fixture is
local-only and not tracked in git. The latest recorded run used 1257 daily
return rows and 480 clean return columns, after dropping tickers with missing or
non-finite returns. Constituent-universe benchmarks exclude SPY and GSPC, leaving
478 columns. Market-relative workflows prefer GSPC and fall back to SPY;
tradable-proxy examples use SPY.
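The column rules above can be sketched in a few lines. This is a minimal illustration, assuming the fixture has already been loaded into a dict that maps each ticker to its list of daily returns, with `None` marking a missing value. The function names and the toy data are hypothetical and are not the project's loader.

```python
from __future__ import annotations

import math

BENCHMARK_TICKERS = ("SPY", "GSPC")  # excluded from constituent-universe runs


def clean_columns(returns: dict[str, list[float | None]]) -> dict[str, list[float]]:
    """Keep only tickers whose daily returns are all present and finite."""
    return {
        ticker: series
        for ticker, series in returns.items()
        if all(v is not None and math.isfinite(v) for v in series)
    }


def constituent_universe(clean: dict[str, list[float]]) -> list[str]:
    """Constituent-universe columns: every clean ticker except SPY and GSPC."""
    return [t for t in clean if t not in BENCHMARK_TICKERS]


def market_column(clean: dict[str, list[float]]) -> str:
    """Market-relative workflows prefer GSPC and fall back to SPY."""
    for ticker in ("GSPC", "SPY"):
        if ticker in clean:
            return ticker
    raise KeyError("neither GSPC nor SPY survived cleaning")


# Hypothetical toy input, not the real fixture.
raw = {
    "AAPL": [0.010, -0.020, 0.005],
    "XYZ": [0.010, None, 0.000],           # missing value -> dropped
    "ABC": [0.020, float("nan"), 0.010],   # non-finite value -> dropped
    "SPY": [0.004, -0.010, 0.002],
}
clean = clean_columns(raw)
print(sorted(clean))                # ['AAPL', 'SPY']
print(constituent_universe(clean))  # ['AAPL']
print(market_column(clean))         # SPY (no GSPC column, so fall back)
```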
Sanity checks on the two benchmark columns (SPY vs GSPC daily returns):
- SPY/GSPC correlation: 0.998873
- Mean absolute daily return difference: 0.000372
- Max absolute daily return difference: 0.010417
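The three sanity figures can be reproduced with a short helper. This sketch assumes that all three compare the aligned SPY and GSPC daily-return series and that "correlation" means Pearson correlation. The function name is hypothetical.

```python
import math


def benchmark_sanity(spy: list[float], gspc: list[float]) -> dict[str, float]:
    """Pearson correlation plus mean/max absolute daily-return difference."""
    if len(spy) != len(gspc) or len(spy) < 2:
        raise ValueError("series must be aligned and have at least two observations")
    n = len(spy)
    mean_s = sum(spy) / n
    mean_g = sum(gspc) / n
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(spy, gspc))
    var_s = sum((s - mean_s) ** 2 for s in spy)
    var_g = sum((g - mean_g) ** 2 for g in gspc)
    if var_s == 0 or var_g == 0:
        raise ValueError("correlation is undefined for a constant series")
    diffs = [abs(s - g) for s, g in zip(spy, gspc)]
    return {
        "correlation": cov / math.sqrt(var_s * var_g),  # the 1/n factors cancel
        "mean_abs_diff": sum(diffs) / n,
        "max_abs_diff": max(diffs),
    }


# Hypothetical series: GSPC offset from SPY by a constant 0.1% per day.
spy = [0.010, -0.020, 0.005, 0.000]
gspc = [r + 0.001 for r in spy]
m = benchmark_sanity(spy, gspc)
```

A constant offset leaves the correlation at 1 but still shows up in both absolute-difference metrics. That is why the fixture reports all three.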
Python timings come from pytest-benchmark. R timings come from
`scripts/benchmark_realistic_sd_r.R` when `--realistic-sd-r-csv` is supplied to
the updater (`scripts/update_benchmarks_doc.py`). Rows whose R source is
`manual placeholder` reuse the last manually recorded R baseline, so Python/R
comparisons remain visible when R has not been rerun.
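The R-source fallback can be sketched as a simple lookup. This assumes the updater holds the measured R CSV as a name-to-mean map and keeps the manual baselines in a second map. The function, the maps, and the benchmark names and values in the example are all hypothetical; this is not the actual code of `update_benchmarks_doc.py`.

```python
from typing import Dict, Optional, Tuple


def r_baseline(name: str,
               measured_ms: Dict[str, float],
               manual_ms: Dict[str, float]) -> Tuple[Optional[float], str]:
    """Return (R mean in ms, R source label) for one benchmark row."""
    if name in measured_ms:      # the R script was rerun and produced this row
        return measured_ms[name], "measured"
    if name in manual_ms:        # fall back to the last manually recorded value
        return manual_ms[name], "manual placeholder"
    return None, "n/a"           # no R counterpart at all


measured = {"bench_a": 19.0}                    # hypothetical values
manual = {"bench_a": 18.5, "bench_b": 4.7}
print(r_baseline("bench_a", measured, manual))  # (19.0, 'measured')
print(r_baseline("bench_b", measured, manual))  # (4.7, 'manual placeholder')
print(r_baseline("bench_c", measured, manual))  # (None, 'n/a')
```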
Run only the realistic Python benchmarks with:
```bash
PYNNS_OFFLINE=1 uv run pytest -q -n0 -m benchmark --benchmark-enable \
  --benchmark-json=docs/benchmark_reports/realistic_sd_python_latest.json \
  tests/benchmarks/test_stochastic_dominance_realistic.py \
  tests/benchmarks/test_finance_sd_rolling.py \
  tests/benchmarks/test_finance_partial_moment_workflows.py
```
Run matching R baselines with:
```bash
Rscript scripts/benchmark_realistic_sd_r.R \
  --repeats=3 --max-repeats=1 \
  --output=docs/benchmark_reports/realistic_sd_r_latest.csv
```
Python/R slowdown is computed as Python mean / R mean. Values above 1.00x
mean Python is slower than R.
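This table uses the opposite convention to the Results table's "Python speed vs R" column, so a slowdown is the reciprocal of a speed ratio. The minimal sketch below recomputes one slowdown and converts one Results-table row into the same convention. The function name is hypothetical.

```python
def slowdown_vs_r(python_mean_ms: float, r_mean_ms: float) -> float:
    """Python mean / R mean: above 1.0 means Python is slower than R."""
    if r_mean_ms <= 0:
        raise ValueError("R mean must be positive")
    return python_mean_ms / r_mean_ms


# Realistic row "sd_efficient_set, degree=2, N=478, T_obs=1257".
print(f"{slowdown_vs_r(979.089, 194.000):.2f}x")  # 5.05x

# Results row "sd efficient set degree 2 scale" (24.352 ms vs 4.400 ms) is
# reported as 0.18x speed; expressed as a slowdown it is the reciprocal:
print(f"{slowdown_vs_r(24.352, 4.400):.2f}x")     # 5.53x
# Inverting the already-rounded 0.18x gives a slightly different value,
# so convert from the means rather than from the printed ratio.
print(f"{1 / 0.18:.2f}x")                         # 5.56x
```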
| Realistic benchmark | Python mean | R mean | R source | Python/R slowdown |
|---|---|---|---|---|
| nns_sd_cluster, degree=1, N=50, T_obs=252 | 32.096 ms | 3.000 ms | measured | 10.70x |
| sd_efficient_set, degree=1, N=50, T_obs=252 | 27.599 ms | 2.667 ms | measured | 10.35x |
| nns_sd_cluster, degree=2, N=50, T_obs=252 | 27.428 ms | 6.000 ms | measured | 4.57x |
| sd_efficient_set, degree=2, N=50, T_obs=252 | 17.809 ms | 2.000 ms | measured | 8.90x |
| nns_sd_cluster, degree=1, N=100, T_obs=252 | 4.535 ms | 6.333 ms | measured | 0.72x |
| sd_efficient_set, degree=1, N=100, T_obs=252 | 9.349 ms | 6.000 ms | measured | 1.56x |
| nns_sd_cluster, degree=2, N=100, T_obs=252 | 19.124 ms | 19.000 ms | measured | 1.01x |
| sd_efficient_set, degree=2, N=100, T_obs=252 | 9.151 ms | 4.667 ms | measured | 1.96x |
| nns_sd_cluster, degree=2, N=250, T_obs=252 | 76.885 ms | 59.333 ms | measured | 1.30x |
| sd_efficient_set, degree=2, N=250, T_obs=252 | 31.741 ms | 15.000 ms | measured | 2.12x |
| nns_sd_cluster, degree=2, N=478, T_obs=252 | 326.310 ms | 194.333 ms | measured | 1.68x |
| sd_efficient_set, degree=2, N=478, T_obs=252 | 106.190 ms | 37.000 ms | measured | 2.87x |
| sd_efficient_set, degree=2, N=100, T_obs=1257 | 32.352 ms | 21.333 ms | measured | 1.52x |
| nns_sd_cluster, degree=2, N=250, T_obs=1257 | 296.694 ms | 209.333 ms | measured | 1.42x |
| sd_efficient_set, degree=2, N=250, T_obs=1257 | 192.960 ms | 70.667 ms | measured | 2.73x |
| nns_sd_cluster, degree=2, N=478, T_obs=1257 | 992.481 ms | 663.000 ms | measured | 1.50x |
| sd_efficient_set, degree=2, N=478, T_obs=1257 | 979.089 ms | 194.000 ms | measured | 5.05x |
Additional realistic finance workflow benchmarks:
| Benchmark | Python mean | R mean | R source | Python/R slowdown | Summary metadata |
|---|---|---|---|---|---|
| Lower/upper constituent dispersion ratio, N=100, T_obs=252 | 0.138 ms | n/a | n/a | n/a | n/a |
| Magnificent Seven downside stress components with SPY | 0.406 ms | n/a | n/a | n/a | n/a |
| Magnificent Seven market-downside stress components | 10.824 ms | 47.000 ms | measured | 0.23x | downside obs: 172; stress R2: 0.7852; SPY/GSPC corr: 0.9989; mean abs diff: 0.0003716; max abs diff: 0.01042 |
| Market-relative daily dispersion, full fixture | 11.549 ms | 37.667 ms | measured | 0.31x | signal len: 1257; finite: 1257; next-day corr: 0.06635; SPY/GSPC corr: 0.9989; mean abs diff: 0.0003716; max abs diff: 0.01042 |
| Market-relative rolling dispersion signal, 252d | 11.956 ms | 37.667 ms | measured | 0.32x | signal len: 1006; finite: 1006; next-day corr: 0.03746; SPY/GSPC corr: 0.9989; mean abs diff: 0.0003716; max abs diff: 0.01042 |
| Market-relative rolling dispersion signal, 63d | 9.073 ms | 39.333 ms | measured | 0.23x | signal len: 1195; finite: 1195; next-day corr: 0.02139; SPY/GSPC corr: 0.9989; mean abs diff: 0.0003716; max abs diff: 0.01042 |
| Partial-moment covariance workflow, 1257d-degree1-mean | 30.235 ms | 1.587 s | measured | 0.02x | rows: 1257; cols: 478; matrix N: 478 |
| Partial-moment covariance workflow, 252d-degree1-mean | 17.619 ms | 296.333 ms | measured | 0.06x | rows: 252; cols: 478; matrix N: 478 |
| Partial-moment covariance workflow, 252d-degree2-zero | 25.084 ms | 302.000 ms | measured | 0.08x | rows: 252; cols: 478; matrix N: 478 |
| Rolling SD cluster, 252-day monthly, degree=2, n100 | 785.532 ms | 787.000 ms | measured | 1.00x | windows: 48; avg set: 14.29; avg clusters: 8.375 |
| Rolling SD cluster, 252-day monthly, degree=2, nmax | 10.759 s | 9.873 s | measured | 1.09x | windows: 48; avg set: 29.48; avg clusters: 13.65 |
| Rolling SD cluster, 252-day quarterly, degree=1 | 1.809 s | 1.226 s | measured | 1.48x | windows: 16; avg set: 468.5; avg clusters: 1.812 |
| Rolling SD cluster, 756-day quarterly, degree=2 | 4.327 s | 4.182 s | measured | 1.03x | windows: 9; avg set: 33.11; avg clusters: 11.89 |
| Rolling SD efficient set, 252-day monthly, degree=2, n100 | 296.756 ms | 259.667 ms | measured | 1.14x | windows: 48; avg set: 14.29; avg turnover: 0.4598 |
| Rolling SD efficient set, 252-day monthly, degree=2, nmax | 3.701 s | 2.400 s | measured | 1.54x | windows: 48; avg set: 29.48; avg turnover: 0.5228 |
| Rolling SD efficient set, 252-day quarterly, degree 1 vs 2 | 3.100 s | 1.931 s | measured | 1.61x | windows: 16; avg d1 set: 468.5; avg d2 set: 29.56 |
| Rolling SD efficient set, 252-day quarterly, degree=1 | 1.722 s | 1.259 s | measured | 1.37x | windows: 16; avg set: 468.5; avg turnover: 0.03102 |
Interpretation:
- Large degree-1 discrete SD uses an exact order-statistic dominance matrix. For equal-length samples, one empirical sample FSD-dominates another iff each of its sorted order statistics is at least as large as the corresponding one, with at least one strictly larger.
- Guarded prefix-pair evaluation skips curve work for pairs that the min, mean, or identical-sample guards already rule out. The standalone efficient-set path checks each candidate only against already-kept candidates, for degree 2/3 and for degree-1 continuous cases.
- The implementation deliberately follows R's C++ SD algorithmic structure: sorted columns, prefix sums, pair-threshold dominance checks, exact guards, and no tolerance-based shortcuts.
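- Full-fixture NNS Python runs are feasible for research iteration, but R's C++ SD core remains materially faster on the largest cluster cases. `nns_sd_cluster` at N=478 shows 1.68x and 1.50x Python/R slowdowns for T_obs=252 and T_obs=1257. The margin is larger for `sd_efficient_set` at N=478, T_obs=1257 (5.05x).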
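The bullets above can be illustrated with a small pure-Python sketch of the textbook equal-length criteria. It covers:

- first-degree dominance as an order-statistic comparison;
- second-degree dominance as a comparison of prefix sums of the sorted samples;
- cheap min/mean/identical guards;
- an efficient-set loop that checks only already-kept candidates.

This is not the project's or R NNS's implementation. The function names, the guard set, and the candidate ordering key are assumptions made for illustration. The key is the sum of prefix sums, chosen so that every strict dominator is visited before the samples it dominates. Degree 3 and the continuous degree-1 path are not covered.

```python
from itertools import accumulate
from typing import Callable, Dict, List, Sequence


def _sorted_pair(x: Sequence[float], y: Sequence[float]):
    xs, ys = sorted(x), sorted(y)
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-length samples (same T_obs)")
    return xs, ys


def fsd_dominates(x, y) -> bool:
    """Degree 1: every order statistic of x >= that of y, at least one strictly."""
    xs, ys = _sorted_pair(x, y)
    return all(a >= b for a, b in zip(xs, ys)) and any(a > b for a, b in zip(xs, ys))


def ssd_dominates(x, y) -> bool:
    """Degree 2: prefix sums of sorted x >= those of sorted y, at least one strictly."""
    xs, ys = _sorted_pair(x, y)
    # Cheap guards on necessary conditions, checked before any prefix-sum work.
    if xs == ys:               # identical samples cannot strictly dominate
        return False
    if xs[0] < ys[0]:          # first prefix sum: a dominator needs min(x) >= min(y)
        return False
    if sum(xs) < sum(ys):      # last prefix sum: ... and mean(x) >= mean(y)
        return False
    px, py = list(accumulate(xs)), list(accumulate(ys))
    # Plain comparisons with no tolerance.
    return all(a >= b for a, b in zip(px, py)) and any(a > b for a, b in zip(px, py))


def efficient_set(samples: Dict[str, List[float]],
                  dominates: Callable[[List[float], List[float]], bool]) -> List[str]:
    """Return the names of samples not strictly dominated by any other sample."""
    # Any strict FSD/SSD dominator has a strictly larger sum of prefix sums, so
    # visiting candidates in descending key order puts dominators first.
    def key(name: str) -> float:
        return sum(accumulate(sorted(samples[name])))

    kept: List[str] = []
    for name in sorted(samples, key=key, reverse=True):
        # Transitivity: if a dropped candidate dominates `name`, some kept
        # candidate does too, so only the kept list has to be checked.
        if not any(dominates(samples[k], samples[name]) for k in kept):
            kept.append(name)
    return kept


samples = {  # hypothetical toy returns, equal length
    "A": [1, 2, 3, 4],
    "B": [0, 2, 3, 4],
    "C": [2, 2, 2, 2],
    "D": [0, 1, 2, 3],
    "E": [1, 1, 4, 4],
}
print(efficient_set(samples, fsd_dominates))  # ['A', 'C', 'E']
print(efficient_set(samples, ssd_dominates))  # ['A', 'C']
```

In the toy data, nothing first-degree dominates E, but A second-degree dominates it, so the degree-2 set is smaller. This mirrors the much smaller degree-2 sets in the rolling efficient-set benchmarks above.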