Conventions¶
Build¶
NNS Python is a Python-native NumPy/SciPy port with optional private native
acceleration through nns._nnscore where available. Source builds use
scikit-build-core and nanobind to compile the extension; published wheels
should be preferred when available. Public APIs keep Python implementations and
explicit fallback behavior, so native code remains a deliberate,
benchmark-backed implementation detail rather than a public API.
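The optional-native pattern described above can be sketched roughly as follows. Only the module path nns._nnscore comes from this page; load_native, prefix_sums, and the kernel attribute name are hypothetical names chosen for illustration and are not taken from the NNS Python source.

```python
import importlib
from itertools import accumulate


def load_native():
    """Return the optional native module, or None when it cannot be imported."""
    try:
        return importlib.import_module("nns._nnscore")
    except ImportError:
        return None


def prefix_sums(values, native=None):
    """Running sums of `values`, preferring a native kernel when one is present."""
    # "prefix_sums" is a hypothetical kernel name used only for this sketch.
    kernel = getattr(native, "prefix_sums", None)
    if kernel is not None:
        return list(kernel(values))
    # Pure-Python fallback: same result, no compiled code required.
    return list(accumulate(values))


print(prefix_sums([1.0, 2.0, 3.5]))
```

A caller would pass native=load_native(); when the extension is missing, the Python path runs and the public result is unchanged.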
Degree-Zero Boundary¶
At degree zero, LPM uses x <= T and UPM uses x > T.
Equality is counted by LPM only.
For any non-empty finite input, LPM + UPM = 1 at degree zero.
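A minimal pure-Python sketch of the boundary rule, assuming plain lists of finite numbers. The helper names lpm0 and upm0 are illustrative and are not the library's public functions.

```python
def lpm0(values, target):
    """Degree-zero lower partial moment: share of observations with x <= target."""
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    return sum(1 for x in values if x <= target) / len(values)


def upm0(values, target):
    """Degree-zero upper partial moment: share of observations with x > target."""
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    return sum(1 for x in values if x > target) / len(values)


data = [1.0, 2.0, 2.0, 3.0]
lower = lpm0(data, 2.0)  # counts 1.0 and both 2.0 values (equality goes to LPM)
upper = upm0(data, 2.0)  # counts only 3.0
print(lower, upper, lower + upper)
```

Because every observation satisfies exactly one of the two conditions, the two shares always sum to one.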
Empty Input Divergence From R¶
R NNS returns NaN for empty input.
NNS Python raises ValueError.
Rationale: empty arrays in Python are upstream bugs, and NumPy convention is to warn or fail on empty reductions rather than silently produce a meaningful statistic.
Co-Moment Length Mismatch Divergence From R¶
R NNS warns when x and y lengths differ, computes over the shorter length, and divides by the longer length.
NNS Python raises ValueError.
Rationale: mismatched co-moment inputs lose observations silently in R. Python callers should fix alignment before computing a bivariate statistic.
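To illustrate why the mismatch matters, the sketch below contrasts the R behavior described above with a strict check. It assumes a degree-1 co-LPM of the form mean(max(tx - x, 0) * max(ty - y, 0)); the function names are hypothetical and do not correspond to NNS Python's public API.

```python
def co_lpm_terms(x, y, target_x, target_y):
    """Per-pair degree-1 co-LPM terms over the overlapping prefix of x and y."""
    return [max(target_x - a, 0.0) * max(target_y - b, 0.0) for a, b in zip(x, y)]


def r_style_co_lpm(x, y, target_x, target_y):
    """Mimic the described R behavior: sum over the shorter, divide by the longer."""
    terms = co_lpm_terms(x, y, target_x, target_y)
    return sum(terms) / max(len(x), len(y))


def strict_co_lpm(x, y, target_x, target_y):
    """Reject empty or misaligned inputs before computing anything."""
    if len(x) == 0 or len(y) == 0:
        raise ValueError("x and y must be non-empty")
    if len(x) != len(y):
        raise ValueError(f"length mismatch: len(x)={len(x)}, len(y)={len(y)}")
    return sum(co_lpm_terms(x, y, target_x, target_y)) / len(x)


x = [0.0, 1.0, 2.0, -4.0]  # the last observation has no partner in y
y = [0.0, 1.0, 2.0]
print(r_style_co_lpm(x, y, 3.0, 3.0))  # the unpaired observation is silently ignored
print(strict_co_lpm(x[:3], y, 3.0, 3.0))  # aligned inputs compute normally
```

In the R-style path the fourth x value contributes nothing to the sum yet still inflates the divisor, which is the silent loss the ValueError is meant to surface.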
PM Matrix Target Defaults¶
R PM.matrix uses column means when target is NULL or any non-numeric value.
NNS Python accepts None and "mean" for this behavior. NNS Python also broadcasts a
scalar numeric target across all variables; R requires callers to pass an
explicit vector such as rep(0, ncol(variable)). Target vectors whose length
does not match the number of variables raise ValueError.
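The target rules above can be sketched as a small resolver. This is an illustration under assumptions: resolve_targets is a hypothetical helper, and variables are represented as a list of per-column lists rather than a NumPy array.

```python
def resolve_targets(columns, target=None):
    """Return one numeric target per column.

    columns: list of equal-length lists, one per variable (assumed layout).
    target:  None or "mean" for column means, a scalar to broadcast,
             or a sequence with one entry per variable.
    """
    n_vars = len(columns)
    if target is None or target == "mean":
        return [sum(col) / len(col) for col in columns]
    if isinstance(target, (int, float)):
        # Scalar targets are broadcast across all variables.
        return [float(target)] * n_vars
    targets = [float(t) for t in target]
    if len(targets) != n_vars:
        raise ValueError(f"expected {n_vars} targets, got {len(targets)}")
    return targets


cols = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
print(resolve_targets(cols))          # column means
print(resolve_targets(cols, 0))       # scalar broadcast
print(resolve_targets(cols, [2, 5]))  # explicit per-variable vector
```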
Classical Moment Normalization¶
mean_pm, var_pm, skew_pm, and kurt_pm use population normalization by
default, matching NumPy defaults and NNS.moments(population = TRUE). var_pm
accepts ddof for NumPy-style variance scaling. skew_pm and kurt_pm do not
apply SciPy's optional finite-sample bias correction.
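As a rough illustration of the normalization convention, the sketch below computes population moments with an optional ddof for the variance. The function names are hypothetical stand-ins, not the var_pm or skew_pm implementations.

```python
def central_moment(values, order):
    """Population central moment: mean of (x - mean) ** order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def variance(values, ddof=0):
    """Variance with NumPy-style ddof scaling (ddof=0 is the population form)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("ddof must be smaller than the number of observations")
    return central_moment(values, 2) * n / (n - ddof)


def skewness(values):
    """Population skewness m3 / m2 ** 1.5, with no finite-sample bias correction."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


data = [1.0, 2.0, 3.0, 4.0]
print(variance(data))          # population normalization (divide by n)
print(variance(data, ddof=1))  # sample normalization (divide by n - 1)
print(skewness([1.0, 2.0, 6.0]))
```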
nns_moments is the public NNS.moments wrapper and returns R's dictionary
shape with mean, variance, skewness, and kurtosis. nns_gravity exposes
R's public NNS.gravity central-tendency helper. fsd_uni, ssd_uni, and
tsd_uni are the unidirectional stochastic-dominance wrappers behind R's
.uni exports. co_lpm_nd, co_upm_nd, and dpm_nd expose the public
n-dimensional partial-moment wrappers.
nns_ss maps to R's NNS.SS stochastic-superiority function, not to the
stochastic-dominance tests. It returns p_gt = P(X > Y), p_tie = P(X = Y),
and p_star = p_gt + 0.5 * p_tie. NaN values are omitted independently from
x and y, matching R's na.omit preprocessing. With
confidence_interval=True, intervals are computed through nns_meboot,
lpm_var, and upm_var; exact bootstrap parity with R is not expected because
the RNG streams differ. random_seed is an NNS Python-only reproducibility
convenience for that stochastic path.
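The three point estimates can be sketched as below, assuming the probabilities are empirical frequencies over all (x, y) pairs after NaN removal. That all-pairs reading and the helper name stochastic_superiority are assumptions for illustration; confidence intervals are not covered.

```python
import math


def stochastic_superiority(x, y):
    """Empirical P(X > Y), P(X = Y), and p_star over all cross pairs."""
    # NaN values are dropped from each input independently.
    xs = [v for v in x if not math.isnan(v)]
    ys = [v for v in y if not math.isnan(v)]
    if not xs or not ys:
        raise ValueError("x and y must each contain at least one non-NaN value")
    pairs = len(xs) * len(ys)
    p_gt = sum(1 for a in xs for b in ys if a > b) / pairs
    p_tie = sum(1 for a in xs for b in ys if a == b) / pairs
    return {"p_gt": p_gt, "p_tie": p_tie, "p_star": p_gt + 0.5 * p_tie}


nan = float("nan")
print(stochastic_superiority([1.0, 2.0, 3.0, nan], [2.0, 2.0, nan, 2.0]))
```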
nns_sd_cluster maps to R's NNS.SD.cluster default path. It iteratively
peels sd_efficient_set results and returns a dictionary of Cluster_1,
Cluster_2, ... memberships. The output contains variable names, not numeric
cluster labels; when names are omitted, NNS Python uses R-style X_1, X_2, ...
names. type="continuous" is supported for first-degree efficient sets.
dendrogram=True returns a plain dictionary mirroring R's hclust fields:
merge, height, order, labels, method, call, and dist.method.
NNS Python does not plot the dendrogram; it only returns the object data.
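The peeling idea can be sketched as follows. The sketch assumes equal-length columns, first-degree discrete dominance, and an efficient set defined as the columns that no other remaining column dominates; peel_clusters and fsd_dominates are hypothetical helpers, not the library's functions.

```python
def fsd_dominates(a, b):
    """True when a first-order dominates b (equal-length samples only)."""
    sa, sb = sorted(a), sorted(b)
    return all(p >= q for p, q in zip(sa, sb)) and any(p > q for p, q in zip(sa, sb))


def peel_clusters(columns, names=None):
    """Repeatedly remove the current efficient set and record it as a cluster."""
    if names is None:
        names = [f"X_{i + 1}" for i in range(len(columns))]  # R-style default names
    remaining = dict(zip(names, columns))
    clusters = {}
    while remaining:
        efficient = [
            name
            for name, col in remaining.items()
            if not any(
                fsd_dominates(other, col)
                for other_name, other in remaining.items()
                if other_name != name
            )
        ]
        clusters[f"Cluster_{len(clusters) + 1}"] = efficient
        for name in efficient:
            del remaining[name]
    return clusters


data = [[3, 4, 5], [1, 2, 3], [2, 3, 4], [0, 9, 1]]
print(peel_clusters(data))
```

Each value in the result is a list of variable names rather than a numeric label, which mirrors the output shape described above.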
The stochastic-dominance implementation is deliberately pure NumPy. It mirrors R's C++ SD core mathematically by sorting each column once, storing prefix sums, and evaluating dominance on each pair's merged threshold grid rather than on one global all-column grid. The full prefix-pair dominance matrix remains available internally for verification and fallback. Large degree-1 discrete calls use an exact order-statistic dominance matrix: with equal-length empirical samples, one sample first-order stochastically dominates another exactly when every sorted order statistic is at least as large and at least one is strictly larger. Large degree-1 continuous and degree 2/3 calls use a lazy kept-only prefix scan. Columns are visited in R's LPM-at-global-maximum order with original-index tie breaks, and only already-kept candidates are tested against the current column. Each prefix pair check applies min/mean/identical guards before evaluating curves, then exits as soon as dominance is disproved.
These choices preserve exact R-style dominance semantics: no tolerances,
approximate equality, output reordering, or diagonal/identical-column behavior
changes are introduced. Polars is intentionally not used in this SD kernel
because the hot path is dense pairwise threshold evaluation rather than
data-frame grouping or filtering. R remains faster on some large finance
fixtures because its C++ path walks merged sorted thresholds in tight parallel
loops with minimal temporaries; NNS Python instead uses NumPy order-statistic blocks,
searchsorted, contiguous column storage, and early-exit scans to stay
dependency-light and pure Python.
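The equivalence between the order-statistic shortcut and a merged-threshold evaluation can be checked with a small sketch. It uses the standard library's bisect in place of NumPy's searchsorted and assumes equal-length samples; both function names are illustrative only.

```python
from bisect import bisect_right


def fsd_by_thresholds(x, y):
    """x dominates y when its empirical CDF never exceeds y's on the merged grid."""
    sx, sy = sorted(x), sorted(y)
    strict = False
    for t in sorted(set(sx) | set(sy)):
        # Cross-multiplied counts compare the two CDF values without division.
        cx = bisect_right(sx, t) * len(sy)
        cy = bisect_right(sy, t) * len(sx)
        if cx > cy:
            return False  # early exit: dominance is disproved at this threshold
        if cx < cy:
            strict = True
    return strict


def fsd_by_order_statistics(x, y):
    """Equal-length shortcut: compare sorted samples element by element."""
    if len(x) != len(y):
        raise ValueError("the shortcut requires equal-length samples")
    sx, sy = sorted(x), sorted(y)
    return all(a >= b for a, b in zip(sx, sy)) and any(a > b for a, b in zip(sx, sy))


pairs = [([3, 4, 5], [1, 2, 3]), ([0, 9, 1], [2, 3, 4]), ([1, 2, 3], [1, 2, 3])]
for a, b in pairs:
    print(fsd_by_thresholds(a, b), fsd_by_order_statistics(a, b))
```

Identical samples fail the strictness condition in both forms, so a column never dominates an exact copy of itself.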
nns_cdf maps to R's NNS.CDF deterministic non-plotting paths. It is a
partial-moment distribution wrapper rather than a textbook ECDF: degree = 0
uses R's lower-partial-moment frequency convention, and positive degrees use
LPM.ratio deformation. Univariate output columns follow installed R (x plus
CDF, S(x), h(x), or H(x)), while multivariate output keeps the final
column named CDF for all types, including survival, hazard, and cumulative
hazard. Plotting is ignored. The univariate NA/Inf comparison quirks are
handled inside nns_cdf without loosening the global partial-moment APIs.
Dependence¶
nns_dep follows R's NNS.dep bivariate path, including NNS.gravity handling
for zero-range inputs and non-positive or non-finite bin widths. NNS Python also caps
the internal gravity bin count at 4 * len(input) to prevent pathological
allocations on inputs where R's C++ int conversion effectively collapses an
absurd bin count. abs(Correlation) <= Dependence is not guaranteed by
NNS.dep; for near-binary inputs, both R and NNS Python can return a correlation
whose absolute value exceeds the dependence component.
Copula¶
nns_copula(x, y) is the bivariate scalar form of R's NNS.copula(cbind(x, y)).
When targets are omitted, NNS Python uses column means, matching R's target = NULL.
The target_x and target_y arguments map to R's two-element target vector.
Causation¶
nns_causation(x, y) maps to R's NNS.caus(x, y, tau = 0, p.value = FALSE)
numeric-vector path. It returns the two directional components and the named
signed net log-ratio key selected by R, either C(x--->y) or C(y--->x).
causal_matrix maps to R's NNS.caus.matrix antisymmetric matrix convention.
tau='ts' uses nns_seas(... )["periods"] exactly like installed R: the first
period not exceeding sqrt(length(x)) is selected per variable, including
harmonics when R selects them. Inputs with no eligible selected period follow
R's failure convention and raise. Numeric tau lag values remain fully
supported.
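The period-selection rule for tau='ts' can be sketched as below. The helper select_ts_lag is hypothetical, the periods list is assumed to arrive in the order produced by the seasonality step, and the ValueError type is an assumption, since the text only says that the call raises.

```python
import math


def select_ts_lag(periods, n_obs):
    """Return the first candidate period that does not exceed sqrt(n_obs)."""
    limit = math.sqrt(n_obs)
    for period in periods:
        if period <= limit:
            return period
    raise ValueError(f"no period at or below sqrt(n) = {limit:.3f}")


print(select_ts_lag([24, 12, 6, 3], 100))
```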
Partition¶
nns_part maps to R's NNS.part but returns plain NumPy arrays instead of
data.table objects: "dt" and "regression.points" are dictionaries of
arrays. Installed R 13.0 only distinguishes type = NULL from any non-null
type: None uses XY quadrant splits, while every non-None value uses
X-only splits. This differs from documentation that implies separate "X",
"Y", and "XONLY" modes. NNS Python matches the installed binary.
order="max" is rejected with TypeError; installed R coerces it to NA and
returns a useless zero-order map. All five noise_reduction modes are
supported: "off", "mean", "median", "mode", and "mode_class".
Regression¶
nns_reg maps to R's univariate numeric NNS.reg path with
factor.2.dummy = FALSE and plotting disabled. Return keys match R's list names, but data.table outputs are plain
dictionaries of NumPy arrays. multivariate_call=True returns R's internal
two-column regression-point structure as {"x": ..., "y": ...} for
nns_m_reg, including after dimension-reduction projection. Matrix x without
dimension reduction dispatches to nns_m_reg.
Classification is supported for numeric/logical/factor-like class-code targets.
smooth=True follows installed R's ordinary piecewise fallback for univariate
inputs with fewer than four observations and for univariate order="max"; R
does not call smooth.spline there. Spline-eligible inputs use a private
fixed-spar cubic smoothing-spline adapter matching the stats::smooth.spline
subset used by NNS.reg: spar = (dependence + 0.5) / 2, R-style knots, and
R's interior-band trace ratio for lambda.
Factor predictor expansion is supported through the public nns_reg path.
When combined with dimension reduction, factor predictors are expanded with
R's full-rank dummy convention before synthetic x.star coefficients are
computed. For callers that want direct multivariate regression, use
prepare_factor_predictors(...) first and pass the returned numeric design
matrix into nns_m_reg(...):
from nns import nns_m_reg, prepare_factor_predictors
design = prepare_factor_predictors(
    x,
    point_est=point_est,
    factor_levels=(["low", "mid", "high"], None),
    names=("rating", "score"),
)
fit = nns_m_reg(design.x, y, point_est=design.point_est)
prepare_factor_predictors(...) uses the same full-rank dummy expansion as
nns_reg(..., factor_2_dummy=True), combines training x and point_est
before expansion, and returns deterministic feature names.
Numeric dimension reduction is supported for "cor", "NNS.dep",
"NNS.caus", "all", "equal", and numeric coefficient vectors. The
synthetic x.star projection follows R's min-max normalization and denominator
conventions, including joint normalization for point_est. In this dim-red
regression path, tau="ts" follows R's direct Uni.caus call and maps to a
fixed lag of 3; public nns_causation(..., tau="ts") still uses the
NNS.seas-derived lag path. The "NNS.caus" branch uses the ported Uni.caus
internals and may differ from installed R at small asymmetric dependence
granularity.
order="max" follows installed R's univariate convention: fitted values are the
observed y values and regression.points is the sorted observed (x, y) map.
The derivative table still comes from R's pre-reset regression-point construction,
which NNS Python matches rather than recomputing adjacent slopes from all observations.
The "mode" and "mode_class" noise-reduction modes are accepted in the
univariate path and use the shared nns_part/nns_mode implementation. The
"mode_class" default-order path can produce segment standard.errors values
that differ from R at floating grouping granularity: installed R groups the
gradient column through data.table's numeric radix grouping, while NumPy keeps
near-identical binary floating values as separate groups. Regression points,
coefficients, fitted values, and point estimates still match R on that path.
Regression confidence intervals are deterministic and use R's LPM.VaR /
UPM.VaR logic, not nns_mc / nns_meboot. In the univariate fitted table,
both conf.int.pos and conf.int.neg use UPM.VaR(..., degree = 1) on
segment residuals, matching installed R even though the lower side might look
like an LPM candidate. Univariate point_est prediction intervals use
UPM.VaR(..., degree = 0) for the upper column and LPM.VaR(..., degree = 0)
for the lower column. Below-range univariate point estimates follow R's
findInterval/data.table behavior: index 0 rows are dropped, so pred.int
can have fewer rows than Point.est. For class mode, fitted confidence columns
remain raw numeric values, while univariate pred.int columns are rounded with
R's x %% 1 < 0.5 rule. Spline-eligible smooth=True interval tables use the
same deterministic residual VaR logic after smoothing, matching installed R.
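Two of the conventions above are easy to get wrong in Python, so a small sketch may help. It assumes the x %% 1 < 0.5 rule means floor below one half and ceiling otherwise, and it uses bisect_right as a stand-in for R's default findInterval on sorted breaks; both helper names are hypothetical.

```python
import math
from bisect import bisect_right


def r_class_round(value):
    """Round with the x %% 1 < 0.5 rule: floor below .5, ceiling otherwise."""
    return math.floor(value) if value % 1 < 0.5 else math.ceil(value)


def keep_in_range_rows(point_est, breaks):
    """Drop points whose findInterval-style index is 0 (below the first break)."""
    kept = []
    for p in point_est:
        idx = bisect_right(breaks, p)  # number of breaks <= p
        if idx > 0:
            kept.append((p, idx))
    return kept


# Python's built-in round() uses banker's rounding, so the two disagree at .5.
print(r_class_round(2.5), round(2.5))
print(keep_in_range_rows([0.5, 1.0, 2.5], [1.0, 2.0, 3.0]))
```

The second call shows how a below-range point disappears from the table, which is why pred.int can have fewer rows than Point.est.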
Multivariate Regression¶
nns_m_reg maps to installed R's numeric NNS.M.reg path with
factor.2.dummy = FALSE and plotting disabled.
Outputs use R's keys (R2, rhs.partitions, RPM, Point.est, pred.int,
and Fitted.xy) with data.table objects represented as dictionaries of NumPy
arrays. Numeric and class confidence intervals are deterministic and use the
global residual UPM.VaR(..., degree = 1) offset from installed R. In class
mode, fitted predictions and point estimates are rounded/clamped to class codes,
but pred.int lower/upper bounds and fitted confidence columns remain raw
numeric values. Classification mode (type="class") is supported for
numeric/logical/factor-like targets and returns numeric class codes. Direct
nns_m_reg(..., factor_2_dummy=True) remains rejected for raw factor
predictors because installed R errors on that path. This is an intentional API
boundary rather than a mathematical gap: nns_m_reg is the numeric
multivariate engine, while prepare_factor_predictors(...) performs the
R-compatible categorical design-matrix preparation. Public nns_reg factor
predictor expansion is also supported with factor_2_dummy=True and explicit
factor_levels= metadata; it combines training x and point_est before
full-rank dummy expansion, matching installed R's factor_2_dummy_FR path.
Point estimates match installed R, including the one-row outsider behavior in
the multi-point path where R drops matrix dimensions before extrapolating.
order="max" follows R's convention of using the original regressor matrix as
the regression-point matrix and defaulting n.best to 1.
Stack¶
nns_stack maps to R's numeric and deterministic classification NNS.stack
paths using the real nns_reg dimension-reduction and multivariate-regression
internals. type="class" is supported for numeric/logical/factor-like targets
and returns numeric class codes, not labels. Use class_levels= to reproduce R
factor level ordering. Raw string labels remain rejected unless explicit levels
are supplied. balance=True is supported for classification and follows R's
downSample + upSample structure: each non-empty class is downsampled to the
minority count without replacement, each class is upsampled to the majority
count with replacement, and the downsampled rows are concatenated before the
upsampled rows. Exact sampled-row parity with R is not expected because NNS Python
uses NumPy's RNG; random_seed is an NNS Python-only reproducibility convenience.
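The balance structure can be sketched with row indices only. This illustration uses the standard library's random module instead of NumPy's RNG to stay dependency-free, so it will not reproduce NNS Python's draws; balance_rows is a hypothetical helper.

```python
import random
from collections import defaultdict


def balance_rows(labels, seed=None):
    """Return row indices: downsampled rows first, then upsampled rows."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in enumerate(labels):
        by_class[label].append(row)
    minority = min(len(rows) for rows in by_class.values())
    majority = max(len(rows) for rows in by_class.values())
    down, up = [], []
    for label in sorted(by_class):
        rows = by_class[label]
        down.extend(rng.sample(rows, minority))   # without replacement
        up.extend(rng.choices(rows, k=majority))  # with replacement
    return down + up


labels = ["a", "a", "b", "b", "b", "b", "b"]
rows = balance_rows(labels, seed=1)
print(len(rows), [labels[r] for r in rows])
```

With two classes of sizes 2 and 5, the result has 2 * 2 downsampled rows followed by 2 * 5 upsampled rows.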
Numeric and class prediction intervals are supported and are combined by
installed R's weighted data.table arithmetic. For class stacks, single-method
method=1 and method=2 return the delegated interval table unchanged; when
method=(1,2), the weighted final interval table is rounded with R's
x %% 1 < 0.5 rule.
ts_test is supported and follows installed R's split exactly: CV training uses
the tail ts_test rows, while CV testing uses the earlier rows
1:(n - ts_test). This is intentionally not changed even though it is
counterintuitive. R's CV.size = NULL samples a random value between 0.2 and
1/3; NNS Python uses a deterministic default of 0.25. Pass cv_size explicitly for
exact R parity.
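Because the split direction is easy to misread, here is a small index-only sketch of it. ts_test_split is a hypothetical helper and the indices are zero-based, so R rows 1:(n - ts_test) become 0 through n - ts_test - 1.

```python
def ts_test_split(n_obs, ts_test):
    """Return (train_rows, test_rows) for the split described above."""
    if not 0 < ts_test < n_obs:
        raise ValueError("ts_test must be between 1 and n_obs - 1")
    train_rows = list(range(n_obs - ts_test, n_obs))  # tail ts_test rows
    test_rows = list(range(0, n_obs - ts_test))       # earlier rows
    return train_rows, test_rows


train, test = ts_test_split(10, 3)
print(train)
print(test)
```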
The installed-R 13.0 Iris classification vignette with folds=1 is a documented
stack disparity rather than an NNS Python correctness target. On the 141:150 holdout,
the true labels are all class code 3. Installed R 13.0 returns stack class code
2 for every row because its learned class-rounding threshold is about 0.60;
NNS Python returns class code 3 for every row because its learned threshold is about
0.29. Both implementations have the same high-level shape in that case
(reg = 2, dim.red = 3, raw combined stack near 2.5), but the final
threshold rounding differs. Since R default folds=5 also returns class code
3, NNS Python keeps the behavior that matches the practical classification result
instead of forcing installed-R-13.0 folds=1 parity.
Factor predictor expansion is supported for nns_stack(method=1) and
nns_stack(method=2) with explicit factor_levels= metadata. NNS Python expands
training and test predictors together using the same full-rank dummy convention
as installed R's aligned train/test builder. Pure factor-predictor method=2
and method=(1,2) match installed R's fallback to method 1. Mixed
factor/numeric method=2 uses the expanded numeric design directly. Mixed
factor/numeric method=(1,2) is supported for parity-covered cases that use
explicit factor_levels expansion.
Boost¶
nns_boost maps to R's numeric and deterministic classification NNS.boost
paths and uses the real nns_reg and nns_stack implementations. The
small-feature path (n_features <= 10, where R evaluates all feature
combinations) is supported. For n_features > 10, NNS Python follows R's stochastic
epoch structure: it samples learner-trial feature sets, builds a weighted
survivor feature pool, then samples epoch feature counts and survivor features
from that pool. Exact sampled-feature parity with R is not expected because
NNS Python uses NumPy's RNG, and random_seed is NNS Python-only. Installed R errors for
threshold= on this path because the threshold short-circuit leaves
test.features undefined, so NNS Python keeps that guard. ts_test is supported on
the stochastic path and follows R's separate epoch holdout split: initial
learner trials test rows 1:(n - ts_test), while epochs test the final
2 * ts_test + 1 rows. type="class" returns numeric class codes, not labels; use
class_levels= to reproduce R factor level ordering. Raw string labels remain
rejected unless explicit levels are supplied. balance=True is supported for
classification and uses the same R-style downSample + upSample structure as
nns_stack; exact sampled-row parity with R is not expected.
Explicit-level factor predictors are supported through factor_levels=. NNS Python
integer-codes those columns before deterministic feature selection, matching
installed R's data.matrix conversion under NNS Python's positional-column
convention. Pass None for numeric columns in mixed predictor matrices, for
example factor_levels=(["low", "mid", "high"], None). Multiple explicit-level
factor predictor columns use positional X1, X2, ... semantics; installed R
data frames with semantic column names sort columns alphabetically before
fitting, so callers should order NNS Python columns explicitly when reproducing those
named-data-frame cases. Numeric pred_int is supported and
delegates to nns_stack(pred_int=...), matching installed R; it is deterministic
and does not use MC/meboot. features_only=True returns before the final stack
fit and ignores pred_int, matching R. Classification pred_int is supported
and delegates to final stack method=1, so interval bounds remain raw numeric
values. ts_test is supported for deterministic and stochastic boost paths. R
requires usable column names for matrix inputs; NNS Python uses positional numeric columns. As with nns_stack, R
samples a random CV size when CV.size = NULL; NNS Python uses deterministic
cv_size=0.25 unless specified. For classification boost, final predictions,
feature weights, and feature frequencies are parity-tested against installed R
when balance is disabled and structurally tested when balance sampling is
enabled. The public n.best value is structural-only because R's final internal
NNS.stack call samples its own CV.size = NULL split, while NNS Python keeps the
deterministic stack default.
The installed-R 13.0 Iris boost vignette remains a true parity gap, but not a
quality target for exact output matching. On the same all-class-3 holdout,
installed R 13.0 balanced boost returns class code 1 for every row, while NNS Python
balanced boost returns class code 2 for every row; both are wrong for that
example. Installed R 13.0 also does not accept the folds argument shown in the
rendered upstream overview for NNS.boost, so this example is tracked as
R-version/upstream-example drift plus a boost parity gap rather than evidence
that NNS Python should copy the installed-R balanced output.
Seasonality¶
nns_seas maps to installed R's non-plotting NNS.seas path and ignores
plot, consistent with other NNS Python ports. Inputs shorter than five observations
return R's sentinel period 0. For mean-zero data, R falls back from coefficient
of variation to abs(acf1) ** -1; NNS Python follows the same fallback and
non-finite handling. Installed R can report harmonics rather than the visually
obvious period, so NNS Python matches R's candidate-period screening instead of a
textbook seasonality heuristic. Results are cached by input content and modulo
arguments with defensive copies on return; this preserves R semantics while
avoiding repeated reverse-step scans for identical series.
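Content-keyed caching with defensive copies can be sketched as follows. The cache layout, the cached_periods name, and the compute callback are all assumptions made for illustration; the actual cache in NNS Python may be organized differently.

```python
import copy

_CACHE = {}


def cached_periods(series, modulo, compute):
    """Cache compute(series, modulo) by input content; return a fresh copy."""
    modulo_key = tuple(modulo) if isinstance(modulo, (list, tuple)) else modulo
    key = (tuple(series), modulo_key)
    if key not in _CACHE:
        _CACHE[key] = compute(series, modulo)
    # A deep copy keeps callers from mutating the cached result.
    return copy.deepcopy(_CACHE[key])


def toy_compute(series, modulo):
    """Stand-in for the expensive seasonality scan (not the real algorithm)."""
    return {"periods": [4, 8]}


first = cached_periods([1.0, 2.0, 1.0, 2.0, 1.0], None, toy_compute)
first["periods"].append(99)  # mutating the returned copy is harmless
second = cached_periods([1.0, 2.0, 1.0, 2.0, 1.0], None, toy_compute)
print(second)
```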
ARMA¶
nns_arma maps to R's installed NNS.ARMA forecast path. Without prediction
intervals it returns a NumPy forecast vector of length h; with pred_int it
returns a dict keyed like R's data.table columns (Estimates,
Lower <percent>% pred.int, Upper <percent>% pred.int). Forecasts are
recursive: each estimate is appended before the next horizon step. Plot
arguments are ignored. Prediction intervals use nns_mc / nns_meboot; exact
stochastic parity with R is not expected because RNG streams differ.
random_seed is an NNS Python-only convenience for reproducible interval tests.
Deterministic forecasts without pred_int are parity-tested except where NNS Python
intentionally uses a more direct seasonal-lag weighting convention.
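The recursive structure of the forecast loop can be sketched independently of the estimator. Here one_step is a hypothetical callable standing in for the single-step estimate, and the toy estimator used in the demo is not the NNS.ARMA method.

```python
def recursive_forecast(history, h, one_step):
    """Forecast h steps, appending each estimate before computing the next."""
    extended = list(history)
    forecasts = []
    for _ in range(h):
        estimate = one_step(extended)
        forecasts.append(estimate)
        extended.append(estimate)  # later steps see earlier estimates
    return forecasts


def mean_of_last_two(series):
    """Toy one-step estimator used only to make the recursion visible."""
    return (series[-1] + series[-2]) / 2


print(recursive_forecast([1.0, 2.0], 3, mean_of_last_two))
```

The returned list has length h, and the original history is left unmodified.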
seasonal_factor=True uses only the first detected period from nns_seas,
matching ARMA.seas.weighting(TRUE, ...); seasonal_factor=False uses the
selected best_periods rows. dynamic=True with numeric seasonal factors
raises with R's static-seasonality error. Constant-series behavior follows
installed R, including zero forecasts for automatic seasonality paths and NaN
forecasts for some explicit numeric-lag paths. Character weights combined with
numeric multi-lag seasonal factors are rejected because installed R errors during
numeric multiplication on that path.
For explicit numeric multi-lag seasonal factors such as
seasonal_factor=[132, 276], NNS Python intentionally weights each candidate lag by
the coefficient of variation of that actual lag's reverse component series.
Installed R NNS instead computes the coefficient-of-variation term with reverse
steps 1:length(seasonal.factor) while still applying the observation penalty
to the actual lag values. NNS Python keeps the actual-lag weighting because it better
matches the documented idea that each supplied seasonal factor is weighted by
its own seasonality strength and observation count. The R-compatible difference
is covered by a strict xfail practical test rather than hidden.
nns_arma_optim is supported for the installed-R optimizer path. It greedily
selects seasonal factors, evaluates the default co-moment-normalized objective,
then applies the same equal-weight, bias-shift, shrink, and smooth-regressed
variable checks as R. The optimizer's prediction intervals are deterministic
VaR bands around the in-sample optimizer errors; they are separate from
nns_arma(pred_int=...), which uses the Monte Carlo path. Custom Python
obj_fn callables may be supplied, but R expression objects are not part of the
Python API. nns_var is implemented for numeric matrix-like inputs with
dim_red_method="cor", dim_red_method="NNS.dep",
dim_red_method="NNS.caus", and dim_red_method="all" and returns
R-compatible public output keys. VAR's internal multivariate stack stage uses
ceil(0.2 * n) for the time-series validation window when that term exceeds
2 * h, matching installed R's effective trailing holdout size and preserving
the documented ts.test idea as a count of held-out observations. The h == 0
path is normalized to a Python dictionary containing interpolated_and_extrapolated
and names rather than R's bare data-frame return. The first-stage interpolation/extrapolation helper
_var_interpolate_and_extrapolate is implemented to match R's
missing-value handling and per-variable NNS.ARMA.optim forecasts. The private
multivariate stage _var_multivariate_stack_stage is implemented with
lag.mtx reconstruction, NNS.stack(method=(1,2), ts.test, dim.red.method)
logic, and R-style relevance extraction. The function returns multivariate
and relevant_variables in the same shape/naming pattern expected by
NNS.VAR.
Meboot¶
nns_meboot maps to R's NNS.meboot maximum-entropy bootstrap algorithm and
returns plain Python dictionaries instead of R's vectorized list-matrix wrapper.
Scalar rho returns one result dictionary; vector rho returns a list of result
dictionaries in R's vectorized order. rho=None follows installed R's empty
output behavior, and length-one input returns only {"x": x}.
Exact replicate parity with R is not expected because NNS Python uses NumPy's random
number generator and SciPy's optimizer while R uses its global RNG and
optim(). Deterministic diagnostics (xx, z, dv, dvtrim, xmin,
xmax, desintxb, ordxx, and kappa) are parity-tested exactly. Stochastic
outputs are tested structurally and statistically. random_seed is an NNS Python-only
convenience for reproducible bootstrap draws.
Monte Carlo¶
nns_mc maps to R's NNS.MC wrapper around NNS.meboot. The rho grid and
exponential rho transformation are parity-tested exactly against installed R.
As with nns_meboot, exact stochastic replicate parity is not expected because
R and NNS Python use different RNG streams and optimizer implementations.
random_seed is an NNS Python-only convenience passed through to nns_meboot.
NNS Python returns {"ensemble": array, "replicates": dict}. The replicates
mapping preserves R's names, such as "rho = 1" and "rho = -0.5", with each
value containing that rho block's replicate matrix. Sampling-vignette examples
are covered as smoke tests, but installed R behavior remains the parity source.
Normalization¶
nns_norm(x, linear=False) maps to R's numeric matrix NNS.norm path with
plotting disabled. NNS Python accepts finite 2D arrays. linear=True uses R's
mean-ratio scaling, while linear=False additionally weights scaling by
absolute correlation for fewer than 10 columns and NNS dependence for 10 or
more columns.
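One plausible reading of the linear=True path is sketched below: each column is scaled by the average ratio of all column means to its own mean, which puts every column on a common mean. This interpretation of "mean-ratio scaling" is an assumption, linear_norm is a hypothetical helper, and zero column means are simply rejected here rather than handled as R would.

```python
def linear_norm(columns):
    """Scale each column by mean(all column means) / (its own mean)."""
    means = [sum(col) / len(col) for col in columns]
    if any(m == 0 for m in means):
        raise ValueError("column means must be non-zero in this sketch")
    grand = sum(means) / len(means)
    scales = [grand / m for m in means]
    return [[v * s for v in col] for col, s in zip(columns, scales)]


normalized = linear_norm([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
print([sum(col) / len(col) for col in normalized])
```

Under the same assumption, the linear=False path would multiply each ratio by a correlation or dependence weight before averaging, which this sketch does not attempt.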
Distance¶
nns_distance and nns_distance_bulk map to R's regression-point-matrix
helpers. NNS Python accepts rpm as a finite 2D numeric array with R's y.hat
column in the final position. nns_distance applies R's per-target min-max
rescaling before computing weighted nearest-neighbor predictions. nns_distance_bulk
matches R's compiled bulk helper, including its raw-feature distance convention.
For nns_distance with k > 1, NNS Python matches the installed R 13.0 binary:
the exponential rank-weight family uses the R C API's Rf_dexp scale argument
as 1 / k. This differs from the nearby source-code comment that describes it
as a rate.
Classification distance mode returns numeric class codes, not original labels.
For single-target nns_distance(..., class_=...), installed R uses weighted
mode with integer replication counts ceil(100 * weight). NNS Python follows that
behavior. For equal-distance nearest-neighbor ties, NNS Python preserves RPM row order
to match installed R's first-row tie behavior. Installed R's
NNS.distance.bulk(..., class=...) currently ignores
the class flag in its compiled bulk helper and returns the same inverse-distance
numeric weighted average as non-class bulk distance; NNS Python matches the installed
binary rather than the higher-level classification intent.
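The weighted-mode rule for single-target classification can be sketched as below. The replication count ceil(100 * weight) comes from the text; the helper name and the choice to break equal counts in neighbor order are assumptions for illustration.

```python
import math
from collections import Counter


def weighted_class_mode(class_codes, weights):
    """Mode of class codes after replicating each one ceil(100 * weight) times."""
    counts = Counter()
    for code, weight in zip(class_codes, weights):
        counts[code] += math.ceil(100 * weight)
    best = max(counts.values())
    # Assumed tie rule: the first code in neighbor order with the top count wins.
    for code in class_codes:
        if counts[code] == best:
            return code


# Raw weights tie at 0.25 per class, but the per-neighbor ceiling gives
# class 1 a total of 13 + 13 = 26 replicates against 25 for class 3.
print(weighted_class_mode([3, 1, 1], [0.25, 0.125, 0.125]))
```

The demo shows that integer replication is not equivalent to summing raw weights, which is why matching R requires the ceiling step.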
Classification¶
R classification paths work with numeric class codes. R factors become
1-indexed numeric codes in factor-level order and predictions are returned as
codes rather than decoded labels. NNS Python provides factor_2_dummy,
factor_2_dummy_fr, encode_factor_codes, and prepare_factor_predictors;
pass explicit levels= / factor_levels= to reproduce R factor level order
because NumPy arrays do not carry factor metadata.
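The effect of explicit level order can be shown with a short sketch. codes_from_levels is a hypothetical helper written for this illustration; it is not the library's encode_factor_codes.

```python
def codes_from_levels(values, levels):
    """Map labels to 1-indexed codes following the given level order."""
    lookup = {level: i + 1 for i, level in enumerate(levels)}
    try:
        return [lookup[v] for v in values]
    except KeyError as exc:
        raise ValueError(f"value {exc.args[0]!r} is not in levels") from None


labels = ["mid", "low", "high"]
# Explicit ordinal order versus sorted order (R's default when levels are omitted).
print(codes_from_levels(labels, ["low", "mid", "high"]))
print(codes_from_levels(labels, sorted(set(labels))))
```

The two calls assign different codes to the same labels, so the level order has to be supplied explicitly to reproduce a particular R factor.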
nns_reg(..., type="class"), nns_m_reg(..., type="class"), and
nns_stack(..., type="class") are supported for numeric, logical, and
factor-like targets. Use class_levels= when passing string/object labels so
NNS Python can reproduce R factor codes explicitly. Raw string classification remains
rejected where installed R errors or produces unusable NA conversions.
Predictions and point estimates are numeric class codes, not original labels,
matching installed R. Class confidence intervals are supported in nns_reg and
nns_m_reg; stack/boost class pred_int is supported through those regression
interval tables.
Differentiation¶
nns_diff maps to R's scalar callable NNS.diff path with plotting and trace
output disabled. It returns a dictionary keyed by R's matrix row names and
rounds results to digits, matching R's default output convention.
dy_dx(..., eval_point="overall") maps to R's dy.dx(..., eval.point =
"overall") path and returns the mean fitted gradient from unsmoothed
nns_reg. Numeric dy_dx evaluation points use R's finite-difference grid
around smooth nns_reg point estimates and return a table-like dictionary with
eval.point, first.derivative, and second.derivative. Boundary-point
quirks follow installed R where covered by parity tests.
NNS Python derivative parity is defined at the public input/output level, while
preserving R's cumulative finite-difference perturbation pattern for dy_d.
dy_d scalar wrt has enforced R parity for eval_points="mean", "median",
"last", "obs", and "apd". Vectorized wrt returns one row per eval point
and one column per requested regressor for First, Second, and Mixed when
mixed derivatives are defined. Treat dy_d as an NNS finite-difference
sensitivity estimate around nns_reg point estimates, not as an exact analytic
calculus derivative.
Mixed derivatives require a two-regressor input. Numeric two-value evaluation
points and single-row point modes match installed R on focused fixtures. For
multi-row matrix evaluation points, including eval_points="obs", NNS Python uses a
pointwise mixed finite-difference construction. Installed R's vectorized
list-matrix path packs multi-row mixed derivative points in an order-dependent
way, so NNS Python does not copy that packing quirk.
For scalar dy_d, R mutates lower and upper finite-difference points
cumulatively across rounded bandwidths. If rounded bandwidths repeat, R writes
the later cumulative result back to the first matching result slot and drops
the empty slots during final weighted averaging; NNS Python mirrors that behavior.
The obs and apd paths also rely on smooth dimensional-reduction
nns_reg(..., point_est=..., dim_red_method="equal", smooth=True) estimates.
For out-of-range smooth point estimates, R derives extrapolation slopes from
the smoothed regression points before clamping returned regression-point y
values, then anchors the extrapolation at the first which.min / which.max
boundary row. NNS Python mirrors those boundary quirks for parity.
ANOVA¶
nns_anova maps to R's non-plotting NNS.ANOVA paths. Binary comparisons
return a dictionary keyed like R's list output, aggregate multi-group
comparisons return {"Certainty": value}, and pairwise=True returns R's
symmetric certainty matrix. Confidence interval bootstrapping is structurally
identical to R but uses NumPy RNG instead of R's sample(), so exact per-call
parity is not achievable; numeric values converge to the same population CI.
Pass random_seed for reproducible NNS Python results. Degenerate zero-variance
groups preserve R's NaN CDF/certainty convention.