harv.samplers package

Sampling infrastructure for Keplerian orbits.

This module provides the rejection sampling infrastructure, prior distributions, the main RejectionSampler class, and the Samples container for posterior samples.

class harv.samplers.AbstractSampler

Bases: Module

Abstract base for harv samplers.

Every sampler holds a prior and a pre-built model. Use the bare constructor to combine them:

sampler = RejectionSampler(prior, RVModel(extensions=...))
samples = sampler.run(data, n_prior_samples=100_000)

Concrete subclasses are marked @final per the project’s abstract-final pattern.

Parameters:
__init__(prior, model, marginalized_names=None)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

get_extensions()

Return the extensions in effect for this sampler.

Walks the attached model: returns model.extensions for a single component model, or dict[component_name, tuple[Extension, ...]] for a JointModel so the per-component association used for namespaced parameter names like "primary.jitter" is preserved. Use this in any downstream consumer (e.g. harv.plot.plot_rv()) that needs to know which extensions are in play.

Return type:

tuple[AbstractExtension, ...] | dict[str, tuple[AbstractExtension, ...]]

marginalized_names: tuple[str, ...] | None = None
prior: HarvPrior
model: AbstractComponentModel | JointModel
class harv.samplers.NumpyroSampler

Bases: AbstractSampler

MCMC sampler for Keplerian orbital parameters using numpyro.

Builds a numpyro model from a component model (or joint model) and runs warm-started MCMC from rejection-sampler output, returning a Samples container. Data is passed to run() rather than at construction, so the same configured sampler can be applied to multiple datasets.

Parameters:
  • prior (HarvPrior) – Prior distributions for nonlinear (and optionally linear) parameters.

  • parameterization – Orbital parameterization. For RV data defaults to StandardRV. Ignored for Gaia astrometry data.

  • extensions – Model extensions (jitter, trends, offsets, GP) supplied at construction time.

Examples

>>> from unxt import Q
>>> from harv.models import StandardRV
>>> from harv.samplers import NumpyroSampler
>>> prior = StandardRV().default_prior(
...     period_min=Q(2.0, "day"),
...     period_max=Q(1000.0, "day"),
...     sigma_K0=Q(30.0, "km/s"),
...     sigma_v0=Q(50.0, "km/s"),
... )
>>> mcmc = NumpyroSampler(prior)
>>> mcmc_samples = mcmc.run(
...     data, init_samples=rej_samples, num_warmup=500, num_samples=1000
... )
Parameters:
__init__(prior, model, marginalized_names=None)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

get_extensions()

Return the extensions in effect for this sampler.

Walks the attached model: returns model.extensions for a single component model, or dict[component_name, tuple[Extension, ...]] for a JointModel so the per-component association used for namespaced parameter names like "primary.jitter" is preserved. Use this in any downstream consumer (e.g. harv.plot.plot_rv()) that needs to know which extensions are in play.

Return type:

tuple[AbstractExtension, ...] | dict[str, tuple[AbstractExtension, ...]]

marginalized_names: tuple[str, ...] | None = None
optimize(samples, data, *, seed=None, max_passes=10, tol=0.0001)

Refine each input sample to the local posterior MAP via BFGS.

Uses numpyro.optim.Minimize() (BFGS via jax.scipy.optimize.minimize()) with an AutoDelta guide to find the local mode of log_prior + marginal_log_likelihood starting from each sample in samples. The returned Samples has the refined nonlinear values, linear values set to the conditional posterior mean (equal to the conditional MAP since the conditional is Gaussian) at the refined nonlinear point, and populated ln_likelihood / ln_prior.

Particularly useful when rejection sampling returns a single sample inside a posterior mode – this moves it from a random acceptance point to the mode peak.

Parameters:
  • samples (Samples) – Posterior samples (e.g. from RejectionSampler.run()) used as warm starts. Each sample seeds an independent BFGS run.

  • data (AbstractData | AbstractDatasetContainer) – Observed data (same shape/type as run()).

  • seed (int | None) – Random number seed. If not specified, picks a seed based on the current time.

  • max_passes (int) – Maximum number of BFGS restarts per sample. The wrapped jax.scipy.optimize.minimize BFGS often quits early when its line search fails; restarting from the previous result frequently recovers further progress. Default 10.

  • tol (float) – Loss-change tolerance for the BFGS-restart loop. If the loss decreases by less than tol on a pass, optimization is considered converged. Default 1e-4.

Return type:

Samples

Returns:

Refined samples with ln_likelihood and ln_prior populated.

Notes

Only the marginalized path is supported (matches run(marginalized=True)). extra_model is not supported.

Emits UserWarning for any sample that does not converge within max_passes BFGS restarts; the returned point for such a sample may not be the local MAP.

run(data, *, init_samples=None, seed=None, marginalized=True, extra_model=None, extra_init_params=None, kernel=None, num_chains=4, num_warmup=500, num_samples=1000, chain_method='parallel', return_logprobs=False)

Run MCMC warm-started from rejection-sampler output.

Parameters:
  • data (AbstractData | AbstractDatasetContainer) – Observed data (RVData, GaiaAstrometryData, or SystemData for joint models).

  • init_samples (Samples | None) – Posterior samples produced by rejection sampling, used to set the initial positions for each MCMC chain.

  • seed (int | None) – Random number seed. If not specified, picks a seed based on the current time.

  • marginalized (bool) – If True (default), linear parameters are analytically marginalized in the likelihood and conditionally sampled afterward. If False, all parameters are sampled jointly.

  • extra_model (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function (pars: dict) -> dict that replaces specific linear parameters with deterministic functions of new physical parameters.

  • extra_init_params (dict[str, Any] | None) – Initial values for parameters introduced by extra_model.

  • kernel (type | None) – A numpyro MCMC kernel class. Defaults to numpyro.infer.NUTS.

  • num_chains (int) – Number of independent MCMC chains. Default: 4.

  • num_warmup (int) – Number of warmup (burn-in) steps per chain. Default: 500.

  • num_samples (int) – Number of posterior samples per chain. Default: 1000.

  • chain_method (str) – How to run chains: "parallel", "sequential", or "vectorized". Default: "parallel".

  • return_logprobs (bool) – If True, store per-sample log-probabilities on the returned Samples: ln_likelihood (the marginal log-likelihood, re-evaluated via model.log_prob) and ln_prior (the summed nonlinear-prior log-density). Enables Samples.map_sample() and Samples.ln_posterior. Requires marginalized=True and no extra_model. Default False.

Return type:

Samples

Returns:

Posterior samples container with nonlinear and linear parameters.

prior: HarvPrior
model: AbstractComponentModel | JointModel
harv.samplers.convert_parameterization(nonlinear, linear, *, source, target)

Convert parameter values between supported parameterizations.

Both source and target must belong to the same family (both RV, or both Gaia astrometry). Parameters not declared by source are preserved unchanged.

Parameters:
Returns:

Converted (nonlinear, linear) parameter dictionaries.

Return type:

tuple[dict[str, Q], dict[str, Q]]

Raises:
  • NotImplementedError – If the samples contain namespaced (joint-model) keys, or source and target are not a supported same-family pair.

  • ValueError – If a parameter declared by source is missing from the inputs.

harv.samplers.QD

alias of QuantityDistribution

class harv.samplers.QuantityDistribution

Bases: Module

Pairs a numpyro distribution with the physical unit of its samples.

For scalar distributions, unit is a single string. For multivariate distributions (e.g. MultivariateNormal over parameters with mixed units), unit is a tuple of strings – one per dimension.

Parameters:
  • distribution (Distribution) – The underlying numpyro distribution (works with bare floats).

  • unit (str | tuple[str, ...]) – Physical unit of the samples. A single string for scalar distributions; a tuple for multivariate distributions where each element may have a different unit.

Examples

Scalar (period in days):

>>> import jax
>>> import numpyro.distributions as dist
>>> from harv import QuantityDistribution
>>> qd = QuantityDistribution(dist.LogUniform(50., 2000.), "day")
>>> sample = qd.sample(jax.random.key(0))
>>> sample.unit
Unit("d")

Multivariate (astrometric linear parameters with mixed units):

qd = QuantityDistribution(
    dist.MultivariateNormal(loc=jnp.zeros(6), ...),
    ("mas", "mas", "mas/yr", "mas/yr", "mas", "mas"),
)
sample = qd.sample(key)  # -> raw jax.Array (consumer splits by name)
__init__(distribution, unit)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

log_prob(value)

Evaluate log-probability, stripping units if present.

Parameters:

value (Any)

Return type:

Any

sample(key, sample_shape=())

Sample from the distribution, attaching units when possible.

For scalar units (str), returns Quantity. For tuple units (multivariate with mixed units), returns a raw array – the consumer splits by parameter name and attaches per-element units.

Parameters:
Return type:

Any

distribution: Distribution
unit: str | tuple[str, ...]
class harv.samplers.RejectionSampler

Bases: AbstractSampler

Rejection sampler for Keplerian orbital parameters.

Implements rejection sampling with analytic marginalization over linear parameters. Configure once, then call run() with each dataset.

Parameters:
  • prior (HarvPrior) – Prior distributions for nonlinear (and optionally linear) parameters.

  • model (AbstractComponentModel | JointModel) – Fully constructed model template (no data, no linear_priors).

  • marginalized_names (tuple[str, ...] | None) – Linear parameter names to analytically marginalize. If None, all Gaussian linear parameters are auto-classified for marginalization.

  • batch_size (int) – Number of samples to process per batch. Smaller values use less memory but may be slower. Default: 100_000.

Examples

>>> from unxt import Q
>>> import jax.numpy as jnp
>>> from harv import HarvPrior, RejectionSampler, RVData
>>> from harv.models.rv import RVModel
>>> data = RVData(
...     time=Q(jnp.linspace(0, 100, 5), "day"),
...     rv=Q(jnp.zeros(5), "km/s"),
...     rv_err=Q(jnp.full(5, 1.0), "km/s"),
... )
>>> prior = HarvPrior.default_rv(
...     period_min=Q(2.0, "day"),
...     period_max=Q(1000.0, "day"),
...     sigma_K0=Q(30.0, "km/s"),
...     sigma_v0=Q(50.0, "km/s"),
... )
>>> sampler = RejectionSampler(prior, RVModel())
>>> samples = sampler.run(data, n_prior_samples=100_000)
__init__(prior, model, marginalized_names=None, batch_size=100000)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

batch_size: int = 100000
get_extensions()

Return the extensions in effect for this sampler.

Walks the attached model: returns model.extensions for a single component model, or dict[component_name, tuple[Extension, ...]] for a JointModel so the per-component association used for namespaced parameter names like "primary.jitter" is preserved. Use this in any downstream consumer (e.g. harv.plot.plot_rv()) that needs to know which extensions are in play.

Return type:

tuple[AbstractExtension, ...] | dict[str, tuple[AbstractExtension, ...]]

marginalized_names: tuple[str, ...] | None = None
run(data, *, n_prior_samples, max_posterior_samples=None, seed=None, ignore_non_finite=False, return_logprobs=False)

Run rejection sampling.

Parameters:
  • data (AbstractData | AbstractDatasetContainer) – Observed data: an AbstractData subclass (e.g. RVData, GaiaAstrometryData) for single-component models, or an AbstractDatasetContainer (e.g. SystemData, SourceData) for JointModel.

  • n_prior_samples (int) – Number of samples to draw from the prior.

  • max_posterior_samples (int | None) – Maximum number of posterior samples to return. If None, returns all accepted samples.

  • seed (int | None) – Random number seed. If not specified, picks a seed based on the current time.

  • ignore_non_finite (bool) – If True, any NaN or infinite log-likelihood values are treated as rejected samples by replacing them with -inf before the rejection step. If False (default), non-finite values are left unchanged.

  • return_logprobs (bool) – If True, store per-sample log-probabilities on the returned Samples: ln_likelihood (the marginal log-likelihood) and ln_prior (the summed nonlinear prior log-density). These enable Samples.map_sample() and the Samples.ln_posterior property. Default False.

Return type:

Samples

Returns:

Posterior samples container.

prior: HarvPrior
model: AbstractComponentModel | JointModel
class harv.samplers.Samples

Bases: Module

Container for posterior samples.

Stores both nonlinear and linear parameter samples as Q objects with units baked in. Provides dict-like access, statistical summaries, and visualization tools.

Parameters:
  • nonlinear (dict[str, Quantity]) – Nonlinear parameter samples, one Q per parameter. Keys: "period", "eccentricity", "phase_peri", and optionally "arg_peri", "cos_i", "lon_asc_node". Units: period has time units; angles have "rad"; dimensionless parameters have unit "".

  • linear (dict[str, Quantity]) – Linear parameter samples, one Q per parameter. Keys: e.g. "rv_semiamp", "v_sys" for RV; "ra0", "dec0", "pmra", "pmdec", "parallax", "semi_major_axis" for astrometry. Units are data-driven (e.g. "km/s" for RV).

  • data_type (str) – Informational label identifying the model that produced these samples (e.g. "RVModel", "GaiaAstrometryModel", "JointModel"). Stored in HDF5 for round-tripping.

  • metadata (dict[str, Any]) – Additional metadata (t_ref, num_chains, acceptance rate, etc.).

  • linear_extension_names (tuple[str, ...]) – Names of linear parameters introduced by extensions (instrument offsets, polynomial trends, etc.) beyond the base linear set.

Examples

Samples is normally produced by run(), but can be constructed directly for testing or manual use:

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={
...         "period": Q([100.0, 101.0, 99.5], "day"),
...         "eccentricity": Q([0.1, 0.15, 0.12], ""),
...         "phase_peri": Q([0.3, 0.31, 0.29], ""),
...         "arg_peri": Q([1.0, 1.1, 0.9], "rad"),
...     },
...     linear={
...         "rv_semiamp": Q([10.0, 11.0, 9.5], "km/s"),
...         "v_sys": Q([5.0, 5.1, 4.9], "km/s"),
...     },
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> samples.n_samples
3
>>> "period" in samples
True
>>> samples.keys()[:4]
['period', 'eccentricity', 'phase_peri', 'arg_peri']
Parameters:
__init__(nonlinear, linear, data_type='', metadata=<factory>, linear_extension_names=(), ln_likelihood=None, ln_prior=None)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

binary_mass_function()

Binary mass function from the RV orbital elements.

See harv.kepler.masses.binary_mass_function().

Return type:

Quantity

Returns:

The binary mass function (a Q in solar masses), one per sample.

Raises:

KeyError – If the samples do not contain rv_semiamp (not an RV fit).

chi2(data, model)

Per-sample goodness-of-fit \(\chi^2\) against the data.

For each posterior sample, evaluates the model prediction at the stored parameter values and returns \(\chi^2 = r^\top C^{-1} r\) (see chi_squared()). Jitter and other extension noise terms are included via C.

Parameters:
  • data (AbstractData) – The data the samples were fit to.

  • model (Any) – The component model used for the fit (RVModel or GaiaAstrometryModel); provides the prediction and noise model.

Return type:

Array

Returns:

Array of shape (n_samples,).

companion_mass(m1, *, sini=None)

Companion mass \(m_2\) given the primary mass.

For RV samples the mass function is binary_mass_function(); sini defaults to 1 (the minimum companion mass). For astrometry samples the dark-companion astrometric mass function is used and sini is ignored (the inclination is already encoded in the physical orbit size).

Parameters:
  • m1 (Quantity) – Primary mass (a Q).

  • sini (float | None) – Sine of the inclination, for RV samples only. Default 1 (edge-on).

Return type:

Quantity

Returns:

The companion mass (a Q in solar masses), one per sample.

convert_parameterization(*, source, target)

Convert stored values between supported parameterizations.

Wraps harv.samplers.convert_parameterization(), returning a new Samples with metadata, data_type, and linear_extension_names preserved. The initial implementation supports single-component RV and Gaia astrometry parameterizations only.

Parameters:
Return type:

Samples

data_type: str = ''
classmethod from_hdf5(filename)

Load samples from HDF5 file.

Parameters:

filename (str | Path) – Input HDF5 filename.

Return type:

Samples

Returns:

Loaded samples object.

Examples

>>> samples = Samples.from_hdf5("posterior_samples.h5")
>>> samples.n_samples
42
>>> samples.data_type
'rv'
keys()

All available parameter names (nonlinear + linear + derived).

Return type:

list[str]

linear_extension_names: tuple[str, ...] = ()
ln_likelihood: Array | None = None
property ln_posterior: Array

Per-sample log-posterior density, ln_prior + ln_likelihood.

Both ln_prior and ln_likelihood must have been stored (run the sampler with return_logprobs=True); otherwise ValueError is raised.

ln_prior: Array | None = None
map_sample(*, return_index=False)

Return the maximum a posteriori (MAP) sample.

Selects the single sample with the highest ln_posterior.

Parameters:

return_index (bool) – If True, also return the integer index of the MAP sample.

Return type:

Samples | tuple[Samples, int]

Returns:

A length-1 Samples with the MAP sample, or (Samples, index) when return_index is True.

Raises:

ValueError – If per-sample log-probabilities were not stored (run the sampler with return_logprobs=True).

max_phase_gap(data)

Largest gap in orbital-phase coverage, per sample.

The maximum gap between consecutive observations on the (circular) phase axis – the ESA Gaia “maximum phase gap” statistic.

Parameters:

data (AbstractData) – The data the samples were fit to.

Return type:

ndarray

Returns:

Array of shape (n_samples,); values in [0, 1].

median(key=None)

Compute median values for parameters.

Parameters:

key (str | None) – If provided, return median for this parameter only. If None, return dict of medians for all parameters.

Return type:

dict[str, AbstractQuantity | Array] | AbstractQuantity | Array

Returns:

Median value(s).

Examples

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={"period": Q([100.0, 102.0], "day"),
...                "eccentricity": Q([0.1, 0.15], ""),
...                "phase_peri": Q([0.3, 0.31], ""),
...                "arg_peri": Q([1.0, 1.1], "rad")},
...     linear={"rv_semiamp": Q([10.0, 12.0], "km/s"),
...             "v_sys": Q([5.0, 5.2], "km/s")},
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> med = samples.median("period")
>>> med.unit
Unit("d")
>>> all_medians = samples.median()
>>> "period" in all_medians
True
property meta: _MetadataView

Q-aware read-only view over metadata.

metadata itself stores quantity-valued entries in split form (<name> value + <name>_unit string) so the static field never holds a JAX array. Use samples.meta[name] to read those entries back as Q instances without manually reassembling the two halves; plain scalar entries (e.g. num_chains) round-trip unchanged. See _MetadataView for the full read API.

minimum_companion_mass(m1)

Minimum companion mass (edge-on, sin i = 1).

Convenience wrapper for companion_mass() with sini=1.

Parameters:

m1 (Quantity) – Primary mass (a Q).

Return type:

Quantity

property n_samples: int

Number of posterior samples.

percentile(key, percentiles=(16, 50, 84))

Compute percentiles for a parameter.

Parameters:
  • key (str) – Parameter name.

  • percentiles (list[float] | tuple[float, ...]) – Percentile values to compute (0-100). Default: (16, 50, 84) which corresponds to the 16th, 50th, 84th percentiles for Gaussian.

Return type:

list[AbstractQuantity | Array]

Returns:

Percentile values with appropriate units.

Examples

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={"period": Q([100.0, 102.0], "day"),
...                "eccentricity": Q([0.1, 0.15], ""),
...                "phase_peri": Q([0.3, 0.31], ""),
...                "arg_peri": Q([1.0, 1.1], "rad")},
...     linear={"rv_semiamp": Q([10.0, 12.0], "km/s"),
...             "v_sys": Q([5.0, 5.2], "km/s")},
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> p16, p50, p84 = samples.percentile("eccentricity")
>>> len(samples.percentile("period", [5, 50, 95]))
3
period_modes(data, n_clusters=2)

Cluster the period samples and test each mode for unimodality.

Runs K-means on log(period) (experimental; see thejoker’s is_P_Kmodal). Requires the optional scikit-learn dependency.

Parameters:
  • data (AbstractData) – The data the samples were fit to.

  • n_clusters (int) – Number of period modes to cluster into. Default 2.

Return type:

tuple[bool, Quantity, ndarray]

Returns:

(all_unimodal, mode_periods, n_per_mode) – whether every mode is individually unimodal, the median period of each mode (a Q), and the sample count per mode.

period_unimodal(data)

Whether the period samples lie within a single mode.

Uses the criterion ptp(P) < 4 * P_min**2 / (2*pi*T), where T is the data time span – the period spacing at which adjacent aliases become resolvable (see thejoker’s is_P_unimodal).

Parameters:

data (AbstractData) – The data the samples were fit to (for the observed time span).

Return type:

bool

periods_spanned(data)

Number of orbital periods spanned by the data, per sample.

Parameters:

data (AbstractData) – The data the samples were fit to.

Return type:

ndarray

Returns:

Array of shape (n_samples,).

phase_coverage(data, n_bins=10)

Fraction of phase bins containing at least one observation.

The ESA Gaia “phase coverage” statistic, per sample.

Parameters:
  • data (AbstractData) – The data the samples were fit to.

  • n_bins (int) – Number of equal-width phase bins. Default 10.

Return type:

ndarray

Returns:

Array of shape (n_samples,); values in [0, 1].

phase_coverage_per_period(data)

Maximum number of observations within any single period, per sample.

Parameters:

data (AbstractData) – The data the samples were fit to.

Return type:

ndarray

Returns:

Array of shape (n_samples,).

plot_corner(params=None, truths=None, labels=None, **plot_kwargs)

Create corner plot of posterior samples using arviz.

Parameters:
  • params (list[str] | None) – Parameters to include in corner plot. If None, selects a default set based on data_type.

  • truths (dict[str, Any] | None) – Dictionary of true parameter values to overplot as reference values.

  • labels (dict[str, str] | None) – Override display names for specific parameters, e.g. {"period": "period [day]", "rv_semiamp": "K [km/s]"}. Parameters not listed use their plain parameter name as the label.

  • **plot_kwargs (Any) – Additional keyword arguments passed to arviz.plot_pair().

Return type:

Any

Returns:

Array of axes from arviz.plot_pair.

Examples

Default corner plot (selects parameters based on data type):

>>> axes = samples.plot_corner()

Specify parameters and overplot true values:

>>> from unxt import Q
>>> axes = samples.plot_corner(
...     params=["period", "eccentricity"],
...     truths={"period": Q(100, "day"), "eccentricity": 0.3},
... )
reduced_chi2(data, model, *, dof=None)

Per-sample reduced \(\chi^2\) (\(\chi^2 / \mathrm{dof}\)).

Parameters:
  • data (AbstractData) – The data the samples were fit to.

  • model (Any) – The component model used for the fit.

  • dof (int | None) – Degrees of freedom. Defaults to n_obs - n_params, where n_params counts every fitted parameter (orbital + linear + extension). A well-fitting model gives reduced \(\chi^2\) near 1.

Return type:

Array

Returns:

Array of shape (n_samples,).

Raises:

ValueError – If dof (default or supplied) is not positive.

semi_major_axis_AU()

Physical semi-major axis (AU) from the angular orbit size + parallax.

See harv.kepler.masses.semi_major_axis_physical().

Return type:

Quantity

Returns:

The physical semi-major axis (a Q in AU), one per sample.

Raises:

KeyError – If the samples lack semi_major_axis or parallax.

summary(params=None)

Compute summary statistics for parameters.

For each parameter, computes: - median - mean - std (standard deviation) - percentiles (16th, 84th) for +/-1-sigma equivalent

Parameters:

params (list[str] | None) – List of parameter names to summarize. If None, summarizes all.

Return type:

dict[str, dict[str, Any]]

Returns:

Dictionary mapping parameter names to their statistics.

Examples

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={"period": Q([100.0, 102.0], "day"),
...                "eccentricity": Q([0.1, 0.15], ""),
...                "phase_peri": Q([0.3, 0.31], ""),
...                "arg_peri": Q([1.0, 1.1], "rad")},
...     linear={"rv_semiamp": Q([10.0, 12.0], "km/s"),
...             "v_sys": Q([5.0, 5.2], "km/s")},
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> summary = samples.summary(["period", "eccentricity"])
>>> sorted(summary.keys())
['eccentricity', 'period']
>>> sorted(summary["period"].keys())
['mean', 'median', 'p16', 'p84', 'std']
thiele_innes_to_campbell()

Convert Thiele-Innes linear parameters to Campbell orbital elements.

See campbell_from_thiele_innes() for the mathematical details of the conversion.

Returns:

New Samples with semi_major_axis, arg_peri, lon_asc_node, cos_i (replacing the four TI constants).

Return type:

Samples

to_arviz(params=None, labels=None)

Export samples to an arviz.InferenceData object.

Parameters:
  • params (list[str] | None) – Parameters to include. If None, all parameters returned by keys() are included.

  • labels (dict[str, str] | None) – Override display names for specific parameters, e.g. {"period": "period [day]", "rv_semiamp": "K [km/s]"}. Parameters not listed use their plain parameter name as the label.

Return type:

Any

Returns:

Inference data suitable for arviz.plot_pair, arviz.summary, etc.

Raises:

ImportError – If arviz is not installed.

Examples

>>> idata = samples.to_arviz(["period", "eccentricity"])
to_hdf5(filename)

Save samples to HDF5 file.

Parameters:

filename (str | Path) – Output HDF5 filename.

Return type:

None

Examples

Save posterior samples and reload them:

>>> samples.to_hdf5("posterior_samples.h5")
>>> reloaded = Samples.from_hdf5("posterior_samples.h5")
>>> reloaded.n_samples == samples.n_samples
True
wrap_angles()

Wrap negative rv_semiamp / semi_major_axis to positive.

Negative rv_semiamp (K) or semi_major_axis (a) describes an orbit that is physically identical to the all-positive case under two symmetries of the model:

  • arg_peri -> arg_peri + pi flips the sign of both K and a (the (omega, K, a) -> (omega + pi, -K, -a) symmetry);

  • lon_asc_node -> lon_asc_node + pi flips the sign of a alone (the astrometric (Omega, a) -> (Omega + pi, -a) symmetry; lon_asc_node does not enter the RV model).

This method returns a new Samples enforcing K >= 0 and a >= 0 in two steps:

  1. shift arg_peri by pi (flipping every rv_semiamp and semi_major_axis) on the samples where the trigger amplitude is negative — enforcing K >= 0;

  2. shift lon_asc_node by pi (flipping every semi_major_axis) on the samples where a is still negative — enforcing a >= 0.

Both shifted angles are wrapped to [0, 2*pi). A single arg_peri shift cannot make both K and a positive when their signs disagree (which routinely happens in joint RV + astrometry posteriors), so the second lon_asc_node shift is required.

Joint models (e.g. SB2) namespace per-component linear parameters as "component.param_name"; this method discovers every rv_semiamp- and semi_major_axis-suffixed key. The first rv_semiamp-suffixed key (insertion order) triggers step 1; semi_major_axis triggers step 1 only when no rv_semiamp is present (astrometry-only fits).

No-op when arg_peri is absent from nonlinear, when neither rv_semiamp nor semi_major_axis is in linear, or when no entries are negative.

Raises:

NotImplementedError – If the model has multiple per-component arg_peri or lon_asc_node keys (e.g. "primary.arg_peri" and "secondary.arg_peri"). All current harv joint factories share both, so this is not expected to arise in practice.

Return type:

Samples

Examples

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={"period": Q([100.0, 100.0], "day"),
...                "eccentricity": Q([0.1, 0.1], ""),
...                "phase_peri": Q([0.3, 0.3], ""),
...                "arg_peri": Q([1.0, 1.0], "rad")},
...     linear={"rv_semiamp": Q([-10.0, 10.0], "km/s"),
...             "v_sys": Q([0.0, 0.0], "km/s")},
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> wrapped = samples.wrap_angles()
>>> bool((wrapped["rv_semiamp"].value >= 0).all())
True
Return type:

Samples

nonlinear: dict[str, Quantity]
linear: dict[str, Quantity]
metadata: dict[str, Any]

Submodules

harv.samplers.base module

Abstract base for harv samplers.

Carries the shared field set (prior, model, marginalized_names). Algorithm-specific surface (run() and its private helpers) lives on each concrete subclass, since the rejection and MCMC algorithms have fundamentally different run() signatures and internal state.

class harv.samplers.base.AbstractSampler

Bases: Module

Abstract base for harv samplers.

Every sampler holds a prior and a pre-built model. Use the bare constructor to combine them:

sampler = RejectionSampler(prior, RVModel(extensions=...))
samples = sampler.run(data, n_prior_samples=100_000)

Concrete subclasses are marked @final per the project’s abstract-final pattern.

Parameters:
prior: HarvPrior
model: AbstractComponentModel | JointModel
marginalized_names: tuple[str, ...] | None = None
get_extensions()

Return the extensions in effect for this sampler.

Walks the attached model: returns model.extensions for a single component model, or dict[component_name, tuple[Extension, ...]] for a JointModel so the per-component association used for namespaced parameter names like "primary.jitter" is preserved. Use this in any downstream consumer (e.g. harv.plot.plot_rv()) that needs to know which extensions are in play.

Return type:

tuple[AbstractExtension, ...] | dict[str, tuple[AbstractExtension, ...]]

__init__(prior, model, marginalized_names=None)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

harv.samplers.conversion module

Convert sampled parameter values between parameterizations.

This first implementation supports single-component RV and Gaia astrometry parameterizations only:

  • RV: StandardRV <-> EcoswEsinwRV

  • Gaia astrometry: StandardGaiaAstrometry <-> ThieleInnesGaiaAstrometry

Parameters not declared by the source parameterization (for example, extension parameters such as jitter or polynomial-trend coefficients) are preserved unchanged. See docs/spec.md (“Parameterization conversion”).

harv.samplers.conversion.convert_parameterization(nonlinear, linear, *, source, target)

Convert parameter values between supported parameterizations.

Both source and target must belong to the same family (both RV, or both Gaia astrometry). Parameters not declared by source are preserved unchanged.

Parameters:
Returns:

Converted (nonlinear, linear) parameter dictionaries.

Return type:

tuple[dict[str, Q], dict[str, Q]]

Raises:
  • NotImplementedError – If the samples contain namespaced (joint-model) keys, or source and target are not a supported same-family pair.

  • ValueError – If a parameter declared by source is missing from the inputs.

harv.samplers.numpyro module

Numpyro MCMC sampler for harv.

Provides the NumpyroSampler class for warm-started MCMC using the new component model API. The component models’ numpyro_model() method builds the numpyro model closure directly.

final class harv.samplers.numpyro.NumpyroSampler

Bases: AbstractSampler

MCMC sampler for Keplerian orbital parameters using numpyro.

Builds a numpyro model from a component model (or joint model) and runs warm-started MCMC from rejection-sampler output, returning a Samples container. Data is passed to run() rather than at construction, so the same configured sampler can be applied to multiple datasets.

Parameters:
  • prior (HarvPrior) – Prior distributions for nonlinear (and optionally linear) parameters.

  • parameterization – Orbital parameterization. For RV data defaults to StandardRV. Ignored for Gaia astrometry data.

  • extensions – Model extensions (jitter, trends, offsets, GP) supplied at construction time.

Examples

>>> from unxt import Q
>>> from harv.models import StandardRV
>>> from harv.samplers import NumpyroSampler
>>> prior = StandardRV().default_prior(
...     period_min=Q(2.0, "day"),
...     period_max=Q(1000.0, "day"),
...     sigma_K0=Q(30.0, "km/s"),
...     sigma_v0=Q(50.0, "km/s"),
... )
>>> mcmc = NumpyroSampler(prior)
>>> mcmc_samples = mcmc.run(
...     data, init_samples=rej_samples, num_warmup=500, num_samples=1000
... )
Parameters:
run(data, *, init_samples=None, seed=None, marginalized=True, extra_model=None, extra_init_params=None, kernel=None, num_chains=4, num_warmup=500, num_samples=1000, chain_method='parallel', return_logprobs=False)

Run MCMC warm-started from rejection-sampler output.

Parameters:
  • data (AbstractData | AbstractDatasetContainer) – Observed data (RVData, GaiaAstrometryData, or SystemData for joint models).

  • init_samples (Samples | None) – Posterior samples produced by rejection sampling, used to set the initial positions for each MCMC chain.

  • seed (int | None) – Random number seed. If not specified, picks a seed based on the current time.

  • marginalized (bool) – If True (default), linear parameters are analytically marginalized in the likelihood and conditionally sampled afterward. If False, all parameters are sampled jointly.

  • extra_model (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function (pars: dict) -> dict that replaces specific linear parameters with deterministic functions of new physical parameters.

  • extra_init_params (dict[str, Any] | None) – Initial values for parameters introduced by extra_model.

  • kernel (type | None) – A numpyro MCMC kernel class. Defaults to numpyro.infer.NUTS.

  • num_chains (int) – Number of independent MCMC chains. Default: 4.

  • num_warmup (int) – Number of warmup (burn-in) steps per chain. Default: 500.

  • num_samples (int) – Number of posterior samples per chain. Default: 1000.

  • chain_method (str) – How to run chains: "parallel", "sequential", or "vectorized". Default: "parallel".

  • return_logprobs (bool) – If True, store per-sample log-probabilities on the returned Samples: ln_likelihood (the marginal log-likelihood, re-evaluated via model.log_prob) and ln_prior (the summed nonlinear-prior log-density). Enables Samples.map_sample() and Samples.ln_posterior. Requires marginalized=True and no extra_model. Default False.

Return type:

Samples

Returns:

Posterior samples container with nonlinear and linear parameters.

optimize(samples, data, *, seed=None, max_passes=10, tol=0.0001)

Refine each input sample to the local posterior MAP via BFGS.

Uses numpyro.optim.Minimize() (BFGS via jax.scipy.optimize.minimize()) with an AutoDelta guide to find the local mode of log_prior + marginal_log_likelihood starting from each sample in samples. The returned Samples has the refined nonlinear values, linear values set to the conditional posterior mean (equal to the conditional MAP since the conditional is Gaussian) at the refined nonlinear point, and populated ln_likelihood / ln_prior.

Particularly useful when rejection sampling returns a single sample inside a posterior mode – this moves it from a random acceptance point to the mode peak.

Parameters:
  • samples (Samples) – Posterior samples (e.g. from RejectionSampler.run()) used as warm starts. Each sample seeds an independent BFGS run.

  • data (AbstractData | AbstractDatasetContainer) – Observed data (same shape/type as run()).

  • seed (int | None) – Random number seed. If not specified, picks a seed based on the current time.

  • max_passes (int) – Maximum number of BFGS restarts per sample. The wrapped jax.scipy.optimize.minimize BFGS often quits early when its line search fails; restarting from the previous result frequently recovers further progress. Default 10.

  • tol (float) – Loss-change tolerance for the BFGS-restart loop. If the loss decreases by less than tol on a pass, optimization is considered converged. Default 1e-4.

Return type:

Samples

Returns:

Refined samples with ln_likelihood and ln_prior populated.

Notes

Only the marginalized path is supported (matches run(marginalized=True)). extra_model is not supported.

Emits UserWarning for any sample that does not converge within max_passes BFGS restarts; the returned point for such a sample may not be the local MAP.

__init__(prior, model, marginalized_names=None)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

get_extensions()

Return the extensions in effect for this sampler.

Walks the attached model: returns model.extensions for a single component model, or dict[component_name, tuple[Extension, ...]] for a JointModel so the per-component association used for namespaced parameter names like "primary.jitter" is preserved. Use this in any downstream consumer (e.g. harv.plot.plot_rv()) that needs to know which extensions are in play.

Return type:

tuple[AbstractExtension, ...] | dict[str, tuple[AbstractExtension, ...]]

marginalized_names: tuple[str, ...] | None = None
prior: HarvPrior
model: AbstractComponentModel | JointModel

harv.samplers.rejection module

Rejection sampler for orbital parameter inference.

final class harv.samplers.rejection.RejectionSampler

Bases: AbstractSampler

Rejection sampler for Keplerian orbital parameters.

Implements rejection sampling with analytic marginalization over linear parameters. Configure once, then call run() with each dataset.

Parameters:
  • prior (HarvPrior) – Prior distributions for nonlinear (and optionally linear) parameters.

  • model (AbstractComponentModel | JointModel) – Fully constructed model template (no data, no linear_priors).

  • marginalized_names (tuple[str, ...] | None) – Linear parameter names to analytically marginalize. If None, all Gaussian linear parameters are auto-classified for marginalization.

  • batch_size (int) – Number of samples to process per batch. Smaller values use less memory but may be slower. Default: 100_000.

Examples

>>> from unxt import Q
>>> import jax.numpy as jnp
>>> from harv import HarvPrior, RejectionSampler, RVData
>>> from harv.models.rv import RVModel
>>> data = RVData(
...     time=Q(jnp.linspace(0, 100, 5), "day"),
...     rv=Q(jnp.zeros(5), "km/s"),
...     rv_err=Q(jnp.full(5, 1.0), "km/s"),
... )
>>> prior = HarvPrior.default_rv(
...     period_min=Q(2.0, "day"),
...     period_max=Q(1000.0, "day"),
...     sigma_K0=Q(30.0, "km/s"),
...     sigma_v0=Q(50.0, "km/s"),
... )
>>> sampler = RejectionSampler(prior, RVModel())
>>> samples = sampler.run(data, n_prior_samples=100_000)
batch_size: int = 100000
run(data, *, n_prior_samples, max_posterior_samples=None, seed=None, ignore_non_finite=False, return_logprobs=False)

Run rejection sampling.

Parameters:
  • data (AbstractData | AbstractDatasetContainer) – Observed data: an AbstractData subclass (e.g. RVData, GaiaAstrometryData) for single-component models, or an AbstractDatasetContainer (e.g. SystemData, SourceData) for JointModel.

  • n_prior_samples (int) – Number of samples to draw from the prior.

  • max_posterior_samples (int | None) – Maximum number of posterior samples to return. If None, returns all accepted samples.

  • seed (int | None) – Random number seed. If not specified, picks a seed based on the current time.

  • ignore_non_finite (bool) – If True, any NaN or infinite log-likelihood values are treated as rejected samples by replacing them with -inf before the rejection step. If False (default), non-finite values are left unchanged.

  • return_logprobs (bool) – If True, store per-sample log-probabilities on the returned Samples: ln_likelihood (the marginal log-likelihood) and ln_prior (the summed nonlinear prior log-density). These enable Samples.map_sample() and the Samples.ln_posterior property. Default False.

Return type:

Samples

Returns:

Posterior samples container.

__init__(prior, model, marginalized_names=None, batch_size=100000)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

get_extensions()

Return the extensions in effect for this sampler.

Walks the attached model: returns model.extensions for a single component model, or dict[component_name, tuple[Extension, ...]] for a JointModel so the per-component association used for namespaced parameter names like "primary.jitter" is preserved. Use this in any downstream consumer (e.g. harv.plot.plot_rv()) that needs to know which extensions are in play.

Return type:

tuple[AbstractExtension, ...] | dict[str, tuple[AbstractExtension, ...]]

marginalized_names: tuple[str, ...] | None = None
prior: HarvPrior
model: AbstractComponentModel | JointModel

harv.samplers.samples module

Container for rejection sampler posterior samples.

This module provides the Samples class which stores posterior samples from rejection sampling with dict-like access, unit handling, and analysis tools.

class harv.samplers.samples.Samples

Bases: Module

Container for posterior samples.

Stores both nonlinear and linear parameter samples as Q objects with units baked in. Provides dict-like access, statistical summaries, and visualization tools.

Parameters:
  • nonlinear (dict[str, Quantity]) – Nonlinear parameter samples, one Q per parameter. Keys: "period", "eccentricity", "phase_peri", and optionally "arg_peri", "cos_i", "lon_asc_node". Units: period has time units; angles have "rad"; dimensionless parameters have unit "".

  • linear (dict[str, Quantity]) – Linear parameter samples, one Q per parameter. Keys: e.g. "rv_semiamp", "v_sys" for RV; "ra0", "dec0", "pmra", "pmdec", "parallax", "semi_major_axis" for astrometry. Units are data-driven (e.g. "km/s" for RV).

  • data_type (str) – Informational label identifying the model that produced these samples (e.g. "RVModel", "GaiaAstrometryModel", "JointModel"). Stored in HDF5 for round-tripping.

  • metadata (dict[str, Any]) – Additional metadata (t_ref, num_chains, acceptance rate, etc.).

  • linear_extension_names (tuple[str, ...]) – Names of linear parameters introduced by extensions (instrument offsets, polynomial trends, etc.) beyond the base linear set.

Examples

Samples is normally produced by run(), but can be constructed directly for testing or manual use:

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={
...         "period": Q([100.0, 101.0, 99.5], "day"),
...         "eccentricity": Q([0.1, 0.15, 0.12], ""),
...         "phase_peri": Q([0.3, 0.31, 0.29], ""),
...         "arg_peri": Q([1.0, 1.1, 0.9], "rad"),
...     },
...     linear={
...         "rv_semiamp": Q([10.0, 11.0, 9.5], "km/s"),
...         "v_sys": Q([5.0, 5.1, 4.9], "km/s"),
...     },
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> samples.n_samples
3
>>> "period" in samples
True
>>> samples.keys()[:4]
['period', 'eccentricity', 'phase_peri', 'arg_peri']
Parameters:
nonlinear: dict[str, Quantity]
linear: dict[str, Quantity]
data_type: str = ''
metadata: dict[str, Any]
linear_extension_names: tuple[str, ...] = ()
ln_likelihood: Array | None = None
ln_prior: Array | None = None
property n_samples: int

Number of posterior samples.

property meta: _MetadataView

Q-aware read-only view over metadata.

metadata itself stores quantity-valued entries in split form (<name> value + <name>_unit string) so the static field never holds a JAX array. Use samples.meta[name] to read those entries back as Q instances without manually reassembling the two halves; plain scalar entries (e.g. num_chains) round-trip unchanged. See _MetadataView for the full read API.

property ln_posterior: Array

Per-sample log-posterior density, ln_prior + ln_likelihood.

Both ln_prior and ln_likelihood must have been stored (run the sampler with return_logprobs=True); otherwise ValueError is raised.

keys()

All available parameter names (nonlinear + linear + derived).

Return type:

list[str]

wrap_angles()

Wrap negative rv_semiamp / semi_major_axis to positive.

Negative rv_semiamp (K) or semi_major_axis (a) describes an orbit that is physically identical to the all-positive case under two symmetries of the model:

  • arg_peri -> arg_peri + pi flips the sign of both K and a (the (omega, K, a) -> (omega + pi, -K, -a) symmetry);

  • lon_asc_node -> lon_asc_node + pi flips the sign of a alone (the astrometric (Omega, a) -> (Omega + pi, -a) symmetry; lon_asc_node does not enter the RV model).

This method returns a new Samples enforcing K >= 0 and a >= 0 in two steps:

  1. shift arg_peri by pi (flipping every rv_semiamp and semi_major_axis) on the samples where the trigger amplitude is negative — enforcing K >= 0;

  2. shift lon_asc_node by pi (flipping every semi_major_axis) on the samples where a is still negative — enforcing a >= 0.

Both shifted angles are wrapped to [0, 2*pi). A single arg_peri shift cannot make both K and a positive when their signs disagree (which routinely happens in joint RV + astrometry posteriors), so the second lon_asc_node shift is required.

Joint models (e.g. SB2) namespace per-component linear parameters as "component.param_name"; this method discovers every rv_semiamp- and semi_major_axis-suffixed key. The first rv_semiamp-suffixed key (insertion order) triggers step 1; semi_major_axis triggers step 1 only when no rv_semiamp is present (astrometry-only fits).

No-op when arg_peri is absent from nonlinear, when neither rv_semiamp nor semi_major_axis is in linear, or when no entries are negative.

Raises:

NotImplementedError – If the model has multiple per-component arg_peri or lon_asc_node keys (e.g. "primary.arg_peri" and "secondary.arg_peri"). All current harv joint factories share both, so this is not expected to arise in practice.

Return type:

Samples

Examples

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={"period": Q([100.0, 100.0], "day"),
...                "eccentricity": Q([0.1, 0.1], ""),
...                "phase_peri": Q([0.3, 0.3], ""),
...                "arg_peri": Q([1.0, 1.0], "rad")},
...     linear={"rv_semiamp": Q([-10.0, 10.0], "km/s"),
...             "v_sys": Q([0.0, 0.0], "km/s")},
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> wrapped = samples.wrap_angles()
>>> bool((wrapped["rv_semiamp"].value >= 0).all())
True
Return type:

Samples

thiele_innes_to_campbell()

Convert Thiele-Innes linear parameters to Campbell orbital elements.

See campbell_from_thiele_innes() for the mathematical details of the conversion.

Returns:

New Samples with semi_major_axis, arg_peri, lon_asc_node, cos_i (replacing the four TI constants).

Return type:

Samples

convert_parameterization(*, source, target)

Convert stored values between supported parameterizations.

Wraps harv.samplers.convert_parameterization(), returning a new Samples with metadata, data_type, and linear_extension_names preserved. The initial implementation supports single-component RV and Gaia astrometry parameterizations only.

Parameters:
Return type:

Samples

median(key=None)

Compute median values for parameters.

Parameters:

key (str | None) – If provided, return median for this parameter only. If None, return dict of medians for all parameters.

Return type:

dict[str, AbstractQuantity | Array] | AbstractQuantity | Array

Returns:

Median value(s).

Examples

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={"period": Q([100.0, 102.0], "day"),
...                "eccentricity": Q([0.1, 0.15], ""),
...                "phase_peri": Q([0.3, 0.31], ""),
...                "arg_peri": Q([1.0, 1.1], "rad")},
...     linear={"rv_semiamp": Q([10.0, 12.0], "km/s"),
...             "v_sys": Q([5.0, 5.2], "km/s")},
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> med = samples.median("period")
>>> med.unit
Unit("d")
>>> all_medians = samples.median()
>>> "period" in all_medians
True
percentile(key, percentiles=(16, 50, 84))

Compute percentiles for a parameter.

Parameters:
  • key (str) – Parameter name.

  • percentiles (list[float] | tuple[float, ...]) – Percentile values to compute (0-100). Default: (16, 50, 84) which corresponds to the 16th, 50th, 84th percentiles for Gaussian.

Return type:

list[AbstractQuantity | Array]

Returns:

Percentile values with appropriate units.

Examples

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={"period": Q([100.0, 102.0], "day"),
...                "eccentricity": Q([0.1, 0.15], ""),
...                "phase_peri": Q([0.3, 0.31], ""),
...                "arg_peri": Q([1.0, 1.1], "rad")},
...     linear={"rv_semiamp": Q([10.0, 12.0], "km/s"),
...             "v_sys": Q([5.0, 5.2], "km/s")},
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> p16, p50, p84 = samples.percentile("eccentricity")
>>> len(samples.percentile("period", [5, 50, 95]))
3
summary(params=None)

Compute summary statistics for parameters.

For each parameter, computes: - median - mean - std (standard deviation) - percentiles (16th, 84th) for +/-1-sigma equivalent

Parameters:

params (list[str] | None) – List of parameter names to summarize. If None, summarizes all.

Return type:

dict[str, dict[str, Any]]

Returns:

Dictionary mapping parameter names to their statistics.

Examples

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={"period": Q([100.0, 102.0], "day"),
...                "eccentricity": Q([0.1, 0.15], ""),
...                "phase_peri": Q([0.3, 0.31], ""),
...                "arg_peri": Q([1.0, 1.1], "rad")},
...     linear={"rv_semiamp": Q([10.0, 12.0], "km/s"),
...             "v_sys": Q([5.0, 5.2], "km/s")},
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> summary = samples.summary(["period", "eccentricity"])
>>> sorted(summary.keys())
['eccentricity', 'period']
>>> sorted(summary["period"].keys())
['mean', 'median', 'p16', 'p84', 'std']
map_sample(*, return_index=False)

Return the maximum a posteriori (MAP) sample.

Selects the single sample with the highest ln_posterior.

Parameters:

return_index (bool) – If True, also return the integer index of the MAP sample.

Return type:

Samples | tuple[Samples, int]

Returns:

A length-1 Samples with the MAP sample, or (Samples, index) when return_index is True.

Raises:

ValueError – If per-sample log-probabilities were not stored (run the sampler with return_logprobs=True).

period_unimodal(data)

Whether the period samples lie within a single mode.

Uses the criterion ptp(P) < 4 * P_min**2 / (2*pi*T), where T is the data time span – the period spacing at which adjacent aliases become resolvable (see thejoker’s is_P_unimodal).

Parameters:

data (AbstractData) – The data the samples were fit to (for the observed time span).

Return type:

bool

period_modes(data, n_clusters=2)

Cluster the period samples and test each mode for unimodality.

Runs K-means on log(period) (experimental; see thejoker’s is_P_Kmodal). Requires the optional scikit-learn dependency.

Parameters:
  • data (AbstractData) – The data the samples were fit to.

  • n_clusters (int) – Number of period modes to cluster into. Default 2.

Return type:

tuple[bool, Quantity, ndarray]

Returns:

(all_unimodal, mode_periods, n_per_mode) – whether every mode is individually unimodal, the median period of each mode (a Q), and the sample count per mode.

max_phase_gap(data)

Largest gap in orbital-phase coverage, per sample.

The maximum gap between consecutive observations on the (circular) phase axis – the ESA Gaia “maximum phase gap” statistic.

Parameters:

data (AbstractData) – The data the samples were fit to.

Return type:

ndarray

Returns:

Array of shape (n_samples,); values in [0, 1].

phase_coverage(data, n_bins=10)

Fraction of phase bins containing at least one observation.

The ESA Gaia “phase coverage” statistic, per sample.

Parameters:
  • data (AbstractData) – The data the samples were fit to.

  • n_bins (int) – Number of equal-width phase bins. Default 10.

Return type:

ndarray

Returns:

Array of shape (n_samples,); values in [0, 1].

periods_spanned(data)

Number of orbital periods spanned by the data, per sample.

Parameters:

data (AbstractData) – The data the samples were fit to.

Return type:

ndarray

Returns:

Array of shape (n_samples,).

phase_coverage_per_period(data)

Maximum number of observations within any single period, per sample.

Parameters:

data (AbstractData) – The data the samples were fit to.

Return type:

ndarray

Returns:

Array of shape (n_samples,).

chi2(data, model)

Per-sample goodness-of-fit \(\chi^2\) against the data.

For each posterior sample, evaluates the model prediction at the stored parameter values and returns \(\chi^2 = r^\top C^{-1} r\) (see chi_squared()). Jitter and other extension noise terms are included via C.

Parameters:
  • data (AbstractData) – The data the samples were fit to.

  • model (Any) – The component model used for the fit (RVModel or GaiaAstrometryModel); provides the prediction and noise model.

Return type:

Array

Returns:

Array of shape (n_samples,).

reduced_chi2(data, model, *, dof=None)

Per-sample reduced \(\chi^2\) (\(\chi^2 / \mathrm{dof}\)).

Parameters:
  • data (AbstractData) – The data the samples were fit to.

  • model (Any) – The component model used for the fit.

  • dof (int | None) – Degrees of freedom. Defaults to n_obs - n_params, where n_params counts every fitted parameter (orbital + linear + extension). A well-fitting model gives reduced \(\chi^2\) near 1.

Return type:

Array

Returns:

Array of shape (n_samples,).

Raises:

ValueError – If dof (default or supplied) is not positive.

binary_mass_function()

Binary mass function from the RV orbital elements.

See harv.kepler.masses.binary_mass_function().

Return type:

Quantity

Returns:

The binary mass function (a Q in solar masses), one per sample.

Raises:

KeyError – If the samples do not contain rv_semiamp (not an RV fit).

semi_major_axis_AU()

Physical semi-major axis (AU) from the angular orbit size + parallax.

See harv.kepler.masses.semi_major_axis_physical().

Return type:

Quantity

Returns:

The physical semi-major axis (a Q in AU), one per sample.

Raises:

KeyError – If the samples lack semi_major_axis or parallax.

companion_mass(m1, *, sini=None)

Companion mass \(m_2\) given the primary mass.

For RV samples the mass function is binary_mass_function(); sini defaults to 1 (the minimum companion mass). For astrometry samples the dark-companion astrometric mass function is used and sini is ignored (the inclination is already encoded in the physical orbit size).

Parameters:
  • m1 (Quantity) – Primary mass (a Q).

  • sini (float | None) – Sine of the inclination, for RV samples only. Default 1 (edge-on).

Return type:

Quantity

Returns:

The companion mass (a Q in solar masses), one per sample.

minimum_companion_mass(m1)

Minimum companion mass (edge-on, sin i = 1).

Convenience wrapper for companion_mass() with sini=1.

Parameters:

m1 (Quantity) – Primary mass (a Q).

Return type:

Quantity

to_hdf5(filename)

Save samples to HDF5 file.

Parameters:

filename (str | Path) – Output HDF5 filename.

Return type:

None

Examples

Save posterior samples and reload them:

>>> samples.to_hdf5("posterior_samples.h5")
>>> reloaded = Samples.from_hdf5("posterior_samples.h5")
>>> reloaded.n_samples == samples.n_samples
True
classmethod from_hdf5(filename)

Load samples from HDF5 file.

Parameters:

filename (str | Path) – Input HDF5 filename.

Return type:

Samples

Returns:

Loaded samples object.

Examples

>>> samples = Samples.from_hdf5("posterior_samples.h5")
>>> samples.n_samples
42
>>> samples.data_type
'rv'
__init__(nonlinear, linear, data_type='', metadata=<factory>, linear_extension_names=(), ln_likelihood=None, ln_prior=None)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

to_arviz(params=None, labels=None)

Export samples to an arviz.InferenceData object.

Parameters:
  • params (list[str] | None) – Parameters to include. If None, all parameters returned by keys() are included.

  • labels (dict[str, str] | None) – Override display names for specific parameters, e.g. {"period": "period [day]", "rv_semiamp": "K [km/s]"}. Parameters not listed use their plain parameter name as the label.

Return type:

Any

Returns:

Inference data suitable for arviz.plot_pair, arviz.summary, etc.

Raises:

ImportError – If arviz is not installed.

Examples

>>> idata = samples.to_arviz(["period", "eccentricity"])
plot_corner(params=None, truths=None, labels=None, **plot_kwargs)

Create corner plot of posterior samples using arviz.

Parameters:
  • params (list[str] | None) – Parameters to include in corner plot. If None, selects a default set based on data_type.

  • truths (dict[str, Any] | None) – Dictionary of true parameter values to overplot as reference values.

  • labels (dict[str, str] | None) – Override display names for specific parameters, e.g. {"period": "period [day]", "rv_semiamp": "K [km/s]"}. Parameters not listed use their plain parameter name as the label.

  • **plot_kwargs (Any) – Additional keyword arguments passed to arviz.plot_pair().

Return type:

Any

Returns:

Array of axes from arviz.plot_pair.

Examples

Default corner plot (selects parameters based on data type):

>>> axes = samples.plot_corner()

Specify parameters and overplot true values:

>>> from unxt import Q
>>> axes = samples.plot_corner(
...     params=["period", "eccentricity"],
...     truths={"period": Q(100, "day"), "eccentricity": 0.3},
... )