harv package

harv: Tools for inferring Keplerian orbital parameters.

A JAX-based package for modeling binary-star and star-exoplanet systems with time series data, such as Gaia DR4 epoch astrometry and radial velocities. The package is units aware via unxt, supports probabilistic modeling with numpyro, and provides flexible Keplerian orbit frameworks for single and multi-body systems.

class harv.GaiaAstrometryData

Bases: AbstractAstrometryData

Gaia epoch astrometry (along-scan measurements).

Examples

>>> import jax.numpy as jnp
>>> from unxt import Q
>>> from harv import GaiaAstrometryData
>>> data = GaiaAstrometryData(
...     time=Q([0.0, 100.0, 200.0], "day"),
...     al_position=Q([0.1, -0.2, 0.05], "mas"),
...     al_position_err=Q([0.01, 0.01, 0.01], "mas"),
...     scan_angle=Q([0.5, 1.2, 2.8], "rad"),
...     parallax_factor=jnp.array([0.3, -0.1, 0.4]),
... )
>>> data.n_times
3
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • al_position (Real[Quantity[PhysicalType('angle')], 'n'])

  • al_position_err (Real[Quantity[PhysicalType('angle')], 'n'])

  • scan_angle (Real[Quantity[PhysicalType('angle')], 'n'])

  • parallax_factor (Float[Array, 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

__init__(time, al_position, al_position_err, scan_angle, parallax_factor, *, t_ref=None)
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • al_position (Real[Quantity[PhysicalType('angle')], 'n'])

  • al_position_err (Real[Quantity[PhysicalType('angle')], 'n'])

  • scan_angle (Real[Quantity[PhysicalType('angle')], 'n'])

  • parallax_factor (Float[Array, 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

property n_times: int

Number of times / epochs / observations.

plot(ax=None, *, al_unit=None, add_labels=True, relative_to_t_ref=False, **kwargs)

Plot along-scan residuals vs time.

Parameters:
  • ax (Any) – matplotlib.axes.Axes instance to draw on. If None, uses plt.gca().

  • al_unit (str | None) – Display unit for the along-scan position. Defaults to the data’s own unit.

  • add_labels (bool) – Add axis labels.

  • relative_to_t_ref (bool) – Plot time relative to t_ref.

  • **kwargs (Any) – Passed to ax.errorbar(). Defaults can be overridden.

Returns:

The matplotlib.axes.Axes instance.

Return type:

Axes

Examples

>>> import jax.numpy as jnp
>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import GaiaAstrometryData
>>> data = GaiaAstrometryData(
...     time=Q([0.0, 100.0, 200.0], "day"),
...     al_position=Q([0.1, -0.2, 0.05], "mas"),
...     al_position_err=Q([0.01, 0.01, 0.01], "mas"),
...     scan_angle=Q([0.5, 1.2, 2.8], "rad"),
...     parallax_factor=jnp.array([0.3, -0.1, 0.4]),
... )
>>> ax = data.plot()
>>> plt.close("all")
t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None

Reference epoch. If None, uses mean observation time.

al_position: Real[Quantity[PhysicalType('angle')], 'n']

Along-scan position.

al_position_err: Real[Quantity[PhysicalType('angle')], 'n']

Along-scan uncertainty.

scan_angle: Real[Quantity[PhysicalType('angle')], 'n']

Per-CCD scan angle.

parallax_factor: Float[Array, 'n']

AL parallax factors.

time: Real[Quantity[PhysicalType('time')], 'n']

Barycentric TCB times.

class harv.RVData

Bases: AbstractData

Radial velocity measurements.

Examples

>>> from unxt import Q
>>> from harv import RVData
>>> data = RVData(
...     time=Q([0.0, 50.0, 100.0], "day"),
...     rv=Q([1.0, -2.0, 0.5], "km/s"),
...     rv_err=Q([0.5, 0.5, 0.5], "km/s"),
... )
>>> data.n_times
3
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • rv (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])

  • rv_err (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

__init__(time, rv, rv_err, *, t_ref=None)
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • rv (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])

  • rv_err (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

property n_times: int

Number of times / epochs / observations.

plot(ax=None, *, rv_unit=None, add_labels=True, relative_to_t_ref=False, phase_fold=None, **kwargs)

Plot RV data as error bars.

Parameters:
  • ax (Any) – The matplotlib.axes.Axes instance to draw on. If None, uses plt.gca().

  • rv_unit (str | None) – Display unit for the RV axis. Defaults to the data’s own unit.

  • add_labels (bool) – Add axis labels.

  • relative_to_t_ref (bool) – Plot time relative to t_ref. Mutually exclusive with phase_fold.

  • phase_fold (Any | None) – If given, fold observations to orbital phase using this period: x = (time - t_ref) / phase_fold mod 1. Mutually exclusive with relative_to_t_ref.

  • **kwargs (Any) – Passed to ax.errorbar(). Defaults can be overridden.

Returns:

The matplotlib.axes.Axes instance.

Return type:

Axes

Examples

>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> data = RVData(
...     time=Q([0.0, 50.0, 100.0], "day"),
...     rv=Q([1.0, -2.0, 0.5], "km/s"),
...     rv_err=Q([0.5, 0.5, 0.5], "km/s"),
... )
>>> ax = data.plot()  # uses errorbar() with sensible defaults
>>> ax = data.plot(color="C1", markersize=6)  # override style
>>> ax = data.plot(phase_fold=Q(50.0, "day"))  # phase-folded
>>> plt.close("all")
t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None

Reference epoch. If None, uses mean observation time.

rv: Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n']

Radial velocities.

rv_err: Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n']

Radial velocity uncertainties.

time: Real[Quantity[PhysicalType('time')], 'n']

Barycentric TCB times.

class harv.SourceData

Bases: AbstractDatasetContainer

Container for multiple named datasets for a single source.

Accepts arbitrary named datasets via keyword arguments. Names are user-defined and can be anything (e.g., gaia, keck_rv, hst_imaging).

Parameters:

datasets (AbstractAstrometryData | RVData)

__init__(**datasets)
Parameters:

datasets (AbstractAstrometryData | RVData)

Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

get_datasets_by_type(data_type)

Get all datasets/components of a specific data type.

Parameters:

data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by.

Return type:

dict[str, TypeVar(_DT, bound= AbstractAstrometryData | RVData)]

Examples

>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.get_datasets_by_type(RVData)
{'keck_rv': RVData(...)}
>>> source_data.get_datasets_by_type(GaiaAstrometryData)
{'gaia': GaiaAstrometryData(...)}
indicator_data_by_type(data_type, reference)

Return stacked data and indicator flags for one dataset type.

This is a convenience wrapper around get_datasets_by_type + build_indicator_matrix for use in extensions that need to build a kernel matrix across multiple datasets of the same type (e.g. multiple RV instruments).

Parameters:
  • data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.

  • reference (str) – Name of the reference dataset to use for time coordinates and metadata. Must be one of the keys in the returned dict from get_datasets_by_type(data_type).

Return type:

tuple[TypeVar(_DT, bound= AbstractAstrometryData | RVData), Array | None, tuple[str, ...] | None]

items()

(name, dataset) pairs.

Return type:

Iterator[tuple[str, AbstractAstrometryData | RVData]]

keys()

Dataset/component names.

Return type:

Iterator[str]

plot(*args, **kwargs)

Plot all datasets on a single axes.

Only valid when every contained dataset shares the same concrete type; plotting heterogeneous types (e.g. RV in km/s and astrometry in mas) on a single axes would overlay incompatible y-axes. Use get_datasets_by_type() to filter to a single type first when needed.

Parameters mirror AbstractDatasetContainer.plot().

Raises:

TypeError – If the contained datasets are not all of the same concrete type.

Parameters:
Return type:

Any

stacked_by_type(data_type)

Stack all datasets of the requested type.

Parameters:

data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.

Return type:

TypeVar(_DT, bound= AbstractAstrometryData | RVData)

Examples

>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     wiyn_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.stacked_by_type(RVData)
RVData(...)
property t_ref: Real[Quantity[PhysicalType('time')], ''] | None

Reference epoch shared by all contained datasets.

Guaranteed to be consistent across components because every concrete subclass calls _synchronize_t_refs() in its __init__.

values()

Dataset/component values.

Return type:

Iterator[AbstractAstrometryData | RVData]

harv.QD

alias of QuantityDistribution

class harv.QuantityDistribution

Bases: Module

Pairs a numpyro distribution with the physical unit of its samples.

For scalar distributions, unit is a single string. For multivariate distributions (e.g. MultivariateNormal over parameters with mixed units), unit is a tuple of strings – one per dimension.

Parameters:
  • distribution (Distribution) – The underlying numpyro distribution (works with bare floats).

  • unit (str | tuple[str, ...]) – Physical unit of the samples. A single string for scalar distributions; a tuple for multivariate distributions where each element may have a different unit.

Examples

Scalar (period in days):

>>> import jax
>>> import numpyro.distributions as dist
>>> from harv import QuantityDistribution
>>> qd = QuantityDistribution(dist.LogUniform(50., 2000.), "day")
>>> sample = qd.sample(jax.random.key(0))
>>> sample.unit
Unit("d")

Multivariate (astrometric linear parameters with mixed units):

qd = QuantityDistribution(
    dist.MultivariateNormal(loc=jnp.zeros(6), ...),
    ("mas", "mas", "mas/yr", "mas/yr", "mas", "mas"),
)
sample = qd.sample(key)  # -> raw jax.Array (consumer splits by name)
__init__(distribution, unit)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

log_prob(value)

Evaluate log-probability, stripping units if present.

Parameters:

value (Any)

Return type:

Any

sample(key, sample_shape=())

Sample from the distribution, attaching units when possible.

For scalar units (str), returns Quantity. For tuple units (multivariate with mixed units), returns a raw array – the consumer splits by parameter name and attaches per-element units.

Parameters:
Return type:

Any

distribution: Distribution
unit: str | tuple[str, ...]
class harv.GaiaAstrometryModel

Bases: AbstractComponentModel

Gaia epoch astrometry component model (template).

Carries a parameterization and extensions. Data and the linear prior are passed at evaluation time so the same model instance can be reused across multiple datasets.

Parameters:
  • parameterization (StandardGaiaAstrometry | ThieleInnesGaiaAstrometry) – Declares parameter names/roles and builds the base design matrix.

  • extensions (tuple[AbstractExtension, ...]) – Model extensions (jitter, trends, …).

  • pm_time_unit (str) – If not None, override the proper motion units in the design matrix to use this unit instead of the default (obs_unit / yr).

Examples

>>> from harv.models.astrometry import GaiaAstrometryModel
>>> model = GaiaAstrometryModel()
>>> sorted(model._all_nonlinear_names())
['arg_peri', 'cos_i', 'eccentricity', 'lon_asc_node', 'period', 'phase_peri']
__init__(parameterization=StandardGaiaAstrometry(), extensions=(), *, pm_time_unit='yr')
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

chi_squared(nl_values, linear_values, data)

Goodness-of-fit \(\chi^2\) for one fully-specified parameter set.

Unlike log_prob() (which marginalizes the linear parameters and returns a marginal log-likelihood), this evaluates the model at the given linear-parameter values and returns the residual statistic

\[\chi^2 = r^\top C^{-1} r, \qquad r = y_\mathrm{obs} - X\,y,\]

where \(C\) is the (extension-modified) observation covariance. For a diagonal covariance this is \(\sum_i r_i^2 / \sigma_i^2\); a Gaussian-process extension promotes \(C\) to a full matrix and the Mahalanobis form is used. Jitter is included via the inflated \(C\).

Parameters:
  • nl_values (dict[str, Any]) – Nonlinear parameter values (orbital + any extension parameters), in the same form accepted by log_prob().

  • linear_values (dict[str, Array]) – Linear parameter values, unit-stripped to the model’s linear parameter units (see _linear_param_units()).

  • data (AbstractData) – Runtime observation data.

Return type:

Array

Returns:

Scalar \(\chi^2\).

extensions: tuple[AbstractExtension, ...] = ()
log_prob(nl_values, data, *, linear_priors=None, linear_values=None, marginalized_names=None)

Compute the log-likelihood.

Three calling conventions are supported:

  1. Auto mode (recommended): pass linear_prior and let the model classify which linear params to marginalize. Non-Gaussian linear priors are expected as entries in nl_values.

  2. Manual marginalization: pass marginalized_names (and optionally linear_values) to control exactly which linear params are marginalized.

  3. Explicit evaluation: pass linear_values without marginalized_names (and linear_priors=None) to evaluate the Gaussian log-likelihood at fixed linear parameter values.

Parameters:
  • nl_values (dict[str, Any]) – Parameter values. In auto mode this may contain explicit linear parameter values alongside the nonlinear ones.

  • data (AbstractData) – Runtime observation data (RVData / GaiaAstrometryData / SystemData).

  • linear_priors (dict[str, Any] | None) – Per-parameter priors for analytic marginalization. Required for auto and manual-marginalization modes.

  • linear_values (dict[str, Array] | None) – Explicit linear parameter values (unit-stripped). When given without marginalized_names, triggers explicit evaluation.

  • marginalized_names (tuple[str, ...] | None) – Which linear params to marginalize (manual mode).

Return type:

Array

Returns:

Scalar log-likelihood.

numpyro_model(nonlinear_priors, data, linear_priors, *, marginalized=True, marginalized_names=None)

Build a numpyro model function for MCMC sampling.

Parameters:
  • nonlinear_priors (dict[str, Distribution | QuantityDistribution]) – Prior distributions for nonlinear parameters. Keys are parameter names (e.g. "period", "eccentricity"). Values are Distribution or QuantityDistribution.

  • data (AbstractData) – Runtime observation data (RVData / GaiaAstrometryData).

  • linear_priors (dict[str, Any] | None) – Per-parameter priors for the linear parameters. Required when any marginalization happens (marginalized=True or full non-marginalized mode that still needs explicit linear priors).

  • marginalized (bool) – If True (default), linear parameters are analytically marginalized and only nonlinear parameters are sampled. If False, all parameters are sampled explicitly.

  • marginalized_names (tuple[str, ...] | None) – Optional subset of linear parameter names to analytically marginalize when marginalized=True. None means use the model’s automatic prior-based classification.

Return type:

Callable[[], None]

Returns:

A no-argument callable suitable for numpyro.infer.MCMC.

parameterization: StandardGaiaAstrometry | ThieleInnesGaiaAstrometry = StandardGaiaAstrometry()
params_explicit(linear_priors)

Names of parameters that must be explicitly sampled.

Given a (resolved) linear_prior dict, returns all nonlinear parameters (orbital + extension, e.g. jitter) plus any linear parameters with non-Gaussian priors (e.g. parallax with a HalfNormal prior) that cannot be analytically marginalized.

These are the keys that must appear in the values dict passed to log_prob().

Parameters:

linear_priors (dict[str, Any] | None)

Return type:

tuple[str, ...]

params_marginalized(linear_priors)

Names of linear parameters analytically marginalized in log_prob.

Given a (resolved) linear_prior dict, returns the linear parameter names whose priors are Gaussian (or callable-returning-Normal) and so are integrated out analytically via the Woodbury identity rather than sampled. Their values are NOT required in the values dict; they are recovered afterward via sample_conditional_linear().

Parameters:

linear_priors (dict[str, Any] | None)

Return type:

tuple[str, ...]

pm_time_unit: str = 'yr'
predict(nl_values, linear_values, data)

Full predicted observable y_pred = X @ y at the data times.

X is the extension-augmented design matrix (_full_design_matrix()) and y is the ordered linear-parameter vector built from linear_values (missing entries default to 0). Used by _log_prob_explicit(), chi_squared(), and the plot functions so that every prediction path shares the same construction.

Parameters:
Return type:

Array

predict_orbit_sky(nl_values, linear_values, times)

Predicted sky-plane orbital offsets (dRA, dDec) at times.

Delegates to self.parameterization.sky_orbit(...). Returns the orbital contribution only — proper motion, parallax, and zero-point are not included. This is what the on-sky orbit ellipse panel of plot_gaia_sky_orbit() plots.

The returned arrays are in the same unit as the scalar semi-major-axis linear param ("semi_major_axis" for StandardGaiaAstrometry, or the TI constants for ThieleInnesGaiaAstrometry).

Parameters:
Return type:

tuple[Array, Array]

sample_conditional_linear(nl_values, key, data, *, linear_priors=None, marginalized_names=None, explicit_linear=None, use_mean=False)

Sample linear parameters from the conditional posterior.

In auto mode (both marginalized_names and explicit_linear are None), the method classifies from linear_prior and extracts explicit linear values from nl_values.

Returns all linear parameter values (both sampled and explicit), unit-stripped.

When use_mean=True, the conditional posterior mean is returned for the marginalized linear parameters instead of a random draw. For a Gaussian conditional this is also the conditional MAP. This is the appropriate choice when completing a MAP estimate (see optimize()); MCMC paths should keep the default use_mean=False.

Parameters:
Return type:

dict[str, Array]

class harv.JointModel

Bases: Module

Composition of component models that share orbital parameters.

We recommend using the JointModel factory methods like for_rv_and_gaia() or for_sb2().

Parameters:
  • components (dict[str, AbstractComponentModel]) – Named component models. Keys are used to namespace component-specific parameters (e.g. "rv.jitter").

  • shared_params (tuple[str, ...]) – Names of orbital parameters shared across all components (e.g. ("period", "eccentricity", "phase_peri", "arg_peri")).

  • shared_linear_params (tuple[str, ...])

__init__(components, shared_params, shared_linear_params=())
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

property component_names: tuple[str, ...]

Names of the components in this joint model.

classmethod for_rv_and_gaia(components, *, shared_params=None, shared_linear_params=None)

Build a JointModel for combined RV + Gaia astrometry.

TODO: if we want to support more complex shared parameters, look at for_sb2 to see what we can generalize

Parameters:
  • components (dict[str, AbstractComponentModel]) – RV and Gaia astrometry component models (e.g. {"rv": ..., "astro": ...}).

  • shared_params (tuple[str, ...] | None) – Override the default shared orbital parameters. Defaults to ("period", "eccentricity", "phase_peri", "arg_peri").

  • shared_linear_params (tuple[str, ...] | None) – Linear parameter names shared across components. Defaults to () (no shared linear params for heterogeneous joint models).

Return type:

JointModel

Returns:

The constructed JointModel.

classmethod for_sb2(prior, *, component_names=('primary', 'secondary'), extensions=(), shared_params=None, shared_linear_params=None)

Build an SB2 JointModel from a prior.

The prior is expected to follow the convention of default_sb2_prior(): linear-prior keys name1.rv_semiamp and name2.rv_semiamp map to the per-component rv_semiamp of the components, named “name1” and “name2” in this example, but they can be customized. Other linear-prior keys (e.g. v_sys) are automatically declared shared across components.

Data is not bound to the model; pass it at sampler.run(data, ...) time.

Parameters:
  • prior (Any) – SB2-style prior. Keys for named component-specific parameters (e.g., rv_semiamp) must correspond to the component names in component_names. For example, with the default names “primary” and “secondary”, the prior must have keys “primary.rv_semiamp” and “secondary.rv_semiamp”. Other linear parameters (e.g. “v_sys”) are automatically treated as shared across components.

  • component_names (tuple[str, ...]) – Names of the two SB2 components. Defaults to ("primary", "secondary").

  • extensions (tuple[Any, ...] | dict[str, tuple[Any, ...]]) –

    Extensions to attach to each component.

    • A bare tuple is applied to all components.

    • A dict is keyed by component name; missing keys yield no extensions for that component.

  • shared_params (tuple[str, ...] | None) – Defaults to the standard nonlinear shared orbital params. For example, “period”, “eccentricity”, “phase_peri”, and “arg_peri”.

  • shared_linear_params (tuple[str, ...] | None) – Defaults to every key in prior.linear_prior except the rv_semiamp keys.

Return type:

JointModel

Returns:

The constructed JointModel.

Examples

>>> from unxt import Q
>>> from harv.models.joint import JointModel
>>> from harv.models.priors import default_sb2_prior
>>> prior = default_sb2_prior(
...     period_min=Q(10., "day"), period_max=Q(1000., "day"),
...     sigma_K0=Q(30., "km/s"), sigma_v0=Q(50., "km/s"),
... )
>>> joint = JointModel.for_sb2(prior=prior)
>>> joint.shared_linear_params
('v_sys',)
log_prob(nl_values, data, *, linear_priors=None, marginalized_names=None)

Compute the joint log-likelihood.

When shared_linear_params contains names that are being analytically marginalized, a single joint MarginalizedLinear is built spanning all components (the joint path). Otherwise the per-component log-likelihoods are summed as before.

Parameters:
  • nl_values (dict[str, Any]) – Flat dict of parameter values. Shared orbital params use bare names ("period", "eccentricity", etc.). Component-specific nonlinear params use "component.param" convention (e.g. "rv.jitter").

  • data (AbstractDatasetContainer) – Per-component data, indexed by component name.

  • linear_priors (dict[str, Any] | None) – Flat merged linear-prior dict. None means treat all linear params as marginalizable.

  • marginalized_names (tuple[str, ...] | None) – Optional linear parameter names to marginalize. Component-qualified names are accepted (e.g. "rv.parallax" or just "parallax" if unambiguous).

Return type:

Array

Returns:

Scalar log-likelihood.

numpyro_model(nonlinear_priors, data, linear_priors, *, marginalized=True, marginalized_names=None)

Build a numpyro model for MCMC sampling of the joint model.

Parameters:
  • nonlinear_priors (dict[str, Distribution | QuantityDistribution]) – Prior distributions for all nonlinear parameters. Shared orbital params use bare names. Component-specific params use "component.param" convention.

  • data (AbstractDatasetContainer) – Per-component data, indexed by component name.

  • linear_priors (dict[str, Any] | None) – Flat merged linear-prior dict.

  • marginalized (bool) – If True (default), linear parameters are marginalized per-component. If False, all parameters are sampled explicitly.

  • marginalized_names (tuple[str, ...] | None) – Optional linear parameter names to marginalize when marginalized=True. Component-qualified names are accepted.

Return type:

Callable[[], None]

Returns:

A numpyro model function suitable for use with a sampler.

params_explicit(linear_priors)

Names of parameters that must be explicitly sampled.

Shared nonlinear params use bare names (e.g. "period"). Component-specific nonlinear params use "comp.param" notation (e.g. "rv.jitter"). Explicit-linear params (non-Gaussian priors, e.g. "parallax") are listed flat without namespace prefix, matching how they appear in the log_prob values dict.

Parameters:

linear_priors (dict[str, Any] | None) – The merged linear-prior dict (with "comp.param" qualified keys for non-shared params). None means treat all linear params as marginalizable.

Return type:

tuple[str, ...]

params_marginalized(linear_priors)

Names of linear parameters analytically marginalized across all components.

De-duplicated; order follows component iteration order.

Parameters:

linear_priors (dict[str, Any] | None) – The merged linear-prior dict. None means treat all linear params as marginalizable.

Return type:

tuple[str, ...]

sample_conditional_linear(nl_values, key, data, *, linear_priors=None, marginalized_names=None, use_mean=False)

Sample conditional linear params for each component.

When shared_linear_params are jointly marginalized, shared parameters appear at the top level of the returned dict (with bare names) rather than inside per-component sub-dicts.

Parameters:
  • nl_values (dict[str, Any]) – Flat parameter values dict.

  • key (Array) – JAX PRNG key.

  • data (AbstractDatasetContainer) – Per-component data, indexed by component name.

  • linear_priors (dict[str, Any] | None) – Flat merged linear-prior dict.

  • marginalized_names (tuple[str, ...] | None) – Optional linear parameter names to marginalize.

  • use_mean (bool) – When True, return the conditional posterior mean for the marginalized linear parameters instead of a random draw. For a Gaussian conditional this is also the conditional MAP. Default False.

Return type:

dict[str, Any]

Returns:

Sampled parameter values. The structure depends on the path:

  • Default path (no shared marginalization): dict[comp_name, dict[param_name, array]].

  • Joint path (shared marginalization): mixed dict where shared params are top-level and per-component params are in sub-dicts keyed by component name.

shared_linear_params: tuple[str, ...] = ()
components: dict[str, AbstractComponentModel]
shared_params: tuple[str, ...]
class harv.ParamInfo

Bases: Module

Immutable descriptor for a single model parameter.

Parameters:
  • name (str) – Parameter name. Must not contain "." (reserved for JointModel tied-parameter paths).

  • unit (str) – Unit string (e.g. "day", "km/s", "" for dimensionless).

  • linear (bool) – Whether the parameter enters the model linearly (default False).

Examples

>>> from harv.models.extensions.base import ParamInfo
>>> p = ParamInfo("period", "time")
>>> p.name
'period'
>>> p.linear
False
__init__(name, unit, linear=False)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

linear: bool = False
name: str
unit: str
class harv.RVModel

Bases: AbstractComponentModel

Radial-velocity component model (template).

Carries a parameterization and extensions. Data and the linear prior are passed at evaluation time so the same model instance can be reused across multiple datasets.

Parameters:

Examples

>>> from harv.models.rv import RVModel
>>> model = RVModel()
>>> sorted(model._all_nonlinear_names())
['arg_peri', 'eccentricity', 'period', 'phase_peri']
__init__(parameterization=StandardRV(), extensions=())
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

chi_squared(nl_values, linear_values, data)

Goodness-of-fit \(\chi^2\) for one fully-specified parameter set.

Unlike log_prob() (which marginalizes the linear parameters and returns a marginal log-likelihood), this evaluates the model at the given linear-parameter values and returns the residual statistic

\[\chi^2 = r^\top C^{-1} r, \qquad r = y_\mathrm{obs} - X\,y,\]

where \(C\) is the (extension-modified) observation covariance. For a diagonal covariance this is \(\sum_i r_i^2 / \sigma_i^2\); a Gaussian-process extension promotes \(C\) to a full matrix and the Mahalanobis form is used. Jitter is included via the inflated \(C\).

Parameters:
  • nl_values (dict[str, Any]) – Nonlinear parameter values (orbital + any extension parameters), in the same form accepted by log_prob().

  • linear_values (dict[str, Array]) – Linear parameter values, unit-stripped to the model’s linear parameter units (see _linear_param_units()).

  • data (AbstractData) – Runtime observation data.

Return type:

Array

Returns:

Scalar \(\chi^2\).

extensions: tuple[AbstractExtension, ...] = ()
log_prob(nl_values, data, *, linear_priors=None, linear_values=None, marginalized_names=None)

Compute the log-likelihood.

Three calling conventions are supported:

  1. Auto mode (recommended): pass linear_prior and let the model classify which linear params to marginalize. Non-Gaussian linear priors are expected as entries in nl_values.

  2. Manual marginalization: pass marginalized_names (and optionally linear_values) to control exactly which linear params are marginalized.

  3. Explicit evaluation: pass linear_values without marginalized_names (and linear_priors=None) to evaluate the Gaussian log-likelihood at fixed linear parameter values.

Parameters:
  • nl_values (dict[str, Any]) – Parameter values. In auto mode this may contain explicit linear parameter values alongside the nonlinear ones.

  • data (AbstractData) – Runtime observation data (RVData / GaiaAstrometryData / SystemData).

  • linear_priors (dict[str, Any] | None) – Per-parameter priors for analytic marginalization. Required for auto and manual-marginalization modes.

  • linear_values (dict[str, Array] | None) – Explicit linear parameter values (unit-stripped). When given without marginalized_names, triggers explicit evaluation.

  • marginalized_names (tuple[str, ...] | None) – Which linear params to marginalize (manual mode).

Return type:

Array

Returns:

Scalar log-likelihood.

numpyro_model(nonlinear_priors, data, linear_priors, *, marginalized=True, marginalized_names=None)

Build a numpyro model function for MCMC sampling.

Parameters:
  • nonlinear_priors (dict[str, Distribution | QuantityDistribution]) – Prior distributions for nonlinear parameters. Keys are parameter names (e.g. "period", "eccentricity"). Values are Distribution or QuantityDistribution.

  • data (AbstractData) – Runtime observation data (RVData / GaiaAstrometryData).

  • linear_priors (dict[str, Any] | None) – Per-parameter priors for the linear parameters. Required when any marginalization happens (marginalized=True or full non-marginalized mode that still needs explicit linear priors).

  • marginalized (bool) – If True (default), linear parameters are analytically marginalized and only nonlinear parameters are sampled. If False, all parameters are sampled explicitly.

  • marginalized_names (tuple[str, ...] | None) – Optional subset of linear parameter names to analytically marginalize when marginalized=True. None means use the model’s automatic prior-based classification.

Return type:

Callable[[], None]

Returns:

A no-argument callable suitable for numpyro.infer.MCMC.

parameterization: StandardRV | EcoswEsinwRV = StandardRV()
params_explicit(linear_priors)

Names of parameters that must be explicitly sampled.

Given a (resolved) linear_prior dict, returns all nonlinear parameters (orbital + extension, e.g. jitter) plus any linear parameters with non-Gaussian priors (e.g. parallax with a HalfNormal prior) that cannot be analytically marginalized.

These are the keys that must appear in the values dict passed to log_prob().

Parameters:

linear_priors (dict[str, Any] | None)

Return type:

tuple[str, ...]

params_marginalized(linear_priors)

Names of linear parameters analytically marginalized in log_prob.

Given a (resolved) linear_prior dict, returns the linear parameter names whose priors are Gaussian (or callable-returning-Normal) and so are integrated out analytically via the Woodbury identity rather than sampled. Their values are NOT required in the values dict; they are recovered afterward via sample_conditional_linear().

Parameters:

linear_priors (dict[str, Any] | None)

Return type:

tuple[str, ...]

predict(nl_values, linear_values, data)

Full predicted observable y_pred = X @ y at the data times.

X is the extension-augmented design matrix (_full_design_matrix()) and y is the ordered linear-parameter vector built from linear_values (missing entries default to 0). Used by _log_prob_explicit(), chi_squared(), and the plot functions so that every prediction path shares the same construction.

Parameters:
Return type:

Array

predict_at_times(times, nl_values, linear_values, *, t_ref, obs_unit='km/s')

Predicted RV at arbitrary times, no observed-data object required.

Internally constructs an RVData shim at times with dummy rv / rv_err (zeros / ones) and delegates to predict(). The dummy obs are never read by the prediction path (_full_design_matrix only consumes data.time and data.t_ref; extensions read at most those fields too). The returned array is in the same units the model’s linear parameters are expressed in.

Raises TypeError if any MultiSurveyOffset is in self.extensions — its indicator_matrix is fixed to the original data’s row count and cannot apply on an arbitrary time grid. Callers building a smooth-curve overlay should filter it out before calling.

Parameters:
  • times (Real[Quantity[PhysicalType('time')], '*batch'])

  • nl_values (dict[str, Any])

  • linear_values (dict[str, Array])

  • t_ref (Real[Quantity[PhysicalType('time')], ''])

  • obs_unit (str)

Return type:

Array

sample_conditional_linear(nl_values, key, data, *, linear_priors=None, marginalized_names=None, explicit_linear=None, use_mean=False)

Sample linear parameters from the conditional posterior.

In auto mode (both marginalized_names and explicit_linear are None), the method classifies from linear_prior and extracts explicit linear values from nl_values.

Returns all linear parameter values (both sampled and explicit), unit-stripped.

When use_mean=True, the conditional posterior mean is returned for the marginalized linear parameters instead of a random draw. For a Gaussian conditional this is also the conditional MAP. This is the appropriate choice when completing a MAP estimate (see optimize()); MCMC paths should keep the default use_mean=False.

Parameters:
Return type:

dict[str, Array]

class harv.AbstractSampler

Bases: Module

Abstract base for harv samplers.

Every sampler holds a prior and a pre-built model. Use the bare constructor to combine them:

sampler = RejectionSampler(prior, RVModel(extensions=...))
samples = sampler.run(data, n_prior_samples=100_000)

Concrete subclasses are marked @final per the project’s abstract-final pattern.

Parameters:
__init__(prior, model, marginalized_names=None)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

get_extensions()

Return the extensions in effect for this sampler.

Walks the attached model: returns model.extensions for a single component model, or dict[component_name, tuple[Extension, ...]] for a JointModel so the per-component association used for namespaced parameter names like "primary.jitter" is preserved. Use this in any downstream consumer (e.g. harv.plot.plot_rv()) that needs to know which extensions are in play.

Return type:

tuple[AbstractExtension, ...] | dict[str, tuple[AbstractExtension, ...]]

marginalized_names: tuple[str, ...] | None = None
prior: HarvPrior
model: AbstractComponentModel | JointModel
harv.make_prior_cache(prior, model, n_samples, filename, *, key, batch_size=100000, return_logprobs=False, marginalized_names=None)

Write n_samples prior draws to an HDF5 cache, in batches.

Each batch is drawn from an independent subkey via jax.random.fold_in(), so the on-disk order is i.i.d. and sequential reads of the file are statistically equivalent to random reads.

Parameters:
  • prior (HarvPrior) – Prior to draw from.

  • model (AbstractComponentModel | JointModel) – Component or joint model template defining which extension parameters need priors. See HarvPrior.sample().

  • n_samples (int) – Total number of samples to write.

  • filename (str | PathLike[str]) – Output HDF5 path.

  • key (Array) – JAX random key. Independent subkeys per batch are derived via jax.random.fold_in().

  • batch_size (int) – Number of samples generated and written per chunk. Memory usage scales with batch_size, not n_samples. Default: 100,000.

  • return_logprobs (bool) – If True, write ln_prior to the cache so it can be reused by downstream tools without recomputation.

  • marginalized_names (tuple[str, ...] | None) – Forwarded to HarvPrior.sample(). Must match the marginalized_names of the RejectionSampler that will consume the cache, so the explicit-linear keys written here match those RejectionSampler.run_with_samples() expects. Leave as None (default) for the default marginalization.

Return type:

None

Examples

>>> import jax
>>> from unxt import Q
>>> from harv import HarvPrior, RVModel, StandardRV
>>> from harv.samplers import make_prior_cache
>>> prior = StandardRV().default_prior(
...     period_min=Q(2.0, "day"),
...     period_max=Q(1000.0, "day"),
...     sigma_K0=Q(30.0, "km/s"),
...     sigma_v0=Q(50.0, "km/s"),
... )
>>> make_prior_cache(
...     prior, RVModel(), 1_000_000, "cache.h5",
...     key=jax.random.key(0), batch_size=100_000,
... )
class harv.NumpyroSampler

Bases: AbstractSampler

MCMC sampler for Keplerian orbital parameters using numpyro.

Builds a numpyro model from a component model (or joint model) and runs warm-started MCMC from rejection-sampler output, returning a Samples container. Data is passed to run() rather than at construction, so the same configured sampler can be applied to multiple datasets.

Parameters:
  • prior (HarvPrior) – Prior distributions for nonlinear (and optionally linear) parameters.

  • parameterization – Orbital parameterization. For RV data defaults to StandardRV. Ignored for Gaia astrometry data.

  • extensions – Model extensions (jitter, trends, offsets, GP) supplied at construction time.

Examples

>>> from unxt import Q
>>> from harv.models import StandardRV
>>> from harv.samplers import NumpyroSampler
>>> prior = StandardRV().default_prior(
...     period_min=Q(2.0, "day"),
...     period_max=Q(1000.0, "day"),
...     sigma_K0=Q(30.0, "km/s"),
...     sigma_v0=Q(50.0, "km/s"),
... )
>>> mcmc = NumpyroSampler(prior)
>>> mcmc_samples = mcmc.run(
...     data, init_samples=rej_samples, num_warmup=500, num_samples=1000
... )
Parameters:
__init__(prior, model, marginalized_names=None)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

get_extensions()

Return the extensions in effect for this sampler.

Walks the attached model: returns model.extensions for a single component model, or dict[component_name, tuple[Extension, ...]] for a JointModel so the per-component association used for namespaced parameter names like "primary.jitter" is preserved. Use this in any downstream consumer (e.g. harv.plot.plot_rv()) that needs to know which extensions are in play.

Return type:

tuple[AbstractExtension, ...] | dict[str, tuple[AbstractExtension, ...]]

marginalized_names: tuple[str, ...] | None = None
optimize(samples, data, *, seed=None, max_passes=10, tol=0.0001)

Refine each input sample to the local posterior MAP via BFGS.

Uses numpyro.optim.Minimize() (BFGS via jax.scipy.optimize.minimize()) with an AutoDelta guide to find the local mode of log_prior + marginal_log_likelihood starting from each sample in samples. The returned Samples has the refined nonlinear values, linear values set to the conditional posterior mean (equal to the conditional MAP since the conditional is Gaussian) at the refined nonlinear point, and populated ln_likelihood / ln_prior.

Particularly useful when rejection sampling returns a single sample inside a posterior mode – this moves it from a random acceptance point to the mode peak.

Parameters:
  • samples (Samples) – Posterior samples (e.g. from RejectionSampler.run()) used as warm starts. Each sample seeds an independent BFGS run.

  • data (AbstractData | AbstractDatasetContainer) – Observed data (same shape/type as run()).

  • seed (int | None) – Random number seed. If not specified, picks a seed based on the current time.

  • max_passes (int) – Maximum number of BFGS restarts per sample. The wrapped jax.scipy.optimize.minimize BFGS often quits early when its line search fails; restarting from the previous result frequently recovers further progress. Default 10.

  • tol (float) – Loss-change tolerance for the BFGS-restart loop. If the loss decreases by less than tol on a pass, optimization is considered converged. Default 1e-4.

Return type:

Samples

Returns:

Refined samples with ln_likelihood and ln_prior populated.

Notes

Only the marginalized path is supported (matches run(marginalized=True)). extra_model is not supported.

Emits UserWarning for any sample that does not converge within max_passes BFGS restarts; the returned point for such a sample may not be the local MAP.

run(data, *, init_samples=None, seed=None, marginalized=True, extra_model=None, extra_init_params=None, kernel=None, num_chains=4, num_warmup=500, num_samples=1000, chain_method='parallel', return_logprobs=False)

Run MCMC warm-started from rejection-sampler output.

Parameters:
  • data (AbstractData | AbstractDatasetContainer) – Observed data (RVData, GaiaAstrometryData, or SystemData for joint models).

  • init_samples (Samples | None) – Posterior samples produced by rejection sampling, used to set the initial positions for each MCMC chain.

  • seed (int | None) – Random number seed. If not specified, picks a seed based on the current time.

  • marginalized (bool) – If True (default), linear parameters are analytically marginalized in the likelihood and conditionally sampled afterward. If False, all parameters are sampled jointly.

  • extra_model (Callable[[dict[str, Any]], dict[str, Any]] | None) – A function (pars: dict) -> dict that replaces specific linear parameters with deterministic functions of new physical parameters.

  • extra_init_params (dict[str, Any] | None) – Initial values for parameters introduced by extra_model.

  • kernel (type | None) – A numpyro MCMC kernel class. Defaults to numpyro.infer.NUTS.

  • num_chains (int) – Number of independent MCMC chains. Default: 4.

  • num_warmup (int) – Number of warmup (burn-in) steps per chain. Default: 500.

  • num_samples (int) – Number of posterior samples per chain. Default: 1000.

  • chain_method (str) – How to run chains: "parallel", "sequential", or "vectorized". Default: "parallel".

  • return_logprobs (bool) – If True, store per-sample log-probabilities on the returned Samples: ln_likelihood (the marginal log-likelihood, re-evaluated via model.log_prob) and ln_prior (the summed nonlinear-prior log-density). Enables Samples.map_sample() and Samples.ln_posterior. Requires marginalized=True and no extra_model. Default False.

Return type:

Samples

Returns:

Posterior samples container with nonlinear and linear parameters.

prior: HarvPrior
model: AbstractComponentModel | JointModel
class harv.HarvPrior

Bases: Module

Prior distribution for rejection sampling of Keplerian orbits.

This class encapsulates the prior distributions for both nonlinear and linear parameters. It is agnostic to data type - the sampler determines which parameters are required based on the provided data.

We recommend using the “default” factory constructors (e.g. default_rv(), default_gaia_astrometry(), etc.), which set up sensible priors for common use cases.

Nonlinear parameterization:

Parameter names in nonlinear_priors must match the field names of the parameterization, for example, period, eccentricity, phase_peri, etc. These parameters are sampled explicitly.

See the options available in harv.models.parameterizations.

Default parameterizations:

Radial Velocity:
  • Nonlinear: period, eccentricity, phase_peri, arg_peri

  • Linear: rv_semiamp, v_sys

Astrometry:
  • Nonlinear: period, eccentricity, phase_peri, cos_i, arg_peri, lon_asc_node

  • Linear params: ra0, dec0, pmra, pmdec, parallax, semi_major_axis

Parameters:
  • nonlinear_priors (dict[str, Distribution | QuantityDistribution]) – Mapping from parameter name to its prior distribution (a bare dist.Distribution for dimensionless parameters, or a harv.distributions.QuantityDistribution wrapper for parameters with physical units).

  • linear_priors (dict[str, Distribution | QuantityDistribution | Callable[..., QuantityDistribution | Normal]]) –

    Per-parameter priors for linear parameters. Each entry is classified:

    • dist.Normal or QD(Normal) – Gaussian, can be analytically marginalized.

    • LinearPriorCallable – called with nonlinear params to produce a Normal, can be marginalized.

    • dist.HalfNormal, dist.Delta, etc. – non-Gaussian, sampled explicitly alongside nonlinear params.

    When using default_rv() with offsets, the non-reference offset priors are automatically included as linear parameters.

  • extension_priors (dict[str, Distribution | QuantityDistribution]) – Priors for extension parameters declared via extra_params().

__init__(nonlinear_priors, linear_priors, extension_priors=<factory>)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

classmethod from_parameterization(parameterization, **kwargs)

Build a default prior for any supported parameterization.

Delegates to parameterization.default_prior(**kwargs). Each concrete parameterization declares its own required scale arguments (e.g. sigma_K0 for RV, sigma_a0 for astrometry).

Parameters:
Return type:

HarvPrior

Examples

>>> from unxt import Q
>>> from harv.models.parameterizations.rv import EcoswEsinwRV
>>> from harv.samplers import HarvPrior
>>> prior = HarvPrior.from_parameterization(
...     EcoswEsinwRV(),
...     period_min=Q(2.0, "day"),
...     period_max=Q(1000.0, "day"),
...     sigma_K0=Q(30.0, "km/s"),
...     sigma_v0=Q(50.0, "km/s"),
... )
>>> sorted(prior.nonlinear_priors.keys())
['ecosw', 'esinw', 'period', 'phase_peri']
sample(key, n_samples, *, model, return_logprobs=False, marginalized_names=None)

Draw a complete prior sample for a given model.

Unlike sample_nonlinear(), this draws every parameter the rejection sampler would consume for model:

  • base nonlinear params from nonlinear_priors

  • extension nonlinear params (e.g. jitter, GP hypers) discovered by walking model.extensions

  • any linear params from linear_priors that are sampled explicitly rather than analytically marginalized. With the default marginalized_names=None this is just the non-Gaussian linear priors (e.g. HalfNormal-prior parallax); when marginalized_names is given it is every linear param not in that set (see below).

The returned Samples is the same container the rejection sampler produces for posteriors, with units restored from each QuantityDistribution. Its linear field is empty in the common Gaussian-linear case. ln_likelihood is always None (no data yet); ln_prior is populated when return_logprobs=True, summing the nonlinear (base + extension) prior log-densities — matching the convention used by RejectionSampler.run().

model is required because the set of extension and explicit-linear parameters depends on the extensions attached to the model.

Parameters:
  • key (Array) – JAX random key.

  • n_samples (int) – Number of prior draws.

  • model (AbstractComponentModel | JointModel) – Component or joint model template defining the extensions and linear-prior classification.

  • return_logprobs (bool) – If True, populate Samples.ln_prior with the per-sample nonlinear-prior log-density.

  • marginalized_names (tuple[str, ...] | None) – Which linear params are analytically marginalized (and therefore not drawn here), mirroring RejectionSampler.marginalized_names. Must match the consuming sampler so the explicit-linear keys agree. None (default) marginalizes every linear param that can be (the Gaussian ones), leaving only non-Gaussian priors explicit.

Returns:

A container holding the full prior draw, suitable for passing to RejectionSampler.run_with_samples() or for caching via harv.samplers.make_prior_cache().

Return type:

Samples

Examples

>>> import jax
>>> from unxt import Q
>>> from harv import HarvPrior, RVModel, StandardRV
>>> prior = StandardRV().default_prior(
...     period_min=Q(2.0, "day"),
...     period_max=Q(1000.0, "day"),
...     sigma_K0=Q(30.0, "km/s"),
...     sigma_v0=Q(50.0, "km/s"),
... )
>>> samples = prior.sample(jax.random.key(0), 100, model=RVModel())
>>> samples.n_samples
100
>>> samples.data_type
'RVModel'
sample_nonlinear(key, n_samples)

Sample nonlinear parameters from priors.

Parameters:
  • key (Array) – Random key for sampling.

  • n_samples (int) – Number of samples to draw.

Return type:

dict[str, Any]

Returns:

Dictionary mapping each parameter name to an array of shape (n_samples,). Values are bare JAX arrays regardless of whether the distribution is wrapped in QuantityDistribution.

Examples

>>> import jax
>>> from unxt import Q
>>> from harv.samplers import HarvPrior
>>> sorted(
...     HarvPrior.default_rv(
...         period_min=Q(2.0, "day"),
...         period_max=Q(1000.0, "day"),
...         sigma_K0=Q(30.0, "km/s"),
...         sigma_v0=Q(50.0, "km/s"),
...     ).sample_nonlinear(jax.random.key(0), 10).keys()
... )
['arg_peri', 'eccentricity', 'period', 'phase_peri']
>>> HarvPrior.default_rv(
...     period_min=Q(2.0, "day"),
...     period_max=Q(1000.0, "day"),
...     sigma_K0=Q(30.0, "km/s"),
...     sigma_v0=Q(50.0, "km/s"),
... ).sample_nonlinear(jax.random.key(0), 10)["period"].shape
(10,)
nonlinear_priors: dict[str, Distribution | QuantityDistribution]
linear_priors: dict[str, Distribution | QuantityDistribution | Callable[..., QuantityDistribution | Normal]]
extension_priors: dict[str, Distribution | QuantityDistribution]
class harv.RejectionSampler

Bases: AbstractSampler

Rejection sampler for Keplerian orbital parameters.

Implements rejection sampling with analytic marginalization over linear parameters. Configure once, then call run() with each dataset.

Parameters:
  • prior (HarvPrior) – Prior distributions for nonlinear (and optionally linear) parameters.

  • model (AbstractComponentModel | JointModel) – Fully constructed model template (no data, no linear_priors).

  • marginalized_names (tuple[str, ...] | None) – Linear parameter names to analytically marginalize. If None, all Gaussian linear parameters are auto-classified for marginalization.

  • batch_size (int) – Number of samples to process per batch. Smaller values use less memory but may be slower. Default: 100_000.

Examples

>>> from unxt import Q
>>> import jax.numpy as jnp
>>> from harv import HarvPrior, RejectionSampler, RVData
>>> from harv.models.rv import RVModel
>>> data = RVData(
...     time=Q(jnp.linspace(0, 100, 5), "day"),
...     rv=Q(jnp.zeros(5), "km/s"),
...     rv_err=Q(jnp.full(5, 1.0), "km/s"),
... )
>>> prior = HarvPrior.default_rv(
...     period_min=Q(2.0, "day"),
...     period_max=Q(1000.0, "day"),
...     sigma_K0=Q(30.0, "km/s"),
...     sigma_v0=Q(50.0, "km/s"),
... )
>>> sampler = RejectionSampler(prior, RVModel())
>>> samples = sampler.run(data, n_prior_samples=100_000)
__init__(prior, model, marginalized_names=None, batch_size=100000)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

batch_size: int = 100000
get_extensions()

Return the extensions in effect for this sampler.

Walks the attached model: returns model.extensions for a single component model, or dict[component_name, tuple[Extension, ...]] for a JointModel so the per-component association used for namespaced parameter names like "primary.jitter" is preserved. Use this in any downstream consumer (e.g. harv.plot.plot_rv()) that needs to know which extensions are in play.

Return type:

tuple[AbstractExtension, ...] | dict[str, tuple[AbstractExtension, ...]]

marginalized_names: tuple[str, ...] | None = None
run(data, *, n_prior_samples, max_posterior_samples=None, seed=None, ignore_non_finite=False, return_logprobs=False)

Run rejection sampling.

Parameters:
  • data (AbstractData | AbstractDatasetContainer) – Observed data: an AbstractData subclass (e.g. RVData, GaiaAstrometryData) for single-component models, or an AbstractDatasetContainer (e.g. SystemData, SourceData) for JointModel.

  • n_prior_samples (int) – Number of samples to draw from the prior.

  • max_posterior_samples (int | None) – Maximum number of posterior samples to return. If None, returns all accepted samples.

  • seed (int | None) – Random number seed. If not specified, picks a seed based on the current time.

  • ignore_non_finite (bool) – If True, any NaN or infinite log-likelihood values are treated as rejected samples by replacing them with -inf before the rejection step. If False (default), non-finite values are left unchanged.

  • return_logprobs (bool) – If True, store per-sample log-probabilities on the returned Samples: ln_likelihood (the marginal log-likelihood) and ln_prior (the summed nonlinear prior log-density). These enable Samples.map_sample() and the Samples.ln_posterior property. Default False.

Return type:

Samples

Returns:

Posterior samples container.

run_with_samples(data, prior_samples, *, max_posterior_samples=None, seed=None, ignore_non_finite=False, return_logprobs=False, randomize_prior_order=True)

Run rejection against pre-computed prior samples.

Parallel to run() but skips the prior-sampling step: the caller supplies the prior draws either in memory (a Samples returned by HarvPrior.sample()) or as a path to an HDF5 file produced by make_prior_cache(). Useful when the same prior library is reused across many datasets — generate once, evaluate many times.

Parameters:
  • data (AbstractData | AbstractDatasetContainer) – Observed data; same conventions as run().

  • prior_samples (Samples | str | PathLike[str]) – Either a Samples instance (in-memory cache, e.g. from prior.sample(...)) or an str / os.PathLike path to an HDF5 file. The HDF5 path is streamed batch by batch, so the file may be much larger than RAM.

  • max_posterior_samples (int | None) – See run().

  • seed (int | None) – See run().

  • ignore_non_finite (bool) – See run().

  • return_logprobs (bool) – See run().

  • randomize_prior_order (bool) – Disk-streaming branch only: when True (default), batches are read from the HDF5 file in a random order (drawn from seed). Each batch is still a single contiguous h5py slice, so disk I/O is unchanged. Set to False for strictly sequential reads (reproducibility / debugging). Ignored for the in-memory branch.

Returns:

Posterior samples, equivalent in shape and contents to run()’s return.

Return type:

Samples

prior: HarvPrior
model: AbstractComponentModel | JointModel
class harv.Samples

Bases: Module

Container for posterior samples.

Stores both nonlinear and linear parameter samples as Q objects with units baked in. Provides dict-like access, statistical summaries, and visualization tools.

Parameters:
  • nonlinear (dict[str, Quantity]) – Nonlinear parameter samples, one Q per parameter. Keys: "period", "eccentricity", "phase_peri", and optionally "arg_peri", "cos_i", "lon_asc_node". Units: period has time units; angles have "rad"; dimensionless parameters have unit "".

  • linear (dict[str, Quantity]) – Linear parameter samples, one Q per parameter. Keys: e.g. "rv_semiamp", "v_sys" for RV; "ra0", "dec0", "pmra", "pmdec", "parallax", "semi_major_axis" for astrometry. Units are data-driven (e.g. "km/s" for RV).

  • data_type (str) – Informational label identifying the model that produced these samples (e.g. "RVModel", "GaiaAstrometryModel", "JointModel"). Stored in HDF5 for round-tripping.

  • metadata (dict[str, Any]) – Additional metadata (t_ref, num_chains, acceptance rate, etc.).

  • linear_extension_names (tuple[str, ...]) – Names of linear parameters introduced by extensions (instrument offsets, polynomial trends, etc.) beyond the base linear set.

Examples

Samples is normally produced by run(), but can be constructed directly for testing or manual use:

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={
...         "period": Q([100.0, 101.0, 99.5], "day"),
...         "eccentricity": Q([0.1, 0.15, 0.12], ""),
...         "phase_peri": Q([0.3, 0.31, 0.29], ""),
...         "arg_peri": Q([1.0, 1.1, 0.9], "rad"),
...     },
...     linear={
...         "rv_semiamp": Q([10.0, 11.0, 9.5], "km/s"),
...         "v_sys": Q([5.0, 5.1, 4.9], "km/s"),
...     },
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> samples.n_samples
3
>>> "period" in samples
True
>>> samples.keys()[:4]
['period', 'eccentricity', 'phase_peri', 'arg_peri']
Parameters:
__init__(nonlinear, linear, data_type='', metadata=<factory>, linear_extension_names=(), ln_likelihood=None, ln_prior=None)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

binary_mass_function()

Binary mass function from the RV orbital elements.

See harv.kepler.masses.binary_mass_function().

Return type:

Quantity

Returns:

The binary mass function (a Q in solar masses), one per sample.

Raises:

KeyError – If the samples do not contain rv_semiamp (not an RV fit).

chi2(data, model)

Per-sample goodness-of-fit \(\chi^2\) against the data.

For each posterior sample, evaluates the model prediction at the stored parameter values and returns \(\chi^2 = r^\top C^{-1} r\) (see chi_squared()). Jitter and other extension noise terms are included via C.

Parameters:
  • data (AbstractData) – The data the samples were fit to.

  • model (Any) – The component model used for the fit (RVModel or GaiaAstrometryModel); provides the prediction and noise model.

Return type:

Array

Returns:

Array of shape (n_samples,).

companion_mass(m1, *, sini=None)

Companion mass \(m_2\) given the primary mass.

For RV samples the mass function is binary_mass_function(); sini defaults to 1 (the minimum companion mass). For astrometry samples the dark-companion astrometric mass function is used and sini is ignored (the inclination is already encoded in the physical orbit size).

Parameters:
  • m1 (Quantity) – Primary mass (a Q).

  • sini (float | None) – Sine of the inclination, for RV samples only. Default 1 (edge-on).

Return type:

Quantity

Returns:

The companion mass (a Q in solar masses), one per sample.

convert_parameterization(*, source, target)

Convert stored values between supported parameterizations.

Wraps harv.samplers.convert_parameterization(), returning a new Samples with metadata, data_type, and linear_extension_names preserved. The initial implementation supports single-component RV and Gaia astrometry parameterizations only.

Parameters:
Return type:

Samples

data_type: str = ''
classmethod from_hdf5(filename)

Load samples from HDF5 file.

Parameters:

filename (str | Path) – Input HDF5 filename.

Return type:

Samples

Returns:

Loaded samples object.

Examples

>>> samples = Samples.from_hdf5("posterior_samples.h5")
>>> samples.n_samples
42
>>> samples.data_type
'rv'
keys()

All available parameter names (nonlinear + linear + derived).

Return type:

list[str]

linear_extension_names: tuple[str, ...] = ()
ln_likelihood: Array | None = None
property ln_posterior: Array

Per-sample log-posterior density, ln_prior + ln_likelihood.

Both ln_prior and ln_likelihood must have been stored (run the sampler with return_logprobs=True); otherwise ValueError is raised.

ln_prior: Array | None = None
map_sample(*, return_index=False)

Return the maximum a posteriori (MAP) sample.

Selects the single sample with the highest ln_posterior.

Parameters:

return_index (bool) – If True, also return the integer index of the MAP sample.

Return type:

Samples | tuple[Samples, int]

Returns:

A length-1 Samples with the MAP sample, or (Samples, index) when return_index is True.

Raises:

ValueError – If per-sample log-probabilities were not stored (run the sampler with return_logprobs=True).

max_phase_gap(data)

Largest gap in orbital-phase coverage, per sample.

The maximum gap between consecutive observations on the (circular) phase axis – the ESA Gaia “maximum phase gap” statistic.

Parameters:

data (AbstractData) – The data the samples were fit to.

Return type:

ndarray

Returns:

Array of shape (n_samples,); values in [0, 1].

median(key=None)

Compute median values for parameters.

Parameters:

key (str | None) – If provided, return median for this parameter only. If None, return dict of medians for all parameters.

Return type:

dict[str, AbstractQuantity | Array] | AbstractQuantity | Array

Returns:

Median value(s).

Examples

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={"period": Q([100.0, 102.0], "day"),
...                "eccentricity": Q([0.1, 0.15], ""),
...                "phase_peri": Q([0.3, 0.31], ""),
...                "arg_peri": Q([1.0, 1.1], "rad")},
...     linear={"rv_semiamp": Q([10.0, 12.0], "km/s"),
...             "v_sys": Q([5.0, 5.2], "km/s")},
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> med = samples.median("period")
>>> med.unit
Unit("d")
>>> all_medians = samples.median()
>>> "period" in all_medians
True
property meta: _MetadataView

Q-aware read-only view over metadata.

metadata itself stores quantity-valued entries in split form (<name> value + <name>_unit string) so the static field never holds a JAX array. Use samples.meta[name] to read those entries back as Q instances without manually reassembling the two halves; plain scalar entries (e.g. num_chains) round-trip unchanged. See _MetadataView for the full read API.

minimum_companion_mass(m1)

Minimum companion mass (edge-on, sin i = 1).

Convenience wrapper for companion_mass() with sini=1.

Parameters:

m1 (Quantity) – Primary mass (a Q).

Return type:

Quantity

property n_samples: int

Number of posterior samples.

percentile(key, percentiles=(16, 50, 84))

Compute percentiles for a parameter.

Parameters:
  • key (str) – Parameter name.

  • percentiles (list[float] | tuple[float, ...]) – Percentile values to compute (0-100). Default: (16, 50, 84) which corresponds to the 16th, 50th, 84th percentiles for Gaussian.

Return type:

list[AbstractQuantity | Array]

Returns:

Percentile values with appropriate units.

Examples

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={"period": Q([100.0, 102.0], "day"),
...                "eccentricity": Q([0.1, 0.15], ""),
...                "phase_peri": Q([0.3, 0.31], ""),
...                "arg_peri": Q([1.0, 1.1], "rad")},
...     linear={"rv_semiamp": Q([10.0, 12.0], "km/s"),
...             "v_sys": Q([5.0, 5.2], "km/s")},
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> p16, p50, p84 = samples.percentile("eccentricity")
>>> len(samples.percentile("period", [5, 50, 95]))
3
period_modes(data, n_clusters=2)

Cluster the period samples and test each mode for unimodality.

Runs K-means on log(period) (experimental; see thejoker’s is_P_Kmodal). Requires the optional scikit-learn dependency.

Parameters:
  • data (AbstractData) – The data the samples were fit to.

  • n_clusters (int) – Number of period modes to cluster into. Default 2.

Return type:

tuple[bool, Quantity, ndarray]

Returns:

(all_unimodal, mode_periods, n_per_mode) – whether every mode is individually unimodal, the median period of each mode (a Q), and the sample count per mode.

period_unimodal(data)

Whether the period samples lie within a single mode.

Uses the criterion ptp(P) < 4 * P_min**2 / (2*pi*T), where T is the data time span – the period spacing at which adjacent aliases become resolvable (see thejoker’s is_P_unimodal).

Parameters:

data (AbstractData) – The data the samples were fit to (for the observed time span).

Return type:

bool

periods_spanned(data)

Number of orbital periods spanned by the data, per sample.

Parameters:

data (AbstractData) – The data the samples were fit to.

Return type:

ndarray

Returns:

Array of shape (n_samples,).

phase_coverage(data, n_bins=10)

Fraction of phase bins containing at least one observation.

The ESA Gaia “phase coverage” statistic, per sample.

Parameters:
  • data (AbstractData) – The data the samples were fit to.

  • n_bins (int) – Number of equal-width phase bins. Default 10.

Return type:

ndarray

Returns:

Array of shape (n_samples,); values in [0, 1].

phase_coverage_per_period(data)

Maximum number of observations within any single period, per sample.

Parameters:

data (AbstractData) – The data the samples were fit to.

Return type:

ndarray

Returns:

Array of shape (n_samples,).

plot_corner(params=None, truths=None, labels=None, **plot_kwargs)

Create corner plot of posterior samples using arviz.

Parameters:
  • params (list[str] | None) – Parameters to include in corner plot. If None, selects a default set based on data_type.

  • truths (dict[str, Any] | None) – Dictionary of true parameter values to overplot as reference values.

  • labels (dict[str, str] | None) – Override display names for specific parameters, e.g. {"period": "period [day]", "rv_semiamp": "K [km/s]"}. Parameters not listed use their plain parameter name as the label.

  • **plot_kwargs (Any) – Additional keyword arguments passed to arviz.plot_pair().

Return type:

Any

Returns:

Array of axes from arviz.plot_pair.

Examples

Default corner plot (selects parameters based on data type):

>>> axes = samples.plot_corner()

Specify parameters and overplot true values:

>>> from unxt import Q
>>> axes = samples.plot_corner(
...     params=["period", "eccentricity"],
...     truths={"period": Q(100, "day"), "eccentricity": 0.3},
... )
reduced_chi2(data, model, *, dof=None)

Per-sample reduced \(\chi^2\) (\(\chi^2 / \mathrm{dof}\)).

Parameters:
  • data (AbstractData) – The data the samples were fit to.

  • model (Any) – The component model used for the fit.

  • dof (int | None) – Degrees of freedom. Defaults to n_obs - n_params, where n_params counts every fitted parameter (orbital + linear + extension). A well-fitting model gives reduced \(\chi^2\) near 1.

Return type:

Array

Returns:

Array of shape (n_samples,).

Raises:

ValueError – If dof (default or supplied) is not positive.

semi_major_axis_AU()

Physical semi-major axis (AU) from the angular orbit size + parallax.

See harv.kepler.masses.semi_major_axis_physical().

Return type:

Quantity

Returns:

The physical semi-major axis (a Q in AU), one per sample.

Raises:

KeyError – If the samples lack semi_major_axis or parallax.

summary(params=None)

Compute summary statistics for parameters.

For each parameter, computes: - median - mean - std (standard deviation) - percentiles (16th, 84th) for +/-1-sigma equivalent

Parameters:

params (list[str] | None) – List of parameter names to summarize. If None, summarizes all.

Return type:

dict[str, dict[str, Any]]

Returns:

Dictionary mapping parameter names to their statistics.

Examples

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={"period": Q([100.0, 102.0], "day"),
...                "eccentricity": Q([0.1, 0.15], ""),
...                "phase_peri": Q([0.3, 0.31], ""),
...                "arg_peri": Q([1.0, 1.1], "rad")},
...     linear={"rv_semiamp": Q([10.0, 12.0], "km/s"),
...             "v_sys": Q([5.0, 5.2], "km/s")},
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> summary = samples.summary(["period", "eccentricity"])
>>> sorted(summary.keys())
['eccentricity', 'period']
>>> sorted(summary["period"].keys())
['mean', 'median', 'p16', 'p84', 'std']
thiele_innes_to_campbell()

Convert Thiele-Innes linear parameters to Campbell orbital elements.

See campbell_from_thiele_innes() for the mathematical details of the conversion.

Returns:

New Samples with semi_major_axis, arg_peri, lon_asc_node, cos_i (replacing the four TI constants).

Return type:

Samples

to_arviz(params=None, labels=None)

Export samples to an arviz.InferenceData object.

Parameters:
  • params (list[str] | None) – Parameters to include. If None, all parameters returned by keys() are included.

  • labels (dict[str, str] | None) – Override display names for specific parameters, e.g. {"period": "period [day]", "rv_semiamp": "K [km/s]"}. Parameters not listed use their plain parameter name as the label.

Return type:

Any

Returns:

Inference data suitable for arviz.plot_pair, arviz.summary, etc.

Raises:

ImportError – If arviz is not installed.

Examples

>>> idata = samples.to_arviz(["period", "eccentricity"])
to_hdf5(filename)

Save samples to HDF5 file.

Parameters:

filename (str | Path) – Output HDF5 filename.

Return type:

None

Examples

Save posterior samples and reload them:

>>> samples.to_hdf5("posterior_samples.h5")
>>> reloaded = Samples.from_hdf5("posterior_samples.h5")
>>> reloaded.n_samples == samples.n_samples
True
wrap_angles()

Wrap negative rv_semiamp / semi_major_axis to positive.

Negative rv_semiamp (K) or semi_major_axis (a) describes an orbit that is physically identical to the all-positive case under two symmetries of the model:

  • arg_peri -> arg_peri + pi flips the sign of both K and a (the (omega, K, a) -> (omega + pi, -K, -a) symmetry);

  • lon_asc_node -> lon_asc_node + pi flips the sign of a alone (the astrometric (Omega, a) -> (Omega + pi, -a) symmetry; lon_asc_node does not enter the RV model).

This method returns a new Samples enforcing K >= 0 and a >= 0 in two steps:

  1. shift arg_peri by pi (flipping every rv_semiamp and semi_major_axis) on the samples where the trigger amplitude is negative — enforcing K >= 0;

  2. shift lon_asc_node by pi (flipping every semi_major_axis) on the samples where a is still negative — enforcing a >= 0.

Both shifted angles are wrapped to [0, 2*pi). A single arg_peri shift cannot make both K and a positive when their signs disagree (which routinely happens in joint RV + astrometry posteriors), so the second lon_asc_node shift is required.

Joint models (e.g. SB2) namespace per-component linear parameters as "component.param_name"; this method discovers every rv_semiamp- and semi_major_axis-suffixed key. The first rv_semiamp-suffixed key (insertion order) triggers step 1; semi_major_axis triggers step 1 only when no rv_semiamp is present (astrometry-only fits).

No-op when arg_peri is absent from nonlinear, when neither rv_semiamp nor semi_major_axis is in linear, or when no entries are negative.

Raises:

NotImplementedError – If the model has multiple per-component arg_peri or lon_asc_node keys (e.g. "primary.arg_peri" and "secondary.arg_peri"). All current harv joint factories share both, so this is not expected to arise in practice.

Return type:

Samples

Examples

>>> from unxt import Q
>>> from harv.samplers.samples import Samples
>>> samples = Samples(
...     nonlinear={"period": Q([100.0, 100.0], "day"),
...                "eccentricity": Q([0.1, 0.1], ""),
...                "phase_peri": Q([0.3, 0.3], ""),
...                "arg_peri": Q([1.0, 1.0], "rad")},
...     linear={"rv_semiamp": Q([-10.0, 10.0], "km/s"),
...             "v_sys": Q([0.0, 0.0], "km/s")},
...     data_type="rv",
...     metadata={"t_ref": 0.0, "t_ref_unit": "day"},
... )
>>> wrapped = samples.wrap_angles()
>>> bool((wrapped["rv_semiamp"].value >= 0).all())
True
Return type:

Samples

nonlinear: dict[str, Quantity]
linear: dict[str, Quantity]
metadata: dict[str, Any]

Subpackages

Submodules

harv.custom_types module

Custom types used in harv.

harv.custom_types.float_converter(x)

Converter for dimensionless scalar float fields.

Strips units from a dimensionless quantity or passes through plain scalars, always producing a 0-d JAX array. Use as an eqx.field converter wherever a dimensionless scalar is stored internally as a plain JAX float.

Parameters:

x (Float[Array, ''] | floating[Any] | float | int | Real[Quantity[PhysicalType('dimensionless')], ''])

Return type:

Float[Array, '']

harv.distributions module

Unit-aware wrapper for numpyro distributions.

This module provides QuantityDistribution, a thin wrapper that pairs a numpyro distribution with a physical unit string so that samples carry explicit units via unxt.Q.

harv.distributions.QD

alias of QuantityDistribution

class harv.distributions.QuantityDistribution

Bases: Module

Pairs a numpyro distribution with the physical unit of its samples.

For scalar distributions, unit is a single string. For multivariate distributions (e.g. MultivariateNormal over parameters with mixed units), unit is a tuple of strings – one per dimension.

Parameters:
  • distribution (Distribution) – The underlying numpyro distribution (works with bare floats).

  • unit (str | tuple[str, ...]) – Physical unit of the samples. A single string for scalar distributions; a tuple for multivariate distributions where each element may have a different unit.

Examples

Scalar (period in days):

>>> import jax
>>> import numpyro.distributions as dist
>>> from harv import QuantityDistribution
>>> qd = QuantityDistribution(dist.LogUniform(50., 2000.), "day")
>>> sample = qd.sample(jax.random.key(0))
>>> sample.unit
Unit("d")

Multivariate (astrometric linear parameters with mixed units):

qd = QuantityDistribution(
    dist.MultivariateNormal(loc=jnp.zeros(6), ...),
    ("mas", "mas", "mas/yr", "mas/yr", "mas", "mas"),
)
sample = qd.sample(key)  # -> raw jax.Array (consumer splits by name)
distribution: Distribution
unit: str | tuple[str, ...]
sample(key, sample_shape=())

Sample from the distribution, attaching units when possible.

For scalar units (str), returns Quantity. For tuple units (multivariate with mixed units), returns a raw array – the consumer splits by parameter name and attaches per-element units.

Parameters:
Return type:

Any

log_prob(value)

Evaluate log-probability, stripping units if present.

Parameters:

value (Any)

Return type:

Any

__init__(distribution, unit)
Parameters:
Return type:

None

static __new__(cls, *args, **kwargs)
Parameters:
Return type:

TypeVar(_ModuleT, bound= Module)

harv.io module

Serialization helpers for harv sampler objects.

Provides save_sampler() and load_sampler() for persisting a configured sampler (prior + extensions) to disk so that the same setup can be reloaded without re-specifying all prior parameters.

The format is a standard Python pickle file. Equinox modules (eqx.Module subclasses) are picklable, so all prior distributions, extension objects, and static fields round-trip exactly.

Examples

Save and reload a RejectionSampler:

>>> import harv
>>> sampler = harv.RejectionSampler(prior)
>>> harv.save_sampler("sampler.pkl", sampler)
>>> sampler2 = harv.load_sampler("sampler.pkl")
harv.io.load_sampler(path)

Load a sampler from a pickle file written by save_sampler().

Parameters:

path (str | Path) – Path to the .pkl checkpoint file.

Returns:

Restored sampler with all prior distributions and extensions intact.

Return type:

RejectionSampler

Examples

>>> import harv
>>> sampler = load_sampler("sampler.pkl")
harv.io.save_sampler(path, sampler)

Save a configured sampler to a pickle file.

All prior distributions, extensions, and static configuration are preserved. The file can be reloaded with load_sampler() without needing to re-specify any prior parameters.

Parameters:
  • path (str | Path) – Output file path. By convention use a .pkl extension.

  • sampler (Any) – The sampler to save. Any picklable object is accepted.

Return type:

None

Examples

>>> import harv
>>> save_sampler("sampler.pkl", sampler)

harv.plot module

Plotting utilities.

harv.plot.get_t_grid(times, period, *, span_buffer_factor=0.1, n_points_per_period=256, max_t_grid=1000000)

Dense time grid spanning the observation baseline with a small buffer.

Generates a regular grid of times for plotting model orbits over data. The grid resolution adapts to the period so that fast orbits are well-resolved while long-period orbits don’t create excessive grids.

Parameters:
  • times (Real[Quantity[PhysicalType('time')], '*batch']) – Observation times.

  • period (Real[Quantity[PhysicalType('time')], '']) – Orbital period (scalar). Used to set the grid spacing as period / n_points_per_period.

  • span_buffer_factor (float) – Fractional buffer added to each side of the observation baseline. Default: 0.1 (10% on each side).

  • n_points_per_period (int) – Number of grid points per orbital period. Default: 256.

  • max_t_grid (int | None) – Maximum number of grid points. Default: 1e6. Set to None to disable.

Return type:

Real[Quantity[PhysicalType('time')], 'n']

Returns:

Regular time grid spanning the buffered observation range.

Examples

>>> from unxt import Q
>>> times = Q([0.0, 50.0, 100.0], "day")
>>> t_grid = get_t_grid(times, Q(30.0, "day"))
>>> len(t_grid) > 128
True
harv.plot.plot_gaia_astrometry(samples, data, model=None, *, data_plot_kwargs=None, sky_orbit_kwargs=None, figsize=(10, 5), axes=None, **kwargs)

Plot the best-fit Gaia astrometry model for a single posterior sample.

Produces a two-panel goodness-of-fit figure for one posterior sample:

  • Panel 1 (left): on-sky photocentre orbital ellipse (delegated to plot_gaia_sky_orbit()), with each Gaia epoch shown as a scan-direction segment at the model-predicted photocentre offset.

  • Panel 2 (right): along-scan position residual vs time — the observed al_position minus the full predicted model for the sample. The prediction is built by delegating to the model (predict()), which folds in every design-matrix extension (e.g. MonomialTrend (astrometry=True), MultiSurveyOffset). Covariance-only extensions (jitter, GP) widen the residual error bars via _full_obs_err(); GP conditional-mean structured noise is additionally subtracted from the residual so only true noise remains.

samples must contain exactly one posterior sample. Select one beforehand with samples[i] or map_sample().

Parameters:
  • samples (Samples) – Posterior samples from a Gaia astrometry or joint model. Must contain exactly one sample; otherwise a ValueError is raised.

  • data (GaiaAstrometryData) – The data conditioned on by the model. Required.

  • model (Any) – The GaiaAstrometryModel whose extensions and parameterization define the prediction. When None (default), a bare GaiaAstrometryModel() is used — equivalent to the previous no-extension behavior.

  • data_plot_kwargs (dict[str, Any] | None) – Style overrides for the panel-2 residual error bars.

  • sky_orbit_kwargs (dict[str, Any] | None) – Forwarded to plot_gaia_sky_orbit() for panel 1.

  • figsize (tuple[float, float]) – Figure size when axes is None. Default (10, 5).

  • axes (tuple[Any, Any] | None) – Two axes to draw into, (sky, residual). None (default) creates a new figure.

  • **kwargs (Any) – Forwarded to matplotlib.pyplot.subplots when axes is None.

Return type:

Any

Returns:

The figure if axes was None, else None.

Raises:
  • ImportError – If matplotlib is not installed.

  • ValueError – If samples does not contain exactly one posterior sample.

Examples

>>> fig = plot_gaia_astrometry(samples[0], data=gaia_data)
>>> fig = plot_gaia_astrometry(
...     samples.map_sample(), data=gaia_data, model=sampler.model
... )
harv.plot.plot_gaia_sky_orbit(model, samples, *, data=None, n_grid=500, errorbar_scale=1.0, plot_kwargs=None, data_plot_kwargs=None, ax=None, **kwargs)

Plot a single astrometric orbit ellipse on the sky.

Draws the photocenter orbit projected onto the sky-plane (ΔRA vs ΔDec) for one posterior sample. When data is provided, each Gaia epoch is rendered as a short line segment in the scan direction at the model- predicted photocenter offset, with half-length equal to the along-scan measurement uncertainty (scaled by errorbar_scale).

The orbit-only sky path (no PM, no parallax) is constructed by delegating to predict_orbit_sky(), so this function automatically supports both Standard and Thiele-Innes parameterizations.

Parameters:
  • model (Any) – The GaiaAstrometryModel whose parameterization defines the orbit.

  • samples (Samples) – Posterior samples containing exactly one sample. Select beforehand with samples[i] or Samples.map_sample().

  • data (GaiaAstrometryData | None) – Gaia epoch astrometry data. When None, only the orbit ellipse is drawn (no per-epoch markers).

  • n_grid (int) – Number of phase points used to draw the smooth orbit curve. Default 500.

  • errorbar_scale (float) – Scale factor on the half-length of each scan-direction line segment (default 1.0 = 1-sigma).

  • plot_kwargs (dict[str, Any] | None) – Style overrides for the orbit curve (forwarded to ax.plot).

  • data_plot_kwargs (dict[str, Any] | None) – Style overrides for the per-epoch scan-direction segments.

  • ax (Any) – Axes to draw into. None (default) creates a new figure.

  • **kwargs (Any) – Forwarded to matplotlib.pyplot.subplots when ax is None.

Return type:

Any

Returns:

The figure if ax was None, else None.

Raises:
  • ImportError – If matplotlib is not installed.

  • ValueError – If samples does not contain exactly one posterior sample.

harv.plot.plot_rv(samples, data=None, model=None, *, n_samples=128, time_grid=None, show_signal_components=False, relative_to_t_ref=False, relative_to_median_v_sys=False, phase_fold_median=False, apply_median_offsets=True, plot_kwargs=None, data_plot_kwargs=None, extra_err_plot_kwargs=None, color_cycler=None, ax=None, **kwargs)

Plot RV curves computed from (posterior) samples over data.

Draws n_samples posterior RV curves sampled from samples. The predicted curve and error model are obtained by delegating to the model:

  • The smooth curve at the plotting grid uses predict_at_times(), which folds in every design-matrix extension (e.g. MonomialTrend).

  • Error bars are widened using _full_obs_err() (jitter, GP diagonal).

  • GP structured noise is overlaid via conditional_mean().

MultiSurveyOffset is subsetted out for the smooth-curve overlay (its indicator_matrix is bound to the original data’s row count and cannot be evaluated on a plotting grid); per-instrument median offsets are still applied to data points so that instruments land in the reference frame.

Parameters:
  • samples (Samples) – Posterior samples from RejectionSampler or NumpyroSampler.

  • data (RVData | SourceData | SystemData | None) – Observed RV data to overplot. When None, only orbit model curves are drawn (no data points, no instrument-colour cycling).

  • model (Any) – The RV component model (or JointModel for SB2) whose extensions and parameterization define the prediction. When None (default), a bare RVModel() is used — the previous no-extension behavior.

  • n_samples (int | None) – Number of posterior curves to draw. Set to None to draw all samples. Default: 128.

  • time_grid (Real[Quantity[PhysicalType('time')], '*batch'] | None) – Explicit time grid used to evaluate and plot the posterior orbit curves. When provided, this is used instead of the default phase grid or get_t_grid(). If phase_fold_median=True, the supplied time grid is converted to phase using the reference sample’s period and periastron time.

  • show_signal_components (bool) – Whether to plot the Keplerian signal and the combined extension-driven contribution as separate curves instead of plotting their sum. This decomposition view is only supported for time-domain RV plots with observed data. Default: False.

  • relative_to_t_ref (bool) – Whether to plot time relative to the reference epoch (t_ref) of the data.

  • relative_to_median_v_sys (bool) – Whether to shift all curves by the median systemic velocity (v_sys) of the samples, so that the curves show only the relative RV variations. Only applies when a “v_sys” parameter is present in the samples. Default: False.

  • phase_fold_median (bool) – If True, fold data and model to orbital phase using the sample closest to the median period. Phase zero is set to that sample’s t_peri value. Only that single reference orbit curve is drawn — plotting multiple samples on a phase axis defined by one period is misleading when the posterior has period spread. When plot-aware extensions are present, the reference sample’s extension contribution is subtracted from the data before folding so the Keplerian orbit overlays the phase-folded points. Default: False.

  • apply_median_offsets (bool) – Shift non-reference instrument data by the posterior median offset so all instruments land in the reference frame. Only applies when a MultiSurveyOffset extension is present. Default: True.

  • plot_kwargs (dict[str, Any] | None) – Style overrides for orbit model curves.

  • data_plot_kwargs (dict[str, Any] | None) – Style overrides for data points.

  • extra_err_plot_kwargs (dict[str, Any] | None) – Style overrides for the widened error bars drawn when a jitter extension is present. Keys override the defaults (marker=””, ecolor=”#6b2828”, alpha=0.5).

  • color_cycler (Any | None) – Colour cycler used to assign distinct colours to each instrument’s data points. When None (default) the current axes.prop_cycle from matplotlib.rcParams is used.

  • ax (Any) – Axes to draw into. If None (default), a new figure and axes are created and the axes object is returned.

  • **kwargs (Any) – Forwarded to matplotlib.pyplot.subplots when ax is None.

Return type:

Any

Returns:

The axes plotted to.

Raises:

ImportError – If matplotlib is not installed.

Examples

>>> ax = plot_rv(samples, rv_data)
>>> ax = plot_rv(samples, rv_data, model=sampler.model)
>>> ax = plot_rv(samples, rv_data, phase_fold_median=True)