harv.data package

Data classes for representing time series data.

TODO: Add metadata support to the data classes below.

class harv.data.AbstractAstrometryData

Bases: AbstractData

Abstract base class for astrometric data.

Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

__init__(time, *, t_ref=None)
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

Return type:

None

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

property n_times: int

Number of times / epochs / observations.

t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None

Reference epoch. If None, uses mean observation time.

time: Real[Quantity[PhysicalType('time')], 'n']

Barycentric TCB times.

class harv.data.AbstractData

Bases: Module

Abstract base class for observational data time series.

Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

__init__(time, *, t_ref=None)
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

Return type:

None

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

property n_times: int

Number of times / epochs / observations.

t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None

Reference epoch. If None, uses mean observation time.
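The default described above can be illustrated with a minimal sketch. It uses plain floats (days) in place of unit-aware quantities, and the helper name resolve_t_ref is hypothetical; it is not part of harv, and the library may resolve the default at a different point than this sketch suggests.

```python
from statistics import fmean


def resolve_t_ref(times, t_ref=None):
    """Return t_ref if given, otherwise the mean of the observation times.

    Hypothetical helper: plain floats stand in for time quantities.
    """
    if t_ref is not None:
        return float(t_ref)
    if not times:
        raise ValueError("need at least one observation time")
    return fmean(times)


times_day = [0.0, 100.0, 200.0]
default_epoch = resolve_t_ref(times_day)               # mean of the times
explicit_epoch = resolve_t_ref(times_day, t_ref=50.0)  # caller-supplied epoch
```

An explicit t_ref always wins; the mean is only a fallback.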

time: Real[Quantity[PhysicalType('time')], 'n']

Barycentric TCB times.

class harv.data.GaiaAstrometryData

Bases: AbstractAstrometryData

Gaia epoch astrometry (along-scan measurements).

Examples

>>> import jax.numpy as jnp
>>> from unxt import Q
>>> from harv import GaiaAstrometryData
>>> data = GaiaAstrometryData(
...     time=Q([0.0, 100.0, 200.0], "day"),
...     al_position=Q([0.1, -0.2, 0.05], "mas"),
...     al_position_err=Q([0.01, 0.01, 0.01], "mas"),
...     scan_angle=Q([0.5, 1.2, 2.8], "rad"),
...     parallax_factor=jnp.array([0.3, -0.1, 0.4]),
... )
>>> data.n_times
3
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • al_position (Real[Quantity[PhysicalType('angle')], 'n'])

  • al_position_err (Real[Quantity[PhysicalType('angle')], 'n'])

  • scan_angle (Real[Quantity[PhysicalType('angle')], 'n'])

  • parallax_factor (Float[Array, 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

__init__(time, al_position, al_position_err, scan_angle, parallax_factor, *, t_ref=None)
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • al_position (Real[Quantity[PhysicalType('angle')], 'n'])

  • al_position_err (Real[Quantity[PhysicalType('angle')], 'n'])

  • scan_angle (Real[Quantity[PhysicalType('angle')], 'n'])

  • parallax_factor (Float[Array, 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

Return type:

None

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

property n_times: int

Number of times / epochs / observations.

plot(ax=None, *, al_unit=None, add_labels=True, relative_to_t_ref=False, **kwargs)

Plot along-scan residuals vs time.

Parameters:
  • ax (Any) – matplotlib.axes.Axes instance to draw on. If None, uses plt.gca().

  • al_unit (str | None) – Display unit for the along-scan position. Defaults to the data’s own unit.

  • add_labels (bool) – Add axis labels.

  • relative_to_t_ref (bool) – Plot time relative to t_ref.

  • **kwargs (Any) – Passed to ax.errorbar(). Defaults can be overridden.

Returns:

The matplotlib.axes.Axes instance.

Return type:

Axes

Examples

>>> import jax.numpy as jnp
>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import GaiaAstrometryData
>>> data = GaiaAstrometryData(
...     time=Q([0.0, 100.0, 200.0], "day"),
...     al_position=Q([0.1, -0.2, 0.05], "mas"),
...     al_position_err=Q([0.01, 0.01, 0.01], "mas"),
...     scan_angle=Q([0.5, 1.2, 2.8], "rad"),
...     parallax_factor=jnp.array([0.3, -0.1, 0.4]),
... )
>>> ax = data.plot()
>>> plt.close("all")
t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None

Reference epoch. If None, uses mean observation time.

al_position: Real[Quantity[PhysicalType('angle')], 'n']

Along-scan position.

al_position_err: Real[Quantity[PhysicalType('angle')], 'n']

Along-scan uncertainty.

scan_angle: Real[Quantity[PhysicalType('angle')], 'n']

Per-CCD scan angle.

parallax_factor: Float[Array, 'n']

Along-scan (AL) parallax factors.

time: Real[Quantity[PhysicalType('time')], 'n']

Barycentric TCB times.

class harv.data.RVData

Bases: AbstractData

Radial velocity measurements.

Examples

>>> from unxt import Q
>>> from harv import RVData
>>> data = RVData(
...     time=Q([0.0, 50.0, 100.0], "day"),
...     rv=Q([1.0, -2.0, 0.5], "km/s"),
...     rv_err=Q([0.5, 0.5, 0.5], "km/s"),
... )
>>> data.n_times
3
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • rv (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])

  • rv_err (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

__init__(time, rv, rv_err, *, t_ref=None)
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • rv (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])

  • rv_err (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

Return type:

None

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

property n_times: int

Number of times / epochs / observations.

plot(ax=None, *, rv_unit=None, add_labels=True, relative_to_t_ref=False, phase_fold=None, **kwargs)

Plot RV data as error bars.

Parameters:
  • ax (Any) – The matplotlib.axes.Axes instance to draw on. If None, uses plt.gca().

  • rv_unit (str | None) – Display unit for the RV axis. Defaults to the data’s own unit.

  • add_labels (bool) – Add axis labels.

  • relative_to_t_ref (bool) – Plot time relative to t_ref. Mutually exclusive with phase_fold.

  • phase_fold (Any | None) – If given, fold observations to orbital phase using this period: x = ((time - t_ref) / phase_fold) mod 1, so x lies in [0, 1). Mutually exclusive with relative_to_t_ref.

  • **kwargs (Any) – Passed to ax.errorbar(). Defaults can be overridden.

Returns:

The matplotlib.axes.Axes instance.

Return type:

Axes
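The phase-folding rule given for phase_fold can be sketched in plain Python. This is an illustration of the formula only, with floats (days) standing in for quantities; fold_to_phase is a hypothetical name and does not reflect how RVData.plot() is implemented.

```python
def fold_to_phase(times, period, t_ref):
    """Map times to orbital phase in [0, 1): ((t - t_ref) / period) mod 1.

    Hypothetical helper: all inputs are plain floats in the same time unit.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    # Python's % with a positive divisor never returns a negative value,
    # so times before t_ref are folded correctly.
    return [((t - t_ref) / period) % 1.0 for t in times]


phases = fold_to_phase([0.0, 50.0, 100.0], period=40.0, t_ref=50.0)
```

Here the first observation lies 1.25 periods before t_ref and folds to phase 0.75, while the last lies 1.25 periods after and folds to 0.25.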

Examples

>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import RVData
>>> data = RVData(
...     time=Q([0.0, 50.0, 100.0], "day"),
...     rv=Q([1.0, -2.0, 0.5], "km/s"),
...     rv_err=Q([0.5, 0.5, 0.5], "km/s"),
... )
>>> ax = data.plot()  # uses errorbar() with sensible defaults
>>> ax = data.plot(color="C1", markersize=6)  # override style
>>> ax = data.plot(phase_fold=Q(50.0, "day"))  # phase-folded
>>> plt.close("all")
t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None

Reference epoch. If None, uses mean observation time.

rv: Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n']

Radial velocities.

rv_err: Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n']

Radial velocity uncertainties.

time: Real[Quantity[PhysicalType('time')], 'n']

Barycentric TCB times.

class harv.data.SourceData

Bases: AbstractDatasetContainer

Container for multiple named datasets for a single source.

Accepts arbitrary named datasets via keyword arguments. Names are user-defined and can be anything (e.g., gaia, keck_rv, hst_imaging).

Parameters:

datasets (AbstractAstrometryData | RVData)

__init__(**datasets)
Parameters:

datasets (AbstractAstrometryData | RVData)

Return type:

None

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

get_datasets_by_type(data_type)

Get all datasets/components of a specific data type.

Parameters:

data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by.

Return type:

dict[str, TypeVar(_DT, bound= AbstractAstrometryData | RVData)]
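As a rough sketch of the filtering described above: the helper below keeps the entries of a name-to-dataset mapping that match a given class. It assumes an isinstance() test, so subclasses would also match; whether harv uses isinstance() or an exact type comparison is not stated here. The toy classes and the name filter_by_type are hypothetical.

```python
def filter_by_type(datasets, data_type):
    """Return the {name: dataset} entries whose value is a data_type instance.

    Assumption: isinstance() semantics, so subclasses of data_type match too.
    """
    return {
        name: dataset
        for name, dataset in datasets.items()
        if isinstance(dataset, data_type)
    }


class ToyRV:
    """Stand-in for an RV dataset (hypothetical)."""


class ToyAstrometry:
    """Stand-in for an astrometry dataset (hypothetical)."""


all_data = {"keck_rv": ToyRV(), "gaia": ToyAstrometry(), "wiyn_rv": ToyRV()}
rv_only = filter_by_type(all_data, ToyRV)
```

The result preserves the insertion order of the original mapping, which matters when the filtered dict is later stacked.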

Examples

>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.get_datasets_by_type(RVData)
{'keck_rv': RVData(...)}
>>> source_data.get_datasets_by_type(GaiaAstrometryData)
{'gaia': GaiaAstrometryData(...)}
indicator_data_by_type(data_type, reference)

Return stacked data and indicator flags for one dataset type.

This is a convenience wrapper around get_datasets_by_type() and build_indicator_matrix(), intended for extensions that need to build a kernel matrix across multiple datasets of the same type (e.g. multiple RV instruments).

Parameters:
  • data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.

  • reference (str) – Name of the reference dataset to use for time coordinates and metadata. Must be a key of the dict returned by get_datasets_by_type(data_type).

Return type:

tuple[TypeVar(_DT, bound= AbstractAstrometryData | RVData), Array | None, tuple[str, ...] | None]

items()

(name, dataset) pairs.

Return type:

Iterator[tuple[str, AbstractAstrometryData | RVData]]

keys()

Dataset/component names.

Return type:

Iterator[str]

plot(*args, **kwargs)

Plot all datasets on a single axes.

Only valid when every contained dataset shares the same concrete type; plotting heterogeneous types (e.g. RV in km/s and astrometry in mas) on a single axes would overlay incompatible y-axes. Use get_datasets_by_type() to filter to a single type first when needed.

Parameters mirror AbstractDatasetContainer.plot().

Raises:

TypeError – If the contained datasets are not all of the same concrete type.

Return type:

Any
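The call-time precondition described above (all datasets share one concrete type, otherwise TypeError) could look roughly like the sketch below. The function name and toy classes are hypothetical, and treating an empty mapping as an error is an assumption of this sketch rather than documented behaviour.

```python
def require_single_type(datasets):
    """Return the concrete type shared by every dataset, or raise TypeError.

    Hypothetical helper; an empty mapping is also rejected (assumption).
    """
    kinds = {type(dataset) for dataset in datasets.values()}
    if len(kinds) != 1:
        found = sorted(kind.__name__ for kind in kinds)
        raise TypeError(f"expected exactly one concrete dataset type, found {found}")
    return next(iter(kinds))


class ToyRV:
    """Stand-in for an RV dataset (hypothetical)."""


class ToyAstrometry:
    """Stand-in for an astrometry dataset (hypothetical)."""


shared = require_single_type({"keck_rv": ToyRV(), "wiyn_rv": ToyRV()})

try:
    require_single_type({"keck_rv": ToyRV(), "gaia": ToyAstrometry()})
    mixed_rejected = False
except TypeError:
    mixed_rejected = True
```

Comparing type(dataset) rather than using isinstance() makes the check strict about concrete classes, which matches the wording "same concrete type".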

stacked_by_type(data_type)

Stack all datasets of the requested type.

Parameters:

data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.

Return type:

TypeVar(_DT, bound= AbstractAstrometryData | RVData)

Examples

>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     wiyn_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.stacked_by_type(RVData)
RVData(...)
property t_ref: Real[Quantity[PhysicalType('time')], ''] | None

Reference epoch shared by all contained datasets.

Guaranteed to be consistent across components because every concrete subclass calls _synchronize_t_refs() in its __init__.

values()

Dataset/component values.

Return type:

Iterator[AbstractAstrometryData | RVData]

class harv.data.SystemData

Bases: AbstractDatasetContainer

Container for a multi-component system.

Every named component must be an instance of the same concrete data class. Each component represents observations of a distinct physical body or photocenter in a gravitationally bound system.

Parameters:

datasets (AbstractAstrometryData | RVData)

__init__(**datasets)
Parameters:

datasets (AbstractAstrometryData | RVData)

Return type:

None

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

property dataset_type: type[AbstractData]

Concrete dataset class shared by all components.

get_datasets_by_type(data_type)

Get all datasets/components of a specific data type.

Parameters:

data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by.

Return type:

dict[str, TypeVar(_DT, bound= AbstractAstrometryData | RVData)]

Examples

>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.get_datasets_by_type(RVData)
{'keck_rv': RVData(...)}
>>> source_data.get_datasets_by_type(GaiaAstrometryData)
{'gaia': GaiaAstrometryData(...)}
indicator_data(reference)

Return stacked data and component-indicator flags.

Parameters:

reference (str)

Return type:

tuple[AbstractAstrometryData | RVData, Array | None, tuple[str, ...] | None]

indicator_data_by_type(data_type, reference)

Return stacked data and indicator flags for one dataset type.

This is a convenience wrapper around get_datasets_by_type() and build_indicator_matrix(), intended for extensions that need to build a kernel matrix across multiple datasets of the same type (e.g. multiple RV instruments).

Parameters:
  • data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.

  • reference (str) – Name of the reference dataset to use for time coordinates and metadata. Must be a key of the dict returned by get_datasets_by_type(data_type).

Return type:

tuple[TypeVar(_DT, bound= AbstractAstrometryData | RVData), Array | None, tuple[str, ...] | None]

items()

(name, dataset) pairs.

Return type:

Iterator[tuple[str, AbstractAstrometryData | RVData]]

keys()

Dataset/component names.

Return type:

Iterator[str]

plot(ax=None, *, add_legend=True, color_cycler=None, **kwargs)

Plot all contained datasets on the same axes.

Dispatches to each dataset’s .plot() method, drawing all components onto a single axes panel with a legend showing the names. Each dataset is assigned a distinct color from color_cycler (or the current axes.prop_cycle when not specified).

This base implementation does not check that the contained datasets share a concrete type; concrete subclasses are responsible for any preconditions (SystemData enforces homogeneity at construction; SourceData validates at call time).

Parameters:
  • ax (Any) – The matplotlib.axes.Axes instance to draw on. If None, a new figure is created.

  • add_legend (bool) – Whether to add a legend labelled by component name. Default: True.

  • color_cycler (Any) – A cycler.Cycler whose "color" key supplies per-component colors. When None (default), colors are taken from the current axes.prop_cycle rcParam.

  • **kwargs (Any) – Forwarded to each component’s .plot() method. A color keyword here overrides the cycler for all components.

Return type:

Any

Returns:

The matplotlib.axes.Axes instance.
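The color-assignment rule described in the parameters above can be sketched without matplotlib. The sketch assumes that the palette wraps around when there are more components than colors, and that a single color override applies to every component; assign_colors and its arguments are hypothetical names, and a plain list of color strings stands in for a cycler.Cycler.

```python
from itertools import cycle


def assign_colors(names, palette, color=None):
    """Pair each component name with a color.

    Hypothetical helper: `palette` is a list of color strings standing in
    for the "color" key of a cycler. An explicit `color` overrides the
    palette for all components.
    """
    if color is not None:
        return {name: color for name in names}
    if not palette:
        raise ValueError("palette must contain at least one color")
    # cycle() repeats the palette, so extra components reuse earlier colors.
    return dict(zip(names, cycle(palette)))


components = ["primary", "secondary", "tertiary"]
from_palette = assign_colors(components, ["C0", "C1"])
overridden = assign_colors(components, ["C0", "C1"], color="k")
```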

Examples

>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import RVData
>>> from harv.data.containers import SystemData
>>> sys_data = SystemData(
...     primary=RVData(
...         time=Q([0.0, 50.0], "day"),
...         rv=Q([10.0, -10.0], "km/s"),
...         rv_err=Q([0.5, 0.5], "km/s"),
...     ),
...     secondary=RVData(
...         time=Q([0.0, 50.0], "day"),
...         rv=Q([-10.0, 10.0], "km/s"),
...         rv_err=Q([0.5, 0.5], "km/s"),
...     ),
... )
>>> ax = sys_data.plot()
>>> plt.close("all")
stacked()

Stack all component datasets.

Return type:

AbstractAstrometryData | RVData

stacked_by_type(data_type)

Stack all datasets of the requested type.

Parameters:

data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.

Return type:

TypeVar(_DT, bound= AbstractAstrometryData | RVData)

Examples

>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     wiyn_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.stacked_by_type(RVData)
RVData(...)
property t_ref: Real[Quantity[PhysicalType('time')], ''] | None

Reference epoch shared by all contained datasets.

Guaranteed to be consistent across components because every concrete subclass calls _synchronize_t_refs() in its __init__.

values()

Dataset/component values.

Return type:

Iterator[AbstractAstrometryData | RVData]

harv.data.build_indicator_matrix(datasets, reference)

Build indicator matrix for multi-survey data of the same type.

Parameters:
  • datasets (dict[str, TypeVar(DT, bound= AbstractData)]) – Ordered mapping of instrument name -> dataset. Dict order must match the order used when stacking (see stack_datasets()).

  • reference (str) – Name of the reference instrument (its observations get no offset column).

Return type:

tuple[TypeVar(DT, bound= AbstractData), Array | None, tuple[str, ...] | None]

Returns:

  • stacked (DT) – Stacked dataset containing all observations.

  • indicator (jax.Array | None) – Shape (n_obs_total, n_non_ref). indicator[i, j] = 1 when observation i belongs to non-reference instrument j.

  • instrument_names (tuple[str, …] | None) – Names of the non-reference instruments, in column order.
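To make the indicator layout concrete, here is a minimal pure-Python sketch that builds the same kind of matrix from per-instrument observation counts. It uses nested lists instead of a jax.Array, the name indicator_columns is hypothetical, and returning (None, None) when only the reference instrument is present is an assumption suggested by the optional return types, not confirmed behaviour.

```python
def indicator_columns(sizes, reference):
    """Build a 0/1 indicator matrix as a list of rows.

    sizes: dict mapping instrument name -> number of observations, in the
    same order used for stacking. Hypothetical helper, not the harv API.
    """
    if reference not in sizes:
        raise KeyError(f"unknown reference instrument: {reference!r}")
    # One column per non-reference instrument, in dict order.
    names = tuple(name for name in sizes if name != reference)
    if not names:
        return None, None  # assumption: nothing to indicate
    rows = []
    for name, n_obs in sizes.items():
        # Reference rows are all zeros; other rows have a single 1.
        row = [1 if name == other else 0 for other in names]
        rows.extend(list(row) for _ in range(n_obs))
    return rows, names


indicator, names = indicator_columns(
    {"survey1": 2, "survey2": 2}, reference="survey1"
)
```

With two observations per survey, this yields four rows and one column: the two reference rows are zero and the two survey2 rows are one.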

Examples

>>> from unxt import Q
>>> from harv.data import RVData
>>> from harv.data.helpers import build_indicator_matrix
>>> rv1 = RVData(
...     time=Q([0.0, 50.0], "day"),
...     rv=Q([1.0, -2.0], "km/s"),
...     rv_err=Q([0.5, 0.5], "km/s"),
... )
>>> rv2 = RVData(
...     time=Q([10.0, 60.0], "day"),
...     rv=Q([0.5, -1.5], "km/s"),
...     rv_err=Q([0.3, 0.3], "km/s"),
... )
>>> stacked, indicator, names = build_indicator_matrix(
...     {"survey1": rv1, "survey2": rv2}, reference="survey1",
... )
>>> stacked.n_times
4
>>> names
('survey2',)
>>> indicator.shape
(4, 1)
harv.data.stack_datasets(datasets)

Concatenate multiple datasets in dict order into a single one.

Parameters:

datasets (dict[str, TypeVar(DT, bound= AbstractData)]) – Ordered mapping of instrument name -> dataset. Dict order determines the row order in the stacked output; it must match the order used when building the indicator matrix (see build_indicator_matrix()).

Return type:

TypeVar(DT, bound= AbstractData)
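The dict-order concatenation described above can be sketched with a toy dataclass. ToyRV and stack_toy are hypothetical stand-ins that hold plain tuples of floats; they only illustrate the row ordering, not the real field types or any t_ref handling.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass(frozen=True)
class ToyRV:
    """Stand-in for an RV dataset with plain tuples (hypothetical)."""

    time: tuple
    rv: tuple
    rv_err: tuple


def stack_toy(datasets):
    """Concatenate each field across datasets, following dict order."""
    if not datasets:
        raise ValueError("need at least one dataset to stack")
    parts = list(datasets.values())
    return ToyRV(
        time=tuple(chain.from_iterable(p.time for p in parts)),
        rv=tuple(chain.from_iterable(p.rv for p in parts)),
        rv_err=tuple(chain.from_iterable(p.rv_err for p in parts)),
    )


rv1 = ToyRV(time=(0.0, 50.0), rv=(1.0, -2.0), rv_err=(0.5, 0.5))
rv2 = ToyRV(time=(10.0, 60.0), rv=(0.5, -1.5), rv_err=(0.3, 0.3))
stacked = stack_toy({"instr1": rv1, "instr2": rv2})
```

Note that the stacked times are not sorted: rows follow the dict order, which is why the same order must be used when building the indicator matrix.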

Examples

>>> from unxt import Q
>>> from harv.data import RVData
>>> from harv.data.helpers import stack_datasets
>>> rv1 = RVData(
...     time=Q([0.0, 50.0], "day"),
...     rv=Q([1.0, -2.0], "km/s"),
...     rv_err=Q([0.5, 0.5], "km/s"),
... )
>>> rv2 = RVData(
...     time=Q([10.0, 60.0], "day"),
...     rv=Q([0.5, -1.5], "km/s"),
...     rv_err=Q([0.3, 0.3], "km/s"),
... )
>>> stacked = stack_datasets({"instr1": rv1, "instr2": rv2})
>>> stacked.n_times
4

Submodules

harv.data.containers module

Dataset containers for multi-component and multi-instrument data.

class harv.data.containers.AbstractDatasetContainer

Bases: Module

Base class providing a dict-like interface over named datasets.

Subclasses (SystemData, SourceData) share this common interface but carry different semantic meaning.

Parameters:

_datasets (dict[str, AbstractAstrometryData | RVData])
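A minimal sketch of the dict-like interface, using a hypothetical ToyContainer that stores arbitrary objects. It follows the keyword-argument construction style of the subclasses and returns iterators from keys(), values(), and items(), matching the documented return types; it omits everything else the real base class does.

```python
from collections.abc import Iterator


class ToyContainer:
    """Dict-like view over named datasets (hypothetical stand-in)."""

    def __init__(self, **datasets: object) -> None:
        # Keyword order is preserved, so iteration follows call order.
        self._datasets = dict(datasets)

    def keys(self) -> Iterator[str]:
        return iter(self._datasets.keys())

    def values(self) -> Iterator[object]:
        return iter(self._datasets.values())

    def items(self) -> Iterator[tuple[str, object]]:
        return iter(self._datasets.items())


container = ToyContainer(keck_rv="rv placeholder", gaia="astrometry placeholder")
names = list(container.keys())
pairs = list(container.items())
```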

property t_ref: Real[Quantity[PhysicalType('time')], ''] | None

Reference epoch shared by all contained datasets.

Guaranteed to be consistent across components because every concrete subclass calls _synchronize_t_refs() in its __init__.

keys()

Dataset/component names.

Return type:

Iterator[str]

values()

Dataset/component values.

Return type:

Iterator[AbstractAstrometryData | RVData]

items()

(name, dataset) pairs.

Return type:

Iterator[tuple[str, AbstractAstrometryData | RVData]]

get_datasets_by_type(data_type)

Get all datasets/components of a specific data type.

Parameters:

data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by.

Return type:

dict[str, TypeVar(_DT, bound= AbstractAstrometryData | RVData)]

Examples

>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.get_datasets_by_type(RVData)
{'keck_rv': RVData(...)}
>>> source_data.get_datasets_by_type(GaiaAstrometryData)
{'gaia': GaiaAstrometryData(...)}
stacked_by_type(data_type)

Stack all datasets of the requested type.

Parameters:

data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.

Return type:

TypeVar(_DT, bound= AbstractAstrometryData | RVData)

Examples

>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     wiyn_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.stacked_by_type(RVData)
RVData(...)
indicator_data_by_type(data_type, reference)

Return stacked data and indicator flags for one dataset type.

This is a convenience wrapper around get_datasets_by_type() and build_indicator_matrix(), intended for extensions that need to build a kernel matrix across multiple datasets of the same type (e.g. multiple RV instruments).

Parameters:
  • data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.

  • reference (str) – Name of the reference dataset to use for time coordinates and metadata. Must be a key of the dict returned by get_datasets_by_type(data_type).

Return type:

tuple[TypeVar(_DT, bound= AbstractAstrometryData | RVData), Array | None, tuple[str, ...] | None]

plot(ax=None, *, add_legend=True, color_cycler=None, **kwargs)

Plot all contained datasets on the same axes.

Dispatches to each dataset’s .plot() method, drawing all components onto a single axes panel with a legend showing the names. Each dataset is assigned a distinct color from color_cycler (or the current axes.prop_cycle when not specified).

This base implementation does not check that the contained datasets share a concrete type; concrete subclasses are responsible for any preconditions (SystemData enforces homogeneity at construction; SourceData validates at call time).

Parameters:
  • ax (Any) – The matplotlib.axes.Axes instance to draw on. If None, a new figure is created.

  • add_legend (bool) – Whether to add a legend labelled by component name. Default: True.

  • color_cycler (Any) – A cycler.Cycler whose "color" key supplies per-component colors. When None (default), colors are taken from the current axes.prop_cycle rcParam.

  • **kwargs (Any) – Forwarded to each component’s .plot() method. A color keyword here overrides the cycler for all components.

Return type:

Any

Returns:

The matplotlib.axes.Axes instance.

Examples

>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import RVData
>>> from harv.data.containers import SystemData
>>> sys_data = SystemData(
...     primary=RVData(
...         time=Q([0.0, 50.0], "day"),
...         rv=Q([10.0, -10.0], "km/s"),
...         rv_err=Q([0.5, 0.5], "km/s"),
...     ),
...     secondary=RVData(
...         time=Q([0.0, 50.0], "day"),
...         rv=Q([-10.0, 10.0], "km/s"),
...         rv_err=Q([0.5, 0.5], "km/s"),
...     ),
... )
>>> ax = sys_data.plot()
>>> plt.close("all")
__init__(_datasets)
Parameters:

_datasets (dict[str, AbstractAstrometryData | RVData])

Return type:

None

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

class harv.data.containers.SourceData

Bases: AbstractDatasetContainer

Container for multiple named datasets for a single source.

Accepts arbitrary named datasets via keyword arguments. Names are user-defined and can be anything (e.g., gaia, keck_rv, hst_imaging).

Parameters:

datasets (AbstractAstrometryData | RVData)

__init__(**datasets)
Parameters:

datasets (AbstractAstrometryData | RVData)

Return type:

None

plot(*args, **kwargs)

Plot all datasets on a single axes.

Only valid when every contained dataset shares the same concrete type; plotting heterogeneous types (e.g. RV in km/s and astrometry in mas) on a single axes would overlay incompatible y-axes. Use get_datasets_by_type() to filter to a single type first when needed.

Parameters mirror AbstractDatasetContainer.plot().

Raises:

TypeError – If the contained datasets are not all of the same concrete type.

Return type:

Any

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

get_datasets_by_type(data_type)

Get all datasets/components of a specific data type.

Parameters:

data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by.

Return type:

dict[str, TypeVar(_DT, bound= AbstractAstrometryData | RVData)]

Examples

>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.get_datasets_by_type(RVData)
{'keck_rv': RVData(...)}
>>> source_data.get_datasets_by_type(GaiaAstrometryData)
{'gaia': GaiaAstrometryData(...)}
indicator_data_by_type(data_type, reference)

Return stacked data and indicator flags for one dataset type.

This is a convenience wrapper around get_datasets_by_type() and build_indicator_matrix(), intended for extensions that need to build a kernel matrix across multiple datasets of the same type (e.g. multiple RV instruments).

Parameters:
  • data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.

  • reference (str) – Name of the reference dataset to use for time coordinates and metadata. Must be a key of the dict returned by get_datasets_by_type(data_type).

Return type:

tuple[TypeVar(_DT, bound= AbstractAstrometryData | RVData), Array | None, tuple[str, ...] | None]

items()

(name, dataset) pairs.

Return type:

Iterator[tuple[str, AbstractAstrometryData | RVData]]

keys()

Dataset/component names.

Return type:

Iterator[str]

stacked_by_type(data_type)

Stack all datasets of the requested type.

Parameters:

data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.

Return type:

TypeVar(_DT, bound= AbstractAstrometryData | RVData)

Examples

>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     wiyn_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.stacked_by_type(RVData)
RVData(...)
property t_ref: Real[Quantity[PhysicalType('time')], ''] | None

Reference epoch shared by all contained datasets.

Guaranteed to be consistent across components because every concrete subclass calls _synchronize_t_refs() in its __init__.

values()

Dataset/component values.

Return type:

Iterator[AbstractAstrometryData | RVData]

class harv.data.containers.SystemData

Bases: AbstractDatasetContainer

Container for a multi-component system.

Every named component must be an instance of the same concrete data class. Each component represents observations of a distinct physical body or photocenter in a gravitationally bound system.

Parameters:

datasets (AbstractAstrometryData | RVData)

__init__(**datasets)
Parameters:

datasets (AbstractAstrometryData | RVData)

Return type:

None

property dataset_type: type[AbstractData]

Concrete dataset class shared by all components.

stacked()

Stack all component datasets.

Return type:

AbstractAstrometryData | RVData

indicator_data(reference)

Return stacked data and component-indicator flags.

Parameters:

reference (str)

Return type:

tuple[AbstractAstrometryData | RVData, Array | None, tuple[str, ...] | None]

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

get_datasets_by_type(data_type)

Get all datasets/components of a specific data type.

Parameters:

data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by.

Return type:

dict[str, TypeVar(_DT, bound= AbstractAstrometryData | RVData)]

Examples

>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.get_datasets_by_type(RVData)
{'keck_rv': RVData(...)}
>>> source_data.get_datasets_by_type(GaiaAstrometryData)
{'gaia': GaiaAstrometryData(...)}
indicator_data_by_type(data_type, reference)

Return stacked data and indicator flags for one dataset type.

This is a convenience wrapper around get_datasets_by_type() and build_indicator_matrix(), intended for extensions that need to build a kernel matrix across multiple datasets of the same type (e.g. multiple RV instruments).

Parameters:
  • data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.

  • reference (str) – Name of the reference dataset to use for time coordinates and metadata. Must be a key of the dict returned by get_datasets_by_type(data_type).

Return type:

tuple[TypeVar(_DT, bound= AbstractAstrometryData | RVData), Array | None, tuple[str, ...] | None]

items()

(name, dataset) pairs.

Return type:

Iterator[tuple[str, AbstractAstrometryData | RVData]]

keys()

Dataset/component names.

Return type:

Iterator[str]

plot(ax=None, *, add_legend=True, color_cycler=None, **kwargs)

Plot all contained datasets on the same axes.

Dispatches to each dataset’s .plot() method, drawing all components onto a single axes panel with a legend showing the names. Each dataset is assigned a distinct color from color_cycler (or the current axes.prop_cycle when not specified).

This base implementation does not check that the contained datasets share a concrete type; concrete subclasses are responsible for any preconditions (SystemData enforces homogeneity at construction; SourceData validates at call time).

Parameters:
  • ax (Any) – The matplotlib.axes.Axes instance to draw on. If None, a new figure is created.

  • add_legend (bool) – Whether to add a legend labelled by component name. Default: True.

  • color_cycler (Any) – A cycler.Cycler whose "color" key supplies per-component colors. When None (default), colors are taken from the current axes.prop_cycle rcParam.

  • **kwargs (Any) – Forwarded to each component’s .plot() method. A color keyword here overrides the cycler for all components.

Return type:

Any

Returns:

The matplotlib.axes.Axes instance.

Examples

>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import RVData
>>> from harv.data.containers import SystemData
>>> sys_data = SystemData(
...     primary=RVData(
...         time=Q([0.0, 50.0], "day"),
...         rv=Q([10.0, -10.0], "km/s"),
...         rv_err=Q([0.5, 0.5], "km/s"),
...     ),
...     secondary=RVData(
...         time=Q([0.0, 50.0], "day"),
...         rv=Q([-10.0, 10.0], "km/s"),
...         rv_err=Q([0.5, 0.5], "km/s"),
...     ),
... )
>>> ax = sys_data.plot()
>>> plt.close("all")
stacked_by_type(data_type)

Stack all datasets of the requested type.

Parameters:

data_type (type[TypeVar(_DT, bound= AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.

Return type:

TypeVar(_DT, bound= AbstractAstrometryData | RVData)

Examples

>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     wiyn_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.stacked_by_type(RVData)
RVData(...)
property t_ref: Real[Quantity[PhysicalType('time')], ''] | None

Reference epoch shared by all contained datasets.

Guaranteed to be consistent across components because every concrete subclass calls _synchronize_t_refs() in its __init__.

values()

Dataset/component values.

Return type:

Iterator[AbstractAstrometryData | RVData]

harv.data.datasets module

Observation data classes for time series data.

class harv.data.datasets.AbstractAstrometryData

Bases: AbstractData

Abstract base class for astrometric data.

Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

__init__(time, *, t_ref=None)
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

Return type:

None

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

property n_times: int

Number of times / epochs / observations.

t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None

Reference epoch. If None, uses mean observation time.

time: Real[Quantity[PhysicalType('time')], 'n']

Barycentric TCB times.

class harv.data.datasets.AbstractData

Bases: Module

Abstract base class for observational data time series.

Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

time: Real[Quantity[PhysicalType('time')], 'n']

Barycentric TCB times.

t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None

Reference epoch. If None, uses mean observation time.
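
A minimal sketch of the documented fallback, assuming the mean is an unweighted average of the observation times. Plain floats (in days) stand in for unit-carrying quantities, and resolve_t_ref is a hypothetical helper name, not part of harv.

```python
from statistics import fmean


def resolve_t_ref(times, t_ref=None):
    """Return t_ref, falling back to the mean observation time when it is None."""
    return fmean(times) if t_ref is None else t_ref


times = [0.0, 100.0, 200.0]

default_ref = resolve_t_ref(times)  # no t_ref given: mean of the times
explicit_ref = resolve_t_ref(times, t_ref=50.0)  # explicit value is kept

# Times expressed relative to the resolved reference epoch.
relative = [t - default_ref for t in times]
print(default_ref, explicit_ref, relative)
```
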

property n_times: int

Number of times / epochs / observations.

__init__(time, *, t_ref=None)
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

Return type:

None

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

class harv.data.datasets.GaiaAstrometryData

Bases: AbstractAstrometryData

Gaia epoch astrometry (along-scan measurements).

Examples

>>> import jax.numpy as jnp
>>> from unxt import Q
>>> from harv import GaiaAstrometryData
>>> data = GaiaAstrometryData(
...     time=Q([0.0, 100.0, 200.0], "day"),
...     al_position=Q([0.1, -0.2, 0.05], "mas"),
...     al_position_err=Q([0.01, 0.01, 0.01], "mas"),
...     scan_angle=Q([0.5, 1.2, 2.8], "rad"),
...     parallax_factor=jnp.array([0.3, -0.1, 0.4]),
... )
>>> data.n_times
3

Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • al_position (Real[Quantity[PhysicalType('angle')], 'n'])

  • al_position_err (Real[Quantity[PhysicalType('angle')], 'n'])

  • scan_angle (Real[Quantity[PhysicalType('angle')], 'n'])

  • parallax_factor (Float[Array, 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

al_position: Real[Quantity[PhysicalType('angle')], 'n']

Along-scan position.

al_position_err: Real[Quantity[PhysicalType('angle')], 'n']

Along-scan uncertainty.

scan_angle: Real[Quantity[PhysicalType('angle')], 'n']

Per-CCD scan angle.

parallax_factor: Float[Array, 'n']

Along-scan (AL) parallax factors.

plot(ax=None, *, al_unit=None, add_labels=True, relative_to_t_ref=False, **kwargs)

Plot along-scan residuals vs time.

Parameters:
  • ax (Any) – The matplotlib.axes.Axes instance to draw on. If None, uses plt.gca().

  • al_unit (str | None) – Display unit for the along-scan position. Defaults to the data’s own unit.

  • add_labels (bool) – Add axis labels.

  • relative_to_t_ref (bool) – Plot time relative to t_ref.

  • **kwargs (Any) – Passed to ax.errorbar(). Defaults can be overridden.

Returns:

The matplotlib.axes.Axes instance.

Return type:

Axes

Examples

>>> import jax.numpy as jnp
>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import GaiaAstrometryData
>>> data = GaiaAstrometryData(
...     time=Q([0.0, 100.0, 200.0], "day"),
...     al_position=Q([0.1, -0.2, 0.05], "mas"),
...     al_position_err=Q([0.01, 0.01, 0.01], "mas"),
...     scan_angle=Q([0.5, 1.2, 2.8], "rad"),
...     parallax_factor=jnp.array([0.3, -0.1, 0.4]),
... )
>>> ax = data.plot()
>>> plt.close("all")
__init__(time, al_position, al_position_err, scan_angle, parallax_factor, *, t_ref=None)
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • al_position (Real[Quantity[PhysicalType('angle')], 'n'])

  • al_position_err (Real[Quantity[PhysicalType('angle')], 'n'])

  • scan_angle (Real[Quantity[PhysicalType('angle')], 'n'])

  • parallax_factor (Float[Array, 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

Return type:

None

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

property n_times: int

Number of times / epochs / observations.

t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None

Reference epoch. If None, uses mean observation time.

time: Real[Quantity[PhysicalType('time')], 'n']

Barycentric TCB times.

class harv.data.datasets.RVData

Bases: AbstractData

Radial velocity measurements.

Examples

>>> from unxt import Q
>>> from harv import RVData
>>> data = RVData(
...     time=Q([0.0, 50.0, 100.0], "day"),
...     rv=Q([1.0, -2.0, 0.5], "km/s"),
...     rv_err=Q([0.5, 0.5, 0.5], "km/s"),
... )
>>> data.n_times
3

Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • rv (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])

  • rv_err (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

__init__(time, rv, rv_err, *, t_ref=None)
Parameters:
  • time (Real[Quantity[PhysicalType('time')], 'n'])

  • rv (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])

  • rv_err (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])

  • t_ref (Real[Quantity[PhysicalType('time')], ''] | None)

Return type:

None

static __new__(cls, *args, **kwargs)
Return type:

TypeVar(_ModuleT, bound= Module)

property n_times: int

Number of times / epochs / observations.

t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None

Reference epoch. If None, uses mean observation time.

time: Real[Quantity[PhysicalType('time')], 'n']

Barycentric TCB times.

rv: Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n']

Radial velocities.

rv_err: Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n']

Radial velocity uncertainties.

plot(ax=None, *, rv_unit=None, add_labels=True, relative_to_t_ref=False, phase_fold=None, **kwargs)

Plot RV data as error bars.

Parameters:
  • ax (Any) – The matplotlib.axes.Axes instance to draw on. If None, uses plt.gca().

  • rv_unit (str | None) – Display unit for the RV axis. Defaults to the data’s own unit.

  • add_labels (bool) – Add axis labels.

  • relative_to_t_ref (bool) – Plot time relative to t_ref. Mutually exclusive with phase_fold.

  • phase_fold (Any | None) – If given, fold observations to orbital phase using this period: x = ((time - t_ref) / phase_fold) mod 1, so that x lies in [0, 1). Mutually exclusive with relative_to_t_ref.

  • **kwargs (Any) – Passed to ax.errorbar(). Defaults can be overridden.

Returns:

The matplotlib.axes.Axes instance.

Return type:

Axes

Examples

>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import RVData
>>> data = RVData(
...     time=Q([0.0, 50.0, 100.0], "day"),
...     rv=Q([1.0, -2.0, 0.5], "km/s"),
...     rv_err=Q([0.5, 0.5, 0.5], "km/s"),
... )
>>> ax = data.plot()  # uses errorbar() with sensible defaults
>>> ax = data.plot(color="C1", markersize=6)  # override style
>>> ax = data.plot(phase_fold=Q(50.0, "day"))  # phase-folded
>>> plt.close("all")
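
The folding rule used by phase_fold, x = ((time - t_ref) / phase_fold) mod 1, can be sketched on plain numbers. This is an illustration only: floats in days stand in for unit-carrying quantities, and fold_phase is a hypothetical helper, not a harv function.

```python
def fold_phase(times, period, t_ref):
    """Return the phase in [0, 1) of each time for the given period."""
    if period <= 0:
        raise ValueError("period must be positive")
    # Python's % returns a non-negative result for a positive divisor,
    # so times earlier than t_ref also land in [0, 1).
    return [((t - t_ref) / period) % 1.0 for t in times]


times = [0.0, 50.0, 100.0, 130.0]
phases = fold_phase(times, period=40.0, t_ref=0.0)
print(phases)

# An observation taken before the reference epoch wraps around.
early = fold_phase([-10.0], period=40.0, t_ref=0.0)
print(early)
```

Observations separated by a whole number of periods (here 50 and 130 days) fall on the same phase, which is what makes a folded plot useful for inspecting a candidate period.
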

harv.data.helpers module

Helper functions for stacking and combining datasets.

harv.data.helpers.build_indicator_matrix(datasets, reference)

Build an indicator matrix for multi-survey data of the same type.

Parameters:
  • datasets (dict[str, TypeVar(DT, bound= AbstractData)]) – Ordered mapping of instrument name -> dataset. Dict order must match the order used when stacking (see stack_datasets()).

  • reference (str) – Name of the reference instrument (its observations get no offset column).

Return type:

tuple[TypeVar(DT, bound= AbstractData), Array | None, tuple[str, ...] | None]

Returns:

  • stacked (DT) – Stacked dataset containing all observations.

  • indicator (jax.Array | None) – Shape (n_obs_total, n_non_ref). indicator[i, j] = 1 when observation i belongs to non-reference instrument j, and 0 otherwise.

  • instrument_names (tuple[str, …] | None) – Names of the non-reference instruments, in column order.

Examples

>>> from unxt import Q
>>> from harv.data import RVData
>>> from harv.data.helpers import build_indicator_matrix
>>> rv1 = RVData(
...     time=Q([0.0, 50.0], "day"),
...     rv=Q([1.0, -2.0], "km/s"),
...     rv_err=Q([0.5, 0.5], "km/s"),
... )
>>> rv2 = RVData(
...     time=Q([10.0, 60.0], "day"),
...     rv=Q([0.5, -1.5], "km/s"),
...     rv_err=Q([0.3, 0.3], "km/s"),
... )
>>> stacked, indicator, names = build_indicator_matrix(
...     {"survey1": rv1, "survey2": rv2}, reference="survey1",
... )
>>> stacked.n_times
4
>>> names
('survey2',)
>>> indicator.shape
(4, 1)
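
The layout of the indicator matrix can be sketched in plain Python. This is a sketch under stated assumptions, not harv's implementation: nested lists stand in for a jax.Array, observation counts stand in for the datasets, and build_indicator and the offsets in deltas are hypothetical.

```python
def build_indicator(n_obs_by_instrument, reference):
    """Return (indicator rows, non-reference names) for stacked observations.

    n_obs_by_instrument maps instrument name -> number of observations,
    in the same order that is used for stacking.
    """
    if reference not in n_obs_by_instrument:
        raise KeyError(f"unknown reference instrument: {reference!r}")
    # One column per non-reference instrument, in dict order.
    names = tuple(k for k in n_obs_by_instrument if k != reference)
    column = {name: j for j, name in enumerate(names)}
    rows = []
    for name, n_obs in n_obs_by_instrument.items():
        for _ in range(n_obs):
            row = [0.0] * len(names)
            if name != reference:
                row[column[name]] = 1.0
            rows.append(row)
    return rows, names


indicator, names = build_indicator(
    {"survey1": 2, "survey2": 2, "survey3": 1}, reference="survey1"
)
print(names)
print(indicator)

# Hypothetical per-instrument offsets, one per column of the matrix.
deltas = [0.3, -0.1]
offsets = [sum(v * d for v, d in zip(row, deltas)) for row in indicator]
print(offsets)
```

Multiplying the matrix by a vector of per-instrument offsets yields one offset per stacked observation, with zeros for the reference instrument. This is why the row order must match the stacking order.
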
harv.data.helpers.stack_datasets(datasets)

Concatenate multiple datasets, in dict order, into a single dataset.

Parameters:

datasets (dict[str, TypeVar(DT, bound= AbstractData)]) – Ordered mapping of instrument name -> dataset. Dict order determines the row order in the stacked output; it must match the order used when building the indicator matrix (see build_indicator_matrix()).

Return type:

TypeVar(DT, bound= AbstractData)

Examples

>>> from unxt import Q
>>> from harv.data import RVData
>>> from harv.data.helpers import stack_datasets
>>> rv1 = RVData(
...     time=Q([0.0, 50.0], "day"),
...     rv=Q([1.0, -2.0], "km/s"),
...     rv_err=Q([0.5, 0.5], "km/s"),
... )
>>> rv2 = RVData(
...     time=Q([10.0, 60.0], "day"),
...     rv=Q([0.5, -1.5], "km/s"),
...     rv_err=Q([0.3, 0.3], "km/s"),
... )
>>> stacked = stack_datasets({"instr1": rv1, "instr2": rv2})
>>> stacked.n_times
4