harv.data package
Data classes for representing time series data.
TODO: we could add support for metadata for the data type classes below.
- class harv.data.AbstractAstrometryData
Bases: AbstractData
Abstract base class for astrometric data.
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- __init__(time, *, t_ref=None)
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- Return type:
None
- static __new__(cls, *args, **kwargs)
- t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None
Reference epoch. If None, uses mean observation time.
- time: Real[Quantity[PhysicalType('time')], 'n']
Barycentric TCB times.
- class harv.data.AbstractData
Bases: Module
Abstract base class for observational data time series.
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- __init__(time, *, t_ref=None)
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- Return type:
None
- static __new__(cls, *args, **kwargs)
- t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None
Reference epoch. If None, uses mean observation time.
- time: Real[Quantity[PhysicalType('time')], 'n']
Barycentric TCB times.
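The fallback described for t_ref (use the mean observation time when no reference epoch is given) can be sketched in plain Python. This is an illustration only, not harv's implementation: resolve_t_ref is a hypothetical helper, and times are bare floats in days rather than unit-carrying Quantity arrays.

```python
from statistics import fmean


def resolve_t_ref(times_day, t_ref_day=None):
    """Return the reference epoch in days (hypothetical helper, not part of harv)."""
    if t_ref_day is not None:
        # An explicitly supplied reference epoch always wins.
        return float(t_ref_day)
    # Otherwise fall back to the mean observation time.
    return fmean(times_day)


times = [0.0, 100.0, 200.0]
default_epoch = resolve_t_ref(times)
explicit_epoch = resolve_t_ref(times, t_ref_day=50.0)
print(default_epoch, explicit_epoch)
```

With the three epochs above, the default reference epoch is the midpoint of the series, while an explicit value is passed through unchanged.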
- class harv.data.GaiaAstrometryData
Bases: AbstractAstrometryData
Gaia epoch astrometry (along-scan measurements).
Examples
>>> import jax.numpy as jnp
>>> from unxt import Q
>>> from harv import GaiaAstrometryData
>>> data = GaiaAstrometryData(
...     time=Q([0.0, 100.0, 200.0], "day"),
...     al_position=Q([0.1, -0.2, 0.05], "mas"),
...     al_position_err=Q([0.01, 0.01, 0.01], "mas"),
...     scan_angle=Q([0.5, 1.2, 2.8], "rad"),
...     parallax_factor=jnp.array([0.3, -0.1, 0.4]),
... )
>>> data.n_times
3
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
al_position (Real[Quantity[PhysicalType('angle')], 'n'])
al_position_err (Real[Quantity[PhysicalType('angle')], 'n'])
scan_angle (Real[Quantity[PhysicalType('angle')], 'n'])
parallax_factor (Float[Array, 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- __init__(time, al_position, al_position_err, scan_angle, parallax_factor, *, t_ref=None)
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
al_position (Real[Quantity[PhysicalType('angle')], 'n'])
al_position_err (Real[Quantity[PhysicalType('angle')], 'n'])
scan_angle (Real[Quantity[PhysicalType('angle')], 'n'])
parallax_factor (Float[Array, 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- Return type:
None
- static __new__(cls, *args, **kwargs)
- plot(ax=None, *, al_unit=None, add_labels=True, relative_to_t_ref=False, **kwargs)
Plot along-scan residuals vs time.
- Parameters:
ax (Any) – matplotlib.axes.Axes instance to draw on. If None, uses plt.gca().
al_unit (str | None) – Display unit for the along-scan position. Defaults to the data’s own unit.
add_labels (bool) – Add axis labels.
relative_to_t_ref (bool) – Plot time relative to t_ref.
**kwargs (Any) – Passed to ax.errorbar(). Defaults can be overridden.
- Returns:
The matplotlib.axes.Axes instance.
- Return type:
Examples
>>> import jax.numpy as jnp
>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import GaiaAstrometryData
>>> data = GaiaAstrometryData(
...     time=Q([0.0, 100.0, 200.0], "day"),
...     al_position=Q([0.1, -0.2, 0.05], "mas"),
...     al_position_err=Q([0.01, 0.01, 0.01], "mas"),
...     scan_angle=Q([0.5, 1.2, 2.8], "rad"),
...     parallax_factor=jnp.array([0.3, -0.1, 0.4]),
... )
>>> ax = data.plot()
>>> plt.close("all")
- t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None
Reference epoch. If None, uses mean observation time.
- al_position: Real[Quantity[PhysicalType('angle')], 'n']
Along-scan position.
- al_position_err: Real[Quantity[PhysicalType('angle')], 'n']
Along-scan uncertainty.
- scan_angle: Real[Quantity[PhysicalType('angle')], 'n']
Per-CCD scan angle.
- parallax_factor: Float[Array, 'n']
AL parallax factors.
- time: Real[Quantity[PhysicalType('time')], 'n']
Barycentric TCB times.
- class harv.data.RVData
Bases: AbstractData
Radial velocity measurements.
Examples
>>> from unxt import Q
>>> from harv import RVData
>>> data = RVData(
...     time=Q([0.0, 50.0, 100.0], "day"),
...     rv=Q([1.0, -2.0, 0.5], "km/s"),
...     rv_err=Q([0.5, 0.5, 0.5], "km/s"),
... )
>>> data.n_times
3
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
rv (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])
rv_err (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- __init__(time, rv, rv_err, *, t_ref=None)
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
rv (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])
rv_err (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- Return type:
None
- static __new__(cls, *args, **kwargs)
- plot(ax=None, *, rv_unit=None, add_labels=True, relative_to_t_ref=False, phase_fold=None, **kwargs)
Plot RV data as error bars.
- Parameters:
ax (Any) – The matplotlib.axes.Axes instance to draw on. If None, uses plt.gca().
rv_unit (str | None) – Display unit for the RV axis. Defaults to the data’s own unit.
add_labels (bool) – Add axis labels.
relative_to_t_ref (bool) – Plot time relative to t_ref. Mutually exclusive with phase_fold.
phase_fold (Any | None) – If given, fold observations to orbital phase using this period: x = ((time - t_ref) / phase_fold) mod 1. Mutually exclusive with relative_to_t_ref.
**kwargs (Any) – Passed to ax.errorbar(). Defaults can be overridden.
- Returns:
The matplotlib.axes.Axes instance.
- Return type:
Examples
>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import RVData
>>> data = RVData(
...     time=Q([0.0, 50.0, 100.0], "day"),
...     rv=Q([1.0, -2.0, 0.5], "km/s"),
...     rv_err=Q([0.5, 0.5, 0.5], "km/s"),
... )
>>> ax = data.plot()  # uses errorbar() with sensible defaults
>>> ax = data.plot(color="C1", markersize=6)  # override style
>>> ax = data.plot(phase_fold=Q(50.0, "day"))  # phase-folded
>>> plt.close("all")
- t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None
Reference epoch. If None, uses mean observation time.
- rv: Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n']
Radial velocities.
- rv_err: Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n']
Radial velocity uncertainties.
- time: Real[Quantity[PhysicalType('time')], 'n']
Barycentric TCB times.
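The time-axis options of RVData.plot() (plain time, time relative to t_ref, or phase folding) can be illustrated with a small stand-in. This is a sketch under stated assumptions: time_axis is a hypothetical function, values are bare floats in days, and raising ValueError for the mutually exclusive combination is an assumption about behaviour rather than something the documentation states.

```python
def time_axis(times_day, t_ref_day, *, relative_to_t_ref=False, phase_fold=None):
    """Compute x-axis values for a time-series plot (illustrative stand-in)."""
    if relative_to_t_ref and phase_fold is not None:
        # Assumption: the conflicting combination is rejected with ValueError.
        raise ValueError("relative_to_t_ref and phase_fold are mutually exclusive")
    if phase_fold is not None:
        # x = ((time - t_ref) / phase_fold) mod 1, giving a phase in [0, 1).
        return [((t - t_ref_day) / phase_fold) % 1.0 for t in times_day]
    if relative_to_t_ref:
        # Shift the time axis so that t_ref sits at zero.
        return [t - t_ref_day for t in times_day]
    return list(times_day)


times = [0.0, 50.0, 100.0, 125.0, -25.0]
phases = time_axis(times, t_ref_day=0.0, phase_fold=50.0)
offsets = time_axis(times, t_ref_day=50.0, relative_to_t_ref=True)
print(phases)
print(offsets)
```

Python's % operator returns a non-negative result for a positive modulus, so epochs before t_ref still map into [0, 1).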
- class harv.data.SourceData
Bases: AbstractDatasetContainer
Container for multiple named datasets for a single source.
Accepts arbitrary named datasets via keyword arguments. Names are user-defined and can be anything (e.g., gaia, keck_rv, hst_imaging).
- Parameters:
datasets (AbstractAstrometryData | RVData)
- __init__(**datasets)
- Parameters:
datasets (AbstractAstrometryData | RVData)
- Return type:
None
- static __new__(cls, *args, **kwargs)
- get_datasets_by_type(data_type)
Get all datasets/components of a specific data type.
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by.
- Return type:
dict[str, TypeVar(_DT, bound=AbstractAstrometryData | RVData)]
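Filtering named datasets by type amounts to a dictionary comprehension over the container's entries. The sketch below uses stand-in classes (FakeRV, FakeAstrometry) and a hypothetical filter_by_type function; whether harv matches with isinstance or by exact type is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRV:
    """Stand-in for an RV dataset (not harv.data.RVData)."""

    n_times: int


@dataclass(frozen=True)
class FakeAstrometry:
    """Stand-in for an astrometry dataset (not harv.data.GaiaAstrometryData)."""

    n_times: int


def filter_by_type(datasets, data_type):
    """Keep only entries that are instances of data_type, preserving dict order."""
    return {name: d for name, d in datasets.items() if isinstance(d, data_type)}


datasets = {"keck_rv": FakeRV(3), "gaia": FakeAstrometry(5), "wiyn_rv": FakeRV(2)}
rv_only = filter_by_type(datasets, FakeRV)
print(list(rv_only))
```

Because Python dicts preserve insertion order, the filtered mapping keeps the datasets in the order they were passed in, which matters for the stacking helpers that depend on dict order.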
Examples
>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.get_datasets_by_type(RVData)
{'keck_rv': RVData(...)}
>>> source_data.get_datasets_by_type(GaiaAstrometryData)
{'gaia': GaiaAstrometryData(...)}
- indicator_data_by_type(data_type, reference)
Return stacked data and indicator flags for one dataset type.
Convenience wrapper that calls get_datasets_by_type() and then build_indicator_matrix(). Intended for extensions that need to build a kernel matrix across multiple datasets of the same type (e.g. multiple RV instruments).
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.
reference (str) – Name of the reference dataset to use for time coordinates and metadata. Must be a key of the dict returned by get_datasets_by_type(data_type).
- Return type:
tuple[TypeVar(_DT, bound=AbstractAstrometryData | RVData), Array | None, tuple[str, ...] | None]
- plot(*args, **kwargs)
Plot all datasets on a single axes.
Only valid when every contained dataset shares the same concrete type; plotting heterogeneous types (e.g. RV in km/s and astrometry in mas) on a single axes would overlay incompatible y-axes. Use get_datasets_by_type() to filter to a single type first when needed.
Parameters mirror AbstractDatasetContainer.plot().
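The precondition for plotting on a single axes (all datasets share one concrete type) can be expressed as a short check. This is a sketch only: require_single_type and the stand-in classes are hypothetical, and the exception type harv raises is not stated in the documentation, so TypeError is an assumption.

```python
class FakeRV:
    """Stand-in for an RV dataset class."""


class FakeAstrometry:
    """Stand-in for an astrometry dataset class."""


def require_single_type(datasets):
    """Return the one concrete type shared by all datasets, or raise TypeError."""
    types = {type(d) for d in datasets.values()}
    if len(types) != 1:
        names = sorted(t.__name__ for t in types)
        # Assumption: a mixed (or empty) collection is rejected before plotting.
        raise TypeError(f"expected exactly one dataset type, found {names}")
    return next(iter(types))


homogeneous = {"keck_rv": FakeRV(), "wiyn_rv": FakeRV()}
mixed = {"keck_rv": FakeRV(), "gaia": FakeAstrometry()}

shared_type = require_single_type(homogeneous)
try:
    require_single_type(mixed)
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(shared_type.__name__, mixed_ok)
```

Running the same check in a constructor rather than in plot() would give the construct-time enforcement that the container documentation attributes to SystemData.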
- stacked_by_type(data_type)
Stack all datasets of the requested type.
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.
- Return type:
TypeVar(_DT, bound=AbstractAstrometryData | RVData)
Examples
>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     wiyn_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.stacked_by_type(RVData)
RVData(...)
- property t_ref: Real[Quantity[PhysicalType('time')], ''] | None
Reference epoch shared by all contained datasets.
Guaranteed to be consistent across components because every concrete subclass calls _synchronize_t_refs() in its __init__.
- values()
Dataset/component values.
- Return type:
- class harv.data.SystemData
Bases: AbstractDatasetContainer
Container for a multi-component system.
Every named component is an instance of the same concrete data class, and each represents observations of a distinct physical body or photocenter in a gravitationally bound system.
- Parameters:
datasets (AbstractAstrometryData | RVData)
- __init__(**datasets)
- Parameters:
datasets (AbstractAstrometryData | RVData)
- Return type:
None
- static __new__(cls, *args, **kwargs)
- property dataset_type: type[AbstractData]
Concrete dataset class shared by all components.
- get_datasets_by_type(data_type)
Get all datasets/components of a specific data type.
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by.
- Return type:
dict[str, TypeVar(_DT, bound=AbstractAstrometryData | RVData)]
Examples
>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.get_datasets_by_type(RVData)
{'keck_rv': RVData(...)}
>>> source_data.get_datasets_by_type(GaiaAstrometryData)
{'gaia': GaiaAstrometryData(...)}
- indicator_data(reference)
Return stacked data and component-indicator flags.
- indicator_data_by_type(data_type, reference)
Return stacked data and indicator flags for one dataset type.
Convenience wrapper that calls get_datasets_by_type() and then build_indicator_matrix(). Intended for extensions that need to build a kernel matrix across multiple datasets of the same type (e.g. multiple RV instruments).
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.
reference (str) – Name of the reference dataset to use for time coordinates and metadata. Must be a key of the dict returned by get_datasets_by_type(data_type).
- Return type:
tuple[TypeVar(_DT, bound=AbstractAstrometryData | RVData), Array | None, tuple[str, ...] | None]
- plot(ax=None, *, add_legend=True, color_cycler=None, **kwargs)
Plot all contained datasets on the same axes.
Dispatches to each dataset’s .plot() method, drawing all components onto a single axes panel with a legend showing the names. Each dataset is assigned a distinct color from color_cycler (or the current axes.prop_cycle when not specified).
This base implementation does not check that the contained datasets share a concrete type; concrete subclasses are responsible for any preconditions (SystemData enforces homogeneity at construction; SourceData validates at call time).
- Parameters:
ax (Any) – The matplotlib.axes.Axes instance to draw on. If None, a new figure is created.
add_legend (bool) – Whether to add a legend labelled by component name. Default: True.
color_cycler (Any) – A cycler.Cycler whose "color" key supplies per-component colors. When None (default), colors are taken from the current axes.prop_cycle rcParam.
**kwargs (Any) – Forwarded to each component’s .plot() method. A color keyword here overrides the cycler for all components.
- Return type:
- Returns:
The matplotlib.axes.Axes instance.
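The per-component color assignment described above can be mimicked without matplotlib. In this sketch, assign_colors is a hypothetical helper and a plain list of color names stands in for the "color" key of a cycler.Cycler; it is not how harv itself is implemented.

```python
from itertools import cycle


def assign_colors(names, palette, override=None):
    """Map each component name to a color (illustrative stand-in)."""
    if override is not None:
        # A single explicit color overrides the palette for every component.
        return {name: override for name in names}
    # Otherwise walk the palette in order, wrapping around if it is too short.
    return dict(zip(names, cycle(palette)))


components = ["primary", "secondary", "tertiary"]
colors = assign_colors(components, ["C0", "C1"])
forced = assign_colors(components, ["C0", "C1"], override="k")
print(colors)
print(forced)
```

Colors are only distinct while the palette has at least as many entries as there are components; beyond that, a cycling palette repeats.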
Examples
>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import RVData
>>> from harv.data.containers import SystemData
>>> sys_data = SystemData(
...     primary=RVData(
...         time=Q([0.0, 50.0], "day"),
...         rv=Q([10.0, -10.0], "km/s"),
...         rv_err=Q([0.5, 0.5], "km/s"),
...     ),
...     secondary=RVData(
...         time=Q([0.0, 50.0], "day"),
...         rv=Q([-10.0, 10.0], "km/s"),
...         rv_err=Q([0.5, 0.5], "km/s"),
...     ),
... )
>>> ax = sys_data.plot()
>>> plt.close("all")
- stacked()
Stack all component datasets.
- Return type:
- stacked_by_type(data_type)
Stack all datasets of the requested type.
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.
- Return type:
TypeVar(_DT, bound=AbstractAstrometryData | RVData)
Examples
>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     wiyn_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.stacked_by_type(RVData)
RVData(...)
- property t_ref: Real[Quantity[PhysicalType('time')], ''] | None
Reference epoch shared by all contained datasets.
Guaranteed to be consistent across components because every concrete subclass calls _synchronize_t_refs() in its __init__.
- values()
Dataset/component values.
- Return type:
- harv.data.build_indicator_matrix(datasets, reference)
Build indicator matrix for multi-survey data of the same type.
- Parameters:
datasets (dict[str, TypeVar(DT, bound=AbstractData)]) – Ordered mapping of instrument name -> dataset. Dict order must match the order used when stacking (see stack_datasets()).
reference (str) – Name of the reference instrument (its observations get no offset column).
- Return type:
tuple[TypeVar(DT, bound=AbstractData), Array | None, tuple[str, ...] | None]
- Returns:
stacked (DT) – Stacked dataset containing all observations.
indicator (jax.Array | None) – Shape (n_obs_total, n_non_ref). indicator[i, j] = 1 when observation i belongs to non-reference instrument j.
instrument_names (tuple[str, ...] | None) – Names of the non-reference instruments, in column order.
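The layout of the indicator matrix can be made concrete with nested lists. The sketch below is not harv's implementation: indicator_rows is a hypothetical function that takes only the number of observations per instrument, and it does not model the case in which the documented return values are None.

```python
def indicator_rows(n_obs, reference):
    """Build a 0/1 indicator matrix as nested lists (illustrative stand-in).

    n_obs maps instrument name -> number of observations, in stacking order.
    """
    if reference not in n_obs:
        raise KeyError(f"unknown reference instrument: {reference!r}")
    # Columns correspond to the non-reference instruments, in dict order.
    non_ref = tuple(name for name in n_obs if name != reference)
    rows = []
    for name, count in n_obs.items():
        # Reference rows are all zeros; every other row has a single 1.
        row = [1 if name == other else 0 for other in non_ref]
        rows.extend(list(row) for _ in range(count))
    return rows, non_ref


indicator, names = indicator_rows({"survey1": 2, "survey2": 2}, reference="survey1")
print(names)
print(indicator)
```

Each row has at most one nonzero entry, so multiplying the matrix by a vector of per-instrument offsets adds exactly one offset to each non-reference observation and none to the reference observations.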
Examples
>>> from unxt import Q
>>> from harv.data import RVData
>>> from harv.data.helpers import build_indicator_matrix
>>> rv1 = RVData(
...     time=Q([0.0, 50.0], "day"),
...     rv=Q([1.0, -2.0], "km/s"),
...     rv_err=Q([0.5, 0.5], "km/s"),
... )
>>> rv2 = RVData(
...     time=Q([10.0, 60.0], "day"),
...     rv=Q([0.5, -1.5], "km/s"),
...     rv_err=Q([0.3, 0.3], "km/s"),
... )
>>> stacked, indicator, names = build_indicator_matrix(
...     {"survey1": rv1, "survey2": rv2}, reference="survey1",
... )
>>> stacked.n_times
4
>>> names
('survey2',)
>>> indicator.shape
(4, 1)
- harv.data.stack_datasets(datasets)
Concatenate multiple datasets in dict order into a single one.
- Parameters:
datasets (dict[str, TypeVar(DT, bound=AbstractData)]) – Ordered mapping of instrument name -> dataset. Dict order determines the row order in the stacked output; it must match the order used when building the indicator matrix (see build_indicator_matrix()).
- Return type:
TypeVar(DT, bound=AbstractData)
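Stacking in dict order means rows are concatenated instrument by instrument and are not re-sorted by time. A minimal sketch, assuming each dataset is represented as a plain dict of equal-length lists (stack_columns is a hypothetical helper, not harv's stack_datasets):

```python
def stack_columns(datasets):
    """Concatenate per-instrument columns in dict order (illustrative stand-in)."""
    stacked = {}
    for columns in datasets.values():
        for key, values in columns.items():
            # Append this instrument's rows after those already stacked.
            stacked.setdefault(key, []).extend(values)
    return stacked


rv1 = {"time": [0.0, 50.0], "rv": [1.0, -2.0]}
rv2 = {"time": [10.0, 60.0], "rv": [0.5, -1.5]}
stacked = stack_columns({"instr1": rv1, "instr2": rv2})
print(stacked["time"])
print(stacked["rv"])
```

The stacked times come out as instr1's epochs followed by instr2's, which is why the indicator matrix must be built from the same dict ordering.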
Examples
>>> from unxt import Q
>>> from harv.data import RVData
>>> from harv.data.helpers import stack_datasets
>>> rv1 = RVData(
...     time=Q([0.0, 50.0], "day"),
...     rv=Q([1.0, -2.0], "km/s"),
...     rv_err=Q([0.5, 0.5], "km/s"),
... )
>>> rv2 = RVData(
...     time=Q([10.0, 60.0], "day"),
...     rv=Q([0.5, -1.5], "km/s"),
...     rv_err=Q([0.3, 0.3], "km/s"),
... )
>>> stacked = stack_datasets({"instr1": rv1, "instr2": rv2})
>>> stacked.n_times
4
Submodules
harv.data.containers module
Dataset containers for multi-component and multi-instrument data.
- class harv.data.containers.AbstractDatasetContainer
Bases: Module
Base class providing a dict-like interface over named datasets.
Subclasses (SystemData, SourceData) share this common interface but carry different semantic meaning.
- Parameters:
_datasets (dict[str, AbstractAstrometryData | RVData])
- property t_ref: Real[Quantity[PhysicalType('time')], ''] | None
Reference epoch shared by all contained datasets.
Guaranteed to be consistent across components because every concrete subclass calls _synchronize_t_refs() in its __init__.
- values()
Dataset/component values.
- Return type:
- get_datasets_by_type(data_type)
Get all datasets/components of a specific data type.
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by.
- Return type:
dict[str, TypeVar(_DT, bound=AbstractAstrometryData | RVData)]
Examples
>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.get_datasets_by_type(RVData)
{'keck_rv': RVData(...)}
>>> source_data.get_datasets_by_type(GaiaAstrometryData)
{'gaia': GaiaAstrometryData(...)}
- stacked_by_type(data_type)
Stack all datasets of the requested type.
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.
- Return type:
TypeVar(_DT, bound=AbstractAstrometryData | RVData)
Examples
>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     wiyn_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.stacked_by_type(RVData)
RVData(...)
- indicator_data_by_type(data_type, reference)
Return stacked data and indicator flags for one dataset type.
Convenience wrapper that calls get_datasets_by_type() and then build_indicator_matrix(). Intended for extensions that need to build a kernel matrix across multiple datasets of the same type (e.g. multiple RV instruments).
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.
reference (str) – Name of the reference dataset to use for time coordinates and metadata. Must be a key of the dict returned by get_datasets_by_type(data_type).
- Return type:
tuple[TypeVar(_DT, bound=AbstractAstrometryData | RVData), Array | None, tuple[str, ...] | None]
- plot(ax=None, *, add_legend=True, color_cycler=None, **kwargs)
Plot all contained datasets on the same axes.
Dispatches to each dataset’s .plot() method, drawing all components onto a single axes panel with a legend showing the names. Each dataset is assigned a distinct color from color_cycler (or the current axes.prop_cycle when not specified).
This base implementation does not check that the contained datasets share a concrete type; concrete subclasses are responsible for any preconditions (SystemData enforces homogeneity at construction; SourceData validates at call time).
- Parameters:
ax (Any) – The matplotlib.axes.Axes instance to draw on. If None, a new figure is created.
add_legend (bool) – Whether to add a legend labelled by component name. Default: True.
color_cycler (Any) – A cycler.Cycler whose "color" key supplies per-component colors. When None (default), colors are taken from the current axes.prop_cycle rcParam.
**kwargs (Any) – Forwarded to each component’s .plot() method. A color keyword here overrides the cycler for all components.
- Return type:
- Returns:
The matplotlib.axes.Axes instance.
Examples
>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import RVData
>>> from harv.data.containers import SystemData
>>> sys_data = SystemData(
...     primary=RVData(
...         time=Q([0.0, 50.0], "day"),
...         rv=Q([10.0, -10.0], "km/s"),
...         rv_err=Q([0.5, 0.5], "km/s"),
...     ),
...     secondary=RVData(
...         time=Q([0.0, 50.0], "day"),
...         rv=Q([-10.0, 10.0], "km/s"),
...         rv_err=Q([0.5, 0.5], "km/s"),
...     ),
... )
>>> ax = sys_data.plot()
>>> plt.close("all")
- __init__(_datasets)
- Parameters:
_datasets (dict[str, AbstractAstrometryData | RVData])
- Return type:
None
- class harv.data.containers.SourceData
Bases: AbstractDatasetContainer
Container for multiple named datasets for a single source.
Accepts arbitrary named datasets via keyword arguments. Names are user-defined and can be anything (e.g., gaia, keck_rv, hst_imaging).
- Parameters:
datasets (AbstractAstrometryData | RVData)
- __init__(**datasets)
- Parameters:
datasets (AbstractAstrometryData | RVData)
- Return type:
None
- plot(*args, **kwargs)
Plot all datasets on a single axes.
Only valid when every contained dataset shares the same concrete type; plotting heterogeneous types (e.g. RV in km/s and astrometry in mas) on a single axes would overlay incompatible y-axes. Use get_datasets_by_type() to filter to a single type first when needed.
Parameters mirror AbstractDatasetContainer.plot().
- static __new__(cls, *args, **kwargs)
- get_datasets_by_type(data_type)
Get all datasets/components of a specific data type.
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by.
- Return type:
dict[str, TypeVar(_DT, bound=AbstractAstrometryData | RVData)]
Examples
>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.get_datasets_by_type(RVData)
{'keck_rv': RVData(...)}
>>> source_data.get_datasets_by_type(GaiaAstrometryData)
{'gaia': GaiaAstrometryData(...)}
- indicator_data_by_type(data_type, reference)
Return stacked data and indicator flags for one dataset type.
Convenience wrapper that calls get_datasets_by_type() and then build_indicator_matrix(). Intended for extensions that need to build a kernel matrix across multiple datasets of the same type (e.g. multiple RV instruments).
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.
reference (str) – Name of the reference dataset to use for time coordinates and metadata. Must be a key of the dict returned by get_datasets_by_type(data_type).
- Return type:
tuple[TypeVar(_DT, bound=AbstractAstrometryData | RVData), Array | None, tuple[str, ...] | None]
- stacked_by_type(data_type)
Stack all datasets of the requested type.
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.
- Return type:
TypeVar(_DT, bound=AbstractAstrometryData | RVData)
Examples
>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     wiyn_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.stacked_by_type(RVData)
RVData(...)
- property t_ref: Real[Quantity[PhysicalType('time')], ''] | None
Reference epoch shared by all contained datasets.
Guaranteed to be consistent across components because every concrete subclass calls _synchronize_t_refs() in its __init__.
- values()
Dataset/component values.
- Return type:
- class harv.data.containers.SystemData
Bases: AbstractDatasetContainer
Container for a multi-component system.
Every named component is an instance of the same concrete data class, and each represents observations of a distinct physical body or photocenter in a gravitationally bound system.
- Parameters:
datasets (AbstractAstrometryData | RVData)
- __init__(**datasets)
- Parameters:
datasets (AbstractAstrometryData | RVData)
- Return type:
None
- property dataset_type: type[AbstractData]
Concrete dataset class shared by all components.
- stacked()
Stack all component datasets.
- Return type:
- indicator_data(reference)
Return stacked data and component-indicator flags.
- static __new__(cls, *args, **kwargs)
- get_datasets_by_type(data_type)
Get all datasets/components of a specific data type.
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by.
- Return type:
dict[str, TypeVar(_DT, bound=AbstractAstrometryData | RVData)]
Examples
>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.get_datasets_by_type(RVData)
{'keck_rv': RVData(...)}
>>> source_data.get_datasets_by_type(GaiaAstrometryData)
{'gaia': GaiaAstrometryData(...)}
- indicator_data_by_type(data_type, reference)
Return stacked data and indicator flags for one dataset type.
Convenience wrapper that calls get_datasets_by_type() and then build_indicator_matrix(). Intended for extensions that need to build a kernel matrix across multiple datasets of the same type (e.g. multiple RV instruments).
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.
reference (str) – Name of the reference dataset to use for time coordinates and metadata. Must be a key of the dict returned by get_datasets_by_type(data_type).
- Return type:
tuple[TypeVar(_DT, bound=AbstractAstrometryData | RVData), Array | None, tuple[str, ...] | None]
- plot(ax=None, *, add_legend=True, color_cycler=None, **kwargs)
Plot all contained datasets on the same axes.
Dispatches to each dataset’s .plot() method, drawing all components onto a single axes panel with a legend showing the names. Each dataset is assigned a distinct color from color_cycler (or the current axes.prop_cycle when not specified).
This base implementation does not check that the contained datasets share a concrete type; concrete subclasses are responsible for any preconditions (SystemData enforces homogeneity at construction; SourceData validates at call time).
- Parameters:
ax (Any) – The matplotlib.axes.Axes instance to draw on. If None, a new figure is created.
add_legend (bool) – Whether to add a legend labelled by component name. Default: True.
color_cycler (Any) – A cycler.Cycler whose "color" key supplies per-component colors. When None (default), colors are taken from the current axes.prop_cycle rcParam.
**kwargs (Any) – Forwarded to each component’s .plot() method. A color keyword here overrides the cycler for all components.
- Return type:
- Returns:
The matplotlib.axes.Axes instance.
Examples
>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import RVData
>>> from harv.data.containers import SystemData
>>> sys_data = SystemData(
...     primary=RVData(
...         time=Q([0.0, 50.0], "day"),
...         rv=Q([10.0, -10.0], "km/s"),
...         rv_err=Q([0.5, 0.5], "km/s"),
...     ),
...     secondary=RVData(
...         time=Q([0.0, 50.0], "day"),
...         rv=Q([-10.0, 10.0], "km/s"),
...         rv_err=Q([0.5, 0.5], "km/s"),
...     ),
... )
>>> ax = sys_data.plot()
>>> plt.close("all")
- stacked_by_type(data_type)
Stack all datasets of the requested type.
- Parameters:
data_type (type[TypeVar(_DT, bound=AbstractAstrometryData | RVData)]) – Concrete data class (e.g. RVData, GaiaAstrometryData) to filter by before stacking.
- Return type:
TypeVar(_DT, bound=AbstractAstrometryData | RVData)
Examples
>>> from harv.data.datasets import RVData, GaiaAstrometryData
>>> from harv.data.containers import SourceData
>>> source_data = SourceData(
...     keck_rv=RVData(...),
...     wiyn_rv=RVData(...),
...     gaia=GaiaAstrometryData(...),
... )
>>> source_data.stacked_by_type(RVData)
RVData(...)
- property t_ref: Real[Quantity[PhysicalType('time')], ''] | None
Reference epoch shared by all contained datasets.
Guaranteed to be consistent across components because every concrete subclass calls _synchronize_t_refs() in its __init__.
- values()
Dataset/component values.
- Return type:
harv.data.datasets module
Observation data classes for time series data.
- class harv.data.datasets.AbstractAstrometryData
Bases: AbstractData
Abstract base class for astrometric data.
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- __init__(time, *, t_ref=None)
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- Return type:
None
- static __new__(cls, *args, **kwargs)
- t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None
Reference epoch. If None, uses mean observation time.
- time: Real[Quantity[PhysicalType('time')], 'n']
Barycentric TCB times.
- class harv.data.datasets.AbstractData
Bases: Module
Abstract base class for observational data time series.
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- time: Real[Quantity[PhysicalType('time')], 'n']
Barycentric TCB times.
- t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None
Reference epoch. If None, uses mean observation time.
- class harv.data.datasets.GaiaAstrometryData
Bases: AbstractAstrometryData
Gaia epoch astrometry (along-scan measurements).
Examples
>>> import jax.numpy as jnp
>>> from unxt import Q
>>> from harv import GaiaAstrometryData
>>> data = GaiaAstrometryData(
...     time=Q([0.0, 100.0, 200.0], "day"),
...     al_position=Q([0.1, -0.2, 0.05], "mas"),
...     al_position_err=Q([0.01, 0.01, 0.01], "mas"),
...     scan_angle=Q([0.5, 1.2, 2.8], "rad"),
...     parallax_factor=jnp.array([0.3, -0.1, 0.4]),
... )
>>> data.n_times
3
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
al_position (Real[Quantity[PhysicalType('angle')], 'n'])
al_position_err (Real[Quantity[PhysicalType('angle')], 'n'])
scan_angle (Real[Quantity[PhysicalType('angle')], 'n'])
parallax_factor (Float[Array, 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- al_position: Real[Quantity[PhysicalType('angle')], 'n']
Along-scan position.
- al_position_err: Real[Quantity[PhysicalType('angle')], 'n']
Along-scan uncertainty.
- scan_angle: Real[Quantity[PhysicalType('angle')], 'n']
Per-CCD scan angle.
- parallax_factor: Float[Array, 'n']
AL parallax factors.
- plot(ax=None, *, al_unit=None, add_labels=True, relative_to_t_ref=False, **kwargs)
Plot along-scan residuals vs time.
- Parameters:
ax (Any) – matplotlib.axes.Axes instance to draw on. If None, uses plt.gca().
al_unit (str | None) – Display unit for the along-scan position. Defaults to the data’s own unit.
add_labels (bool) – Add axis labels.
relative_to_t_ref (bool) – Plot time relative to t_ref.
**kwargs (Any) – Passed to ax.errorbar(). Defaults can be overridden.
- Returns:
The matplotlib.axes.Axes instance.
- Return type:
Examples
>>> import jax.numpy as jnp
>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import GaiaAstrometryData
>>> data = GaiaAstrometryData(
...     time=Q([0.0, 100.0, 200.0], "day"),
...     al_position=Q([0.1, -0.2, 0.05], "mas"),
...     al_position_err=Q([0.01, 0.01, 0.01], "mas"),
...     scan_angle=Q([0.5, 1.2, 2.8], "rad"),
...     parallax_factor=jnp.array([0.3, -0.1, 0.4]),
... )
>>> ax = data.plot()
>>> plt.close("all")
- __init__(time, al_position, al_position_err, scan_angle, parallax_factor, *, t_ref=None)¶
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
al_position (Real[Quantity[PhysicalType('angle')], 'n'])
al_position_err (Real[Quantity[PhysicalType('angle')], 'n'])
scan_angle (Real[Quantity[PhysicalType('angle')], 'n'])
parallax_factor (Float[Array, 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- Return type:
None
- static __new__(cls, *args, **kwargs)¶
- t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None¶
Reference epoch. If None, uses mean observation time.
- time: Real[Quantity[PhysicalType('time')], 'n']¶
Barycentric TCB times.
- class harv.data.datasets.RVData¶
Bases:
AbstractData
Radial velocity measurements.
Examples
>>> from unxt import Q
>>> from harv import RVData
>>> data = RVData(
...     time=Q([0.0, 50.0, 100.0], "day"),
...     rv=Q([1.0, -2.0, 0.5], "km/s"),
...     rv_err=Q([0.5, 0.5, 0.5], "km/s"),
... )
>>> data.n_times
3
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
rv (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])
rv_err (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- __init__(time, rv, rv_err, *, t_ref=None)¶
- Parameters:
time (Real[Quantity[PhysicalType('time')], 'n'])
rv (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])
rv_err (Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n'])
t_ref (Real[Quantity[PhysicalType('time')], ''] | None)
- Return type:
None
- static __new__(cls, *args, **kwargs)¶
- t_ref: Real[Quantity[PhysicalType('time')], ''] | None = None¶
Reference epoch. If None, uses mean observation time.
- time: Real[Quantity[PhysicalType('time')], 'n']¶
Barycentric TCB times.
- rv: Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n']¶
Radial velocities.
- rv_err: Real[Quantity[PhysicalType({'speed', 'velocity'})], 'n']¶
Radial velocity uncertainties.
- plot(ax=None, *, rv_unit=None, add_labels=True, relative_to_t_ref=False, phase_fold=None, **kwargs)¶
Plot RV data as error bars.
- Parameters:
ax (Any) – The matplotlib.axes.Axes instance to draw on. If None, uses plt.gca().
rv_unit (str | None) – Display unit for the RV axis. Defaults to the data’s own unit.
add_labels (bool) – Add axis labels.
relative_to_t_ref (bool) – Plot time relative to t_ref. Mutually exclusive with phase_fold.
phase_fold (Any | None) – If given, fold observations to orbital phase using this period: x = ((time - t_ref) / phase_fold) mod 1. Mutually exclusive with relative_to_t_ref.
**kwargs (Any) – Passed to ax.errorbar(). Defaults can be overridden.
- Returns:
The matplotlib.axes.Axes instance.
- Return type:
Examples
>>> import matplotlib.pyplot as plt
>>> from unxt import Q
>>> from harv import RVData
>>> data = RVData(
...     time=Q([0.0, 50.0, 100.0], "day"),
...     rv=Q([1.0, -2.0, 0.5], "km/s"),
...     rv_err=Q([0.5, 0.5, 0.5], "km/s"),
... )
>>> ax = data.plot()  # uses errorbar() with sensible defaults
>>> ax = data.plot(color="C1", markersize=6)  # override style
>>> ax = data.plot(phase_fold=Q(50.0, "day"))  # phase-folded
>>> plt.close("all")
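The phase-folding rule documented for phase_fold can be reproduced outside of harv. The following is a minimal NumPy sketch on plain floats (days), not harv's implementation: the function name fold_to_phase is hypothetical, and the fallback to the mean observation time when t_ref is None is an assumption taken from the t_ref attribute description.

```python
import numpy as np


def fold_to_phase(time, period, t_ref=None):
    """Fold observation times to orbital phase in [0, 1).

    Hypothetical helper: applies x = ((time - t_ref) / period) mod 1
    to plain floats that share one time unit.
    """
    time = np.asarray(time, dtype=float)
    if t_ref is None:
        # Assumption: mirror the documented default of t_ref,
        # i.e. the mean observation time.
        t_ref = time.mean()
    # NumPy's % takes the sign of the divisor, so times before
    # t_ref still map into [0, 1).
    return ((time - t_ref) / period) % 1.0


phase_explicit = fold_to_phase([0.0, 50.0, 100.0], period=40.0, t_ref=0.0)
phase_default = fold_to_phase([0.0, 50.0, 100.0], period=40.0)
```

With an explicit t_ref of 0 the phases are 0, 0.25 and 0.5. With the mean-time fallback (t_ref = 50) the first observation lies before the reference epoch and wraps around to phase 0.75.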
harv.data.helpers module¶
Helper functions for stacking and combining datasets.
- harv.data.helpers.build_indicator_matrix(datasets, reference)¶
Build indicator matrix for multi-survey data of the same type.
- Parameters:
datasets (dict[str, TypeVar(DT, bound=AbstractData)]) – Ordered mapping of instrument name -> dataset. Dict order must match the order used when stacking (see stack_datasets()).
reference (str) – Name of the reference instrument (its observations get no offset column).
- Return type:
tuple[TypeVar(DT, bound=AbstractData), Array | None, tuple[str, ...] | None]
- Returns:
stacked (DT) – Stacked dataset containing all observations.
indicator (jax.Array | None) – Shape (n_obs_total, n_non_ref). indicator[i, j] = 1 when observation i belongs to non-reference instrument j.
instrument_names (tuple[str, …] | None) – Names of the non-reference instruments, in column order.
Examples
>>> from unxt import Q
>>> from harv.data import RVData
>>> from harv.data.helpers import build_indicator_matrix
>>> rv1 = RVData(
...     time=Q([0.0, 50.0], "day"),
...     rv=Q([1.0, -2.0], "km/s"),
...     rv_err=Q([0.5, 0.5], "km/s"),
... )
>>> rv2 = RVData(
...     time=Q([10.0, 60.0], "day"),
...     rv=Q([0.5, -1.5], "km/s"),
...     rv_err=Q([0.3, 0.3], "km/s"),
... )
>>> stacked, indicator, names = build_indicator_matrix(
...     {"survey1": rv1, "survey2": rv2}, reference="survey1",
... )
>>> stacked.n_times
4
>>> names
('survey2',)
>>> indicator.shape
(4, 1)
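To make the layout of the indicator matrix concrete, here is a minimal NumPy sketch that builds one from per-instrument observation counts. It is an illustration under stated assumptions, not harv's code: the function indicator_from_counts, the third instrument "survey3" and the offset values are all hypothetical, and the None return values in the documented signature are not modeled.

```python
import numpy as np


def indicator_from_counts(counts, reference):
    """Build an (n_obs_total, n_non_ref) indicator matrix.

    Hypothetical helper. `counts` is an ordered mapping of
    instrument name -> number of observations, in stacking order.
    """
    # Columns follow dict order, skipping the reference instrument.
    names = tuple(name for name in counts if name != reference)
    indicator = np.zeros((sum(counts.values()), len(names)))
    row = 0
    for name, n_obs in counts.items():
        if name != reference:
            # Rows of this instrument get a 1 in its own column.
            indicator[row:row + n_obs, names.index(name)] = 1.0
        row += n_obs
    return indicator, names


indicator, names = indicator_from_counts(
    {"survey1": 2, "survey2": 2, "survey3": 1}, reference="survey1"
)

# Hypothetical per-instrument offsets, one per non-reference column.
offsets = np.array([0.3, -0.1])
per_obs_offset = indicator @ offsets
```

The matrix product maps one offset per non-reference instrument onto every stacked observation, while rows of the reference instrument stay at zero. This only works if the rows of the stacked data follow the same dict order as the indicator rows.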
- harv.data.helpers.stack_datasets(datasets)¶
Concatenate multiple datasets in dict order into a single one.
- Parameters:
datasets (dict[str, TypeVar(DT, bound=AbstractData)]) – Ordered mapping of instrument name -> dataset. Dict order determines the row order in the stacked output; it must match the order used when building the indicator matrix (see build_indicator_matrix()).
- Return type:
TypeVar(DT, bound=AbstractData)
Examples
>>> from unxt import Q
>>> from harv.data import RVData
>>> from harv.data.helpers import stack_datasets
>>> rv1 = RVData(
...     time=Q([0.0, 50.0], "day"),
...     rv=Q([1.0, -2.0], "km/s"),
...     rv_err=Q([0.5, 0.5], "km/s"),
... )
>>> rv2 = RVData(
...     time=Q([10.0, 60.0], "day"),
...     rv=Q([0.5, -1.5], "km/s"),
...     rv_err=Q([0.3, 0.3], "km/s"),
... )
>>> stacked = stack_datasets({"instr1": rv1, "instr2": rv2})
>>> stacked.n_times
4
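The statement that dict order determines row order can be illustrated with bare arrays. This is a rough sketch assuming stacking amounts to field-wise concatenation in dict order. The helper stack_in_dict_order is hypothetical and operates on NumPy arrays rather than harv dataset objects.

```python
import numpy as np


def stack_in_dict_order(arrays):
    """Concatenate per-instrument arrays in dict order.

    Hypothetical helper. Returns the stacked values and, for each
    row, the name of the instrument it came from.
    """
    names = list(arrays)
    stacked = np.concatenate([arrays[name] for name in names])
    # One label per row, repeated by each instrument's length.
    labels = np.repeat(names, [len(arrays[name]) for name in names])
    return stacked, labels


times = {
    "instr1": np.array([0.0, 50.0]),
    "instr2": np.array([10.0, 60.0]),
}
stacked_time, row_labels = stack_in_dict_order(times)
```

Under this assumption the stacked times are grouped by instrument (0, 50, 10, 60) rather than sorted chronologically, which is why the same ordering has to be used when building the indicator matrix.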