Mixins#
Mixin methods and classes for lightkurve data objects
- class lkdata.mixins.AggMixin[source]#
Bases: object
Mixin class for data aggregation methods like downsampling.
- bin(bins, level, agg_func=None, uncertainty_agg_func=None, counts=False, index_agg_func='mean', **agg_kwargs)[source]#
Perform user-defined binning.
- Parameters:
- bins : ArrayLike
An array of the left-edge values for the bins.
- level : int or str
Level of the index on which to apply the bins.
- agg_func : str or function, optional
The aggregation method by which to combine data. If None, the class' ds_agg_func is used, which is summation for data and bitwise classes and np.logical_or for boolean classes.
- uncertainty_agg_func : str or function, optional
For data classes, defines how uncertainty should be aggregated. If None is given for a class with uncertainty, the root mean square is used. If the class has no associated uncertainty, this is ignored.
- counts : bool, default False
Whether to return the counts for each bin, including NaNs.
- index_agg_func : str
How to aggregate the indices; by default "mean", which returns a mean aggregation. "detailed" will use a mean aggregation and add an index level to track the indices used in the aggregation. Any other pandas-supported aggregation identifiable by a string is also supported.
- Returns:
- self.__class__
Returns an aggregated object of the same type given.
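For illustration, a hedged usage sketch (assuming lc is a pandas-backed Series from this package whose time level is named 'time_index'; the level name and bin-edge values here are assumptions, not fixed by the API):
>>> import numpy as np
>>> edges = np.arange(0.0, 27.0, 0.5)   # left bin edges, in the units of the chosen level
>>> binned = lc.bin(bins=edges, level='time_index', agg_func='mean', counts=True)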
- downsample(nframes=5, level=-1, index_agg_func='mean')[source]#
Downsample the data by averaging over nframes consecutive rows.
- Parameters:
- nframes : int, default 5
Number of frames to average over. Default is 5.
- level : Union[int, str], default -1
Index level to use for downsampling. Default is -1 (last level).
- index_agg_func : str
How to aggregate the indices; by default "mean", which returns a mean aggregation. "detailed" will use a mean aggregation and add an index level to track the indices used in the aggregation. Any other pandas-supported aggregation identifiable by a string is also supported.
- Returns:
- Same type as self
A new object with downsampled data.
Notes
This method works by creating bins of nframes consecutive rows, then averaging the data within each bin. Only bins with exactly nframes rows are included in the result.
If the object has an uncertainty attribute, it will be propagated by summing the squares of the uncertainties within each bin and then taking the square root.
The resulting object will have a new index that represents the mean of the original indices within each bin. If the original index included a ‘time_index’ or ‘indices’ level, this information is preserved in the new index.
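A hedged usage sketch (assuming cube is a DataCube from this package; the value of nframes is arbitrary):
>>> small = cube.downsample(nframes=10, index_agg_func='detailed')
>>> # each remaining row is the mean of 10 consecutive frames; an 'indices'
>>> # level records which original rows went into each bin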
- static get_bins(index, nframes, right=False)[source]#
Calculate bin edges for downsampling.
- Parameters:
- index : array-like
The index to be binned.
- nframes : int
Number of frames to average over.
- right : bool, optional
Whether the intervals should be closed on the right (default: False).
- Returns:
- bins : array-like
Evenly spaced left bin edges for the given index containing, on average, the appropriate number of frames.
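As a rough illustration of what such evenly spaced left edges look like (a numpy sketch only, not the library's implementation):
import numpy as np

index = np.linspace(0.0, 10.0, 50)            # 50 cadence times
nframes = 5
nbins = len(index) // nframes                 # ~nframes points per bin on average
edges = np.linspace(index.min(), index.max(), nbins, endpoint=False)
# np.digitize(index, edges) would then assign each cadence to a bin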
- spatial_aggregate(nrows, ncols)[source]#
Similar to spatial_downsample, but specify the desired output dimensions.
TODO: This shouldn’t be a mixin for lk products, it’s Cube specific with an application to a non-timeseries DataFrame and isn’t meaningful for series at all.
- spatial_downsample(factor=2, col_factor=None, row_name=None, col_name=None, **kwargs)[source]#
Spatially downsamples a DataCube by a given factor.
- Parameters:
- factor : int or tuple of int, default 2
If a tuple is given, the first value is used as the factor by which to reduce the size of the row axis and the second as the column factor. If factor is an integer and col_factor is also given, this is the factor by which to decrease the spatial resolution of the row axis. If col_factor is not given, this is both the row and column factor.
- col_factor : int, optional
Factor by which to decrease the spatial resolution of the column axis.
- row_name : str, optional
Name of the axis corresponding to the row to be downsampled. By default the primary row axis is used.
- col_name : str, optional
Name of the axis corresponding to the column to be downsampled. By default the primary column axis is used.
- Returns:
- lkdata.DataCube
A spatially downsampled object of the same type.
TODO: This shouldn't be a mixin for lk products; it's Cube specific with an application to a non-timeseries DataFrame and isn't meaningful for Series at all.
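A hedged usage sketch (assuming cube is a DataCube from this package):
>>> halved = cube.spatial_downsample(factor=2)        # halve both row and column resolution
>>> binned = cube.spatial_downsample(factor=(2, 4))   # rows by 2, columns by 4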
- class lkdata.mixins.BitwiseMixin[source]#
Bases: IndexProcessorMixin
Mixin class that provides functionality for handling bitwise data.
Bitwise data are data which are integers in binary form. In the context of the Kepler and TESS Missions, flags are given as integers which, when converted to their binary form, indicate which flags apply to the data. Each flag corresponds to a bit. For an example, see Table 32 of the TESS Science Data Products Description Document. A value of 5 is represented in binary as 101, indicating that the 1st and 3rd bits are “on”, corresponding to the Attitude Tweak and Spacecraft is in a Coarse Point flags from the table.
In aggregating bitwise data, e.g. via downsampling, we combine all flags.
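To make the decoding concrete, here is a small self-contained Python sketch (the code dictionary below, keyed by the value of each bit, is a stand-in for the mission flag table, not the library's codes attribute):
# A value of 5 is 0b101: bits 1 and 3 are set.
codes = {1: "Attitude Tweak", 4: "Spacecraft is in a Coarse Point"}
value = 5
flags = [name for bit, name in codes.items() if value & bit]
# flags == ['Attitude Tweak', 'Spacecraft is in a Coarse Point']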
- bin_to_int = <numpy.vectorize object>#
- property codes#
Return the code dictionary used in this Bitwise product.
- convert_set_to_int = <numpy.vectorize object>#
- stylize_frame(df, **kwargs)[source]#
Overrides default to remove background gradient and to parse integers to a set of codes based on binary representation.
- Return type:
- property values_display#
Get the current display mode for values.
- Returns:
- str
The current display mode for values. Possible values are:
- ‘int’: Display the raw integer values.
- ‘bitset’: Display the values as sets of powers of 2.
- ‘detailed’: Display the values as dictionaries mapping powers of 2 to their corresponding codes.
Notes
This property is used to control how values are displayed in the object’s string representation and in any generated output (e.g., when using Jupyter notebooks).
- class lkdata.mixins.BoolMixin[source]#
Bases: IndexProcessorMixin
Math mixins for lightkurve bool objects.
All operators should simply return the “logical or” for each element.
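For example, aggregating boolean data with numpy's logical_or reduction (standard numpy, shown only to illustrate the behaviour described above) looks like this:
>>> import numpy as np
>>> np.logical_or.reduce([[True, False], [False, False], [True, True]], axis=0)
array([ True,  True])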
- ds_agg_func(array, axis=0, dtype=None, out=None, keepdims=False, initial=<no value>, where=True)#
Reduces array's dimension by one, by applying ufunc along one axis.
Let \(array.shape = (N_0, ..., N_i, ..., N_{M-1})\). Then \(ufunc.reduce(array, axis=i)[k_0, .., k_{i-1}, k_{i+1}, .., k_{M-1}]\) is the result of iterating j over \(range(N_i)\), cumulatively applying ufunc to each \(array[k_0, .., k_{i-1}, j, k_{i+1}, .., k_{M-1}]\). For a one-dimensional array, reduce produces results equivalent to:
r = op.identity  # op = ufunc
for i in range(len(A)):
    r = op(r, A[i])
return r
For example, add.reduce() is equivalent to sum().
- Parameters:
- array : array_like
The array to act on.
- axis : None or int or tuple of ints, optional
Axis or axes along which a reduction is performed. The default (axis = 0) is to perform a reduction over the first dimension of the input array. axis may be negative, in which case it counts from the last to the first axis.
If this is None, a reduction is performed over all the axes. If this is a tuple of ints, a reduction is performed on multiple axes, instead of a single axis or all the axes as before.
For operations which are either not commutative or not associative, doing a reduction over multiple axes is not well-defined. The ufuncs do not currently raise an exception in this case, but will likely do so in the future.
- dtype : data-type code, optional
The data type used to perform the operation. Defaults to that of out if given, and the data type of array otherwise (though upcast to conserve precision for some cases, such as numpy.add.reduce for integer or boolean input).
- out : ndarray, None, ..., or tuple of ndarray and None, optional
Location into which the result is stored. If not provided or None, a freshly-allocated array is returned. If passed as a keyword argument, can be Ellipsis (out=...) to ensure an array is returned even if the result is 0-dimensional (which is useful especially for object dtype), or a 1-element tuple (the latter for consistency with ufunc.__call__).
Added in version 2.3: Support for out=... was added.
- keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.
- initial : scalar, optional
The value with which to start the reduction. If the ufunc has no identity or the dtype is object, this defaults to None; otherwise it defaults to ufunc.identity. If None is given, the first element of the reduction is used, and an error is thrown if the reduction is empty.
- where : array_like of bool, optional
A boolean array which is broadcast to match the dimensions of array, and selects elements to include in the reduction. Note that for ufuncs like minimum that do not have an identity defined, one also has to pass in initial.
- Returns:
- r : ndarray
The reduced array. If out was supplied, r is a reference to it.
Examples
>>> import numpy as np
>>> np.multiply.reduce([2,3,5])
30
A multi-dimensional array example:
>>> X = np.arange(8).reshape((2,2,2))
>>> X
array([[[0, 1],
        [2, 3]],
       [[4, 5],
        [6, 7]]])
>>> np.add.reduce(X, 0)
array([[ 4,  6],
       [ 8, 10]])
>>> np.add.reduce(X)  # confirm: default axis value is 0
array([[ 4,  6],
       [ 8, 10]])
>>> np.add.reduce(X, 1)
array([[ 2,  4],
       [10, 12]])
>>> np.add.reduce(X, 2)
array([[ 1,  5],
       [ 9, 13]])
You can use the initial keyword argument to initialize the reduction with a different value, and where to select specific elements to include:
>>> np.add.reduce([10], initial=5)
15
>>> np.add.reduce(np.ones((2, 2, 2)), axis=(0, 2), initial=10)
array([14., 14.])
>>> a = np.array([10., np.nan, 10])
>>> np.add.reduce(a, where=~np.isnan(a))
20.0
Allows reductions of empty arrays where they would normally fail, i.e. for ufuncs without an identity.
>>> np.minimum.reduce([], initial=np.inf)
inf
>>> np.minimum.reduce([[1., 2.], [3., 4.]], initial=10., where=[True, False])
array([ 1., 10.])
>>> np.minimum.reduce([])
Traceback (most recent call last):
    ...
ValueError: zero-size array to reduction operation minimum which has no identity
- class lkdata.mixins.ConvenienceMixins[source]#
Bases: object
Convenience mixins which add properties to lightkurve data objects as attributes.
- fold(period, t0=None, level=1, inplace=False, label='phase')[source]#
Fold data on a given period and add the folded time as an index.
- Parameters:
- period : float
The period on which to fold the data. The user must ensure that this has the same units and scale as the level.
- t0 : float, optional
The time at which to start the first period. By default None, in which case t0 becomes the minimum value of the time array.
- level : Union[int, str], optional
The index level on which to fold, by default 1, presumed to be the first time index that is not a cadence index.
- inplace : bool, optional
Whether to modify the object itself or return a new object, by default False.
- label : str, optional
What label to give the new time index, by default “phase”.
- Returns:
- Union[Cube, Frame, Series]
Returns an object of the same type given.
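A hedged usage sketch (assuming lc is a Series from this package whose level-1 time index shares units with the period; the period value is arbitrary):
>>> folded = lc.fold(period=3.52, label='phase')   # t0 defaults to the minimum time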
- property ntime#
Number of cadences in the data.
- property user_kwargs#
Keywords passed by the user
- class lkdata.mixins.IndexProcessorMixin[source]#
Bases: object
A mixin class that provides methods for processing and manipulating index-related operations.
This mixin is designed to be used with pandas-based data structures and provides utilities for folding time series, parsing indices, and handling various index operations specific to astronomical time series data.
See also
pandas.Index : The basic object storing axis labels for all pandas objects.
Notes
This mixin is particularly useful for handling complex index structures in astronomical time series data, such as those found in Kepler and TESS observations. It provides methods for parsing and manipulating indices, which is crucial for operations like folding light curves and handling spatial information in image data.
The methods in this mixin assume that the class it's mixed into has certain attributes and methods typical of pandas-based data structures, such as index, columns, and pandas-style indexing operations.
- static agg_index(index_names, new_index_gb, agg_func='mean')[source]#
Aggregate indices by mean, optionally adding a string of the indices included in each group.
- Parameters:
- index_names : list[str]
List of index names.
- new_index_gb : pd.groupby
Groupby object used to aggregate the index.
- agg_func : str
Method by which to aggregate the indices. Pandas-recognized aggregation strings (“mean”, “median”, etc.) and “detailed” are supported. “detailed” will also aggregate by mean and add a new index containing a string of all indices in each grouping. Note: aggregation keyword arguments are not supported here.
- Returns:
- pd.MultiIndex
An aggregated index
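The “detailed” behaviour can be pictured with plain pandas (an illustration only, not the library's implementation):
import numpy as np
import pandas as pd

time = pd.Series(np.arange(10.0))
groups = np.repeat(np.arange(2), 5)            # two bins of 5 cadences each
gb = time.groupby(groups)
mean_index = gb.mean()                         # mean time per bin: [2.0, 7.0]
members = gb.apply(lambda s: ",".join(s.index.astype(str)))
# members holds "0,1,2,3,4" and "5,6,7,8,9", a string of the aggregated indices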
- droplevel(level, axis=0)[source]#
Drop a level from the index; dropping column levels this way is disabled.
- Parameters:
- level : int or str
The level to be dropped; cannot be the 0th level.
- axis : {0 or ‘index’}, default 0
axis must be 0; dropping columns is not supported. Included for consistency with pandas.
- Returns:
- self.__class__
Returns a new instance of the calling class with the index dropped
- Raises:
- ValueError
0-level indices cannot be dropped by this method.
- NotImplementedError
If axis is not 0 (dropping column levels is not supported).
See also
pandas.DataFrame.droplevel : method to drop levels from DataFrames
pandas.Series.droplevel : method to drop levels from Series
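A hedged usage sketch (assuming folded is a Frame from this package that gained a ‘phase’ level from fold):
>>> unfolded = folded.droplevel('phase')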
- parse_columns(columns=None, row_indices=None, col_indices=None, nrow=0, ncol=0, continuous=False, nseries=0)[source]#
Parse row and column information from the given inputs.
- Parameters:
- columns : pd.MultiIndex, optional
An existing columns instance, the simplest input to handle, by default None.
- row_indices : dict, optional
A dictionary of row arrays, by default None.
- col_indices : dict, optional
A dictionary of column arrays, by default None.
- nrow : int, default 0
The number of rows, by default 0. Must be defined if row_indices is not None.
- ncol : int, default 0
The number of columns, by default 0. Must be defined if col_indices is not None.
- continuous : bool, default False
Whether the rows and columns in row and col indices should be interpreted as continuous. If not continuous, the arrays given in row and col indices should correspond to coordinates by pixel. For DataCubes, the region must be continuous. For DataFrames, it is assumed that the region is non-contiguous. By default False.
- nseries : int, default 0
The number of series, by default 0.
- Returns:
- pd.MultiIndex, int, int
Returns a tuple of the parsed columns instance, the number of rows, and the number of columns inferred from the inputs.
- static parse_index(index=None, time_indices=None, ntime=0, default=True)[source]#
Parse given indices and return a single pandas MultiIndex
- Parameters:
- index : pd.MultiIndex, optional
An existing pandas MultiIndex to be parsed. If provided, its levels will be incorporated into the resulting index.
- time_indices : dict or array-like, optional
A dictionary of time-related indices or an array of time values. If a dictionary, keys represent index names and values are the corresponding arrays. Special keys ‘row’ and ‘col’ are reserved and will raise an error if used.
- ntime : int, optional
The number of time points. Used to create a default time index if no other time information is provided. Default is 0, overwritten by the shape of any other given parameter.
- Returns:
- pd.MultiIndex
A pandas MultiIndex constructed from the input parameters.
- Raises:
- ValueError
If ‘row’ or ‘col’ keys are present in time_indices. If ‘index’ is provided but is not a pd.MultiIndex.
Notes
If neither ‘time_index’ nor ‘mid_index’ are present in the input, a default ‘time_index’ will be created using numpy.arange.
For downsampled data, ‘mid_index’ is used in place of ‘time_index’, and an additional ‘indices’ level is included, containing a string of all indices aggregated for the row.
The method prioritizes existing index information, falling back to provided time_indices, and finally to a default range index if necessary.
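As a rough sketch of the default described above (an assumption about the shape of the result, not the library's exact code):
import numpy as np
import pandas as pd

ntime = 4
default_index = pd.MultiIndex.from_arrays([np.arange(ntime)], names=["time_index"])
# MultiIndex([(0,), (1,), (2,), (3,)], names=['time_index'])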
- static parse_pos_indices(row_indices, col_indices, nrow, ncol, nseries=0)[source]#
Parse and process row and column indices for data representation.
TPF data are typically stored in an intuitive 3D structure, with time as the 1st dimension, row (or column) as the 2nd, and the column (or row) as the 3rd. In using pandas as the backend for our data, we store time as the index of the DataFrame and need rows and columns to be in the DataFrame.columns.
This method processes the given row and column indices, ensuring they are in the correct format and shape for the data representation. It handles various input types and converts them into a standardized dictionary format.
The standard for row and column arrays is to provide an array of size nrow and ncol respectively, defining the row and column indices.
I.e. for a 3x4 image, a possible scenario is that row = [1, 2, 3] and col = [1, 2, 3, 4]. In the DataFrame, this must be organized such that each column corresponds to one of the coordinates. So row and col must be flattened to row = [1, 1, 1, 1, 2, 2, …, 3, 3] and col = [1, 2, 3, 4, 1, 2, …, 3, 4] so that series[0] is [1, 1], series[1] is [1, 2], …, and series[11] is [3, 4].
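The same flattening in plain numpy, for the 3x4 example above (an illustration, not the method's internals):
import numpy as np

row = np.array([1, 2, 3])
col = np.array([1, 2, 3, 4])
row_flat = np.repeat(row, len(col))   # [1 1 1 1 2 2 2 2 3 3 3 3]
col_flat = np.tile(col, len(row))     # [1 2 3 4 1 2 3 4 1 2 3 4]
# (row_flat[0], col_flat[0]) == (1, 1); (row_flat[11], col_flat[11]) == (3, 4)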
- Parameters:
- row_indices : int, array-like, or dict
The row indices. Can be an integer (starting index), an array-like object, or a dictionary of named row indices.
- col_indices : int, array-like, or dict
The column indices. Can be an integer (starting index), an array-like object, or a dictionary of named column indices.
- nrow : int
The number of rows in the data.
- ncol : int
The number of columns in the data.
- Returns:
- tuple of dicts
A tuple containing two dictionaries:
- The first dictionary contains the processed row indices.
- The second dictionary contains the processed column indices.
- Raises:
- ValueError
If the shape of the provided indices doesn’t match the shape of the data, or if ‘time_index’ is used as a key in row_indices or col_indices.
Notes
If row_indices or col_indices is an integer, it’s interpreted as the starting index, and a range is created.
If row_indices or col_indices is an array-like object, it’s processed to ensure compatibility with the data shape.
If row_indices or col_indices is a dictionary, each value is processed to ensure compatibility with the data shape.
- sort_index(*args, **kwargs)[source]#
Sort the index of the data structure.
This method wraps pandas’ sort_index method and extends it to handle the uncertainty array and maintain the internal array structure.
- Parameters:
- *args
Positional arguments to pass to pandas’ sort_index method.
- **kwargs
Keyword arguments to pass to pandas’ sort_index method. Notable kwargs include:
inplace : bool, optional. If True, perform operation in-place.
level : int or str, optional. If index is a MultiIndex, sort on the given level.
- Returns:
- self.__class__ or None
If inplace=False, returns a new sorted object. If inplace=True, sorts in-place and returns None.
See also
pandas.DataFrame.sort_index : The pandas method this wraps.
Notes
This method maintains the structure of the data and uncertainty arrays when sorting. It also ensures that convenience attributes are updated after sorting.
- class lkdata.mixins.MathMixin[source]#
Bases: IndexProcessorMixin
Mixin class to add arithmetic to lightkurve data classes with uncertainty.
See also
astropy.nddata.nduncertainty : astropy module from which uncertainty classes and operations have been derived.
- property array#
Numpy array representation
Cubes have shape (ntime, nrow, ncol), SeriesCollections have shape (ntime, nseries), and Series have shape (ntime).
- Returns:
- np.ndarray
An array representation of the data.
Notes
Uncertainties rely on parent data that are persistent and array-like. For Data classes, therefore, array must be stored in memory. This overrides the on-call form defined in the ConvenienceMixin class.
- property data#
Alias for self.array
- property uncertainty#
An NDData.Uncertainty object
- class lkdata.mixins.StatsMixin[source]#
Bases: MathMixin
Generic mixin class for statistical methods in lightkurve data objects.
This mixin provides common statistical methods such as mean, sum, std, var, min, max, and prod for lightkurve data objects. It also includes cumulative methods like cumsum, cummin, cummax, and cumprod.
- Attributes:
- ds_agg_func : func
The aggregation function to use for downsampling; default is “sum”.
Methods
median(**kwargs)
Calculates the median of the data along a given axis.
- median(axis=None, **kwargs)[source]#
Get the median of the data along the given axis.
This method calculates the median of the data and also estimates the uncertainty of the median based on the uncertainty of the mean.
- Parameters:
- axis : {0, 1, None}, default None
Axis along which to calculate the median. If None, then calculate along each axis. If 0, then calculates the median for each pixel over all time steps and returns a DataFrame. If 1, then calculates the median of all pixels for each time step and returns a Series.
- **kwargs : dict, optional
Additional keyword arguments to be passed to numpy's median function. See numpy.median for more details.
- Returns:
- result : pd.DataFrame, lkData.Series, or float
The median of the data along the specified axis.
- Raises:
- ValueError
If axis=2 is specified for Cubes, as it is not supported.
See also
numpy.median : NumPy's median function used internally.
Notes
The uncertainty of the median is calculated based on the uncertainty of the mean, adjusted by a factor related to the number of data points.
The efficiency of the variance of the median to the variance of the mean is calculated as (π * N) / (2 * (N - 1)), where N is the number of data points along the specified axis (this is the inverse of the form usually presented for the efficiency of the mean to the median). See https://mathworld.wolfram.com/StatisticalMedian.html.
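A small numerical sketch of the scaling described above (illustration only; the uncertainty-of-the-mean estimate below is an assumption about how per-point uncertainties combine, not the library's exact code):
import numpy as np

x = np.random.default_rng(0).normal(size=100)
sigma = np.ones_like(x)                        # per-point uncertainties
n = x.size
median = np.median(x)
unc_mean = np.sqrt(np.sum(sigma**2)) / n       # uncertainty of the mean
unc_median = unc_mean * np.sqrt(np.pi * n / (2 * (n - 1)))   # variance ratio from the Notes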