Mixins#
Mixin methods and classes for lightkurve data objects
- class lkdata.mixins.AggMixin[source]#
Bases: object
Mixin class for data aggregation methods like downsampling.
- bin(bins, level, agg_func=None, uncertainty_agg_func=None, counts=False, index_agg_func='mean', **agg_kwargs)[source]#
Perform user-defined binning.
- Parameters:
- bins : ArrayLike
An array of the left-edge values for the bins.
- level : int or str
Level of the index on which to apply the bins.
- agg_func : str or function, optional
The aggregation method by which to combine data. If None, the class' ds_agg_func is used, which is summation for data and bitwise classes and np.logical_or for boolean classes.
- uncertainty_agg_func : str or function, optional
For data classes, defines how uncertainty should be aggregated. If None is given for a class with uncertainty, the root mean square is used. If the class has no associated uncertainty, this is ignored.
- counts : bool, default False
Whether to return the counts for each bin, including NaNs.
- index_agg_func : str
How to aggregate the indices; by default "mean", which returns a mean aggregation. "detailed" will use a mean aggregation and add an index level to track the indices used in the aggregation. Any other pandas-supported aggregation identifiable by a string is also supported.
- Returns:
- self.__class__
Returns an aggregated object of the same type given.
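For illustration, a hedged usage sketch (assuming lc is a pandas-backed Series from this package whose time level is named 'time_index'; the level name and bin-edge values here are assumptions, not fixed by the API):
>>> import numpy as np
>>> edges = np.arange(0.0, 27.0, 0.5)   # left bin edges, in the units of the chosen level
>>> binned = lc.bin(bins=edges, level='time_index', agg_func='mean', counts=True)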
- downsample(nframes=5, level=-1, index_agg_func='mean')[source]#
Downsample the data by averaging over nframes consecutive rows.
- Parameters:
- nframes : int, default 5
Number of frames to average over. Default is 5.
- level : Union[int, str], default -1
Index level to use for downsampling. Default is -1 (last level).
- index_agg_func : str
How to aggregate the indices; by default "mean", which returns a mean aggregation. "detailed" will use a mean aggregation and add an index level to track the indices used in the aggregation. Any other pandas-supported aggregation identifiable by a string is also supported.
- Returns:
- Same type as self
A new object with downsampled data.
Notes
This method works by creating bins of nframes consecutive rows, then averaging the data within each bin. Only bins with exactly nframes rows are included in the result.
If the object has an uncertainty attribute, it will be propagated by summing the squares of the uncertainties within each bin and then taking the square root.
The resulting object will have a new index that represents the mean of the original indices within each bin. If the original index included a ‘time_index’ or ‘indices’ level, this information is preserved in the new index.
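A hedged usage sketch (assuming cube is a DataCube from this package; the value of nframes is arbitrary):
>>> small = cube.downsample(nframes=10, index_agg_func='detailed')
>>> # each remaining row is the mean of 10 consecutive frames; an 'indices'
>>> # level records which original rows went into each bin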
- static get_bins(index, nframes, right=False)[source]#
Calculate bin edges for downsampling.
- Parameters:
- index : array-like
The index to be binned.
- nframes : int
Number of frames to average over.
- right : bool, optional
Whether the intervals should be closed on the right (default: False).
- Returns:
- bins : array-like
Evenly spaced left bin edges for the given index containing, on average, the appropriate number of frames.
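As a rough illustration of what such evenly spaced left edges look like (a numpy sketch only, not the library's implementation):
import numpy as np

index = np.linspace(0.0, 10.0, 50)            # 50 cadence times
nframes = 5
nbins = len(index) // nframes                 # ~nframes points per bin on average
edges = np.linspace(index.min(), index.max(), nbins, endpoint=False)
# np.digitize(index, edges) would then assign each cadence to a bin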
- spatial_aggregate(nrows, ncols)[source]#
Similar to spatial_downsample, but specify the desired output dimensions.
TODO: This shouldn’t be a mixin for lk products, it’s Cube specific with an application to a non-timeseries DataFrame and isn’t meaningful for series at all.
- spatial_downsample(factor=2, col_factor=None, row_name=None, col_name=None, **kwargs)[source]#
Spatially downsamples a DataCube by a given factor.
- Parameters:
- factor : int or tuple of int, default 2
If a tuple is given, the first value is used as the factor by which to reduce the size of the row axis and the second as the column factor. If factor is an integer and col_factor is also given, this is the factor by which to decrease the spatial resolution of the row axis. If col_factor is not given, this is both the row and column factor.
- col_factor : int, optional
Factor by which to decrease the spatial resolution of the column axis.
- row_name : str, optional
Name of the axis corresponding to the row to be downsampled. By default the primary row axis is used.
- col_name : str, optional
Name of the axis corresponding to the column to be downsampled. By default the primary column axis is used.
- Returns:
- lkdata.DataCube
A spatially downsampled object of the same type.
TODO: This shouldn't be a mixin for lk products; it's Cube specific with an application to a non-timeseries DataFrame and isn't meaningful for Series at all.
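A hedged usage sketch (assuming cube is a DataCube from this package):
>>> halved = cube.spatial_downsample(factor=2)        # halve both row and column resolution
>>> binned = cube.spatial_downsample(factor=(2, 4))   # rows by 2, columns by 4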
- class lkdata.mixins.BitwiseMixin[source]#
Bases: IndexProcessorMixin
Mixin class that provides functionality for handling bitwise data.
Bitwise data are data which are integers in binary form. In the context of the Kepler and TESS Missions, flags are given as integers which, when converted to their binary form, indicate which flags apply to the data. Each flag corresponds to a bit. For an example, see Table 32 of the TESS Science Data Products Description Document. A value of 5 is represented in binary as 101, indicating that the 1st and 3rd bits are “on”, corresponding to the Attitude Tweak and Spacecraft is in a Coarse Point flags from the table.
In aggregating bitwise data, e.g. via downsampling, we combine all flags.
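To make the decoding concrete, here is a small self-contained Python sketch (the code dictionary below, keyed by the value of each bit, is a stand-in for the mission flag table, not the library's codes attribute):
# A value of 5 is 0b101: bits 1 and 3 are set.
codes = {1: "Attitude Tweak", 4: "Spacecraft is in a Coarse Point"}
value = 5
flags = [name for bit, name in codes.items() if value & bit]
# flags == ['Attitude Tweak', 'Spacecraft is in a Coarse Point']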
- bin_to_int = <numpy.vectorize object>#
- property codes#
Return the code dictionary used in this Bitwise product.
- convert_set_to_int = <numpy.vectorize object>#
- stylize_frame(df, **kwargs)[source]#
Overrides default to remove background gradient and to parse integers to a set of codes based on binary representation.
- Return type:
- property values_display#
Get the current display mode for values.
- Returns:
- str
The current display mode for values. Possible values are:
- ‘int’: Display the raw integer values.
- ‘bitset’: Display the values as sets of powers of 2.
- ‘detailed’: Display the values as dictionaries mapping powers of 2 to their corresponding codes.
Notes
This property is used to control how values are displayed in the object’s string representation and in any generated output (e.g., when using Jupyter notebooks).
- class lkdata.mixins.BoolMixin[source]#
Bases: IndexProcessorMixin
Math mixins for lightkurve bool objects.
All operators should simply return the “logical or” for each element.
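For example, aggregating boolean data with numpy's logical_or reduction (standard numpy, shown only to illustrate the behaviour described above) looks like this:
>>> import numpy as np
>>> np.logical_or.reduce([[True, False], [False, False], [True, True]], axis=0)
array([ True,  True])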
- ds_agg_func(array, axis=0, dtype=None, out=None, keepdims=False, initial=<no value>, where=True)#
Reduces array's dimension by one, by applying ufunc along one axis.
Let \(array.shape = (N_0, ..., N_i, ..., N_{M-1})\). Then \(ufunc.reduce(array, axis=i)[k_0, .., k_{i-1}, k_{i+1}, .., k_{M-1}]\) is the result of iterating j over \(range(N_i)\), cumulatively applying ufunc to each \(array[k_0, .., k_{i-1}, j, k_{i+1}, .., k_{M-1}]\). For a one-dimensional array, reduce produces results equivalent to:
r = op.identity  # op = ufunc
for i in range(len(A)):
    r = op(r, A[i])
return r
For example, add.reduce() is equivalent to sum().
- Parameters:
- array : array_like
The array to act on.
- axis : None or int or tuple of ints, optional
Axis or axes along which a reduction is performed. The default (axis = 0) is to perform a reduction over the first dimension of the input array. axis may be negative, in which case it counts from the last to the first axis.
If this is None, a reduction is performed over all the axes. If this is a tuple of ints, a reduction is performed on multiple axes, instead of a single axis or all the axes as before.
For operations which are either not commutative or not associative, doing a reduction over multiple axes is not well-defined. The ufuncs do not currently raise an exception in this case, but will likely do so in the future.
- dtype : data-type code, optional
The data type used to perform the operation. Defaults to that of out if given, and the data type of array otherwise (though upcast to conserve precision for some cases, such as numpy.add.reduce for integer or boolean input).
- out : ndarray, None, ..., or tuple of ndarray and None, optional
Location into which the result is stored. If not provided or None, a freshly-allocated array is returned. If passed as a keyword argument, can be Ellipsis (out=...) to ensure an array is returned even if the result is 0-dimensional (which is useful especially for object dtype), or a 1-element tuple (the latter for consistency with ufunc.__call__).
Added in version 2.3: Support for out=... was added.
- keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.
- initial : scalar, optional
The value with which to start the reduction. If the ufunc has no identity or the dtype is object, this defaults to None; otherwise it defaults to ufunc.identity. If None is given, the first element of the reduction is used, and an error is thrown if the reduction is empty.
- where : array_like of bool, optional
A boolean array which is broadcast to match the dimensions of array, and selects elements to include in the reduction. Note that for ufuncs like minimum that do not have an identity defined, one also has to pass in initial.
- Returns:
- r : ndarray
The reduced array. If out was supplied, r is a reference to it.
Examples
>>> import numpy as np
>>> np.multiply.reduce([2,3,5])
30
A multi-dimensional array example:
>>> X = np.arange(8).reshape((2,2,2))
>>> X
array([[[0, 1],
        [2, 3]],
       [[4, 5],
        [6, 7]]])
>>> np.add.reduce(X, 0)
array([[ 4,  6],
       [ 8, 10]])
>>> np.add.reduce(X)  # confirm: default axis value is 0
array([[ 4,  6],
       [ 8, 10]])
>>> np.add.reduce(X, 1)
array([[ 2,  4],
       [10, 12]])
>>> np.add.reduce(X, 2)
array([[ 1,  5],
       [ 9, 13]])
You can use the initial keyword argument to initialize the reduction with a different value, and where to select specific elements to include:
>>> np.add.reduce([10], initial=5)
15
>>> np.add.reduce(np.ones((2, 2, 2)), axis=(0, 2), initial=10)
array([14., 14.])
>>> a = np.array([10., np.nan, 10])
>>> np.add.reduce(a, where=~np.isnan(a))
20.0
Allows reductions of empty arrays where they would normally fail, i.e. for ufuncs without an identity.
>>> np.minimum.reduce([], initial=np.inf)
inf
>>> np.minimum.reduce([[1., 2.], [3., 4.]], initial=10., where=[True, False])
array([ 1., 10.])
>>> np.minimum.reduce([])
Traceback (most recent call last):
    ...
ValueError: zero-size array to reduction operation minimum which has no identity
- class lkdata.mixins.ConvenienceMixins[source]#
Bases: object
Convenience mixins which add properties to lightkurve data objects as attributes.
- fold(period, t0=None, level=1, inplace=False, label='phase')[source]#
Fold data on a given period and add the folded time as an index.
- Parameters:
- period : float
The period on which to fold the data. The user must ensure that this has the same units and scale as the level.
- t0 : float, optional
The time at which to start the first period. By default None, in which case t0 becomes the minimum value of the time array.
- level : Union[int, str], optional
The index level on which to fold, by default 1, presumed to be the first time index that is not a cadence index.
- inplace : bool, optional
Whether to modify the object itself or return a new object, by default False.
- label : str, optional
What label to give the new time index, by default “phase”.
- Returns:
- Union[Cube, Frame, Series]
Returns an object of the same type given.
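A hedged usage sketch (assuming lc is a Series from this package whose level-1 time index shares units with the period; the period value is arbitrary):
>>> folded = lc.fold(period=3.52, label='phase')   # t0 defaults to the minimum time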
- property ntime#
Number of cadences in the data.
- property user_kwargs#
Keywords passed by the user
- class lkdata.mixins.IndexProcessorMixin[source]#
Bases: object
A mixin class that provides methods for processing and manipulating index-related operations.
This mixin is designed to be used with pandas-based data structures and provides utilities for folding time series, parsing indices, and handling various index operations specific to astronomical time series data.
See also
pandas.Index : The basic object storing axis labels for all pandas objects.
Notes
This mixin is particularly useful for handling complex index structures in astronomical time series data, such as those found in Kepler and TESS observations. It provides methods for parsing and manipulating indices, which is crucial for operations like folding light curves and handling spatial information in image data.
The methods in this mixin assume that the class it's mixed into has certain attributes and methods typical of pandas-based data structures, such as index, columns, and pandas-style indexing operations.
- static agg_index(index_names, new_index_gb, agg_func='mean')[source]#
Aggregate indices by mean, optionally adding a string of the indices included in each group.
- Parameters:
- index_names : list[str]
List of index names.
- new_index_gb : pd.groupby
Groupby object used to aggregate the index.
- agg_func : str
Method by which to aggregate the indices. Pandas-recognized aggregation strings (“mean”, “median”, etc.) and “detailed” are supported. “detailed” will also aggregate by mean and add a new index containing a string of all indices in each grouping. Note: aggregation keyword arguments are not supported here.
- Returns:
- pd.MultiIndex
An aggregated index
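The “detailed” behaviour can be pictured with plain pandas (an illustration only, not the library's implementation):
import numpy as np
import pandas as pd

time = pd.Series(np.arange(10.0))
groups = np.repeat(np.arange(2), 5)            # two bins of 5 cadences each
gb = time.groupby(groups)
mean_index = gb.mean()                         # mean time per bin: [2.0, 7.0]
members = gb.apply(lambda s: ",".join(s.index.astype(str)))
# members holds "0,1,2,3,4" and "5,6,7,8,9", a string of the aggregated indices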
- droplevel(level, axis=0)[source]#
Drop a level from the index; dropping column levels this way is disabled.
- Parameters:
- level : int or str
The level to be dropped; cannot be the 0th level.
- axis : {0 or ‘index’}, default 0
axis must be 0; dropping columns is not supported. Included for consistency with pandas.
- Returns:
- self.__class__
Returns a new instance of the calling class with the index dropped
- Raises:
- ValueError
0-level indices cannot be dropped by this method.
- NotImplementedError
If axis is not 0 (dropping column levels is not supported).
See also
pandas.DataFrame.droplevel : method to drop levels from DataFrames
pandas.Series.droplevel : method to drop levels from Series
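A hedged usage sketch (assuming folded is a Frame from this package that gained a ‘phase’ level from fold):
>>> unfolded = folded.droplevel('phase')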
- parse_columns(columns=None, row_indices=None, col_indices=None, nrow=0, ncol=0, continuous=False, nseries=0)[source]#
Parse row and column information from the given inputs.
- Parameters:
- columns : pd.MultiIndex, optional
An existing columns instance, the simplest input to handle, by default None.
- row_indices : dict, optional
A dictionary of row arrays, by default None.
- col_indices : dict, optional
A dictionary of column arrays, by default None.
- nrow : int, default 0
The number of rows, by default 0. Must be defined if row_indices is not None.
- ncol : int, default 0
The number of columns, by default 0. Must be defined if col_indices is not None.
- continuous : bool, default False
Whether the rows and columns in row and col indices should be interpreted as continuous. If not continuous, the arrays given in row and col indices should correspond to coordinates by pixel. For DataCubes, the region must be continuous. For DataFrames, it is assumed that the region is non-contiguous. By default False.
- nseries : int, default 0
The number of series, by default 0.
- Returns:
- pd.MultiIndex, int, int
Returns a tuple of the parsed columns instance, the number of rows, and the number of columns inferred from the inputs.
- static parse_index(index=None, time_indices=None, ntime=0, default=True)[source]#
Parse given indices and return a single pandas MultiIndex
- Parameters:
- index : pd.MultiIndex, optional
An existing pandas MultiIndex to be parsed. If provided, its levels will be incorporated into the resulting index.
- time_indices : dict or array-like, optional
A dictionary of time-related indices or an array of time values. If a dictionary, keys represent index names and values are the corresponding arrays. Special keys ‘row’ and ‘col’ are reserved and will raise an error if used.
- ntime : int, optional
The number of time points. Used to create a default time index if no other time information is provided. Default is 0, overwritten by the shape of any other given parameter.
- Returns:
- pd.MultiIndex
A pandas MultiIndex constructed from the input parameters.
- Raises:
- ValueError
If ‘row’ or ‘col’ keys are present in time_indices. If ‘index’ is provided but is not a pd.MultiIndex.
Notes
If neither ‘time_index’ nor ‘mid_index’ are present in the input, a default ‘time_index’ will be created using numpy.arange.
For downsampled data, ‘mid_index’ is used in place of ‘time_index’, and an additional ‘indices’ level is included, containing a string of all indices aggregated for the row.
The method prioritizes existing index information, falling back to provided time_indices, and finally to a default range index if necessary.
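As a rough sketch of the default described above (an assumption about the shape of the result, not the library's exact code):
import numpy as np
import pandas as pd

ntime = 4
default_index = pd.MultiIndex.from_arrays([np.arange(ntime)], names=["time_index"])
# MultiIndex([(0,), (1,), (2,), (3,)], names=['time_index'])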
- static parse_pos_indices(row_indices, col_indices, nrow, ncol, nseries=0)[source]#
Parse and process row and column indices for data representation.
TPF data are typically stored in an intuitive 3D structure, with time as the 1st dimension, row (or column) as the 2nd, and the column (or row) as the 3rd. In using pandas as the backend for our data, we store time as the index of the DataFrame and need rows and columns to be in the DataFrame.columns.
This method processes the given row and column indices, ensuring they are in the correct format and shape for the data representation. It handles various input types and converts them into a standardized dictionary format.
The standard for row and column arrays is to provide an array of size nrow and ncol respectively, defining the row and column indices.
I.e. for a 3x4 image, a possible scenario is that row = [1, 2, 3] and col = [1, 2, 3, 4]. In the DataFrame, this must be organized such that each column corresponds to one of the coordinates. So row and col must be flattened to row = [1, 1, 1, 1, 2, 2, …, 3, 3] and col = [1, 2, 3, 4, 1, 2, …, 3, 4] so that series[0] is [1, 1], series[1] is [1, 2], …, and series[11] is [3, 4].
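The same flattening in plain numpy, for the 3x4 example above (an illustration, not the method's internals):
import numpy as np

row = np.array([1, 2, 3])
col = np.array([1, 2, 3, 4])
row_flat = np.repeat(row, len(col))   # [1 1 1 1 2 2 2 2 3 3 3 3]
col_flat = np.tile(col, len(row))     # [1 2 3 4 1 2 3 4 1 2 3 4]
# (row_flat[0], col_flat[0]) == (1, 1); (row_flat[11], col_flat[11]) == (3, 4)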
- Parameters:
- row_indices : int, array-like, or dict
The row indices. Can be an integer (starting index), an array-like object, or a dictionary of named row indices.
- col_indices : int, array-like, or dict
The column indices. Can be an integer (starting index), an array-like object, or a dictionary of named column indices.
- nrow : int
The number of rows in the data.
- ncol : int
The number of columns in the data.
- Returns:
- tuple of dicts
A tuple containing two dictionaries:
- The first dictionary contains the processed row indices.
- The second dictionary contains the processed column indices.
- Raises:
- ValueError
If the shape of the provided indices doesn’t match the shape of the data, or if ‘time_index’ is used as a key in row_indices or col_indices.
Notes
If row_indices or col_indices is an integer, it’s interpreted as the starting index, and a range is created.
If row_indices or col_indices is an array-like object, it’s processed to ensure compatibility with the data shape.
If row_indices or col_indices is a dictionary, each value is processed to ensure compatibility with the data shape.
- sort_index(*args, **kwargs)[source]#
Sort the index of the data structure.
This method wraps pandas’ sort_index method and extends it to handle the uncertainty array and maintain the internal array structure.
- Parameters:
- *args
Positional arguments to pass to pandas’ sort_index method.
- **kwargs
Keyword arguments to pass to pandas’ sort_index method. Notable kwargs include:
inplace : bool, optional. If True, perform operation in-place.
level : int or str, optional. If index is a MultiIndex, sort on the given level.
- Returns:
- self.__class__ or None
If inplace=False, returns a new sorted object. If inplace=True, sorts in-place and returns None.
See also
pandas.DataFrame.sort_index : The pandas method this wraps.
Notes
This method maintains the structure of the data and uncertainty arrays when sorting. It also ensures that convenience attributes are updated after sorting.
- class lkdata.mixins.MathMixin[source]#
Bases: IndexProcessorMixin
Mixin class to add arithmetic to lightkurve data classes with uncertainty.
See also
astropy.nddata.nduncertainty : astropy module from which uncertainty classes and operations have been derived.
- property array#
Numpy array representation
Cubes have shape (ntime, nrow, ncol), SeriesCollections have shape (ntime, nseries), and Series have shape (ntime).
- Returns:
- np.ndarray
An array representation of the data.
Notes
Uncertainties rely on parent data that are persistent and array-like. For Data classes, therefore, array must be stored in memory. This overrides the on-call form defined in the ConvenienceMixin class.
- property data#
Alias for self.array
- property uncertainty#
An NDData.Uncertainty object
- class lkdata.mixins.StatsMixin[source]#
Bases: MathMixin
Generic mixin class for statistical methods in lightkurve data objects.
This mixin provides common statistical methods such as mean, sum, std, var, min, max, and prod for lightkurve data objects. It also includes cumulative methods like cumsum, cummin, cummax, and cumprod.
- Attributes:
- ds_agg_func : func
The aggregation function to use for downsampling; default is “sum”.
Methods
median(**kwargs)
Calculates the median of the data along a given axis.
- median(axis=None, **kwargs)[source]#
Get the median of the data along the given axis.
This method calculates the median of the data and also estimates the uncertainty of the median based on the uncertainty of the mean.
- Parameters:
- axis : {0, 1, None}, default None
Axis along which to calculate the median. If None, then calculate along each axis. If 0, then calculates the median for each pixel over all time steps and returns a DataFrame. If 1, then calculates the median of all pixels for each time step and returns a Series.
- **kwargs : dict, optional
Additional keyword arguments to be passed to numpy's median function. See numpy.median for more details.
- Returns:
- result : pd.DataFrame, lkData.Series, or float
The median of the data along the specified axis.
- Raises:
- ValueError
If axis=2 is specified for Cubes, as it is not supported.
See also
numpy.median : NumPy's median function used internally.
Notes
The uncertainty of the median is calculated based on the uncertainty of the mean, adjusted by a factor related to the number of data points.
The efficiency of the variance of the median to the variance of the mean is calculated as (π * N) / (2 * (N - 1)), where N is the number of data points along the specified axis (this is the inverse of the form usually presented for the efficiency of the mean to the median). See https://mathworld.wolfram.com/StatisticalMedian.html.
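A small numerical sketch of the scaling described above (illustration only; the uncertainty-of-the-mean estimate below is an assumption about how per-point uncertainties combine, not the library's exact code):
import numpy as np

x = np.random.default_rng(0).normal(size=100)
sigma = np.ones_like(x)                        # per-point uncertainties
n = x.size
median = np.median(x)
unc_mean = np.sqrt(np.sum(sigma**2)) / n       # uncertainty of the mean
unc_median = unc_mean * np.sqrt(np.pi * n / (2 * (n - 1)))   # variance ratio from the Notes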