Tutorial#

[1]:
from lksearch import MASTSearch, KeplerSearch, K2Search, TESSSearch

Welcome to the new lksearch module! This package allows users to peruse available data products for the TESS, Kepler, and K2 missions. This notebook will guide you through several examples of how to use search functions.

The result of the search is a MASTSearch object, which contains among other things a full list of results stored in a pandas dataframe.

NOTE: While MASTSearch is a usable class, it does not have all of the functionality or niceties of the mission-specific searches (TESSSearch/KeplerSearch/K2Search). We therefore recommend using the mission-specific classes instead.

Basic Searches#

Data Exploration#

The lksearch package provides a user-friendly wrapper to search the MAST data archive. The most generic search uses MASTSearch, which checks for mission products from three missions (Kepler, K2, and TESS). This search can be useful for data exploration, but does not have full functionality, as discussed below.

In addition to the target, you can specify

  • search_radius: a search radius (assumes arcsec by default, but you can specify anything by using astropy units)

  • exptime: the exposure time of the observation. Either a number or a range in the form of a tuple

  • mission: the mission - only Kepler, K2, and TESS are directly supported

  • pipeline: the pipeline(s) used to create the product, e.g. Kepler, K2, SPOC, QLP, KBONUS-BKG, etc.

and in the case of mission-specific searches

  • a sequence number*

    • sector for TESS

    • quarter/month for Kepler

    • campaign for K2

*NOTE: MASTSearch also accepts a sequence number, but it selects the same sequence number for every mission. For example, if you provide sequence = 5, it will return only data from TESS sector 5, K2 campaign 5, or Kepler quarter 5.
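To make the exptime option listed above concrete, the sketch below shows one way a single number or a (min, max) tuple could be applied as a filter over exposure times. It is an illustration only: matches_exptime is a hypothetical helper that is not part of lksearch, and treating the tuple as an inclusive range is an assumption.

```python
from typing import Tuple, Union

Exptime = Union[float, Tuple[float, float]]


def matches_exptime(value: float, exptime: Exptime) -> bool:
    """Return True if `value` (seconds) satisfies the requested exptime.

    Hypothetical helper: a single number must match exactly, and a tuple
    is read as an inclusive (min, max) range. lksearch may differ in detail.
    """
    if isinstance(exptime, tuple):
        low, high = exptime
        return low <= value <= high
    return value == exptime


# Exposure times (seconds) that appear in the example output of this tutorial
available = [120.0, 60.0]

exact = [t for t in available if matches_exptime(t, 120)]
in_range = [t for t in available if matches_exptime(t, (0, 100))]
print(exact, in_range)
```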

For data exploration, we suggest searching with the default settings (i.e., providing only the target). You can then use the provided class functions to extract the data products you want to download, as demonstrated throughout this tutorial.

[2]:
# First, we can check what data is available by any mission (TESS, Kepler, or K2)
# TOI-1161 is the same as Kepler-13. You can search using either name to get the same results
search_result = MASTSearch("TOI 1161")
search_result
[2]:
MASTSearch object containing 244 data products
target_name pipeline mission exptime distance year description
0 158324245 SPOC TESS 120.0 0.0 2019 full data validation report
1 158324245 SPOC TESS 120.0 0.0 2019 Light curves
2 158324245 SPOC TESS 120.0 0.0 2019 TCE summary report
3 158324245 SPOC TESS 120.0 0.0 2019 Data validation mini report
4 158324245 SPOC TESS 120.0 0.0 2019 Target pixel files
... ... ... ... ... ... ... ...
239 kplr009941662 Kepler Kepler 60.0 0.0 2012 Lightcurve Short Cadence (CSC) - Q14
240 kplr009941662 Kepler Kepler 60.0 0.0 2012 Target Pixel Short Cadence (TPS) - Q14
241 kplr009941662 Kepler Kepler 60.0 0.0 2013 Lightcurve Short Cadence (CSC) - Q17
242 kplr009941662 Kepler Kepler 60.0 0.0 2011 Target Pixel Short Cadence (TPS) - Q11
243 kplr009941662 Kepler Kepler 60.0 0.0 2009 Target Pixel Short Cadence (TPS) - Q2

244 rows × 7 columns

Note that search_result is a MASTSearch object. When the object is displayed, a summary of its contents (MASTSearch object containing X data products) is printed to the screen along with a subset of the observation table.

The returned MASTSearch object has several properties to easily access specific observation characteristics. These include

  • target name (target_name)

  • right ascension (ra)

  • declination (dec)

  • exposure time (exptime)

  • mission

  • observation year (year)

  • reduction pipeline (pipeline)

  • data location URI (uris)

  • data location in cloud storage (cloud_uris)

[3]:
# Let's use this to check what mission(s) have observed this target
print(search_result.mission)

['TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS'
 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS'
 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS'
 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS'
 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS'
 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'TESS'
 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS'
 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS'
 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS'
 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS'
 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS'
 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS'
 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'TESS' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler' 'Kepler'
 'Kepler' 'Kepler']
[4]:
print(f"There are {sum(search_result.mission == 'TESS')} observations by TESS and {sum(search_result.mission == 'Kepler')} by Kepler")

There are 128 observations by TESS and 116 by Kepler
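When more than two missions may be present, tallying every distinct value at once avoids writing one comparison per mission. The following is a minimal sketch using only the standard library; the plain list is a stand-in for search_result.mission, built from the counts printed above.

```python
from collections import Counter

# Stand-in for search_result.mission, using the counts printed above
missions = ["TESS"] * 128 + ["Kepler"] * 116

# Counter tallies each distinct value in a single pass
counts = Counter(missions)
for mission, n in counts.items():
    print(f"{mission}: {n} data products")
```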

Accessing the full result table#

To view all of the results, you can access the table attribute. This table is a pandas DataFrame that contains the full results of the MAST search as well as a few added columns such as year, start_time, and end_time.

[5]:
search_result.table
[5]:
intentType obs_collection_obs provenance_name instrument_name project_obs filters_obs wavelength_region target_name target_classification obs_id ... size parent_obsid dataRights calib_level_prod filters_prod pipeline mission year start_time end_time
0 science TESS SPOC Photometer TESS TESS Optical 158324245 NaN tess2019198215352-s0014-0000000158324245-0150-s ... 16426251 27448285 PUBLIC 3 TESS SPOC TESS 2019 2019-07-18 20:30:33.624 2019-08-14 16:56:23.634
1 science TESS SPOC Photometer TESS TESS Optical 158324245 NaN tess2019198215352-s0014-0000000158324245-0150-s ... 1964160 27448285 PUBLIC 3 TESS SPOC TESS 2019 2019-07-18 20:30:33.624 2019-08-14 16:56:23.634
2 science TESS SPOC Photometer TESS TESS Optical 158324245 NaN tess2019198215352-s0014-0000000158324245-0150-s ... 1322689 27448285 PUBLIC 3 TESS SPOC TESS 2019 2019-07-18 20:30:33.624 2019-08-14 16:56:23.634
3 science TESS SPOC Photometer TESS TESS Optical 158324245 NaN tess2019198215352-s0014-0000000158324245-0150-s ... 5082604 27448285 PUBLIC 3 TESS SPOC TESS 2019 2019-07-18 20:30:33.624 2019-08-14 16:56:23.634
4 science TESS SPOC Photometer TESS TESS Optical 158324245 NaN tess2019198215352-s0014-0000000158324245-0150-s ... 47376000 27448285 PUBLIC 2 TESS SPOC TESS 2019 2019-07-18 20:30:33.624 2019-08-14 16:56:23.634
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
239 science Kepler Kepler Kepler Kepler KEPLER OPTICAL kplr009941662 NaN kplr009941662_sc_Q003300033333333332 ... 5045760 599738 PUBLIC 2 KEPLER Kepler Kepler 2012 2009-05-02 00:54:00.000 2013-05-11 12:15:00.000
240 science Kepler Kepler Kepler Kepler KEPLER OPTICAL kplr009941662 NaN kplr009941662_sc_Q003300033333333332 ... 76302956 599738 PUBLIC 2 KEPLER Kepler Kepler 2012 2009-05-02 00:54:00.000 2013-05-11 12:15:00.000
241 science Kepler Kepler Kepler Kepler KEPLER OPTICAL kplr009941662 NaN kplr009941662_sc_Q003300033333333332 ... 648000 599738 PUBLIC 2 KEPLER Kepler Kepler 2013 2009-05-02 00:54:00.000 2013-05-11 12:15:00.000
242 science Kepler Kepler Kepler Kepler KEPLER OPTICAL kplr009941662 NaN kplr009941662_sc_Q003300033333333332 ... 130764115 599738 PUBLIC 2 KEPLER Kepler Kepler 2011 2009-05-02 00:54:00.000 2013-05-11 12:15:00.000
243 science Kepler Kepler Kepler Kepler KEPLER OPTICAL kplr009941662 NaN kplr009941662_sc_Q003300033333333332 ... 64708791 599738 PUBLIC 2 KEPLER Kepler Kepler 2009 2009-05-02 00:54:00.000 2013-05-11 12:15:00.000

244 rows × 59 columns
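Because the table is a regular pandas DataFrame, standard pandas selection applies to it. The sketch below uses a small stand-in DataFrame with values copied from the output in this tutorial rather than a live search; it assumes the full table carries the same description column that appears in the summary view.

```python
import pandas as pd

# Small stand-in for search_result.table, with values copied from the
# output shown in this tutorial (the real table has 59 columns)
table = pd.DataFrame(
    {
        "target_name": ["158324245", "158324245", "kplr009941662"],
        "pipeline": ["SPOC", "SPOC", "Kepler"],
        "mission": ["TESS", "TESS", "Kepler"],
        "exptime": [120.0, 120.0, 60.0],
        "year": [2019, 2019, 2012],
        "description": [
            "Light curves",
            "Target pixel files",
            "Lightcurve Short Cadence (CSC) - Q14",
        ],
    }
)

# Boolean masks combine with & (and); each condition needs its own parentheses
tess_lightcurves = table[
    (table["mission"] == "TESS") & (table["description"] == "Light curves")
]
print(tess_lightcurves[["target_name", "pipeline", "exptime", "year"]])

# value_counts() gives a per-mission tally of rows
mission_counts = table["mission"].value_counts()
print(mission_counts)
```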

By default, MASTSearch returns any available data provided by an official mission pipeline. This means that any available High Level Science Products (HLSPs) are NOT returned. Additionally, TESS full frame images (FFIs) are not returned by MASTSearch. To search for these data types, we recommend using the mission-specific searches.

Configuration and Caching#

lksearch has a default file download location that serves as the file cache, and an optional configuration file that can be created and used to override the default values.

lksearch File Download and Cache#

The lksearch file cache is the directory that files are downloaded to. If a file with the same name as the one to be downloaded already exists there, it is treated as a cached file and, by default, is not overwritten.

The default file download and cache directory is located at: ~/.lksearch/cache

This can be verified using the get_cache_dir convenience function in the config sub-module, e.g.:

[31]:
from lksearch import config as lkconfig
lkconfig.get_cache_dir()
[31]:
'/Users/tapritc2/.lksearch/cache'
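The name-based caching described above can be sketched with the standard library. This is not the lksearch implementation: fetch_with_cache and fake_download are hypothetical names, and a temporary directory stands in for the real cache directory.

```python
import tempfile
from pathlib import Path


def fetch_with_cache(name: str, cache_dir: Path, fetch) -> Path:
    """Return the cached path for `name`, calling `fetch` only on a miss.

    Hypothetical helper illustrating name-based caching; it is not the
    lksearch implementation.
    """
    target = cache_dir / name
    if target.exists():
        # A file with a matching name is treated as cached and left untouched
        return target
    cache_dir.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch())
    return target


calls = []


def fake_download() -> bytes:
    # Stand-in for a network download; records each call
    calls.append(1)
    return b"fits-bytes"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    first = fetch_with_cache("example_lc.fits", cache, fake_download)
    second = fetch_with_cache("example_lc.fits", cache, fake_download)
    same_path = first == second
    print(f"downloads performed: {len(calls)}")
```

In this sketch the check looks only at the file name, so a truncated or corrupted file would still count as cached until it is deleted.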

Clearing the Cache & Corrupted Files#

If you wish to delete an individual file that you downloaded (for example, if you are concerned that a previously downloaded file is corrupted), the easiest way to do that is using the Local Path information in the manifest returned by the .download() function.

[32]:
import os
# The manifest returned by download() is a pandas DataFrame
# We will access the first local path using iloc as so
os.remove(manifest.iloc[0]["Local Path"])

If you want to clear everything from your cache, you can use the config.clearcache() function to completely empty your cache of downloaded files. By default this will run in “test” mode and print what would be deleted. To confirm deletion, rerun with the optional parameter test=False.

[33]:
lkconfig.clearcache()
Running in test mode, rerun with test=False to clear cache
removing /Users/tapritc2/.lksearch/cache/mastDownload/TESS
removing /Users/tapritc2/.lksearch/cache/mastDownload/K2
removing /Users/tapritc2/.lksearch/cache/mastDownload/Kepler
removing /Users/tapritc2/.lksearch/cache/mastDownload/TESSCut
removing /Users/tapritc2/.lksearch/cache/mastDownload/HLSP

Passing test=False will then fully delete the above directories, e.g. lkconfig.clearcache(test=False)
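The test-mode behaviour is an instance of the common dry-run pattern: report what would be removed, and delete only when explicitly asked. Below is a minimal sketch of that pattern on a temporary directory; clear_dir is a hypothetical helper and does not reproduce the lksearch code.

```python
import shutil
import tempfile
from pathlib import Path


def clear_dir(root: Path, test: bool = True) -> list:
    """List the subdirectories of `root`, deleting them only if test is False.

    Hypothetical helper showing the dry-run pattern; not lksearch code.
    """
    targets = sorted(p for p in root.iterdir() if p.is_dir())
    for path in targets:
        print(f"removing {path}")
        if not test:
            shutil.rmtree(path)
    return targets


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("TESS", "Kepler"):
        (root / name).mkdir()

    listed = clear_dir(root)  # dry run: nothing is deleted
    remaining_after_dry_run = len(list(root.iterdir()))

    clear_dir(root, test=False)  # actually delete
    remaining_after_delete = len(list(root.iterdir()))
```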

lksearch Configuration and Configuration File#

lksearch has a number of configuration parameters, which are contained in the lksearch.Conf class. You can modify these parameters for a given Python session by updating the values in the Conf class. To change the default values of these parameters, lksearch also has an optional configuration file that is built on top of astropy.config using astropy.config.ConfigNamespace. This file does not exist by default, but a default version can be created using the config.create_config_file helper function. Modifications to the values in this file will then update the default lksearch.Conf values.

[34]:
lkconfig.create_config_file(overwrite = True)

This file can be found in the below location. To edit this, please see the astropy.config documentation.

[35]:
lkconfig.get_config_dir()
[35]:
'/Users/tapritc2/.lksearch/config'
[36]:
lkconfig.get_config_file()
[36]:
'/Users/tapritc2/.lksearch/config/lksearch.cfg'

lksearch Cloud Configuration#

lksearch has three configuration parameters that are particularly relevant to cloud-based science platforms. These are:

  • CLOUD_ONLY: Only download cloud-based data. If False, will download all data. If True, will only download data located in a cloud (Amazon S3) bucket

  • PREFER_CLOUD: Prefer cloud-based data product retrieval where available

  • DOWNLOAD_CLOUD: Download cloud-based data. If False, download() will return a pointer to the cloud-based data instead of downloading it. This is intended for cloud-based science platforms (e.g. TIKE)

CLOUD_ONLY governs whether data that is not hosted in the cloud can be downloaded. Many science files have both a cloud-based location (typically on Amazon S3) and a MAST archive location. By default this is False, and all products will be downloaded regardless of whether the file is available via cloud hosting or MAST archive hosting. If CLOUD_ONLY is True, only files available for download on a cloud-based platform will be retrieved. This configuration parameter is passed through to the astroquery.mast parameter of the same name.

PREFER_CLOUD governs the default download behaviour in the event that a data product is available from both a cloud-based location and a MAST-hosted archive location. If True (default), then lksearch will preferentially download files from the cloud host rather than the MAST-hosted archive. This configuration parameter is passed through to the astroquery.mast parameter of the same name.

DOWNLOAD_CLOUD governs whether files that are hosted on the cloud are downloaded locally. If this value is True (default), cloud-hosted files are downloaded normally. If False, then files hosted on a cloud-based platform are not downloaded, and a URI containing the path to the desired file on the cloud host is returned instead of the local path to the file. This path can then be used to read the file remotely (see the astropy.io.fits documentation on working with remote and cloud-hosted files for more information). This ability may be most relevant when using lksearch on a cloud-based science platform where the remote read is very rapid and short-term local storage comparatively expensive.
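How the three settings interact can be summarised as a small decision function. The sketch below is one reading of the descriptions above, written for a single product; resolve_download is a hypothetical name, and the actual logic inside lksearch and astroquery may differ.

```python
from typing import Optional


def resolve_download(
    has_cloud: bool,
    has_mast: bool,
    cloud_only: bool = False,
    prefer_cloud: bool = True,
    download_cloud: bool = True,
) -> Optional[str]:
    """Return the action the three settings imply for one product.

    Hypothetical helper based on the prose descriptions of CLOUD_ONLY,
    PREFER_CLOUD and DOWNLOAD_CLOUD; not the library's implementation.
    Returns None when the product cannot be retrieved at all.
    """
    # The cloud copy is used when it exists and is required, preferred,
    # or the only copy available
    use_cloud = has_cloud and (cloud_only or prefer_cloud or not has_mast)
    if use_cloud:
        return "download from cloud" if download_cloud else "return cloud URI"
    if has_mast and not cloud_only:
        return "download from MAST"
    return None


# A product hosted both on the cloud and at MAST, with default settings
default_action = resolve_download(has_cloud=True, has_mast=True)
# The same product with DOWNLOAD_CLOUD set to False
pointer_action = resolve_download(has_cloud=True, has_mast=True, download_cloud=False)
# A MAST-only product when CLOUD_ONLY is True
blocked_action = resolve_download(has_cloud=False, has_mast=True, cloud_only=True)
print(default_action, pointer_action, blocked_action, sep="\n")
```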

Using this DOWNLOAD_CLOUD functionality, we can find a cloud-hosted file and read it directly into memory like so:

[37]:
# First, let's update our configuration so that cloud-hosted files are not downloaded
from lksearch import Conf
Conf.DOWNLOAD_CLOUD = False

# Now, let's find some data. We used this target earlier in the tutorial.
toi = TESSSearch('TOI 1161')

# What happens when we try to download it with our updated configuration?
cloud_result = toi.timeseries.mission_products[0].download()
cloud_result
pipeline products: 100%|███████████████████████████████████████████| 1/1 [00:00<00:00, 395.88it/s]
[37]:
Local Path Status Message URL
0 s3://stpubdata/tess/public/tid/s0014/0000/0001... COMPLETE Link to S3 bucket for remote read None

As we can see above, instead of downloading the file, download() has returned an Amazon S3 URI for its cloud-hosted location. If we want to access the file, we can do so using the remote-read capabilities of astropy.io.fits.

[38]:
import astropy.io.fits as fits
with fits.open(cloud_result["Local Path"].values[0], use_fsspec=True, fsspec_kwargs={"anon": True}) as hdu:
    for item in hdu:
        print(item.fileinfo())
{'file': <astropy.io.fits.file._File <File-like object S3FileSystem, stpubdata/tess/public/tid/s0014/0000/0001/5832/4245/tess2019198215352-s0014-0000000158324245-0150-s_lc.fits>>, 'filemode': 'readonly', 'hdrLoc': 0, 'datLoc': 5760, 'datSpan': 0}
{'file': <astropy.io.fits.file._File <File-like object S3FileSystem, stpubdata/tess/public/tid/s0014/0000/0001/5832/4245/tess2019198215352-s0014-0000000158324245-0150-s_lc.fits>>, 'filemode': 'readonly', 'hdrLoc': 5760, 'datLoc': 20160, 'datSpan': 1935360}
{'file': <astropy.io.fits.file._File <File-like object S3FileSystem, stpubdata/tess/public/tid/s0014/0000/0001/5832/4245/tess2019198215352-s0014-0000000158324245-0150-s_lc.fits>>, 'filemode': 'readonly', 'hdrLoc': 1955520, 'datLoc': 1961280, 'datSpan': 2880}
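Since the "Local Path" column can hold either a local file path or an S3 URI depending on the configuration, it can be handy to tell the two apart before opening a file. A minimal sketch with the standard library follows; split_s3_uri is a hypothetical helper, the URI is assembled from the file location printed in the output above, and the local file name is made up for illustration.

```python
from urllib.parse import urlparse


def split_s3_uri(path: str):
    """Return (bucket, key) for an s3:// URI, or None for anything else."""
    parsed = urlparse(path)
    if parsed.scheme != "s3":
        return None
    return parsed.netloc, parsed.path.lstrip("/")


# URI assembled from the file location printed in the output above
uri = (
    "s3://stpubdata/tess/public/tid/s0014/0000/0001/5832/4245/"
    "tess2019198215352-s0014-0000000158324245-0150-s_lc.fits"
)
cloud = split_s3_uri(uri)

# Hypothetical local path, for comparison
local = split_s3_uri("/Users/tapritc2/.lksearch/cache/example.fits")

print(cloud)
print(local)
```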