lksearch Configuration and working with Cloud Data#

lksearch has a configuration class and a configuration file that can be used to set the default behaviour of the package, including how lksearch treats cloud data and where (or if) local files are cached.

lksearch File Download and Cache#

The lksearch file cache is the directory files are downloaded to. This directory also serves as a cache: if a file matching the name of the file to be downloaded already exists there, it is treated as a cached copy and, by default, the file on disk is not overwritten.

The default file download and cache directory is located at: ~/.lksearch/cache

This can be verified using the get_cache_dir convenience function in the config sub-module, e.g.:

[1]:
from lksearch import config as lkconfig

lkconfig.get_cache_dir()
[1]:
'/Users/tapritc2/.lksearch/cache'

Clearing the Cache & Corrupted Files#

If you wish to delete an individual file that you downloaded (for example, if you are concerned that a previously downloaded file is corrupted), the easiest way to do so is to use the Local Path information in the manifest returned by the .download() function.

[2]:
import os
from lksearch import K2Search

# First, let's download a few files
manifest = K2Search("K2-18").HLSPs.timeseries.download()
manifest
Downloading products: 100%|████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.94it/s]
[2]:
Local Path Status Message URL
0 /Users/tapritc2/.lksearch/cache/mastDownload/H... COMPLETE None None
1 /Users/tapritc2/.lksearch/cache/mastDownload/H... COMPLETE None None
2 /Users/tapritc2/.lksearch/cache/mastDownload/H... COMPLETE None None
[3]:
# The manifest returned by download() is a pandas DataFrame.
# We can access the first file's local path using iloc like so:
os.remove(manifest.iloc[0]["Local Path"])

If you want to clear everything from your cache, you can use the config.clearcache() function to completely empty your cache of downloaded files. By default this runs in “test” mode and prints what would be deleted. To confirm deletion, run it with the optional parameter test=False.

[4]:
lkconfig.clearcache()
Running in test mode, rerun with test=False to clear cache
removing /Users/tapritc2/.lksearch/cache/mastDownload/TESS
removing /Users/tapritc2/.lksearch/cache/mastDownload/K2
removing /Users/tapritc2/.lksearch/cache/mastDownload/Kepler
removing /Users/tapritc2/.lksearch/cache/mastDownload/TESSCut
removing /Users/tapritc2/.lksearch/cache/mastDownload/HLSP

Passing ``test=False`` will then fully delete the above directories.

e.g. lkconfig.clearcache(test=False)

lksearch Configuration and Configuration File#

lksearch has a number of configuration parameters, which are contained in the ~lksearch.Conf class. These parameters can be modified for a given Python session by updating the values in the Conf class. To change their default values, lksearch also has an optional configuration file built on top of ~astropy.config using ~astropy.config.ConfigNamespace. This file does not exist by default, but a default version can be created using the config.create_config_file helper function. Modifications to the values in this file will then update the default ~lksearch.Conf values.
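
For instance, a session-level override is a simple attribute assignment on Conf; the untested sketch below uses PREFER_CLOUD, one of the cloud-related parameters described later on this page.

[ ]:
from lksearch import Conf

# Change a default for this Python session only; the change is not written
# to the configuration file and is lost when the interpreter exits.
Conf.PREFER_CLOUD = False

To make a change persist across sessions, create and edit the configuration file instead, as in the next cell.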

[5]:
lkconfig.create_config_file(overwrite=True)

This file can be found at the location below. To edit it, please see the astropy.config documentation.

[6]:
lkconfig.get_config_dir()
[6]:
'/Users/tapritc2/.lksearch/config'
[7]:
lkconfig.get_config_file()
[7]:
'/Users/tapritc2/.lksearch/config/lksearch.cfg'

lksearch Cloud Configuration#

lksearch has three configuration parameters that are particularly relevant to cloud-based science platforms. These are:

- CLOUD_ONLY: Only download cloud-based data. If False, all data will be downloaded. If True, only data located in a cloud (Amazon S3) bucket will be downloaded.
- PREFER_CLOUD: Prefer cloud-based data product retrieval where available.
- DOWNLOAD_CLOUD: Download cloud-based data. If False, download() will return a pointer to the cloud-based data instead of downloading it. Intended for use on cloud-based science platforms (e.g. TiKE).

CLOUD_ONLY governs whether non-cloud-hosted data can be downloaded. Many science files have both a cloud-based location (typically on Amazon S3) and a MAST archive location. By default CLOUD_ONLY is False, and all products will be downloaded regardless of whether the file is available via cloud hosting or MAST archive hosting. If CLOUD_ONLY is True, only files available for download from a cloud-based platform will be retrieved. This configuration parameter is passed through to the ~astroquery.mast parameter of the same name.
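
For example (an illustrative sketch, not run in this notebook), restricting a session to cloud-hosted products is again a single assignment:

[ ]:
from lksearch import Conf

# Only retrieve products that have a cloud (Amazon S3) copy; products that
# are only available from the MAST archive will not be downloaded.
Conf.CLOUD_ONLY = True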

PREFER_CLOUD governs the default download behaviour in the event that a data product is available from both a cloud-based location and a MAST-hosted archive location. If True (default), then lksearch will preferentially download files from the cloud-host rather than the MAST-hosted Archive. This configuration parameter is passed through to the ~astroquery.mast parameter of the same name.

DOWNLOAD_CLOUD governs whether files that are hosted on the cloud are downloaded locally. If this value is True (default), cloud-hosted files are downloaded normally. If False, files hosted on a cloud-based platform are not downloaded, and a URI containing the path to the desired file on the cloud host is returned instead of the local path to the file. This path can then be used to read the file remotely (see the ~astropy.io.fits documentation on working with remote and cloud-hosted files for more information). This ability is most relevant when using lksearch on a cloud-based science platform, where remote reads are very rapid and short-term local storage is comparatively expensive.

Using this DOWNLOAD_CLOUD functionality, we can find a cloud-hosted file and read it directly into memory like so:

[8]:
# First, let's update our configuration to not download cloud-hosted files
from lksearch import Conf, TESSSearch

Conf.DOWNLOAD_CLOUD = False

# Now, let's find some data. We used this target earlier in the tutorial.
toi = TESSSearch("TOI 1161")

# What happens when we try to download it in our updated configuration?
cloud_result = toi.timeseries.mission_products[0].download()
cloud_result
Downloading products: 100%|███████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 584.16it/s]
[8]:
Local Path Status Message URL
0 s3://stpubdata/tess/public/tid/s0014/0000/0001... COMPLETE Link to S3 bucket for remote read None

As we can see above, instead of downloading the file, download() has returned an Amazon S3 URI for its cloud-hosted location. If we want to access the file, we can do so using the remote-read capabilities of ~astropy.io.fits.

(Note: to do this you will need to install fsspec and s3fs.)

[9]:
import astropy.io.fits as fits

with fits.open(
    cloud_result["Local Path"].values[0], use_fsspec=True, fsspec_kwargs={"anon": True}
) as hdu:
    for item in hdu:
        print(item.fileinfo())
{'file': <astropy.io.fits.file._File <File-like object S3FileSystem, stpubdata/tess/public/tid/s0014/0000/0001/5832/4245/tess2019198215352-s0014-0000000158324245-0150-s_lc.fits>>, 'filemode': 'readonly', 'hdrLoc': 0, 'datLoc': 5760, 'datSpan': 0}
{'file': <astropy.io.fits.file._File <File-like object S3FileSystem, stpubdata/tess/public/tid/s0014/0000/0001/5832/4245/tess2019198215352-s0014-0000000158324245-0150-s_lc.fits>>, 'filemode': 'readonly', 'hdrLoc': 5760, 'datLoc': 20160, 'datSpan': 1935360}
{'file': <astropy.io.fits.file._File <File-like object S3FileSystem, stpubdata/tess/public/tid/s0014/0000/0001/5832/4245/tess2019198215352-s0014-0000000158324245-0150-s_lc.fits>>, 'filemode': 'readonly', 'hdrLoc': 1955520, 'datLoc': 1961280, 'datSpan': 2880}
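
Beyond inspecting the HDU layout, the same remote handle can be used to read data directly into memory. The sketch below reuses cloud_result from the cell above and assumes the standard TESS light curve layout, in which HDU 1 is a binary table containing TIME and PDCSAP_FLUX columns; it is illustrative and not output produced by lksearch itself.

[ ]:
import astropy.io.fits as fits

# Open the cloud-hosted file again and read a few values from the
# light curve extension (HDU 1 in a standard TESS light curve file).
with fits.open(
    cloud_result["Local Path"].values[0], use_fsspec=True, fsspec_kwargs={"anon": True}
) as hdu:
    lc = hdu[1].data
    print(lc.columns.names[:4])   # column names, e.g. TIME, TIMECORR, ...
    print(lc["TIME"][:5])         # first few time stamps
    print(lc["PDCSAP_FLUX"][:5])  # first few PDCSAP flux values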