lksearch Configuration and working with Cloud Data#
lksearch has a config class and configuration file that can be used to configure the default behaviour of the package, including how lksearch treats cloud data and where (or if) local files are cached.
lksearch File Download and Cache#
The lksearch file cache is the directory files are downloaded to. It also serves as a cache: if a file matching the name of the file to be downloaded already exists there, lksearch treats it as cached and by default does not overwrite the current file on disk.
The default file download and cache directory is located at: ~/.lksearch/cache
This can be verified using the get_cache_dir convenience function in the config sub-module, e.g.:
[1]:
from lksearch import config as lkconfig
lkconfig.get_cache_dir()
[1]:
'/Users/tapritc2/.lksearch/cache'
Clearing the Cache & Corrupted Files#
If you wish to delete an individual file that you downloaded (for example, if you are concerned that a previously downloaded file is corrupted), the easiest way to do so is to use the Local Path information in the manifest returned by the .download() function.
[2]:
import os
from lksearch import K2Search
## First, let's download a few files
manifest = K2Search("K2-18").HLSPs.timeseries.download()
manifest
Downloading products: 100%|████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 1.94it/s]
[2]:
|   | Local Path | Status | Message | URL |
|---|---|---|---|---|
| 0 | /Users/tapritc2/.lksearch/cache/mastDownload/H... | COMPLETE | None | None |
| 1 | /Users/tapritc2/.lksearch/cache/mastDownload/H... | COMPLETE | None | None |
| 2 | /Users/tapritc2/.lksearch/cache/mastDownload/H... | COMPLETE | None | None |
[3]:
# The manifest returned by download() is a pandas DataFrame.
# We can access the first file's local path using iloc like so:
os.remove(manifest.iloc[0]["Local Path"])
If you want to clear everything from your cache, you can use the config.clearcache() function to completely empty the cache of downloaded files. By default this runs in “test” mode and prints what would be deleted. To confirm deletion, run with the test=False optional parameter.
[4]:
lkconfig.clearcache()
Running in test mode, rerun with test=False to clear cache
removing /Users/tapritc2/.lksearch/cache/mastDownload/TESS
removing /Users/tapritc2/.lksearch/cache/mastDownload/K2
removing /Users/tapritc2/.lksearch/cache/mastDownload/Kepler
removing /Users/tapritc2/.lksearch/cache/mastDownload/TESSCut
removing /Users/tapritc2/.lksearch/cache/mastDownload/HLSP
Passing test=False will then fully delete the above directories, e.g. lkconfig.clearcache(test=False)
lksearch Configuration and Configuration File#
lksearch has a number of configuration parameters, which are contained in the ~lksearch.Conf class. One can modify these parameters for a given Python session by updating the values in the Conf class. To modify the default values of these configuration parameters, lksearch also has an optional configuration file built on top of ~astropy.config using ~astropy.config.ConfigNamespace. This file does not exist by default, but a default version can be created using the config.create_config_file helper function. Modifications to the values in this file will then update the default ~lksearch.Conf values.
[5]:
lkconfig.create_config_file(overwrite=True)
This file can be found in the below location. To edit this, please see the astropy.config documentation.
[6]:
lkconfig.get_config_dir()
[6]:
'/Users/tapritc2/.lksearch/config'
[7]:
lkconfig.get_config_file()
[7]:
'/Users/tapritc2/.lksearch/config/lksearch.cfg'
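Once created, the file follows astropy's standard configuration format: options are listed alongside their default values, commented out until you change them. The snippet below is an illustrative sketch only; the section header and option names are assumptions based on astropy's conventions, so check the generated lksearch.cfg for the real ones.

```ini
## Illustrative sketch of an astropy-style lksearch.cfg (names are assumptions)

## Only download cloud-hosted (Amazon S3) data products
# cloud_only = False

## Prefer cloud-hosted copies when a product is available from both S3 and MAST
# prefer_cloud = True
```

Uncommenting an option and editing its value changes the default used by every new Python session.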
lksearch Cloud Configuration#
lksearch has three configuration parameters that are particularly relevant to cloud-based science platforms. These are:

- CLOUD_ONLY: Only download cloud-based data. If False, will download all data. If True, will only download data located in a cloud (Amazon S3) bucket.
- PREFER_CLOUD: Prefer cloud-based data product retrieval where available.
- DOWNLOAD_CLOUD: Download cloud-based data. If False, download() will return a pointer to the cloud-based data instead of downloading it; intended usage is for cloud-based science platforms (e.g. TiKE).
CLOUD_ONLY governs whether non-cloud-hosted data can be downloaded. Many science files have both a cloud-based location (typically on Amazon S3) and a MAST archive location. By default this is False, and all products will be downloaded regardless of whether the file is available via cloud hosting or MAST archive hosting. If CLOUD_ONLY is True, only files available for download from a cloud-based platform will be retrieved. This configuration parameter is passed through to the ~astroquery.mast parameter of the same name.
PREFER_CLOUD governs the default download behaviour when a data product is available from both a cloud-based location and a MAST-hosted archive location. If True (default), lksearch will preferentially download files from the cloud host rather than the MAST-hosted archive. This configuration parameter is passed through to the ~astroquery.mast parameter of the same name.
DOWNLOAD_CLOUD governs whether files that are hosted on the cloud are downloaded locally. If this value is True (default), cloud-hosted files are downloaded normally. If False, files hosted on a cloud-based platform are not downloaded; instead of the local path to the file, a URI containing the path to the desired file on the cloud host is returned. This path can then be used to read the file remotely (see the ~astropy.io.fits documentation on working with remote and cloud-hosted files for more information). This ability may be most relevant when using lksearch on a cloud-based science platform, where remote reads are very rapid and short-term local storage is comparatively expensive.
Using this DOWNLOAD_CLOUD functionality, we can find a cloud-hosted file and read it directly into memory like so:
[8]:
# First, let's update our configuration to not download a cloud-hosted file
from lksearch import Conf, TESSSearch
Conf.DOWNLOAD_CLOUD = False
# Now, let's find some data. We used this target earlier in the tutorial.
toi = TESSSearch("TOI 1161")
# What happens when we try to download it in our updated configuration?
cloud_result = toi.timeseries.mission_products[0].download()
cloud_result
Downloading products: 100%|███████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 584.16it/s]
[8]:
|   | Local Path | Status | Message | URL |
|---|---|---|---|---|
| 0 | s3://stpubdata/tess/public/tid/s0014/0000/0001... | COMPLETE | Link to S3 bucket for remote read | None |
As we can see above, instead of downloading the file we have been returned an Amazon S3 URI for its cloud-hosted location. If we want to access the file, we can do so using the remote-read capabilities of ~astropy.io.fits. (Note: to do this you will need to install fsspec and s3fs.)
[9]:
import astropy.io.fits as fits
with fits.open(
cloud_result["Local Path"].values[0], use_fsspec=True, fsspec_kwargs={"anon": True}
) as hdu:
for item in hdu:
print(item.fileinfo())
{'file': <astropy.io.fits.file._File <File-like object S3FileSystem, stpubdata/tess/public/tid/s0014/0000/0001/5832/4245/tess2019198215352-s0014-0000000158324245-0150-s_lc.fits>>, 'filemode': 'readonly', 'hdrLoc': 0, 'datLoc': 5760, 'datSpan': 0}
{'file': <astropy.io.fits.file._File <File-like object S3FileSystem, stpubdata/tess/public/tid/s0014/0000/0001/5832/4245/tess2019198215352-s0014-0000000158324245-0150-s_lc.fits>>, 'filemode': 'readonly', 'hdrLoc': 5760, 'datLoc': 20160, 'datSpan': 1935360}
{'file': <astropy.io.fits.file._File <File-like object S3FileSystem, stpubdata/tess/public/tid/s0014/0000/0001/5832/4245/tess2019198215352-s0014-0000000158324245-0150-s_lc.fits>>, 'filemode': 'readonly', 'hdrLoc': 1955520, 'datLoc': 1961280, 'datSpan': 2880}