load.gcs
Usage
The following functions are defined in the top-level load.gcs module.
- list_gcs_datasets
Lists all datasets in the bucket as a DataFrame. This works by checking which folders contain a specific file, which we call the anchor.
- download
Downloads a file from Google Cloud Storage and returns the local file path.
- open_file
Downloads and opens a file from Google Cloud Storage. Returns a file handle.
- open_image
Downloads an image from Google Cloud Storage and returns it as a PIL image.
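The anchor check described for list_gcs_datasets can be illustrated with a minimal, self-contained sketch. It is not the module's implementation: find_dataset_dirs is a hypothetical helper, the object names other than bounds.json are made up, and bounds.json is used as the anchor purely for illustration, since the real anchor file name is not stated here. The sketch also returns a plain list rather than a DataFrame to avoid extra dependencies.

```python
from pathlib import PurePosixPath


def find_dataset_dirs(blob_names, anchor):
    """Return the sorted folders that directly contain a file named `anchor`.

    Hypothetical helper: it only mimics the idea of treating a folder as a
    dataset when it holds the anchor file.
    """
    dirs = {
        str(PurePosixPath(name).parent)
        for name in blob_names
        if PurePosixPath(name).name == anchor
    }
    return sorted(dirs)


# Made-up object names, relative to the bucket.
blob_names = [
    "chestnut_nature_park/20201218/90deg/bounds.json",
    "chestnut_nature_park/20201218/90deg/notes.txt",
    "chestnut_nature_park/20201218/scratch/notes.txt",
]

# Assumption: bounds.json plays the role of the anchor in this example only.
datasets = find_dataset_dirs(blob_names, anchor="bounds.json")
print(datasets)
```

Only the 90deg folder is reported, because the scratch folder has no anchor file.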
Pathing
The path to specify is relative to the bucket, which is frdc-ds by default.
For example this filesystem on GCS:
To download bounds.json, use download(r"chestnut_nature_park/20201218/90deg/bounds.json"). All files are downloaded to PROJ_DIR/rsc/... by default.
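As a rough sketch of how a bucket-relative path could map to a local file, the snippet below joins the path onto a local root. It assumes that the local layout mirrors the bucket layout under PROJ_DIR/rsc, which the text above does not confirm. local_path_for and the PROJ_DIR value are hypothetical and are not part of load.gcs.

```python
from pathlib import Path, PurePosixPath

# Hypothetical project directory; substitute your own checkout location.
PROJ_DIR = Path("/path/to/project")
LOCAL_ROOT = PROJ_DIR / "rsc"


def local_path_for(gcs_path: str, root: Path = LOCAL_ROOT) -> Path:
    """Map a bucket-relative path onto `root`.

    Assumption: downloaded files keep the same folder structure as the bucket.
    """
    # GCS object names always use forward slashes, so parse them as POSIX paths.
    return root.joinpath(*PurePosixPath(gcs_path).parts)


p = local_path_for("chestnut_nature_park/20201218/90deg/bounds.json")
print(p)
```

The bucket name itself is not part of the path, since paths are relative to the bucket.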
Configuration
If you need granular control over:
- where the files are downloaded
- the credentials used
- the project used
- the bucket used

then edit conf.py.
- GCS_CREDENTIALS
Google Cloud credentials. A google.oauth2.service_account.Credentials object. See the object documentation for more information.
- LOCAL_DATASET_ROOT_DIR
Local directory to download files to. Path to a directory, or a Path object.
- GCS_PROJECT_ID
Google Cloud project ID.
- GCS_BUCKET_NAME
Google Cloud Storage bucket name.
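A conf.py that overrides these settings might look like the sketch below. All values are placeholders rather than the project's real settings: the directory, project ID, and key file name are hypothetical, and setting GCS_CREDENTIALS to None is only an assumption made so that the sketch runs without google-auth installed.

```python
# conf.py (sketch with placeholder values)
from pathlib import Path

# Local directory to download files to (hypothetical location).
LOCAL_DATASET_ROOT_DIR = Path("/data/frdc")

# Google Cloud project ID (placeholder).
GCS_PROJECT_ID = "my-project-id"

# Google Cloud Storage bucket name; frdc-ds is the documented default.
GCS_BUCKET_NAME = "frdc-ds"

# Google Cloud credentials. One way to build the Credentials object from a
# service-account key file (hypothetical file name):
#
#   from google.oauth2 import service_account
#   GCS_CREDENTIALS = service_account.Credentials.from_service_account_file(
#       "service-account-key.json"
#   )
#
# Assumption: None is used here only as a stand-in for that object.
GCS_CREDENTIALS = None
```

Keep key files out of version control if you load credentials this way.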