Documentation 0.1.2 Help

load.gcs

Usage

The following functions are defined in the top-level load.gcs module.

list_gcs_datasets

Lists all datasets in the bucket as a DataFrame. It works by checking which folders contain a specific file, called the anchor.
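The anchor idea can be illustrated with a minimal sketch. This is not the library's implementation: the function name find_dataset_dirs, the anchor file name anchor.txt, and the sample object names are all hypothetical, and the sketch returns a plain list rather than a DataFrame.

```python
from pathlib import PurePosixPath


def find_dataset_dirs(blob_names, anchor_name="anchor.txt"):
    """Return the sorted folders that directly contain the anchor file.

    `anchor_name` is a hypothetical placeholder; the real anchor file
    name is not stated in this document.
    """
    dirs = {
        str(PurePosixPath(name).parent)
        for name in blob_names
        if PurePosixPath(name).name == anchor_name
    }
    return sorted(dirs)


# Hypothetical bucket listing: only the 90deg folder holds the anchor.
blob_names = [
    "chestnut_nature_park/20201218/90deg/bounds.json",
    "chestnut_nature_park/20201218/90deg/anchor.txt",
    "chestnut_nature_park/20201218/notes.md",
]
print(find_dataset_dirs(blob_names))
# ['chestnut_nature_park/20201218/90deg']
```

Folders without the anchor, such as chestnut_nature_park/20201218 in this listing, are not reported as datasets.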

download

Downloads a file from Google Cloud Storage and returns the local file path.

open_file

Downloads and opens a file from Google Cloud Storage. Returns a file handle.

open_image

Downloads an image from Google Cloud Storage and returns it as a PIL image.

Pathing

Paths are specified relative to the bucket, which is frdc-ds by default.

For example, consider this file system on GCS:

# On Google Cloud Storage
frdc-ds
├── chestnut_nature_park
│   └── 20201218
│       └── 90deg
│           └── bounds.json

To download bounds.json, use download(r"chestnut_nature_park/20201218/90deg/bounds.json"). By default, files are downloaded under PROJ_DIR/rsc/..., preserving the bucket's folder structure.

# On local filesystem
PROJ_DIR
├── rsc
│   └── chestnut_nature_park
│       └── 20201218
│           └── 90deg
│               └── bounds.json
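The mapping from a bucket-relative path to a local path can be sketched as follows. This only illustrates the layout shown above; the helper name local_path_for is hypothetical, and the library may resolve paths differently.

```python
from pathlib import Path, PurePosixPath


def local_path_for(gcs_path, local_root):
    """Join a bucket-relative GCS path onto a local root directory.

    GCS object names always use forward slashes, so the path is split
    with PurePosixPath before being rebuilt for the local platform.
    """
    return Path(local_root).joinpath(*PurePosixPath(gcs_path).parts)


# "PROJ_DIR" is used literally here as a stand-in for the project root.
p = local_path_for(
    "chestnut_nature_park/20201218/90deg/bounds.json",
    Path("PROJ_DIR") / "rsc",
)
print(p.as_posix())
# PROJ_DIR/rsc/chestnut_nature_park/20201218/90deg/bounds.json
```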

Configuration

If you need granular control over any of the following, edit conf.py:

  • where the files are downloaded

  • the credentials used

  • the project used

  • the bucket used

GCS_CREDENTIALS

Google Cloud credentials.


A google.oauth2.service_account.Credentials object. See the object documentation for more information.

LOCAL_DATASET_ROOT_DIR

Local directory to download files to.


Path to a directory, or a Path object.

GCS_PROJECT_ID

Google Cloud project ID.


GCS_BUCKET_NAME

Google Cloud Storage bucket name.
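As a rough illustration, a conf.py that sets these four values might look like the sketch below. The variable names and the frdc-ds bucket come from this page; the project ID, the key file path, the use of the current directory as PROJ_DIR, and whether None is accepted for GCS_CREDENTIALS are assumptions.

```python
from pathlib import Path

# Stand-in for the project root; the real conf.py may derive it differently.
PROJ_DIR = Path.cwd()

# Local directory to download files to (a path string or a Path object).
LOCAL_DATASET_ROOT_DIR = PROJ_DIR / "rsc"

# Google Cloud project ID. "my-gcp-project" is a placeholder.
GCS_PROJECT_ID = "my-gcp-project"

# Bucket that all download paths are relative to.
GCS_BUCKET_NAME = "frdc-ds"

# Credentials object. None is an assumption made for this sketch.
# With the google-auth package installed, a service account key
# could be loaded along these lines (the key path is a placeholder):
#
#   from google.oauth2 import service_account
#   GCS_CREDENTIALS = service_account.Credentials.from_service_account_file(
#       "path/to/key.json"
#   )
GCS_CREDENTIALS = None
```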


Last modified: 26 June 2024