Documentation 0.1.2 Help

Getting Started

Installing the Dev. Environment

  1. Ensure that you have the right version of Python. The required Python version can be seen in pyproject.toml

    [tool.poetry.dependencies]
    python = "..."
  2. Start by cloning our repository.

    git clone https://github.com/FR-DC/FRDC-ML.git
  3. Then, create a Python virtual environment (venv). On Windows:

    python -m venv venv/

     On Linux or macOS:

    python3 -m venv venv/
  4. Install Poetry. Then check that it's installed with

    poetry --version
  5. Activate the virtual environment. On Windows:

    cd venv/Scripts
    activate
    cd ../..

     On Linux or macOS:

    source venv/bin/activate
  6. Install the dependencies. You should be in the same directory as pyproject.toml

    poetry install --with dev
  7. Make a copy of the .env.example file and rename it to .env

  8. Fill in additional environment variables in the .env file

    LABEL_STUDIO_API_KEY=...
    LABEL_STUDIO_HOST=10.97.41.70
    LABEL_STUDIO_PORT=8080
    GCS_PROJECT_ID=frmodel
    GCS_BUCKET_NAME=frdc-ds
  9. Install Pre-Commit Hooks

    pre-commit install
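
As a quick sanity check for steps 7 and 8, the sketch below reports which of the variables listed above are missing or empty in a .env file. It is only an illustration: the helper names parse_env and missing_keys are hypothetical and not part of the repository, and the parser only handles plain KEY=VALUE lines.

```python
from pathlib import Path

# Keys taken from the .env example in step 8.
REQUIRED_KEYS = (
    "LABEL_STUDIO_API_KEY",
    "LABEL_STUDIO_HOST",
    "LABEL_STUDIO_PORT",
    "GCS_PROJECT_ID",
    "GCS_BUCKET_NAME",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> list[str]:
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumed to be in the current directory
    if not env_path.exists():
        print("No .env file found; copy .env.example to .env first.")
    else:
        missing = missing_keys(env_path.read_text())
        print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run it from the directory that contains .env. A placeholder value such as ... still counts as set, so this only catches keys that were left out entirely.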

Setting Up Google Cloud

  1. We use Google Cloud to store our datasets. To set up Google Cloud, install the Google Cloud CLI

  2. Then, authenticate your account.

    gcloud auth login
  3. Finally, set up Application Default Credentials (ADC).

    gcloud auth application-default login
  4. To make sure everything is working, run the tests.
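
Before running the tests, it can help to confirm that the ADC step left a credentials file behind. The sketch below looks in the locations where gcloud normally writes it (the GOOGLE_APPLICATION_CREDENTIALS variable first, then the gcloud config directory). It is a rough check under those assumptions: it does not validate the credentials, it ignores custom gcloud config directories, and the function names are hypothetical.

```python
import os
from collections.abc import Mapping
from pathlib import Path


def adc_candidates(env: Mapping, home: Path, windows: bool) -> list[Path]:
    """List the files where Application Default Credentials are usually found."""
    candidates = []
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        # An explicitly configured credentials file takes precedence.
        candidates.append(Path(explicit))
    if windows and env.get("APPDATA"):
        base = Path(env["APPDATA"]) / "gcloud"
    else:
        base = home / ".config" / "gcloud"
    candidates.append(base / "application_default_credentials.json")
    return candidates


def find_adc(env=None, home=None, windows=None):
    """Return the first candidate file that exists, or None."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    windows = (os.name == "nt") if windows is None else windows
    for path in adc_candidates(env, home, windows):
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    found = find_adc()
    if found:
        print(f"ADC file found: {found}")
    else:
        print("No ADC file found; run 'gcloud auth application-default login'.")
```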

Setting Up Label Studio

  1. We use Label Studio to annotate our datasets. We won't cover how to install Label Studio here; for contributors, it should already be running on localhost:8080.

  2. Then, retrieve your own API key from Label Studio. Go to your account page and copy the API key.


  3. Set your API key as an environment variable.

    On Windows, go to "Edit environment variables for your account" and add a new environment variable named LABEL_STUDIO_API_KEY.

    On Linux or macOS, export it as an environment variable.

    export LABEL_STUDIO_API_KEY=...

    Alternatively, on any platform, create a .env file in the root of the project and add the following line: LABEL_STUDIO_API_KEY=...

Setting Up Weights and Biases

  1. We use W&B to track our experiments. To set up W&B, install the W&B CLI

  2. Then, authenticate your account.

    wandb login

Pre-commit Hooks

  • pre-commit install

Running the Tests

  • Run the tests to make sure everything is working

    pytest

Troubleshooting

ModuleNotFoundError

It's likely that your src and tests directories are not in PYTHONPATH. To fix this, run the following command:

export PYTHONPATH=$PYTHONPATH:./src:./tests

Or set it in your IDE. For example, IntelliJ allows you to mark directories as Source Roots.
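
If you would rather not change your shell or IDE settings, the same effect can be approximated from Python by adding the directories to sys.path at startup. The sketch below is one possible way to do that; the helper add_source_roots is hypothetical and not part of the repository, and it assumes it is called with the repository root.

```python
import sys
from pathlib import Path


def add_source_roots(project_root: Path, names=("src", "tests")) -> list[str]:
    """Prepend existing source directories to sys.path; return those added."""
    added = []
    for name in names:
        directory = str((project_root / name).resolve())
        # Skip directories that do not exist or are already importable.
        if Path(directory).is_dir() and directory not in sys.path:
            sys.path.insert(0, directory)
            added.append(directory)
    return added


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    print(add_source_roots(Path.cwd()))
```

This only affects the current Python process, so the export command above remains the simpler choice for running pytest from a terminal.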

google.auth.exceptions.DefaultCredentialsError

It's likely that you haven't authenticated your Google Cloud account. See Setting Up Google Cloud.

Couldn't connect to Label Studio

Label Studio must be running locally, exposed on localhost:8080. Furthermore, you need to specify the LABEL_STUDIO_API_KEY environment variable. See Setting Up Label Studio.
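
To narrow down which of the two conditions is failing, a small script can test them separately. The sketch below only checks that something accepts TCP connections on the configured host and port and that the API key variable is non-empty; it does not call the Label Studio API. The function name can_connect is hypothetical, and the localhost:8080 defaults are taken from the text above.

```python
import os
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    host = os.environ.get("LABEL_STUDIO_HOST", "localhost")
    port = int(os.environ.get("LABEL_STUDIO_PORT", "8080"))
    print(f"{host}:{port} reachable:", can_connect(host, port))
    print("LABEL_STUDIO_API_KEY set:", bool(os.environ.get("LABEL_STUDIO_API_KEY")))
```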

Cannot login to W&B

You need to authenticate your W&B account. See Setting Up Weights and Biases. If you're still facing difficulties, set the WANDB_MODE environment variable to offline so that W&B records runs locally without logging in or syncing.
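
If you prefer to set the variable from Python rather than in your shell, a minimal sketch follows. The helper use_wandb_offline is hypothetical; the only real setting involved is the WANDB_MODE environment variable, which needs to be in place before W&B is initialized.

```python
import os


def use_wandb_offline(env=None) -> str:
    """Set WANDB_MODE=offline in the given environment and return the mode."""
    env = os.environ if env is None else env
    env["WANDB_MODE"] = "offline"
    return env["WANDB_MODE"]


if __name__ == "__main__":
    # Call this before any code that starts a W&B run.
    print("WANDB_MODE =", use_wandb_offline())
```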

Our Repository Structure

Before starting development, take a look at our repository structure. This will help you understand where to put your code.

FRDC
  src/frdc/                      Core Dependencies
    ./load/                      Dataset Loaders
    ./preprocess/                Preprocessing Fn.
    ./train/                     Train Deps
    ./models/                    Model Architectures
  rsc/                           Resources
    ./dataset_name/              Datasets ...
  tests/                         Tests
  pyproject.toml, poetry.lock    Repo Dependencies

src/frdc/

Source Code for our package. These are the unit components of our pipeline.

rsc/

Resources. These are usually cached datasets

tests/

PyTest tests. These are unit, integration, and model tests.

Unit, Integration, and Model Tests

We have 3 types of tests:

  • Unit Tests are usually small, single function tests.

  • Integration Tests are larger tests that test a mock pipeline.

  • Model Tests are the true production pipeline tests that will generate a model.
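
To give a sense of scale for the first category, here is what a small, single-function unit test might look like in PyTest style. The function scale_0_1 is a hypothetical preprocessing step written only for this illustration; it is not taken from src/frdc/.

```python
def scale_0_1(values):
    """Hypothetical preprocessing step: min-max scale numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant input has no range, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_scale_0_1_bounds():
    # The smallest value maps to 0, the largest to 1.
    assert scale_0_1([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_scale_0_1_constant_input():
    # Guard against division by zero when all values are equal.
    assert scale_0_1([3, 3]) == [0.0, 0.0]
```

PyTest collects functions whose names start with test_, so saving this in a file such as test_example.py and running pytest on it would execute both checks.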

Where Should I Contribute?

Changing a small component

If you're changing a small component, such as an argument for preprocessing, a new model architecture, or a new configuration for a dataset, take a look at the src/frdc/ directory.

Adding a test

When you add a new component, you'll need to add a new test for it. Take a look at the tests/ directory.

Changing the model pipeline

If you're an ML researcher, you'll probably be changing the pipeline. Take a look at the tests/model_tests/ directory.

Adding a dependency

If you're adding a new dependency, use poetry add PACKAGE and commit the changes to pyproject.toml and poetry.lock.

Last modified: 26 June 2024