Getting Started

Installing the Development Environment

Ensure that you have the right version of Python. The required version is specified in pyproject.toml:
[tool.poetry.dependencies]
python = "..."
Start by cloning our repository.
git clone https://github.com/FR-DC/FRDC-ML.git
Then, create a Python virtual environment (venv).
On Windows:
python -m venv venv/
On Linux/macOS:
python3 -m venv venv/
Install Poetry. Then, check that it's installed with:
poetry --version
If poetry is not found, it's likely not in the user PATH.
Activate the virtual environment.
On Windows:
cd venv/Scripts
activate
cd ../..
On Linux/macOS:
source venv/bin/activate
Install the dependencies. You should be in the same directory as pyproject.toml.
poetry install --with dev
Make a copy of the .env.example file and rename it to .env.
Fill in the additional environment variables in the .env file:
LABEL_STUDIO_API_KEY=...
LABEL_STUDIO_HOST=10.97.41.70
LABEL_STUDIO_PORT=8080
GCS_PROJECT_ID=frmodel
GCS_BUCKET_NAME=frdc-ds
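Before moving on, you can check that no value in .env is still blank or the ... placeholder. This is a minimal, stdlib-only sketch with hypothetical helper names (read_env_file, missing_keys); REQUIRED_KEYS simply mirrors the variables listed above, so align it with .env.example. It understands only plain KEY=VALUE lines and # comments, not the full dotenv syntax (export prefixes, inline comments, multi-line values).

```python
from pathlib import Path

# Mirrors the variables listed in this guide (an assumption; check .env.example).
REQUIRED_KEYS = (
    "LABEL_STUDIO_API_KEY",
    "LABEL_STUDIO_HOST",
    "LABEL_STUDIO_PORT",
    "GCS_PROJECT_ID",
    "GCS_BUCKET_NAME",
)


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and matching quote characters.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path: Path = Path(".env")) -> list[str]:
    """Return required keys that are absent, empty, or still set to '...'."""
    values = read_env_file(path)
    return [k for k in REQUIRED_KEYS if values.get(k, "") in ("", "...")]
```

Run missing_keys() from the repository root; an empty list means every listed key has a value.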
Install the pre-commit hooks:
pre-commit install

Setting Up Google Cloud

We use Google Cloud to store our datasets. To set up Google Cloud, install the Google Cloud CLI.
Then, authenticate your account.
gcloud auth login
Finally, set up Application Default Credentials (ADC).
gcloud auth application-default login
To make sure everything is working, run the tests.

Setting Up Label Studio

We use Label Studio to annotate our datasets. We won't go through how to install Label Studio; for contributors, it should already be running on localhost:8080.
Then, retrieve your own API key from Label Studio. Go to your account page and copy the API key.
Set your API key as an environment variable.
On Windows, go to "Edit environment variables for your account" and add a new environment variable named LABEL_STUDIO_API_KEY.
On Linux/macOS, export it as an environment variable:
export LABEL_STUDIO_API_KEY=...
In all cases, you can instead add the following line to a .env file in the root of the project:
LABEL_STUDIO_API_KEY=...
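To confirm the key is picked up, you can build a request against the Label Studio REST API. This sketch assumes the variables are already exported (it doesn't read .env), falls back to localhost:8080 when LABEL_STUDIO_HOST and LABEL_STUDIO_PORT are unset, and assumes a legacy token sent as "Authorization: Token <key>"; newer Label Studio releases also offer personal access tokens, which authenticate differently. label_studio_request is a hypothetical helper.

```python
import os
import urllib.request


def label_studio_request(path: str = "/api/projects/") -> urllib.request.Request:
    """Build an authenticated GET request for the Label Studio REST API."""
    host = os.environ.get("LABEL_STUDIO_HOST", "localhost")
    port = os.environ.get("LABEL_STUDIO_PORT", "8080")
    key = os.environ["LABEL_STUDIO_API_KEY"]  # raises KeyError if the key isn't set
    return urllib.request.Request(
        f"http://{host}:{port}{path}",
        headers={"Authorization": f"Token {key}"},
    )
```

Passing the result to urllib.request.urlopen should then return the project list; an HTTP 401 error points to a wrong key, while a connection error means Label Studio isn't reachable.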

Setting Up Weights & Biases

We use Weights & Biases (W&B) to track our experiments. To set up W&B, install the W&B CLI.
Then, authenticate your account.
wandb login

Pre-commit Hooks

pre-commit install

Running the Tests

Run the tests to make sure everything is working:
pytest

Troubleshooting

ModuleNotFoundError

It's likely that your src and tests directories are not in PYTHONPATH. To fix this, run the following command:

export PYTHONPATH=$PYTHONPATH:./src:./tests

Alternatively, set it in your IDE; for example, IntelliJ allows marking directories as Source Roots.
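If you'd rather not export PYTHONPATH in every shell, a root-level conftest.py can add the same directories, since pytest loads the root conftest.py before collecting tests. This is a sketch: add_source_roots is a hypothetical helper, and unlike the export above it prepends the directories, so they take precedence over installed packages.

```python
import sys
from pathlib import Path


def add_source_roots(root: Path, subdirs: tuple[str, ...] = ("src", "tests")) -> None:
    """Prepend root/<subdir> to sys.path for each subdir not already present."""
    for sub in subdirs:
        path = str(root / sub)
        if path not in sys.path:
            sys.path.insert(0, path)
```

In conftest.py, call add_source_roots(Path(__file__).resolve().parent). On pytest 7 or newer, the pythonpath option under [tool.pytest.ini_options] in pyproject.toml achieves the same without code.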

google.auth.exceptions.DefaultCredentialsError

It's likely that you haven't authenticated your Google Cloud account. See Setting Up Google Cloud.

Couldn't connect to Label Studio

Label Studio must be running locally, exposed on localhost:8080. You also need to set the LABEL_STUDIO_API_KEY environment variable. See Setting Up Label Studio.
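To tell whether Label Studio is down or the problem lies elsewhere (such as the API key), you can first check that the port accepts connections. is_listening is a hypothetical, stdlib-only helper; it only confirms that something accepts TCP connections on the port, not that it is Label Studio or that your key is valid.

```python
import socket


def is_listening(host: str = "localhost", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host not resolvable
        return False
```

If your .env points LABEL_STUDIO_HOST and LABEL_STUDIO_PORT at another machine, pass those values instead of the defaults.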

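Couldn't authenticate with W&B

You need to authenticate your W&B account. See Setting Up Weights & Biases. If you're facing difficulties, set the WANDB_MODE environment variable to offline, so W&B saves runs locally instead of syncing them, or to disabled to turn W&B off entirely.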

Our Repository Structure

Before starting development, take a look at our repository structure. This will help you understand where to put your code.

src/frdc/: Source code for our package. These are the unit components of our pipeline.
rsc/: Resources. These are usually cached datasets.
tests/: pytest tests. These are unit, integration, and model tests.

Unit, Integration, and Model Tests

We have 3 types of tests:

Unit Tests are usually small, single function tests.
Integration Tests are larger tests that test a mock pipeline.
Model Tests are the true production pipeline tests that will generate a model.
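For illustration, a unit test in this style might look like the following. min_max_scale and its tests are hypothetical; in the repository, the function under test would live in src/frdc/ and only the test functions would sit under tests/. pytest collects functions whose names start with test_ and reports any failing plain assert.

```python
def min_max_scale(values: list[float]) -> list[float]:
    """Hypothetical preprocessing step: rescale values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant sequence")
    return [(v - lo) / (hi - lo) for v in values]


def test_min_max_scale_bounds():
    # A unit test exercises one small function in isolation.
    assert min_max_scale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]


def test_min_max_scale_handles_negatives():
    # Order is preserved: the larger input maps to 1.0.
    assert min_max_scale([5.0, -5.0]) == [1.0, 0.0]
```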

Where Should I Contribute?

Changing a small component: If you're changing a small component, such as an argument for preprocessing, a new model architecture, or a new configuration for a dataset, take a look at the src/frdc/ directory.
Adding a test: If you add a new component, you'll need to add a new test. Take a look at the tests/ directory.
Changing the model pipeline: If you're an ML researcher, you'll probably be changing the pipeline. Take a look at the tests/model_tests/ directory.
Adding a dependency: If you're adding a new dependency, use poetry add PACKAGE and commit the changes to pyproject.toml and poetry.lock.
For example, to add numpy, run poetry add numpy.

Last modified: 26 June 2024