Getting Started
Installing the Dev. Environment
Ensure that you have the right version of Python. The required Python version is listed in pyproject.toml:
[tool.poetry.dependencies]
python = "..."
Start by cloning our repository.
git clone https://github.com/FR-DC/FRDC-ML.git
Then, create a Python virtual environment (pyvenv).
Windows:
python -m venv venv/
Linux/macOS:
python3 -m venv venv/
Install Poetry, then check that it is installed with
poetry --version
Activate the virtual environment.
Windows:
cd venv/Scripts
activate
cd ../..
Linux/macOS:
source venv/bin/activate
Install the dependencies. You should be in the same directory as pyproject.toml.
poetry install --with dev
Make a copy of the .env.example file and rename it to .env.
Fill in additional environment variables in the .env file.
LABEL_STUDIO_API_KEY=...
LABEL_STUDIO_HOST=10.97.41.70
LABEL_STUDIO_PORT=8080
GCS_PROJECT_ID=frmodel
GCS_BUCKET_NAME=frdc-ds
Install the pre-commit hooks.
pre-commit install
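As a quick sanity check of the .env step, the sketch below parses .env-style text and reports which of the five variables listed above are missing or still unfilled. The helper names (parse_dotenv, missing_keys) are hypothetical and not part of the FRDC-ML package, and the parser only understands simple KEY=VALUE lines, not the full dotenv syntax.

```python
from pathlib import Path

# The variable names come from the .env example in this guide.
REQUIRED_KEYS = (
    "LABEL_STUDIO_API_KEY",
    "LABEL_STUDIO_HOST",
    "LABEL_STUDIO_PORT",
    "GCS_PROJECT_ID",
    "GCS_BUCKET_NAME",
)


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values):
    """Return required keys that are absent, empty, or still the "..." placeholder."""
    return [k for k in REQUIRED_KEYS if values.get(k, "") in ("", "...")]


if __name__ == "__main__":
    env_file = Path(".env")  # assumes you run this from the repository root
    if not env_file.exists():
        print("No .env file found; copy .env.example to .env first.")
    else:
        missing = missing_keys(parse_dotenv(env_file.read_text()))
        print("Missing or unfilled:", ", ".join(missing) if missing else "none")
```

Treating "..." as unfilled is a choice made for this sketch, since that is the placeholder used in the example above.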
Setting Up Google Cloud
We use Google Cloud to store our datasets. To set up Google Cloud, install the Google Cloud CLI.
Then, authenticate your account.
gcloud auth login
Finally, set up Application Default Credentials (ADC).
gcloud auth application-default login
To make sure everything is working, run the tests.
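If you want to confirm that the ADC step left a credentials file behind before running the full test suite, a minimal sketch follows. It assumes the default gcloud locations (the GOOGLE_APPLICATION_CREDENTIALS variable, then ~/.config/gcloud on Linux/macOS or %APPDATA%\gcloud on Windows) and ignores other cases such as a custom CLOUDSDK_CONFIG directory. The function name adc_candidates is hypothetical.

```python
import os
from pathlib import Path

ADC_FILENAME = "application_default_credentials.json"


def adc_candidates(environ, home, on_windows):
    """List the paths where an ADC file may live, in the order they are checked here."""
    candidates = []
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        candidates.append(Path(explicit))
    if on_windows:
        appdata = environ.get("APPDATA")
        if appdata:
            candidates.append(Path(appdata) / "gcloud" / ADC_FILENAME)
    else:
        candidates.append(Path(home) / ".config" / "gcloud" / ADC_FILENAME)
    return candidates


if __name__ == "__main__":
    paths = adc_candidates(os.environ, Path.home(), os.name == "nt")
    found = [p for p in paths if p.is_file()]
    if found:
        print("ADC file found:", found[0])
    else:
        print("No ADC file found; try `gcloud auth application-default login`.")
```

This only checks that a file exists; it does not verify that the credentials are valid or have access to the bucket.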
Setting Up Label Studio
We use Label Studio to annotate our datasets. We won't go through how to install Label Studio; for contributors, it should be up on localhost:8080.
Then, retrieve your own API key from Label Studio. Go to your account page and copy the API key.
Set your API key as an environment variable.
On Windows, go to "Edit environment variables for your account" and add a new environment variable named LABEL_STUDIO_API_KEY.
On Linux/macOS, export it as an environment variable.
export LABEL_STUDIO_API_KEY=...
In all cases, you can create a .env file in the root of the project and add the following line:
LABEL_STUDIO_API_KEY=...
Setting Up Weights and Biases
We use W&B to track our experiments. To set up W&B, install the W&B CLI.
Then, authenticate your account.
wandb login
Pre-commit Hooks
- pre-commit install
Running the Tests
Run the tests to make sure everything is working:
pytest
Troubleshooting
ModuleNotFoundError
It's likely that your src and tests directories are not in PYTHONPATH. To fix this, add both directories to PYTHONPATH, or set them in your IDE; for example, IntelliJ allows setting directories as Source Roots.
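Another option is to add the directories from Python itself, which has the same effect on imports as putting src and tests on PYTHONPATH. The sketch below is one way to do that, for instance from a conftest.py at the repository root. Both that file location and the helper name add_source_roots are assumptions for illustration, not something the project is known to ship.

```python
import sys
from pathlib import Path


def add_source_roots(repo_root, names=("src", "tests")):
    """Prepend existing source directories to sys.path and return the ones added."""
    added = []
    for name in names:
        path = str(Path(repo_root, name).resolve())
        # Skip directories that do not exist or are already importable.
        if Path(path).is_dir() and path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added


if __name__ == "__main__":
    # Assumes this is run from the repository root.
    print(add_source_roots(Path.cwd()))
```

Calling the helper twice is harmless, because paths already on sys.path are skipped.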
google.auth.exceptions.DefaultCredentialsError
It's likely that you haven't authenticated your Google Cloud account. See Setting Up Google Cloud.
Couldn't connect to Label Studio
Label Studio must be running locally, exposed on localhost:8080. Furthermore, you need to specify the LABEL_STUDIO_API_KEY environment variable. See Setting Up Label Studio.
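To narrow down whether the problem is the server or the API key, you can first check that something is listening on the expected address. The following is a minimal sketch using only the standard library; can_connect is a hypothetical helper, and a successful TCP connection only shows that a server is reachable, not that it is Label Studio or that your key is accepted.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # localhost:8080 is the address this guide expects Label Studio on.
    print("Label Studio reachable:", can_connect("localhost", 8080))
```

If this prints False, start Label Studio or check the host and port; if it prints True, look at the LABEL_STUDIO_API_KEY value next.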
Cannot log in to W&B
You need to authenticate your W&B account. See Setting Up Weights and Biases. If you're facing difficulties, set the WANDB_MODE environment variable to offline so that W&B does not need to connect to its servers.
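The WANDB_MODE setting can also be applied from Python before your script starts a W&B run, which may be convenient in a test setup. This is a small sketch; enable_wandb_offline is a hypothetical helper, and it deliberately leaves the variable alone if you have already set it.

```python
import os


def enable_wandb_offline(environ=os.environ):
    """Set WANDB_MODE=offline unless it is already set; return the effective mode."""
    return environ.setdefault("WANDB_MODE", "offline")


if __name__ == "__main__":
    print("WANDB_MODE =", enable_wandb_offline())
```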
Our Repository Structure
Before starting development, take a look at our repository structure. This will help you understand where to put your code.
- src/frdc/
Source Code for our package. These are the unit components of our pipeline.
- rsc/
Resources. These are usually cached datasets.
- tests/
PyTest tests. These are unit, integration, and model tests.
Unit, Integration, and Model Tests
We have 3 types of tests:
Unit Tests are usually small, single function tests.
Integration Tests are larger tests that test a mock pipeline.
Model Tests are the true production pipeline tests that will generate a model.
Where Should I Contribute?
- Changing a small component
If you're changing a small component, such as an argument for preprocessing, a new model architecture, or a new configuration for a dataset, take a look at the src/frdc/ directory.
- Adding a test
When you add a new component, you'll need to add a new test. Take a look at the tests/ directory.
- Changing the model pipeline
If you're an ML Researcher, you'll probably be changing the pipeline. Take a look at the tests/model_tests/ directory.
- Adding a dependency
If you're adding a new dependency, use poetry add PACKAGE and commit the changes to pyproject.toml and poetry.lock.