RL training

Chain training and evaluation stages using job dependencies.

Two-stage pipeline


# Stage 1: Train
JOB1=$(curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://computalot.com/api/v1/jobs \
  -d '{
    "type": "structured_runner",
    "runner_command": ["python", "train.py"],
    "payload": {"epochs": 50, "lr": 0.001},
    "project": "my-rl-project",
    "timeout_s": 7200,
    "requirements": {"profile": "gpu"}
  }' | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
 
# Stage 2: Evaluate (waits for stage 1)
curl -sS -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  https://computalot.com/api/v1/jobs \
  -d "{
    \"type\": \"structured_runner\",
    \"runner_command\": [\"python\", \"evaluate.py\"],
    \"payload\": {\"model_path\": \"model.pt\"},
    \"depends_on\": [\"$JOB1\"],
    \"project\": \"my-rl-project\",
    \"timeout_s\": 600
  }"

If stage 1 fails, stage 2 auto-cancels.