LightGBM Train
Train or cross-validate a LightGBM model against an immutable tabular dataset artifact. Upload your dataset once as an artifact, then submit training jobs with different hyperparameters.
Current public support: shared placement on CPU workers only. Submit via POST /api/v1/recipes/lightgbm_train/jobs or POST /api/v1/jobs with "recipe": "lightgbm_train".
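The two submission routes above can be sketched as path and body pairs. This is a minimal sketch, not a client library: the helper names are hypothetical, the 600-second default is simply borrowed from the train example, and the body shape for the generic endpoint (same `payload` and `timeout_s`, plus `recipe`) is an assumption.

```python
RECIPE = "lightgbm_train"


def recipe_route(payload, timeout_s=600):
    """Path and JSON body for the recipe-specific endpoint."""
    path = f"/api/v1/recipes/{RECIPE}/jobs"
    return path, {"payload": payload, "timeout_s": timeout_s}


def generic_route(payload, timeout_s=600):
    """Path and JSON body for the generic jobs endpoint.

    Assumption: apart from the added "recipe" key, the body is
    the same as for the recipe-specific endpoint.
    """
    body = {"recipe": RECIPE, "payload": payload, "timeout_s": timeout_s}
    return "/api/v1/jobs", body
```

Either pair would then be sent as a POST request with the body serialized as JSON.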
Operations
| Operation | Description |
|---|---|
| train | Train one model with a held-out validation split and publish the model artifact |
| cross_validate | Run deterministic K-fold cross validation |
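To illustrate what "deterministic" K-fold splitting means, here is a minimal sketch in which the same seed always yields the same folds. It is not the service's actual splitting code: `kfold_indices` is a hypothetical helper, and it ignores stratification.

```python
import random


def kfold_indices(n_rows, folds, seed=42):
    """Split row indices 0..n_rows-1 into `folds` validation folds.

    Illustrative only: a seeded shuffle followed by round-robin
    assignment, so repeated calls with the same seed agree.
    """
    if folds < 2 or folds > n_rows:
        raise ValueError("folds must be between 2 and n_rows")
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)  # local RNG, no global state
    return [sorted(order[i::folds]) for i in range(folds)]


# Each fold serves once as validation data; the rest is training data.
for validation in kfold_indices(10, 5):
    training = [i for i in range(10) if i not in validation]
```

Every row lands in exactly one validation fold, so each row is scored once across the run.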
Payload Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| operation | string | no | train | train or cross_validate |
| dataset_ref | string | yes | — | Artifact ID of a CSV, TSV, JSONL, or Parquet dataset |
| target_column | string | yes | — | Name of the target column |
| objective | string | no | auto | regression, binary, or multiclass |
| metric | string | no | auto | rmse, mae, binary_logloss, auc, multi_logloss, multi_error |
| seed | integer | no | 42 | Random seed |
| n_estimators | integer | no | 200 | Number of boosting rounds |
| num_leaves | integer | no | 31 | Max leaves per tree |
| learning_rate | number | no | 0.05 | Learning rate |
| feature_fraction | number | no | 1.0 | Feature sampling ratio |
| bagging_fraction | number | no | 1.0 | Row sampling ratio |
| test_size | number | no | 0.2 | Validation split fraction |
| folds | integer | no | — | Number of CV folds (cross_validate only) |
| weight_column | string | no | — | Sample weight column |
| categorical_columns | array | no | — | Columns to treat as categorical |
| drop_columns | array | no | — | Columns to exclude |
| row_limit | integer | no | — | Max rows to use |
| stratified | boolean | no | true | Stratified splits for classification |
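The table above can be turned into a small client-side check before submitting a job. The sketch below is hedged: `build_payload` is a hypothetical helper, accepting an explicit `"auto"` value is an assumption based on the listed defaults, and the server may enforce different or additional rules.

```python
OPERATIONS = {"train", "cross_validate"}
# Assumption: "auto" may also be sent explicitly, since it is the default.
OBJECTIVES = {"auto", "regression", "binary", "multiclass"}
METRICS = {"auto", "rmse", "mae", "binary_logloss", "auc",
           "multi_logloss", "multi_error"}


def build_payload(dataset_ref, target_column, **options):
    """Assemble a job payload and reject values the table does not list."""
    if not dataset_ref or not target_column:
        raise ValueError("dataset_ref and target_column are required")
    payload = {"dataset_ref": dataset_ref,
               "target_column": target_column, **options}
    operation = payload.get("operation", "train")
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    if payload.get("objective", "auto") not in OBJECTIVES:
        raise ValueError(f"unknown objective: {payload['objective']!r}")
    if payload.get("metric", "auto") not in METRICS:
        raise ValueError(f"unknown metric: {payload['metric']!r}")
    # The table marks folds as applying to cross_validate only.
    if "folds" in payload and operation != "cross_validate":
        raise ValueError("folds applies to cross_validate only")
    return payload
```

Omitted optional fields are left out of the payload so that the server-side defaults from the table apply.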
Examples
Train a model
POST /api/v1/recipes/lightgbm_train/jobs
{
  "payload": {
    "dataset_ref": "art_dataset_123",
    "target_column": "price",
    "objective": "regression",
    "metric": "rmse",
    "n_estimators": 500,
    "learning_rate": 0.03
  },
  "timeout_s": 600
}
Cross-validate
POST /api/v1/recipes/lightgbm_train/jobs
{
  "payload": {
    "operation": "cross_validate",
    "dataset_ref": "art_dataset_123",
    "target_column": "label",
    "objective": "binary",
    "metric": "auc",
    "folds": 5
  },
  "timeout_s": 900
}
Workflow
- Upload your dataset as an artifact:
  curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/octet-stream" \
    -H "X-Artifact-Filename: dataset.parquet" \
    --data-binary @dataset.parquet \
    "$BASE_URL/api/v1/artifacts"
- Submit a training job using the artifact ID as dataset_ref.
- The trained model is published as an artifact you can download.
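The upload and submit steps above can be sketched with Python's standard library. This only builds the requests: the response schema is not described here, so sending them and extracting the artifact ID are left out. The function names are hypothetical, and the `application/json` content type on the job request is an assumption.

```python
import json
import urllib.request


def upload_request(base_url, token, filename, data):
    """Build the artifact upload request, mirroring the curl command."""
    return urllib.request.Request(
        f"{base_url}/api/v1/artifacts",
        data=data,  # raw dataset bytes
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
            "X-Artifact-Filename": filename,
        },
    )


def job_request(base_url, token, payload, timeout_s=600):
    """Build the training job request for the recipe endpoint."""
    body = json.dumps({"payload": payload, "timeout_s": timeout_s})
    return urllib.request.Request(
        f"{base_url}/api/v1/recipes/lightgbm_train/jobs",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the jobs endpoint expects a JSON body.
            "Content-Type": "application/json",
        },
    )
```

Each request could then be passed to `urllib.request.urlopen`, with the artifact ID from the upload response used as `dataset_ref` in the job payload.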
Result Fields
Train: success, objective, metric, train_rows, validation_rows, feature_count, best_iteration, validation_score, model_artifact_saved, elapsed_s
Cross-validate: success, objective, metric, folds, rows, feature_count, mean_score, std_score, elapsed_s
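A client might check a result against the field lists above and print a one-line summary. The sketch assumes the result arrives as a flat JSON object keyed by those field names with numeric scores, which is not confirmed here; the helper names are hypothetical.

```python
TRAIN_FIELDS = ("success", "objective", "metric", "train_rows",
                "validation_rows", "feature_count", "best_iteration",
                "validation_score", "model_artifact_saved", "elapsed_s")
CV_FIELDS = ("success", "objective", "metric", "folds", "rows",
             "feature_count", "mean_score", "std_score", "elapsed_s")


def missing_fields(result, operation="train"):
    """Return the documented field names absent from a result dict."""
    expected = CV_FIELDS if operation == "cross_validate" else TRAIN_FIELDS
    return [name for name in expected if name not in result]


def summarize(result):
    """One-line summary; assumes scores are plain numbers."""
    if not result.get("success"):
        return "job failed"
    if "mean_score" in result:  # cross_validate result
        return (f"{result['metric']} = {result['mean_score']:.4f} "
                f"+/- {result['std_score']:.4f} over {result['folds']} folds")
    return (f"{result['metric']} = {result['validation_score']:.4f} "
            f"at iteration {result['best_iteration']}")
```

For cross-validation, reporting `std_score` next to `mean_score` shows how much the metric varies between folds.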
Artifact Outputs
| Artifact | When | Description |
|---|---|---|
| model | train | LightGBM model file |
| train_metrics | train | Training metrics JSON |
| feature_importance | train | Feature importance JSON |
| cross_validation | cross_validate | CV metrics JSON |
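The table above maps directly to a lookup that a client could use to confirm a finished job produced everything expected. This is a sketch: how artifact names are actually listed for a job is not described here, so `available` is just a hypothetical collection of names.

```python
ARTIFACTS_BY_OPERATION = {
    "train": ("model", "train_metrics", "feature_importance"),
    "cross_validate": ("cross_validation",),
}


def missing_artifacts(available, operation="train"):
    """Return documented artifact names not present in `available`."""
    expected = ARTIFACTS_BY_OPERATION[operation]
    return [name for name in expected if name not in available]
```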