
LightGBM Train

Train or cross-validate a LightGBM model against an immutable tabular dataset artifact. Upload your dataset once as an artifact, then submit training jobs with different hyperparameters.

Current public support: shared placement on CPU workers only. Submit via POST /api/v1/recipes/lightgbm_train/jobs or POST /api/v1/jobs with "recipe": "lightgbm_train".
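Both submission routes accept the same payload; the generic endpoint just needs the `recipe` field spelled out. A minimal sketch of building that request body in Python (the helper and its field values are illustrative, not part of the API):

```python
import json

def build_job_request(payload: dict, timeout_s: int = 600) -> str:
    """Build the JSON body for POST /api/v1/jobs with the lightgbm_train recipe.

    The "recipe" field is only required on the generic /api/v1/jobs endpoint;
    the recipe-specific endpoint infers it from the URL.
    """
    body = {
        "recipe": "lightgbm_train",
        "payload": payload,
        "timeout_s": timeout_s,
    }
    return json.dumps(body)

# Example: a training job against an uploaded dataset artifact.
request_body = build_job_request(
    {"dataset_ref": "art_dataset_123", "target_column": "price"}
)
```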

Operations

| Operation | Description |
| --- | --- |
| `train` | Train one model with a held-out validation split and publish the model artifact |
| `cross_validate` | Run deterministic K-fold cross-validation |

Payload Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `operation` | string | no | `train` | `train` or `cross_validate` |
| `dataset_ref` | string | yes | — | Artifact ID of a CSV, TSV, JSONL, or Parquet dataset |
| `target_column` | string | yes | — | Name of the target column |
| `objective` | string | no | auto | `regression`, `binary`, or `multiclass` |
| `metric` | string | no | auto | `rmse`, `mae`, `binary_logloss`, `auc`, `multi_logloss`, `multi_error` |
| `seed` | integer | no | 42 | Random seed |
| `n_estimators` | integer | no | 200 | Number of boosting rounds |
| `num_leaves` | integer | no | 31 | Max leaves per tree |
| `learning_rate` | number | no | 0.05 | Learning rate |
| `feature_fraction` | number | no | 1.0 | Feature sampling ratio |
| `bagging_fraction` | number | no | 1.0 | Row sampling ratio |
| `test_size` | number | no | 0.2 | Validation split fraction |
| `folds` | integer | no | — | Number of CV folds (`cross_validate` only) |
| `weight_column` | string | no | — | Sample weight column |
| `categorical_columns` | array | no | — | Columns to treat as categorical |
| `drop_columns` | array | no | — | Columns to exclude |
| `row_limit` | integer | no | — | Max rows to use |
| `stratified` | boolean | no | true | Stratified splits for classification |
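The server applies the defaults above. As a sketch, a client could check the two required fields and fill in the documented defaults before submitting (this helper is an illustration, not part of the API):

```python
# Defaults as documented in the payload-fields table above.
DEFAULTS = {
    "operation": "train",
    "objective": "auto",
    "metric": "auto",
    "seed": 42,
    "n_estimators": 200,
    "num_leaves": 31,
    "learning_rate": 0.05,
    "feature_fraction": 1.0,
    "bagging_fraction": 1.0,
    "test_size": 0.2,
    "stratified": True,
}

def with_defaults(payload: dict) -> dict:
    """Validate required fields, then merge documented defaults under the payload."""
    for field in ("dataset_ref", "target_column"):
        if field not in payload:
            raise ValueError(f"missing required field: {field}")
    return {**DEFAULTS, **payload}

merged = with_defaults({"dataset_ref": "art_dataset_123", "target_column": "price"})
```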

Examples

Train a model

```
POST /api/v1/recipes/lightgbm_train/jobs

{
  "payload": {
    "dataset_ref": "art_dataset_123",
    "target_column": "price",
    "objective": "regression",
    "metric": "rmse",
    "n_estimators": 500,
    "learning_rate": 0.03
  },
  "timeout_s": 600
}
```

Cross-validate

```
POST /api/v1/recipes/lightgbm_train/jobs

{
  "payload": {
    "operation": "cross_validate",
    "dataset_ref": "art_dataset_123",
    "target_column": "label",
    "objective": "binary",
    "metric": "auc",
    "folds": 5
  },
  "timeout_s": 900
}
```

Workflow

  1. Upload your dataset as an artifact:

    curl -sS -X POST -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/octet-stream" \
      -H "X-Artifact-Filename: dataset.parquet" \
      --data-binary @dataset.parquet \
      "$BASE_URL/api/v1/artifacts"
  2. Submit a training job using the returned artifact ID as `dataset_ref`

  3. The trained model is published as an artifact you can download
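The steps above can be sketched as request-composing helpers in Python. Nothing is sent here; the base URL, token, and artifact ID are placeholders, and only the endpoints and headers shown in this page are assumed:

```python
def artifact_upload_request(base_url: str, token: str, filename: str) -> dict:
    """Step 1: compose the URL and headers for uploading a dataset artifact."""
    return {
        "method": "POST",
        "url": f"{base_url}/api/v1/artifacts",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
            "X-Artifact-Filename": filename,
        },
    }

def train_job_request(base_url: str, artifact_id: str, target_column: str) -> dict:
    """Step 2: compose a training-job submission referencing the artifact."""
    return {
        "method": "POST",
        "url": f"{base_url}/api/v1/recipes/lightgbm_train/jobs",
        "body": {
            "payload": {"dataset_ref": artifact_id, "target_column": target_column}
        },
    }

upload = artifact_upload_request("https://api.example.com", "tok_123", "dataset.parquet")
job = train_job_request("https://api.example.com", "art_dataset_123", "price")
```

Step 3 is a download of the published model artifact once the job completes.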

Result Fields

Train: `success`, `objective`, `metric`, `train_rows`, `validation_rows`, `feature_count`, `best_iteration`, `validation_score`, `model_artifact_saved`, `elapsed_s`

Cross-validate: `success`, `objective`, `metric`, `folds`, `rows`, `feature_count`, `mean_score`, `std_score`, `elapsed_s`
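To illustrate how `mean_score` and `std_score` relate to per-fold results, here is a hypothetical aggregation over fold scores (whether the service reports a population or sample deviation is an assumption; population deviation is used below, and the scores are made up):

```python
import statistics

def summarize_folds(fold_scores: list[float]) -> dict:
    """Aggregate per-fold metric values into the CV summary fields."""
    return {
        "folds": len(fold_scores),
        "mean_score": statistics.fmean(fold_scores),
        # Assumption: population standard deviation across folds.
        "std_score": statistics.pstdev(fold_scores),
    }

summary = summarize_folds([0.91, 0.89, 0.93, 0.90, 0.92])
```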

Artifact Outputs

| Artifact | When | Description |
| --- | --- | --- |
| `model` | `train` | LightGBM model file |
| `train_metrics` | `train` | Training metrics JSON |
| `feature_importance` | `train` | Feature importance JSON |
| `cross_validation` | `cross_validate` | CV metrics JSON |