Monday, February 3, 2025

explain training progress print out



Below is an overview of what you’re seeing in each line of the training log and why the number “163” appears:

  1. Epoch n/20
    This indicates which epoch (out of 20 total) is currently being trained.

  2. 163/163

    • This means your dataset has been split into 163 mini-batches (steps) for each epoch.
    • In other words, every epoch processes 163 batches of data before finishing.
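    • For instance, if you have a total of 16,300 samples in your training set and you set the batch size to 100, you would end up with 16,300 / 100 = 163 steps per epoch. If the sample count is not an exact multiple of the batch size, the smaller final batch still counts as a step by default.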
  3. Time/Step Info
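    For example: 80s 415ms/step. The first number (80s) is the wall-clock time for the entire epoch. The second (415ms/step) is the average time per training step. The two need not multiply out exactly: 163 steps × 0.415 s is about 68 s, and the rest of the 80 s is typically overhead such as the validation pass.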

  4. acc: 0.7738 (Training Accuracy)
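    The model’s accuracy on the training set, accumulated over the batches seen during the epoch. Because the weights keep changing while it accumulates, it is a running figure rather than a fresh evaluation at the end of the epoch.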

  5. f1_score: 0.6252 (Training F1)
    The model’s F1 score on the training batches of that epoch. It appears only if you’ve added F1 as a metric.

  6. loss: 0.5913 (Training Loss)
    The training loss averaged over all batches in that epoch.

  7. val_acc, val_f1_score, val_loss (Validation Metrics)
    These show how the model performed on the validation set, which is evaluated once at the end of that epoch:

    • val_acc: Validation accuracy.
    • val_f1_score: Validation F1 score.
    • val_loss: Loss on the validation set.
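
The arithmetic behind items 2 and 3 can be sketched in a few lines of Python. The sample count and batch size are the illustrative figures from item 2, not values read from your run, and `steps_per_epoch` is a hypothetical helper name used only for this example.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of mini-batches per epoch; a final partial batch still counts."""
    return math.ceil(num_samples / batch_size)


# Illustrative figures from item 2 (assumed, not read from a real dataset).
steps = steps_per_epoch(16_300, 100)
print(steps)  # 163

# A slightly smaller dataset gives the same count because of the partial last batch.
print(steps_per_epoch(16_201, 100))  # 163

# Item 3: 163 steps at about 415 ms each account for only part of the 80 s epoch.
step_seconds = steps * 0.415
print(round(step_seconds, 1))  # 67.6
```

Since many dataset sizes map to the same step count, 163 alone does not tell you the exact number of samples, only that it lies within one batch size of 163 full batches.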

Putting it all together: each epoch processes 163 mini-batches (hence 163/163), and at the end of each epoch, TensorFlow displays the final metrics on both the training set (acc, f1_score, loss) and the validation set (val_acc, val_f1_score, val_loss).
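
If you want to work with these numbers programmatically, a small parser can pull them out of a progress line. This is only a minimal sketch: the line layout below is assumed from the description above (the real spacing and progress-bar characters vary between versions), `parse_progress_line` is a hypothetical helper, and the sample line uses only the training values quoted in this post.

```python
import re

# Matches "name: value" pairs such as "acc: 0.7738" or "loss: 1.2e-04".
METRIC_RE = re.compile(r"(\w+): (\d+(?:\.\d+)?(?:e[+-]?\d+)?)")
# Matches the "current/total" step counter at the start of the line.
STEPS_RE = re.compile(r"^\s*(\d+)/(\d+)")


def parse_progress_line(line: str) -> dict:
    """Extract the step counter and every 'name: value' metric from one log line."""
    result = {}
    steps = STEPS_RE.search(line)
    if steps:
        result["step"] = int(steps.group(1))
        result["total_steps"] = int(steps.group(2))
    for name, value in METRIC_RE.findall(line):
        result[name] = float(value)
    return result


# Assumed layout; the metric values are the ones quoted in the list above.
sample = "163/163 - 80s 415ms/step - acc: 0.7738 - f1_score: 0.6252 - loss: 0.5913"
parsed = parse_progress_line(sample)
print(parsed["total_steps"], parsed["acc"], parsed["f1_score"], parsed["loss"])
```

Validation metrics (val_acc, val_f1_score, val_loss) would be picked up by the same pattern if they appear on the line, since they follow the same "name: value" form.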

 


