Correct t5 max length for Stable Diffusion 3.5 LoRA training

It has recently come to light that the correct t5_max_length for Stable Diffusion 3.5 models (both Large and Medium) is 154, not 256 as previously documented.

Including the line t5_max_length: 154 in the training configuration has shown significant improvements in LoRA training results, as demonstrated with both StableTuner and AI-Toolkit training scripts. While I have not yet tested this with kohya_ss, there is no indication that the improvement would not apply to it as well.
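For reference, this is roughly where the line sits in a YAML training config. The surrounding keys below are illustrative placeholders only; the exact section names vary between StableTuner, AI-Toolkit, and other trainers:

```yaml
# Illustrative placement only -- exact surrounding keys differ per trainer.
model:
  name_or_path: "stabilityai/stable-diffusion-3.5-large"  # placeholder model id
  t5_max_length: 154   # corrected from the previously documented 256
```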

Supporting References:

  1. Reddit Post from terminusresearchorg:
    "The eternal problem child, SD3.5, has some training parameter fixes that make it worth reattempting training for. The T5 text encoder, previously claimed by StabilityAI to use a sequence length of 256, is now understood to have actually used a sequence length of 154. Updating this results in more likeness being trained into the model with less degradation."

  2. GitHub Pull Request by bghira (SimpleTuner creator):
    This PR notes that the "256 tokens is total, not just T5."
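As a rough illustration of what the parameter controls: t5_max_length caps the tokenized prompt sequence fed to the T5 encoder, with shorter prompts padded and longer ones truncated to that length. A minimal sketch of that capping step (real training scripts use the T5 tokenizer's built-in truncation and padding; this pure-Python version is illustrative only, and the function name is hypothetical):

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Cap a tokenized prompt at max_length tokens, padding short sequences.

    Illustrative stand-in for the tokenizer's own truncation/padding step;
    the effect on the resulting sequence length is the same.
    """
    if len(token_ids) >= max_length:
        return token_ids[:max_length]          # truncate long prompts
    return token_ids + [pad_id] * (max_length - len(token_ids))  # pad short ones

# With the corrected value, every T5 sequence comes out 154 tokens long, not 256.
prompt_tokens = list(range(200))               # stand-in for tokenizer output
capped = pad_or_truncate(prompt_tokens, max_length=154)
print(len(capped))  # 154
```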

Request:

  • Update the official documentation for SD3.5 models to reflect the correct t5_max_length: 154 value.

Status: Awaiting Dev Review
Board: πŸ’‘ Feature Request
Date: About 1 year ago
Author: doctor_diffusion
