It has recently come to light that the t5_max_length for Stable Diffusion 3 models (both Large and Medium) is not 256 as previously documented.
Setting t5_max_length: 154 in the training configuration has yielded significant improvements in LoRA training results, as demonstrated with both the StableTuner and AI-Toolkit training scripts. While I have not yet tested this with kohya_ss, there is no indication that the improvement would not apply there as well.
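For anyone who wants to try this, the change is a single line in the trainer's YAML configuration. A minimal sketch, assuming an AI-Toolkit-style config layout (the surrounding keys and the model path are illustrative assumptions; only the t5_max_length line is the change being requested):

```yaml
# Illustrative training-config fragment; the surrounding keys are assumptions
# for context. The documented change is the t5_max_length override.
model:
  name_or_path: "stabilityai/stable-diffusion-3.5-large"  # assumed path
  # T5 text-encoder sequence length: 154, not the previously documented 256
  t5_max_length: 154
```

The exact nesting of the key differs between trainers, so check your script's config schema before applying it.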
Supporting References:
Reddit Post from terminusresearchorg:
"The eternal problem child, SD3.5, has some training parameter fixes that make it worth reattempting training for. The T5 text encoder, previously claimed by StabilityAI to use a sequence length of 256, is now understood to have actually used a sequence length of 154. Updating this results in more likeness being trained into the model with less degradation."
GitHub Pull Request by bghira (SimpleTuner creator):
This PR highlights that "256 tokens is total, not just T5." See the code change here.
Requested change: Update the official documentation for SD3.5 models to reflect the correct value of t5_max_length: 154.
Awaiting Dev Review
💡 Feature Request
About 1 year ago

doctor_diffusion