⌛Model Version and Clip Length
These options allow you to control which AI model interprets your prompt, and how long the generated clip will be.
Let's look at each in turn, starting with the model version:
Model Version

Udio currently has three models to pick from:
1.0 - The original model, introduced when Udio launched. It lacks some of the options found in later models, but remains available for those who wish to stick with the legacy generation style.
1.5 - The first major update to the model, which introduced options the older model lacks, such as Clarity and Styles. There is much theorycraft around the differences between Udio's models, but the general understanding is that 1.5 is more "difficult" to prompt, as it relies more on the user than on the AI, but produces higher-quality output when it is "used correctly".
1.5 Allegro - Essentially the same model as 1.5, but a faster variant that generates songs much more quickly. There appear to be some differences in output, however, particularly with certain vocal styles, percussion, bass and some genres. At present, Allegro cannot utilise Styles.
Which model you generate with (or which combination of models) remains a heavily debated topic in Udio craft! The most important thing, in our opinion, is that you use a model you are comfortable with and that gives you output you like. All questions beyond this are ultimately secondary. The fact remains, however, that if you want access to newer tools and options, you will have to use a later model, as the original model does not support them.
Clip Length

When generating a fresh clip, or remixing an existing clip, you can specify the length of the clip you generate. This is a simple choice between two options: 32 seconds or 130 seconds.
How much content you put into the lyrics box will vary with the length of the clip you're generating. A 130-second clip, for example, needs far more lyrical input to have a "typical" song flow. Working with 130-second clips from the start can save you time (and be more credit-efficient), but if you're using manual mode and custom lyrics, it requires you to understand more directly how a song progresses over time in order to "sound correct". It also gives up some of the section-by-section control that 32-second generations offer. For example, whilst you can specify when lyrics start and end in a 130-second clip, you cannot control the flow in the middle at all other than through precise metatagging, and that middle portion is a much larger share of the clip than in a 32-second generation.