• Aatube@kbin.melroy.org
    link
    fedilink
    arrow-up
    2
    ·
    13 hours ago

    Note that s1 is transparently a distilled model instead of a model trained from scratch, meaning it inherits knowledge from an existing model (Gemini 2.0 in this case) and doesn’t need to retrain its knowledge nearly as much as training a model from scratch. It’s still important, but the training resources aren’t really directly comparable.

    • ArchRecord@lemm.ee
      link
      fedilink
      English
      arrow-up
      1
      ·
      13 hours ago

      True, but I’m of the belief that we’ll probably see a continuation of the existing trend of building and improving upon existing models, rather than always starting entirely from scratch. For instance, you’ll almost always see nearly any newly released model talk about the performance of their Llama version, because it just produces better results when you combine it with the existing quality of Llama.

      I think we’ll see a similar trend now, just with R1 variants instead of Llama variants being the primary new type used. It’s just fundamentally inefficient to start over from scratch every time, so it makes sense that newer iterations would be built directly on previous ones.