Is anyone actually surprised by this?

  • ayaya@lemdro.id
    1 day ago

    This is mildly pedantic, but you’re not actually running DeepSeek R1; you’re running a 7B version of Qwen that’s been fine-tuned on DeepSeek R1 outputs. All of the “distilled” models are existing models fine-tuned on R1 outputs.