• Grimy@lemmy.world
    English
    3 days ago

    I’m mostly talking about being able to train on copyrighted content. This is on me though; I got mixed up. That’s what I meant in my first comment.

    If you think someone can train a model on legally obtained data (Google Images, YouTube, Internet Archive), then that is fair.

    Personally, I think using pirated content, or at least purchased content that has been ripped (Netflix, DVDs), should be exempt (for everyone, obviously, not just OpenAI). Some data is already locked up behind huge megacorps like record labels, Hollywood, publishing houses, etc. OpenAI can afford the cost, but the little guys will be screwed when it comes to SOTA.

    It’s also worth noting that, if I’m not mistaken, most current lawsuits are aimed at how the data is used, not how it’s sourced. The laws coming out of these lawsuits won’t be used to bolster anti-piracy laws but copyright laws instead, targeting fair use and transformative-use clauses imo.