• dogslayeggs@lemmy.world
    link
    fedilink
    English
    arrow-up
    19
    ·
    2 days ago

    Sure. My company has a database of all technical papers written by employees in the last 30-ish years. Nearly all of these contain proprietary information from other companies (we deal with tons of other companies and have access to their data), so we can’t build a public LLM nor use a public LLM. So we created an internal-only LLM that is only trained on our data.

    • Fmstrat@lemmy.world
      link
      fedilink
      English
      arrow-up
      5
      arrow-down
      1
      ·
      1 day ago

      I’d bet my lunch this internal LLM is a trained open weight model, which has lots of public data in it. Not complaining about what your company has done, as I think that makes sense, just providing a counterpoint.

    • utopiah@lemmy.world
      link
      fedilink
      English
      arrow-up
      4
      arrow-down
      1
      ·
      1 day ago

      You are solely using your own data or rather you are refining an existing LLM or rather RAG?

      I’m not an expert but AFAIK training an LLM requires, by definition, a vast mount of text so I’m skeptical that ANY company publish enough papers to do so. I understand if you can’t share more about the process. Maybe me saying “AI” was too broad.