If you were running an LLM locally on Android through llama.cpp for use as a private personal assistant, what model would you use?

Thanks for any recommendations in advance.

  • Smee@poeng.link · 2 days ago

    It very much depends on your phone hardware: RAM determines how big a model you can load, and the CPU determines how fast you get replies. I’ve successfully run 4B models on my 8 GB RAM phone, but since it’s the usual server-and-client setup, which needs full internet access because Android lacks granular network permissions (even all-in-one setups need open ports to connect to themselves), I prefer a proper home server. With a cheap GFX card it’s indescribably faster and more capable.
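
    For anyone trying the server-and-client route mentioned above: llama.cpp’s `llama-server` exposes an OpenAI-compatible HTTP API, so a tiny local client is all you need to talk to it. Below is a minimal Python sketch assuming the server is already running (for example inside Termux on the phone, or on a home server); the port, model file, and prompts are placeholders, not anything from this thread.

    ```python
    import json
    import urllib.request

    # Assumed setup: llama-server started locally with something like
    #   llama-server -m some-4b-model-Q4_K_M.gguf --port 8080
    # The model filename and port are placeholders.
    URL = "http://127.0.0.1:8080/v1/chat/completions"  # OpenAI-compatible endpoint

    payload = {
        "messages": [
            {"role": "system", "content": "You are a private personal assistant."},
            {"role": "user", "content": "Draft a short reminder for my 9am meeting."},
        ],
        "temperature": 0.7,
    }

    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

    # Send the chat request and print the assistant's reply
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
    ```

    Because the endpoint is OpenAI-compatible, the same client code works whether the server runs on the phone itself or on a home server reachable over the local network.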

    • nagaram@startrek.website · 2 days ago

      I was honestly impressed with the speed and accuracy I was getting with DeepSeek, Llama, and Gemma on my 1660 Ti.

      $100 used, and responses came back in seconds.