• by declaredapple on 3/11/2024, 5:34:54 PM

    Take the number of parameters (in billions), divide by 2, and add roughly 1 GB of overhead to estimate memory for 4-bit quantization (see the sketch at the end of this comment). A 13B model will need ~7 GB of memory at 4-bit.

    Most models start falling apart rapidly below 4-bit quantization, but you can go lower...

    If you can fit it in GPU memory it'll likely be usable; if you can't, be prepared for single-digit t/s or less. The only exception is people with Apple M1/M2/M3 CPUs, since their unified memory lets the GPU use system RAM at high bandwidth.
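
    A minimal sketch of that back-of-the-envelope rule in Python; the 0.5 bytes per parameter for 4-bit weights and the ~1 GB of overhead are the rough assumptions from the comment above, not exact figures:

        def estimate_4bit_vram_gb(params_billion: float, overhead_gb: float = 1.0) -> float:
            # 4-bit weights are ~0.5 bytes per parameter, so parameters (in billions)
            # divided by 2 gives GB of weights; add a rough allowance for the
            # KV cache, activations, and runtime buffers.
            return params_billion / 2 + overhead_gb

        print(estimate_4bit_vram_gb(13))  # 7.5 -> roughly the ~7 GB figure above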

  • by PaulHoule on 3/11/2024, 4:59:51 PM

    Use sbert.net if you believe Stephen Covey and your goal is for your model to “first understand”.
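
    A minimal sketch of what sbert.net (the sentence-transformers library) looks like in practice; the model name is just one of the library's standard small checkpoints, and the sentences are made up for illustration:

        from sentence_transformers import SentenceTransformer, util

        # Embed sentences into vectors so "understanding" becomes a
        # similarity comparison rather than text generation.
        model = SentenceTransformer("all-MiniLM-L6-v2")
        sentences = ["Seek first to understand.", "Listen before you speak."]
        embeddings = model.encode(sentences)

        # Cosine similarity between the two sentence embeddings.
        print(util.cos_sim(embeddings[0], embeddings[1]))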