by msoad on 12/26/2024, 12:58:26 PM
by rahimnathwani on 12/26/2024, 8:34:17 PM
Pricing per million tokens:
Model               Input   Output
──────────────────────────────────
Claude 3.5 Sonnet   $3.00   $15.00
GPT-4o              $2.50   $10.00
Gemini 1.5 Pro      $1.25    $5.00
DeepSeek V3         $0.27    $1.10
GPT-4o-mini         $0.15    $0.60
by zardinality on 12/26/2024, 3:17:58 PM
The introduction of the paper says: "Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks." They indeed have a very strong infra team.
by futureshock on 12/26/2024, 1:12:44 PM
Someone pointed out on Reddit that DeepSeek v3 is 53x cheaper to run inference on than Claude Sonnet, which it trades blows with in the benchmarks. As we saw with o3, compute cost to hit a certain benchmark score will become an important number now that we are in an era where you can throw an arbitrary amount of test-time compute at an arbitrary benchmark target.
https://old.reddit.com/r/LocalLLaMA/comments/1hmm8v9/psa_dee...
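As a rough sanity check on that multiple (my assumption: it compares per-million output-token prices, using the $15.00 Claude 3.5 Sonnet figure and the $0.28 DeepSeek figure quoted elsewhere in this thread):

```python
# Back-of-envelope check of the "53x cheaper" claim, using output-token
# prices (USD per 1M tokens) quoted in this thread.
claude_sonnet_output = 15.00   # Claude 3.5 Sonnet
deepseek_v3_output = 0.28      # DeepSeek V3 (OpenRouter listing)

ratio = claude_sonnet_output / deepseek_v3_output
print(f"{ratio:.1f}x cheaper on output tokens")  # → 53.6x cheaper on output tokens
```

Input tokens come out closer to ~11-18x cheaper depending on which listed price you use, so the headline number depends on the token mix.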
by handzhiev on 12/26/2024, 10:38:52 PM
How is this not on the front page? It's a remarkable release.
by wenyuanyu on 12/26/2024, 3:38:18 PM
Truly remarkable! Their approach to distributed inference is on an entirely new level. For the prefill stage they use a deployment unit of 32 H800 GPUs, while the decoding stage scales up to a full 320 H800 GPUs per unit! The setup incorporates a multitude of sophisticated parallelization and communication-overlap techniques, setting a standard that's rarely seen elsewhere.
[0] https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSee...
by WiSaGaN on 12/26/2024, 2:32:04 PM
It still fails my private physics test question half the time, where Claude 3.5 Sonnet and OpenAI o1 (both web versions) pass most of the time. So I'd say close to SOTA but not quite. However, given that DeepSeek already has the R1 Lite preview, and that they can achieve comparable performance for much less compute (assuming the API cost of closed models roughly represents their inference cost), it's not unreasonable to believe DeepSeek may be close to releasing a very good test-time-compute-scaling model similar to o3 at high effort.
by sergiotapia on 12/27/2024, 4:42:40 AM
I'm using their API: the model is referenced as `deepseek-chat` and works really well. I'm seeing more intelligent responses to my users' inputs and better adherence to the "spirit" of what I was trying to accomplish with the prompt. This is so exciting!
Take note of their suggested temperatures! https://api-docs.deepseek.com/quick_start/parameter_settings
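For anyone wiring this up: the API follows the standard OpenAI-style chat-completions request shape. A minimal sketch of building a request payload, assuming the temperature recommendations from the parameter-settings page linked above (0.0 for coding, 1.3 for general conversation); verify against the docs before relying on them:

```python
# Sketch of a chat request payload for DeepSeek's API.
# The per-task temperatures below are assumptions taken from their
# parameter-settings docs; double-check before use.
import json

def build_chat_request(prompt: str, task: str = "coding") -> dict:
    suggested_temperature = {"coding": 0.0, "conversation": 1.3}
    return {
        "model": "deepseek-chat",
        "temperature": suggested_temperature[task],
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_chat_request("Refactor this function.", task="coding")
print(json.dumps(req, indent=2))
```

Swapping `task="conversation"` bumps the temperature up, which matches the gist of their guidance: deterministic settings for code, looser sampling for chat.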
by williamstein on 12/26/2024, 7:14:25 PM
What is the DeepSeek team? Who is making this?
by gck1 on 12/28/2024, 12:07:10 PM
Since most of my LLM usage is now through Cline or APIs, specifically for coding assistance, and I'm not comfortable trusting my codebase (or potentially leaking secrets) to a company operating under CCP supervision, I'll stick to waiting until this forces Claude to lower their pricing on 3.5 Sonnet instead.
by rubslopes on 12/26/2024, 5:44:28 PM
Already available at OpenRouter: https://openrouter.ai/deepseek/deepseek-chat
Cost per million tokens: Input $0.14, Output $0.28
by janice1999 on 12/26/2024, 4:39:25 PM
> a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.
What kind of hardware do you need to run this?
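A back-of-envelope answer, assuming FP8 weights (roughly one byte per parameter) and ignoring KV cache and activation memory: all 671B parameters must be resident even though only 37B are active per token, so the MoE sparsity saves compute, not memory.

```python
# Rough VRAM estimate for serving DeepSeek-V3's weights.
# Assumptions: FP8 storage (1 byte/param); KV cache and activations ignored.
import math

total_params = 671e9     # all experts must be loaded
active_params = 37e9     # only these run per token (saves FLOPs, not RAM)
bytes_per_param = 1      # FP8

weights_gb = total_params * bytes_per_param / 1e9
per_gpu_gb = 80          # an H800-class card

min_gpus = math.ceil(weights_gb / per_gpu_gb)
print(f"~{weights_gb:.0f} GB of weights -> at least {min_gpus} x 80 GB GPUs")
# → ~671 GB of weights -> at least 9 x 80 GB GPUs
```

So even before cache and batching overhead, this is a multi-node deployment, not something you run on a single workstation.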
by deyiao on 12/26/2024, 12:36:40 PM
The benchmark results seem unrealistically good, but I'm not sure from which angle to challenge them.
by bobosha on 12/26/2024, 12:58:37 PM
The results look quite promising. I will give this a try...
by orena on 12/27/2024, 12:39:23 PM
So no bitter lesson?
You can run a model that can beat 4o, which was released less than six months ago, _locally_! I know this requires a ton of hardware, but I'd wager OpenAI will not be the leader in 2025. Always bet on open source (or at least on more open development strategies).
The math and coding performance is what we really care about. I pay for both o1 Pro and Sonnet, and in my experience, besides being faster, Sonnet is also better at many tasks. In a few instances I did get answers out of o1 Pro, but that's not justifying the price, so I'm cancelling and going back to the $20/mo plan.
I'm currently paying for Cursor, Claude, ChatGPT, and v0! The productivity I'm gaining from these tools is totally worth it (except for o1 Pro). But I'm really hoping these tools converge at some point so I can pay less. For instance, I'm looking forward to VS Code Copilot improvements so I can go back to VS Code, and once Claude has no limits I'd rather pay for a single AI system.