Joined 5/9/2022, 8:24:04 PM · 608 karma
Better than DeepSeek R1? MiniMax-M1: open-weight hybrid-attention reasoning model
kit - Code Intelligence Toolkit
DeepSeek Open Source Optimized Parallelism Strategies, 3 repos
DeepSeek Open Source DeepGEMM – FP8 GEMM Library (300 lines for 1350+ FP8 TFLOPS)
Alibaba Open Source Large-Scale Video Generative Models: Wan2.1
DeepSeek Open Source DeepEP – Library for MoE Training and Inference
DeepSeek Open Source FlashMLA – MLA Decoding Kernel for Hopper GPUs
New Qwen2.5-Max Outperforms DeepSeek V3 in Benchmarks
MiniMax-01: open-source 456B hybrid model with context up to 4M, longest yet
DeepSeek V3 beats Claude 3.5 Sonnet and is far cheaper
NeurIPS and Dr. Picard released statements for singling out Chinese scholars
Tencent Hunyuan-Large