helloericsf

Joined 5/9/2022, 8:24:04 PM; has 608 karma

Posts

  • Better than DeepSeek R1? MiniMax-M1: open-weight hybrid-attention reasoning model

    by helloericsf on 6/16/2025, 5:28:51 PM with 0 comments
  • kit - Code Intelligence Toolkit

    by helloericsf on 5/8/2025, 11:16:19 PM with 0 comments
  • DeepSeek Open Source Optimized Parallelism Strategies, 3 repos

    by helloericsf on 2/27/2025, 2:01:41 AM with 8 comments
  • DeepSeek Open Source DeepGEMM – FP8 GEMM Library (300 lines for 1350+ FP8 TFLOPS)

    by helloericsf on 2/26/2025, 1:08:29 AM with 1 comment
  • Alibaba Open Source Large-Scale Video Generative Models: Wan2.1

    by helloericsf on 2/25/2025, 3:03:22 PM with 2 comments
  • DeepSeek open source DeepEP – library for MoE training and Inference

    by helloericsf on 2/25/2025, 2:27:29 AM with 71 comments
  • DeepSeek Open Source FlashMLA – MLA Decoding Kernel for Hopper GPUs

    by helloericsf on 2/24/2025, 1:37:24 AM with 108 comments
  • New Qwen2.5-Max Outperforms DeepSeek V3 in Benchmarks

    by helloericsf on 1/28/2025, 4:08:44 PM with 2 comments
  • Longest context up to 4M, MiniMax-01 hybrid 456B Open source model

    by helloericsf on 1/14/2025, 7:32:05 PM with 1 comment
  • DeepSeek v3 beats Claude Sonnet 3.5 and is way cheaper

    by helloericsf on 12/26/2024, 11:47:29 AM with 9 comments
  • NeurIPS and Dr. Picard released statement for singling out Chinese scholars

    by helloericsf on 12/16/2024, 6:16:49 PM with 2 comments
  • Tencent Hunyuan-Large

    by helloericsf on 11/5/2024, 6:52:09 PM with 103 comments