• by johnfn on 6/5/2025, 5:02:18 PM

    Impressive seeing Google notch up another ~25 ELO on lmarena, on top of the previous #1, which was also Gemini!

    That being said, I'm starting to doubt the leaderboards as an accurate representation of model ability. While I do think Gemini is a good model, having used both Gemini and Claude Opus 4 extensively in the last couple of weeks, I think Opus is in another league entirely. I've been dealing with a number of gnarly TypeScript issues, and after a bit Gemini would spin in circles or actually (I've never seen this before!) give up and say it can't do it. Opus solved the same problems with no sweat.

    I know that's a fairly isolated anecdote and not necessarily indicative of overall performance, but my experience with Gemini is that it really wants to kludge code on in order to make things work, whereas Opus tends to find cleaner approaches to the problem.

    Additionally, Opus just seemed to have a greater imagination? Or perhaps it has been tailored to work better in agentic scenarios? I saw it do things like dump the DOM and inspect it for issues after a particular interaction by writing a one-off Playwright script, which I found particularly remarkable. My experience with Gemini is that it tries to solve bugs by reading the code really, really hard, which is naturally more limited.

    Again, I think Gemini is a great model, I'm very impressed with what Google has put out, and until Opus 4 came out I would have said it was the best.

  • by chollida1 on 6/5/2025, 6:08:26 PM

    I'd start to worry about OpenAI from a valuation standpoint. The company has some serious competition now and is arguably no longer the leader.

    It's going to be interesting to see how easily they can raise more money. Their valuation is already in the $300B range. How much larger can it get, given their relatively paltry revenue at the moment and rising costs for hardware and electricity?

    If the next generation of LLMs needs new data sources, then Facebook and Google seem well positioned there. OpenAI, on the other hand, seems likely to lose the race for proprietary data sets: unlike those other two, they don't have another business that generates such data.

    When they were the leader in both research and in user facing applications they certainly deserved their lofty valuation.

    What is new money coming into OpenAI getting now?

    Even at a $300B valuation, typical Wall Street analysts would want to value them at around 2x sales, which would mean they'd expect OpenAI to have roughly $150B in annual sales to account for this valuation when they go public.

    Or, at an extremely lofty P/E ratio of, say, 100, that would be $3B in annual earnings, which analysts would have to expect to double each year for the next 10ish years, a la AMZN in the 2000s, to justify this valuation.
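    A quick back-of-envelope check of that arithmetic (figures taken from the multiples above; the 10-year doubling line is included only to show how aggressive that growth assumption is):

```python
# Sanity check of the valuation multiples discussed above.
valuation = 300e9                      # $300B valuation

# At a 2x price-to-sales multiple, the implied annual sales:
implied_sales = valuation / 2          # $150B

# At a (very lofty) P/E of 100, the implied annual earnings:
implied_earnings = valuation / 100     # $3B

# Doubling those earnings every year for ~10 years, a la AMZN in the 2000s:
earnings_after_10y = implied_earnings * 2**10

print(f"implied sales:      ${implied_sales / 1e9:.0f}B")
print(f"implied earnings:   ${implied_earnings / 1e9:.0f}B")
print(f"after 10 doublings: ${earnings_after_10y / 1e12:.2f}T")
```

    Note that a 2x sales multiple on a $300B valuation implies $150B in sales (valuation divided by the multiple, not multiplied).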

    They seem to have boxed themselves into a corner where it will be painful to go public, assuming they can ever figure out the nonprofit/for-profit issue their company has.

    Congrats to Google here, they have done great work and look like they'll be one of the biggest winners of the AI race.

  • by vthallam on 6/5/2025, 4:57:19 PM

    As if 3 different preview versions of the same model is not confusing enough, the last two dates are 05-06 and 06-05. They could have held off for a day:)

  • by wiradikusuma on 6/5/2025, 5:59:04 PM

    I have two issues with Gemini that I don't experience with Claude: 1. It RENAMES VARIABLES even in places I didn't ask it to change (I pass them just as context), and 2. sometimes it omits closing square brackets.

    Sure, I'm a lazy bum: I call the variable "json" instead of "jsonStringForX". But it's contextual (within a closure or function), and while I appreciate the feedback, it makes reviewing the changes difficult (too much noise).

  • by hu3 on 6/5/2025, 5:11:03 PM

    I pay for both ChatGPT Plus and Gemini Pro.

    I'm thinking of cancelling my ChatGPT subscription because I keep hitting rate limits.

    Meanwhile I have yet to hit any rate limit with Gemini/AI Studio.

  • by abraxas on 6/5/2025, 5:22:24 PM

    I found all the previous Gemini models somewhat inferior even compared to Claude 3.7 Sonnet (and much worse than 4) as my coding assistants. I'm keeping an open mind but also not rushing to try this one until some evaluations roll in. I'm actually baffled that the internet at large seems to be very pumped about Gemini but it's not reflective of my personal experience. Not to be that tinfoil hat guy but I smell at least a bit of astroturf activity around Gemini.

  • by unpwn on 6/5/2025, 4:55:06 PM

    I feel like instead of constantly releasing these preview versions with different dates attached they should just add a patch version and bump that.

  • by Workaccount2 on 6/5/2025, 5:08:58 PM

    Apparently 06-05 bridges the gap that people were feeling between the 03-25 and 05-06 releases. [1]

    [1] https://nitter.net/OfficialLoganK/status/1930657743251349854...

  • by jcuenod on 6/5/2025, 4:56:56 PM

    82.2 on Aider

    Still actually falling behind the official scores for o3 high. https://aider.chat/docs/leaderboards/

  • by zone411 on 6/5/2025, 10:09:07 PM

    Improves on the Extended NYT Connections benchmark compared to both Gemini 2.5 Pro Exp (03-25) and Gemini 2.5 Pro Preview (05-06), scoring 58.7. The decline observed between 03-25 and 05-06 has been reversed: https://github.com/lechmazur/nyt-connections/.

  • by unsupp0rted on 6/5/2025, 4:57:36 PM

    Curious to see how this compares to Claude 4 Sonnet in code.

    This table seems to indicate it's markedly worse?

    https://blog.google/products/gemini/gemini-2-5-pro-latest-pr...

  • by Alifatisk on 6/5/2025, 7:27:30 PM

    Finally Google is advertising their AI Studio; it's a shame they didn't push that beautiful app before.

  • by pu_pe on 6/5/2025, 5:15:56 PM

    I just checked, and it looks like the limits for Jules have been bumped from 5 free daily tasks to 60. Not sure it uses the latest model, but I would assume it does.

  • by jbellis on 6/5/2025, 4:53:56 PM

    Did it get upgraded in-place again or do you need to opt in to the new model?

  • by pelorat on 6/5/2025, 5:08:11 PM

    Why not call it Gemini 2.6?

  • by op00to on 6/5/2025, 4:55:49 PM

    I found Gemini 2.5 Pro highly useful for text summaries, and even reasoning in long conversations... UP TO the last 2 weeks or month. Recently, it seems to totally forget what I'm talking about after 4-5 messages of a paragraph of text each. We're not talking huge amounts of context, but conversational braindeadness. Between ChatGPT's sycophancy, Gemini's forgetfulness and poor attention, I'm just sticking with whatever local model du jour fits my needs and whatever crap my company is paying for today. It's super annoying, hopefully Gemini gets its memory back!

  • by carbocation on 6/5/2025, 6:07:53 PM

    Is it possible to know which model version their chat app ( https://gemini.google.com/app ) is using?

  • by lxe on 6/5/2025, 6:09:48 PM

    Gemini is a good and fast model, but I think the style of code it writes is... amateur / inexperienced. It doesn't make a lot of mistakes typical of an LLM, but rather chooses approaches that are typical of someone who just learned programming. I have to always nudge it to avoid verbosity, keep structure less repetitive, optimize async code, etc. With claude, I rarely have this problem -- it feels more like working with a more experienced developer.

  • by fallinditch on 6/5/2025, 11:02:03 PM

    As a Windsurf user I was happy with Claude 3.7 but then switched to Google Gemini 2.5 when Claude started glitching on a particularly large file. It's a bummer that 3.7 has gone from Windsurf - I considered cancelling my Windsurf subscription, but decided not to because it is still good value for money.

  • by consumer451 on 6/5/2025, 8:27:51 PM

    Man, if the benchmarks are to be believed, this is a lifeline for Windsurf as Anthropic becomes less and less friendly.

    However, in my personal experience Sonnet 3.x has still been king so far. Will be interesting to watch this unfold. At this point, it's still looking grim for Windsurf.

  • by sergiotapia on 6/5/2025, 5:01:55 PM

    In Cursor this is called "gemini-2.5-pro-preview-06-05"; you have to enable it manually.

  • by aienjoyer on 6/6/2025, 6:43:04 PM

    The truth is that Gemini 2.5 06-05 is a fraud at coding: before, out of 10 pieces of code it wrote, 1 or 2 might not work, meaning they had errors. Now, out of 10, 9 or 10 are wrong. Why does it have so many errors???

  • by jdmoreira on 6/5/2025, 6:09:50 PM

    Is there a no brainer alternative to Claude Code where I can try other models?

  • by emehex on 6/5/2025, 5:03:49 PM

    Is this "kingfall"?

  • by tibbar on 6/5/2025, 5:14:06 PM

    Interesting, I just learned about matharena.ai. Google cherry-picks one result where they're the best, but in the overall results it's still o3 and o4-mini-high in the lead.

  • by energy123 on 6/5/2025, 4:58:11 PM

    So there's both a 05-06 model and a 06-05 model, and the launch page for 06-05 has some graphs with benchmarks for the 05-06 model but without the 06-05 model?

  • by johnnyApplePRNG on 6/5/2025, 6:58:30 PM

    General first impressions are that it's not as capable as 05-06, although it's technically testing better on the leaderboards... interesting.

  • by bli940505 on 6/5/2025, 10:09:30 PM

    I'm confused by the naming. It advertises itself as "Thinking" so is this the release of the new "Deep Think" model or not?

  • by excerionsforte on 6/6/2025, 3:08:22 AM

    Ok Google, I was deflated after you guys took away 03-25, but now I am happy again with 06-05. Hell yes, we are back baby!

  • by simianwords on 6/5/2025, 6:26:42 PM

    I feel stupid for asking but how do I enable deepthink?

  • by _pdp_ on 6/5/2025, 9:06:50 PM

    Is it still rate limited though?

  • by BDivyesh on 6/6/2025, 11:48:23 AM

    It depends on where and how you use it. I only use the Gemini Pro model in AI Studio, and I set the temperature to 0.05 or 0.1; in rare cases I bump it to 0.3 if I need some frontend creativity. It still isn't impressive. I see that Claude is still far better, and o4-mini-high too. As for o3, I despise it: despite being ranked very high on benchmarks, the best version of it is only available through the API.
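    For anyone wanting to pin the temperature the same way outside AI Studio: the Gemini `generateContent` REST API accepts it under `generationConfig`. A minimal sketch of the request body, using only the standard library (the model name and prompt here are illustrative, not from the thread):

```python
import json

def build_request_body(prompt: str, temperature: float = 0.1) -> str:
    """Build a Gemini generateContent request body with a fixed, low
    sampling temperature, as described in the comment above."""
    body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {"temperature": temperature},
    }
    return json.dumps(body)

# POST this to .../models/<model>:generateContent with your API key.
payload = build_request_body("Refactor this function.", temperature=0.05)
print(payload)
```

    Low temperatures like 0.05 make the sampling nearly greedy, which is why they suit refactoring more than "frontend creativity" tasks.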

  • by InTheArena on 6/6/2025, 1:36:37 AM

    Right now, the Claude Code tooling and ChatGPT Codex are far better than anything else I have seen for massive code development. Is there a better option out there with Gemini at the heart of it? I noticed the command-line Codex might support it.

  • by kisamoto on 6/6/2025, 5:12:56 AM

    Amateur question, how are people using this for coding?

    Direct chat and copy pasting code? Seems clunky.

    Or manually switching models in Cursor? Although that's extra cost and not required for a lot of tasks where Cursor tab is faster and good enough, so you'd need to opt in on demand.

    Cline + open router in VSCode?

    Something else?