Hacker News Clone

An In-depth Look at Gemini's Language Abilities

by tbruckner on 12/19/2023, 2:01:53 PM with 70 comments

by unstuck3958 on 12/19/2023, 2:35:57 PM
It's incredible how accurate the Chatbot Arena Leaderboard [0] is at predicting model performance compared to benchmarks (which can and are being gamed, see all the 7B models on HF leaderboard)
[0]: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...
by paxys on 12/19/2023, 2:45:43 PM
Has anyone (outside of Google) gotten to play with Gemini Ultra yet? Been hearing a lot about Pro, but I'd be interested in seeing whether Ultra is really close to as capable as they claim.
Also very interesting that Mixtral 8x7B ranks in the same neighborhood as Gemini Pro/GPT 3.5 Turbo/Claude 2.1 while being fully open source and Apache 2.0 licensed.
by helsinkiandrew on 12/19/2023, 3:32:37 PM
One thing this doesn't cover is speed/latency of response. Some of the things I'm doing on ChatGPT could probably work on a 90% capable GPT3.5 and other factors become more of an issue.
If I was already using GCP and they reduced their price (>10%) and offered tight integration with rest of GCP services it would still be appealing.
by GaggiX on 12/19/2023, 3:29:13 PM
Gemini Pro API charges you by char instead of by token, I guess it's good news for Chinese/Japanese users and other speakers of less common languages, for example the OpenAI tokenizer needs two tokens to encode a single Georgian letter.
by m3kw9 on 12/19/2023, 4:12:00 PM
Starling 7b alpha is close to gpt3.5 but it asking list 5 capitals sorted by population size and it will run on like forever generating 100 capitals and explainations and gibrish. Not sure if it’s my prompt, but it sure isn’t close to the intelligence of gpt3.5 when asked the same.
Starling:
```
USER list 5 largest capitals sorted by largest population first
AI Here are some countries and their capitals in alphabetical order Australia: Canberra Belgium: Brussels Canada: Ottawa Finland: Helsinki Italy: Rome France: Paris Germany: Berlin Greece: Athens Ireland: Dublin …(lists 50 others)
Gpt3.5
list 5 largest capitals sorted by largest population first
ChatGPT Certainly! To list the 5 largest capital cities by population, I'll provide you with the information as of my knowledge cutoff date in 2022. Please note that population figures may change over time, and it's a good idea to verify the latest data from reliable sources if needed. Here are the 5 largest capital cities sorted by population, starting with the largest:
Beijing, China: Beijing is the capital of China and has a population of over 21 million people as of 2022. New Delhi, India: New Delhi is the capital of India and is part of the larger National Capital Territory of Delhi. It has a population of approximately 16.8 million people as of 2022. Tokyo, Japan: Tokyo is the capital of Japan, and while the city itself is not officially designated as the capital, it serves as the de facto capital. The Tokyo metropolitan area has a population of around 14 million people. Moscow, Russia: Moscow is the capital of Russia and has a population of over 12 million people. Dhaka, Bangladesh: Dhaka is the capital of Bangladesh and has a population of approximately 8.9 million people. ```
by dang on 12/19/2023, 6:51:56 PM
Submitters: "Please use the original title, unless it is misleading or linkbait; don't editorialize." - https://news.ycombinator.com/newsguidelines.html
(Submitted title was "Gemini Pro achieves accuracy slightly inferior to GPT 3.5 Turbo".)
If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...
by jiggawatts on 12/20/2023, 12:14:21 AM
Does anyone else have the sinking feeling that GPT 4 is as good as things will get for quite a while?
Someone described LLMs as “blurry JPEGs of the Internet”.
In that sense, maybe GPT 4 is as smart as the hive mind of the Internet gets, and newer models just take sharper pictures but of the same subject. Perhaps GPT 4 trained on one of the best subsets available and everything else is going to be worse or the same…
It’s curious that Sam Altman has publicly stated that OpenAI isn’t working on GPT 5. Why not? Is it because they know it’s a pointless exercise with the current training approaches?
by lsy on 12/19/2023, 4:15:41 PM
I don't think "accuracy" is going to be the defining feature of which chatbot succeeds. People just aren't using them for tasks where a 3-5 point difference makes the grade, because the difference between 67 and 100 is more important than the difference between 64 and 67. If you can integrate a relatively speedy bot somewhere people can use it conveniently that'll get more usage than a slightly more factual response you have to tab out to.
by we_love_idf on 12/19/2023, 3:19:10 PM
I don't understand why people keep falling for Google's ad campaign. Google have its lead in AI playing video games and board games. It is cool, entertaining and all that jazz. But OpenAI and MS are the real leaders in real AI.
by jimsimmons on 12/19/2023, 2:47:58 PM
The Gemini white paper reports higher scores on HumanEval and other tasks.
So one of Google lied, this eval has bugs, they borked the deployment is true