Hacker News Clone

Google Illuminate: Books and papers turned into audio

by leblancfg on 9/10/2024, 4:22:13 PM with 243 comments

by freefaler on 9/10/2024, 5:24:15 PM
Great idea. I wonder how long until we'd see a lot of "autogenerated" podcasts with syndicated advertising inside spamming the podcast space.
Like with robovoiced videos on YT reading some scraped content.
by fny on 9/10/2024, 5:17:38 PM
Very clever use case. I'm presuming the set up here is as follows:
- LLM-driven back and forth with the paper as context
- Text-to-speech
Pricing for high quality text to speech with Google's studio voices runs at USD 160.00 per 1M characters. And given the average 10 minute recording at the average 130 WPM is 1,300 words and at 5 characters per word is 6,500 characters, we can estimate an audio cost of about $1. LLM cost is probably about the same given the research paper processing and conversation.
So only costs about $2-3 per 10 minute recording. Wild.
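The audio part of that estimate can be checked with a few lines of Python. This is only a sketch of the arithmetic above: the function name and parameter names are made up for illustration, and the rate, speaking speed, and characters-per-word figures are the ones quoted in this comment, not verified pricing.

```python
def tts_cost_usd(minutes, wpm=130, chars_per_word=5, usd_per_million_chars=160.0):
    """Estimate text-to-speech cost from recording length (figures assumed from the comment)."""
    words = minutes * wpm                 # 10 min * 130 WPM = 1,300 words
    chars = words * chars_per_word        # 1,300 words * 5 = 6,500 characters
    return chars * usd_per_million_chars / 1_000_000

audio_cost = tts_cost_usd(10)
print(f"{audio_cost:.2f}")  # 1.04
```

Doubling that for the assumed LLM cost gives roughly $2 per 10 minute recording, the low end of the range quoted.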
by dlisboa on 9/10/2024, 5:40:17 PM
One problem I see with this is legitimizing LLM-extracted content as canon. The realistic human speech masks the fact that the LLM might be hallucinating or highlighting the wrong parts of a book/paper as important.
by falcor84 on 9/11/2024, 2:29:55 PM
This is really cool, and it got me thinking - is there any missing piece to creating a full AI lecturer based on this?
What I'm thinking of is that I'd input a pdf, and the AI will do a bit of preprocessing leading to the creation of learning outcomes, talking points, visual aids and comprehension questions for me; and then once it's ready, will begin to lecture to me about the topic, allowing me to interrupt it at any point with my questions, after which it'll resume the lecture while adapting to any new context from my interruptions.
Are we there yet?
by vincentpants on 9/10/2024, 8:40:20 PM
Listening to an AI generated discussion-based podcast on the topic of anticipating the scraping of deceased people's digital footprint to create an AI copy of your loved one makes the cells that make up my body want to give up on fighting entropy.
by nxobject on 9/10/2024, 5:48:19 PM
A related experiment from Google: NotebookLM (notebooklm.google.com), which takes a group of documents and provides a RAG Gemini chatbot in return.
I wish Google would make these experiments more well-known!
by syntaxing on 9/10/2024, 5:55:55 PM
I’ve been using the ElevenLabs Reader app to read some articles during my drive and it’s been amazing. It’s great to be able to listen to Money Stuff whenever I want to. The audio quality is about 90% there. Occasionally, the tone of the sentence is wrong (like surprised when it should be sad) and the wrong enunciation (bow, like bowing down or tying a bow) but still very listenable.
by leobg on 9/10/2024, 8:10:32 PM
I made something like this for my kids:
1. Take a science book. I used one Einstein loved as a kid, in German. But I can also use Asimov in English. Or anything else. We’ll handle language and outdated information on the LLM level.
2. Extract the core ideas and narrative with an LLM and rewrite it into a conversation, say, between a curious 7 year old girl and her dad. We can take into account what my kids are interested in, what they already know, facts from their own life, comparisons with their surroundings etc. to make it more engaging.
3. Turn it into audio using Text-to-Speech (multiple voices).
by lasermike026 on 9/11/2024, 12:17:28 PM
While this is very nice, what I need is my computer to take voice commands, read content in various formats and structures, and take dictation for all of my apps. I need this in my phone too. I can do this now but I have to use a bunch of different tools that don't work seamlessly together. I need a Voice and Conversational User Interface that is built into the operating system.
by banku on 9/11/2024, 1:26:09 PM
I like how it generates a conversation, rather than just "reading out" or simplifying the content. You can extend this idea to enhance the dynamics of agent interactions
by elashri on 9/10/2024, 9:37:48 PM
One useful use case would be helping make academic papers more accessible. It would also be useful for people to listen to arxiv papers that seem interesting. It would be a useful tool in the academic world. Also useful for students, who would have a more accessible form of learning.
I have a project idea already to use arxiv RSS API to fetch interesting papers based on keywords (or some LLM summary) and then pass it to something like illuminate and then you have a listening queue to follow latest in the field. Though there will be some problems with formatting but then you could just open the pdf to see the plots and equations.
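A minimal sketch of the fetch step, assuming the public arXiv query endpoint at export.arxiv.org/api/query (which returns an Atom feed, so not necessarily the RSS feed meant above). The function name and the example keywords are hypothetical, and the hand-off to Illuminate is left out because no API for it is described in this thread.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # assumed endpoint

def arxiv_query_url(keywords, max_results=10):
    """Build a query URL for the newest papers matching any of the keywords."""
    search = " OR ".join(f"all:{word}" for word in keywords)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",   # newest submissions first
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = arxiv_query_url(["speech", "llm"], max_results=5)
print(url)
```

Fetching that URL and parsing the feed entries would give the titles and PDF links for the listening queue.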
by banach on 9/10/2024, 7:56:07 PM
I can see this working reasonably for text that you can understand without referring to figures, and for texts for which there is external content available that such a conversation could be based on. For a new, say, math paper, without prose interspersed, I’d be surprised if the generated conversation will be worth much. On the other hand, that is a corner case and, personally, I suspect I will be using this for the many texts where all I need is a presentation of the material that is easy to listen to.
by bitshiftfaced on 9/10/2024, 6:31:27 PM
Occasionally there's a podcast or video I'd like to listen to, but one of the voices is either difficult to understand, or in some way awful to listen to, or maybe the sound quality is really bad. It would be nice to have an option for automatically redubbed audio.
by dgellow on 9/10/2024, 5:47:17 PM
Really impressive. The podcasting spam we will get from this will be a pain, but really impressive demo
by keyle on 9/11/2024, 2:37:05 AM
I listened to 5 mins of this and all I can feel is sadness and how cringe it is.
Please do not replace humanity with a faint imitation of what makes us human, actual spontaneity.
If you produce AI content, don't emulate small talk and quirky side jabs. It's pathetic.
This is just more hot garbage on top of a pile of junk.
I imagine a brighter future where we can choose to turn that off and remove it from search, like the low quality content it is. I would rather read imperfect content from human beings, coming from the source, than perfectly redigested AI clown vomit.
Note: I use AI tools every day. I have nothing against AI generated content, I have everything against AI advancements in human replacement, the "pretend" part. Classifying and returning knowledge is great. But I really dislike the trend of making AI more "human like", to the point of deceiving, such as pretending small talk and perfect human voice synthesis.
by smusamashah on 9/10/2024, 5:37:57 PM
Is that audio all generated? All the pauses, breaths, speed ups and everything?
by simon_kun on 9/11/2024, 6:39:02 PM
Google launched similar functionality in NotebookLM today. You can generate podcasts from a wide range of sources: https://blog.google/technology/ai/notebooklm-audio-overviews...
Looks like you can generate from Website URLs if you add them as sources to your notebook, as well as Slides, Docs, PDFs etc. Anything NotebookLM supports.
by aanet on 9/10/2024, 7:57:35 PM
What a fantastic idea! Great way to learn about those pesky research papers I keep downloading (but never get to reading them). I tried a few, e.g. Attention is All You Need, etc. The summary was fantastic, and the discussion was, well, informative.
Does anyone know how the summary was generated? (text summarization, I suppose?) Is there a bias towards "podcast-style discussion"? Not that I'm complaining about it - just that I found it helpful.
by antirez on 9/10/2024, 8:25:27 PM
Related: [rumors] Audible is starting a pilot project to do just that with the ebooks.
by oidar on 9/10/2024, 5:28:26 PM
The voice models for this are very good. I'd love to have granular control over the output of a model like this locally.
by maxglute on 9/11/2024, 5:02:40 AM
AI voices sound particularly good at higher playback rates, with silence removal. Which, granted, is an acquired taste, but it's a common feature in podcast players, so there's an audience for it. Fast talkers feel more competent, and one kind of stops interrogating the quality of the speech.
by bogwog on 9/10/2024, 6:12:50 PM
What does this accomplish? Who does this help? How does this make the world a better place?
This only seems like it would be useful for spammers trying to game platforms, which is silly because spam is probably the number one thing bringing down the quality of Google's own products and services.
by throwaway81523 on 9/11/2024, 1:16:27 AM
How about making the program work in the other direction. It could take one of those 30 minute youtube tutorial videos that is full of fluff and music, and turn it into an instructables-like text article with a few still pictures.
by tambourine_man on 9/11/2024, 2:56:18 PM
This is as impressive as it is scary and creepy.
It also tells us something about humans, because it really does feel more engaging having two voices discussing a subject than simple text-to-speech, even though the information density is smaller.
by theage on 9/11/2024, 4:16:31 AM
The choice of intonement even mimics creatives which I'm sure they'll love. The vocal fry, talking through a forced smile, bumbling host is so typical. Only, no one minds demanding better from a robot so it's even more excruciating fluff with no possible parasocial angle.
Limiting choice to frivolous voices is really testing the waters for how people will respond to fully acted voice gen from them, they want that trust from the creative guild first. But for users who run into this rigid stuff it's going to be like fake generated grandma pics in your google recipe modals.
by Analemma_ on 9/10/2024, 8:36:40 PM
Books I can understand, but I'm genuinely curious: would anyone here find it useful to hear scientific papers as narrated audio? Maybe it depends on the field, but when I read e.g. an ML paper, I almost always have to go through it line-by-line with a pen and scratchpad, jumping back and forth and taking notes, to be sure I've actually "got it". Sometimes I might read a paragraph a dozen times. I can't see myself getting any value out of this, but I'm interested if others would find it useful.
by yencabulator on 9/11/2024, 5:14:11 PM
Maybe I'm the odd one out but "That's interesting. Can you elaborate more?", "Good question", "That sounds like a clever way" etc were annoying filler.
by fabmilo on 9/10/2024, 7:18:07 PM
so much pleasantry so much fluff. reduce the noise. get to the point.
by C-Loftus on 9/11/2024, 1:21:11 AM
Synthesized voices are legitimately a great way to read more and give your eyes a break. I personally prefer just converting a page or book to an audiobook myself locally. The new piper TTS models are easy to run locally and work very well. I made a simple CLI application and some other folks here liked it, so I figured I'd post it.
https://github.com/C-Loftus/QuickPiperAudiobook
by SeanAnderson on 9/10/2024, 7:46:17 PM
I'm fairly excited for this use case. I recently made the switch from Audible to Libby for my audiobook needs. Overall, it's been good/fine, but I get disappointed when the library only has text copies of a book I want to listen to. Often times they aren't especially popular books so it seems unlikely they'll get a voiceover anytime soon. Using AI to narrate these books will solve a real problem I experience currently :)
by colesantiago on 9/10/2024, 5:38:51 PM
So podcasts are now automated, anything with a speaker or a screen is now assumed to be not human.
Is this supposed to be a good thing that we want to accelerate (e/acc) towards?
by hiby007 on 9/11/2024, 4:14:53 PM
Why do I feel this will end up on https://killedbygoogle.com/
by israrkhan on 9/11/2024, 12:20:35 AM
Great... a new era of autogenerated podcasts is here.
by timonoko on 9/10/2024, 6:20:48 PM
Works surprisingly well. I actually bothered to listen to "discussions" about these boring-looking papers.
English is particularly bad to read aloud because it is like the programming language Fortran, based on immutable tokens. If you want tonal variety, you have to understand the content.
Some other languages modify the tokens themselves, so just one word can be pompous, comical, uneducated etc.
by ancorevard on 9/11/2024, 2:38:57 PM
Are there any services like this that exist with an API?
I would like to send a text and then get back a podcast dialog between two people.
by layman51 on 9/10/2024, 9:00:30 PM
Did anyone else notice that according to the generation info, each recording was created on 12/31/69 at 4:00 PM?
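One plausible explanation, offered as an assumption and not something confirmed by Google: a missing timestamp stored as 0 (the Unix epoch) and displayed in US Pacific time would show exactly that date. The sketch below uses a fixed UTC-8 offset as a stand-in for Pacific Standard Time.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset, assumed here to stand in for US Pacific Standard Time.
pst = timezone(timedelta(hours=-8), "PST")

# Unix timestamp 0 is 1970-01-01 00:00:00 UTC.
stamp = datetime.fromtimestamp(0, tz=pst)
print(stamp.isoformat())  # 1969-12-31T16:00:00-08:00
```

16:00 on 1969-12-31 is 4:00 PM on 12/31/69, which matches the generation info.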
by e12e on 9/10/2024, 9:00:40 PM
Interesting - listening to the first example (Attention is all you need)[1] - I wonder what illuminate would make of Fielding's REST thesis?
[1] https://illuminate.google.com/home?pli=1&play=SKUdNc_PPLL8
by marviel on 9/11/2024, 2:33:36 AM
I'm bullish on podcasts as a Passive learning counterpart to the Active learning style in traditional educational instruction. Will be releasing a general purpose podcast generator for educational purposes in reasonote.com within the next few days, along with the rest of the core featureset.
by bluelightning2k on 9/10/2024, 5:40:13 PM
This is really cool. Although I wouldn't put money on a Google project sticking around even if it was a full fledged product!
More of a tech demo than anything else.
What's wild about this is that the voices seem way better than GCP's TTS that I've seen. Any way to get those voices as an API?
by srameshc on 9/10/2024, 6:18:06 PM
We are working on something content driven (for an ad or subscription model) with a lot of effort and time, and I am concerned about how this technology will affect all that effort and eventually our monetization ideas. But I can see how helpful this tool can be for learning new stuff.
by oulipo on 9/10/2024, 8:02:05 PM
Why not, if you could also interject with questions, remarks, or "cut to the chase" like remarks.
Also it's weird that they focus only on AI papers in the demo, and not more interesting social stuff, like environment protection, climate change, etc
by ants_everywhere on 9/10/2024, 7:38:18 PM
This is a good idea and well executed. I think the hard part now is pointing it in an appropriate direction.
If it's just used for generating low quality robo content like we see on TikTok and YouTube then it's not so interesting.
by ElijahLynn on 9/10/2024, 10:01:20 PM
I've been meaning to read the "Attention Is All You Need" paper for years and never have. And I finally listened to that little generated interview as their first example. I think this is going to be very very useful to me!
by greesil on 9/11/2024, 3:37:58 AM
Can't wait to hear some hallucinated alternative facts in a hot new podcast.
by Ninjinka on 9/10/2024, 6:27:15 PM
the Lexification/Roganization/Dwarkeshing/Hubermanning of reading
by yismail on 9/10/2024, 10:00:06 PM
I got in the beta a couple weeks ago and tried it out on some papers [0]
[0] https://news.ycombinator.com/item?id=41020635
by SpencerBratman on 9/11/2024, 2:47:17 PM
founder of podera.ai here, we're building this right now (turn anything into a podcast) with custom voices, customization, and more. would love some hn feedback!
by surfingdino on 9/11/2024, 9:07:19 AM
Amazing. I see great future ahead. We are already able to turn audiobooks into eBooks and Illuminate finally completes the circle of content regurgitation.
by yunohn on 9/10/2024, 10:33:03 PM
I listened to multiple demos, the pauses and vocal intonations sound so fake. They’re inserted at odd times that a real human speaker would not.
by dpflan on 9/11/2024, 4:16:59 PM
Why is this appealing?
Why would one prefer this AI conversation to the actual source?
Can these be agents and allow the listener to ask questions / interact?
by jamalaramala on 9/11/2024, 10:29:52 AM
By now, we can find thousands of hours of discussions online about popular papers such as "Attention is All You Need". It should be possible to generate something similar without using the paper as a source -- and I suspect that's what the AI does.
In other words: I suspect that the output is heavily derivative from online discussions, and not based on the papers.
Of course, the real proof would be to see the output for entirely new papers.
by GaggiX on 9/11/2024, 4:05:09 AM
Did they remove the book section? I can only find the "papers" section now.
by ansk on 9/10/2024, 5:50:15 PM
Imagine reading a math or programming textbook where each statement was true with probability 0.95.
by WalterBright on 9/11/2024, 6:28:58 PM
Didn't Amazon get in trouble for Kindles that read books out loud?
by motoxpro on 9/10/2024, 8:38:16 PM
This is insane! To be able to listen to a conversation to learn about any topic is amazing. Maybe it's just me because I listen to so many podcasts but this is Planet Money or The Indicator from NPR about anything.
Definitely one of the coolest things I have seen an LLM do.
by Animats on 9/11/2024, 5:03:21 AM
Why did they have to call an audio system "Illuminate"?
by srik on 9/10/2024, 6:29:53 PM
Nothing is real anymore.
by RobMurray on 9/10/2024, 7:39:28 PM
I couldn't listen for more than a couple of minutes. It's the usual repetitive, over wordy llm generated drivel.
by MailleQuiMaille on 9/11/2024, 1:43:32 AM
How long until you are part of the conversation...?
by OutOfHere on 9/10/2024, 8:33:13 PM
Can it make something bigger than 5 minutes?
by alganet on 9/10/2024, 5:42:56 PM
Cool tech. Now we know that very soon no one will be able to trust podcasts or video narration.
by danesparza on 9/10/2024, 5:55:45 PM
I wonder how soon until this waitlisted service eventually gets thrown on the trash heap that Google Reader is on.
Building trust with your users is important, Google.
by albert_e on 9/10/2024, 6:25:29 PM
the player always starts at 30:00 for me and plays a 4 to 7 minute clip that seems complete but very brief
by alenwithoutproc on 9/10/2024, 8:52:31 PM
it would be really cool if we’d have a clubhouse-style gen-ai feed for hn or reddit comments to listen to.
to me
by belval on 9/10/2024, 8:54:43 PM
I guess I am in my grouchy old person phase but all I could think of was the Gilfoyle quote from Silicon Valley when presented with a talking refrigerator.
> "Bad enough it has to talk, does it need fake vocal tics...?" - Gilfoyle
Found it: https://youtu.be/APlmfdbjmUY?si=b4-rgkxeXigU_un_&t=179
by richardreeze on 9/13/2024, 1:03:27 AM
This is something I don't get about Google.
I saw they launched NotebookLM Audio Overview today: https://blog.google/technology/ai/notebooklm-audio-overviews...
So what the heck is illuminate and why would they simultaneously launch a competing product?
by CatWChainsaw on 9/10/2024, 9:23:22 PM
So it will immediately be trashed by GenAI bullshit and killedbygoogle within three years, right?
by nonrandomstring on 9/10/2024, 6:14:06 PM
I think I just discovered a new emotion. Simultaneous feelings of excitement and disappointment.
No matter how great the idea, it's hard to stay excited for more than a few microseconds at the sight of the word "Google". I can already hear the gravediggers' shovels preparing a plot in the Google graveyard, and hear the sobs of the people who built their lives, workflows, even jobs and businesses around something that will be tossed aside as soon as it stops being someone's pet play-thing at Google.
A strange ambivalent feeling of hope already tarnished with tragedy.
by franze on 9/10/2024, 7:41:33 PM
Oh, another Google Waitlist...