Hacker News Clone

Tackling multiple tasks with a single visual language model

by Ftuuky on 4/28/2022, 3:17:00 PM with 14 comments

by Jack000 on 4/28/2022, 5:18:50 PM
2022: DeepMind releases paper on bootstrapped meta-learning and scaling RL agents
2023: RL agent trained for multi-task learning solves the majority of perfect-information games. It's a scaled-up decision transformer. Scaling laws for RL agents are discovered, similar to those for language models.
2024: Large-scale RL agents are combined with frozen vision and language models via cross-attention, and can be prompted one-shot with language/vision tokens to solve novel tasks.
2025: RL agents enter the real world - first pre-trained in diverse synthetic environments, then via imitation learning from YouTube videos, and finally in an online fashion via real-time human interaction.
timeline might be optimistic, but one can hope!
by maxwells-daemon on 4/28/2022, 7:41:39 PM
Wow! The ability to ingest the "cross product" of data on the internet and in the real world is huge; I bet a lot of what LMs don't know yet lives in that space. This seems a lot more general-purpose than CLIP, so I'm hopeful for even more impressive downstream applications, e.g. robotics.
by goldenkey on 4/29/2022, 1:42:19 AM
"I am not affected by this difference" - What The Fuck?!
by bobbylarrybobby on 4/28/2022, 4:34:25 PM
The conversations are scary. They almost don't seem believable -- did I miss the part where they say they're just an example of what a conversation might look like?
by jcims on 4/28/2022, 4:43:11 PM
I would love to hear some of the spine-tingling moments these researchers experience when developing and interacting with large models.
by razodactyl on 4/28/2022, 3:39:14 PM
AI. Just casually evolving alongside us and using us as its conduit. Lol