Hacker News Clone

Visual ChatGPT

by debdut on 3/10/2023, 3:55:12 AM with 228 comments

by p-e-w on 3/10/2023, 5:20:23 AM
The "memory usage" section of the README highlights the surprising fact that image generation models need much less memory than text-based language models. ChatGPT itself is by far the most resource-hungry part of the system.
Why is that so? It seems counterintuitive. A single picture snapped with a phone takes more space to store than the text of all the books in a typical home library, yet Stable Diffusion runs with 5 GB of RAM while LLAMA needs 130 GB.
Can someone illuminate what's going on here?
by harveywi on 3/10/2023, 2:08:31 PM
Meta will probably soon release a competing technology. It will be called "DALL-E LLaMA".
by iandanforth on 3/10/2023, 1:32:39 PM
This feels like it owes more to LangChain than a link at the bottom of the page.
Compare their prompt:
https://github.com/microsoft/visual-chatgpt/blob/main/visual...
With that of the LangChain ReAct conversational agent:
https://github.com/hwchase17/langchain/blob/master/langchain...
Also it seems appropriate to cite the original ReAct paper (from Google mainly)
https://arxiv.org/abs/2210.03629
by spaceman_2020 on 3/10/2023, 5:03:32 AM
Man, Microsoft is kicking ass at AI. Maybe the others have great AI models too but haven’t seen any large company release product after product with AI.
by spagoop on 3/10/2023, 5:00:39 AM
Very cool. It's almost as if that chat session is a terminal, but instead of running commands you run prose. Very much a new HCI paradigm.
by pedrovhb on 3/10/2023, 11:48:21 AM
That's neat, but it's not doing anything in the latent space of ChatGPT, is it? As I understand, it basically teaches the assistant to use SD for generating images/descriptions, but comes with all the limitations of the image model being used (as opposed to a leap in results quality such as GPT 3.5 itself was). Teaching it to use tools is of course an interesting concept itself, though.
by swyx on 3/10/2023, 8:55:21 AM
i have been trying for an hour and am completely unable to run this project. currently facing a "Building wheel for numpy (pyproject.toml) did not run successfully." error.
the state of python dependency management and project distribution is just abjectly horrible.
---
update: perhaps spoke too soon. just made it work! https://github.com/microsoft/visual-chatgpt/issues/37
by sharkjacobs on 3/10/2023, 5:11:54 AM
We're at the point where these generative AIs are good enough that they're doing things which are really surprising and unexpected and kind of exciting, but they're bad enough that almost everything they create falls somewhere between mediocre and dogshit.
I really hope, if these this stuff is going to be ubiquitous, that there are big strides made in improving the quality of the output, very soon. The novelty of seeing fake screencaps of Disney's Beauty and the Beast directed by David Cronenberg is wearing off fast, and aside from some very niche use cases (write some boiler plate code for this common design pattern in this very popular language) I haven't found much it's actually useful for
by osigurdson on 3/10/2023, 2:02:17 PM
I think GPT is super useful but can't seem to eke any value out of DAL-E. Yes, it can draw a bear in a business suit on the beach well, which is impressive but I can't think of how to utilize this.
As an example, I've tried to get it to draw architecture diagrams, it draws a few boxes but then places the strangest text on those boxes.
by doctoboggan on 3/10/2023, 5:57:30 AM
Wow, this is very timely! I just finished up a script that uses ChatGPT (via openAI APIs) to read my customer support messages on Etsy and generate a response. Since I often send and receive images via Etsy support (my customers can customize the product with images) I have been searching for a way to let ChatGPT "know" what the image is. Current the script just inserts the text "<uploaded image>", but I was just hacking together something using stable-diffusion-webui's API (interrogate using CLIP), but was struggling with a few things. I took a break to browse HN and this pops up!
I will definitely be taking a look to see how this works and will try to get it integrated with my script.
by tuanx5 on 3/10/2023, 5:38:16 AM
This reminds me of Christina's workstation in Westworld Season 4
by iamflimflam1 on 3/10/2023, 4:05:51 AM
Linked paper is available here: https://arxiv.org/abs/2303.04671
by mmq on 3/10/2023, 11:23:59 AM
I think the chat interface is a bit restrictive when it comes to multimodal models. A much cleaner interface would be an "AI notebook" where the user can move, compare, rerun blocks. Also sharing, versioning and collaborating with others on notebooks is more straightforward.
by userbinator on 3/10/2023, 5:01:30 AM
"ChatGPT, I meant a desk with legs."
For a second, I thought this was a Visual Studio-related plugin.
by aaronrobert on 3/10/2023, 12:10:51 PM
ChatGPT now is not only a simple standalone AI model, but a powerful AI core engine, and more and more people or companies will develop more and more interesting things based on ChatGPT. Like this awesome visual ChatGPT.
by est on 3/10/2023, 5:53:46 AM
Microsoft is releasing second toy while Google had trouble launching its first.
by lwneal on 3/10/2023, 5:21:05 AM
The most incredible thing about this system is that it uses Stable Diffusion (the open source AI art generator), rather than DALL-E (the proprietary closed art generator owned by OpenAI).
The fact that even Microsoft, which partially owns OpenAI, is giving up on DALL-E shows the power of building an open-source community around models with published, downloadable weights.
by gavi on 3/10/2023, 2:04:04 PM
If you are trying to run this on a single GPU, please be aware the models take up a lot of memory. You can reduce the number of tools by modifying the self.tools portion of the python script
by tomohelix on 3/10/2023, 5:08:31 AM
I guess one of the advantage of being early is that Microsoft get to pick all the low hanging fruit first.
All of these products are very useful and interesting by itself but it is still too early to know if MS can continue to refine and maintain a competitive edge. Dall-E basically died in a few months, unable to compete. Hopefully these other stuff will have better fate.
by shp0ngle on 3/10/2023, 11:51:13 AM
I think they ate using StableDiffusion and not Dall-E? Which makes it kind of funny
by zhangyiwu on 3/14/2023, 9:30:21 AM
I have tried to use my Macbook pro(M2 pro) to run it out, but failed to download the massive file。
by hackerlight on 3/10/2023, 5:26:28 AM
There are more examples in the paper:
https://arxiv.org/pdf/2303.04671.pdf
by yazzku on 3/10/2023, 4:53:16 PM
The shit has an MIT license... then requires an API key. Open source all the way, guys! Microsoft loves Open Source!
by razodactyl on 3/13/2023, 4:36:52 AM
The comprehension thrown around in this thread is beautiful. Love the passion.
by totetsu on 3/10/2023, 7:02:07 AM
Are there any recommendable resources for learning about designing these kind of system architectures?
by qntmfred on 3/10/2023, 4:04:23 PM
hmmm can I use this to see how far away we are now
https://karpathy.github.io/2012/10/22/state-of-computer-visi...
by Havoc on 3/10/2023, 1:05:32 PM
Happy that this is <8gb vram. Neatly fits into medium/highish consumer GPUs
by golol on 3/10/2023, 5:27:09 PM
future AI systems based on LLMs and other foundation models might think less like individuals and more like companies. Ironically, LLMs might finally make symbolic AI possible! The way I see it, symbolic AI was always missing a small sprinkle of "general intelligence" too amooth things out, to grease the gears and connect interfaces. I feel like LLMs have that little bit of magical "generality" so we can start building "symbolic" AI systems which produce work by managing a number of black box models. It is like a company: protocols and management structures are a sort of symbolic AI that connects black box humans to eachother.
by amccloud on 3/10/2023, 4:39:17 PM
Ive created a little api to grab images from pages to embed in chats. Was surprisingly easy to control with natural language.
https://aimgsrc.com
by pmarreck on 3/10/2023, 3:24:35 PM
the pace of all this is astonishing, this is amazing
by trompetenaccoun on 3/10/2023, 10:53:57 AM
Endless new possibilities for online scammers. Bright times ahead.
by kilgnad on 3/10/2023, 5:31:37 AM
Now is a really good time to make a start up called skynet.