by c-fe on 9/10/2024, 8:47:28 AM
Love this. I created (half-jokingly, but only half) the concept of a monofile (inspired by our monorepo) in our team. I have not managed to convince my colleagues to switch yet, but maybe this package can help. Unironically, I find that in larger Python projects, combining various related sub-100-LOC files into one big sub-1000-LOC file can work wonders for circular import errors and remove hundreds of lines of import statements.
by __MatrixMan__ on 9/10/2024, 12:48:28 PM
I've been dreaming of a tool which resembles this, at least in spirit.
I want to figure out how to structure a codebase such that a failing test can spit out a CID for that failure, so it can be remotely recreated (you'd have to be running ipfs so that the remote party can pull the content from you, or maybe you push it to some kind of hub before you share it).
It would be the files relevant to that failure--both code files and data files, stdin, env vars... a reproducible build of a test result.
It would be handy for reporting bugs or getting LLM help. The remote party could respond with a similar "try this" hash which the tooling would then understand how to apply (fetching the necessary bits from their machine, or the hub). Sort of like how Unison resolves functions by cryptographic hash, except this is a link to a function call, so it's got inputs and outputs too.
Of course that's a long way from vomiting everything into a text file; I'd need to establish functional dependency at as small a granularity as possible, but this feels like the first step on a path that eventually gets us there.
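A very rough manual version of that first step might look like the below (the file list and env are chosen by hand here, ipfs is assumed to be installed, and the paths are placeholders); the hard part for real tooling is discovering that set automatically:
# bundle the inputs of one failing test and pin them as a single CID
mkdir -p /tmp/failure-bundle
cp src/widget.py tests/test_widget.py testdata/input.json /tmp/failure-bundle/   # placeholder paths
env > /tmp/failure-bundle/env.txt
ipfs add -Q -r /tmp/failure-bundle    # prints one CID the remote party can fetch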
by orlp on 9/10/2024, 9:40:45 AM
With fd: https://github.com/sharkdp/fd
fd [filter options] -X cat
E.g. to combine all .js files into combined.js: fd -e js -X cat > combined.js
by scioto on 9/10/2024, 10:31:58 AM
It'd be nice if something similar were available to traverse, say, directories of writings in Markdown, Word, LibreOffice, etc., and output a single text file so I have all my writings in one place. Plus allow plug-ins to extract from more exotic file types not originally included.
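A rough starting point, assuming pandoc is installed and can read the formats in question (the directory and output names are just examples):
find ~/writings -type f \( -name '*.md' -o -name '*.docx' -o -name '*.odt' \) \
  -exec sh -c 'echo "==> $1 <=="; pandoc "$1" -t plain' _ {} \; > all-writings.txt
Each document gets a header with its path, followed by its plain-text conversion; more exotic formats would indeed need some kind of plug-in in front of the conversion step.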
by vdm on 9/10/2024, 12:41:45 PM
shopt -s globstar
tail -n+1 **/*.py | pbcopy
by Charon77 on 9/10/2024, 10:01:22 AM
Isn't this a tar file?
by mosselman on 9/10/2024, 11:31:37 AM
I can imagine the token counts would be off the charts. How would an LLM handle this input? LLM output quality already drops quite hard at about 3,000 tokens, let alone 128k.
by wilsonzlin on 9/10/2024, 8:39:32 AM
Similar project: https://github.com/yamadashy/repopack
by samrolken on 9/10/2024, 9:57:02 AM
I have a bash script which is very similar to this, except instead of dumping it all into one file, it opens all the matched files as tabs in Zed. Since Zed's AI features let you dump all, or a subset, of open tabs into context, this works great. It gives me a chance to curate the context a little more. And what I'm working on is probably already in an open tab anyway.
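For a rough approximation without the script (assuming the zed CLI is installed and accepts multiple paths, which I haven't checked for every version):
fd -e py -X zed    # open every .py file as a tab in Zed, then add the relevant tabs to the AI context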
by breck on 9/8/2024, 10:36:26 PM
This made me laugh. Thanks!
Can you go one more step? Is there a way to not just dump someone's project into a plain text file, but somehow intelligently craft it into a ready-to-go prompt? I could use that!
Here's my user test: https://www.youtube.com/watch?v=sTPTJ4ladiI
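On the "ready-to-go prompt" idea: a crude version could just prepend an instruction block to the dump (the prompt wording and file pattern below are only placeholders):
{
  echo "Below is my entire project concatenated into one file."
  echo "Each file starts with a '==> path <==' header. Please review the code and suggest fixes."
  echo
  find . -name '*.py' -exec sh -c 'echo "==> $1 <=="; cat "$1"; echo' _ {} \;
} > prompt.txt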
by theviolacode on 9/11/2024, 4:50:49 PM
Cool! I'd like to see an indication of the total number of tokens in the output, so I know right away which LLM I can use this prompt with, or, if it's too large, can relaunch the script excluding more files to reduce the token count.
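Until that exists, a rough estimate is easy to bolt on, using the common ~4 characters per token rule of thumb (which varies by model and tokenizer; output.txt stands in for whatever file the script produced):
chars=$(wc -c < output.txt)
echo "roughly $((chars / 4)) tokens"    # exact counts need the model's tokenizer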
by mp5 on 9/11/2024, 10:26:17 AM
One feature you could add is allowing the user to map changes in the concatenated file back to the original files. For example, if an LLM edits the concatenated file, I would want it to return the corresponding filenames and line numbers of the original files.
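If the concatenated file keeps '==> path <==' headers like the head/tail one-liners elsewhere in this thread produce, splitting an edited copy back out is already close to a one-liner (this overwrites the original files and assumes no content line happens to look like a header):
awk '/^==> .+ <==$/ { if (out) close(out); out = substr($0, 5, length($0) - 8); next } out { print > out }' edited.txt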
by turblety on 9/10/2024, 12:33:24 PM
Really nice! I made a small CLI tool that has an extra step of basically printing out a tree, so you can ask the AI which files you want to output:
by locallost on 9/10/2024, 10:54:57 AM
Why do we need modules at all? [1]
[1] https://erlang.org/pipermail/erlang-questions/2011-May/05876...
by leovailati on 9/10/2024, 1:01:38 PM
We use a C compiler for embedded systems that doesn't support link time optimizations (unless you pay for the pro version, that is). I have been thinking about some tool like this that merges all C source files for compilation.
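A common workaround there is a "unity build": generate one translation unit that #includes every .c file, so the compiler sees the whole program at once even without LTO (the usual gotchas are clashing static names and include-order issues). A minimal sketch, assuming the sources live under src/:
for f in src/*.c; do echo "#include \"$f\""; done > unity.c
# then compile unity.c instead of the individual source files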
by jonplackett on 9/10/2024, 11:19:34 AM
This is really helpful. I immediately thought it'd be useful for sending off to ChatGPT and then saw that's what it's actually for. Thank you!
by aetherspawn on 9/10/2024, 11:30:23 PM
Surely, with storage being pretty slow and everything, it would be better to compress it into an archive with really basic compression?
by gopi on 9/10/2024, 3:28:02 PM
Shouldn't this work?
find /path/to/directory -type f -exec cat {} + > output.txt
by lynx23 on 9/10/2024, 11:24:33 AM
vim-ai basically supports this use case out of the box. All you need is an index file listing all the files you want included, starting with
>>> include
by guidedlight on 9/10/2024, 9:54:57 AM
This is probably very useful for use with LLMs.
by istvanmeszaros on 9/10/2024, 7:19:54 AM
Love the name :D.
by ziofill on 9/10/2024, 10:59:07 AM
the .sick file extension is a nice touch ^^
by donw on 9/10/2024, 10:19:21 AM
Be careful with the name, McDonald’s might sue you for copyright infringement.
by inciampati on 9/10/2024, 12:38:40 PM
find ... | xargs head -n -0
by frereubu on 9/10/2024, 11:04:03 AM
The name links up nicely with AI enshittification. Although if you wanted to be pedantic, for that metaphor to work you'd really want to call it "gorge" or something more related to ingestion rather than vomiting. (I'm aware that a vomitorium was the exit from a Roman stadium, so it's not really about throwing up either).
As an alternative to (npm -g)'ing, here are some potentially useful coreutils one-liners I've been using for a similar purpose:
- Dump all .py files into out.txt (for copy/paste into an LLM)
> find . -name "*.py" -exec cat {} + > out.txt
- Sort all .py files by number of lines
> find . -name '*.py' -exec wc -l {} + | sort -n