• by lunarcave on 1/10/2025, 8:54:21 PM

    I run a hosted, code-first agent builder platform in production, so I answer these questions from our customers a lot.

    1. Probably the best is fly.io IMHO. It has a nice balance between running ephemeral containers that can support long running tasks, and quickly booting up to respond to a tool call. [1]

    2. If your task is truly long-running (I'm thinking several minutes), it's probably wise to put trigger [2] or temporal [3] under it.

    3. A mix of prompt caching, context shedding, progressive context enrichment [4].

    4. I'm building a platform that can be self-hosted to do a few of the above, so I can't speak to this. But most of my customers do not.

    5. To start with, a simple Postgres table and pgvector is all you need. But I've recently been delighted with the DX of Upstash Vector [5]. They handle the embeddings for you and give you a text-in, text-out experience. If you want more control, and savings at higher scale, I have heard good things about marqo.ai [6].
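
    To make point 3 a bit more concrete, here is a minimal sketch of one possible reading of "context shedding": keep the system prompt and drop the oldest messages until the conversation fits a token budget. The names (estimate_tokens, shed_context), the message shape, and the 4-characters-per-token estimate are all assumptions for illustration, not how any particular platform does it.

```python
from typing import Dict, List

# Hypothetical message shape: {"role": "...", "content": "..."}
Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    # Swap in a real tokenizer for anything serious.
    return max(1, len(text) // 4)


def shed_context(messages: List[Message], budget: int) -> List[Message]:
    """Keep all system messages, then the newest other messages that fit."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: List[Message] = []

    # Walk from newest to oldest and stop at the first message that overflows.
    for m in reversed(rest):
        cost = estimate_tokens(m["content"])
        if used + cost > budget:
            break
        kept.append(m)
        used += cost

    # Restore chronological order for the messages we kept.
    return system + list(reversed(kept))
```

    Note that system messages are always kept in this sketch, even if they exceed the budget on their own. A fancier version might summarize the shed messages instead of discarding them.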

    Happy to talk more about this at length. (E-mail in the profile)

    [1] https://fly.io/docs/reference/architecture/

    [2] https://trigger.dev

    [3] https://temporal.io

    [4] https://www.inferable.ai/blog/posts/llm-progressive-context-...

    [5] https://upstash.com/docs/vector/overall/getstarted

    [6] https://www.marqo.ai/