by animex on 12/21/2024, 10:45:34 PM
by myflash13 on 12/21/2024, 7:08:27 PM
This is a really good idea, a beautiful API, and something that I would like to use for my projects. However, I have zero confidence that this startup would last very long in its current form. If it's successful, AWS will build a better and cheaper in-house version. It's just as likely to fail to get traction.
If this had been released instead as a Papertrail-like end-user product with dashboards, etc. instead of a "cloud primitive" API so closely tied to AWS, it would make a lot more sense. Add the ability to bring my own S3-Compatible backend (such as Digital Ocean Spaces), and boom, you have a fantastic, durable, cloud-agnostic product.
by solatic on 12/21/2024, 6:33:07 PM
Help me understand - you build on top of AWS, which charges $0.09/GB for egress to the Internet, yet you're charging $0.05/GB for egress to the Internet? Sounds like you're subsidizing egress from AWS? Or do you have access to non-public egress pricing?
by masterj on 12/21/2024, 6:43:59 PM
So is this basically WarpStream except providing a lower-level API instead of jumping straight to Kafka compatibility?
An S3-level primitive API for streaming seems really valuable in the long-term if adopted
by iambateman on 12/21/2024, 6:22:06 PM
These folks knowingly chose to spend the rest of their careers explaining that they are not, in fact, S3.
by pram on 12/21/2024, 7:08:01 PM
It looks neat, but no Java SDK? Every company I've personally worked at is deeply reliant on Spring or the vanilla clients to produce/consume to Kafka 90% of the time. This kind of precludes even a casual PoC.
by karmakaze on 12/21/2024, 8:53:25 PM
I do like this. The next part I'd like someone to build on top of this is applying the stream 'events' into a point-in-time queryable representation. Basically the other part to make it a Datomic. Probably better if it's a pattern or framework for making specific in-memory queryable data rather than a particular database. There are lots of ways this could work, like applying to a local SQLite, or basing it on a MySQL binlog that can be applied to a local query instance and rewound to specific points, or more application-specific apply/undo events against local state.
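The apply-to-local-SQLite variant described above can be sketched in a few lines. The event shape, table, and function names here are made up for illustration, not anything S2 ships:

```python
import sqlite3

# Sketch: apply a stream of (seq, op, key, value) events to a local SQLite
# table, and rebuild state as of any point in the stream by replaying a prefix.
def apply_events(conn, events):
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
    for seq, op, key, value in events:
        if op == "put":
            cur.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
        elif op == "delete":
            cur.execute("DELETE FROM kv WHERE key = ?", (key,))
    conn.commit()

def state_at(events, upto_seq):
    """Point-in-time view: replay events with seq <= upto_seq into a fresh DB."""
    conn = sqlite3.connect(":memory:")
    apply_events(conn, [e for e in events if e[0] <= upto_seq])
    return dict(conn.execute("SELECT key, value FROM kv"))

events = [
    (1, "put", "a", "1"),
    (2, "put", "b", "2"),
    (3, "delete", "a", None),
]
```

Rewinding is just replaying a shorter prefix into a fresh database; a real implementation would snapshot and apply incrementally rather than replay from scratch.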
by jgraettinger1 on 12/22/2024, 7:01:29 PM
Roughly ten years ago, I started Gazette [0]. Gazette is in an architectural middle-ground between Kafka and WarpStream (and S2). It offers unbounded byte-oriented log streams which are backed by S3, but brokers use local scratch disks for initial replication / durability guarantees and to lower latency for appends and reads (p99 <5ms as opposed to >500ms), while guaranteeing all files make it to S3 with niceties like configurable target sizes / compression / latency bounds. Clients doing historical reads pull content directly from S3, and then switch to live tailing of very recent appends.
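The historical-then-tailing read path described above might look roughly like this; all names here are hypothetical, not Gazette's actual client API:

```python
# Sketch of the two-phase read: consume persisted bytes directly from object
# storage, then hand off to live tailing of recent appends once caught up.
def read_stream(historical_chunks, live_tail, start_offset=0):
    offset = start_offset
    # Phase 1: replay persisted chunks, skipping data before start_offset.
    for chunk_offset, data in historical_chunks:
        if chunk_offset + len(data) <= offset:
            continue  # entirely before our position
        yield data[max(0, offset - chunk_offset):]
        offset = chunk_offset + len(data)
    # Phase 2: switch to tailing very recent appends from the broker.
    for data in live_tail:
        yield data
        offset += len(data)

chunks_src = [(0, b"abcdef"), (6, b"ghij")]   # (byte offset, blob) pairs
out = b"".join(read_stream(chunks_src, iter([b"kl"]), start_offset=3))
```

The point of the split is economics: bulk historical reads never touch the brokers, while tailing readers get low-latency bytes that haven't been rolled into object storage yet.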
Gazette started as an internal tool in my previous startup (AdTech related). When forming our current business, we very briefly considered offering it as a raw service [1] before moving on to a holistic data movement platform that uses Gazette as an internal detail [2].
My feedback is: the market positioning for a service like this is extremely narrow. You basically have to make it API compatible with a thing that your target customer is already using so that trying it is zero friction (WarpStream nailed this), or you have to move further up to the application stack and more-directly address the problems your target customers are trying to solve (as we have). Good luck!
[0]: https://gazette.readthedocs.io/en/latest/ [1]: https://news.ycombinator.com/item?id=21464300 [2]: https://estuary.dev
by Scaevolus on 12/21/2024, 6:55:47 PM
This is a very useful service model, but I'm confused about the value proposition given how every write is persisted to S3 before being acknowledged.
I suppose the writers could batch a group of records before writing them out as a larger blob, with background processes performing compaction, but it's still an object-backed streaming service, right?
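The batching idea floated above, buffering records and flushing them as one larger blob once a size or age threshold is hit, can be sketched like so; the flush callback stands in for an object-store PUT, and all names are invented:

```python
import time

# Illustrative batching writer: accumulate records, flush as one blob when
# the buffer exceeds max_bytes or the oldest buffered record exceeds max_age_s.
class BatchingWriter:
    def __init__(self, flush, max_bytes=4 * 1024 * 1024, max_age_s=0.05):
        self.flush = flush            # callable taking one joined blob
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.buf = []
        self.size = 0
        self.first_append = None

    def append(self, record: bytes):
        if self.first_append is None:
            self.first_append = time.monotonic()
        self.buf.append(record)
        self.size += len(record)
        if (self.size >= self.max_bytes
                or time.monotonic() - self.first_append >= self.max_age_s):
            self._flush()

    def _flush(self):
        if self.buf:
            self.flush(b"".join(self.buf))
            self.buf, self.size, self.first_append = [], 0, None

blobs = []
w = BatchingWriter(blobs.append, max_bytes=10)
for r in [b"aaaa", b"bbbb", b"cccc"]:
    w.append(r)
```

The tension the comment points at remains: acknowledgment can only come after the blob lands in S3, so the age bound is a floor on append latency.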
AWS has shown their willingness to implement mostly-protocol compatible services (RDS -> Aurora), and I could see them doing the same with a Kafka reimplementation.
by evantbyrne on 12/21/2024, 10:17:15 PM
Seems like really cool tech. Such a bummer that it is not source-available. I might be a minority in this opinion, but I would absolutely consider commercial services where the core tech is all released under something like a FSL with fully supported self-hosting. Otherwise, the lock-in vs something like Kafka is hard to justify.
by throwawayian on 12/22/2024, 12:37:39 PM
I look at the egress costs to the internet and it doesn't check out. It's a premium product dependent on DX, marketed to funded startups.
But if I care about ingress and egress costs, which many stream-heavy infrastructure providers do, this doesn't add up.
I wish them luck, but I feel they would have had a much better chance from the start by getting some funding and having a loss leader start, then organising and passing on wholesale rates from cloud providers once they’d reached critical mass.
Instead they’re going in at retail which is very spicy. I feel like someone will clone the tech and let you self host, before big players copy it natively.
It’s a commodity space, and they’re starting with a moat that amounts to a very busy two weeks for some staff engineers at AWS.
by h05sz487b on 12/21/2024, 6:46:24 PM
Just you wait, I am launching S1 next year!
by Lucasoato on 12/22/2024, 4:52:55 PM
Wow, imagine Debezium offering native compatibility with this, capturing the changes from a Postgres database and saving them as Delta or Iceberg in a pure serverless way!
by bushido on 12/21/2024, 6:37:21 PM
I wish more dev-tools startups would focus on clearly explaining the business use cases, targeting a slightly broader audience beyond highly technical users. I visited several pages on the site before eventually giving up.
I can sort of grasp what the S2 team is aiming to achieve, but it feels like I’m forced to perform unnecessary mental gymnastics to connect their platform with the specific problems it can solve for a business or product team.
I consider myself fairly technical and familiar with many of the underlying concepts, but I still couldn’t work out the practical utility without significant effort.
It’s worth noting that much of technology adoption is driven by technical product managers and similar stakeholders. However, I feel this critical audience is often overlooked in the messaging and positioning of developer tools like this.
by CodesInChaos on 12/22/2024, 5:40:03 PM
1. Do you support compression for data stored in segments?
2. Does the choice of storage class only affect chunks or also segments?
To me the best solution seems like storing writes on EBS (or even NVMe) initially to minimize the time until writes can be acknowledged, while creating a chunk on S3 standard every second or so. But I assume that would require significant engineering effort for applications that require data to be replicated to several AZs before acknowledging it. Though some applications might be willing to sacrifice 1s of writes on node failure in exchange for cheap and fast writes.
3. You could be clearer about what "latency" means. I see at least three different latencies that could be important to different applications:
a) time until a write is durably stored and acknowledged
b) time until a tailing reader sees a write
c) time to first byte after a read request for old data
4. How do you handle streams which are rarely written to? Will newly appended records to those streams remain in chunks indefinitely? Or do you create tiny segments? Or replace an existing segment with the concatenated data?
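The idea in point 2, acknowledging writes once they hit fast local storage and rolling the accumulated records into an S3 chunk roughly once a second, can be sketched as below. `upload_chunk` is a stand-in for the real object-store PUT, and this deliberately ignores the multi-AZ replication caveat:

```python
import threading

# Sketch: records are acknowledged after landing in a fast local log
# (standing in for EBS/NVMe); a periodic roller batches everything pending
# into one chunk destined for S3.
class WriteAheadChunker:
    def __init__(self, upload_chunk):
        self.upload_chunk = upload_chunk  # stand-in for an S3 PUT
        self.pending = []
        self.lock = threading.Lock()

    def append(self, record: bytes) -> None:
        # A real system would fsync the record to the local volume here,
        # then acknowledge: durable locally, not yet on S3.
        with self.lock:
            self.pending.append(record)

    def roll(self) -> None:
        # Called by a timer (e.g. every second): drain the pending records
        # and write them out as a single larger object.
        with self.lock:
            batch, self.pending = self.pending, []
        if batch:
            self.upload_chunk(b"".join(batch))

chunks = []
c = WriteAheadChunker(chunks.append)
c.append(b"rec1")
c.append(b"rec2")
c.roll()
```

This is exactly the trade the comment names: node failure between append and roll can lose up to one roll interval of acknowledged writes unless the local log is itself replicated across AZs.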
by johnrob on 12/21/2024, 6:49:16 PM
This is a very interesting abstraction (and service). I can’t help but feature creep and ask for something like Athena, which runs PrestoDB (map reduce) over S3 files. It could be superior in theory because anyone using that pattern must shoehorn their data stream (almost everything is really a stream) into an S3 file system. Fragmentation and file packing become requirements that degrade transactional qualities.
by bdcravens on 12/21/2024, 7:16:52 PM
My first thought: "introducing? The S2 has been out for a while!"
https://www.sunlu.com/products/new-version-sunlu-filadryer-s...
by nextworddev on 12/21/2024, 10:38:00 PM
This is cool but I think it overlaps too much with something like Kinesis Data Streams from AWS which has been around for a long time. It’s good that AWS has some competition though
by jcmfernandes on 12/21/2024, 6:06:48 PM
In the long-term, how different do you want to be from Apache Pulsar? At the moment, many differences are obvious, e.g., Pulsar offers transactions, queues and durable timers.
by behnamoh on 12/21/2024, 6:38:09 PM
so the naming convention for 2024-25 products seems to be <letter><number>.
o1, o3, s2, M4, r2, ...
by bawolff on 12/21/2024, 8:56:11 PM
In terms of a pitch, I'm not sure I understand how this differs from existing solutions. Is the core value proposition a simpler API?
by adverbly on 12/21/2024, 6:45:03 PM
Seems really good for IoT no? Been a while since I worked in that space, but having something like this would have been nice at the time.
by cultofmetatron on 12/22/2024, 10:00:32 AM
I had an idea like this a few years ago: basically exposing a stream interface to a cloud-based FS to enable random-access seeking on byte streams. I envisioned it being useful for things like loading large files. Would be amazing for enabling things like cloud gaming, image processing, and CAD.
Kudos for sitting down and making it happen!
by siliconc0w on 12/21/2024, 9:55:10 PM
Definitely a useful API but not super compelling until I could store the data in my own bucket
by ComputerGuru on 12/21/2024, 7:04:56 PM
So is this a "serverless" named-pipe-as-a-service cloud offering? Or am I misreading?
by nyclounge on 12/22/2024, 3:47:32 AM
How does this compare to https://github.com/deuxfleurs-org/garage ?
Seems like there are a lot of more lightweight self-hosted S3 implementations around nowadays. Why even use S3?
by unsnap_biceps on 12/21/2024, 9:59:01 PM
I really liked the landing page and the service, but it took me a while to realize it wasn't an AWS service with a snazzy landing page.
by dragonwriter on 12/22/2024, 5:59:21 PM
Apparently this is “S2, a new S3 competitor” not “S2, the spatial index system based on hierarchical quadrilaterals”.
by zffr on 12/22/2024, 4:31:55 AM
How does this compare to Kafka? Is the primary difference that this is a hosted solution?
by tdba on 12/21/2024, 5:56:39 PM
Is it possible to bring my own cloud account to provide the underlying S3 storage?
by rswail on 12/22/2024, 9:01:18 AM
Really interesting service and bookmarked.
I'd really love this extending more into the event sourcing space not just the log/event streaming space.
Dealing with problems like replay and log compaction etc.
Plus things like dealing with old events. Under GDPR, removing personal information from (or isolating it within) the data/events themselves in an event-sourced system is a PITA.
by kdazzle on 12/21/2024, 7:03:26 PM
Would this be like an alternative to Delta? Am I thinking about that right?
by nikolay on 12/23/2024, 4:36:12 AM
Pretty bad branding! It should have at least been S4!
by BaculumMeumEst on 12/21/2024, 6:45:10 PM
S2 is, in my opinion, the sweet spot of PRS's lineup.
by ThinkBeat on 12/21/2024, 10:38:39 PM
This would sell much better if it was S5 or S6, next-level thing.
Wow man, are you still stuck on S3?
by locusofself on 12/21/2024, 11:04:26 PM
"Making the world a better place through streamable, appendable object streams"
by somerando7 on 12/22/2024, 12:09:17 AM
Scribe aaS? ;)
by aorloff on 12/21/2024, 10:03:08 PM
Kafka as a service ?
by ms7892 on 12/21/2024, 6:05:06 PM
Can someone tell me what this does? And why it's better.
by revskill on 12/21/2024, 6:38:04 PM
Serverless pricing to me is exactly like the ETH gas pricing !
IANAL, but naming your product S2 and mentioning in the intro that AWS S3 is the tech you are enhancing is probably inviting a branding/trademark claim from Amazon. Same vertical, and it will definitely cause consumer confusion. I'm sure you've done the research about whether a trademark has been registered.
https://tsdr.uspto.gov/#caseNumber=98324800&caseSearchType=U...