Hacker News Clone

Comparison of Data Lake Table Formats (Iceberg, Hudi and Delta Lake)

by anhldbk on 6/13/2022, 9:41:00 AM with 38 comments

by henrydark on 6/13/2022, 4:10:08 PM
A major problem with these table formats that will surface soon enough is that they use serial numerical ordering for versions.
It's like inventing SVN for data. Soon enough git will have to be invented as well.
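To make the SVN-versus-git contrast above concrete, here is a minimal sketch. It models only the analogy, not the actual metadata layout of Iceberg, Hudi, or Delta Lake; `LinearLog`, `CommitGraph`, and the `part-*.parquet` file names are hypothetical names made up for illustration.

```python
import hashlib
import json


# Serial numbering (SVN-like): one global counter, so history is a single line.
class LinearLog:
    def __init__(self):
        self.versions = []  # position in the list is the version number

    def commit(self, snapshot):
        self.versions.append(snapshot)
        return len(self.versions) - 1  # the new version number


# Content-addressed history (git-like): a commit id is a hash of the snapshot
# plus its parent ids, so branches and merges can be represented.
class CommitGraph:
    def __init__(self):
        self.commits = {}  # commit id -> {"snapshot": [...], "parents": [...]}

    def commit(self, snapshot, parents=()):
        body = {"snapshot": snapshot, "parents": sorted(parents)}
        cid = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.commits[cid] = body
        return cid


linear = LinearLog()
v0 = linear.commit(["part-0.parquet"])
v1 = linear.commit(["part-0.parquet", "part-1.parquet"])

graph = CommitGraph()
root = graph.commit(["part-0.parquet"])
# Two writers diverge from the same parent, then their work is merged.
branch_a = graph.commit(["part-0.parquet", "part-1.parquet"], parents=[root])
branch_b = graph.commit(["part-0.parquet", "part-2.parquet"], parents=[root])
merged = graph.commit(
    ["part-0.parquet", "part-1.parquet", "part-2.parquet"],
    parents=[branch_a, branch_b],
)
```

In the linear log, two concurrent writers would both want to claim the next integer; in the graph, each gets its own id and the merge records both parents.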
by evilturnip on 6/13/2022, 3:31:17 PM
We're currently looking into data lake implementations. Right now, we only have 1 or 2 data sources. Our current thinking is to read them on the fly, combine them in a pandas DataFrame, and query that. Does anyone have experience doing something similar?
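For reference, the approach described above might look roughly like the sketch below. The two in-memory CSV sources, their column names, and the queries are all hypothetical stand-ins; real sources would more likely be loaded with `pd.read_csv`, `pd.read_parquet`, or `pd.read_sql`.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two data sources.
orders_csv = io.StringIO(
    "order_id,customer_id,amount\n"
    "1,10,25.0\n"
    "2,11,40.0\n"
    "3,10,15.0\n"
)
customers_csv = io.StringIO(
    "customer_id,region\n"
    "10,EU\n"
    "11,US\n"
)

# Read both sources on the fly.
orders = pd.read_csv(orders_csv)
customers = pd.read_csv(customers_csv)

# Combine them into a single DataFrame on the shared key.
combined = orders.merge(customers, on="customer_id", how="left")

# Query the combined frame.
big_orders = combined.query("amount > 20")
revenue_by_region = combined.groupby("region")["amount"].sum()
```

This keeps everything in memory, so it only holds up while both sources fit in RAM on a single machine.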
by anonymousDan on 6/13/2022, 3:51:31 PM
How does the concept of a table here differ from that of a standard relational table (if at all)? Is it that the table is a logical abstraction over a distributed set of files?
by ttunguz on 6/13/2022, 2:04:48 PM
Does anyone have experience running any of these three in production?
by divbzero on 6/13/2022, 5:41:42 PM
Does anyone have good real-life stories of how data from a data lake made a real difference in a product or a business?
by venki80 on 6/13/2022, 2:32:09 PM
Wondering if this is basically what all data lakes will look like in the future. All data stored in these table formats…
by pid-1 on 6/13/2022, 11:32:55 AM
The repo comparison was really cool. I guess that could be made into a product.
by diptnt on 6/13/2022, 3:35:12 PM
Thanks for bringing this comparison out!
by hrosen on 6/13/2022, 2:54:44 PM
Helpful to see a concise comparison!
by ajantha on 6/13/2022, 2:49:02 PM
Nicely summarised and visualised :+1:
by broberts2261 on 6/13/2022, 7:25:12 PM
Great comparison!