• by VirusNewbie on 12/14/2023, 8:14:46 PM

    You can do quite a lot with cloud. A lot of regular company data engineering is not steady-state load; it's bursty, with occasional churns through ancient data for financial reasons, audit reasons, migration to new systems, etc.

    My previous company did a lot of work to move to BigQuery, which really does work quite well for data we needed to access regularly; things we accessed more rarely we'd just store in GCS.

    We used Apache Beam/Dataflow for the imports/exports, plus the occasional custom script for data munging when necessary.

    At one point we needed hundreds of nodes for a data transformation from on-prem to cloud, but on average we only needed a handful of nodes running much smaller jobs.
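
    The "custom script for data munging" pattern mentioned above is often just a streaming transform over newline-delimited JSON before a BigQuery load. A minimal stdlib-only sketch of that idea (the `CustomerID`/`Amount` field names and the target schema are hypothetical, not from the comment):

```python
import io
import json

def _to_float(v):
    """Coerce a value to float; bad values become None (NULL on BigQuery load)."""
    try:
        return float(v)
    except (TypeError, ValueError):
        return None

def munge_record(rec):
    """Normalize one raw record into a load-friendly shape.
    The input/output field names here are hypothetical examples."""
    return {
        "customer_id": str(rec.get("CustomerID", "")),  # force IDs to strings
        "amount": _to_float(rec.get("Amount")),         # numeric or NULL
    }

def munge_ndjson(src, dst):
    """Stream newline-delimited JSON from src to dst, munging each record.
    Streaming line-by-line keeps memory flat even for very large exports."""
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than failing the whole job
        dst.write(json.dumps(munge_record(json.loads(line))) + "\n")

if __name__ == "__main__":
    # Example: two raw records, one with an unparseable amount.
    src = io.StringIO(
        '{"CustomerID": 42, "Amount": "19.99"}\n'
        '{"CustomerID": "a1", "Amount": "oops"}\n'
    )
    dst = io.StringIO()
    munge_ndjson(src, dst)
    print(dst.getvalue(), end="")
```

    The resulting NDJSON can then be handed to a bulk load (e.g. `bq load --source_format=NEWLINE_DELIMITED_JSON ...`); for the bursty many-node cases, the same per-record function slots into a Beam `ParDo` instead.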

  • by ensemblehq on 12/14/2023, 1:13:17 PM

    Neither. Although less hyped, at the enterprise level there are still plenty of Hadoop/Spark implementations. Organizations are just trying to decide whether to move these implementations to the cloud, keep some form of on-prem deployment to retain control, or migrate to other platforms like Snowflake.

  • by hnthrowaway0315 on 12/14/2023, 12:14:25 PM

    I think it's less hyped but still widely used. There are also newer tools such as Flink.

    For analytical transformations, Snowflake, BigQuery, and other modern column-based DBs fit the bill too, so that's probably why they're the new hype.