• by hodgesrm on 2/3/2022, 2:46:40 AM

    Try ClickHouse. You can set it up on a laptop, in a VM, or get a trial on one of the cloud services. You didn't say which RDS database you are using but ClickHouse can read data directly from MySQL and PostgreSQL. Pull it into tables on ClickHouse and try out your analytic queries to see if it seems reasonable.

    100s of millions of rows is relatively small for ClickHouse, so you don't need to worry about advanced optimizations like compression codecs or materialized views. Standard datatypes and the default LZ4 compression should be just fine. Y

    Here's quickstart video that will teach you most of what you need to get started: https://www.youtube.com/watch?v=phTu24qCIw0

    I work on ClickHouse at Altinity and created the education video. I'm kind of biased. ClickHouse is not the only OSS data warehouse. There is also Druid and Pinot. I think, however, for this particular case where you are pulling from RDS that ClickHouse will turn out to be a good solution.

    Good luck whatever approach you choose!

  • by edmundsauto on 2/2/2022, 11:46:05 AM

    Clickhouse for storage, airflow for orchestration. Dbt for reusability in your models and transforms. There are a bunch of decent extractors that have prebuilt connectors to pull data from third parties.

    Most of all, though, you should think through how users will discover data, and how you will check quality. Once your transforms start getting complicated, you’ll want this (and maybe a metric store).

    Since your data isn’t huge, build things so you can rerun in a different architecture, should you outgrow your setup. Keep a certain level of raw data where possible in AWS, make sure your dbt build works from scratch.

    While your data size is small, focus on delivering value from as simple an architecture as you can get away with. If you can rebuild your warehouse because you’re using standards, the underlying tech doesn’t matter as much as the processes and culture you establish.

  • by brudgers on 2/2/2022, 4:29:44 PM

    Have you considered hiring a consultant experienced in solving this class of problem? This is ordinary business process stuff, not your company's core competence.

    And it is not something your team has experience doing (presumably since the question).

    Because it's all AWS, this is commodity work. Go to Linkedin and find someone to talk to. Good luck.

  • by seektable on 2/4/2022, 7:49:16 PM

    > - Reporting (Metabase, Superset, Quicksight?)

    SeekTable may be useful for tabular reports, also it is good for embedding.