Hey all, we are launching Terraform Provider Iterative (TPI).
It's designed for machine learning (ML/AI) teams to optimize CPU/GPU expenses:
1. Spot instance auto-recovery (if an instance is evicted or terminated) with data and checkpoint synchronization
2. Auto-termination of instances when ML training finishes - so you don't accidentally leave an expensive GPU instance running for a week :)
3. Familiar Terraform commands and config (HCL)
The secret sauce is the auto-recovery logic, which is built on cloud auto-scaling groups and doesn't require running a separate monitoring service (another cost saving!) - the cloud provider recovers instances for you. TPI just unifies auto-scaling groups across all the major providers: AWS, Azure, GCP, and Kubernetes. Yeah, it was tricky to unify all the clouds :)
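To give a flavor, here's a minimal sketch of a TPI task config - field names follow the TPI docs at the time of writing and may change, so treat this as illustrative rather than definitive:

```hcl
terraform {
  required_providers {
    iterative = {
      source = "iterative/iterative"
    }
  }
}

provider "iterative" {}

resource "iterative_task" "example" {
  cloud   = "aws" # also: azure, gcp, k8s
  machine = "m"   # generic size; or a cloud-specific instance type
  spot    = 0     # spot instance at the best available price (auto-recovered on eviction)

  storage {
    workdir = "." # uploaded to the instance before the script runs
    output  = "results" # synced back when the task finishes
  }

  script = <<-END
    #!/bin/bash
    python train.py
  END
}
```

The usual Terraform workflow applies: `terraform init` to install the provider, `terraform apply` to launch the task, and the instance terminates itself when the script exits.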
We'd love to hear your feedback!