I'm looking to set up some virtual machines to host data gateways for refreshing my data. Is there an optimal setup that keeps costs down while giving consistent performance? I plan on refreshing about 5 times a day, up to 8.
5 to 8 refreshes a day? You barely need a single VM for that.
Here's what we use in our company (in addition to all the rogue single-VM gateways):
- gateway cluster with four VM members, geographically dispersed for BCP reasons
- each VM has 8 cores, 32GB RAM, 100GB free disk space
- all VMs are running Server 2012 R2 (meh) with all the latest and greatest patches applied and all required database drivers installed (Oracle, MySQL, PostgreSQL, Vertica, Hortonworks, etc.)
- extensive monitoring to make sure all cluster members get their fair share of work and that refresh queries complete in a reasonable amount of time
This setup supports about 150 connections and about 500 refreshes per day with some breathing space. The Gateway Admin console is a surprising bottleneck as it gets really slow after you add 100 or so connections.
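To put the asker's 5 to 8 refreshes in perspective, here is a rough back-of-the-envelope sketch using the cluster numbers above. It assumes the 500 daily refreshes are spread evenly over the four VMs and that all refreshes are of comparable size, neither of which is guaranteed; the function names are just illustrative.

```python
# Rough load comparison based on the figures quoted in this thread.
# Assumption: refreshes are spread evenly across cluster members and are
# of similar size. Real workloads will be lumpier than this.

CLUSTER_VMS = 4
CLUSTER_REFRESHES_PER_DAY = 500


def refreshes_per_vm(total_refreshes: float, vm_count: int) -> float:
    """Average daily refreshes handled by one cluster member."""
    return total_refreshes / vm_count


def share_of_one_vm(planned_refreshes: float) -> float:
    """Planned daily refreshes as a fraction of one member's average load."""
    per_vm = refreshes_per_vm(CLUSTER_REFRESHES_PER_DAY, CLUSTER_VMS)
    return planned_refreshes / per_vm


if __name__ == "__main__":
    per_vm = refreshes_per_vm(CLUSTER_REFRESHES_PER_DAY, CLUSTER_VMS)
    print(f"Each cluster VM averages about {per_vm:.0f} refreshes/day")
    for planned in (5, 8):
        print(f"{planned} refreshes/day is {share_of_one_vm(planned):.1%} of that")
```

Keep in mind those cluster members are 8-core, 32GB machines, so this only says the workload is small, not that any particular smaller VM size will cope.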
It can be difficult to accurately estimate the right size. We recommend that you start with a machine with at least 8 CPU cores, 8 GB of RAM, and multiple Gigabit network adapters. You can then measure a typical gateway workload by logging CPU and memory system counters. For more information, see Monitor and optimize on-premises data gateway performance.
You could try starting with a D2 v3, which is about $150/month. That might be OK. Worst case, go with a D8 v3 for $600/month. If you only run it during business hours (8 hours a day, no weekends), those costs drop to 1/3rd or less.
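The "1/3rd or less" figure can be checked with a quick sketch. It assumes compute is billed only for the hours the VM is actually running (i.e. it is stopped and deallocated outside the schedule) and ignores disk storage, which is still charged while the VM is off. The monthly prices are the approximate ones quoted above, not current list prices, and the function names are made up for illustration.

```python
# Estimate the compute cost of a VM that only runs on a schedule.
# Assumptions: billing is proportional to running hours, and the monthly
# prices are the approximate figures quoted in this thread.

HOURS_PER_WEEK = 24 * 7  # 168


def running_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of the week the VM is running."""
    return hours_per_day * days_per_week / HOURS_PER_WEEK


def scheduled_monthly_cost(full_monthly_cost: float,
                           hours_per_day: float,
                           days_per_week: int) -> float:
    """Approximate monthly compute cost when running only on the schedule."""
    return full_monthly_cost * running_fraction(hours_per_day, days_per_week)


if __name__ == "__main__":
    quoted_prices = {"D2 v3": 150.0, "D8 v3": 600.0}  # USD/month, always on
    for size, price in quoted_prices.items():
        weekdays_only = scheduled_monthly_cost(price, 8, 5)
        every_day = scheduled_monthly_cost(price, 8, 7)
        print(f"{size}: ~${weekdays_only:.2f}/month on weekdays only, "
              f"~${every_day:.2f}/month running every day")
```

Running 8 hours every day is exactly a third of the week; skipping weekends brings it down to 40 of 168 hours, or a bit under a quarter.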