AWS have their own network interface for applications requiring high inter-node bandwidth, the Elastic Fabric Adapter (EFA), which they use instead of InfiniBand. In December AWS launched the 'n' versions of the c6g instances, denoting enhanced networking, with the c6gn.16xlarge (64 core) instance being EFA-enabled and offering 100 Gbps of network bandwidth.
Just over a month ago I took a quick look at the relative performance of the c6g instances to see if there was any value in running more, smaller instances compared to fewer larger ones. The test case was a simple ~33 million cell version of the OpenFOAM motorbike tutorial, run on 256 cores for 500 iterations, using AWS ParallelCluster (version 2.8.1, which may be important later) to spin up and shut down compute nodes as needed. The results are summarized below: there seemed to be extra value in running on the c6g.2xlarge (8 core) instances, which gave a ~30% reduction in solve time and therefore cost. Perhaps the greater share of inter-node communication on the smaller instances offered some benefit over the 8-memory-channel limitation of the 64 core nodes, but it was odd that performance was the same for all instance types except for a step change on the c6g.2xlarge nodes.
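For reference, a 256 core job maps onto very different instance counts for the two sizes, while the hourly spot cost works out the same either way (using the core counts and spot prices from the instance table in this post), so any cost difference comes entirely from solve time. A quick sketch:

```python
# How many instances of each type a 256 core job needs, and the
# resulting hourly spot cost (cores and prices from the instance table).
CORES_NEEDED = 256

instance_types = {
    # name: (cores per instance, spot price $/hr)
    "c6g.16xlarge": (64, 1.20),
    "c6g.2xlarge": (8, 0.15),
}

for name, (cores, price) in instance_types.items():
    count = CORES_NEEDED // cores
    print(f"{name}: {count} instances, ${count * price:.2f}/hr")
```

Both configurations come out at the same hourly rate, which is why the ~30% faster solve on the 2xlarge nodes translated directly into a ~30% cost saving.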
With this in mind, I took a slightly deeper look at the scaling performance of the standard and 'n' instances to see how much EFA helped the scaling, and whether there was still a case for using more, smaller (lower core count) instances. This time I concentrated on the 16xlarge and 2xlarge instance types, looking at performance over a range of core counts using the same setup as before but with ParallelCluster 2.10.1 (needed to access EFA functionality on the c6gn.16xlarge instances). The instances are summarized below.
Instance        vCPUs  Memory (GB)  Network Bandwidth (Gbps)  Spot Price ($/hr)
c6g.16xlarge    64     128          25                        1.20
c6g.2xlarge     8      16           <=10                      0.15
c6gn.16xlarge   64     128          100 (EFA)                 1.20
c6gn.2xlarge    8      16           <=25                      0.15
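For anyone wanting to reproduce this, the relevant part of a ParallelCluster 2.10 config looks something like the sketch below. The section and key names come from the pcluster 2.x documentation; the values (queue sizes, OS, head node type) are illustrative choices, not the exact setup used here.

```ini
; Sketch of the cluster section of an AWS ParallelCluster 2.10 config.
; Values are illustrative; only enable_efa and the instance type matter for EFA.
[cluster default]
scheduler = slurm
base_os = alinux2
master_instance_type = c6g.medium
compute_instance_type = c6gn.16xlarge
; Enable EFA on the compute fleet (requires an EFA-capable instance type)
enable_efa = compute
; Keep compute nodes in a placement group for low inter-node latency
placement_group = DYNAMIC
initial_queue_size = 0
max_queue_size = 16
```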
The performance scaling with core count is shown below. Without EFA the 16xlarge instances (64 cores) fall off ideal scaling sharply above 256 cores, but with EFA performance stays close to ideal out to 1024 cores. The 2xlarge instances fall off their ideal curve quickly, but give noticeably better performance for the same core count at 64 and 128 cores. Interestingly, the situation at 256 cores is very different from the previous tests: the 2xlarge instances are now slower than the 16xlarge.
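The scaling statements above can be made concrete: parallel efficiency is the measured speedup over a baseline run divided by the ideal speedup (the ratio of core counts), with 1.0 meaning perfect scaling. A short sketch, using hypothetical timings rather than the measured data:

```python
def speedup_and_efficiency(base_cores, base_time, cores, time):
    """Speedup relative to a baseline run, and parallel efficiency
    (1.0 = ideal scaling, i.e. doubling cores halves the solve time)."""
    speedup = base_time / time
    ideal_speedup = cores / base_cores
    return speedup, speedup / ideal_speedup

# Hypothetical timings (seconds) for illustration only, not measured values:
# a 64 core baseline compared against a 256 core run.
s, e = speedup_and_efficiency(64, 4000, 256, 1100)
print(f"speedup {s:.2f}x, efficiency {e:.0%}")
```

An efficiency near 100% corresponds to sitting on the ideal curve in the plot; "falling off ideal scaling" means this number dropping as cores are added.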
Looking from a cost perspective, the c6gn.16xlarge instances with EFA cost essentially the same per solve all the way out to 1024 cores; if you are looking for fast turnaround, these are clearly the best option. The c6gn.2xlarge looks to be half the cost at 64 cores, so if you don't need the speed these look good value.
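The cost of a run is just instance count times hourly spot price times hours of runtime, so at equal hourly cost per core the cost ratio is the solve-time ratio. A minimal sketch of the 64 core comparison, with placeholder (hypothetical) solve times rather than the measured ones:

```python
def solve_cost(n_instances, spot_price_per_hr, solve_time_s):
    """Spot cost of one run: instance count x hourly price x hours of runtime."""
    return n_instances * spot_price_per_hr * solve_time_s / 3600.0

# 64 cores either way; solve times are hypothetical placeholders used only
# to show how a faster solve at the same hourly rate halves the cost.
big = solve_cost(1, 1.20, 3600)    # 1 x c6gn.16xlarge
small = solve_cost(8, 0.15, 1800)  # 8 x c6gn.2xlarge, assumed twice as fast
print(big, small)
```

Since 1 x $1.20/hr and 8 x $0.15/hr are the same hourly rate, the "half the cost at 64 cores" observation is equivalent to the 2xlarge fleet solving in roughly half the time.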
It's interesting to note the performance of the smaller instances. There's a significant shift between the two sets of tests; this may be down to the variable network bandwidth of these instance types, which could have differed between runs. The higher bandwidth cap of the c6gn.2xlarge (25 Gbps) versus the c6g.2xlarge (10 Gbps) was reflected in slightly faster performance and thus lower cost. The lower core count instances may not be consistent performers, but they are certainly worth a look from a cost perspective.
To finish, it's worth noting that the spot price is currently the same for the c6g.16xlarge and the c6gn.16xlarge, making them cost equivalent. The AWS blog linked at the top shows the c6g.16xlarge to be slower but up to 37% better in price performance than the x86-based c5n.18xlarge; it would be neat to see how the c6gn.16xlarge competes.