Adrian Cantrill’s SAA-C02 study course, 50 minutes: Network Storage section: ‘EFS Architecture’
Elastic File System
This product provides network-based file systems which can be mounted within Linux EC2 instances and used by multiple instances at once. The way we were looking to apply this in our upcoming demo would be to store the media for posts outside of the individual EC2 instances. This means that the media wouldn’t be lost when instances are added and removed. This provides significant benefits in terms of scaling as well as self-healing architecture. It was considered that by doing this we were moving the EC2 instances to a point where they’re closer to being stateless.
Looking at the architecture, it was explained that the EFS service is an AWS implementation of a fairly common shared storage standard called NFS, the Network File System, specifically version four of the Network File System. With EFS you create file systems which are the base entity of the product, and these file systems can be mounted within EC2 Linux instances. Linux uses a tree structure for its file system. Devices can be mounted into folders in that hierarchy. An EFS file system, for example, could be mounted into a folder called /NFS/Media. Also, EFS file systems can be mounted on many EC2 instances, so the data on those file systems can be shared between lots of EC2 instances. EFS storage exists separately from an EC2 instance, just like EBS exists separately from EC2. EBS is block storage, whereas EFS is file storage, but Linux systems can mount EFS file systems as though they are connected directly to the instance.
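To make the mounting idea concrete, here is a minimal sketch of mounting an EFS file system into a folder on a Linux instance. The file system ID, region, and mount point are placeholders, and running this requires root access on an instance that can reach a mount target; the NFS options shown are the ones AWS generally recommends for EFS.

```shell
# Create the folder in the Linux file system tree to mount into
# (/nfs/media is just an illustrative path)
sudo mkdir -p /nfs/media

# Mount the EFS file system over NFSv4.1
# fs-12345678 and us-east-1 are placeholders for your own file system and region
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /nfs/media

# Confirm the file system is mounted
df -h /nfs/media
```

Once mounted, the instance reads and writes files under /nfs/media as though the storage were local, even though it actually lives in EFS and can be shared with many other instances.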
EFS is a private service by default; it’s isolated to the VPC that it’s provisioned into. Architecturally, access to EFS file systems is via mount targets, which exist inside a VPC. Even though EFS is a private service, you can access EFS via hybrid networking methods. So if your VPC is connected to other networks then EFS can be accessed over those, using VPC peering, VPN connections, or AWS Direct Connect, which is a physical private networking connection between a VPC and your existing on-premises networks. So EFS is accessible outside of a VPC using these hybrid networking products, as long as you configure this access.
EFS runs inside of a VPC. Inside EFS you create file systems, and these use POSIX permissions, which is a standard for interoperability used in Linux. So a POSIX-permissions file system is something that all Linux distributions will understand. The EFS file system is made available inside a VPC via mount targets, and these run from subnets inside a VPC. The mount targets have IP addresses taken from the IP address range of the subnet that they’re inside. To ensure high availability, you need to make sure that you put mount targets in multiple availability zones. Just like NAT gateways, for a fully highly available system you need to have a mount target in every availability zone that a VPC uses. It is these mount targets that instances use to connect to the EFS file system.
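As a rough sketch of that architecture, the AWS CLI can create one mount target per availability zone by pointing at a subnet in each AZ. The file system ID, subnet IDs, and security group below are placeholders, and this assumes a VPC that already spans two availability zones.

```shell
# One mount target in a subnet in availability zone A
aws efs create-mount-target \
  --file-system-id fs-12345678 \
  --subnet-id subnet-aaaa1111 \
  --security-groups sg-0123456789abcdef0

# A second mount target in a subnet in availability zone B,
# giving instances in that AZ a local point of access
aws efs create-mount-target \
  --file-system-id fs-12345678 \
  --subnet-id subnet-bbbb2222 \
  --security-groups sg-0123456789abcdef0
```

Each mount target takes an IP address from its subnet’s range, which is what instances (or on-premises servers over hybrid networking) actually connect to.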
If you have an on-premises network, this would generally be connected to a VPC using hybrid networking products such as VPNs or direct connect, and any Linux-based server that’s running on this on-premises environment can use this hybrid networking to connect through to the same mount targets and access EFS file systems.
Some things to remember:
EFS is for Linux only; it’s only officially supported using Linux instances.
There are two performance modes: General Purpose and Max I/O.
General purpose is ideal for latency sensitive use cases, web servers, content management systems, home directories, or general file sharing as long as you’re using Linux instances.
Max I/O can scale to higher levels of aggregate throughput and operations per second, but it does have a tradeoff of increased latencies. Max I/O suits applications that are highly parallel. So if you’ve got any application or generic workload such as big data, media processing, or scientific analysis, anything that’s highly parallel, then it can benefit from using Max I/O. For most cases, go with General Purpose.
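The performance mode is chosen when the file system is created. As a hedged sketch, the AWS CLI exposes this via the --performance-mode flag (the creation token below is just an illustrative placeholder):

```shell
# Create a file system using the Max I/O performance mode
# (the default, if the flag is omitted, is generalPurpose)
aws efs create-file-system \
  --creation-token my-parallel-workload \
  --performance-mode maxIO
```

Note that the performance mode can’t be changed after creation, which is another reason General Purpose is the safer default for most workloads.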
There are two different throughput modes: bursting and provisioned.
Bursting mode works like gp2 volumes inside EBS; it has a burst pool, and the throughput of this mode scales with the size of the file system. So the more data that you store in the file system, the better performance you get.
With provisioned mode you can specify throughput requirements separately from the amount of data you store, so this is like the comparison between gp2 and io1. That’s more flexible, but it’s not what’s used by default. Generally you should pick bursting.
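The throughput mode can also be set at creation time. As an illustrative sketch (the creation token and throughput figure are placeholders):

```shell
# Create a file system with provisioned throughput of 128 MiB/s,
# decoupled from the amount of data stored
aws efs create-file-system \
  --creation-token my-provisioned-fs \
  --throughput-mode provisioned \
  --provisioned-throughput-in-mibps 128
```

If --throughput-mode is omitted, the file system uses bursting, which, as noted above, is the right choice in most cases.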
There are also two storage classes available: Standard and Infrequent Access (IA).
IA is a lower-cost storage class which is designed for storing things that are infrequently accessed. So if you need to store data in a cost-effective way but you don’t intend to access it often, then you can use Infrequent Access.
Then there is Standard. The Standard storage class is used to store frequently accessed files, and it’s also the default; you should consider it the default when picking between the different storage classes. Conceptually these mirror the tradeoffs of the S3 object storage classes: use Standard for data which is used day-to-day and Infrequent Access for anything which isn’t used on a consistent basis.
Lifecycle policies can be used with these storage classes to move data between them automatically.
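A lifecycle policy is attached to the file system itself. As a minimal sketch, assuming a placeholder file system ID, this moves files that haven’t been accessed for 30 days into the Infrequent Access class:

```shell
# Transition files to the IA storage class after 30 days
# without access (other supported windows include AFTER_7_DAYS,
# AFTER_14_DAYS, AFTER_60_DAYS, and AFTER_90_DAYS)
aws efs put-lifecycle-configuration \
  --file-system-id fs-12345678 \
  --lifecycle-policies '[{"TransitionToIA":"AFTER_30_DAYS"}]'
```

This keeps day-to-day data in Standard while letting rarely touched files drift down to the cheaper class without any manual intervention.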