Data Stores in AWS

Yogendra H J
5 min read · Dec 26, 2021


I have already discussed the storage solutions that AWS provides in my previous articles. Here, I am putting all of the storage services on a single page to help my fellow learners compare the services and choose the best one for their use case.

What are the storage solutions AWS provides?

S3 - Simple Storage Service, Glacier, EBS - Elastic Block Store, EFS - Elastic File System, Storage Gateway, WorkDocs, Databases on EC2, RDS - Relational Database Service, DynamoDB, Redshift, Neptune, and ElastiCache.

Let’s talk a bit about Data Persistence:

Persistent Data Store - Data is durable and is not lost after reboots or power cycles. Ex: S3, Glacier, RDS.

Transient Data Store - Data is stored only temporarily, then passed along to another process or to a persistent store. Ex: SNS, SQS.

Ephemeral Data Store - Data is lost when the instance or node is stopped. Ex: EC2 Instance Store, Memcached.
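
To make the three categories concrete, here is a small lookup table written in Python. It is only a sketch: the mapping repeats the examples listed above, and the helper name `safe_as_system_of_record` is something I made up for illustration.

```python
from enum import Enum


class Persistence(Enum):
    PERSISTENT = "persistent"   # survives reboots and power cycles
    TRANSIENT = "transient"     # held briefly, then handed to another process or store
    EPHEMERAL = "ephemeral"     # lost when the instance or node stops


# Examples taken from the categories above; illustrative, not exhaustive.
PERSISTENCE_BY_SERVICE = {
    "S3": Persistence.PERSISTENT,
    "Glacier": Persistence.PERSISTENT,
    "RDS": Persistence.PERSISTENT,
    "SNS": Persistence.TRANSIENT,
    "SQS": Persistence.TRANSIENT,
    "EC2 Instance Store": Persistence.EPHEMERAL,
    "Memcached": Persistence.EPHEMERAL,
}


def safe_as_system_of_record(service: str) -> bool:
    """Hypothetical helper: True only for stores whose data survives restarts."""
    return PERSISTENCE_BY_SERVICE[service] is Persistence.PERSISTENT
```

A check like this is handy in a design review: anything that is not persistent should never hold the only copy of your data.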

Moving ahead with each of the Storage solutions

1. Amazon S3 - S3 is object-based storage. It supports a maximum object size of 5 TB. S3 supports Cross-Region Replication, which is useful when you have security or compliance requirements, or when you want to reduce latency for users in another Region.

S3 offers different storage classes such as Standard, Standard-Infrequent Access (Standard-IA), One Zone-IA, Intelligent-Tiering, and others. Use Intelligent-Tiering to let S3 move objects between access tiers automatically as access patterns change, which reduces S3 cost.
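
As a rough sketch of how this could be configured, the function below builds a lifecycle configuration that transitions objects to Intelligent-Tiering. The dictionary layout follows what boto3's `put_bucket_lifecycle_configuration` expects as far as I know; the rule ID, the `logs/` prefix, and the bucket name in the comment are assumptions for illustration. The code only builds the dictionary and does not call AWS.

```python
def intelligent_tiering_rule(prefix: str = "") -> dict:
    """Build a lifecycle configuration that moves objects under `prefix`
    to the Intelligent-Tiering storage class right after creation."""
    return {
        "Rules": [
            {
                "ID": "move-to-intelligent-tiering",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    }


config = intelligent_tiering_rule("logs/")
# With boto3 installed and credentials configured, this could be applied as:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket", LifecycleConfiguration=config)
```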

S3 supports data encryption both in transit and at rest. You can protect data in transit using Secure Sockets Layer/Transport Layer Security (SSL/TLS) or client-side encryption. Protect data at rest using SSE-S3 (keys managed by S3), SSE-C (customer-provided keys), or SSE-KMS (keys managed in AWS Key Management Service).
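
The three at-rest options map to different upload parameters. The sketch below returns the extra arguments I would pass to boto3's `put_object` for each mode; the parameter names are the ones I know from boto3, while the function name `encryption_args` and the key values are assumptions. Nothing here talks to AWS.

```python
def encryption_args(mode: str, kms_key_id: str = "", customer_key: bytes = b"") -> dict:
    """Return the extra put_object arguments for one server-side encryption mode."""
    if mode == "SSE-S3":
        # S3 manages the keys itself.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            args["SSEKMSKeyId"] = kms_key_id  # omit to use the default KMS key
        return args
    if mode == "SSE-C":
        # The customer supplies the key with every request.
        if len(customer_key) != 32:
            raise ValueError("SSE-C expects a 256-bit (32-byte) key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown encryption mode: {mode}")
```

For example, `s3.put_object(Bucket=..., Key=..., Body=..., **encryption_args("SSE-KMS", kms_key_id="alias/demo"))` would request KMS encryption with a hypothetical key alias.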

2. Elastic Block Store - Amazon Elastic Block Store (Amazon EBS) is an easy-to-use, scalable, high-performance block-storage service designed for Amazon Elastic Compute Cloud (Amazon EC2).

EBS volumes are virtual hard drives that can be attached to EC2 instances, and each volume is tied to a single AZ. EBS provides snapshots, which are point-in-time backups of the data. EBS is cost-effective, has an easy backup strategy, lets you move a volume to a new AZ or Region through snapshots, and lets you convert an unencrypted volume to an encrypted one.

Data Lifecycle Manager - Using this, you can schedule snapshots of volumes or instances every x hours. It also lets you delete old snapshots based on a schedule.
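
To show the idea of "create every x hours, delete the old ones", here is a plain Python simulation. It does not use the Data Lifecycle Manager API; the function names and the keep-the-newest-N retention rule are my assumptions for illustration.

```python
from datetime import datetime, timedelta


def snapshot_times(start: datetime, every_hours: int, count: int) -> list:
    """Times at which a schedule that runs every `every_hours` hours would fire."""
    return [start + timedelta(hours=every_hours * i) for i in range(count)]


def apply_retention(snapshots: list, keep_last: int) -> tuple:
    """Split snapshots into (kept, deleted), keeping only the newest `keep_last`."""
    ordered = sorted(snapshots)
    if keep_last <= 0:
        return [], ordered
    return ordered[-keep_last:], ordered[:-keep_last]


# Five snapshots taken every 12 hours, with only the newest three retained.
times = snapshot_times(datetime(2021, 12, 26, 0, 0), every_hours=12, count=5)
kept, deleted = apply_retention(times, keep_last=3)
```

Here the two oldest snapshots end up in `deleted`, which is the cleanup a retention rule would perform for you.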

3. Elastic File System - EFS is the AWS equivalent of the NFS (Network File System) shares that we use in our on-prem environment. EFS offers more than a typical NFS server: it is highly scalable, cost-effective, and can also be mounted from on-prem systems.

You can create and configure shared file systems simply and quickly for AWS compute services without worrying about provisioning, deploying, and patching. The file system grows and shrinks as files are added and removed, and it can burst to higher throughput when necessary.

4. Database on EC2 - When you need to run your own database with full control over it, choose a database on EC2. You can run any database with full control and flexibility, but you take on the hassle of managing backups, redundancy, patching, and scaling. This is the best option when you want to run a database not yet supported by RDS.

5. Relational Database Service - Amazon RDS is a managed database service for MySQL, MariaDB, PostgreSQL, Microsoft SQL Server, Oracle, and MySQL-compatible Aurora. It is best suited for structured, relational data.

You can use the AWS Database Migration Service to easily migrate or replicate your existing databases to Amazon RDS. RDS provides automated backups and patching in customer-defined windows.
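
The choice between a database on EC2 and RDS can be written down as a tiny decision function. This is only a sketch of the reasoning in sections 4 and 5: the engine list is the one mentioned in this article, and the function name and its `need_full_control` flag are hypothetical.

```python
# Engines named in this article as supported by RDS (lower-cased for matching).
RDS_ENGINES = {"mysql", "mariadb", "postgresql", "sql server", "oracle", "aurora"}


def choose_database_host(engine: str, need_full_control: bool = False) -> str:
    """Pick 'RDS' when the engine is supported and managed operation is acceptable,
    otherwise fall back to running the database yourself on EC2."""
    engine_key = engine.strip().lower()
    if need_full_control or engine_key not in RDS_ENGINES:
        return "Database on EC2"
    return "RDS"
```

For example, `choose_database_host("PostgreSQL")` returns `"RDS"`, while an engine outside the list, or any engine with `need_full_control=True`, returns `"Database on EC2"`.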

6. DynamoDB - A managed, multi-AZ NoSQL data store with a cross-Region replication option. DynamoDB offers built-in security, continuous backups, automated multi-Region replication, in-memory caching, and data export tools.

DynamoDB is a key-value and document database that can support tables of virtually any size with horizontal scaling. This enables DynamoDB to scale to more than 10 trillion requests per day with peaks greater than 20 million requests per second, over petabytes of storage.
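
To give a feel for the key-value and document model, the sketch below converts a plain Python item into the typed attribute-value format that DynamoDB's low-level API uses (`S` for strings, `N` for numbers, `M` for maps, and so on). It is a simplified illustration, not boto3's own serializer; the item and the function name are made up.

```python
def to_attribute_value(value):
    """Convert a plain Python value to DynamoDB's typed attribute-value format."""
    if isinstance(value, bool):          # check bool before int: bool is an int subclass
        return {"BOOL": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}         # numbers are sent as strings
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical item: "player_id" would be the key, "profile" a nested document.
item = {"player_id": "p-100", "score": 42, "profile": {"country": "IN", "premium": True}}
marshalled = {name: to_attribute_value(v) for name, v in item.items()}
```

The nested `profile` map is what makes DynamoDB a document store as well as a key-value store: the whole structure is saved under a single key.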

7. Redshift - Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more.

Redshift is extremely cost-effective compared to some on-premises data warehouse platforms. It also has the option to query data files directly on S3 via Redshift Spectrum.

Redshift is the best fit for OLAP - Online Analytical Processing.
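
If OLAP is a new term for you, the sketch below shows the shape of an analytical query: scan many rows and aggregate them per group. I am using Python's built-in `sqlite3` purely as a local stand-in, not Redshift itself, and the `sales` table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("EU", "book", 10.0),
        ("EU", "pen", 2.5),
        ("US", "book", 12.0),
        ("US", "book", 8.0),
    ],
)

# OLAP-style query: aggregate over all rows instead of fetching one record by key.
rows = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

A data warehouse is built to run this kind of aggregate over very large tables, whereas a transactional database is tuned for many small reads and writes.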

8. ElastiCache - Amazon ElastiCache is a fully managed, in-memory caching service supporting flexible, real-time use cases. You can use ElastiCache for caching, which accelerates application and database performance, or as a primary data store for use cases that don’t require durability like session stores, gaming leaderboards, streaming, and analytics.

ElastiCache offers two engines: Memcached and Redis.

Prefer Memcached when you need a simple and straightforward caching service, when you need to scale out and in as demand changes, or when you need to run on multiple CPU cores and threads.

Go for Redis when you need encryption, complex data types, Pub/Sub capability, geospatial indexing, or backup and restore.
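
Here is a minimal sketch of the caching pattern that speeds up database reads, often called cache-aside. The `TTLCache` class is an in-process stand-in for a Memcached or Redis node, and `load_user_from_db` is a hypothetical slow database call; neither is an ElastiCache API.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-entry expiry (stand-in for a cache node)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]      # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)


db_reads = 0  # counts how often the "database" is actually hit


def load_user_from_db(user_id: str) -> dict:
    """Hypothetical slow database read."""
    global db_reads
    db_reads += 1
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache: TTLCache, user_id: str) -> dict:
    user = cache.get(user_id)
    if user is None:                 # cache miss: read the database, then fill the cache
        user = load_user_from_db(user_id)
        cache.set(user_id, user)
    return user


cache = TTLCache(ttl_seconds=60)
first = get_user(cache, "42")
second = get_user(cache, "42")       # served from the cache, no second database read
```

Because the cached copy disappears on expiry or restart, this fits the ephemeral category from the beginning of the article: the database stays the source of truth.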

Conclusion

AWS offers multiple services for data storage and optimization, but as a Cloud Engineer or Solutions Architect, it is up to you to pick the right service that meets your requirements. Your design should follow the AWS Well-Architected Framework, be cost-effective, and most importantly be highly available. Always design your environment assuming that components will fail, so that it keeps running without interruption.

— — — — — — — — — — — — — — — — — — — — — — — — —

Continuous learning is the minimum requirement for SUCCESS in any field.

I am happy to hear your feedback, suggestions, and requests for topics in my coming blogs.

LEARN and BE CURIOUS!!!!!

Happy Learning,

Yogendra
