Amazon S3 is a versatile, economical, and safe way of storing data objects in the AWS cloud. The name stands for “Simple Storage Service,” and it provides a simple organization for storing and retrieving information. Unlike a database, it doesn’t do anything fancy. It does one thing: it lets you store as much data as you want. Its data is stored redundantly across multiple sites. That makes the chances of data loss or downtime tiny, far lower than they would be if you used on-premises hardware. It has good security, with options to make it still stronger.
S3 vs. other services
S3 isn’t a database, in the sense of a service with a query language for adding and extracting data fields. If that’s what you want, you should look at Amazon’s RDS. With RDS, you can choose from several different SQL engines. Alternatively, you can host a database on your own servers, with all the responsibility that entails. S3 is more economical than RDS if you don’t need all the features of a database.
S3 also isn’t a full-blown file system. It consists of buckets that hold objects, but buckets can’t be nested inside other buckets. Object keys can contain slashes to suggest folders, but that’s only a naming convention. For a general-purpose, hierarchical file system, you should look at Amazon’s EFS or set up a virtual machine and use its file directories. If you set up a cloud VM using a service like EC2, you pay for storage as part of the VM’s ongoing costs.
AWS S3 is optimized for “write once, read many” operation. When you update an object, you replace the whole object. If your data requires constant modifications, it’s better to use RDS, EFS, or the local file system of a VM.
The basics of S3
The organization of information in S3 is very simple. Information consists of objects, which are stored in buckets. A bucket belongs to one account. An object is just a bunch of data plus some metadata describing it. Metadata are key-value pairs. S3 works with the metadata, but the object data is just a collection of bytes as far as it’s concerned.
If you turn on versioning for a bucket, you can save multiple versions of an object, letting you go back to an earlier version if you change or delete something by mistake. The combination of bucket name, key, and version ID identifies an object uniquely across all of S3.
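Here is a minimal sketch of how objects, metadata, and versions fit together in code. It assumes the `s3` argument is a client created with `boto3.client("s3")`. The function names are made up for illustration, and the bucket and key are whatever you choose.

```python
def enable_versioning(s3, bucket):
    """Turn on versioning so overwritten or deleted objects can be recovered."""
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )


def put_with_metadata(s3, bucket, key, data, metadata):
    """Store bytes under a key with user-defined key-value metadata.

    Returns the version ID, or None if the bucket is not versioned.
    """
    response = s3.put_object(Bucket=bucket, Key=key, Body=data, Metadata=metadata)
    return response.get("VersionId")


def get_version(s3, bucket, key, version_id):
    """Read back the bytes of one specific version of an object."""
    response = s3.get_object(Bucket=bucket, Key=key, VersionId=version_id)
    return response["Body"].read()


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   enable_versioning(s3, "example-bucket")
#   vid = put_with_metadata(s3, "example-bucket", "reports/q1.csv",
#                           b"a,b\n1,2\n", {"author": "pat"})
#   original = get_version(s3, "example-bucket", "reports/q1.csv", vid)
```

Passing the client in as a parameter keeps the functions easy to try out against a stand-in object before pointing them at a real account.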
You can specify the geographic region a bucket is stored in. That lets you keep latency down, and it may help to meet regulatory requirements.
Normally S3 reads or writes whole objects, but S3 Select allows retrieving just part of an object by running a simple SQL expression against its contents. At the time of writing, this is a recent feature available to all customers.
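As a rough illustration of S3 Select, the sketch below pulls matching rows out of a CSV object instead of downloading all of it. It assumes `s3` is a `boto3.client("s3")` client and that the object is a CSV file with a header row. The function name and the sample query are hypothetical.

```python
def select_rows(s3, bucket, key, sql):
    """Run an S3 Select SQL expression against a CSV object.

    Returns the matching rows as CSV text.
    """
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=sql,
        # Treat the first line of the object as column names.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The result arrives as a stream of events; only "Records" events carry data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


# Hypothetical usage (requires boto3 and AWS credentials):
#   rows = select_rows(s3, "example-bucket", "stores.csv",
#                      "SELECT s.name FROM S3Object s WHERE s.city = 'Boston'")
```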
Uses for S3
Wherever an application calls for retrieving moderate to large units of data that don’t change often, S3 can be a great choice.
- Backup: S3 can hold a backup copy of a website, a database, or a whole disk. With very high durability, it gives confidence your data won’t be lost.
- Disaster recovery: A complete, up-to-date disk image can be stored on S3. If a disaster makes a primary server unavailable, the saved image is available to launch another server and keep business operations going.
- Application data: S3 can hold large amounts of data for use by a web or mobile application. For instance, it could hold images of all the products a business sells or geographic data about its locations.
- Website data: S3 can host a complete static website (one which doesn’t require running any code on the server). To set it up, you tell S3 to configure a bucket as a website endpoint.
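For the website case, the configuration step can also be done in code. This is a sketch that assumes `s3` is a `boto3.client("s3")` client. The function names and the default file names `index.html` and `error.html` are illustrative choices.

```python
def website_configuration(index_document="index.html", error_document="error.html"):
    """Build the settings that tell S3 how to serve a bucket as a website."""
    return {
        # Served when a visitor requests a "directory" such as / or /docs/.
        "IndexDocument": {"Suffix": index_document},
        # Served when the requested page cannot be returned.
        "ErrorDocument": {"Key": error_document},
    }


def enable_static_website(s3, bucket, index_document="index.html",
                          error_document="error.html"):
    """Configure an existing bucket as a static website endpoint."""
    s3.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration=website_configuration(index_document, error_document),
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   enable_static_website(s3, "example-bucket")
```

The pages themselves still have to be readable by visitors, so a site like this also needs an access policy that allows public reads.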
Access control and security
Buckets and objects are secure by default, and you can make them more secure by applying the right options. You have control over how they’re shared, and you can encrypt the data.
The system of bucket policies gives you detailed control over access. You can limit access by account, IP address, or membership in an access group. Multi-factor authentication can be mandated. Read access can be open to everyone while write access is restricted to just a few users. If you prefer, you can use AWS IAM to manage access.
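A bucket policy is a JSON document. The sketch below builds one that opens reads to everyone while limiting writes to one account coming from one IP range. The bucket name, the account ID, and the address range are placeholders, and the function name is made up for this example.

```python
import json


def public_read_restricted_write_policy(bucket, writer_account_id, allowed_cidr):
    """Build a bucket policy: anyone may read, one account may write from one network."""
    objects_arn = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": objects_arn,
            },
            {
                "Sid": "RestrictedWrite",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{writer_account_id}:root"},
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                # Only honor write requests that come from this address range.
                "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidr}},
            },
        ],
    }


# Placeholder values: a documentation-only account ID and IP range.
example_policy_json = json.dumps(
    public_read_restricted_write_policy("example-bucket", "111122223333", "203.0.113.0/24"),
    indent=2,
)

# To attach it with boto3 (requires credentials):
#   s3.put_bucket_policy(Bucket="example-bucket", Policy=example_policy_json)
```

A policy with a public statement like the first one can be rejected if Block Public Access is turned on for the bucket or the account.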
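For additional protection of data, you can use server-side or client-side encryption. Server-side encryption protects the data as it sits on Amazon’s disks, though anyone who can make authorized requests still receives it decrypted. With client-side encryption, you encrypt the data before uploading it and keep the keys yourself. That way, even if someone steals a password and gets access to your objects, they won’t be able to do anything with them.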
Pricing
The cost of S3 storage depends on how much you use, and it varies by region. New AWS customers can use the Free Tier to get 5 GB of storage, 20,000 GET requests, 2,000 PUT requests, and up to 15 GB of outbound data transfer each month. The Free Tier is available for one year.
In the United States, the first 50 terabytes of S3 standard storage are available for $0.023 to $0.026 per GB per month, depending on the region. The price per GB drops slightly for higher usage levels. Taxes are additional.
There is a cost for reading and writing data in addition to the storage cost. In accordance with the “write once, read many” philosophy, requests that retrieve data are much less expensive than ones that write it. Retrieving costs just $0.0004 per thousand requests in most of the US, while writing data costs $0.005 per thousand requests. An infrequent access option is available, which costs less for storage but more for access.
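To see how these numbers combine, here is a small back-of-the-envelope calculator. It uses the rates quoted above, taking the low end of the storage range, and it ignores the Free Tier, volume discounts, data transfer charges, and taxes. Treat it as a rough sketch and not a substitute for Amazon’s own pricing calculator.

```python
# Rates quoted in this article for standard storage in most of the US.
STORAGE_PER_GB_MONTH = 0.023   # dollars per GB per month (low end of the range)
GET_PER_THOUSAND = 0.0004      # dollars per 1,000 read requests
PUT_PER_THOUSAND = 0.005       # dollars per 1,000 write requests


def estimate_monthly_cost(stored_gb, get_requests, put_requests):
    """Estimate one month's bill in dollars from storage and request counts."""
    storage = stored_gb * STORAGE_PER_GB_MONTH
    reads = get_requests / 1000 * GET_PER_THOUSAND
    writes = put_requests / 1000 * PUT_PER_THOUSAND
    return round(storage + reads + writes, 2)


# 500 GB stored, one million reads, fifty thousand writes:
# storage $11.50 + reads $0.40 + writes $0.25
example = estimate_monthly_cost(500, 1_000_000, 50_000)
print(example)  # 12.15
```

In this example the storage charge dwarfs the request charges, which is typical for data that is written once and read many times.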
Getting started
If you have an AWS account, setting up S3 usage is straightforward. From the console, select the S3 service. You’ll be given the option to create a new bucket. You need to give it a globally unique name and select a region. There are a number of options you can then choose, including logging and versioning. Next, you can give permission to other accounts to access the bucket. The console will let you review your settings, after which you confirm the creation of the bucket.
Next, you can upload objects to the bucket and set permissions and properties for them. If you’re using S3 through other AWS services, you may never need to upload directly. You’ll still want to check the S3 console occasionally to verify that your usage and costs are in the range you expected and that bucket authorizations are what they should be.
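The same two steps, creating a bucket and uploading to it, can be scripted. The sketch below assumes `s3` is a `boto3.client("s3")` client created for the same region as the bucket. The function names, bucket name, and file paths are placeholders.

```python
def create_bucket_in_region(s3, name, region):
    """Create a bucket in the given region. The name must be unique across all of S3."""
    if region == "us-east-1":
        # us-east-1 is the default region and is not passed as a location constraint.
        s3.create_bucket(Bucket=name)
    else:
        s3.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )


def upload_local_file(s3, bucket, local_path, key):
    """Upload a file from disk and store it under the given key."""
    s3.upload_file(local_path, bucket, key)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3", region_name="us-west-2")
#   create_bucket_in_region(s3, "example-bucket-name", "us-west-2")
#   upload_local_file(s3, "example-bucket-name", "photos/cat.jpg", "images/cat.jpg")
```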
When deciding whether S3 is the best way to handle the storage for your application, evaluate how it stacks up against your needs. If you don’t require a full file system and you don’t need to rewrite data often, S3 can be a very cost-effective choice. It provides high data availability and security at a very reasonable price.