Case Study Overview
Our team designed a cloud-based solution using AWS Lambda, AWS S3, Docker containers on AWS Fargate, AWS Step Functions, AWS API Gateway, AWS RDS, and Slack integration to support intermittent, heavy compute for a company that analyzes drone-collected data for the agriculture industry.
Project Background
Our team recently put together a design to solve a pretty interesting problem that an innovative business sent our way.
This firm specializes in analyzing data collected by remote sensors and drawing conclusions from it. They primarily serve the agriculture sector.
An example of the conclusions they can draw using their proprietary algorithms is an index of plant growth on a farm. Using sensor data, they do image analysis to come up with a measure of how crop growth is coming along during the growing season.
They have a number of such algorithms that they have developed in-house. When they develop an algorithm, they first do it on their own computers. Then, when they feel it is ready for a test run, they publish it to shared resources and run it with client data. It is this process they were looking to automate.
They did not have an easy way to publish their algorithms, upload client data, and see the results. They wanted to be able to do this regardless of whether they were on or off network. Additionally, they wanted to spend as little time and money as possible on buying hardware, installing and patching operating systems, and maintaining servers. Finally, they expected that their algorithms would take a lot of computing power to run, but they didn’t run them all that often. This all pointed to a design using cloud services.
Design
AWS is our cloud service provider of choice since it holds the market lead in the space and its services are varied, reliable, and work well together. Since our client needed intense but intermittent computing power, we decided AWS Lambda and containerization would be at the center of our design. If you need a more thorough explainer of what AWS Lambda is and the purpose it serves, check out our earlier blog post “What is AWS Lambda?”.
The design, pictured below, uses AWS Lambda as the glue that responds to new data, kicks off a containerized algorithm, and distributes the results to a couple of other services.
The diagram above was generated using Cloudcraft, our go-to tool for visualizing cloud architecture built around AWS.
Use of the system goes like this:
- The data scientist for the drone company publishes his or her algorithm as a Docker container that runs on AWS Fargate, shown as “C4” on the diagram.
- They then upload client data to an S3 bucket, which triggers an AWS Lambda function that responds to changes in the bucket.
- The Lambda function starts an AWS Step Functions workflow that executes the algorithm’s steps and outputs data after each portion of the algorithm (this was a client requirement: the algorithms generate intermediate states, and the data from those states needs to be recorded).
- The results are written to a destination S3 bucket, a Postgres database, an API for extensibility, and a Slack channel that keeps the team up to date on the state of jobs as they run.
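As a minimal sketch of the trigger step, assuming the Lambda function is written in Python with the boto3 SDK, the handler below reads the bucket and key from each S3 event notification record and starts one Step Functions execution per uploaded object. The environment variable `STATE_MACHINE_ARN` and the payload field names are hypothetical; the S3 event structure and the `start_execution` call follow the documented AWS formats.

```python
import json
import os
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode them.
        key = unquote_plus(s3["object"]["key"])
        objects.append((s3["bucket"]["name"], key))
    return objects


def _start_with_boto3(state_machine_arn, payload):
    # Imported lazily so the parsing logic can be tested without boto3 installed.
    import boto3

    client = boto3.client("stepfunctions")
    response = client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    return response["executionArn"]


def lambda_handler(event, context, start_execution=_start_with_boto3):
    # STATE_MACHINE_ARN is a hypothetical environment variable name.
    state_machine_arn = os.environ["STATE_MACHINE_ARN"]
    executions = []
    for bucket, key in extract_s3_objects(event):
        # Hypothetical input shape handed to the first step of the workflow.
        payload = {"input_bucket": bucket, "input_key": key}
        executions.append(start_execution(state_machine_arn, payload))
    return {"started": executions}
```

Injecting `start_execution` keeps the handler testable offline; in Lambda it is called with the usual `(event, context)` arguments and falls back to boto3.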
There are several advantages of this design:
- Assuming the S3 buckets are cleared out, this system incurs very little cost except when it is running. The AWS RDS PostgreSQL database will cost money once you exceed the free tier, but its cost while the system is idle is many times less than you would pay for a virtual machine hosting these services. Amazon provides a pricing calculator for RDS.
- Overall cost should be much lower than with a VM. Because the system costs so little when idle, its total cost will also be lower than that of a server-oriented architecture, which accrues charges for as long as the VM is running. See more details on pricing below.
- Unlike a VM, once these services are connected they should not need to be reworked as long as Amazon maintains backward compatibility between them. Amazon versions its APIs, so the system should keep working even as those APIs evolve. For the foreseeable future, then, the system should run without the patching, updating, upgrading, and other maintenance chores that a more traditional server-based architecture can force on you.
- The services are decoupled and communicate with each other through defined APIs, so you can swap out one section of the system for another service. For instance, you could replace AWS Lambda in this diagram with Azure Functions with relatively little rework to the rest of the system.
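Because each component talks to the others through a defined interface, each integration stays small and replaceable. As an illustration, assuming the Slack channel is fed through a Slack incoming webhook (the post does not say how the integration is built), a notifier needs only the Python standard library. The function names and message format here are hypothetical.

```python
import json
import urllib.request


def build_slack_message(job_id, step, status):
    """Build a Slack incoming-webhook payload describing a job's progress."""
    return {"text": f"Job {job_id}: step '{step}' {status}"}


def post_to_slack(webhook_url, message, timeout=10):
    """POST the payload as JSON to a Slack incoming webhook URL."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A caller would invoke something like `post_to_slack(webhook_url, build_slack_message("run-42", "segmentation", "completed"))`, with the webhook URL taken from the Slack app configuration. Swapping Slack for another chat tool would mean replacing only `post_to_slack`.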
Digging Into the Pricing Details
Let’s assume the following parameters for our system:
- 4 data scientists on the team each running a test algorithm every 2 hours every workday of the month
- 60 seconds to run the algorithm
- Consumes 8 GB of memory while running
- 1 GB of data going into and out of the S3 buckets and the RDS instance per run
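These parameters translate into monthly usage figures. The sketch below assumes an 8-hour workday and 22 workdays per month, neither of which is stated above; under those assumptions the result lines up with the roughly 6 Fargate hours used in the estimate that follows.

```python
# Workload parameters from the list above. HOURS_PER_WORKDAY and
# WORKDAYS_PER_MONTH are assumptions, not figures from the case study.
DATA_SCIENTISTS = 4
RUN_INTERVAL_HOURS = 2
HOURS_PER_WORKDAY = 8      # assumed
WORKDAYS_PER_MONTH = 22    # assumed
RUN_SECONDS = 60
MEMORY_GB = 8
DATA_GB_PER_RUN = 1

runs_per_day = HOURS_PER_WORKDAY // RUN_INTERVAL_HOURS  # runs per scientist
runs_per_month = DATA_SCIENTISTS * runs_per_day * WORKDAYS_PER_MONTH
compute_hours = runs_per_month * RUN_SECONDS / 3600
gb_seconds = runs_per_month * RUN_SECONDS * MEMORY_GB
data_gb = runs_per_month * DATA_GB_PER_RUN

print(f"runs/month:      {runs_per_month}")     # 352
print(f"compute hours:   {compute_hours:.2f}")  # 5.87
print(f"GB-seconds:      {gb_seconds:,}")       # 168,960
print(f"data moved (GB): {data_gb}")            # 352
```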
Our Design:
AWS Lambda provides 400,000 GB-seconds of compute for free every month, and we fall within that limit, so Lambda costs us nothing. We’ll need about 6 compute hours per month of AWS Fargate, which costs about $2. The only significant costs come from input/output on S3 and RDS, which total about $200 per month. Total costs would come in at less than $250/month to host the application.
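To sanity-check these figures, the sketch below recomputes them. The 400,000 GB-second free tier and the $200 I/O figure come from the estimate above; the Fargate per-vCPU-hour and per-GB-hour rates are assumed us-east-1 Linux/x86 on-demand prices, and the 4-vCPU task size and 22-workday month are hypothetical, so treat the output as an order-of-magnitude check rather than a quote.

```python
# Rough monthly estimate for the serverless design. Rates marked "assumed"
# should be checked against current AWS pricing.
LAMBDA_FREE_GB_SECONDS = 400_000
FARGATE_VCPU_HOUR = 0.04048   # USD per vCPU-hour (assumed)
FARGATE_GB_HOUR = 0.004445    # USD per GB-hour (assumed)

runs_per_month = 352          # 4 scientists x 4 runs/day x 22 workdays (assumed)
run_seconds = 60
memory_gb = 8
vcpus = 4                     # hypothetical task size

# Even if Lambda itself ran every job at 8 GB, usage stays inside the free tier.
lambda_gb_seconds = runs_per_month * run_seconds * memory_gb
within_free_tier = lambda_gb_seconds <= LAMBDA_FREE_GB_SECONDS

# Fargate bills per vCPU-hour and per GB-hour of memory for the task's runtime.
fargate_hours = runs_per_month * run_seconds / 3600
fargate_cost = fargate_hours * (vcpus * FARGATE_VCPU_HOUR + memory_gb * FARGATE_GB_HOUR)

storage_and_db_io = 200.0     # estimate above for S3 and RDS input/output
total = fargate_cost + storage_and_db_io
print(f"Lambda within free tier: {within_free_tier}")       # True
print(f"Fargate: ${fargate_cost:.2f}, total: ${total:.2f}")  # Fargate: $1.16, total: $201.16
```

Under these assumptions Fargate lands a little under the roughly $2 figure, and the total stays well below $250.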
Alternative Design Using Virtual Machines:
A virtual machine on Amazon EC2 with 4 virtual cores, 16 GB of RAM, and a pre-loaded relational database runs you $3.708 per hour. You need the extra RAM because we’re assuming the algorithms need 8 GB while running, plus additional memory to run the databases and Docker. There are 730 hours in an average month, so the virtual machine approach will run you about $2,707 per month.
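The VM figure is easy to reproduce, and dividing it by the $250 upper bound from the serverless estimate gives the cost ratio behind the conclusion below. Because $250 is a ceiling, the true ratio is at least this large.

```python
HOURS_PER_MONTH = 8760 / 12        # 730 hours in an average month
VM_HOURLY_RATE = 3.708             # USD per hour, from the estimate above
SERVERLESS_MONTHLY_CEILING = 250   # USD, upper bound for the serverless design

vm_monthly = VM_HOURLY_RATE * HOURS_PER_MONTH
ratio = vm_monthly / SERVERLESS_MONTHLY_CEILING
print(f"VM: ${vm_monthly:,.2f}/month, about {ratio:.1f}x the serverless ceiling")
# VM: $2,706.84/month, about 10.8x the serverless ceiling
```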
Conclusions
The results are clear. For this instance, where a number of cloud services can be glued together to satisfy an intermittent, heavy compute load, the serverless design is superior. It takes less time to maintain, requires essentially no patching, and costs roughly a tenth as much.
We hope that this example design shows how AWS services can be stitched together to create flexible, affordable solutions. If you would like to learn more, feel free to contact us.