Guest blog post.
Today, people are no longer looking to reduce their data consumption. If anything, they want more data, from more sources and with more diversity, than anyone could have imagined. As we pioneer a world where data can be digested easily, software solutions need to be engineered so they can expand to meet customers’ demand. Because of this trend, more and more software developers are building solutions on an elastic cloud infrastructure.
In a report entitled Elasticity in Cloud Computing: What It Is, and What It Is Not, authors Nikolas Roman Herbst, Samuel Kounev, and Ralf Reussner define cloud computing elasticity as “the degree to which a system is able to adapt to workload changes by provisioning and deprovisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible.” Put simply, this flexibility allows software teams to change the state of their deployment in AWS (Amazon Web Services) so it mirrors their specific requirements, whether that means spinning up instances for a new deployment, bringing back “Spark Workers” to increase the power of query engines, or shutting down a dev-only deployment at the end of the day. The primary motivation is to give customers an optimal experience with respect to performance while allocating resources as efficiently as possible. From a business perspective, the elastic cloud can expand or contract as data consumers do their work, meaning the organization can add users, increase projects, dump in more data, remove users, and change sources without worrying about things such as licensing, capacity, and cost.
When it comes to deciding whether an elastic cloud infrastructure is the right approach for your development team, it is best to make some key decisions well before writing any code. What language do we use? Is the app specific to one use case, or should it be general to any interaction we want to have with AWS? Should it live as a long-running service, or as an on-demand app that runs the command it is given? We recently embarked on a similar project, where we opted to apply this elegant design to Paxata’s adaptive data preparation solution. In a production environment that stringently employs unit tests and continuous integration practices, here are some of the things I learned along the way:
1. Develop for today, but think for tomorrow.
Previously, all of our AWS-facing code had been written in Python, so the original thought was to write General Cluster in Python as well. AWS’s boto library is widely used and has a very active community, so this seemed like an enticing option.
However, a couple of considerations took precedence in the final decision:
A Python (or any other non-compiled) app could be easily modified, but not as easily tested. Since General Cluster might have control over the entire cloud infrastructure that a solution runs on, it was prudent to make sure any significant change to the app had to jump through a couple of hoops before being deployed.
Writing General Cluster in Java was another option, as it could easily integrate with other applications that were written in some combination of Scala and Java. This gave us the option of reusing any relevant code that had already been written.
Eventually, we settled on a Scala implementation on top of AWS’s Java API. Amazon’s Java developer community is pretty strong, and even though the Java API didn’t afford some of the conveniences of boto, it was easy to write a Scala abstraction over it, which gave us the opportunity to extend its capabilities in the future.
2. In the absence of good documentation, code can be just as good.
Like the early pioneers who managed to get to the West without GPS, we soon learned that the AWS API isn’t terribly well documented. Some of the most common use cases are described, but there wasn’t much written help for the granularity of control that we wanted. There was an API reference, which proved marginally useful, but what really helped was that the code was written very explicitly.
Every command to EC2 takes one object, a “request” object, and these are easy enough to make. This means that instead of passing parameters directly into the command you want to execute, you set them as attributes on the request object and send that object to the command. Similarly, every AWS command, even those that don’t have any useful return value, returns a “result” object, from which you can extract either the information you requested or an indication of whether the request you made was actually executed with the right behavior. The API itself is stateless, so if you want to change an attribute of an Instance you created, you must send that Instance (or its ID) as part of another request to the system.
While this style might seem verbose, it is actually an ideal way to understand what is really going on under the hood. Every piece of information that could be sent to a command is enumerated right there for the development team to inspect, and being able to play with each command on the fly helps the team understand all the different ways that command could be interpreted in the cloud environment. This is especially useful when writing error-handling code, because the team is aware of all the potential ways its requests could fail and can account for each of them.
But let’s face it, this is no substitute for good, thorough documentation. While the way the code is structured can allow you to build General Cluster without written help, there will always be occasions where good documentation is required.
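To make the request/result style described in this section concrete, here is a minimal sketch in plain Java. It is an illustration only: LaunchRequest, LaunchResult, StopRequest, StopResult, and FakeCloudClient are hypothetical stand-ins invented for this example, not classes from the AWS SDK, and the in-memory client simply pretends to be the remote service. What it tries to show is the shape of the interaction: parameters are set as attributes on a request object, every command hands back a result object, and because the caller's session holds no state, instance IDs have to be sent back explicitly in a later request.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins that mimic the request/result style; NOT AWS SDK classes.
class RequestResultSketch {

    // Request object: every parameter is an attribute set before the request is sent.
    static final class LaunchRequest {
        String imageId;
        String instanceType; // carried on the request, ignored by the fake client below
        int count = 1;

        LaunchRequest withImageId(String value) { imageId = value; return this; }
        LaunchRequest withInstanceType(String value) { instanceType = value; return this; }
        LaunchRequest withCount(int value) { count = value; return this; }
    }

    // Result object: reports what the command actually did.
    static final class LaunchResult {
        final List<String> instanceIds;
        LaunchResult(List<String> ids) { instanceIds = new ArrayList<>(ids); }
    }

    static final class StopRequest {
        final List<String> instanceIds = new ArrayList<>();
        StopRequest withInstanceIds(List<String> ids) { instanceIds.addAll(ids); return this; }
    }

    // Even a command with little to "return" reports which IDs it acted on.
    static final class StopResult {
        final List<String> stoppedIds;
        final List<String> unknownIds;
        StopResult(List<String> stopped, List<String> unknown) {
            stoppedIds = new ArrayList<>(stopped);
            unknownIds = new ArrayList<>(unknown);
        }
    }

    // In-memory fake of the remote service. The set models state held by the
    // cloud itself; the caller keeps nothing but the IDs it got back in results.
    static final class FakeCloudClient {
        private final Set<String> running = new LinkedHashSet<>();
        private int nextId = 1;

        LaunchResult launch(LaunchRequest request) {
            if (request.imageId == null || request.count < 1) {
                throw new IllegalArgumentException("imageId and a positive count are required");
            }
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < request.count; i++) {
                String id = "i-" + nextId++;
                running.add(id);
                ids.add(id);
            }
            return new LaunchResult(ids);
        }

        StopResult stop(StopRequest request) {
            List<String> stopped = new ArrayList<>();
            List<String> unknown = new ArrayList<>();
            for (String id : request.instanceIds) {
                if (running.remove(id)) {
                    stopped.add(id);
                } else {
                    unknown.add(id);
                }
            }
            return new StopResult(stopped, unknown);
        }
    }

    public static void main(String[] args) {
        FakeCloudClient client = new FakeCloudClient();
        LaunchResult launched = client.launch(new LaunchRequest()
                .withImageId("image-example")
                .withInstanceType("small-example")
                .withCount(2));
        // The IDs from the first result must be passed along in the next request.
        StopResult stopped = client.stop(new StopRequest().withInstanceIds(launched.instanceIds));
        System.out.println("launched=" + launched.instanceIds + " stopped=" + stopped.stoppedIds);
    }
}
```

Because each result spells out what happened (here, which IDs were stopped and which were not recognized), error-handling code can branch on the result rather than guess, which is the benefit of the explicit style noted above.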
3. Don’t be afraid to deploy (you’ll get a better understanding of what to do next when you can see it in action).
My company deployed General Cluster only one month after starting development. It lived as a small command-line JVM app that took very basic commands, most of which fit into only a couple of use cases. However, it did work. In the spirit of the motivation for this project, we knew that we weren't done; the first pass of the app was messy and untested, and in a couple of minor cases, it flat out did the wrong thing. But we were able to use it within our own operations team, and within the first week or so it was easy to see that choosing to write and deploy General Cluster as a production application was paying off.
Although it doesn’t hurt to validate the approach first, deploying sooner rather than later made it much easier to see and triage future priorities. Some bugs and pain points were far more important than others, but our earlier prioritization had no context. Fortunately, running the app in the cloud as an internal tool provided that context.
General Cluster has grown since then. After a rigorous QA process, it is now a persistent service running on Typesafe’s Play Framework. It keeps tabs on clusters of instances that it has spun up, and it’s resilient to the various unexpected behaviors we’ve grown to expect from the cloud. But as the term elastic suggests, we expect to continue development over time so that we can do even more interesting things within our cloud-based infrastructure.
About the Author: Rishikesh Tirumala is an Infrastructure Engineer at Paxata.