Archive for the ‘infrastructure’ Category

Big Data: SQL Planning & Migration to Spark and Hadoop

without comments

I was in a meeting the other day discussing a problem that a client keeps running into. They need a platform to analyze trends in a rapidly growing data set, where the criteria are changing as fast as their business is changing, which, as it turns out, is pretty fast. Right now they are storing the data in a relational database and writing complex SQL queries to mine information from it. The DBA told us that he would run a query and then go to lunch, hoping it would be done by the time he got back. They need the results faster, and they know that their problem is just going to get worse as the data grows.

The kneejerk reaction to a problem like this is to get a bigger database server. Sure, this may help right now when the data is only a few hundred gigabytes, but what happens when we are dealing with a few hundred terabytes? A few hundred petabytes? This kind of solution just does not scale.

The real answer here is to step back, examine the problem, understand what the goal is, and then design a process that can achieve that goal. In this case, the problem is that a business needs to be able to understand patterns and trends in a rapidly growing data set. The goal is to be able to do this quickly and consistently even as the data grows. One way to achieve this is to use something like Hadoop or Spark to build a cluster that can scale as the data scales.
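To make the scaling argument concrete, here is a minimal pure-Python sketch of the map/reduce pattern that tools like Hadoop and Spark distribute across a cluster. It is not Hadoop or Spark code: the `EVENTS` records, the function names, and the number of "nodes" are all hypothetical, and the partitions are processed in one process rather than on separate machines.

```python
from collections import Counter
from functools import reduce

# Hypothetical records: (month, product) pairs standing in for rows in a large table.
EVENTS = [
    ("2011-08", "widget"), ("2011-08", "gadget"), ("2011-09", "widget"),
    ("2011-09", "widget"), ("2011-10", "gadget"), ("2011-10", "widget"),
]


def partition(rows, n):
    """Split rows into n chunks, the way a cluster spreads data across nodes."""
    return [rows[i::n] for i in range(n)]


def map_partition(rows):
    """Map step: each node counts events for its own chunk only."""
    return Counter(rows)


def merge(left, right):
    """Reduce step: partial counts combine by addition, in any order."""
    return left + right


def count_events(rows, nodes=3):
    """Count events per (month, product) by merging per-partition counts."""
    partials = [map_partition(chunk) for chunk in partition(rows, nodes)]
    return reduce(merge, partials, Counter())
```

Because the partial counts merge by simple addition, the result is the same whether the data sits on one node or many. That property is what lets this kind of job grow by adding machines instead of buying a bigger server.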

There were concerns as soon as I brought this up: What about the schema? How do you write SQL for that? Why not just shard the database? Some of these concerns may be valid, but I feel we must evaluate this without emotion. Do people want to use the relational database because it is a better solution for the problem or because they feel comfortable with it?

I’m not sure it’s accurate to say that we are facing new problems these days, but the shape and size of our problems have changed. Now even the smallest company has something to gain from working with big data: anyone with a credit card can spin up a compute cluster. We should not be afraid to change our tools as our challenges change.

Technology is continuously evolving. This means our tools are continuously changing and so must our processes for tackling new challenges. I believe that the system we came up with in that meeting will be the one to solve our client’s problem. If someone gave us the same problem five years ago or five years from now we would probably have wildly different suggestions, but we would come to those suggestions in the same way: through deep understanding of both the problem and the technology available.

Written by David Rocamora

October 11th, 2011 at 9:19 am

Moving Beyond MDM for Custom iOS Solutions

with 2 comments

I’m really excited about several new iOS development and deployment projects that we’ve been working on at CG. We’re working closely with Apple on a bunch of solutions: at the most basic level, we’re building solutions for security and management of employee iPad and iPhone use; at the other end of the spectrum, we’re helping to realize visions such as a kiosk-like platform of thousands of iPads deployed in retail environments around the country.

We’ve learned a ton about what is and isn’t possible as we strategize ways to scale to thousands of units. Here are some of the challenges we’ve come across:

  • How do we deploy and support iPads – whether ten or ten thousand – in a secure, efficient, and centralized way?
  • How can we architect kiosk-like application experiences on the iPad, enabling us to design and curate the customer experience, while also allowing a true iPad experience complete with app-switching, web browsing, Facebook-checking, game-playing, and movie-watching?
  • What kind of network and server architecture is needed to support a platform of iOS devices across the globe? How do we enable caching and pushing of dynamic data to the devices – particularly large amounts of media content?

Centralized deployment and support of iOS devices

How do we deploy and support thousands of iPads or iPhones in a secure, efficient, and centralized way? Mobile Device Management (MDM) platforms like AirWatch, Casper, MobileIron – and soon, OS X Lion Server – allow us to push XML configuration profiles to iOS devices. This enables centralized inventory and basic management of the devices: from what version of iOS they have installed, to some security control over how/if users can install and delete apps. For many enterprise customers, these tools are useful for administering security policies on employee-owned iOS devices. But for custom platforms like kiosks and retail experiences, MDM is not ideal due to the need for end-user interaction. What we need is a way to easily restore iOS devices back to their “golden” state in a centrally managed way.
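For readers who have not seen one, the sketch below builds a small XML configuration profile of the kind an MDM server pushes to a device. It is an illustration under stated assumptions, not the output of any of the products named above: the function name and the `com.example.kiosk` identifier are hypothetical, and the payload keys (`com.apple.applicationaccess`, `allowAppInstallation`, `allowSafari`) are written from memory of Apple's restrictions payload and should be checked against Apple's current profile documentation before use.

```python
import plistlib
import uuid


def build_restrictions_profile(identifier, allow_app_installation, allow_safari):
    """Return an XML property list (bytes) describing a restrictions profile."""
    # Inner payload: the restriction settings themselves (key names are assumptions).
    restrictions = {
        "PayloadType": "com.apple.applicationaccess",
        "PayloadVersion": 1,
        "PayloadIdentifier": identifier + ".restrictions",
        "PayloadUUID": str(uuid.uuid4()).upper(),
        "PayloadDisplayName": "Kiosk restrictions",
        "allowAppInstallation": allow_app_installation,
        "allowSafari": allow_safari,
    }
    # Outer wrapper: the profile that carries one or more payloads.
    profile = {
        "PayloadType": "Configuration",
        "PayloadVersion": 1,
        "PayloadIdentifier": identifier,
        "PayloadUUID": str(uuid.uuid4()).upper(),
        "PayloadDisplayName": "Kiosk profile",
        "PayloadContent": [restrictions],
    }
    return plistlib.dumps(profile)  # XML format is the default
```

Calling `build_restrictions_profile("com.example.kiosk", False, False)` yields XML that could be saved with a `.mobileconfig` extension. Note that a profile like this only restricts features; it does not return a device to its “golden” state, which is the gap described above.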

We’re excited about the potential of over-the-air restores and software updates coming in iOS 5, but as of today, iTunes is the only game in town for this. Working within this limitation, we’ve architected some innovative solutions that enable iOS devices to connect to iTunes virtually over USB to IP converters and a content distribution infrastructure. Until iOS 5, this is a good option to have, and I haven’t heard of anyone else embracing this approach.

Rearchitecting Apple’s iOS user experience

Put an iPad in front of someone and they’re going to tap, scroll, pinch, and squeeze the user interface. The user experience is still the leader in the tablet space – though we’ve been recently impressed by the BlackBerry PlayBook. For a project we’re working on now, we want to encourage this user experimentation and interaction, while locking down some important components of the UX. Things like App Store purchases, iTunes downloads, deleting apps, rearranging icons, and changing the home screen wallpaper will quickly affect the kiosk experience. MDM solutions can help disable some of these features, but the aforementioned need for user interaction just doesn’t work for specialized user environments.

One solution we’ve had success with is a combination of custom code to disable user customization of the Springboard, plus a WebKit-based Safari replacement for browsing that enables us to prevent user download of unauthorized content. Combine these with some configuration profile-based customization of iOS and we have a good solution for locking a customer experience down and reducing the frequency of unit restores or reimaging.

The CG approach to iOS projects

Part of what makes CG stand out as a solution provider is our deeply embedded collaboration between our application development team and our infrastructure team. As the Enterprise’s appetite for customized mobile platforms and experiences grows, we’re uniquely suited as a technology partner to build and innovate on our customers’ vision. iOS is at the core of this vision and I couldn’t be more excited to be working with these technologies today. Plus, iOS 5 is on its way and it’s shaping up to be a giant leap forward!

Written by Charlie Miller

June 13th, 2011 at 8:00 am

Configuring Machines in the Cloud: Our Approach

without comments

We’ve done a lot of work recently to revamp the way we deploy computers in the cloud and I wanted to share a little bit about how we’re doing this at a pretty low level to give you an idea of how we are approaching this. Our software and processes are cloud agnostic, but we mostly work with Amazon Web Services because we feel that they offer the best solution for most of our clients at this time.

We maintain two base Linux images as part of our cloud toolkit. The only difference between the two images is their architecture: one is 64-bit and the other is 32-bit. The images are minimal: they have just enough software and configuration to get them off the ground and configured. We have copies of the images in each region in Amazon, but when it comes to maintenance and upgrades we really only deal with the two master images. All of the computers that we deploy in EC2 come from these two images.

The base image by itself is not very useful. When a computer is instantiated from one of the images, our toolkit combines it with our Puppet repository and some instance-specific configuration. The Puppet repository contains the Puppet manifests for how we deploy software. The repository is where we store our collective knowledge around deploying successful software. The instance-specific configuration is crafted by the developers and operations teams to pick and choose the appropriate things from the Puppet repository and to provide the very specific configuration about how to deploy the server and the application that will run on it. As the instance boots, it configures itself, installing the software and making the changes required to bring it into service.
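The split between a shared repository and instance-specific configuration can be sketched in a few lines of Python. This is not Puppet code and not our toolkit: `REPOSITORY`, the class names, and the `resolve` function are hypothetical stand-ins that only show the idea of an instance picking classes from a shared catalog and overriding their defaults.

```python
# Hypothetical stand-in for the shared repository: class name -> default parameters.
REPOSITORY = {
    "base::users": {"admins": ["ops"]},
    "nginx": {"port": 80, "workers": 2},
    "app::rails": {"ruby": "1.9.2", "env": "production"},
}


def resolve(instance_config, repository=REPOSITORY):
    """Combine an instance's class selection and overrides with repository defaults."""
    catalog = {}
    for name, overrides in instance_config["classes"].items():
        if name not in repository:
            raise KeyError("unknown class: " + name)
        params = dict(repository[name])  # copy so the shared defaults stay untouched
        unknown = set(overrides) - set(params)
        if unknown:
            raise ValueError("unknown parameters for %s: %s" % (name, sorted(unknown)))
        params.update(overrides)
        catalog[name] = params
    return catalog
```

For example, `resolve({"classes": {"nginx": {"port": 8080}}})` selects only the `nginx` class and changes one parameter, leaving `workers` at the shared default. Rejecting unknown classes and parameters early is a design choice in this sketch: a typo in an instance's configuration fails at boot rather than silently producing a half-configured server.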

This is all pretty low level, but it provides some capabilities that make our solution very flexible:

  • With only two images to maintain, keeping software up to date is simple. We anticipate that we will be releasing new images about once a quarter to capture any updates to the packages in the base system.
  • Everything is version controlled. It is easy for us to see what a machine looked like on a specific date or understand the changes that have been made to how the software is configured on an instance.
  • The instances are very self sufficient. There is no single point of failure that would prevent instances from starting correctly.
  • This is all very portable. With just a little bit of work we can deploy things in a different region of Amazon. Also, our Puppet code and instance-specific configurations can work in more places than just Amazon. With a little bit of work to recreate the base images in another platform we can consistently and predictably recreate infrastructure anywhere, giving our clients the ability to choose the right solution for them.

This last item is something that should be on everyone’s mind (especially considering the outage at Amazon last week). As Steve said last week, everything fails and you need to design your infrastructure and applications around that. A process for redeploying your infrastructure in another AWS region or a different cloud is an important part of building a very reliable service in the cloud. It is hard to say what the next kind of failure in the cloud will be like, but with a process like ours we can be ready to deal with an outage when it happens.

Written by David Rocamora

April 25th, 2011 at 9:00 am

Everything Fails Sometime

with 2 comments

Control Group designs cloud-based solutions with the philosophy that every system fails at some point. Embrace this chaos and build for the rainy day. Today we are seeing some major outages on Amazon’s us-east-1 region. Reddit and Quora are two of the high profile victims, but this is affecting everyone in a very popular data center.

You can design around regional performance degradation though. Years ago, having global traffic management in place was an expensive pipe dream. Today you can easily turn up another EC2 region and use a service like Dynect or Akamai GTM to provide failover and/or load balancing. Even better, consider making your systems portable so you can have multiple cloud providers and maintain your machines and applications with Puppet.
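The routing decision that a global traffic management service makes can be sketched as follows. This is a toy model, not the API of Dynect or Akamai GTM: the `route` function, the `mode` values, and the health map are hypothetical, and a real service would gather health from its own probes.

```python
def route(health, preferred_order, mode="failover"):
    """Pick the regions that should receive traffic.

    health maps a region name to True (healthy) or False (degraded).
    In "failover" mode only the first healthy region in preference order is used;
    in "balance" mode every healthy region receives traffic.
    """
    healthy = [region for region in preferred_order if health.get(region, False)]
    if not healthy:
        raise RuntimeError("no healthy region available")
    if mode == "failover":
        return healthy[:1]
    if mode == "balance":
        return healthy
    raise ValueError("unknown mode: " + mode)
```

With `us-east-1` marked unhealthy, `route({"us-east-1": False, "us-west-1": True}, ["us-east-1", "us-west-1"])` returns `["us-west-1"]`. The logic is trivial; the hard part, and the reason portability matters, is having a second region ready to take the traffic.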

Three to five years ago, implementing two data centers would have taken a year of planning, purchasing, and hands-on labor. Earlier this year we were able to create two data centers with complex infrastructure on EC2 and active/active load balancing in under two months and for a fraction of the cost.

Written by Stephen Croll

April 21st, 2011 at 2:13 pm

The Public/Private Debate

without comments

I thought Phil Wainwright’s most recent article on private clouds (as well as the first in the series) was an interesting perspective. I share many of the sentiments, but can’t go quite so far as to say the idea of the private cloud is discredited. In the end it depends on the business and it depends on the applications you are hosting. Even Adrian Cockcroft, the writer of the blog that Phil cites as final proof, has updated his post to say:

“…to clarify, that doesn’t mean that I’m against private clouds or don’t think they exist, because $, FUD and internal politics are a fact of life that constrain what can be done.”

Private clouds, whether hosted or self-hosted, can be useful as stepping stones for organizations that have existing applications that may not fit into the public cloud architecture. Some of these applications may require better performance SLAs. Also, private cloud providers are more amenable to custom arrangements. Try hosting a specialized device like an IPS or IDS in a public cloud where all traffic is guaranteed to only be delivered to the target device. Many enterprise organizations have decades of IT security policies that won’t and perhaps shouldn’t simply go away in favor of adopting a public cloud. Public clouds are secure solutions, but some organizations will have additional requirements, like the ability to discern rogue traffic patterns from typical spikes in demand. You can build this into the individual instances and applications, but that isn’t what most companies have done.

We tend to work with a client to find out what their requirements are and stay away from radical statements. The cloud, public or private, is just one more tool and can’t be seen as a solution in and of itself.

Written by Stephen Croll

April 17th, 2011 at 10:07 am

Adventures with Enterprise Firewalls, Elastic IPs and Auto Scaling

without comments

One distinction between our startup and enterprise clients is that enterprise typically brings the baggage of legacy systems. While a startup is designing for a cloud architecture, a company that has a technology history sometimes needs to integrate new systems with existing services.

In a recent engagement Control Group needed to work with a client to have application instances on EC2 communicate with a secured web service in a traditional data center. Typically we would work with a client to move this service to EC2. In this case, because the service is considered to be shared infrastructure that is used and funded by existing applications, we needed to design the infrastructure and application to make a call back to a traditional data center.

On a side note, mixed infrastructure approaches are not ideal, but they are common when migrating complex organizations to IaaS solutions. Most mature IT organizations will shy away from forklifting a company’s technology platforms wholesale into the cloud. The larger the migration, the bigger the bang when something is overlooked. Change too much in an environment and you won’t know where the problems are coming from, so a major part of moving an enterprise customer to the cloud is planning the roadmap of the migration carefully and not being greedy.

One of the technical challenges in this particular project was that the service that we were integrating with requires that traffic originate from a known and registered IP address. Although EC2 will provide an instance with a public IP address, there is no way to know what that address will be ahead of time. We decided to use Elastic IP (EIP) addresses to solve this problem. An EIP functions like a NAT on a traditional firewall. You can allocate the EIP and then associate it with an instance as needed.

EIPs worked well until we implemented auto-scaling. Auto-scaling groups have no support for associating a pre-allocated EIP with an instance. To implement this we created some scripts that would make the API calls to find a free EIP and associate it with the instance. (This means that the instance will have temporary access to execute API commands. We’ve designed a fairly secure take on temporarily providing AWS API tools to an instance, but that is a different blog post. Coming soon.)

Here is the real problem with the approach. The script to associate the EIP worked perfectly, so long as multiple machines weren’t executing it at once. The Elastic IP API commands do not support transactional assignment. Worse yet, at least in our use case, it is the last instance requesting the EIP, not the first, that ends up associated with the IP. This is a major problem if you want to associate EIPs with members of an auto-scaling group that needs to scale up by more than one instance at a time: some members of the group can be left without an Elastic IP.
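The race is easier to see in a small simulation. The sketch below models only the behavior described above, a "find a free address, then associate it" sequence with no atomicity and last-writer-wins association. `FakeAddressApi` is a hypothetical toy, not the EC2 API, and the addresses and instance IDs are made up (the addresses come from a range reserved for documentation).

```python
class FakeAddressApi:
    """Toy model of a non-transactional address API; not the real EC2 API."""

    def __init__(self, ips):
        self.owner = {ip: None for ip in ips}  # ip -> instance id, or None if free

    def free_addresses(self):
        return sorted(ip for ip, inst in self.owner.items() if inst is None)

    def associate(self, ip, instance_id):
        # No check that the address is still free: the last caller wins.
        self.owner[ip] = instance_id


def simulate_race():
    """Two instances boot at once and interleave their check and associate calls."""
    api = FakeAddressApi(["203.0.113.10", "203.0.113.11"])
    # Both instances look for a free address before either has associated one.
    choice_a = api.free_addresses()[0]
    choice_b = api.free_addresses()[0]
    # Both picked the same address; the second association replaces the first.
    api.associate(choice_a, "i-aaaa")
    api.associate(choice_b, "i-bbbb")
    return api.owner
```

After `simulate_race()`, the first address belongs to `i-bbbb`, the second address is still free, and `i-aaaa` has nothing, even though there were enough addresses for both instances.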

There are myriad ways to tackle this issue. We considered options for programmatically brokering the IPs by building an application that would manage the EIP resources. The application would provide an IP on request and then return IPs that were no longer in use back into the system through a background recovery process. Such a service is pretty easy to write, but it wasn’t in scope for the current project. Also, there are longer-term solutions that we can consider with the launch of the new and improved VPC with NATing.
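A brokering service of that kind might look something like the sketch below. This is not something we built for the project; `EipBroker` and its method names are hypothetical, the pool lives in memory in a single process, and the step that would actually call the cloud API to associate the address is left out.

```python
import threading


class EipBroker:
    """Hands out addresses from a fixed pool, one per instance, under a lock."""

    def __init__(self, ips):
        self._lock = threading.Lock()
        self._free = list(ips)
        self._leases = {}  # instance id -> leased ip

    def lease(self, instance_id):
        """Return an address for the instance; repeated requests get the same one."""
        with self._lock:
            if instance_id in self._leases:
                return self._leases[instance_id]
            if not self._free:
                raise LookupError("no free addresses")
            ip = self._free.pop(0)
            self._leases[instance_id] = ip
            return ip

    def release(self, instance_id):
        """Return an instance's address to the pool, if it holds one."""
        with self._lock:
            ip = self._leases.pop(instance_id, None)
            if ip is not None:
                self._free.append(ip)

    def recover(self, live_instance_ids):
        """Background recovery: reclaim addresses held by instances that are gone."""
        live = set(live_instance_ids)
        with self._lock:
            dead = [inst for inst in self._leases if inst not in live]
            for inst in dead:
                self._free.append(self._leases.pop(inst))
            return len(dead)
```

The lock makes the "pick a free address and record who has it" step a single unit, which is exactly what the raw check-then-associate script lacks. A real deployment would also need the broker itself to be highly available, which is part of why the proxy and VPC options are attractive.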

Interim Solutions

The current favored approach is to use a proxy server like Squid to limit the number of servers that require registered IP addresses. Two or more instances running Squid as a forward proxy, distributed across multiple availability zones with traffic managed by an Elastic Load Balancer for HA, would provide a redundant and fairly high-performance solution. For now, as a workaround, we have staggered the auto-scaling policies to reduce the chance of multiple instances spinning up at the same time. Staggering is a serviceable solution for testing, but not for production, where auto-scaling multiple farms of servers that need access to the client’s data center tier is a requirement. Eventually, we will move forward with the proxy or VPC solution.

In summary, enterprises with complex interdependent applications can lead to interesting challenges when migrating to the cloud. Resources as simple as IP addresses can function in a fundamentally different way than a typical IT organization is used to. Oftentimes this can lead to fear, uncertainty, and doubt, but the benefits of Infrastructure as a Service are clear: ease of provisioning, demand-based resource allocation rather than over-provisioning, and so on. As long as proper planning, system architecture, implementation, and testing are performed, a complex enterprise can begin making its way to the cloud and begin to eliminate the FUD on the ground.

Written by Stephen Croll

March 31st, 2011 at 2:23 pm

Enterprise Clients Continue To Warm To The Cloud

without comments

Lately we’ve been working with clients that haven’t been the typical EC2 infrastructure consumer. Historically, it has been the startup companies that we work with that have been interested in AWS for all the expected reasons: flexibility, pay-for-what-you-need, access to higher end services like load balancing and HA database deployments, etc. Recently we have been noticing that our more established enterprise clients have taken interest in these capabilities and for largely the same reasons.

Large enterprises looking at cloud infrastructure bring their own requirements and challenges. We plan to write a series of blog posts about Control Group’s experiences with these types of clients and what we learned. Some of the posts will be about the projects and their politics, and some will be about technology approach. There are some interesting technology and organizational challenges that we will discuss, so stay tuned.

Written by Stephen Croll

March 23rd, 2011 at 5:25 pm

Posted in infrastructure

Rapidly Prototyping Tagatag on Google App Engine

with one comment

Google App Engine is Google’s platform-as-a-service for developing web applications. Some people have been saying goodbye to GAE, and perhaps in response Google has announced several enhancements to the service.

In the midst of all of this, a few of us at Control Group have been developing Tagatag: an Android and iPhone application for commenting on barcodes that uses web services running on Google App Engine.

Scan this QR code with Tagatag to join the conversation!

Barcodes are everywhere around us. You can find them on advertising, products, places and even people. Tagatag provides you with a virtual paint marker to let you make your mark on all of these codes anonymously. Download the Tagatag app and give it a try. Scan a barcode to see comments people have left for you and then leave some for them.

We chose Google App Engine for the back end of Tagatag for a few reasons:

  • It’s quick – You sign up for an account, download the SDK and you’re developing. The development server in the SDK lets me run the application on my desktop and interact with the code as I’m writing it.  Uploading new versions, rolling back old ones, or performing maintenance is a snap with the GAE dashboard.
  • It’s simple – There’s not much to the web service. It’s small and simple. We used the webapp framework because we didn’t feel we needed anything else. It makes for a very concise application. Believe it or not, there are about 300 lines of code for the GAE part of Tagatag.
  • It’s scalable – We don’t have to worry about what we do when Tagatag becomes popular. We’ll just raise our billing quotas in GAE and let them handle spinning up new instances or expanding the datastore. Knowing that you don’t have to be concerned about scaling makes things a lot more fun.
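To give a feel for how small a service like this can be, here is a sketch of a barcode-comment web service written as a plain WSGI application using only the Python standard library. It is not the Tagatag code and does not use the webapp framework or the GAE datastore: the `/tags/<code>` URL scheme, the `app` function, and the in-memory `COMMENTS` dictionary are all assumptions made for illustration.

```python
import json

# In-memory stand-in for a datastore: barcode value -> list of comment strings.
COMMENTS = {}


def _respond(start_response, status, body, content_type="text/plain"):
    start_response(status, [("Content-Type", content_type)])
    return [body]


def app(environ, start_response):
    """GET /tags/<code> lists comments; POST /tags/<code> adds the request body as one."""
    path = environ.get("PATH_INFO", "")
    prefix = "/tags/"
    if not path.startswith(prefix) or len(path) == len(prefix):
        return _respond(start_response, "404 Not Found", b"not found")
    code = path[len(prefix):]
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        text = environ["wsgi.input"].read(length).decode("utf-8").strip()
        if not text:
            return _respond(start_response, "400 Bad Request", b"empty comment")
        COMMENTS.setdefault(code, []).append(text)
    elif method != "GET":
        return _respond(start_response, "405 Method Not Allowed", b"method not allowed")
    body = json.dumps(COMMENTS.get(code, [])).encode("utf-8")
    return _respond(start_response, "200 OK", body, "application/json")
```

To try it locally, the application can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. On a platform like GAE the dictionary would be replaced by the platform's datastore, which is where the scaling benefit described above comes from.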

I’m happy that GAE let us bring Tagatag to you so quickly. So, when it’s available at the end of the week, be sure to download the app, tag a tag and make your mark!

Written by David Rocamora

December 6th, 2010 at 8:00 am

Goodbye Xserve. Now what?

without comments

Today, Apple announced that the Xserve will no longer be available for purchase after January 31, 2011. What does this mean for existing and future infrastructure that relies on Mac OS X Server and Xsan?

For existing Xserve environments, Apple will continue to provide warranty service and complimentary technical support for the product. This means that all AppleCare service and support agreements should be honored until they expire.

Apple is providing an Xserve Transition Guide with information on options moving forward. They suggest that customers looking for Mac OS X Server solutions move to Mac Pro or Mac mini hardware solutions. We have had great success with these solutions for providing basic services such as file sharing, directory services, and calendaring to small- to medium-sized workgroups.

But what about Xsan environments? Xsans could be built using Mac Pros for metadata controllers, with a few serious considerations — we lose the power redundancy and lights out management (LOM) that Xserve provides. Also, this solution will require 12U of rack space for two Mac Pro servers instead of 2U for two Xserves, which is not very appealing to customers designing server room rack elevations.

This is also an opportunity to discuss alternative SAN solutions, such as Quantum StorNext, which is compatible with Xsan. Control Group has had recent successes in deploying StorNext as an alternative to Xsan, allowing users and organizations to continue to use the Apple tools they are familiar with, such as Final Cut Pro, while leveraging a robust, Linux-based infrastructure in the server room. StorNext has a very rich feature set and does some things that are not possible with Xsan, such as hierarchical storage management.

If you remember, a few years ago Apple discontinued the Xserve RAID storage solution, the IT world panicked, and then Apple announced a partnership with Promise and the Promise Vtrak for Mac solution. Maybe Apple has similar plans for a replacement for the Xserve. Whether they do or not, there are great alternatives to discuss, so if you have questions or concerns, give us a shout.

Written by Charlie Miller

November 5th, 2010 at 11:42 am

Crunched for time? Get in the cloud.

without comments

I am really busy these days, but a bunch of things have just been in the news that I need to comment on.

I’m working on the infrastructure for a new phase of QA testing that we are doing on a product. The infrastructure consists of a variety of physical computers, about fifty in all. Managing and maintaining them is more time consuming than the cloud-based computers I work with. The increased amount of attention and time that physical computers take is why I wonder about these things that I’ve read.

First, New York City has entered a “money-saving partnership” with Microsoft, signing up for some massive licensing. Fortunately this includes some cloud-based infrastructure, but it’s unfortunate that the city did not compare the Microsoft solution with something like Google Apps, or with open-source solutions like LibreOffice. Since we are paying the taxes that are being used to pay for these services, shouldn’t we be getting the best deal? So, NYC, please call me when you’re ready to talk about your infrastructure.

Have you ever been shivering from the cold in a data center while waiting on hold for the URL to a service pack because everyone’s email is down? I have, and I never want to do it again. I’m sure no one in the city wants to do it either. Why not let Google freak out about keeping your systems up all of the time so you can do some things that really matter. That’s what the cities of Los Angeles and Washington DC do (along with a lot of other people).

Microsoft is in the news for something else, too: Ray Ozzie, their chief software architect, is stepping down. Ozzie seems like a sharp guy and was behind a lot of good things at Microsoft (yes, this is one of the few times you will hear me complimenting Microsoft). He’s asking his colleagues to “close our eyes and form a realistic picture of what a post-PC world might actually look like, if it were to ever truly occur.” Guess what, dude: we are in a post-PC world already.

Can I say that more people are interacting with technology that’s in the cloud via their cellphones than through their PCs? Probably not, but I will tell you that what’s going on in the cloud and mobile space is a lot more interesting than the PC space. Will PCs even be relevant in a few years? We’ll see. Also interesting to note is that these articles indicate that no one will take Ozzie’s place as chief software architect. That makes me wonder about who’s driving the bus there. This probably doesn’t mean MS is going to just dry up and disappear, but will they ever be innovators again?

Well, enough pondering for now, I have to get back to punching power buttons and checking for failed hard drives — things that you never have to do in the cloud.

Written by David Rocamora

October 28th, 2010 at 10:00 am
