Control Group Blog

Author Archive

Is H.264 the right choice for online video?

with one comment

I wanted to add some thoughts to Chris’s post about Flash and HTML5. First, I should preface this post by saying that HTML5 video support is really cool, both technically and because HTML5 is an open standard that anyone can implement for free. As we all know, Flash has been the de facto choice for online video delivery for the last several years. Flash support across platforms has been pretty good, but how well it works for end users still depends on their OS. Until recently, Flash on Linux was about a version behind the release for Windows or OS X. Even now, Adobe only releases a player for x86, and the x86_64 version is unsupported beta software.

Everyone seems to be touting HTML5 video as the “open” alternative to the proprietary Flash plugin required for .flv playback in the browser. But how open is H.264, the codec most commonly used with HTML5 video and the current pick for encoding video for online delivery? Using H.264 as the codec behind HTML5 video sours things a bit for me. H.264 is encumbered by software patents; to develop or distribute a player or encoder for H.264 you might have to pay a licensing fee to MPEG LA. Even though MPEG LA announced last week (PDF) that H.264 will remain royalty-free for free Internet video through the end of 2015, this is not the same as being free or open. MPEG LA can still go after people who produce the software to encode or decode H.264. And MPEG LA does not speak for just one organization; it licenses a pool of patents held by many companies that have their own agendas.

All this is a bit of a slap in the face to the open standards that power the web. Imagine if you had to pay half a million dollars to create or display JPEGs, GIFs, or HTML… The only people who would be able to afford to make software for the web would be huge companies. But what are our alternatives? Beyond the Theora codec and open container formats like Ogg and Matroska, the pickings are slim. These formats are open and free, but not necessarily better than H.264. Plus it would be next to impossible to compete with the marketing machine of Apple behind H.264.

Open and free standards are what have made the Internet successful since its inception. I think it’s important that users understand this so that the Internet of the future cannot be controlled by corporations with enough cash to cover licensing fees.

Written by David Rocamora

February 15, 2010 at 10:41 am

A Look at Amazon’s Elastic Load Balancer

with one comment

The result of Amazon's Elastic Load Balancing?

We have been doing some work with Amazon’s Elastic Compute Cloud (EC2), which allows us to create virtual machines in the cloud in a few seconds. These are great for hosting websites, and what’s cool about them is that if you get Slashdotted or experience a similar unexpected spike in traffic you can create new hosts immediately. Recently Amazon added a new service called Elastic Load Balancing (ELB), which can distribute load across hosts. We’ve been looking at this for some of our recent development and infrastructure projects.

I just read this description of how ELB works by Shlomo Swidler from his Cloud Developer Tips blog. It’s a great reference.

You pay for ELB by usage just like everything else at AWS. From Amazon: “You are charged at $0.025 per hour for each Elastic Load Balancer, plus $0.008 per GB of data transferred through an Elastic Load Balancer.” For reference, on a deployment project in 2008 our Engineering team used a Cisco load balancer which I imagine cost a few thousand bucks.
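To get a feel for what those rates add up to, here is a minimal sketch of the arithmetic in Python. The two rates are the ones quoted above; the function name and the sample workload (one balancer, a 30-day month, 500 GB of traffic) are made up for illustration and are not from Amazon.

```python
from decimal import Decimal

# Rates quoted from Amazon in the post.
HOURLY_RATE = Decimal("0.025")  # dollars per load balancer per hour
PER_GB_RATE = Decimal("0.008")  # dollars per GB transferred through it


def elb_cost(hours, gb_transferred, balancers=1):
    """Estimate the ELB charge in dollars for the given usage."""
    hourly = HOURLY_RATE * Decimal(str(hours)) * balancers
    transfer = PER_GB_RATE * Decimal(str(gb_transferred))
    return hourly + transfer


# Hypothetical workload: one balancer up for 720 hours, passing 500 GB.
cost = elb_cost(hours=720, gb_transferred=500)
print(f"Estimated charge: ${cost:.2f}")
```

Decimal is used instead of float so that fractions of a cent do not pick up rounding noise when the numbers are summed.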

Cost isn’t the only advantage. These can be created and destroyed quickly and remotely, allowing us to work more efficiently and spend less time visiting data centers in the middle of nowhere. This leads to improved quality of service for our clients as we can spend more time consulting on future technology growth plans and less time troubleshooting servers in cold, loud data centers.

This blog post brought to you by the iced coffee I am enjoying in the comfort and quiet of my office while deploying virtual machines!

Written by David Rocamora

August 7, 2009 at 11:17 am

Testing Storage Performance with iozone

without comments

As I’ve mentioned in previous posts about testing storage performance with lmdd and bonnie++, different applications require different characteristics from storage to provide the best performance. I’ve highlighted some tests that are good for large streaming files like video, and small file transactions like databases or mail servers. Today I want to look at a tool that runs a series of tests in many different ways to provide you with a holistic view of what the storage can and can’t do.

This tool is called iozone. iozone is open source and runs on a ton of operating systems (including Windows). It runs several tests which can take some time to complete but provide the best overall view of the capabilities of a piece of storage. For instance, iozone runs a write test with files of different sizes and with different size records (the amount of data written at a time). It does this over and over again with writes, reads, random writes, random reads, and so forth. Since it’s running all these tests you can see what sorts of operations will have good performance and which ones will not perform so well. Check out the iozone documentation here.
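To make the idea of sweeping file sizes and record sizes concrete, here is a rough Python sketch of a write test in that style. It is not iozone and does not reproduce its methodology; the function names, the temporary file name, and the sizes are all assumptions chosen for illustration, and the numbers it prints will be dominated by caching on small files.

```python
import os
import tempfile
import time


def write_throughput(path, file_size, record_size):
    """Write file_size bytes in record_size chunks and return MB/sec."""
    record = b"\0" * record_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        written = 0
        while written < file_size:
            chunk = record[: file_size - written]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    elapsed = time.perf_counter() - start
    return (file_size / 1e6) / elapsed if elapsed > 0 else float("inf")


def sweep(directory, file_sizes, record_sizes):
    """Run the write test for every (file size, record size) pair."""
    results = {}
    path = os.path.join(directory, "sweep_test.tmp")  # hypothetical name
    for file_size in file_sizes:
        for record_size in record_sizes:
            if record_size > file_size:
                continue  # a record larger than the file makes no sense
            try:
                results[(file_size, record_size)] = write_throughput(
                    path, file_size, record_size
                )
            finally:
                if os.path.exists(path):
                    os.remove(path)
    return results


with tempfile.TemporaryDirectory() as scratch:
    table = sweep(scratch, [1 << 20, 4 << 20], [4 << 10, 64 << 10, 1 << 20])
    for (file_size, record_size), rate in sorted(table.items()):
        print(f"{file_size:>9} B file, {record_size:>8} B records: {rate:8.1f} MB/sec")
```

A real run would use files much larger than RAM and repeat the sweep for reads and random access, which is the part iozone automates.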

One really great thing about iozone is that the output it generates can easily be pasted into a spreadsheet program like Excel to generate a great 3D diagram describing your storage. Here’s a diagram I generated from some tests on a Linux server.

Results of a write test with iozone

This particular server performed quite well with large files and a record size around 1 MB. Interestingly, this is the same storage from the lmdd post, and the parameters I tested with there match the best write performance this disk can deliver according to iozone!

If you’ve been following my posts on storage performance testing I hope you’ve learned about some new tools that you can use to see what’s going on. I use these on every deployment to make sure we’re giving our clients solutions that they can depend on for performance and reliability. As always, let me know if you have any questions about these tools. Happy testing!

Written by David Rocamora

August 3, 2009 at 3:28 pm

Testing Storage Performance with bonnie++

without comments

Last time I posted about checking disk performance with lmdd. lmdd is great for checking streaming throughput, but what if you have a different kind of application? Every application accesses storage in different ways: with video we need to be able to provide constant throughput when writing a lot of data to the disk, but other applications may have different storage needs. For example, a database can make lots of very small changes to the data on disk in a short period of time. The best performing disk for a database will probably need to have very low seek time and good transactional performance.

bonnie++ is a series of file system tests that focuses on small files. It was designed to behave like a mail server does, creating and dealing with lots of small files (emails). bonnie++ is easy to run and outputs a CSV file that you can view with something like Excel. With the bon_csv2html command you can quickly generate html pages from the CSVs.

Here’s the output from bonnie++ running on a server:

The HTML output of bonnie++ on a Linux Server

At first glance the output can seem quite cryptic, but if we look closely we can see that it gives us a great deal of information about latency and speed for different filesystem operations. I generally run this several times as I make changes to verify that the storage is providing the right performance characteristics. Tweaking a file system to make file system operations happen a few milliseconds faster may seem ridiculous, but in some environments it can make a huge difference.

Next time I’ll post about a tool that’s new to me but can test a disk in so many different ways I’m planning to run it on every system we install from now on.

Written by David Rocamora

July 21, 2009 at 11:07 am

On Being Connected

without comments

Outside My Hotel in Malaysia – Do Not Feed The Monkeys!

I know the last time I posted here I said I’d be following up with another technical post, but instead I thought I’d share an experience I just had as I took a last minute trip for a client.

Normally if I take a trip it’s no big deal. I can write a blog post from wherever I go. My email is online, this blog is online, and if I need to access something in my office I can just use our VPN to get connected. To use any of these I’d just need to have an Internet connection. Unfortunately, this wasn’t the case last week when I went to a fairly remote part of Malaysia.

A few coworkers and I were trying to make last minute adjustments to a product that one of our clients is launching. At first I wondered why they would even send us out there when we could get remote access or talk someone through it on the phone. When I arrived onsite I realized why this wasn’t an option: getting connected is nearly impossible there. We could head to a coffee shop and get some free WiFi, but with over twenty hops to servers in the United States and a twelve hour time difference, getting anything done was difficult.

The lack of connectivity was challenging. One of my responsibilities was interfacing with the local IT department and writing some scripts to integrate the client’s system with existing systems and processes. I quickly realized how much I depend on online references and documentation. When you can barely get connected to look up the answer to a question about syntax you really have to use your head. Not to mention, each software build for the project is about 300 megabytes and getting this from our office in New York was difficult and time consuming.

The idea of ubiquitous Internet connectivity is something that we take for granted. As connection speeds get faster and more reliable we lose efficiencies that we once had. I learned that the Internet is really an extension of my knowledge and a valuable tool that I need to do my job. Being cut off from it was an interesting and overall positive experience. Solving every problem by thinking and working it through was difficult and took more time, but genuinely figuring things out for myself was very rewarding.

Towards the end of my time there we found a cell phone store that sold GSM modems and prepaid 3G SIM cards, which allowed us to get connectivity. While this did make the job a lot easier, I’m glad I had the experience of being mostly cut off from the rest of the net, something that will surely happen less often as the world becomes better connected.

Written by David Rocamora

July 13, 2009 at 4:28 pm

Posted in general

Testing Storage Performance for Video with lmdd

with 7 comments

One of the unique things about how Control Group works is that our focus is much more involved than simply putting in a solution for a client and then moving on. We work with our clients to determine how they work, so we can design IT solutions that really fit their needs. Since we have partnerships with a variety of vendors, we work with our clients to arrive at the best solutions for their business. This means we do quite a bit of research and planning before we begin a project — and then a great deal of testing during and after we install new hardware or software.

I do some work on implementing storage systems for our clients, and we’ve found that different applications have different storage requirements. For example a video post production facility — like the facility at WWE — generally needs lots of disk space that is very good at reading and writing large files at high speeds. The storage here needs to provide good streaming throughput, because high quality video files generally have high bit rates, and are being stored or played back from the disk in real-time for ingesting, editing, or playout. If the storage system is not fast enough to read or write the file in real-time, frames will be dropped. This can cause unsatisfactory media files, programs to crash, or audio and video to become out of sync.

A Sun Fire X4150 I recently configured. That's some serious storage.

10,000 RPM SAS disks. That's some serious storage.

Suboptimal read/write performance can become a huge problem. When we put in a new system this is something we need to test. I usually do the test with a tool called lmdd.

lmdd comes from the lmbench tools, which are provided by BitMover for benchmarking systems. lmdd is great for testing streaming bandwidth. In most of our engagements with video we install a StorNext or Xsan filesystem, so we’ll run our tests against this. lmdd will probably work on any filesystem that you can mount on your Mac or Linux computer (leave a comment if you need a version for Mac OS X; I have one compiled). lmdd lets us verify the maximum number of megabytes per second we can push through the storage, and points us to where we need to make changes to the hardware or software configuration. I use lmdd like this:

lmdd of=/path/to/test_file count=1g

lmdd if=/path/to/test_file

The first tests write performance and the second tests read performance. More information about the syntax is available in the manual page for lmdd. The results of the command from a server I was testing looked like this:

2147.4755 MB in 6.8003 secs, 315.7914 MB/sec

lmdd is great because its output is easy to read. This result shows I could write to the filesystem at about 315 megabytes per second. That’s really fast! This is from a test on a server with a lot of RAM and a special filesystem that took advantage of that cache. When I run it on my MacBook, I get a result like this:

18342.6171 MB in 376.7685 secs, 48.6841 MB/sec
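If you collect a lot of these result lines, it can help to pull the numbers out with a script. The sketch below parses the two lines shown in this post and recomputes the throughput as megabytes divided by seconds. The regular expression is based only on the output shown here, so treat the pattern and the function name as assumptions; other lmdd versions or options may print something different.

```python
import re

# Pattern matching the result lines shown in this post.
LMDD_RESULT = re.compile(r"([\d.]+) MB in ([\d.]+) secs, ([\d.]+) MB/sec")


def parse_lmdd(line):
    """Return (megabytes, seconds, reported_rate) from an lmdd result line."""
    match = LMDD_RESULT.search(line)
    if match is None:
        raise ValueError(f"not an lmdd result line: {line!r}")
    megabytes, seconds, rate = (float(group) for group in match.groups())
    return megabytes, seconds, rate


samples = [
    ("server", "2147.4755 MB in 6.8003 secs, 315.7914 MB/sec"),
    ("MacBook", "18342.6171 MB in 376.7685 secs, 48.6841 MB/sec"),
]

for label, line in samples:
    megabytes, seconds, reported = parse_lmdd(line)
    computed = megabytes / seconds  # throughput is just size over time
    print(f"{label}: computed {computed:.2f} MB/sec, reported {reported:.2f} MB/sec")
```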

So the next time you’re interested in how your storage is performing give lmdd a shot and let me know how it goes. If you’re looking for more information about storage performance testing then stay tuned; I’ll be posting about testing storage with tools that benchmark small reads and writes next.

Written by David Rocamora

June 8, 2009 at 9:00 am

Advantages of Storage Networking

with 4 comments

I was recently having a conversation with a friend and we both laughed when we thought back to the first 500-megabyte hard drives that we had owned. Back then, a half-gigabyte drive was ridiculously expensive and physically huge. We both thought that it would be impossible to fill these drives up.

This of course was not the case. Now you’re lucky if an application can be installed in less than 500 MB, and as hard disk sizes grow, we find new ways to fill them up with applications, documents, and media. Digital files have become the most valued assets for most of our customers, so the organization, storage, and archiving of data is a serious concern.

I find that the best way to evaluate storage solutions for our clients is to step back and take a look at the problems the client is looking to solve and the priorities dictated by their business needs. Usually, our clients’ storage needs require a combination of performance, reliability, disaster recovery, scalability, and manageability. Fortunately, technology has stepped up to the challenge of handling the increased need for larger, faster, and more reliable storage.

Storage networking is a general term that encompasses many different technologies that provide excellent solutions to modern storage problems. A storage area network (SAN) is an architecture in which storage devices are connected in a high-speed, dedicated network and are presented to computers that are part of the same network. Using storage networking, we can accommodate our clients’ performance and reliability needs: by abstracting groups of hard drives as logical units (LUNs) we can stripe data across disks to increase speed and add redundancy by storing parity on the disks. This configuration will allow us to rebuild the LUN when a disk fails, without causing downtime or data loss.
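The striping and parity idea is easier to see with a toy example. The Python sketch below splits some data across a few "disks", stores an XOR parity block on one more, and then rebuilds a lost block from the survivors. It is a simplified illustration with a single dedicated parity block, not how any particular array or SAN product lays out its data; the function names are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data, data_disks):
    """Split data into equal zero-padded blocks and append a parity block."""
    block_len = -(-len(data) // data_disks)  # ceiling division
    padded = data.ljust(block_len * data_disks, b"\0")
    blocks = [
        padded[i * block_len:(i + 1) * block_len] for i in range(data_disks)
    ]
    return blocks + [xor_blocks(blocks)]


def rebuild(stripe, failed_index):
    """Recover the block at failed_index by XORing every surviving block."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


stripe = make_stripe(b"video frame data", data_disks=4)
lost = stripe[2]                 # pretend the third disk has failed
recovered = rebuild(stripe, 2)   # XOR of the other blocks gives it back
print(recovered == lost)
```

Because XOR is its own inverse, any single missing block, data or parity, can be recovered the same way. Losing two blocks at once is not recoverable with a single parity block.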

Example SAN Configuration for Final Cut Pro Editing

A storage network abstracts the underlying hardware that provides storage services, providing some great advantages for disaster recovery. When we add tape libraries to a SAN we can make backups quickly and efficiently without slowing down the network or computers on it. We can also connect a SAN to another SAN that’s in a different building or even a different state. This allows us to easily replicate data to a secondary location so our clients can be up and running quickly if there is some kind of catastrophe in the data center.

Even the largest SANs will eventually get filled up with data. What happens when it’s time to increase capacity? With traditional storage, the system is shut down, new equipment is installed, and the data is migrated. This typically involves downtime and runs the risk of data loss if something goes wrong. With a SAN, expansion is no problem. Since the storage services are abstracted from the storage hardware, it’s easy to add capacity or replace older equipment, in many cases with no downtime.

A SAN also provides centralized management for storage: administrators can look in one place to see the status of all storage in a data center. This allows businesses to evaluate storage health and utilization, which can prevent problems and help plan for future growth.

As data becomes a more and more important part of business strategy, it becomes critical for businesses to have larger, faster, and more reliable storage services to keep things operating smoothly. Storage networking is a core component of these strategies. I’ll continue posting about our thoughts and experiences with SAN solutions, and try to shed some light on the storage ecosystem as new technologies emerge.

Written by David Rocamora

April 15, 2009 at 5:36 pm