Archive for the ‘development’ Category

Adobe to stop development on Flash: Nostalgia vs. Nausea

without comments

Today we got news that Adobe was ceasing development on Flash and focusing on HTML5. We’ve all worked with or consumed Flash in our personal and professional lives over the years, so we have a few comments on the matter:

Charlie: I’m not a developer, but from the perspective of a web end user I’d be happy to experience less Flash in my day-to-day browsing. Does this lend some validation to Steve’s “Thoughts on Flash”? http://www.apple.com/hotnews/thoughts-on-flash/

Nick: I’ve been Flash averse since 1997.  Just sayin’

Michael W: I’ve noticed that all of the Flash haters I know (not naming names!) are Mac faithful and honed their hate ages ago. Still relatively new to the Mac world myself, I found Flash to be an annoyance when using Safari, and have since realized it’s only in Safari that I had those problems. Having moved on from Safari to Chrome, I have no need for extensions such as ClickToFlash (a thousand thanks to Ivan for that little beauty). It certainly felt like it had been poorly executed there, but anywhere else I’ve used it, be it Chrome or Firefox, on both Mac and PC, or on a handful of Android phones, I don’t see what all the fuss is about. Trying to load a page as simple as a Gothamist post in Safari chugged and made my (at the time) shiny new laptop moan and seemingly beg to be put out of its misery. ClickToFlash eased that pain since it would only load Flash objects after I approved them, but I don’t see the same problem with other browsers/OSs/etc.

Applying a general troubleshooting logic to the situation, if you’re only having a problem with an app or plugin (Flash) in a specific instance (in Safari) and it works fine the rest of the time, it’s not the problem with the app or plugin…

Sure, HTML5 is the future, and it’s open, but give it enough time and everyone will look back at HTML5 with the same ire they are giving Flash right now when the next big leap comes forth. Just because the new hotness is here doesn’t mean the old one was never worth anything. Without Flash, we’d have no Homestar Runner, TROGDOR!, Super Mario Twins, Newgrounds, Line Rider, YouTube, etc.

I for one am glad to have The Burninator in my life.  <Nostalgia>

Dan: Maybe Flash haters all live in thatched-roof cottages….

Will: I’ve disliked Flash since the days my computer still had a floppy drive. Slow and resource-consuming while rarely adding any useful functionality to whatever site it’s maligned.  While I have ethical qualms with how Steve Jobs leveraged iOS’ success against Flash, I think the ultimate goal of replacing a bad, proprietary technology with an open one is a good one. Now if only Adobe had a competitor so that they’d be motivated to fix the plethora of bugs in their products.

Ivan: Well, not for nothing but the development of Flash for OS X lagged behind the Windows version for years. Any institutional dislike for Flash has been taught to Mac users by Adobe themselves. Heck, I don’t think Adobe even bothered to match versions between the OSs until Apple started being vocal about not wanting Flash on iOS.

Otherwise, Flash video was great for the Mac because we finally didn’t have to worry about having a WMV codec installed. That said, I’m glad we got away from it for obvious reasons. I can’t say why Safari doesn’t work as well with Flash now, except to note that Safari isn’t the most plugin-friendly browser out there. I’m not big on plugins myself, so it’s no loss to me.

Nick (again): To be clear, I was a Mac-hating Windows douche until around 2002. And I too loved Homestar Runner et al., and accept the necessity of Flash in very specific instances, which are becoming less and less frequent. The problem is, aside from the resource-hogging, bloated-piece-of-dung arguments that everyone is tired of, Flash is overused and inappropriately used in the vast majority of instances. And often ugly.

And I really hate ugly and unnecessary things.

Colin: It’s really weird to watch the rebirth of everything again: “Isn’t it amazing I can do x with HTML5?!” Yeah, it is super cool, but it was cool in Flash in 1999 too. It was the coolest thing ever, then it got old and crusty and a pain in the ass to deal with… like a lot of people I know :-)

Thanks, Flash. On to the next, next thing.

Will (again): I haven’t used Safari in five or six years but still find I have to use Flashblock in Firefox and Chrome to keep runaway Flash apps from maxing out one of my cores when I’m not paying attention. Before its redesign, I found trying to load a MySpace Music page to be a guaranteed way to crush any and all browsers because of the obscene amount of Flash.  I see no reason why we’ll look back on HTML5 the same way. Its implementation is determined by the browsers, which are varied and competitive. (Also, I find HTML5 rendering engines already perform better than Flash ever has.) Flash, on the other hand, has only one implementation developed by one company that until recently had no competition. It took the largest mobile platform out there blacklisting it to make Adobe even start looking at improving its performance.

Michael W (again): To Nick’s point, Flash got out of control in regards to bad design because they made it so damn easy for anyone to pick up a (likely pirated) copy and start banging out animations within a couple hours. Suddenly Geocities exploded with wizz! bang! websites that had 7,392,103,134 things flying around. Same thing happened with animated gifs…

http://www.myspaceantics.com/images/myspace-comments/words/juicy-lips-blacks.gif

Chris R: I just hope that in all this opinionated side choosing, people realize that it is really about poor code from developers using the technologies. You’ll still see a bunch of crappy resource hogging crap using HTML5 technologies as well.  I prefer to just say, I like properly written software…I don’t care if you use AS3, HTML5 tech, C++, etc.

Ivan:  Damn it, where’s the Like button? :)



Written by Stacey Levine

November 9th, 2011 at 2:03 pm

Node.js – The future of web development?

with one comment

Okay, so the name is terrible, but the excitement around Node is very real and it’s not difficult to unpack why. For one thing, Node apps are written in JavaScript, and that brings some very real benefits, not the least of which is the fact that almost all web developers already speak the language. It doesn’t matter if your language of choice is Ruby, PHP, Python, etc.; chances are you have at least a modest (and probably an excellent) understanding of JavaScript. As a software developer of more than a decade, I can tell you that not having to learn an entirely new syntax and the idiosyncrasies of a new language in order to get started and be productive is extremely appealing. Of course, if this were all Node was offering, that in and of itself would not be reason to consider using it. But this is not all Node is offering. Node apps by their very nature (being written in JavaScript) allow developers to write both back-end and front-end code in the same language – something that few other environments can offer. What this will really mean is yet to be seen (and depends a lot on what tools will emerge to take advantage of that fact), but it certainly feels powerful.

So developers are excited – now what?

The truth is I don’t really know. So far, a lot of what I have seen in Node falls under the heading of “cute.” “Check out this chat room I wrote with 100 lines of code,” or “Check out this cool thing I did with web sockets.” I’m not picking on Node, so save your hate mail. In fact, I developed a “cute” project in Node myself with web sockets for remote control of YouTube videos (which also had a chat component). I had a lot of fun and the project was done in two days. I felt the power and ease of Node and could see how great it could be, but that is a long way from the types of complex applications I have written in PHP or Ruby. This makes me wonder what it would be like to build something like an enterprise-caliber CMS using Node. Could I do it? Sure! Would I want to? No way! At least not now – and the answer has less to do with Node and more to do with tools/frameworks. Let’s face it, as modern developers we rely a lot on tools and the useful abstractions offered by our frameworks of choice. While Node does have a few promising MVC frameworks like Express and Grasshopper, they are still pretty green and can’t yet provide developers with the same kind of productivity increases that other modern frameworks like Rails or Yii (my PHP framework of choice) offer. As long as this remains the case, it seems likely that Node will remain in the “cute” zone. Sure, a lot of cool projects will be done in Node, but it won’t be making the kind of huge unifying impact it seems capable of.

In a lot of ways, Node has already done the hard part. It’s easy to use and understand, its execution time is fast, and perhaps most amazingly, both PHP and Ruby developers agree that Node is a really cool idea. The challenge for Node is to take advantage of the excitement in the developer community, and bring them out of the diaspora to a centralized community website that doesn’t suck. Then use that community to get behind an MVC framework, with the goal of providing the kinds of tools PHP and Ruby developers need to be productive. Do this without just copying functionality, and take advantage of JavaScript as both a front-end and back-end language, thus creating something that only Node can provide. When that happens, few will doubt that Node is the future!

– Evan Frohlich


Written by Stacey Levine

November 8th, 2011 at 10:47 am

Posted in development


Climbing out of the Window and into the cloud

without comments

I’ve deployed Windows in every version since 3.0, and I looked at those early Windows through a greyscale monitor on a 286. I’ve used Microsoft Office in every version up to 2010, hooked an old Neanderthal smartphone up to a hosted Exchange server, and tapped my way through my emails with my little stylus. I’ve installed and configured Windows server back ends from Windows NT up to 2008 R2. I’ve worked in IT rooms so full of loud server iron that it was like being in the engine room of a submarine. I’ve installed or worked on every version of Exchange from version 5.0 up to 2007. I spent thousands of dollars of my own money on Microsoft manuals and certification exams in order to stay up to speed on developments.

And then it all began to change, and for me, it began with Exchange. Exchange 2007 was the last version I deployed, and the last version my own email account was connected to. I’ve been a Google Apps user for two years now, and I’ve helped transition some of our clients over to Google Apps as well. My personal transition away from a Microsoft-centric working experience started with email, and has continued on through the rest of what I do. I feel like I’m boarding the Google boat and there are just a few bags left on the MS dock.

Google Apps has been offering full-featured email and collaboration services for a while now. It can replace an Exchange server cost-effectively, providing shared calendars, spam filtering, message archiving, chat services, access anywhere from a web browser, and almost no mailbox size limits, starting at $50 per mailbox per year. And while you can certainly use Outlook, Apple Mail and iCal to connect to Google, you can use nothing but your web browser if you want and have full functionality. About the only major feature of Exchange not replicated in Google Apps is Public Folders. But then again, Microsoft isn’t providing for that either going forward.

Replacing email servers is one thing, but replacing those desktop applications is another story altogether. Microsoft has held a monopoly on workstation software and business productivity applications for years. The fact is, Windows and Office work well enough for the majority of users out there. They’re easy to use and familiar. A real challenger has to offer a better way of doing something the average user is already doing. And cloud computing’s Software as a Service is finally maturing to that point.

Speaking as a heavy Microsoft user, I’ve personally been anchored to Windows and Office, primarily by Visio and Project.  Microsoft Word is still critical for final document drafts and printing, since the offerings on Google Apps just aren’t there yet in terms of refinement and features. But it’s just a matter of time. The recent addition of Smartsheet to our toolkit has now removed our reliance on Microsoft Project except for our very largest initiatives. And we are eagerly awaiting the evolution of Google Drawings to allow us to build the type of schematics we’re creating with Visio.

Microsoft of course has its own Cloud offering, Office 365, and naturally it’s tied into licensing of their existing products. They have a tiered licensing model that’s much more complex than Google’s. But it’s also a much deeper system than Google Apps.  It goes without saying that Google has no legacy revenue streams to drag into the 21st century. Even so, any company can make a migration decision based on what functionality works best for them, but it’s nice to be able to start fresh at a very aggressive price-point.

For further comparison, since Google Apps is platform agnostic, Mac users don’t draw the short straw yet again when it comes to software and collaboration with their Windows brethren. For remote users in a company that’s migrated to Google Apps, all they need is an Internet connection and a web browser, and they will have exactly the same experience as they have sitting in their office. While the individual feature set of Google Apps isn’t as elegant or robust as Microsoft Office or Office 365, Google Apps is radically simpler in that you get full access to its features with just a web browser. This makes it a serious competitor for today’s geographically diverse, small to mid-size business.

Regarding remote workers, smartphones deserve a mention. I used a BlackBerry for years, but opted for a Droid recently. You probably knew this was coming, but Google Apps on the Droid takes only minutes to configure, and works smoothly. Google Apps functionality includes Gmail, Calendar and Contact sync, Push support, Google Docs, Enterprise Admin controls, and 2-step verification for extra security. This feature set is available for almost all platforms, including Windows: http://www.google.com/apps/intl/en/business/mobile.html

By contrast, Office 365 applications for iPhone and Android are not coming anytime soon. Mobile access is limited unless you’re using Windows Phone, which, let’s be honest, almost nobody is.

Handheld prevalence as per end-of-year 2010:

Android #1 – 33.3 million

Symbian #2 – 31 million

Apple #3 – 16.2 million

RIM #4 – 14.6 million

MS #5 – 3.1 million

Source: http://on.mash.to/rG6bfd

As I write this, I have my email, shared documents, and a Project Plan open on my Windows workstation and on my Linux laptop. I use these applications every single day. My documents appear and function identically across these two computers. My email appears and functions identically. The Gantt chart appears and functions identically. On my Droid phone I have read and write access to my calendar, documents, and email. With the exception of Visio and Word, I can be anywhere with an Internet connection, on a borrowed computer running OS X, Linux or Windows and be fully functional. The last of my bags on the dock contain Word and Visio. I can’t leave them behind just yet, but I’m waiting. I want to get going.


Written by Stephen Cheevers

November 3rd, 2011 at 1:44 pm

New Hadoop Spin-offs: Meh.

without comments

People are crazy about Hadoop. I think that this is the fastest that I’ve ever seen a technology go from competitive advantage to commodity. This technology is so new to organizations, but also so well deployed and understood by technologists, that we are in some kind of strange no-man’s land.

I think that the real issue may be that no one knows what to do with Hadoop, rather than how fast it is or which version is better. I mean really, who cares if your HDFS implementation is 10% faster when you can just spin up 10% more Elastic MapReduce instances?


Written by David Rocamora

November 1st, 2011 at 11:40 am

Posted in development


CG R&D Meetings

without comments

We love what we do.  So much so that we work on pet projects together outside of the normal course of business.


Written by Stacey Levine

October 21st, 2011 at 10:24 am

Best Command Ever

without comments

Josh: Best command ever! sudo slapconfig -destroyldapserver

Evan: sudo make me a sandwich

Mark: Typo, should be: sudo make me a sammich :-)

Charlie: I once overheard someone on a cellphone calmly explaining: “Ok, just type ‘sudo’, then the letters ‘rm’ then space, then dash ‘rf’, then a forward slash. Ok, press return.”

Peter: $ sudo make me a sammich :-)

Peter (again): Keyword: Calmly

Dan: Two things:

1) I had a feeling I knew where this was going at “then the letters…”
2) That reminds me of a story (an apocryphal one) about some fancy demo somebody (possibly Microsoft) was doing for their speech recognition software. They turned it on and said that it could do any command you could type. So some wag in the audience shouted out “format c colon return”.

P.S. I’m totally sending that story to some of my sysadmin friends.


Written by Stacey Levine

October 14th, 2011 at 5:34 pm

Moving Beyond MDM for Custom iOS Solutions

with 2 comments

I’m really excited about several new iOS development and deployment projects that we’ve been working on at CG. We’re working closely with Apple on a bunch of solutions: at the most basic level, we’re building solutions for security and management of employee iPad and iPhone use; at the other end of the spectrum, we’re helping to realize visions such as a kiosk-like platform of thousands of iPads deployed in retail environments around the country.

We’ve learned a ton about what is and isn’t possible as we strategize ways to scale to thousands of units. Here are some of the challenges we’ve come across:

  • How do we deploy and support iPads – whether ten or ten thousand – in a secure, efficient, and centralized way?
  • How can we architect kiosk-like application experiences on the iPad, enabling us to design and curate the customer experience, while also allowing a true iPad experience complete with app-switching, web browsing, Facebook-checking, game-playing, and movie-watching?
  • What kind of network and server architecture is needed to support a platform of iOS devices across the globe? How do we enable caching and pushing of dynamic data to the devices – particularly large amounts of media content?

Centralized deployment and support of iOS devices

How do we deploy and support thousands of iPads or iPhones in a secure, efficient, and centralized way? Mobile Device Management (MDM) platforms like AirWatch, Casper, MobileIron – and soon, OS X Lion Server – allow us to push XML configuration profiles to iOS devices. This enables centralized inventory and basic management of the devices: from what version of iOS they have installed, to some security control over how/if users can install and delete apps. For many enterprise customers, these tools are useful for administering security policies on employee-owned iOS devices. But for custom platforms like kiosks and retail experiences, MDM is not ideal due to the need for end-user interaction. What we need is a way to easily restore iOS devices back to their “golden” state in a centrally managed way.

We’re excited about the potential of over-the-air restores and software updates coming in iOS 5, but as of today, iTunes is the only game in town for this. Working within this limitation, we’ve architected some innovative solutions that enable iOS devices to connect to iTunes virtually over USB-to-IP converters and a content distribution infrastructure. Until iOS 5 arrives, this is a good option to have, and I haven’t heard of anyone else embracing this approach.

Rearchitecting Apple’s iOS user experience

Put an iPad in front of someone and they’re going to tap, scroll, pinch, and squeeze the user interface. The user experience is still the leader in the tablet space – though we’ve been recently impressed by the BlackBerry PlayBook. For a project we’re working on now, we want to encourage this user experimentation and interaction, while locking down some important components of the UX. Things like App Store purchases, iTunes downloads, deleting apps, rearranging icons, and changing the home screen wallpaper will quickly affect the kiosk experience. MDM solutions can help disable some of these features, but the aforementioned need for user interaction just doesn’t work for specialized user environments.

One solution we’ve had success with is a combination of custom code to disable user customization of the Springboard, plus a WebKit-based Safari replacement for browsing that enables us to prevent user download of unauthorized content. Combine these with some configuration profile-based customization of iOS and we have a good solution for locking a customer experience down and reducing the frequency of unit restores or reimaging.

The CG approach to iOS projects

Part of what makes CG stand out as a solution provider is our deeply embedded collaboration between our application development team and our infrastructure team. As the Enterprise’s appetite for customized mobile platforms and experiences grows, we’re uniquely suited as a technology partner to build and innovate on our customers’ vision. iOS is at the core of this vision and I couldn’t be more excited to be working with these technologies today. Plus, iOS 5 is on its way and it’s shaping up to be a giant leap forward!


Written by Charlie Miller

June 13th, 2011 at 8:00 am

Rapidly Prototyping Tagatag on Google App Engine

with one comment

Google App Engine is Google’s platform-as-a-service for developing web applications. There have been some people saying goodbye to GAE, and perhaps in response Google has announced several enhancements to the service.

In the midst of all of this, a few of us at Control Group have been developing Tagatag: an Android and iPhone application for commenting on barcodes that uses web services running on Google App Engine.

Scan this QR code with Tagatag to join the conversation!

Barcodes are everywhere around us. You can find them on advertising, products, places and even people. Tagatag provides you with a virtual paint marker to let you make your mark on all of these codes anonymously. Download the Tagatag app and give it a try. Scan a barcode to see comments people have left for you and then leave some for them.

We chose Google App Engine for the back end of Tagatag for a few reasons:

  • It’s quick – You sign up for an account, download the SDK and you’re developing. The development server in the SDK lets me run the application on my desktop and interact with the code as I’m writing it.  Uploading new versions, rolling back old ones, or performing maintenance is a snap with the GAE dashboard.
  • It’s simple – There’s not much to the web service. It’s small and simple. We used the webapp framework because we didn’t feel we needed anything else. It makes for a very concise application. Believe it or not, there are about 300 lines of code for the GAE part of Tagatag.
  • It’s scalable – We don’t have to worry about what we do when Tagatag becomes popular. We’ll just raise our billing quotas in GAE and let them handle spinning up new instances or expanding the datastore. Knowing that you don’t have to be concerned about scaling makes things a lot more fun.

I’m happy that GAE let us bring Tagatag to you so quickly. So, when it’s available at the end of the week, be sure to download the app, tag a tag and make your mark!


Written by David Rocamora

December 6th, 2010 at 8:00 am

Why All the Fuss Over Angry Birds?

without comments

Source: Rovio Mobile.

Angry Birds, the mobile game from Rovio Mobile that allows players to “dish out revenge on the green pigs who stole (their) eggs,” has been making a lot of headlines lately, most recently for racking up 1 million downloads on Android in a single day.

That’s obviously a lot of downloads. But what’s the big deal? What’s so compelling about this game? And what can brands looking to develop a comparable mobile experience learn from its success?

For starters, Angry Birds was a solid success on the iPhone. Once it got publicity, its sales continued to grow and the PR continued.

In my opinion, the initial spark that got it the coverage that started the snowball effect was the choice of gameplay and presentation: the game isn’t complex. The greatest attraction by far is that the gameplay is easy to pick up. It also has a simple concept of trajectory-based strategy, puzzle elements in their simplest incarnation, cute characters, fun audio, and an addictive level progression system that has you replaying boards to earn “all 3 stars.”

The gaming space on Android has been severely lacking, but sales are soaring. There was an ever-increasing demand for games on the platform and, as such, the developer recognized the demand immediately and worked on the Android version. And, again, due to the nature of the game, it works well within Android’s capabilities as a gaming platform across all its OEM configurations. So, boom: one million plus in sales right off the bat.

If there is a lesson to be learned from this for developers, it’s the importance of “not missing the boat.” We saw the same thing with the iPhone OS when it first supported games: a sleeper success sparked everyone else to jump on board, saturating the market, watering it down, and ending the age of prosperity for most developers save for the large studios. This is the landscape of the mobile market, and with big players still on the way (like Windows Phone 7, webOS 2.0 and BlackBerry OS 6.x) there are going to be many repeats to come.


Written by Chris Ross

October 22nd, 2010 at 11:00 am

Using Hadoop and Amazon Elastic MapReduce to Process Your Data More Efficiently

with 4 comments

If the amount of data in your enterprise is overwhelming and/or you’re looking for ways to process said data more efficiently, then Hadoop and Amazon Elastic MapReduce may be your answer.

MapReduce frameworks allow developers without much knowledge of distributed computing to write applications that take advantage of distributed resources. Hadoop MapReduce is an implementation of such a model.

Background:

Recently, we developed a web asset delivery service for one of our clients that would allow businesses to display high quality assets from a CDN in their websites for a monthly fee. Users’ accounts would be associated with bandwidth limits based on different account levels associated with a pricing model. This meant that we needed a way to provide users with information on monthly bandwidth utilization across their websites in order to defend the pricing model. The solution to this was to implement web log parsing using Hadoop’s MapReduce framework in conjunction with Amazon’s cloud-based Elastic MapReduce service.

Here’s how Hadoop MapReduce works:

Hadoop MapReduce is a Java-based framework that allows you to write applications that process high volumes of data in parallel across clusters of machines. Hadoop uses a distributed file storage system called Hadoop Distributed File System (HDFS) to store large amounts of data across multiple nodes. It supports most major platforms, and in addition to Java, MapReduce programs can be written in Python, Ruby, PHP, Pig, etc. Using Hadoop, we were able to write a simple Java program that could easily parse through raw data in log files collected by the CDN and filter relevant bandwidth utilization information.

The basic idea behind the map-reduce model is that you write two functions, map() and reduce(), to divide up your programming tasks and let the framework manage most of the crunching. Map and reduce functions take key-value pairs (using data types that implement Hadoop’s Writable interface) as input and produce key-value pairs as output. When you start a map-reduce process, you pass in a data file in HDFS as the input. Hadoop divides the input into smaller pieces that the map function can consume. Likewise, the outputs of the map function are grouped by key by Hadoop and sent to the reduce function for processing. Both map and reduce functions can run in parallel: Hadoop can distribute the tasks across the nodes of a cluster.
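To make that flow concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It is not how Hadoop is implemented: the class name MiniMapReduce and its run() method are hypothetical stand-ins for the framework, and word counting is just a convenient toy task. It only illustrates the three stages of map, group by key, and reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy, single-process illustration of the map -> group -> reduce flow.
// MiniMapReduce and run() are hypothetical names, not part of Hadoop.
public class MiniMapReduce {

    // map(): turn one input record into zero or more (key, value) pairs.
    // Here every word in the line becomes the pair (word, 1).
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // reduce(): collapse all values that share a key into a single result.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Stand-in for the framework: map every record, group the emitted
    // pairs by key, then call reduce once per key.
    public static Map<String, Integer> run(List<String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            for (Map.Entry<String, Integer> pair : map(record)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> results = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            results.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return results;
    }

    public static void main(String[] args) {
        // prints {a=2, b=2, c=1}
        System.out.println(run(List.of("a b a", "b c")));
    }
}
```

In a real cluster the grouping step happens across machines, which is why map and reduce must not depend on shared state: each call only sees the record or the key group it is handed.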

In our case, we simply pass the log file(s) (copied to HDFS) as input to the map-reduce program. Hadoop merges all log files specified and serializes each log entry to the datatype expected (Text) before passing them as inputs to the map tasks. Our map function then parses each log entry individually and stores the relevant data (bandwidth info) in a HashMap-type object (MapWritable), which is then sent as another key-value pair (<asset path – MapWritable object>) for the reduce function to work with. The reduce function then aggregates the data based on user accounts, date, user agent, etc. and saves it to a database (Amazon RDS Database). We can then query the database to pull all types of information around utilization and send out notifications to users, for example, if their account is over the monthly cap.
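As a rough illustration of the aggregation and cap check described above, here is a small plain-Java sketch without Hadoop or a database. Everything specific in it is an assumption made for the example: the four-field log layout, the class name BandwidthReport, and the cap value are hypothetical, and the real CDN logs and account model will differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: sums bytes per account from simplified log lines
// and lists accounts over a cap. Not the production log format.
public class BandwidthReport {

    // Assumed log layout (for illustration only):
    //   <date> <account> <asset path> <bytes sent>
    public static String[] parse(String logLine) {
        String[] fields = logLine.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Unexpected log line: " + logLine);
        }
        return fields;
    }

    // Sum the bytes served per account, as the reduce step would.
    public static Map<String, Long> bytesPerAccount(List<String> logLines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : logLines) {
            String[] fields = parse(line);
            totals.merge(fields[1], Long.parseLong(fields[3]), Long::sum);
        }
        return totals;
    }

    // Accounts whose total strictly exceeds the cap, e.g. to notify them.
    public static List<String> overCap(Map<String, Long> totals, long capBytes) {
        List<String> accounts = new ArrayList<>();
        for (Map.Entry<String, Long> total : totals.entrySet()) {
            if (total.getValue() > capBytes) {
                accounts.add(total.getKey());
            }
        }
        return accounts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "2010-10-01 acme /img/a.jpg 500",
            "2010-10-01 acme /img/b.jpg 700",
            "2010-10-02 zeta /img/a.jpg 300");
        Map<String, Long> totals = bytesPerAccount(lines);
        System.out.println(totals);                 // prints {acme=1200, zeta=300}
        System.out.println(overCap(totals, 1000L)); // prints [acme]
    }
}
```

The same per-account sum could also be grouped by date or user agent by changing the key passed to merge().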

Below is the structure of a sample map reduce program written in Java:

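import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogProcessor {

  public static class LogMap
            extends Mapper<LongWritable, Text, Text, MapWritable> {
    @Override
    public void map( LongWritable key, Text value, Context context )
            throws IOException, InterruptedException {
      MapWritable logEntry = new MapWritable();
      // parse the log line held in value and fill logEntry
      // ...
      Text assetPath = new Text();
      // assetPath = resource-path;

      context.write( assetPath, logEntry );
    }
  }

  public static class LogReduce
            extends Reducer<Text, MapWritable, DBWritable, NullWritable> {
    @Override
    public void reduce( Text key, Iterable<MapWritable> values, Context context )
            throws IOException, InterruptedException {
      for ( MapWritable entry : values ) {
        // process entry and write to db
        // ...
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Set up a new mapreduce job
    Job job = new Job();
    job.setJarByClass( LogProcessor.class );    // register the main class

    FileInputFormat.addInputPath( job, new Path(args[0]) );    // input file path
    FileOutputFormat.setOutputPath( job, new Path(args[1]) );  // output file path

    job.setMapperClass( LogMap.class );
    job.setReducerClass( LogReduce.class );

    // key/value types emitted by the mapper
    job.setMapOutputKeyClass( Text.class );
    job.setMapOutputValueClass( MapWritable.class );

    // exit with 0 on success, 1 on failure
    System.exit( job.waitForCompletion(true) ? 0 : 1 );
  }
}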

The program is packaged in a jar file (with dependencies) that Hadoop can run.

And here’s how to utilize Amazon’s Elastic MapReduce service to run the program:

At Control Group, we leverage Amazon’s cloud-based infrastructure heavily in lots of projects. It basically allows us to cost-effectively (pay by usage) deploy applications that need to scale up very easily. Amazon’s Elastic MapReduce service is a natural fit for running the MapReduce application described above. It’s easy to set up, and it also shields us from some of the infrastructure/maintenance issues around running Hadoop.

In a nutshell, the Elastic MapReduce service runs a hosted Hadoop instance on an EC2 instance (master), and it’s able to instantly provision other pre-configured EC2 instances (slave nodes) to distribute the MapReduce process, which are all terminated once the MapReduce tasks complete running. Amazon allows us to specify up to 20 EC2 instances for data intensive processing. It also provides the option to upgrade your Elastic MapReduce to increase EC2 instance count.

So to run the map-reduce service, we create a new “Job Flow” via the AWS console, the command line utility (Ruby-based) or an API provided by Amazon. A job flow is a set of steps that Elastic MapReduce runs. You basically provide some configuration information (number of EC2 instances to use and bootstrap actions) and the location of your map-reduce program (usually an Amazon S3 bucket path). Job flow records/logs can be viewed in the AWS console. You can also explicitly instruct Elastic MapReduce to keep the master EC2 instance alive for debugging purposes; you can then ssh into the instance to check the log files created by Hadoop, etc.

In summary, Hadoop’s MapReduce framework allows us to write simple applications that process high volumes of data in a distributed computing environment while Amazon’s MapReduce service provides a cost-effective means of implementing such a solution.

