Archive for the ‘PHP’ tag

Deploying PHP applications as phar archives


Deploying code is a big part of our job, and we’re always looking for ways to deploy applications more efficiently. Recently, we set ourselves the goal of packaging every application as a single-file archive that can be easily built and deployed: we want our continuous integration system to spit out one file per project that can be used to deploy everything. PHP offers a way to store a PHP application in a single file, the PHP Archive or “phar” file, so we began our experiments with phar archive deployment.

To test deployments of PHP apps in a phar archive, we generated a very basic Yii Framework-based web application: a “yii/” directory with the Yii Framework files and a “webapp/” directory with the web application files (e.g. “index.php” and “protected/”). We also protected the “yii/” directory with an “.htaccess” file and deleted some runtime data to save space in the phar archive we wanted to build.

We modified our configuration to serve phar files with the PHP module and whitelisted phar files in the Suhosin PHP extension configuration. We generated a testing “index.phar” archive and put it in the DocumentRoot along with a bootstrap “index.php” file with the following content:

<?php
include 'phar://index.phar/webapp/index.php';
__HALT_COMPILER();

An error occurred when the application loaded in the browser: realpath() was not able to determine the location of the “protected/runtime/” directory in the web application. realpath() does not seem to work on paths inside phar archives, and in any case there was no point in storing runtime or user data inside the archive, so we needed a real directory outside the phar file for that. We then overrode realpath() in the bootstrap file with the “runkit” PHP extension.

In the overridden function, when the Yii Framework was trying to determine its runtime directory, we stripped the “phar://” and “index.phar/webapp/” path components and returned the result. If any other path began with “phar://”, we returned it unchanged, and in all remaining cases we returned the value of the original realpath(), a copy of which we kept in the bootstrap file. To display CSS files stored in the phar archive correctly, we also used “mod_rewrite” to redirect requests to “/index.phar/webapp/css/”. We created the “protected/runtime/” and “assets/” directories outside the phar archive in the DocumentRoot, and we protected the newly created “webapp/protected/” directory with an “.htaccess” file.
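The three rules in that override can be modeled as a small pure function. The sketch below is a Java illustration of the decision logic only, not the runkit-based PHP override itself; the `isRuntimeLookup` check (a path ending in `protected/runtime`) is an assumption about how the runtime-directory lookup was recognized, and the injected `originalRealpath` function stands in for the saved copy of the built-in realpath().

```java
import java.util.function.UnaryOperator;

// Illustrative model of the realpath() override rules; not the runkit PHP code.
final class PharRealpath {
    private static final String SCHEME = "phar://";
    // Path component stripped so in-archive paths map onto the DocumentRoot
    private static final String ARCHIVE_PREFIX = "index.phar/webapp/";

    private final UnaryOperator<String> originalRealpath;

    PharRealpath(UnaryOperator<String> originalRealpath) {
        this.originalRealpath = originalRealpath; // stands in for the saved copy of realpath()
    }

    // Hypothetical test for "Yii is resolving its runtime directory"
    static boolean isRuntimeLookup(String path) {
        return path.endsWith("protected/runtime") || path.endsWith("protected/runtime/");
    }

    String realpath(String path) {
        if (path.startsWith(SCHEME) && isRuntimeLookup(path)) {
            // Rule 1: strip "phar://" and "index.phar/webapp/" to reach the real directory
            return path.substring(SCHEME.length()).replace(ARCHIVE_PREFIX, "");
        }
        if (path.startsWith(SCHEME)) {
            // Rule 2: any other phar:// path is returned unchanged
            return path;
        }
        // Rule 3: everything else goes to the original implementation
        return originalRealpath.apply(path);
    }
}
```

For example, `phar:///var/www/index.phar/webapp/protected/runtime` maps to `/var/www/protected/runtime` under rule 1 (with `/var/www` as a made-up DocumentRoot), while `phar:///var/www/index.phar/yii/framework` passes through unchanged under rule 2.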

We also noticed that captcha images were not being displayed because a “ttf” font that ships with the Yii Framework was not found at runtime: dirname() could not determine the location of the directory inside the phar archive that contained the font. We overrode dirname() to extract that file at runtime from the “index.phar” archive into a temporary location, unless it was already there; the overridden dirname() returned this new path for the font, and the value of the original dirname() function in all other cases.
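The dirname() workaround follows a common pattern: extract a bundled file to a real directory the first time it is needed, then hand callers that directory. A minimal Java sketch of the pattern follows; `ensureExtracted` and the `readFromArchive` supplier (standing in for reading the font out of index.phar) are hypothetical names, and nothing here reads the phar format itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Sketch of "extract once, then return the real directory" for a bundled file.
final class ExtractOnce {
    // Returns the on-disk directory holding fileName, extracting it only on first use.
    static Path ensureExtracted(Path tempRoot, String fileName, Supplier<byte[]> readFromArchive)
            throws IOException {
        Path target = tempRoot.resolve(fileName);
        if (!Files.exists(target)) {
            Files.createDirectories(target.getParent());
            // Only read from the archive when the file is missing on disk
            Files.write(target, readFromArchive.get());
        }
        // Equivalent of the overridden dirname(): a real directory, not a phar:// path
        return target.getParent();
    }
}
```

Under concurrent requests, writing to a temporary file and renaming it into place would keep a second request from reading a half-written font.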

As you can see, there are a lot of overrides required just to make a simple application work. We’ve stopped our work on phar archive deployment because managing all of these overrides is unworkable. We also have no assurances that the overrides will be appropriate for a more complicated application.

We’re going to try some other experiments to get closer to our goal of single-file deployment for our applications. Our next experiments will focus on automating the creation of tarballs, with custom code to deploy them appropriately.

Is anyone else using phar archives to package their applications? We’d be curious to know if anyone else has had better luck. Any comments and ideas are welcome!


Written by Matteo Rinaudo

December 13th, 2011 at 9:48 am

Node.js – The future of web development?


Okay, so the name is terrible, but the excitement around Node is very real and it’s not difficult to unpack why. For one thing, Node apps are written in Javascript, and that brings some very real benefits, not the least of which is that almost all web developers already speak the language. It doesn’t matter if your language of choice is Ruby, PHP, Python, etc.; chances are you have at least a modest (and probably an excellent) understanding of Javascript. As a software developer of more than a decade, I can tell you that not having to learn an entirely new syntax and the idiosyncrasies of a new language in order to get started and be productive is extremely appealing. Of course, if this were all Node offered, that alone would not be reason enough to consider using it. But this is not all Node offers. Because Node apps are written in Javascript, developers can write both back-end and front-end code in the same language, something that no other environment can offer. What this really means is yet to be seen (and depends a lot on what tools emerge to take advantage of that fact), but it certainly feels powerful.

So developers are excited – now what?

The truth is I don’t really know. So far, a lot of what I have seen in Node falls under the heading of “cute.” “Check out this chat room I wrote with 100 lines of code,” or “Check out this cool thing I did with web sockets.” I’m not picking on Node, so save your hate mail. In fact, I developed a “cute” project in Node myself, using web sockets for remote control of YouTube videos (it also had a chat component). I had a lot of fun and the project was done in two days. I felt the power and ease of Node and could see how great it could be, but that is a long way from the kinds of complex applications I have written in PHP or Ruby. This makes me wonder what it would be like to build something like an enterprise-caliber CMS using Node. Could I do it? Sure! Would I want to? No way! At least not now, and the answer has less to do with Node than with tools and frameworks. Let’s face it, as modern developers we rely a lot on tools and the useful abstractions offered by our frameworks of choice. While Node does have a few promising MVC frameworks like Express and Grasshopper, they are still pretty green and can’t yet give developers the same kind of productivity gains that other modern frameworks like Rails or Yii (my PHP framework of choice) offer. As long as this remains the case, Node will likely stay in the “cute” zone. Sure, a lot of cool projects will be done in Node, but it won’t make the kind of huge, unifying impact it seems capable of.

In a lot of ways, Node has already done the hard part. It’s easy to use and understand, its execution time is fast, and, perhaps most amazingly, both PHP and Ruby developers agree that Node is a really cool idea. The challenge for Node is to take advantage of the excitement in the developer community and bring developers out of the diaspora to a centralized community website that doesn’t suck. Then use that community to get behind an MVC framework, with the goal of providing the kinds of tools PHP and Ruby developers need to be productive. Do this without simply copying functionality, and take advantage of Javascript as both a front-end and back-end language, creating something that only Node can provide. When that happens, few will doubt that Node is the future!

– Evan Frohlich


Written by Stacey Levine

November 8th, 2011 at 10:47 am

Posted in development


Using Hadoop and Amazon Elastic MapReduce to Process Your Data More Efficiently


If the amount of data in your enterprise is overwhelming and/or you’re looking for ways to process said data more efficiently, then Hadoop and Amazon Elastic MapReduce may be your answer.

MapReduce frameworks allow developers without much knowledge of distributed computing to write applications that take advantage of distributed resources. Hadoop MapReduce is one implementation of such a model.

Background:

Recently, we developed a web asset delivery service for one of our clients that allows businesses to display high-quality assets from a CDN on their websites for a monthly fee. Each user account has a bandwidth limit determined by its account level in the pricing model. This meant that we needed a way to give users information on their monthly bandwidth utilization across their websites, in order to back up the pricing model. Our solution was to implement web log parsing using Hadoop’s MapReduce framework in conjunction with Amazon’s cloud-based Elastic MapReduce service.

Here’s how Hadoop MapReduce works:

Hadoop MapReduce is a Java-based framework for writing applications that process high volumes of data in parallel on clusters of machines. Hadoop uses a distributed file system called the Hadoop Distributed File System (HDFS) to store large amounts of data across multiple nodes. It supports most major platforms, and in addition to Java, MapReduce programs can be written in languages such as Python, Ruby and PHP (through Hadoop Streaming) or expressed in Pig. Using Hadoop, we were able to write a simple Java program that parses the raw data in the log files collected by the CDN and extracts the relevant bandwidth utilization information.

The basic idea of the MapReduce model is that you write two functions, map() and reduce(), to divide up your processing task, and let the framework manage most of the crunching. Map and reduce functions take key-value pairs as input and produce key-value pairs as output (using data types that implement Hadoop’s Writable interface). When you start a MapReduce job, you pass in a data file stored in HDFS as the input. Hadoop splits the input into smaller pieces that the map function can consume. The outputs of the map function are then grouped by key and sent to the reduce function for processing. Both map and reduce tasks can run in parallel: Hadoop distributes them across the nodes of a cluster.
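To make the flow concrete, here is a toy, single-process imitation of the model in plain Java. It uses no Hadoop classes and no parallelism; a TreeMap stands in for Hadoop's shuffle-and-sort step, and `MiniMapReduce.run` is a hypothetical name. It only shows how map output is grouped by key before reduce sees it, using the classic word count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Toy, single-process imitation of map -> shuffle (group by key) -> reduce.
final class MiniMapReduce {
    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs,
            BiConsumer<I, BiConsumer<K, V>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        // Map phase: each input may emit zero or more (key, value) pairs;
        // grouping them by key here plays the role of Hadoop's shuffle and sort
        Map<K, List<V>> groups = new TreeMap<>();
        for (I input : inputs) {
            mapper.accept(input, (k, v) -> groups.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        // Reduce phase: one call per distinct key, with all values for that key
        Map<K, R> results = new TreeMap<>();
        groups.forEach((k, vs) -> results.put(k, reducer.apply(k, vs)));
        return results;
    }

    public static void main(String[] args) {
        // Word count: map emits (word, 1), reduce counts the ones per word
        Map<String, Integer> counts = run(
                List.of("a b a", "b c"),
                (String line, BiConsumer<String, Integer> emit) -> {
                    for (String w : line.split(" ")) {
                        emit.accept(w, 1);
                    }
                },
                (String word, List<Integer> ones) -> ones.size());
        System.out.println(counts); // prints {a=2, b=2, c=1}
    }
}
```

In real Hadoop the grouped values arrive at each reducer as a stream rather than an in-memory list, which is what lets the model scale past a single machine's memory.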

In our case, we simply pass the log file(s), copied to HDFS, as input to the MapReduce program. Hadoop reads all of the specified log files and hands each log entry (one line) to the map tasks as the expected data type (Text). Our map function parses each log entry individually and stores the relevant data (bandwidth information) in a hash-map-like object (MapWritable), which is then emitted as another key-value pair (<asset path, MapWritable object>) for the reduce function to work with. The reduce function aggregates the data by user account, date, user agent, etc. and saves it to a database (Amazon RDS). We can then query the database to pull all kinds of utilization information and, for example, notify users whose accounts are over the monthly cap.
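The map and reduce logic for the bandwidth job can be sketched without Hadoop as well. The CDN's real log format is not described here, so this sketch assumes a whitespace-separated line of `date account path bytes`, skips malformed lines, and uses a hypothetical `account|date` string as the aggregation key; user agent and the RDS write are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical single-machine sketch of the log job's map and reduce logic.
final class BandwidthLog {
    // One parsed entry; plays the role of the MapWritable in the Hadoop job
    record Entry(String date, String account, String path, long bytes) {}

    // Map side: parse one line in an ASSUMED format "date account path bytes".
    // Malformed lines are skipped instead of failing the whole job.
    static Optional<Entry> parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 4) {
            return Optional.empty();
        }
        try {
            return Optional.of(new Entry(f[0], f[1], f[2], Long.parseLong(f[3])));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Reduce side: given all entries for one asset path, total bytes per account and date
    static Map<String, Long> reduce(List<Entry> entriesForPath) {
        Map<String, Long> totals = new TreeMap<>();
        for (Entry e : entriesForPath) {
            totals.merge(e.account() + "|" + e.date(), e.bytes(), Long::sum);
        }
        return totals;
    }
}
```

Skipping bad lines rather than throwing matters at log scale: one truncated line should not fail a job over millions of entries.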

Below is the structure of a sample map reduce program written in Java:

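import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogProcessor {

  // Map: (byte offset, log line) -> (resource path, parsed log entry)
  public static class LogMap
            extends Mapper<LongWritable, Text, Text, MapWritable> {
    @Override
    public void map( LongWritable key, Text value, Context context )
        throws IOException, InterruptedException {
      MapWritable logEntry = new MapWritable();
      Text resourcePath = new Text();
      // parse the log line in value, filling logEntry with the bandwidth
      // data and resourcePath with the requested asset path (omitted here)

      context.write( resourcePath, logEntry );
    }
  }

  // Reduce: all entries for one resource path -> rows written to the database
  public static class LogReduce
            extends Reducer<Text, MapWritable, DBWritable, NullWritable> {
    @Override
    public void reduce( Text key, Iterable<MapWritable> values, Context context )
        throws IOException, InterruptedException {
      // Iterate once; the values Iterable can only be traversed a single time
      for (MapWritable entry : values) {
        // process entry and write to db (database output setup omitted here)
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Set up a new MapReduce job
    Job job = Job.getInstance( new Configuration(), "log processor" );
    job.setJarByClass( LogProcessor.class );    // register the main class

    FileInputFormat.addInputPath( job, new Path(args[0]) );    // input file path
    FileOutputFormat.setOutputPath( job, new Path(args[1]) );  // output file path

    job.setMapperClass( LogMap.class );
    job.setReducerClass( LogReduce.class );

    // Key/value types emitted by the map function
    job.setMapOutputKeyClass( Text.class );
    job.setMapOutputValueClass( MapWritable.class );

    System.exit( job.waitForCompletion(true) ? 0 : 1 );
  }
}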

The program is packaged in a jar file (with dependencies) that Hadoop can run.

And here’s how to utilize Amazon’s Elastic MapReduce service to run the program:

At Control Group, we rely heavily on Amazon’s cloud-based infrastructure in many projects. It lets us deploy applications that need to scale, easily and cost-effectively (we pay by usage). Amazon’s Elastic MapReduce service is a perfect fit for running the MapReduce application described above: it’s easy to set up, and it shields us from some of the infrastructure and maintenance issues around running Hadoop.

In a nutshell, the Elastic MapReduce service runs a hosted Hadoop instance on an EC2 instance (the master) and can instantly provision other pre-configured EC2 instances (slave nodes) to distribute the MapReduce work; all of them are terminated once the MapReduce tasks finish. Amazon allows us to specify up to 20 EC2 instances for data-intensive processing. It also offers the option of upgrading your Elastic MapReduce account to increase the EC2 instance count.

To run the MapReduce program, we create a new “Job Flow” through the AWS console, the Ruby-based command-line client, or Amazon’s API. A job flow is a set of steps that Elastic MapReduce runs. You provide some configuration information (the number of EC2 instances to use and any bootstrap actions) and the location of your MapReduce program (usually an Amazon S3 bucket path). Job flow records and logs can be viewed in the AWS console. You can also explicitly instruct Elastic MapReduce to keep the master EC2 instance alive for debugging; you can then ssh into the instance to check the log files created by Hadoop, etc.

In summary, Hadoop’s MapReduce framework allows us to write simple applications that process high volumes of data in a distributed computing environment, while Amazon’s Elastic MapReduce service provides a cost-effective way to run such a solution.


Zend Server, PHP, RIAs and Flex


I recently attended an Adobe Flex user group meeting here in New York where the title of the presentation was “Zend Server: A Flex Perspective”. I knew that earlier this year, Zend officially announced the Beta release of their new PHP application server product, and as a developer of large scale RIA web applications using PHP and Flex, I was anxious to learn how this new product might impact our next project. The presentation was a good, albeit brief, overview of Zend Server. However, despite the title of the presentation and the theme of the user group, no connection was made between this new product and Flex. I thought I’d try to make that connection here.

A Little Background: Zend? Zend Server? Flex?
Zend is known as “The PHP Company”. Their founders are key contributors to the core PHP language and the company focuses on creating products to help improve the entire PHP application development life-cycle experience. They provide products and services to help with configuration and installation, development, deployment and with production application administration and maintenance.

Zend Server dashboard

Zend Server is one of Zend’s most recent products and is a package of several different Zend offerings. On one hand, Zend Server is a certified PHP distribution that includes the most reliable and up-to-date version of PHP, tested PHP extensions and database drivers, and it comes bundled with Apache. It also wraps a nice, user-friendly interface around the configuration management of PHP, Apache and all these extensions, easing initial environment configuration and maintenance. On the other hand, it is a suite of development components that ease development and deployment, optimize application performance by speeding up PHP execution and providing data caching options, and assist in monitoring and debugging multiple environments running remotely. Zend Server comes in two flavors: a free community version and a commercial version. The commercial version has extra features as well as full support from Zend.

Flex is an Adobe development framework for creating cross-platform rich internet applications (RIAs) that run in Flash. Flex has really opened up the Flash door to non-Flash developers. It removes the need to work within the esoteric Flash movie “timeline” and gives traditional programmers a more familiar environment in which to build applications. You build Flex applications with the ActionScript scripting language and an XML-based markup language called MXML.

Adobe Flex Builder 3

Okay, so what does one have to do with the other?
Well, as developers turn to Flex as a presentation tier to meet the ever-growing demand for Web applications that manage and deliver rich media and rich, interactive user experiences, they need an application tier that delivers the services and data management behind these flashy front-ends. To date, Java has been by far the most popular choice for this tier, so much so that some claim there are no other “real” options. I would never argue against a decision to use Java as the application tier in an n-tier Web application environment. But I do think there are other options, and I believe PHP is one of them.

Among many professional software developers, PHP has a reputation for not being particularly well-suited to large or extremely complex site implementations. Some even believe that PHP is nothing but a simple templating language, only to be used for initial mockups and quick demonstration POCs, with no role in serious, production, “Enterprise” applications. I don’t want to go down the long path of refuting such misconceptions; please take a look at Zend’s own John Coggeshall’s rebuttal of such claims. But one fair criticism of PHP, also acknowledged in Mr. Coggeshall’s article, is that PHP has been weak in “Enterprise” tooling. Java has been in this space for a while, and has several free and commercial application servers to choose from that provide a wealth of tools and functionality to support serious, enterprise-grade applications. Zend Server is striving to fill this gap in PHP and move PHP onto the short list of viable options when CTOs, CIOs and managers are choosing the technology stack on which to run their next big RIA project.

One last point, with regard to Adobe Flex in particular, is that until now there has been no officially supported implementation of Adobe’s Action Message Format (AMF) for PHP. The teaming of Adobe and Zend to back Zend_Amf, which is part of the Zend Framework bundled with Zend Server, has changed this. With the release of Zend_Amf, PHP can now officially speak the native tongue of Flex’s ActionScript, making integration fast and seamless. This, along with the introduction of Zend Server, goes a long way toward supporting PHP as the application server tier behind an Adobe Flex UI.


Written by Jeff Winesett

April 2nd, 2009 at 5:36 pm

Posted in development

