Summer Internship 2014: Visualizing Workplace Data

This is a post by Samuel Lalrinhlua, a student at Syracuse University in the Master of Science in Information Management (2015) program. He was also a summer intern on our Enterprise Architecture team. 

I first came across Control Group when I read the ‘Best Places to Work 2012’ list published by Crain’s. I was immediately drawn in by the photo of their Star Trek-esque hallways and thought to myself, “That would be a cool place to work.” But I never thought in a million years that I would actually get an opportunity to work for this company and would be writing about my internship experience on their blog.

When I arrived in June I was given a detailed description of the projects that I would be working on this summer: add visualizations of CG data on the monitors that hang above the Support Center and find other interesting ways to show data around the office. My fellow intern, Soohyun Park, and I were asked to collaborate and create visualizations that used and displayed dynamic data.

Conf. Room Availability

I worked with several tools, such as Talend and PostgreSQL, to extract relevant internal data such as Personal Time Off (PTO) status, work anniversaries, timesheet usage and project status, among other things. All of this data was used to create the visualizations that are now shown on the big screens in the office. Many of these technologies were new to me and it took some troubleshooting along the way to see results. Soohyun and I also developed an iPad visualization that displays the status of the conference rooms: red shows “booked” and green shows “available”. App development was new to me and I learned a lot from this experience.
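As a rough illustration of the kind of extraction involved (the table and column names here are hypothetical, not the actual CG schema, and the real extracts ran through Talend), a small Python query for today's PTO list might look something like this:

    import psycopg2

    # Hypothetical sketch: pull today's out-of-office list for the dashboard.
    # The connection string, table and column names are invented for illustration.
    conn = psycopg2.connect("dbname=cg_internal user=dashboard")
    cur = conn.cursor()
    cur.execute("""
        SELECT employee_name, pto_start, pto_end
        FROM pto_requests
        WHERE current_date BETWEEN pto_start AND pto_end
        ORDER BY employee_name;
    """)
    for name, start, end in cur.fetchall():
        print("{} is out until {}".format(name, end.strftime("%B %d")))
    cur.close()
    conn.close()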

I am glad that I got to spend my summer with CG. I have gained invaluable experience, both professionally and personally. Thank you all for your support, and for the coffee (I’m going to miss that!). And thank you for making me a part of the Control Group team this summer.

“Live long and prosper.”

Summer Internship 2014: Designing for Scalability

This is a post by Alex Daley, a student at Elisabeth Irwin High School (Class of 2015) and a summer intern on our DevOps team.

I have been highly interested in many types of engineering for most of my high school career. On the hardware side, I have launched Arduinos into the stratosphere, led our robotics team to victory, and led seminars for teachers interested in 3D printing. On the software side, I have built apps and designed websites. The common theme has been that these projects were pretty “hacked together”: I quickly built things that worked, but they probably weren’t scalable and were rarely reusable.

I came to CG this summer to work with cloud services and sensor networks. I was tasked with the design and implementation of a highly-scalable, real-time sensor network. I had worked with sensors in the past, but building something on a large scale that had to be solid enough to expand was an interesting challenge. The goal was to have a number of sensors report data to a central location.

A few years ago, that central location would probably have been an SQL database. Before starting at CG, that is definitely how I would have implemented it. Instead, David introduced me to a service from Amazon called Kinesis. Kinesis is a “data stream” that allows really large amounts of data to be collected and retrieved with very low latency. Not only was it immensely scalable, it was also ridiculously simple. I had no experience with Amazon Web Services when I came to CG, but I was able to get Kinesis working in a few hours. Just like that, the entire backend for the sensor network was taken care of.
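As a minimal sketch of how simple the write side can be (this uses boto3, the current AWS SDK for Python, and a made-up stream name; it assumes AWS credentials are already configured), pushing one reading into a stream looks roughly like this:

    import json
    import boto3

    # Hypothetical sketch: push a single sensor reading into a Kinesis stream.
    kinesis = boto3.client("kinesis", region_name="us-east-1")

    reading = {"sensor": "temp-01", "value": 72.5}
    kinesis.put_record(
        StreamName="cg-sensor-stream",      # placeholder stream name
        Data=json.dumps(reading),
        PartitionKey=reading["sensor"],     # determines which shard the record lands on
    )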

For my first shot at the actual network, I started somewhere familiar: the Arduino. I was convinced that the small, inexpensive board was perfect for every application, as I had used them on everything from automatic fish feeders to weather balloons. I hooked up a temperature sensor, plugged it into a computer, and used a Python script to parse the serial data and send it to a PHP site that would send the readings to Kinesis.
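A rough sketch of that laptop-side glue script (assuming the pyserial and requests libraries; the serial port and URL are placeholders) might look like this:

    import serial
    import requests

    # Rough sketch of the first pipeline: read the Arduino's serial output on a laptop
    # and forward each reading to a PHP page that relays it to Kinesis.
    ser = serial.Serial("/dev/ttyACM0", 9600)

    while True:
        line = ser.readline().strip()   # e.g. b"72.5" from the temperature sensor
        if line:
            requests.post("http://example.com/relay.php", data={"temp": line})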

Arduino

If this sounds ridiculously complicated and prone to failure, that’s because it was. Like I said, I was used to hacking things together and getting them working quickly. This setup did work. However, it was almost completely limited to this one case. If I added another sensor, I would need to modify almost every single step in the pipeline. It also handled only one type of sensor, was completely susceptible to data corruption, and had a number of bottlenecks. It wasn’t even close to what the project needed to be. But it was a start.

The first improvements I made were meant to simplify the pipeline. If I was going to have thousands of sensor locations, I shouldn’t need a laptop at every one of them to connect the sensors to the internet. There are a number of cheaper, faster, and more direct ways of getting online. In addition, the PHP gateway would have to go, as it acted as a severe handicap on the much faster Kinesis service. I would have to access Kinesis directly from the Python code.

Solving the first problem was simple: an Arduino paired with an Ethernet shield can connect to the internet. The second problem posed a serious issue: Arduinos don’t run Python. An Ethernet-enabled Arduino could send requests to the PHP site all day, but writing a C library to talk directly to Kinesis was impractical, considering I had six weeks.

I went in search of another board. The Raspberry Pi is similar in price and size, but it has built-in Ethernet and runs Python. It was perfect.

The Pi had one flaw, however. Unlike the Arduino, which has hundreds of well-documented, easy-to-use sensors available, the Pi was harder to integrate with the physical world. I had been using the Grove System with the Arduino, a selection of sensors designed for plug-and-play functionality. One of the key goals of the project was to give other people the capability to add onto it after I left, and the Grove System was perfect for this. However, it only worked with the Arduino. Or so I thought.

The great thing about open source hardware is that when enough demand for a new feature exists, someone in the community builds it. Such was the case with the Grove System and the Raspberry Pi, which were linked by a project called GrovePi, a shield-like device that essentially acts as an Arduino: it reads the sensor values and then translates them into data that the Pi can understand.
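As a small, hedged example of what reading a Grove sensor from Python could look like (this assumes the GrovePi project's grovepi library is installed; the port number and any conversion to real units depend on the specific sensor):

    import time
    import grovepi

    # Sketch of reading a Grove sensor from Python via the GrovePi board.
    TEMP_SENSOR_PORT = 0    # analog port A0 (placeholder)

    while True:
        raw = grovepi.analogRead(TEMP_SENSOR_PORT)  # raw 0-1023 analog reading
        print("raw sensor value:", raw)
        time.sleep(1)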

Raspberry Pi

I had a solution to the problems of reading data and getting it to Kinesis reliably. However, the system still lacked the efficiency and scalability that it needed. I was still just sending streams of sensor readings that had no meaning to anyone who didn’t know the exact setup. There was no way to tell what type of sensor data was being sent, or whether the data was intact and valid. To solve this, I initially put together a basic protocol that looked like this:

{1.24.2, “Temp”, 57}

This piece of data meant that the temperature sensor at 1.24.2 (more on the sensor ID system later) had a reading of 57 degrees. Once again, this system worked, but strictly within this context. It had a number of problems: defining sensor types by strings is unnecessary and prone to errors, a ton of extra data is in there, and if part of an update was corrupted, the code would have no idea and would send it to Kinesis anyway.

The solution to all of these problems came in the form of Protocol Buffers. Developed by Google, Protocol Buffers allow you to define data models in an external file. For example, I had a model for a sensor report, which had fields for the type of sensor, the reading, a timestamp, and more. The file is used to encode the data, making an update only a few bytes long. After it is received, if it is intact, the data is reconstructed into an easily accessible object.
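To make that flow concrete, here is a hypothetical sketch; the message and field names below are invented for illustration, not the actual schema:

    # A .proto file might define something like:
    #
    #   message SensorReport {
    #     required string sensor_id = 1;   // e.g. "1.24.2"
    #     required int32  type      = 2;   // numeric sensor type instead of a string
    #     required float  value     = 3;
    #     required int64  timestamp = 4;
    #   }
    #
    # Compiling it with protoc generates a Python module, used roughly like this:
    import time
    import sensor_report_pb2   # generated by protoc from the .proto above

    report = sensor_report_pb2.SensorReport()
    report.sensor_id = "1.24.2"
    report.type = 1                        # e.g. 1 = temperature
    report.value = 57.0
    report.timestamp = int(time.time())

    payload = report.SerializeToString()   # the whole update is only a few bytes

    # On the receiving end, ParseFromString raises an error if the payload
    # cannot be decoded, and otherwise rebuilds the object.
    received = sensor_report_pb2.SensorReport()
    received.ParseFromString(payload)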

While I was building this system, my initial feeling was that it was overkill. I thought those few bytes didn’t really matter for what I was building; they wouldn’t make a difference. But David kept telling me to consider the potential scale. If I had tens of thousands of sensors, it would matter.

Now that I had a super-efficient sensor sending data to the cloud, I needed to think about expandability. The first thing I did was define a few concepts that the sensor network would be built around. There would be clusters, each one based on a Raspberry Pi. Each cluster would have sensors, which were individual data producers. Clusters were each part of a network. Defining the structure let me create a protocol for identifying each sensor that looked like this: (Network ID).(Cluster ID).(Sensor ID)
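A tiny sketch of composing and parsing those addresses in Python (the helper functions are illustrative, not the project's actual code):

    # Addressing scheme: (Network ID).(Cluster ID).(Sensor ID),
    # so "1.24.2" is sensor 2 on cluster 24 of network 1.
    def make_sensor_id(network_id, cluster_id, sensor_id):
        return "{}.{}.{}".format(network_id, cluster_id, sensor_id)

    def parse_sensor_id(address):
        network_id, cluster_id, sensor_id = (int(part) for part in address.split("."))
        return network_id, cluster_id, sensor_id

    assert make_sensor_id(1, 24, 2) == "1.24.2"
    assert parse_sensor_id("1.24.2") == (1, 24, 2)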

Initially, I had configured the cluster to send a Kinesis record for every single sensor update. This meant if I wanted the state of a cluster once per second, and there were 10 sensors on the cluster, I would need 10 Kinesis requests per second, which would become impossible quickly, as our stream was limited to 1,000 write operations per second. In addition, the requests took time to send. The data size was not the bottleneck here; the actual connection was. The solution was combining sensor reports into single cluster reports, so that when a cluster wanted to send out an update, it would gather all of its data, package it up, and send it all along in one request to Kinesis. This approach saved a few tenths of a second, an amount of time I would have considered meaningless a few months ago. My early experiences with this project made me realize how critical that amount of time could be. When scaling to thousands of units, every microsecond counts.
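A simplified sketch of that batching loop follows (using boto3 and JSON for brevity, with made-up names; the real payload was a Protocol Buffers message and the real readings came from the GrovePi):

    import json
    import time
    import boto3

    # Read every sensor on the cluster, bundle the readings into one cluster
    # report, and make a single Kinesis request per cycle.
    kinesis = boto3.client("kinesis", region_name="us-east-1")
    CLUSTER_ID = "1.24"            # (Network ID).(Cluster ID)
    SENSOR_PORTS = [0, 1, 2]       # hypothetical list of attached sensors

    def read_sensor(port):
        # Placeholder: the real code would read the GrovePi port here.
        return {"sensor": "{}.{}".format(CLUSTER_ID, port), "value": 0}

    while True:
        report = {
            "cluster": CLUSTER_ID,
            "timestamp": int(time.time()),
            "readings": [read_sensor(port) for port in SENSOR_PORTS],
        }
        # One put_record per cluster per cycle, instead of one per sensor.
        kinesis.put_record(
            StreamName="cg-sensor-stream",
            Data=json.dumps(report),
            PartitionKey=CLUSTER_ID,
        )
        time.sleep(1)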

The Raspberry Pi was initially difficult to manage, because to control it, I needed to know the IP address, which I couldn’t get without hooking it up to a display. This was not practical for configuring a large number of Pis. The solution was putting a startup script on the Pi that emailed the IP address to me every time it started up. This simple fix made SSHing easy.
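A minimal sketch of such a "phone home" startup script, with the mail server, addresses and credentials as placeholders (on the Pi this would run at boot, for example from /etc/rc.local):

    import socket
    import smtplib
    from email.mime.text import MIMEText

    def local_ip():
        # Connecting a UDP socket to a public address reveals which local
        # interface (and IP) would be used; no packets are actually sent.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.connect(("8.8.8.8", 80))
        ip = s.getsockname()[0]
        s.close()
        return ip

    msg = MIMEText("Cluster is up at {}".format(local_ip()))
    msg["Subject"] = "Raspberry Pi IP address"
    msg["From"] = "pi@example.com"
    msg["To"] = "me@example.com"

    server = smtplib.SMTP_SSL("smtp.example.com", 465)
    server.login("pi@example.com", "app-password")   # placeholder credentials
    server.sendmail(msg["From"], [msg["To"]], msg.as_string())
    server.quit()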

The startup script option also allowed me to get the data reading and Kinesis updating to start immediately, without any commands. This allowed for literal plug-and-play functionality; after simply plugging the cluster into the wall, records appear in the Kinesis stream.

With the prototype cluster done, I turned my focus to appearance. I had never worried much about making my projects look nice, but the cluster looked much more sinister than it should have, sitting in a corner of a conference room:

Conference Room Sensor

I set out to design an enclosure. I went to Staples and bought a large piece of foam board and a bottle of glue. I was thoroughly in arts-and-crafts territory. Using a cardboard Pi case as a template, I started cutting away until I had an enclosure that held the Pi and the sensors. It was less than attractive:

Ugly Pi Case

But that is what prototyping is all about. After deciding on some changes to port locations and the like, I jumped into the absurdly simple online 3D app Tinkercad to design a 3D-printable case. Getting it printed was as simple as taking a train up to the MakerBot store in Greenwich Village and dropping off the file. A few hours later, I had a respectable, CG-themed enclosure:

CG Pi Case

After a three week break, I was back at CG to see if this thing was really as scalable as I thought. While I was gone, parts for 9 (nine!) more clusters were delivered. It was awesome:

nine sensors

The nice thing about designing with scalability in mind was that when it actually came to scaling, everything worked. I was surprised, as I had been expecting hours of troubleshooting errors resulting from expanding the network by a factor of ten. Instead, there were no major problems. If I wasn’t already sold on scalable design, I was now.

This internship was a short six weeks. As with all projects, there are a ton of ways this could be taken further. I would have liked to set up a better way to communicate with the clusters. Kinesis, by design, only allows for one-way communication, so the service cannot talk back to the clusters. A web interface with the capability to change settings on each cluster would have been awesome. In addition, I didn’t get to spend much time on the consuming side of the project. I had a simple program that displayed the real-time results, but if I had more time, I would definitely look at visualizing the data.

I’ve mentioned it before, but I can’t overstate how foreign the concept of scalability was to me when I came to this job. I built things with the main intent of getting them working the quickest way possible. Scalability seemed like a waste of time if the project was small. These six weeks have completely changed my engineering mentality. Designing for potential size, instead of current size, not only allows scaling to happen smoothly when the time comes, but also leads to more solid code overall, and is invaluable to projects of every size.

SXSW Panel Picker: Transmedia Storytelling in the Age of Proximity

SXSW Panel Picker is OPEN!

We are really excited about LTE Direct, a new mobile technology from Qualcomm that’s due to hit the market in about 15 months. So we thought it would be a great idea to partner up with Qualcomm, the very folks who created the chip, and Titan, a leader in out-of-home advertising, to explain what the opportunity will be for brands once LTE Direct is made public.

With mobile applications and proximity technologies like Bluetooth beacons and, soon, LTE Direct (LTED), synchronized campaigns will harmonize media noise and offer deeper engagement. Customers will begin an experience on their laptop, re-engage in a public space, respond to a call to action on their phones, and connect the dots with attribution in a retail store. This chip is going to change the mobile, retail, and out-of-home industries forever.

So if you think this sounds like an important topic for the SXSW community, VOTE for Transmedia Storytelling in the Age of Proximity! And tell your friends too! #LTEDsxsw

Future of Transit: Thoughts on Helsinki

Perhaps flying cars are not the future of transportation. The city of Helsinki recently announced plans to transform its existing public transportation network into a comprehensive multi-modal system that, in theory, would render cars unnecessary. Our in-house urbanists and transit experts, Jeff Maki and Neysa Pranger, provide their perspectives on this ambitious plan and what it could mean for NYC.

What’s your overall perspective on the Finnish plan?

JM: I think the Finnish plan accounts well for the direction technology is going– personal, mobile, ubiquitous and on-demand. In fact, I was interested to read the Master’s thesis that was the basis of the Finnish recommendation, because it applied the logic of the Internet and its development in the US, as well as trends in the American energy market, to the future of public transit.

It’s an interesting perspective and theory of evolution for transit, especially coming from a country that is typically more friendly to state-owned infrastructure as opposed to the privately-run, “market-driven” approaches usually found in the US. The focus on “millennials” and their unique perspectives on public services was also great to see in the thesis.

NP: While seamless travel options through integrated wayfinding and payment are not new ideas, Northern Europeans are once again pioneering an innovative, city-scale transportation initiative, much as they did with congestion pricing, bike share and pedestrian-friendly streets. The proposed plan by the Helsinki City Planning Department will be watched by many as citizen expectations for reliable service rise, even as cities and states grapple with their ability to improve services due to structural (a decline in gas tax revenue, for example) and political (enacting new tax revenue remains highly difficult) funding constraints.

But Helsinki is doing it at the right time, as those most likely to embrace the “shared economy” (e.g. Airbnb, Uber, BikeShare and TaskRabbit) move into a prime user demographic. Overall, what Helsinki is proposing is highly innovative, but it will also require intense collaboration between public and private providers, special attention to equitable provisioning, and continual pilot testing of how different user segments’ needs are addressed.

What’s it going to take for people to give up their cars?

JM: To be honest, I think it’s just time. It’s already happening. There’s been a lot published recently about millennials and their declining rates of car ownership. You can take a bus from NYC to Boston for a few dollars now– it’s certainly not the price holding people back at this point. It’s about shifting expectations to a “shared” mindset.

You might summarize by extending some of the logic from the Finnish Masters thesis: if the road network (and the car) was a symbol of freedom to the “boomers”, the Internet might be that same network to millennials. And it’s our task as designers of personal mobility systems to figure out how to enable mobile devices and other Internet-connected things to provide that same sense of freedom afforded by the car. That’s the thing that will cause people to switch, I think.

NP: I think it’s useful to remember that, in New York City at least, owning a car is already difficult, and the City has multiple public and private systems, like ZipCar, buses, bike share and subways. As a result, nearly half the residents of Manhattan do not own a car, and car ownership city-wide is on the decline.

But for users to move away from personal car ownership permanently, they’ll need to be presented with a time-competitive option for getting from point A to point B for a number of different purposes. Also key to this will be the frequency of service (how long will I have to wait?) and reliability (does it show up when it’s supposed to?).

Other requirements include:

- one or two seat rides: moving from one mode to the next can be cumbersome, especially for the elderly or parents with strollers;

- a cultural shift in perceived benefits of owning a car (going from ‘privilege’ to ‘curse’);

- support from mayors and governors, including strong messaging and the right package of policy incentives to back it up.

How will the Internet of Things play into this?

JM: This plan requires that shared services– buses, car rentals, taxis, subways, etc.– be connected to users. The Internet of Things is that connection, so I see its role as bringing the ability to engage with more physical systems to our phones via the Internet (or whatever form that might take in the future). And it’s important to note that the interaction will go two ways: transit operators get data from users, and users get data from transit operators.

NP: I completely agree. Helsinki will find it difficult to get their system off the ground without real-time data availability and connected systems– both of which will be powered by the Internet of Things.

Is such a system feasible in a city like New York?

JM: Of course. We already have many of the pieces here: a ubiquitous network of taxis; an extensive transportation network in the form of commuter rail, subway and bus; car share vendors and car rentals; informal bus options; and two world-gateway airports.

If there’s any barrier to realizing the Finnish plan here, I think it’s the lack of integration. Elsewhere, one organization operates many of these modes, but in NYC you have multiple organizations and little integration, making using these services more tedious– different fare cards, different mobile apps, etc. Getting the MTA, Port Authority and the City to form a working group charged with integrating transport in the New York region would be a huge step towards the Finnish plan, and a way to encourage people to use other options.

NP: While Helsinki’s motives for developing Mobility as a Service are driven by trends in the marketplace, environment and demographics, New York’s would likely be driven by others: relieving traffic that wreaks havoc on the economy, improving the public health and safety of pedestrians, and solving the ever-ominous need to fund better public transportation options. Over the last five years New York has pointed to congestion pricing as a solution, but that has not proven feasible so far. But addressing New York’s needs can be done many different ways, including increasing the supply of other time-competitive options, such as bike to ferry or bus to bike. So the development of shared systems such as Helsinki’s could be realized in New York and publicly supported.

The MTA, for example, spends fifteen cents of every fare dollar paid towards collecting that fare. Sharing fare collection across ten different systems that collect $10 billion in annual revenue would mean savings in the range of $1.5 billion. That’s a strong argument for integration!

What would you do for NYC?

JM: As one concrete proposal, I would better integrate paratransit into NYC’s mass transit system. It’s the publicly-operated system we have that is closest to the type described in the Finnish proposal. It also receives a lot of Federal funding, so the potential to innovate around it is huge. There were plans to replace paratransit with taxi vouchers a few years ago– but what if we added paratransit to the transportation network and redirected that money towards programs that serve both those with special mobility needs and the general public? There are challenges here, but nothing that can’t be solved.

NP: We could pilot a shared system in Lower Manhattan, where there’s already limited parking, a residential and business population and access to several public and private systems including bike share, PATH, buses, subways, and ferries.

Data Freedom: Part 1 of 3

“…but they’ll never take our freedom!”

When friends ask me about living in New York, I usually answer: “there are pros and cons to everything, and if you’re willing to take the cons to get the pros, it’s fantastic.” It may seem strange to say that maxim could relate to the exciting field of SaaS data management… but there it is.

If you’re a reader of this blog, by now you know that we here at CG love the cloud and love all that SaaS vendors can bring to the table. But all this “not-reinventing-the-wheel” stuff leaves many of us with the curious question of how to access data that’s been diligently sent off elsewhere. Sure, most SaaS vendors provide excellent reporting for the data they know. But sometimes we want to do reporting across vendors, across data sets, or even just different reporting than the vendor easily allows.

Enter our “CG Platforms” internal task force, powered by our Enterprise Architecture group. Our team’s mission is to see what we can do to free up our data. For the long haul, the answer is a full Master Data Management (MDM) architecture with a fully functioning middle tier. In some cases, though, you just need something small that doesn’t require diving into the deep end of learning a vendor’s API. So how do you balance building something small while keeping an eye on full MDM as the end goal?

Not shockingly, success in that balance can look different depending on the vendor, the needs, and even the time we want to spend. In a series of “Data Freedom” blog posts, we’ll take a look at a few cases of how that’s looked for us and which technologies we’ve used along the way.

First up, the appropriately-named Vendor 1, on the “quick and useful” end of the spectrum….

Vendor 1:  For this vendor, used by our Accounting department, the data is easy to access using its internal reporting capabilities. It also provides a feature to output nicely-formatted PDF reports. Those are helpful features for day-to-day use, but they don’t help us with any historical backup or reporting. What will we do the day Alexis de Tocqueville sets off to write the great history of Control Group? (We can dream.)

So how did we back up these dynamically created, static files in a way that’s easily accessible, lightweight and updateable? The vendor’s own reporting was a great start– we could pull a listing with information on these reports, but for Alexis and his ilk, we wanted the prettied-up assets themselves… and we were certainly not going to click through every possible one manually.

Faced with this issue, we turned to an old favorite, Selenium, which you may know and love as an automated regression-testing framework. But we’ve actually been able to use Selenium to do lightweight browser automation beyond just testing. For these purposes, it gave us out-of-the-box tools to do a lot of the heavy lifting. Once we imported and filtered the vendor’s regular report of the data we needed, we took a look at what Selenium could get us– with a major eye towards what we could use to re-run the script to update those documents over time.

We did it in three easy steps (sketched in code after the list):

  1. First, we set up Selenium’s default cookie management (called the CookieStore class) to handle the security side of things. That allowed us to programmatically log into our account within an automated browser session.
  2. From there, we wanted to have the program ask the system to pull up a long list of PDFs. For that we used Selenium’s Java Http client libraries with the information we pulled from the vendor’s own report in order to manipulate a variable URL to send into our “browser session.” The effect was just like looping through PDF reports as if they were in regular browser tabs.
  3. Finally, we had to put those files somewhere. Organizing the files was just a matter of saving them out to a folder structure that made sense for future “manual retrieval” (by humans, not robots). Just as we easily pulled report information to manipulate which URLs to call, we pulled ID numbers and dates for each file.  Then the program could save out the files to folders with names that made sense, i.e. customer names, report IDs, dates. On an updated run in the future, the same system will file the new reports alongside the old.
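The implementation described above used Selenium’s Java tooling; as a rough illustration of the same flow, here is a simplified sketch using Selenium’s Python bindings and the requests library, with every URL, element ID, credential and report field invented:

    import os
    import requests
    from selenium import webdriver

    # Step 1: log in through an automated browser session so the vendor sets its cookies.
    driver = webdriver.Firefox()
    driver.get("https://vendor1.example.com/login")
    driver.find_element_by_id("username").send_keys("backup-bot")
    driver.find_element_by_id("password").send_keys("not-a-real-password")
    driver.find_element_by_id("submit").click()

    # Reuse the browser session's cookies for direct HTTP downloads.
    session = requests.Session()
    for cookie in driver.get_cookies():
        session.cookies.set(cookie["name"], cookie["value"])

    # Step 2: loop over report metadata pulled from the vendor's own report export.
    reports = [{"customer": "Acme", "report_id": "1234", "date": "2014-06-30"}]
    for report in reports:
        url = "https://vendor1.example.com/reports/{}.pdf".format(report["report_id"])
        response = session.get(url)

        # Step 3: file each PDF where a human can find it later.
        folder = os.path.join("backups", report["customer"], report["date"])
        if not os.path.isdir(folder):
            os.makedirs(folder)
        with open(os.path.join(folder, report["report_id"] + ".pdf"), "wb") as f:
            f.write(response.content)

    driver.quit()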

Three functions.  Half a day.  Repeatable backups.  Have at it, historians!

See, it’s that easy!

Custom Electronics Hack Demo

At Control Group, we love to share our knowledge and eat pizza, so from time to time we host lunchtime “Drive-By” sessions to kill two birds with one stone. A few weeks ago, one of our engineers, Bob Paradiso, gave a few demos on connecting systems with various consumer electronics and appliances. His presentation gives viewers a new outlook on everyday products: all electronics are like Lego blocks that can be connected to enhance their individual value and the overall user experience.

Enjoy the video (and apologies for the sound). More to come from Bob and the CG team!