Sure, it’s not exactly like Scottish independence, but I feel like William Wallace might still give us the nod for our own effort at (data) freedom.
A few weeks ago we started looking at data freedom because, while there are many advantages to using SaaS vendors, there are some issues to keep an eye on. One of those issues is finding ways to access and use the data that’s been sent out into the vendor’s system. The first installment of this series was about a small problem with a fast solution. We didn’t have to worry about real-time or frequently-changing data.
But for Vendor 2, things weren’t so easy. Like well-known #2s Art Garfunkel and Ed McMahon, Vendor 2 is easy to overlook but nonetheless necessary on a day-to-day basis. Vendor 2 is one of those internal tracking vendors we use every day with data that changes quickly and often.
Vendor 2 got the job done for us, but sadly, their reporting left something to be desired. Sure, they had reports, but there was no way to link to external data. And don’t get me started on getting it to do any complicated slicing-and-dicing. We ended up with a lot of people who needed to pull down spreadsheets and re-do the same calculations month after month. We heard the cries from people-who-will-remain-nameless (but who are me): “I can write the darn SQL if you just let me!”
So, how did we set up a system that uses SaaS vendor data but reports the way we want it to? We set up a system to copy their information to a database we control… and then we wrote the darn SQL.
Easier said than done, for sure. For this case, we called in bigger guns and took a look at Talend, a full Enterprise Service Bus (ESB) solution that enabled us to create our own data store. The goal was to create a data store on our own terms that can auto-update as information changes on the vendor’s side in near-real time via Vendor 2’s full-featured API. Now we can do what we need with the data: write the SQL for static reports or hook up a BI tool to view it. Whatever we need.
Just that easy? Well…
In this case, we used the Community Edition of the ESB to see what it could do. One thing we found right away was that Talend organizes things two ways: “jobs” and “routes.” The routes side is based on something Enterprise Architecture veterans will know well: Apache Camel. Working with an agreed-upon standard has its own advantages, but we also found routes to be more robust than jobs. For instance, routes could handle exceptions, such as the API responding slowly, as well as cases where we needed to “page through” long sets of data. With that, we were off and running, with a few hurdles to clear.
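To make the "slow API" and "page through" cases concrete, here is a minimal sketch of the logic in plain Java rather than in Camel. It is not our actual route: PageSource, PagedFetcher, and the "a short page means we're done" rule are all assumptions made up for illustration, and a real API may signal the last page differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one paged API call; not part of Talend or Camel.
interface PageSource<T> {
    List<T> fetch(int pageNumber, int pageSize) throws Exception;
}

class PagedFetcher {
    // Reads page after page until a short page arrives, retrying failed calls.
    static <T> List<T> fetchAll(PageSource<T> source, int pageSize, int maxAttempts)
            throws Exception {
        if (pageSize < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("pageSize and maxAttempts must be at least 1");
        }
        List<T> all = new ArrayList<>();
        for (int page = 1; ; page++) {
            List<T> rows = fetchWithRetry(source, page, pageSize, maxAttempts);
            all.addAll(rows);
            // Assumption: a page with fewer rows than requested is the last one.
            if (rows.size() < pageSize) {
                return all;
            }
        }
    }

    private static <T> List<T> fetchWithRetry(PageSource<T> source, int page,
                                              int pageSize, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return source.fetch(page, pageSize);
            } catch (Exception e) {
                last = e; // for example, a timeout from a slow API; try again
            }
        }
        throw last; // every attempt failed, so surface the most recent error
    }
}
```

In a Camel route the same ideas would be expressed with the route's own error handling and looping constructs; the plain-Java version is only meant to show what the route has to accomplish.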
Nice Flow Diagrams Do Not Mean Non-Technical: Starting with a “route”, we went through Vendor 2’s data object by object, creating a parallel data model on our side (so we could write the SQL) and mapping each object to a specific API call. To the uninitiated, not-so-user-friendly Camel calls look like this:
.setHeader("RowCountInPage").groovy("headers.RowCountInPage = headers.RowCountInPage.isInteger() ? headers.RowCountInPage.toInteger() : 0")
Not exactly drag-and-drop syntax. That’s a fairly simple one, actually, but even so it’s using Camel along with an embedded Groovy expression, and it can only be viewed or edited via a “property” of one of those flow icons, not in a text file. The GUI aspect falls away fairly quickly.
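For anyone who doesn't read Groovy, that one-liner does roughly the following: if the RowCountInPage header holds text that parses as an integer, convert it, and otherwise fall back to 0. A plain-Java sketch of the same logic is below. The RowCountHeader class and normalize method are names invented here, and treating null as 0 is an added assumption (the Groovy expression itself would fail on a missing header).

```java
class RowCountHeader {
    // Parses the header text as an integer, or returns 0 when it is not one.
    static int normalize(String raw) {
        if (raw == null) {
            return 0; // assumption: a missing header counts as zero rows
        }
        try {
            // Surrounding whitespace is ignored before parsing.
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return 0; // not an integer, so use the same fallback as the route
        }
    }
}
```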
In short, this is a case that called for real development. It’s not rocket science but also not to be taken lightly. Don’t let the nice flow diagram fool you.
An API Is A Unique Blossom (sometimes): On the Vendor 2 side of things, they do have an API, but there were no quick answers here. You can do an awful lot with a full-featured API, but it might take a while to learn how to do it, as each API is a little different. In this case, each call required crafting a specific XML structure, with a unique manner of getting large data sets by page and sometimes opaque terminology. There was no easy “getProjects()” type of call to fall back on. We were able to work our way through Vendor 2’s documentation, but it also made us appreciate a solution like the one we designed for Vendor 1, which allowed us to avoid that level of mucking about in somebody else’s data model.
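To give a feel for what "crafting a specific XML structure" per call means, here is a small sketch that builds a paged request body using only the Java standard library. Everything about the payload is hypothetical: the element names (request, object, page, pageSize) and the PagedRequestXml class are invented for illustration and are not Vendor 2's actual format.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class PagedRequestXml {
    // Builds a request body asking for one page of one object type.
    // All element names are hypothetical placeholders.
    static String build(String objectType, int page, int pageSize) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element request = doc.createElement("request");
        doc.appendChild(request);
        append(doc, request, "object", objectType);
        append(doc, request, "page", Integer.toString(page));
        append(doc, request, "pageSize", Integer.toString(pageSize));

        // Serialize the DOM tree to a string without the <?xml ...?> declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    private static void append(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text); // the serializer escapes special characters
        parent.appendChild(child);
    }
}
```

Building the body through a DOM rather than string concatenation means values containing characters like & or < are escaped automatically, which matters once real data starts flowing through.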
And Here You Thought Things Like Version Control Were Straightforward: Just when you thought you had git mastered and thought it’d be easy to work in a team again, along comes a centralized system like this. As it turns out, a Talend workflow isn’t just based on a few nice and editable XML files. Instead it creates sets of files and auto-updates portions of them in unpredictable ways. For instance, the updated date is part of a project file, so every save “changes” the project. Be sure to close Talend before committing your files since they change when the Studio product is running!
Talend, the company, wants you to upgrade to the paid edition to have their internal version control, but that would also mean a separate repository specifically for their tool. In the end, we got it to work in our version control and lived to tell the tale. Unfortunately there were bumps in the road in places we thought might roll like the autobahn.
In general, Talend worked for us, but using the Community Edition wasn’t always so straightforward. For instance, going with the “routes” side meant veering away from Talend’s main offering in favor of the more standard Camel implementation. Using routes meant we could leverage lots of Apache Camel documentation, but it cut us off from most of Talend’s own forums and documentation, which focus on the “jobs” side. Alas, there wasn’t an easy middle ground that offered the positives of both sides.
In the end, Vendor 2 was a lot more work to integrate than Vendor 1. That’s no surprise. But now that we have it up and running, the volume of information we’re capturing and updating is huge, and we can write those reports however we want: business analytics packages, home-written darn-SQL statements, etc. And the Excel re-work won’t be necessary. We did this all without touching the main functions of Vendor 2.
We took on a lot more configuration work, but now find ourselves with a full backup of our data, able to do what we want with it rather than only what the vendor’s tools allow. This level of integration also makes us a little less dependent on Vendor 2. Should we need to swap them out someday, we will start with all our historical data completely at the ready.
After all, even Simon and Garfunkel eventually broke up.