In the first post in this series, I covered a personal project: LG Console. In this post I’m going to discuss a project I worked on earlier in 2012 for Jenkat Media, Inc., a company I co-founded back in 2007.
clvr.io
With JenkatGames.com we have had a long-standing problem: we don’t really have a good idea of how much a new user is worth to us. Much of our user base does not register with the site, and since the content is all free, there are no transactions to track. We decided to start tracking various actions, starting with page views, that generated small amounts of revenue. To that end, we decided to deploy a small REST service on an external host and have our server hit it whenever a trackable action occurred.
Clvr.io was built using:
- Java SDK 1.7
- JAX-RS
- JSON
- MongoDB
- Jetty
Overall, it was a pretty small application and was never fully fleshed out. I would call the stage at which we stopped development pre-alpha. The data would be prepared in PHP and then sent via an asynchronous cURL call to the external server hosting Clvr.io.
$guid = processGuid();
$userIp = ip2long($_SERVER['REMOTE_ADDR']);
$actionTimestamp = time();
$browser = $_SERVER['HTTP_USER_AGENT'];
$fullUrl = full_url();
$path = $_SERVER['REQUEST_URI'];
$data = '{"userId": "'.$guid.'", "userIp": '.$userIp.', "actionTimestamp": '.$actionTimestamp.', "browser": "'.$browser.'", "fullUrl": "'.$fullUrl.'", "path": "'.$path.'"}';
curl_post_async("http://app.clvrio.net:80/services/action",$data);
As you can see, it was pretty dirty code. Instead of using PHP’s built-in json_encode, I simply appended the data all together, but this was hardly more than a prototype. On the server side, the data was retrieved and converted from a JSON string into an Action, through some JSONDeserializer magic, provided by Flexjson.
@POST
public void post(String actionJson) {
    try {
        // Validate the payload by parsing it, then map it onto an Action
        JSONObject json = (JSONObject) new JSONParser().parse(actionJson);
        Action action = new JSONDeserializer<Action>().deserialize(json.toString(), Action.class);
        if (!actionDao.isConnected()) {
            actionDao.connect();
        }
        actionDao.save(action);
        if (actionDao.isConnected()) {
            actionDao.disconnect();
        }
    } catch (ParseException ex) {
        Logger.getLogger(ActionService.class.getName()).log(Level.SEVERE, null, ex);
    }
}
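The Action class itself does not appear in this post. As a rough sketch only, assuming its fields simply mirror the keys of the JSON payload built in the PHP snippet, it might look like the bean below. The getUserIpAsString helper is a hypothetical addition that reverses PHP's ip2long for readability; it is not something the original service is known to have had.

```java
// Hypothetical sketch of the Action bean; the real class is not shown in the post.
// Field names are assumed to match the keys of the JSON payload sent from PHP.
class Action {
    private String userId;        // GUID assigned to the visitor
    private long userIp;          // IPv4 address as an integer (PHP ip2long)
    private long actionTimestamp; // Unix time in seconds (PHP time())
    private String browser;       // raw User-Agent header
    private String fullUrl;
    private String path;

    // Bean-style deserializers generally need a no-argument constructor.
    public Action() { }

    public String getUserId() { return userId; }
    public void setUserId(String userId) { this.userId = userId; }

    public long getUserIp() { return userIp; }
    public void setUserIp(long userIp) { this.userIp = userIp; }

    public long getActionTimestamp() { return actionTimestamp; }
    public void setActionTimestamp(long actionTimestamp) { this.actionTimestamp = actionTimestamp; }

    public String getBrowser() { return browser; }
    public void setBrowser(String browser) { this.browser = browser; }

    public String getFullUrl() { return fullUrl; }
    public void setFullUrl(String fullUrl) { this.fullUrl = fullUrl; }

    public String getPath() { return path; }
    public void setPath(String path) { this.path = path; }

    // Hypothetical helper: converts the integer form back to dotted-quad notation.
    public String getUserIpAsString() {
        return ((userIp >> 24) & 0xFF) + "." + ((userIp >> 16) & 0xFF) + "."
             + ((userIp >> 8) & 0xFF) + "." + (userIp & 0xFF);
    }
}
```

Storing userIp as a long rather than an int matters here: an unsigned 32-bit address can exceed the range of a signed Java int.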
The idea was to save the data as quickly as possible, since we were dealing with tens of thousands of these calls each day, and then process the data at a future time. Based on the URL the user hit, we would have a good idea of how much revenue they generated.
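The later processing step was never described in detail, so the following is only an illustration of the idea: look up an estimated per-view value for each tracked path and add it to a running total per visitor. The class name, the prefix-matching rule, and the use of micro-dollars are all assumptions made for this sketch, and any rates passed in would be placeholders rather than real figures.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: estimates revenue per visitor from the paths they hit.
class RevenueEstimator {
    // Estimated revenue per page view, in micro-dollars, keyed by path prefix.
    // Insertion order matters: the first matching prefix wins.
    private final Map<String, Long> ratesByPrefix = new LinkedHashMap<String, Long>();
    private final Map<String, Long> totalsByUser = new HashMap<String, Long>();

    void addRate(String pathPrefix, long microDollarsPerView) {
        ratesByPrefix.put(pathPrefix, microDollarsPerView);
    }

    // Adds one tracked page view to the visitor's running total.
    // Paths that match no prefix contribute nothing.
    void record(String userId, String path) {
        long rate = 0L;
        for (Map.Entry<String, Long> entry : ratesByPrefix.entrySet()) {
            if (path.startsWith(entry.getKey())) {
                rate = entry.getValue();
                break;
            }
        }
        Long current = totalsByUser.get(userId);
        totalsByUser.put(userId, (current == null ? 0L : current) + rate);
    }

    // Returns the estimated total for a visitor, or 0 if none was recorded.
    long totalFor(String userId) {
        Long total = totalsByUser.get(userId);
        return total == null ? 0L : total;
    }
}
```

Integer micro-dollars are used instead of floating-point dollars so that summing many tiny per-view amounts does not accumulate rounding error.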
Unfortunately, there were a few issues with that approach. Some browsers behave strangely when they run into small HTML glitches, such as image tags with empty sources, and end up requesting your webpages multiple times. In addition, lots of the traffic you receive isn’t “real” traffic. Google Analytics does a good job of minimizing it, but bots of all types hit popular websites, and many of them do not properly identify themselves. The data we were collecting was very noisy, and we were constantly having to try new ways to clean it.
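To make the cleaning problem concrete, here is a minimal sketch of a first-pass filter covering the two kinds of noise described above. It is not the code we ran; the class name, the marker list, the decision to drop empty user agents, and the time window are all assumptions for illustration. It also only catches bots that identify themselves, which is exactly the limitation mentioned above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a first-pass cleaner for tracked page views.
class NoiseFilter {
    // Substrings that self-identifying crawlers commonly put in their User-Agent.
    private static final String[] BOT_MARKERS = { "bot", "crawler", "spider" };

    private final long windowSeconds;
    // Timestamp of the last accepted hit per (userId, path) pair.
    private final Map<String, Long> lastSeen = new HashMap<String, Long>();

    NoiseFilter(long windowSeconds) {
        this.windowSeconds = windowSeconds;
    }

    // Returns true if the hit should be kept, false if it looks like noise.
    // Assumes hits are fed in roughly chronological order.
    boolean accept(String userId, String path, String userAgent, long timestamp) {
        String ua = userAgent == null ? "" : userAgent.toLowerCase(Locale.ROOT);
        if (ua.isEmpty()) {
            return false; // assumption: treat a missing User-Agent as non-human
        }
        for (String marker : BOT_MARKERS) {
            if (ua.contains(marker)) {
                return false; // self-identified crawler
            }
        }
        String key = userId + "|" + path;
        Long previous = lastSeen.get(key);
        if (previous != null && timestamp - previous < windowSeconds) {
            return false; // repeat request for the same page within the window
        }
        lastSeen.put(key, timestamp);
        return true;
    }
}
```

With a window of a few seconds, a page that is requested twice because of an empty image source would be counted once, while a genuine return visit later on would still be kept.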
Most of our visitors do not register and do not log in to the site. There is no straightforward and reliable way to uniquely identify an anonymous user between visits. Cookies work, but they are dumped by users with such regularity that they couldn’t be fully counted upon. We had a number of techniques to attach a pretty consistent GUID to a user, through a combination of cookies and Flash Objects/supercookies, but they were not foolproof.
I also don’t think we thought through how much strain this would place on our web server. Sending out tens of thousands of outgoing web service calls every day is going to have an impact on CPU utilization, and it did, in the 10-20% range.
We later tried KissMetrics, and while it was somewhat promising at first, it isn’t really suited for what we were trying to do. They’re focused more on SaaS and other forms of recurring revenue, with the ability to provide a solid signup and cancellation date. We were sending in lots of data, and the numbers were fluctuating all over the place because none of our users would ever officially cancel. Still, overall I came away with a good impression of the product, and would likely recommend it if you’re in their target market.
Learning Through Failure
The mistakes:
- Too many assumptions
We assumed too many things about our traffic and the ability to identify and track anonymous web traffic. A number of the issues were already mentioned, so I won’t rehash those, but I was surprised at the amount of junk traffic that hit our website, and the issues that some browsers have with small HTML glitches.
The positives:
- MongoDB
Most of my development work has been with PostgreSQL and MySQL, so it was good to interact with a completely different type of datastore.
- Heroku & PaaS in General
This was my first exposure to a PaaS, in the form of Heroku. We had originally planned to deploy this on EC2, but I wanted less overhead involved in setting it up and maintaining it, plus I had wanted to delve into a non-Amazon NoSQL option. I had recently attended a conference that covered some of the NoSQL options out there. Cassandra had a little more focus than most, but I was more intrigued by MongoDB. Heroku has great MongoDB support from multiple providers, so it seemed like a good fit.
The End of Clvr.io?
In this form, targeting a consumer gaming website, Clvr.io is most definitely dead. On the new version of JenkatGames.com (still in development, release date TBD) we’ve been working on simply using Google Analytics as best we can. In addition to standard client-side JavaScript calls, we’ve been working with the PHP-GA library, which enables almost the entirety of the Google Analytics reporting functionality in PHP.