Event Tracking with Javascript & Flask

This article explains how to set up raw website event tracking with Javascript and [Flask](http://flask.pocoo.org/docs/latest/) for deep analysis and data-driven decisions.

In this article I want to give a short tutorial on how to set up cross-site event tracking (e.g. clicks on links) on the client side via Javascript and on the server side via the Python web micro-framework Flask. By the end, we will be able to log predefined events from the client side, including the remote address, an identification cookie and a fingerprint of browser characteristics. With this raw tracking data you can perform deep analysis in, e.g., R or SciPy for data-driven decisions.

Please note that you have to respect the user's right to deactivate tracking. Another important remark: the presented solution is a minimal prototype and is therefore not suited for environments where high throughput and scalability are important.

Big picture

Let us start with an overview of the presented solution. We want to track events asynchronously with Javascript, so we can either 1) send HTTP requests via AJAX to the same web server the website was loaded from, or 2) dynamically embed an image tag whose src URL carries the tracking data. The browser loads such an image automatically; it is called a tracking pixel.

The former is restricted by the browser's same-origin policy, so without additional CORS configuration on the tracking server it is only practical when the tracking server runs under the same origin as the website. The latter works across multiple different domains, so we use it: we embed our tracking code into the sites and let it communicate with a remote Flask server via image loading.
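Because the tracking data travels as an image request, the server has to answer with something the browser accepts as an image. A common choice, assumed here rather than taken from the article, is a transparent 1x1 GIF. The sketch below spells out the bytes of that widely used 43-byte GIF and reads its dimensions back from the header; `PIXEL_GIF` and `gif_size` are hypothetical names.

```python
import struct

# A widely used 43-byte transparent 1x1 GIF (GIF89a).
PIXEL_GIF = (
    b"GIF89a"                                    # signature and version
    b"\x01\x00\x01\x00"                          # canvas width=1, height=1 (little-endian)
    b"\x80\x00\x00"                              # 2-entry global colour table, bg index, aspect
    b"\x00\x00\x00\xff\xff\xff"                  # colour table: black, white
    b"\x21\xf9\x04\x01\x00\x00\x00\x00"          # graphic control ext.: colour 0 is transparent
    b"\x2c\x00\x00\x00\x00\x01\x00\x01\x00\x00"  # image descriptor: 1x1 image at (0, 0)
    b"\x02\x02\x44\x01\x00"                      # LZW data: clear code, pixel 0, end code
    b"\x3b"                                      # trailer
)

def gif_size(data):
    """Return (width, height) read from a GIF header."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    return struct.unpack("<HH", data[6:10])
```

In a Flask handler such bytes could be returned with `Response(PIXEL_GIF, mimetype="image/gif")` after `from flask import Response`. Adding a `Cache-Control: no-store` header helps prevent repeated identical event URLs from being served from the browser cache instead of reaching the server.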

Client

Let us start with the client side. For a better understanding, I will explain each method of the tracking class in turn. If a method is of minor relevance, I will note it, and you can skip it without missing the overall idea. Let us assume you define the following in your script tracking.js, which you later include in all of your sites. See this gist for the whole script. I used Basil.js for getting and setting cookies, so do not forget to include it as well.

// Javascript object as an API for event triggering
// and handling communication with the remote server
var U = function() {
  this.url = "http://yourdomain.com/t?s=1&v=1"; // s: site id, v: site version
  this.setCookie = ...; // Set initial cookie
  this.getCookie = ...; // Get user cookie
  this.sendEvent = ...; // Send event, used by the site code
  this.getFingerprint = ...; // Get fingerprint
  this.getScreen = ...; // Get screen resolution, used by sendEvent
};
window.u = new U();
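The cookie and fingerprint methods are only stubbed out above. To illustrate the idea behind them, here is a small Python sketch; the article's own implementation is Javascript (see the full script). SHA-256, the `|` separator and sorting the plugin list are assumptions made for this illustration, and `fingerprint` and `new_client_id` are hypothetical names.

```python
import hashlib
import secrets

def fingerprint(user_agent, screen, plugins):
    """Hash browser characteristics into a hex digest.

    SHA-256 and the "|" separator are illustrative choices, not the
    article's Javascript implementation.
    """
    # Sort the plugin list so a different reporting order gives the same hash.
    raw = "|".join([user_agent, screen] + sorted(plugins))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def new_client_id():
    """Random identifier for the tracking cookie: 16 random bytes as 32 hex chars."""
    return secrets.token_hex(16)

fp_a = fingerprint("Mozilla/5.0", "1920x1080", ["PDF Viewer 1.0", "Flash 32"])
fp_b = fingerprint("Mozilla/5.0", "1920x1080", ["Flash 32", "PDF Viewer 1.0"])
```

The fingerprint is deterministic for the same browser characteristics, while the random cookie value changes whenever the user clears cookies; logging both lets you compare the two identifiers during analysis.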

The setCookie and getCookie methods are straightforward functions for setting and getting a cookie with a random value identifying the client. We could also take the client fingerprint into account when generating the cookie value. The more interesting methods are getFingerprint and sendEvent: What is the basis of the fingerprint, and how are the event data and client information transferred to the remote host? The getFingerprint method returns a hash of the user-agent, screen resolution, installed plugins and their versions. The code is quite long, so if you are interested in the details, please refer to the full code. Let us proceed with the body of the sendEvent method.

// sending client event info to the remote server
this.sendEvent = function(eventdata) {
  // respect the user's Do Not Track setting
  if (window.navigator.doNotTrack == "1" ||
      window.navigator.doNotTrack == "yes")
    return;
  var props = {"u": this.getCookie(), "fp": this.getFingerprint(),
               "sr": this.getScreen(), "e": eventdata};

  // append each property as a URL-encoded GET parameter
  var query = this.url;
  for (var k in props) {
    query += "&" + k + "=" + encodeURIComponent(props[k]);
  }
  // setting src makes the browser request the image (the tracking pixel)
  document.createElement("img").setAttribute("src", query);
};

Note the check whether the browser's doNotTrack setting is enabled: as mentioned at the beginning, you should respect the user's right to disable tracking. To transmit the client information and the event data, it is sufficient to add the properties as GET parameters to the source URL of the requested image, where the server can easily read them. To send an event to the tracking server, call window.u.sendEvent(event) from anywhere in your site's code and pass the event, for example from a link's click handler.
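One detail matters when event data is put into the URL: a value such as `click:nav&footer=1` contains `&` and `=`, which would otherwise be read as parameter separators, so each value should be percent-encoded first (in Javascript, `encodeURIComponent` does this). The Python sketch below demonstrates the round trip with the standard library; `build_pixel_url` and the sample values are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://yourdomain.com/t?s=1&v=1"  # the tracking URL from tracking.js

def build_pixel_url(base, props):
    """Append the tracking properties as percent-encoded GET parameters."""
    return base + "&" + urlencode(props)

url = build_pixel_url(BASE_URL, {"u": "a1b2c3", "fp": "3f9c", "sr": "1920x1080",
                                 "e": "click:nav&footer=1"})

# Decoding the query string (as Flask's request.args does) recovers the values.
params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
```

Because the server-side framework decodes the query string, the handler sees the original event string unchanged.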

Server

The server code is implemented in Python. We use the simple micro-framework Flask to get the job done; a Flask application can be as lightweight as a single file. If you are not familiar with Flask, please refer to its tutorial page. Here, we only need one route that reads the GET parameters. For the following, we need these imports:

from flask import Flask
from flask import request
from time import strftime, time
import os

# maps the site id (GET parameter s) to the site name
sites = {"1": "trackedsite.com"}
app = Flask(__name__)

The sites dictionary is used to decode the site id that is sent with the event data. Next, we define the route that the tracking script uses on the client side. For simplicity and easier data import into the analytics tool chain, the route handler writes the client-side event data into CSV files grouped per day. In the following, I will show snippets of the route handler. Let us start with the first part.

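@app.route("/t", methods=['GET'])
def track():
    # return the GET parameter, or "null" if it is missing
    def rargs(arg):
        return request.args.get(arg, "null")

    # replace None (e.g. a missing header) with "null"
    def nullwrap(arg):
        return arg if arg is not None else "null"

    # decode the site id; reject missing or unknown ids
    sitename = sites.get(request.args.get('s'))
    if sitename is None:
        return "unknown site", 400
    day = strftime("%D").replace("/", "-")  # e.g. 03-14-17 (MM-DD-YY)
    folder = "/var/trackingdata/"
    filename = folder + sitename + "/" + day + ".csv"

    # create the site folder and the day's file if they do not exist
    os.makedirs(folder + sitename, exist_ok=True)
    if not os.path.isfile(filename):
        open(filename, "w+").close()

    ...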

First, we define a new route /t. When an HTTP GET request is sent to this URL, Flask uses the route handler to construct the response. Depending on the site given by the parameter s, the server selects the folder where it writes the CSV files. As mentioned earlier, for convenience we group the CSVs by day, so we create the day's file if it does not exist yet.
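The path logic can be factored into a small function that is easy to test. The sketch below is an assumption, not the author's code: `daily_path` is a hypothetical helper that returns None for unknown site ids instead of raising, and it names files with ISO dates (`%Y-%m-%d`) rather than the MM-DD-YY names produced by `strftime("%D")` in the handler, because ISO names sort chronologically as plain strings.

```python
import os
from datetime import date

SITES = {"1": "trackedsite.com"}  # site id -> site name, as in the server script

def daily_path(root, sites, site_id, day):
    """Return <root>/<sitename>/<YYYY-MM-DD>.csv, or None for an unknown site id."""
    sitename = sites.get(site_id)
    if sitename is None:
        return None
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return os.path.join(root, sitename, day.strftime("%Y-%m-%d") + ".csv")

# Names in the handler's MM-DD-YY style do not sort across a year boundary.
days = [date(2016, 12, 31), date(2017, 1, 1)]
iso_names = [d.strftime("%Y-%m-%d") for d in days]
mdy_names = [d.strftime("%m-%d-%y") for d in days]
```

With ISO names, a simple sorted directory listing already yields the daily files in chronological order, which is convenient when importing a date range into R or SciPy.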

    ...
    # one record: request metadata plus the client-side event data
    a = [str(time()),                    # timestamp
         rargs('s'),                     # site id
         rargs('v'),                     # version of site
         nullwrap(request.remote_addr),  # ip address
         rargs('u'),                     # user cookie
         rargs('fp'),                    # user fingerprint
         rargs('sr'),                    # screen resolution
         nullwrap(request.headers.get('User-Agent')),  # user agent
         nullwrap(request.headers.get('Referer')),     # referring page
         rargs('e')]                     # event

    # append the record as one tab-separated line
    with open(filename, "a", newline="") as hnd:
        hnd.write("\t".join(a) + "\r\n")

    return "", 204  # empty response; the event has been logged

In the second part of the request handler, we open the corresponding file in append mode and write the client information as well as the event data to a new line. Note that the values are separated by tabs, not commas, despite the .csv extension. This gist contains the full code of the server.
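Separating fields with bare tabs breaks the record layout as soon as a value, for example the event string, contains a tab or a line break. A sketch using Python's `csv` module with a tab delimiter, which quotes such values, is shown below; `append_record` and `read_records` are hypothetical helpers, and opening the file with `newline=""` follows the `csv` module documentation.

```python
import csv
import os
import tempfile

def append_record(filename, fields):
    """Append one tab-separated record; fields with tabs or line breaks get quoted."""
    with open(filename, "a", newline="", encoding="utf-8") as hnd:
        csv.writer(hnd, delimiter="\t", lineterminator="\r\n").writerow(fields)

def read_records(filename):
    """Read all records back as lists of strings."""
    with open(filename, newline="", encoding="utf-8") as hnd:
        return list(csv.reader(hnd, delimiter="\t"))

# Demo with a temporary file and two events that would break a naive layout.
path = os.path.join(tempfile.mkdtemp(), "2017-03-14.csv")
append_record(path, ["1489500000.0", "1", "1", "127.0.0.1", "click\tnav"])
append_record(path, ["1489500001.0", "1", "1", "127.0.0.1", "line\nbreak"])
rows = read_records(path)
```

The files have no header row; they can be loaded, for example, with `pandas.read_csv(path, sep="\t", header=None)` or R's `read.delim(path, header = FALSE)`.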

Deployment

To make the standalone server listen on all public interfaces of your machine on Flask's default port 5000, add the following to the end of the server script. You can then start it via python app.py.

if __name__ == "__main__":
    app.run(host='0.0.0.0')  # all interfaces, default port 5000

For a production system, I would recommend running the server script with the uWSGI application server behind NGINX. If you want to continue in this direction, please refer to this tutorial. For deploying your tracking server to a hosted platform, have a look at this list of free plans for small projects.
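uWSGI serves the application through the WSGI interface (PEP 3333), and Flask's `app` object is such a WSGI callable. To show what that interface looks like, here is a framework-free sketch using only the standard library. It is purely illustrative and not a replacement for the Flask server: `pixel_app` and `LOG` are hypothetical names, and the in-memory list stands in for the per-day files.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

LOG = []  # hypothetical in-memory sink; the Flask server appends to files instead

def pixel_app(environ, start_response):
    """Minimal WSGI callable: record the GET parameters sent to /t, answer 204."""
    if environ.get("PATH_INFO") != "/t":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    query = parse_qs(environ.get("QUERY_STRING", ""))
    record = {key: values[0] for key, values in query.items()}
    record["ip"] = environ.get("REMOTE_ADDR", "null")
    LOG.append(record)
    start_response("204 No Content", [])
    return [b""]

# Call the application directly, the way a WSGI server such as uWSGI does.
environ = {"PATH_INFO": "/t", "QUERY_STRING": "s=1&v=1&e=click",
           "REMOTE_ADDR": "127.0.0.1"}
setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
statuses = []
body = pixel_app(environ, lambda status, headers: statuses.append(status))
```

Flask's deployment documentation describes how to point uWSGI at the `app` object of the server script.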

