Opitmizing my Django Reporting App with Google Pagespeed Part 1

This is what Google Pagespeed is telling me at the moment

Let’s tackle them one at the time. This is also my first time doing this so if I can do it, so can you.

Enable Compression

The common wisdom is for your CSS and JS files to not go through Django. It should go through your webserver like Apache. If you’re a noob, like myself, you might be wondering why should CSS and JS not go through Django? The answer to that is can be found in this question. However, in this case, the reporting app is an internal application and there is less need for more complicated setups.

Here are the steps:

  1. Get django-compress
  2. Get CSSTidy using sudo apt-get install csstidy
  3. Add ‘compress’ to INSTALLED_APPS in your settings.py file
  4. I had to modify my settings file to the snippet below. CSS_ROOT and JS_ROOT is the path to my CSS and JS files on my local machine respectively. I know I ought to be generating this dynamically. I don’t quite know how to do it at the moment and it is something I will get to eventually.
    COMPRESS = True
    COMPRESS_VERSION = True
    COMPRESS_CSS = {
        'all': {
            'source_filenames': (CSS_ROOT + 'main.css', CSS_ROOT + 'redmond/jquery-ui-1.7.1.custom.css', CSS_ROOT + 'redmond/ui.daterangepicker.css'),
            'output_filename': 'css/all_compressed.css',
        }
    }
    
    COMPRESS_JS = {
        'all': {
            'source_filenames': (JS_ROOT + 'daterangepicker.jQuery.js', JS_ROOT + 'jquery-ui-1.7.1.custom.min.js', JS_ROOT + 'jquery-1.3.1.min.js'),
            'output_filename': 'js/all_compressed.js',
        }
    }
  5. Next, I had to modify my base.html template. The script and link references have been removed so that my base.html file now has this instead:
    <html>
    	<head>
            <meta http-equiv="Content-type" content="text/html; charset=iso-8859-1">
            <meta name="description" content="Reporting and Analytics" />
    		<title>Insync Reporting</title>
    
    		{% load compressed %}
    		{% compressed_css 'all' %}
    		{% compressed_js 'all' %}
    
    .
    .
    .
Results

The minify JS and CSS recommendations are now gone. The score has gone up from 39 to 45. As for the recommendations about gzip, this is out of my scope for the time being (I hate sys admin) as it involved configuration on the webserver which I do not have the inclination to tackle at the moment. This took me all of about 1 hour max. Healthy returns for time spent IMO.

Advertisements

Building a Simple Reporting App in Django: Part 2

In this post, I’m gonna dive right in and show you what happens every step of the way from the time you submit the form to when the graph gets rendered. BTW, Part 1 is here for anyone who is interested.

So previously, we have the following views file

def activation_signupcount(request):
    if request.method == 'POST':
        date_segment_form = DateAndSegmentForm(request.POST)
    else:
        date_segment_form = DateAndSegmentForm()
    return render_to_response('activation_signupcount.html',
                            {   'date_segment_form': date_segment_form,
                                'display_graph': date_segment_form.is_valid()   })

As you can see, an ’empty’ DateAndSegmentForm() is created upon request of the page. The user then inputs the relevant fields (dates and usersegment) and submits the form.

This is what the activation_signupcount.html looks like:

{% extends "base_activation.html" %}

{% block content %}
	<h2>Sign ups</h2>
	<p>Number of users who have registered to the site between the dates entered</p>
{% endblock %}

{% block graph %}
	{% if display_graph %}
	<p>This counts the number of <strong>{{ date_segment_form.cleaned_data.usersegment }}</strong>
	users who have successfully registered to the site (verified their email)
	who joined between <strong>{{ date_segment_form.cleaned_data.datestart }}</strong>
	and <strong>{{  date_segment_form.cleaned_data.dateend }}</strong></p>
	<img src="./activation_signupcount.png?usersegment={{ date_segment_form.cleaned_data.usersegment }}&datestart={{ date_segment_form.cleaned_data.datestart }}&dateend={{ date_segment_form.cleaned_data.dateend }}" />
	{% endif %}
{% endblock %}

You can see that the contents within block graph gets displayed only if display_graph is true. From the views file, display_graph is only true when the form submitted is valid. This is the most basic level of form validation. One of the cool things I learnt is that you can access the submitted data in the form directly as a variable within the template. Previously, I used to set each field as a variable in my view file and then call the variable from my template. That step is now skipped and the result are variables that look like this {{ date_segment_form.cleaned_data.usersegment }} where date_segment_form is the variable I’ve set in my views file, cleaned_data is a built in forms function that makes the html output look pretty and usersegment is the name of the field in the form that I want.

So now, the following image is constructed dynamically depending on the input variables.

<img src="./activation_signupcount.png?usersegment={{ date_segment_form.cleaned_data.usersegment }}&datestart={{ date_segment_form.cleaned_data.datestart }}&dateend={{ date_segment_form.cleaned_data.dateend }}" />

Taking a peek in the urls file shows that this url is mapped to the following view

(r'^activation/signupcount/percentchange/activation_signup_percentchange.png/$', 'activation_signup_percentchange_graph') 

def activation_signupcount_graph(request):

    return generate_linegraph(  activation_signupcount_graph.__name__,
                                request.GET['datestart'],
                                request.GET['dateend'],
                                request.GET['usersegment'] )

So this view calls a function generate_linegraph() which looks like this:

def generate_linegraph(func_name, datestart_string, dateend_string, usersegment=''):

    data = generate_data(func_name, datestart_string, dateend_string, usersegment)

    # insert dates with 0 values
    full_dates = generate_dates(datestart_string, dateend_string)

    extradates = []

    for x in full_dates:
        if x not in [y[1] for y in data]:
            extradates.append(('0',x)) #SQL Query returns count then date

    data.extend(extradates)

    data = [(k[0],datetime.datetime.strptime(k[1],'%Y-%m-%d')) for k in data]

    data.sort(key=lambda x: x[1])

    # create classes
    fig = Figure(figsize=(10, 7), dpi=80, facecolor='w')
    ax = fig.add_subplot(111)

    # populate x and y coordinates
    x = []
    y = []

    x = [k[1] for k in data]
    y = [k[0] for k in data]

    # plot x and y coordinates
    ax.plot_date(x, y, color='blue', ls='--', marker='o',
                markerfacecolor='green', markersize=6)

    # format how the dates on x-axis look
    ax.xaxis.set_major_formatter(DateFormatter('%Y-%m-%d'))
    fig.autofmt_xdate()
    ax.set_title(func_name)

    # name x and y axis
    ax.set_xlabel('Date')
    ax.set_ylabel('Count')

    # set x and y axis
    ax.set_ylim(0)

    canvas = FigureCanvas(fig)
    response = HttpResponse(content_type='image/png')
    canvas.print_png(response)

    return response

This is mostly Matplotlib but what I have done is separated the data query and graph output into 2 parts. As you can see, another function called generate_data() is called from within generate_linegraph() which turns a list to the function. You can check out where I got most of the original Matplotlib code from here. I’ve also prepared a list of dates from date_start to date_end to fill in the gaps in the returned data. What happens is that if a particular day has zero sign ups, this will not be reflected in my SQL query. I could have inserted this logic in that layer but seems cleaner (and easier) to do it here. generate_dates() snippet can be found here.

generate_data() looks like this

def generate_data(func_name, datestart_string, dateend_string, usersegment=''):

    timepattern = '%Y-%m-%d'
    datestart_int = int(time.mktime(time.strptime(datestart_string, timepattern)))
    dateend_int = int(time.mktime(time.strptime(dateend_string, timepattern)))

    #SQL Queries
    activation_signupcount_graph = """
    SELECT IFNULL(COUNT(1),0), FROM_UNIXTIME(registered_at, %(timepattern)s)
    FROM users
    WHERE registered_at BETWEEN %(datestart_int)s AND %(dateend_int)s
    GROUP BY FROM_UNIXTIME(registered_at, %(timepattern)s);
    """
    args =  {
            "timepattern" : timepattern,
            "datestart_int" : datestart_int,
            "dateend_int" : dateend_int
            }

    query_map = {
                 'activation_signupcount_graph': activation_signupcount_graph,
                 }

    cursor = connection.cursor() 

    cursor.execute(query_map.get(func_name), args)

    data = list(cursor.fetchall())

    cursor.close()
    connection.close() 

    return data

I’m doing a few things here: I do some datetime formatting which is peculiar to my case as the dates are stored as unix time. The function cursor.execute(query_map.get(func_name),args) is the main point of this function. It traverses a dictionary called query_map (other objects omitted) via the func_name and executes the SQL I’ve written. Note that the order in which the data is being returned here is important. In this case, it is count followed by date. If you look back to generate_linegraph(), you can see the setting of the x and y axis based on the index value of the list being returned as such:

x = [k[1] for k in data]
y = [k[0] for k in data]

So the data is then returned to generate_linegraph() which in turn returns a response of the image. At this point you can use Matplotlib functions to make the graphs look prettier which I have not done. Upon which, an image response is return and is finally shown on the webpage itself.

That concludes my walkthrough on how to create a basic reporting app in Django.

Building a Simple Reporting App in Django: Part 1

“Let’s parse the data and free the facts” – Someone wiser than me

My previous posts on “How I am Learning Python from Scratch” is becoming increasingly inaccurate. Based on what I have been doing the last 3 months or so, a title such as “How I am Learning Web Development from Scratch” or “How I am Learning Django from Scratch” is probably more apt.

Anyways, I’ve been busy working on a reporting app for Insync the last few weeks. This is my de facto project for my internship. The app has gone through many iterations and I’ll do my best to show the before/after results. You can read about Part2 here.

Overall User Flow

  1. Click on the metric you want to view
  2. Enter date range and submit form
  3. Page returns an ugly time series graph

Under the Hood

  1. Laying out the project
    1. I’m using Dave McClure’s excellent AARRR metrics guideline to breakdown the application. Each one of the AARRR is its own app within the project. Previously, I had everything all in one app which was a big mistake! My views.py file was overwhelming. I wasn’t sure when was a good time to break out the app into many apps. Protip: If you’re already unsure or thinking about it, then it’s probably a good idea to do it. Each app should perform a different function and it doesn’t matter how big or how small this function is. In this case, each metric is addressing a different part of the business so it’s intuitive to break it out into its own app. My main urls file looks like this to give you a better idea:
      urlpatterns = patterns('',
          (r'^', include('reporting.mainapp.urls')),
          (r'^', include('reporting.activation.urls')),
          (r'^', include('reporting.acquisition.urls')),
          (r'^', include('reporting.retention.urls')),
          (r'^', include('reporting.referral.urls')),
          (r'^', include('reporting.revenue.urls')),
          (r'^', include('reporting.engagement.urls')),
          (r'^admin/doc/', include('django.contrib.admindocs.urls')),
          (r'^admin/', include(admin.site.urls)),
          (r'^media/(?P<path>.*)$', 'django.views.static.serve',
                       {'document_root': settings.MEDIA_ROOT}),
          (r'^css/(?P<path>.*)$', 'django.views.static.serve',
                       {'document_root': settings.CSS_ROOT}),
          (r'^js/(?P<path>.*)$', 'django.views.static.serve',
                       {'document_root': settings.JS_ROOT})
      )
    2. Each app has a base template that is inherited from a main base template. For example, activation_signups.html inherits from base_activation.html which in turn inherits from base.html.
    3. Each app also contains a utility.py file where all the helper functions are located. This is the file where I write my SQL queries to generate the data and format them accordingly to the kind of output that I desire like a linegraph or a piechart. The library I am using to generate the graphs is Matplotlib.
  2. Forms, gotta love them
    1. I went from this in my template:
      {% if usersegment == "non-paying" %}
      <input type="radio" name="usersegment" value="non-paying" checked="yes"/> Non-paying Users <br />
      <input type="radio" name="usersegment" value="paying" /> Paying Users <br />
      <input type="radio" name="usersegment" value="all" /> All Users<br />
      {% endif %}
      
      {% if usersegment == "paying" %}
      <input type="radio" name="usersegment" value="non-paying" /> Non-paying Users <br />
      <input type="radio" name="usersegment" value="paying" checked="yes"/> Paying Users <br />
      <input type="radio" name="usersegment" value="all" /> All Users<br />
      {% endif %}
      
      {% if usersegment == "all" or not usersegment %}
      <input type="radio" name="usersegment" value="non-paying" checked="yes"/> Non-paying Users <br />
      <input type="radio" name="usersegment" value="paying" /> Paying Users <br />
      <input type="radio" name="usersegment" value="all" checked="yes"/> All Users<br />
      {% endif %}

      to this:

      forms.py

      usersegment = [['non-paying','Non-paying (Coming soon)'],['paying','Paying (Coming soon)'], ['all', 'All']]
      
      class SegmentForm(forms.Form):
          usersegment = forms.ChoiceField(label="", widget=RadioSelect(renderer=HorizontalRadioRenderer), choices=usersegment, initial='all')

      template

      <form method="post" action="">{% csrf_token %}
      <h3>Choose Segment</h3>
      {% if segment_form.errors %}
      <p style="color: red;">
      Please correct the error{{ form.errors|pluralize }} below.</p>
      {% endif %}
      {{ segment_form.as_p }}
      <p><input type="submit" value="Submit" /</p>
      </form>

      views.py

      def activation_signupcount(request):
          if request.method == 'POST':
              segment_form = SegmentForm(request.POST)
          else:
              segment_form = SegmentForm()
          return render_to_response('activation_signupcount.html',
                                  {   'segment_form': segment_form })

      It’s so beautiful, Django forms take care of everything!

  3. Datepicker
    1. The first date picker I used was a basic version from jQuery UI. Later, I found a souped-up version that was later put together by the folks at Filament. You can find it here. Basically, the newer version has preset dates which really helps with the usability. The preset dates are also customisable.

In my next post, I will talk about how the graphs are generated using Matplotlib and how the data is retrieved to populate the graphs. Thanks to the guys are /r/django for helping me out too!

Code Snippet: Generate a List of Dates

Here’s a short function you can use to generate a list of dates with the input parameters as ‘datestart’ and ‘dateend’.

def generate_dates(datestart_string, dateend_string):

        dates = []
        timepattern = '%Y-%m-%d'        

        ds = datetime.datetime.strptime(datestart_string, timepattern)
        de = datetime.datetime.strptime(dateend_string, timepattern)

        delta = datetime.timedelta(days=1)
        while ds <= de:
                dates.append(ds)
                ds += delta

        date_string = [datetime.datetime.strftime(date,timepattern) for date in dates]

        return date_string

So, print generate_dates(‘2011-01-01’, ‘2011-02-01’) will give you a list of dates as follows:

['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08', '2011-01-09', '2011-01-10', '2011-01-11', '2011-01-12', '2011-01-13', '2011-01-14', '2011-01-15', '2011-01-16', '2011-01-17', '2011-01-18', '2011-01-19', '2011-01-20', '2011-01-21', '2011-01-22', '2011-01-23', '2011-01-24', '2011-01-25', '2011-01-26', '2011-01-27', '2011-01-28', '2011-01-29', '2011-01-30', '2011-01-31', '2011-02-01']
This function can also be written as a generator like so:
def generate_dates(datestart_string, dateend_string, timepattern="%Y-%m-%d"):
    ds = datetime.datetime.strptime(datestart_string, timepattern)
    de = datetime.datetime.strptime(dateend_string, timepattern)
    delta = datetime.timedelta(days=1)      

    while ds <= de:
            yield datetime.datetime.strftime(ds, timepattern)
            ds += delta

What is the difference you might ask? From what I understand, generators are more efficient or faster. In this case however, even if you were to set the start and end date to be 10 years apart, you will only have 3650 data points which is a small amount of data to be working with.