Do Users Need that Feature? We Used Data to Find Out

By applying data analysis to the agile development process, we can validate (or invalidate) assumptions — and ultimately save our clients time and money.

Written by
Daniel Searles
November 20, 2017
Filed under:
Data Science

As a Senior Software Engineer and a Data Science Apprentice, I’m always looking for ways to improve our process by analyzing the data I have around me. Let’s walk through one example of how I applied data analysis to the agile development process.

Essential Simplicity

“Agile” is a loaded term: it means different things to different people. To anchor this post, we’ll look at how data analysis can help apply the 10th principle of the Agile Manifesto: “Simplicity--the art of maximizing the amount of work not done--is essential.”


Our example is a feature that was designed for the Empact Collaboration Platform. Before we examine the feature, let’s look at what Empact is. Here is how Empact describes it:

“Empact Collaboration Platform is a SAAS platform for communities small and large to communicate, collaborate, community-curate, and aggregate collective knowledge in order to determine best practices or identify solutions to problems.”

On the platform, users upload or create different types of content, which are then placed into topics. Here are a couple of screenshots of the platform with some annotations.

Empact Topic Page

[Screenshots: annotated views of the Empact topic page]

The Hypothesis

As a part of development on the platform, a new feature was designed to suggest users for you to follow based on a few algorithms.

As I was reviewing the algorithms, one of them jumped out at me. It was supposed to show users who had created posts in five topics that I had also posted in. Thinking back on what I had observed while watching people use the platform, I stated this hypothesis: “There are not enough users who post to five different topics for this feature to be viable.” Rather than treating my hypothesis as fact and declaring the feature invalid, I decided to test it against the data.
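To make the rule concrete, here is a minimal sketch of how such a suggestion could be computed. The function name `suggest_users`, the `posts_by_user` mapping, the "at least five" reading of the threshold, and the sample data are all assumptions for illustration; the post does not show Empact's actual implementation.

```python
from typing import Dict, List, Set


def suggest_users(
    me: str,
    posts_by_user: Dict[str, Set[str]],
    min_shared_topics: int = 5,
) -> List[str]:
    """Return users who posted in at least `min_shared_topics` topics that `me` also posted in."""
    my_topics = posts_by_user.get(me, set())
    suggestions = []
    for user, topics in posts_by_user.items():
        if user == me:
            continue
        # Set intersection gives the topics both users have posted in.
        if len(my_topics & topics) >= min_shared_topics:
            suggestions.append(user)
    return sorted(suggestions)


# Hypothetical toy data: user -> topics they have posted in.
posts_by_user = {
    "ana": {"t1", "t2", "t3", "t4", "t5", "t6"},
    "ben": {"t1", "t2", "t3", "t4", "t5"},
    "cy": {"t1", "t2"},
    "dee": set(),
}

print(suggest_users("ana", posts_by_user))
```

Note that a user who has posted in fewer than five topics can never receive or be a suggestion under this rule, which is exactly why the share of such users matters.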

Jovian Data Analysis

I started by exporting the production data and importing it into a local database. Then I fired up Jupyter, loaded pandas and sqlalchemy, and ran some SQL to pull the data into a few variables, explained below.

Load Data From SQL
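The original queries are not reproduced in this text, so the following is only a sketch of what the loading step might look like. The schema (a `users` table and an `activities` table with `user_id`, `topic_id`, and `kind` columns) and the rows are hypothetical, and the standard-library `sqlite3` module stands in for the pandas and sqlalchemy setup so the example runs without extra dependencies.

```python
import sqlite3

# In-memory SQLite stands in for the local copy of the production database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE activities (
        user_id  INTEGER NOT NULL REFERENCES users(id),
        topic_id INTEGER NOT NULL,
        kind     TEXT    NOT NULL  -- 'post', 'comment', or 'upvote'
    );
    INSERT INTO users (id) VALUES (1), (2), (3), (4);
    INSERT INTO activities (user_id, topic_id, kind) VALUES
        (1, 10, 'post'),
        (1, 11, 'comment'),
        (2, 10, 'upvote'),
        (3, 10, 'comment');
    """
)

# Load the data into a few variables for the counts that follow.
user_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
activities = conn.execute(
    "SELECT user_id, topic_id, kind FROM activities ORDER BY user_id, topic_id"
).fetchall()
conn.close()

print(len(user_ids), "users,", len(activities), "activity rows")
```

With pandas and sqlalchemy, the same idea would typically use `sqlalchemy.create_engine(...)` to connect and `pandas.read_sql_query(sql, engine)` to get a DataFrame per query.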

Activity Type Counts

Once we had those variables, we ran a few counts to see what percentage of users had created each content type.

[Figure: percentage of users by activity type]
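A count like this can be sketched in a few lines. The tuples below are made-up toy data, not Empact's numbers, and the variable names are assumptions; the point is only to show that each user is counted once per activity type, however many items they created.

```python
from collections import defaultdict

# Hypothetical toy data, not Empact's numbers.
all_user_ids = [1, 2, 3, 4, 5]
activities = [
    # (user_id, topic_id, kind)
    (1, 10, "post"),
    (1, 11, "post"),
    (1, 10, "comment"),
    (2, 10, "comment"),
    (3, 12, "upvote"),
    (4, 10, "upvote"),
]

# Collect the distinct users behind each activity type, so a user with
# several posts is still counted once.
users_by_kind = defaultdict(set)
for user_id, _topic_id, kind in activities:
    users_by_kind[kind].add(user_id)

# Percentage of all users (active or not) who created each type.
percent_by_kind = {
    kind: 100 * len(users) / len(all_user_ids)
    for kind, users in users_by_kind.items()
}

for kind, percent in sorted(percent_by_kind.items()):
    print(f"{kind}: {percent:.1f}% of users")
```

In pandas, a comparable result could come from grouping by activity type and counting unique user IDs, then dividing by the total number of users.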

Topic Counts

As you can see, a very small percentage (0.56%) of users created any posts at all, let alone in five separate topics. At this point, we can see that the algorithm as it stands provides little value for the vast majority of users.

Let’s modify the feature in question to make it applicable to more users. Instead of suggesting users who have posted in five shared topics, we’ll suggest users who have any activity (i.e., a post, comment, or upvote) in five shared topics. Now that we’ve (hopefully) broadened the algorithm to apply to the majority of users, let’s count how many topics each user has activity in.

[Figure: number of topics with activity, per user]
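Here is a minimal sketch of that topic count, again on made-up toy data rather than Empact's. The names and the choice of denominator (users with at least one activity) are assumptions; the post does not say which denominator its percentages use.

```python
from collections import Counter, defaultdict

# Hypothetical toy data, not Empact's numbers.
activities = [
    # (user_id, topic_id, kind)
    (1, 10, "post"),
    (1, 11, "comment"),
    (1, 12, "upvote"),
    (2, 10, "comment"),
    (2, 10, "upvote"),
    (3, 11, "upvote"),
    (4, 12, "comment"),
]

# Distinct topics each user has touched, regardless of activity type.
topics_by_user = defaultdict(set)
for user_id, topic_id, _kind in activities:
    topics_by_user[user_id].add(topic_id)

# Distribution: number of topics -> number of users with that many topics.
users_per_topic_count = Counter(len(topics) for topics in topics_by_user.values())

active_users = len(topics_by_user)
for n_topics, n_users in sorted(users_per_topic_count.items()):
    share = 100 * n_users / active_users
    print(f"{n_topics} topic(s): {n_users} users ({share:.1f}%)")

MIN_TOPICS = 5  # threshold from the proposed feature
eligible = sum(1 for topics in topics_by_user.values() if len(topics) >= MIN_TOPICS)
print(f"users with activity in {MIN_TOPICS}+ topics: {eligible}")
```

Using a set per user means two activities in the same topic count as one topic, which is what the feature's threshold cares about.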

Let the Data Drive

As you can see in the topic counts, the vast majority (over 92%) of users have activity in only one topic, so fewer than 8% could possibly reach five. At this point, it’s clear that even the broadened algorithm would apply to very few users.

Through a quick-and-dirty analysis, we were able to save Empact time and money, and we saved ourselves time, too, by choosing not to build a feature that almost no one would get to use.

A final note: as data lovers know, finding a single answer is never the end of the road; it inevitably leads to more questions. So it’s important to timebox an activity like this, or else the time and money saved will end up being a wash.