Jovian Data Analysis
I started by exporting the production data and importing it into a local database. Then I fired up Jupyter, loaded pandas and sqlalchemy, and ran some SQL to pull the data into a few variables.
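A minimal sketch of that loading step might look like the following. The original queries are not shown here, so everything in it is an assumption for illustration: the in-memory SQLite database stands in for the local copy of production, and the `users` and `posts` tables and their columns are hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Assumption: an in-memory SQLite database stands in for the local copy
# of the production data. Table and column names are hypothetical.
engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:
    # Seed a tiny toy dataset so the sketch runs on its own.
    conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY)"))
    conn.execute(text(
        "CREATE TABLE posts ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, topic_id INTEGER)"
    ))
    conn.execute(
        text("INSERT INTO users (id) VALUES (:id)"),
        [{"id": 1}, {"id": 2}, {"id": 3}],
    )
    conn.execute(
        text("INSERT INTO posts (user_id, topic_id) VALUES (:user_id, :topic_id)"),
        [
            {"user_id": 1, "topic_id": 10},
            {"user_id": 1, "topic_id": 11},
            {"user_id": 2, "topic_id": 10},
        ],
    )

    # Run SQL and load each result set into a DataFrame variable.
    users = pd.read_sql(text("SELECT id AS user_id FROM users"), conn)
    posts = pd.read_sql(text("SELECT user_id, topic_id FROM posts"), conn)

print(len(users), "users;", len(posts), "posts")
```

Against a real local database, only the connection URL and the queries would change; the DataFrames can then be reused across notebook cells.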
Activity Type Counts
Once we had those variables, we ran a few counts to see the percentages of users who have created different content types.
The counts show that only a very small share of users (0.56%) created any posts at all, let alone posts in five separate topics. In other words, the algorithm as it stands provides little value for the majority of users.
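The per-type counts described above could be computed along these lines. This is a sketch on made-up toy data, not the production figures: the `activity` DataFrame, its `activity_type` column, and the four users are all hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one row per user, and one row per activity event.
users = pd.DataFrame({"user_id": [1, 2, 3, 4]})
activity = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "activity_type": ["post", "comment", "upvote", "comment", "upvote"],
})

total_users = len(users)

# Distinct users who produced each content type at least once.
users_per_type = activity.groupby("activity_type")["user_id"].nunique()

# Express each count as a percentage of the whole user base,
# including users with no activity at all.
pct_per_type = (users_per_type / total_users * 100).round(2)

print(pct_per_type)
```

Dividing by the full user count rather than by the number of active users matters here: the question is how much of the whole user base a feature can reach.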
Let’s modify the feature in question to make it applicable for most users. Instead of suggesting users who have posted in five similar topics, we’ll suggest users who have activity (i.e. post, comment, or upvote) in five similar topics. Now that we’ve (hopefully) updated the algorithm to apply to the majority of users, let’s count how many topics each user has activity within.
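Counting topics per user under the broader definition could be sketched as below. The data is a toy example, and the `activity` DataFrame with `user_id`, `topic_id`, and `activity_type` columns is an assumed shape rather than the real schema.

```python
import pandas as pd

# Hypothetical toy data: every post, comment, or upvote is one row,
# tagged with the topic it happened in.
activity = pd.DataFrame({
    "user_id":       [1, 1, 1, 2, 2, 3],
    "topic_id":      [10, 10, 11, 10, 12, 13],
    "activity_type": ["post", "comment", "upvote", "comment", "upvote", "upvote"],
})

# Number of distinct topics each user has any kind of activity in.
# Repeat activity in the same topic is counted once.
topics_per_user = (
    activity.groupby("user_id")["topic_id"]
    .nunique()
    .rename("topic_count")
)

print(topics_per_user)
```

Using `nunique` rather than a plain row count is the important choice: a user who comments ten times in one topic still has activity in only one topic.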
Let the Data Drive
The topic counts show that the vast majority of users (over 92%) have activity in only one topic. Even with the broader definition of activity, the algorithm would apply to very few users.
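To turn per-user topic counts into the kind of summary quoted above, one option is to look at the distribution of counts and at the share of users who reach the five-topic threshold. The sketch below uses invented numbers, so its percentages are not the real results; `topics_per_user` and `MIN_TOPICS` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data: distinct-topic count for each of five users.
topics_per_user = pd.Series(
    [1, 1, 1, 2, 5],
    index=pd.Index([1, 2, 3, 4, 5], name="user_id"),
    name="topic_count",
)

# The feature only applies to users active in at least this many topics.
MIN_TOPICS = 5

# Percentage of users at each topic count, ordered by count.
distribution_pct = (
    topics_per_user.value_counts(normalize=True).sort_index() * 100
).round(2)

# Percentage of users the feature could actually serve.
eligible_pct = round(float((topics_per_user >= MIN_TOPICS).mean()) * 100, 2)

print(distribution_pct)
print("eligible:", eligible_pct, "%")
```

If the eligible share comes out tiny, as it did in the analysis described here, that is a strong signal to stop before building anything.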
Through quick and dirty analysis, we were able to save Empact time and money — and we saved ourselves time, too, by choosing not to build a feature that no one would get to use.
A final note: As data lovers know, finding a single answer is never the end of the road; it inevitably leads to more questions. So it's important to timebox an activity like this, or else the time and money saved will end up being a wash.