Data Science vs. Data Mining vs. Machine Learning
"A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician." — Josh Wills of Slack
Data science is a field of study that encompasses everything we’ve been talking about so far, including data mining, machine learning, deep learning, statistics, and much more. Data science is concerned with extracting knowledge from data as a whole, while data mining focuses more narrowly on the process of discovering new patterns in big data sets. And as we’ve already established, deep learning is a type of machine learning.
But perhaps more than any of the other terms we’ve discussed, “data science” has proven difficult to define, which may explain why so many definitions are floating around. We especially like the one Cassie Kozyrkov, chief decision intelligence engineer at Google, came up with: “Data science is the discipline of making data useful,” she wrote in a Hacker Noon blog post. (Of course, she goes into much more detail, but that tweetable phrase captures the essence of her post.)
Over at CIO, Thor Olavsrud came up with a somewhat similar, albeit longer, definition:
Data science is a method for gleaning insights from structured and unstructured data using approaches ranging from statistical analysis to machine learning. For most organizations, data science is employed to transform data into value that might come in the form of improved revenue, reduced costs, business agility, improved customer experience, the development of new products, and the like.
One thing is for sure: It’s hot. A 2012 Harvard Business Review article called data scientist “the sexiest job of the 21st century.” Then, in 2018, Glassdoor named it the best job in America — just as it did in 2016 and 2017. And, based on the work of Microsoft’s Jim Gray, data science has been referred to as the fourth paradigm of science. The other three are empirical observation, theoretical approaches, and computational science.
A Team Sport
We should add one more thing about data science: It’s a team sport. If you’ve worked with us before or follow our blog, you know we fully embrace a DevOps approach in everything we do, so it’s no surprise that we see data science this way. But this isn’t just our opinion:
"The biggest value a data science team can have is when they are embedded with business teams. Almost by definition, a novelty-seeking person, someone who really innovates, is going to find value or leakage of value that is not what people otherwise expected," Ted Dunning, chief application architect at MapR Technologies, told CIO. He recommends embedding data scientists in DevOps teams. So do we.
Much More to Learn
We’ve barely touched on the basics of these issues. If you want to learn more, the web is full of resources. And if you are looking for a deeper dive, consider a machine learning and data mining book. Unsurprisingly, there are plenty to choose from. For instance, TopTalkedBooks provides a list based on recommendations from Hacker News, Reddit, and Stack Overflow. If money is no object, Springer’s Encyclopedia of Machine Learning and Data Mining is available for $749.
And, of course, our data science team is always available to help. If you’re ready to mine your data sets for insights that can transform your company, consider working with us. We’re applying data science to software product development. It’s a new frontier in an industry where new frontiers are rare.