Financial institutions are stuck between a rock and a hard place. Legacy systems and processes make innovation painful, and user expectations have never been higher. Many technology leaders struggle to lead a digital transformation, with the added pressure of knowing that their firm’s future depends on their success.
The promises of business intelligence and advanced analytics are enticing, but many firms just aren’t ready to implement these sophisticated solutions. If your data exists in different aging systems that won’t talk to each other, don’t be discouraged. You can lead a transformation to take your company from asking “What happened?” to asking, “What will happen next?
How? Investing in data warehousing technology might be the solution. We’re working with a client to help them make that transition, and in this post, we’ll talk about the things we’ve learned in the process.
Benefits of Data Warehousing
There are many benefits of using data warehouses, most notably: the ability to do advanced analytics, enabling business intelligence, and creating a more compelling customer experience.
The concept of business intelligence (BI) isn’t new. In fact, the term was first introduced more than a century ago, and it made its way into the mainstream in 1989 when a Gartner analyst used the word to encompass the confusing names for data storage and analysis, like DSS and executive information system (EIS). This renewed focus on business intelligence led to many technological advances, including data warehouses.
Despite having a long and rich history, many financial services companies still struggle to realize the promises of business intelligence. Often this is because their data is housed in many disparate systems, some of which they own, like their CRM, and others they merely have read access to. How do you combine data from HR, marketing, and sales systems to create a complete picture of business health? You guessed it — data warehouses.
Predictive analytics is a branch of advanced analytics that uses data, statistical algorithms, and machine learning to identify the likelihood of future events based on current and historical data. Simply put, predictive analytics can help you go beyond merely understanding what happened in the past to predict what will happen next.
By uniting your data in a single place, you’re setting yourself up to be able to implement advanced analytics techniques like predictive analytics that will help you gain a competitive edge.
Creating Compelling Customer Experiences
Today’s customer expects to have intuitive and seamless interactions with your firm across channels. But your data about each customer lives in many different systems owned by various departments — finance, marketing, sales, customer success. By unifying all of this data in one place, you can create applications that are tailored to each customer. Layer in advanced analytics, and you can even predict customer churn rates based on changes in behavior — and work to retain clients that are thinking about leaving.
What is a Data Warehouse?
At its simplest, a data warehouse is a central repository of data, which very often originate from disparate sources. Businesses use data warehouses to analyze all of their data in one place and to make more informed decisions. Data can come into the data warehouse from various sources including relational databases, transactional systems, and other sources.
Data warehouses enable businesses to keep enormous amounts of data in one place since operational and transactional databases are typically not designed for historical reporting or complex queries across terabytes or petabytes of data.
How Do Data Warehouses Work?
Just like a relational database, data warehousing systems define a schema usually consisting of multiple tables which themselves contain columns of some data type (text, number, etc.) While data warehousing systems may look similar to relational databases on the surface, they differ drastically.
Relational database systems are optimized for row-based operations. The common query that selects one or more rows of data can be quite fast since rows are stored next to each other on disk. However, performing an aggregate query on a particular column for the entire table results in a scan of the entire table, which is slow. Data warehouses take the opposite approach, where their columns are stored next to each other on disk. The benefit of this design is that you can quickly perform analytical queries on a column since the necessary data can be loaded from disk very quickly, without the overhead of the unneeded parts of the table.
Foreign keys and constraints is another area where data warehouses differ drastically from their relational cousins. Unique constraints for say, a social security number provide data integrity in a relational system, ensuring no two SSNs can be added to the same table. However, there are performance costs associated with this enforcement as the engine must validate this unique check whenever new data is added. Data warehouses relax or completely remove constraints such as these. In a system which is managing petabytes of data across multiple servers, it is simply too costly to enforce or check constraints like this.
When it comes to accessing data, business analysts and data scientists can view and explore their data through business intelligence (BI) tools, SQL clients, or custom-built analytics applications. Today, there is a vast array of BI tools that are custom built for this job and integrate with many data warehousing systems in addition to other data sources.
Cloud vs. On-Premise
If you decide to invest in building a data warehouse, you have to determine if you want a cloud-based solution like AWS Redshift or an on-premise solution like those offered by Oracle, Teradata, and IBM. There’s no one-size-fits-all approach, and there are advantages and disadvantages to each.
The ability to easily scale is a major advantage of using a cloud-based data warehouse. Scaling an on-premise system is time-consuming and resource-intensive because it usually requires purchasing and installing new hardware. In comparison, cloud data warehouses can be scaled up as needed with minimal effort. Services like Redshift are designed to perform well with petabytes of data, meaning it’s very unlikely you’ll outgrow them.
Like many other SaaS solutions, cloud-based data warehousing can offer significant cost benefits by eliminating many upfront costs of on-premise solutions. Without needing to worry about hardware, server rooms, IT staffing or operational costs, cloud offerings can save you a lot of money. Additionally, the ability to start a hosted data warehouse on-demand and then tear it down at the end of the day is incredibly enabling, allowing technical staff to get their feet wet without the big upfront cost of physical hardware and software licenses.
A significant advantage of many cloud offerings is that the job of ensuring high availability is outsourced to your cloud vendor of choice. In the case of Redshift, Amazon takes care of uptime guarantees, meaning you can expect your cluster to be reliable and running. Of course, keeping data warehouse up and running is possible on your own if you’re willing to accept that cost of ownership. Managing your own systems can be quite a bit of work, often requiring dedicated DevOps teams. For this reason, managed services such as Redshift become even more attractive.
The security model for a cloud-based data warehouse isn’t any different than the general cloud approach to security. Just as high availability is outsourced to your cloud vendor, so too are operating system patches, security updates and the like. On the other side of the equation is access to your data warehousing system, which falls entirely on you as the owner. Creating strong security credentials and locking down network access to a data warehouse is your responsibility. On-premise systems require you to own all aspects of security, from the networking layer to OS patches to software upgrades.
The choice of building a cloud vs. on-premise very likely comes down to what you current platform looks like. For those already using cloud architectures, it makes complete sense to stay in the cloud. Hosted data warehousing services provide many advantages including development speed, reliability, securing and scalability.
If you are running your own on-premise servers, the answer is more complicated. It’s possible to follow this same model of managing your own physical server and software. However, it’s also possible to leverage a hosted data warehouse and connect to it from your on-prem systems. Of course, there are trade-offs with this design, but the lower cost of ownership and hardware costs may offset any downsides.
Avoid This Common Pitfall
We often hear prospective customers say, “We need to build a data warehouse.” And our first question is, “What problem are you trying to solve and how much data do you have?” In the rush to implement new technologies, these crucial questions often go unasked and unanswered. Data warehouses can be the answer if you have enough data that cannot realistically be queried from an RDBMS system. It’s important to remember not to put the solution before the problem. If you don’t have a specific pain point your business is looking to fix with a data warehousing solution, how will you know whether the project is a success when you’re finished?
Still not sure if a data warehouse is right for you? Download our free comparison chart that breaks down the differences between data warehouses and traditional databases.