Part 1: Stop building on sand, get ahead
Vulnerability of data and use cases
Data has become the foundation for most businesses, introducing entirely new risks into established business models.
Data is liquid and inherently unstable
Data can be an extremely powerful tool, but data itself is extremely vulnerable. It can easily be altered, broken, or deleted, intentionally or unintentionally. Because data is not very tangible, water analogies are often used: pipelines, streams, lakes, swamps, estuaries, rivers. Sticking with this metaphor helps visualize both the lack of solidness and the erosive nature of data if it is not contained or actively managed and simply forces its way.
Data erosion
Without a proper foundation in place, everything built on top will over time start to crack and erode. Unfortunately, this is the reality for most companies today, and it forces them to spend significantly more money and other resources on data than they would normally have to.
Data creation
Medium-sized to large companies all face very similar issues. Their IT landscape usually comprises multiple systems, meaning that many different systems emit customer journey event data. Additionally, individual IT systems are often the responsibility of different teams and departments, introducing coordination and leadership challenges at the data creation stage.
Data consumption
Additional challenges arise when data is consumed. More often than not, consuming teams take data directly from different sources and assemble it on their own, often differently than other teams using the same data. When working with raw data and integrating different sources, the data almost always requires some degree of modeling and interpretation, and the result is almost always geared and skewed towards the respective team's primary use case. This model of data management looks fast because everyone is busy, but much of that apparent productivity goes into managing chaos and is therefore mostly not productive.
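To make this concrete, here is a minimal, hypothetical sketch (the events, timestamps, and session timeouts are illustrative assumptions, not taken from any specific system): two teams derive "sessions" from the same raw events but use different inactivity timeouts, and end up reporting different numbers.

```python
from datetime import datetime, timedelta

# Hypothetical raw customer journey events for one user (illustrative only).
raw_events = [
    {"user_id": "u1", "event": "page_view", "ts": datetime(2024, 5, 1, 9, 0)},
    {"user_id": "u1", "event": "click",     "ts": datetime(2024, 5, 1, 9, 20)},
    {"user_id": "u1", "event": "page_view", "ts": datetime(2024, 5, 1, 10, 5)},
]

def count_sessions(events, inactivity_timeout: timedelta) -> int:
    """Count sessions by starting a new one after a period of inactivity."""
    sessions, last_ts = 0, None
    for event in sorted(events, key=lambda e: e["ts"]):
        if last_ts is None or event["ts"] - last_ts > inactivity_timeout:
            sessions += 1
        last_ts = event["ts"]
    return sessions

# Team A (marketing) models sessions with a 30-minute timeout,
# Team B (product) with a 15-minute timeout -- same raw data, different results.
print(count_sessions(raw_events, timedelta(minutes=30)))  # -> 2
print(count_sessions(raw_events, timedelta(minutes=15)))  # -> 3
```

Both teams are "right" within their own model, but their reports no longer agree, and reconciling the numbers later costs far more than agreeing on one definition up front.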
Use cases
For businesses, it's increasingly important to execute and adapt quickly to ever-changing environments and use cases. Most data initiatives face significant delays because data has to be scraped together before it can be put into action. With an increasing number of data use cases, companies have to repeat this exercise more and more often, draining scarce resources and forcing quality-decreasing shortcuts. It's a flywheel of increasing data mediocrity, while the opposite is required to prevail and thrive in today's and future markets.
Not achieving business goals
To understand the implications more easily, it may help to compare data to buildings. At the bottom, you always have the underlying source data, upon which you implement use cases.
Disappointing business results
Many data initiatives don't deliver the expected results, but not because the use case wasn't valid, not because the team didn't know what they were doing, and not because of bad luck. Just as a physical building with an unstable foundation will face issues and could ultimately collapse, data initiatives built on an unstable data foundation face very similar consequences. This is especially true for AI use cases, which require better and fresher data than ever before.
Snowballing downstream costs
You have probably heard of "garbage in, garbage out" (or GIGO), which further contributes to the problem: the further downstream you move, the less likely it is that you can actually fix your data. Most companies keep adding downstream engineers instead of fixing the problem at the source. They also need more people in other roles, e.g. data scientists. If data scientists spend 50% or more of their time working with low-quality data, twice as many need to be hired to achieve the same results. The same is true for digital marketing and product teams. For medium-sized to large companies, this can easily cause damages in the millions of dollars per year, not to mention the general frustration low-quality data causes for everyone whose job doesn't solely exist because of it. Collecting high-quality data costs more than collecting low-quality data, but dealing with low-quality data incurs even more costs downstream.
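As a rough back-of-the-envelope illustration of the doubling effect described above (all figures are assumptions for the sake of the example, not benchmarks):

```python
# Back-of-the-envelope illustration of snowballing downstream costs.
# All figures are assumptions, not benchmarks.
data_scientists = 10
annual_cost_per_head = 150_000   # assumed fully loaded annual cost in USD
time_lost_to_bad_data = 0.5      # assumed share of time spent fighting low-quality data

effective_headcount = data_scientists * (1 - time_lost_to_bad_data)
cost_of_lost_time = data_scientists * annual_cost_per_head * time_lost_to_bad_data

print(f"Effective data scientists: {effective_headcount:.1f} of {data_scientists}")
print(f"Annual cost of lost time:  ${cost_of_lost_time:,.0f}")  # $750,000

# To get the output of 10 fully productive data scientists, roughly 20 would
# need to be hired -- before counting the same effect in marketing and product.
```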
The age of AI
Additionally, with the ongoing rise of AI, the quality of the data foundation is more important than ever: unlike its human counterparts, AI lacks much of the context that humans pick up outside the data, for example from talking to their co-workers. AI requires data that is both fresher and more robust than ever before.
What is customer journey data?
Our main focus has always been customer/user journey data, also known as analytics data, event data, logging data, or behavioral data. Unlike transactional data, it captures a person's interactions with all kinds of digital and, to a certain extent, non-digital touchpoints along their journey, even before they become a customer.
Sources of customer journey data
- Website traffic
- Mobile app usage
- Out-of-home
- Email views and clicks
- Chatbot interactions
- etc.
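For illustration, a single customer journey event from one of these sources might look roughly like the following; the field names and values are assumptions for the sake of the example, as real schemas vary by tool and company.

```python
# A hypothetical customer journey event; field names and values are illustrative.
checkout_event = {
    "event_name": "checkout_started",
    "event_time": "2024-05-01T09:20:31Z",
    "user": {
        "anonymous_id": "a1b2c3",   # device- or cookie-level identifier
        "customer_id": None,        # not yet a customer at this point
    },
    "context": {
        "source": "web",            # e.g. web, mobile_app, email, chatbot
        "page_url": "https://example.com/cart",
        "campaign": "spring_sale",
    },
    "properties": {
        "cart_value": 84.90,
        "currency": "EUR",
        "items": 3,
    },
}
```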
Common issues
The main challenge with customer journey data is that it is often scattered across many different applications, with differences in measurement, partial overlaps, gaps, redundancies, and usually no single source of truth.
A decade of helping enterprise clients
From working with medium-sized to large businesses on both sides of the Atlantic for more than a decade, we have learned that most businesses face very similar, almost universal data challenges when it comes to the implementation of customer-centric use cases.
Enterprise customers in particular operate many different systems across separate departments, with IT often organizationally separate from the business, which compounds these challenges.
Main ideas behind the Data Cape
Even though the power of data lies in its liquidity, and most data components are rather liquid themselves, a solid and stable component is needed. Unlike a data warehouse, however, this component must be situated at the source to create a solid foundation and prevent issues before they spread downstream.
There are a lot of tools and ideas out there that address individual root causes of data issues. However, to achieve a solid foundation of dependable data consistently over an extended period of time, all root causes of low-quality data need to be addressed in three ways (see the sketch after this list):
- centrally: each measure should be implemented only once
- at the source: it should be done once for all downstream consumers
- at once: it should include all measures, not just one or some
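As a minimal sketch of what "centrally, at the source, and at once" can mean in practice (the schema, field names, and checks below are assumptions, not a description of any particular product), a centrally maintained event definition can be enforced before an event ever reaches a downstream consumer:

```python
# Hypothetical central event schema, enforced at the source before emission.
REQUIRED_FIELDS = {"event_name", "event_time", "user", "context"}
ALLOWED_EVENTS = {"page_view", "checkout_started", "order_completed"}

def validate_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event may be emitted."""
    problems = []
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if event.get("event_name") not in ALLOWED_EVENTS:
        problems.append(f"unknown event name: {event.get('event_name')!r}")
    return problems

def emit(event: dict) -> None:
    problems = validate_event(event)
    if problems:
        # Reject (or quarantine) bad events at the source instead of letting
        # them erode every downstream use case.
        raise ValueError(f"event rejected: {problems}")
    print("event accepted:", event["event_name"])

emit({"event_name": "page_view", "event_time": "2024-05-01T09:00:00Z",
      "user": {"anonymous_id": "a1b2c3"}, "context": {"source": "web"}})
```

Because the definition lives in one place (centrally), is checked before the data leaves the source, and covers all required checks together (at once), every downstream consumer receives the same dependable events.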
The term Data Cape was chosen because of a cape's natural ability to withstand erosive forces and its hybrid location between the land and the water, or, speaking in data terms, between data warehouses, data lakes, and data lakehouses.
Cape.ly and Data as a Service
Cape.ly offers Data as a Service, which combines tracking technology with continuous implementation services. We believe that most teams should not spend time implementing a Data Cape themselves. Instead, they should use a service like Cape.ly and focus on the business value (read: the fun part) of the process. However, we are sharing our learnings and the concepts behind our product below. We intend to expand this document continuously, so make sure to come back from time to time.
Part 2: Your competitors face the same issues
Coming soon.
Non-reusable data
Inaccurate / incomplete data
Inconsistent data
Breaking data
Non-compliant data
Budget / resource limitations
Slow execution
Part 3: Reliable customer journey data is hard
Coming soon.
Views, clicks, scrolls, swipes, etc.
Soft and hard conversions
Campaign and attribution data
Email, text, QR code interactions
Cross-device tracking
UI, navigation, form interactions
Cancellations, refunds, renewals
Part 4: One reliable data source for everything
Coming soon.