Falling in Love: CTOs and the Composable CDP
Jon Mendez on the craze that started it all: the Composable CDP
Introduction
The data and martech scenes have at least one thing in common: we come up with a term and then throw it around as if everyone knows what it means. This happened last year with the term "composable CDP" - it became a shorthand for the accelerating convergence of data lake/warehousing and marketing workloads, without anybody locking down precisely what it meant.
When I started this Substack, I knew we needed to address this topic early on and get a fresh, external perspective on it. I have known Jon Mendez for years - he is a stalwart of the media and adtech scene in New York and has been homebrewing CDPs for major brands for years. Most recently he has founded Neuralift AI, the first company doing deep learning on customer data to improve conversion rates and lift marketing ROI.
There's no one better suited to bring clarity to this emerging trend.
Over to Jon!
Composability is on my mind
Jon writes: The word composable, of course, comes from compose, which means to arrange or form. As data applications continue to expand in the cloud, composable system architecture is being used to solve more and more use cases.
Many bright minds see DeFi (decentralized finance) and blockchain as spaces ahead of the data architecture curve. Here, interoperable and composable multi-chains (or interchains) are seen as the future of data architecture. In these spaces, composability allows independent chains to connect and transact while retaining their autonomy.
Data "autonomy" is the key word here - made possible by composability.
I became interested in CDPs (Customer Data Platforms) in 2017. My company, Yieldbot, was a performance advertising network. Using "tags-on-page," we created a session ID and collected behavioral data for each visit to a publisher's website. We used first-party data and machine learning (ML) to match website visitors with ads in real time.
It was clear to me that the value of all our data would increase exponentially if we could join our first-party behavioral data with that of our publishers and advertisers. Performance would improve for everyone. When I first heard that CDPs were being built for this purpose, I was obsessed.
Earning my CDP spurs
In 2018, I was hired to help develop in-house CDPs (as we used to call them before “composable”) for two large multi-brand, omnichannel retailers. They had the same problem that all e-commerce sites have with a growing number of SaaS martech tools: there were too many pixels and "tags-on-page," there was no data management, and the JavaScript was slowing down their sites and making them vulnerable. To make matters worse, each tool had its own database, schemas, and user IDs. For the first wave of CDPs, the main use case of unified identity or a "360 view of the customer" was an obvious home run.
Even the C-suite understood that a "single source of truth" about the customer was the foundation for advanced machine learning, prediction, triggered events, and data modeling. It was also clear that the CDP could not only improve marketing, but that data about customer behavior was economic data. It could support forecasting for finance, supply chain, merchandising and more. I installed Snowplow at both brands to create the data sets that could become this source of truth.
At the time, very few companies were capable of managing and executing the development of their own CDP, even with external help. It was only logical that an early wave of CDP vendors emerged. ActionIQ, Amperity, Lytics, mParticle, Segment, Simon Data and many others established or reinvented themselves as pure-play CDPs. Incumbent marketing and data companies such as Adobe, Tealium, Merkle, Salesforce, Oracle and others retrofit their offerings and changed their market positioning to reflect a CDP product. The CDP boom had begun.
The space between CMO and CTO is the middle of nowhere
Back in the executive suite, the buy/build decision about CDPs and the responsibility for implementation lay somewhere between the CMO and CTO. Unfortunately, in most organizations this space is in the middle of nowhere. I wrote about this very problem at the time. Since the key early use cases were marketing-centric, the CMO had a large voice in the CDP vendor selection process. But selling to the CMO also undermined the value of a CDP (and first-party behavioral data) to other constituents like the CFO and CIO.
CTOs ultimately needed to sponsor or co-sponsor the CDP since they would set it up. However, it was acceptable for them to remain focused on the larger task of migrating company databases to cloud infrastructure solutions. Snowflake (valued at 'only' $4 billion at the end of 2018), Redshift, BigQuery and Databricks (valued at 'only' $2.5 billion at the end of 2018) all began gaining traction to house and manage enterprise data. These installations became known as data warehouses or data lakes and evolved into an indispensable SaaS cloud infrastructure that continues growing to this day.
Like other stores of company data, the CTO and CIO wanted the CDP to be synchronized with the Master Data Management (MDM) warehouse or lake. The objective of these data 'gravity wells' was to create a master database that served as the System of Record (SOR). Syncing a CDP into the lake as an endpoint is not technically insurmountable. However, a valid question emerged: if data lakes are being set up to ingest data downstream from all sorts of services, why is customer behavioral data not being piped and streamed directly into the data lake? They had a point.
Once in the data warehouse or data lake, customer behavioral data can be sent, keyed and used for all types of application development, including and especially ML and predictive/intelligent systems. All the necessary tools for collection, organization, integration, governance and activation are available. This architecture provides flexibility, extensibility and, most importantly, allows for an iterative approach to customer data platforming with true data ownership. It removes vendor lock-in, prevents a dual system of record, reduces costs and streamlines the data stack. With all these features, the CTO was buying into the idea of CDP as well.
Today, as the "martech stack" is recognized for what it is - an increasingly important part of the infrastructure for future business - data ownership is becoming more valuable and integration easier. Not surprisingly, this approach to customer data is growing in popularity. It has taken - for now - the name Composable CDP.
Enter the composable CDP
In the age of data privacy and security, composability is synonymous with data sovereignty and autonomy. It allows you and your teams to own and manage the data the way you want. It starts with data creation and extends throughout your data stack to modeling, activation and analytics. It will not only make your engineers happy, it will make your marketers happy as well. Most importantly, it will make your customers happy with more relevant experiences.
In our post-GDPR, privacy-focused future, ownership of data pipelines and APIs is becoming a core competency. Behaviors that used to be captured client side must now be captured on the server side. Enrichments that used to be done with cookie syncs now need to be done in clean rooms. A transparent tracking and sandboxing future awaits us all. New use cases around customer behavioral data are emerging all around us, and no two businesses have the same customers, the same data, or the same results.
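To make the shift concrete, here is a minimal sketch of server-side event capture in Python: the client sends only a minimal action, and enrichment happens on the server, under the company's own control, instead of in third-party JavaScript. All names here (`capture_event`, `EVENT_LOG`) are illustrative, not any particular vendor's API.

```python
import json
import time

EVENT_LOG = []  # stand-in for a stream/queue feeding the warehouse


def capture_event(raw_body, server_context):
    """Parse a minimal client payload and enrich it server side."""
    event = json.loads(raw_body)
    # Enrichment happens on the server instead of in client-side tags:
    event["received_at"] = server_context.get("now", time.time())
    event["ip_country"] = server_context.get("ip_country", "unknown")
    EVENT_LOG.append(event)
    return event


# The client only needs to send the action itself:
enriched = capture_event(
    '{"event": "add_to_cart", "user_id": "u1"}',
    {"now": 1700000000, "ip_country": "US"},
)
```

The design point is that the collection endpoint, the enrichment logic, and the resulting log all live inside your own infrastructure, which is what makes the downstream pipeline ownable.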
Every company is its own orchestra of data with its unique customer attributes and behaviors. As machine learning and decision intelligence permeate business operations and determine the winners and losers, and as businesses rely on customers to survive and thrive, first-party behavioral data is every company's single most important strategic asset. It is the foundation for building customer behavioral profiles that are far richer and more predictive than the single-row 'customer golden records' created with previous generation MDM technologies.
Snowplow has supported customer behavioral profiles for years through a wide range of tracking SDKs and a behavioral data engine. All of this is battle-tested, open-sourced and fully mature technology. What was missing, however, were the downstream integrations from the data 'gravity wells' such as Snowflake and Databricks to the martech and adtech technologies and tools that marketers and growth executives depend on.
Bright minds in the industry have recognized this opportunity, and a new wave of Reverse ETL startups like Hightouch and Census have emerged to fill the void. As they also pipe into and out of cloud databases and work well with customer behavioral profiles created by Snowplow, these other tools make up the rest of what is being referred to as the composable CDP.
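As a rough illustration of the reverse ETL pattern these tools implement, the sketch below reads a modeled audience from a warehouse table and pushes it to a downstream marketing tool. Both `query_warehouse` and `push_to_marketing_tool` are hypothetical stand-ins for a real warehouse driver and a vendor API client, not Hightouch or Census APIs.

```python
def query_warehouse(sql):
    # Stand-in for e.g. a Snowflake/BigQuery client; returns rows as dicts.
    return [
        {"user_id": "u1", "email": "a@example.com", "predicted_ltv": 412.0},
        {"user_id": "u2", "email": "b@example.com", "predicted_ltv": 87.5},
    ]


def push_to_marketing_tool(audience_name, members):
    # Stand-in for a vendor API call (email platform, ad network, etc.).
    print(f"Syncing {len(members)} users to audience '{audience_name}'")


def sync_high_value_audience(min_ltv=100.0):
    """Read a modeled profile table and sync a filtered segment downstream."""
    rows = query_warehouse(
        "SELECT user_id, email, predicted_ltv FROM analytics.customer_profiles"
    )
    members = [r for r in rows if r["predicted_ltv"] >= min_ltv]
    push_to_marketing_tool("high_value_customers", members)
    return members


if __name__ == "__main__":
    sync_high_value_audience()
```

The point of the pattern is that the segmentation logic and the modeled data stay in the warehouse; the reverse ETL layer only moves the result to the tools marketers already use.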
Alongside this trend, we are also seeing the benefits of composability in a growing wave of "connected apps" built directly on top of Snowflake, BigQuery, Databricks and Redshift. These apps reside in your cloud and can do everything from simply building predictive models (Continual) to creating and engaging segments of customers (MessageGears) to product analytics (NetSpring, Kubit).
One easy way to think about composable CDP is this chart:
As your use cases grow and evolve, the technology needed can be added incrementally. Considering that change management is a key component of vendor CDP implementation at the F1000, this iterative and incremental approach makes sense for many types of organizations. Especially for companies that want speed-to-value. Another key benefit of an iterative approach is that you can gain acceptance throughout the organization of the value of composable CDP (or a CDP in general) through Proof-of-Concept (PoC).
This PoC work generally means solving a single use case in a short time frame - anywhere from a few weeks to a few months. This helps build credibility for the time, resources, and budget needed to further enhance the value of a composable CDP.
Speed is important for businesses. With most CDP vendors, the implementation process takes around six months (although I have heard it takes as little as 3 months for some and 9-12 months for others) to be fully in place. There is usually an implementation fee for this work, and once implemented, little technical expertise with the system exists internally. We are probably all familiar with contacting a vendor with a technical issue only to receive a link to their documentation in return.
Engineering teams are reluctant to add vendor systems and databases to their stack. They want control over the data and infrastructure, and for good reason. If something goes wrong, especially if it's mission-critical, they do not want to wait for vendor support to fix the issue. IT and Engineering buy-in are critical to the success of any CDP. The internal political benefits of a composable CDP for CMOs rather than vendor lock-in can be significant.
One of the greatest advances in behavioral data analytics has been the flexibility to compose custom event schemas for data creation and adjust these schemas over time. It cannot be overstated how important and valuable this is.
Just like customer behavior, custom schemas are never static. For example, if additional context is needed for a triggered event - such as attaching a new keyword search for a custom audience or an increased cart value for a triggered email - those iterations of the schema can move at the speed of your customers. All without losing your old data.
Marketing and consumer behavior are cyclical and iterative. Source data needs to be generative as well. This is table stakes in a modern enterprise focused on customer data. Importantly, custom schemas can be improved over time as you get smarter, based on results, and as your business needs and use cases change. In other words, custom events and schemas are extensible.
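To illustrate what additive schema evolution looks like in practice, here is a toy example in Python. The schema format below is deliberately simplified and is not Snowplow's actual Iglu/JSON Schema format; the point is only that a new optional field - say, cart value for a triggered email - can be added in a new schema version without invalidating the events you already collected.

```python
# Illustrative versioned schemas for a custom behavioral event.
SCHEMAS = {
    # v1-0-0: the fields we knew we needed on day one
    "email_triggered/1-0-0": {"required": ["user_id", "campaign_id"]},
    # v1-0-1: additive change - capture cart value and search keyword for
    # richer targeting; old events remain valid because the fields are optional
    "email_triggered/1-0-1": {
        "required": ["user_id", "campaign_id"],
        "optional": ["cart_value", "search_keyword"],
    },
}


def validate(event, schema_key):
    """Check required fields are present and no unknown fields appear."""
    schema = SCHEMAS[schema_key]
    allowed = set(schema["required"]) | set(schema.get("optional", []))
    has_required = all(k in event for k in schema["required"])
    no_unknown = set(event) <= allowed
    return has_required and no_unknown


old_event = {"user_id": "u1", "campaign_id": "c9"}
new_event = {"user_id": "u1", "campaign_id": "c9", "cart_value": 129.99}

# Old events still validate against the new schema version:
assert validate(old_event, "email_triggered/1-0-1")
assert validate(new_event, "email_triggered/1-0-1")
```

Because the change is purely additive, historical data and new data coexist in the same event stream - which is what "all without losing your old data" means in practice.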
My experience building CDPs on Snowplow
A huge benefit to composability on the data collection side is identity stitching. I mentioned earlier that this was originally the main use case for CDP vendors. But what about offline data? When I was building CDPs, I chose Snowplow in large part because of the omni-channel nature of its identity resolution. With a rich set of trackers available and a growing number of webhooks, there is full 360-degree identity resolution capability across online and offline consumer touchpoints. Simply put, this makes the customer behavioral profiles much smarter.
With this composability, if an omni-channel retailer wants to see how a mobile ad view led to a website visit, which in turn led to an app download, which in turn led to a physical purchase, they can capture that in a single behavioral profile. The same is true for a holding company with multiple brands, which could stitch first-party user IDs across its different domains to understand cross-brand customer behaviors. Instead of using out-of-the-box (black box) attribution, retailers can roll their own attribution models that reflect the unique ways their customer journeys are influenced.
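A simplified sketch of how this kind of stitching can work: whenever two events share a common identifier (a cookie, an email from a login, a loyalty card number), their identifiers are merged into one profile. A small union-find structure is one standard way to group identifiers into connected components; the event data below is illustrative.

```python
def stitch(events):
    """Group identifiers that co-occur on events into unified profiles."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Identifiers appearing on the same event belong to the same person.
    for event in events:
        ids = event["identifiers"]
        for other in ids[1:]:
            union(ids[0], other)

    profiles = {}
    for x in parent:
        profiles.setdefault(find(x), set()).add(x)
    return list(profiles.values())


# Mobile ad view, web visit with login, and in-store purchase,
# linked through overlapping identifiers:
events = [
    {"identifiers": ["device:abc", "cookie:123"]},           # mobile ad view
    {"identifiers": ["cookie:123", "email:a@example.com"]},  # web visit + login
    {"identifiers": ["email:a@example.com", "loyalty:77"]},  # in-store purchase
]
```

Run on these three events, the stitcher collapses all four identifiers into a single behavioral profile spanning mobile, web, and in-store touchpoints.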
We also know there are second and third order effects of customer behavior. But we often do not know what those are, or only have hypotheses. As your custom events aggregate, they should start to reflect the behavior of your customers and the downstream use cases you want to solve. This cannot be done without an incremental approach. Knowing how to shape your data from day one is important, but it's also impossible. Knowing how to shape your data over the course of quarters will separate the winners from the losers.
Cloud infrastructure and composable architecture in general have become best practice for CTOs and CIOs. They offer data flexibility, extensibility, and sovereignty over modular pipelines and services. It is easier to manage and easier to service. An entire ecosystem of cross-functional services and applications is emerging around the cloud. So it makes perfect sense that a composable CDP starts with flexibility, extensibility and ownership of data creation.
Composability is not a marketing ploy by new, emerging VC-driven SaaS vendors. Composable data architecture is a fundamental shift in data interoperability, scalability and world-class data infrastructure. The question is not whether you will have a composable system for your customer data. The question is when you will adopt this - and how far along will your competitors be in using their data when you do.
About the author
Jonathan Mendez has been a Founder/CEO in eCommerce, martech and adtech. His early work at Offermatica led to the company being acquired by Omniture and becoming Test & Target, and then Adobe Target. He then founded Yieldbot, which used first-party behavioral event data to become the second fastest growing technology company in North America from 2012-2016 (Deloitte Fast500). More recently, Jonathan has built composable CDPs for omni-channel retailers and advised leading telecoms and retailers on how to productize their first-party data. He has been writing about martech and adtech for over 15 years on his blog Optimize and Prophesize.