A new year and yet another Substack to subscribe to! Source of Truth is perhaps a dramatic name, but I wanted a place for myself and some friends, colleagues and peers to chart the coming 'Great Reset' in the adtech-martech solution space, driven by data warehousing, first-party data and AI.
Genesis
For those who have followed me on other channels in recent years, the sudden focus on adtech and martech may come as a surprise, but it is not in fact a new interest for me. I met Yali, my Snowplow co-founder, back in 2007 at an open-source adtech startup called OpenX.
At OpenX we learnt a lot: about how (not!) to monetize open-source, about the digital advertising ecosystem, and of course about these new-fangled Hadoop-based data pipelines. We enjoyed some real triumphs, including building and launching an advertising exchange based on real-time bidding, hot on the heels of the first two players (Google and Yahoo!). Our experience at OpenX with advertising clickstreams would later contribute to our founding Snowplow in 2012, where we developed our pioneering focus on behavioral data creation.
In the decade since, Yali and I have been busy steadily growing Snowplow, first as an open-source project, then as a bootstrapped Commercial Open Source Software (COSS) company, and most recently as a VC-backed scale-up in the modern data stack. All the while, we have expanded from our original use case of generating web analytics data directly into S3 and Redshift to creating a wide variety of consumer behavioral data and streaming it into Databricks, Snowflake, BigQuery and other destinations.
Where did that leave adtech-martech? Well, until recently it was a mature space, commoditized and monopolized, and frankly rather dull. But that's all changed in the past year. How come?
Why is martech interesting again?
1. Re-architecting around the data warehouse
At Snowplow we started to see the adtech/martech winds of change about 18 months ago when the Reverse ETL pioneers, Hightouch and Census, arrived on the scene.
Our customers and users had long had the freedom to ask their behavioral data any kind of question, and to join that data with other data sets, unlocking a string of high-value user insights. The natural next step was to act on those insights, but that wasn't straightforward from an engineering point of view until Hightouch and Census emerged. Suddenly our customers were talking to us about new projects to take behavioral insights about their customers and feed them back into downstream marketing and advertising tools.
This was very exciting! It was a huge endorsement of our founding philosophy that organizations should create their own behavioral data, build their own proprietary intelligence on top of that data, and then activate that intelligence, for example, by creating their own data products.
As we listened to and learnt from our customers, partners and users, we started to interpret Reverse ETL as one strand in a wider movement: one where brands were starting to own and consolidate all of their data inside a cloud data warehouse or lakehouse. These brands were using this data, including Snowplow behavioral data, to develop and exploit their own proprietary intelligence.
In the old world, tag managers relay consumer behavior out to dozens of martech vendors, each with its own walled-off data silo; in the new world, the martech vendors come to the data. This trend is now widely understood and discussed, with folks like Luke Ambrosetti at Snowflake and Chris Hexton at Vero, and VCs like Martin Casado and Tomasz Tunguz, all doing a great job of evangelizing it.
This technological trend, which started with the release of cost-effective cloud data warehouses, will, over time, turn the entire martech market landscape on its head.
2. The decline and fall of third-party data
The decline and fall of third-party data is the next trend making martech interesting again. Client-side third-party tracking is our industry's trans fatty acid: widely used for decades but suddenly deemed incredibly unhealthy for consumers.
The FDA banned artificial trans fat from food in 2018. Third-party tracking is not quite there, but it is definitely on the ropes, driven by the following factors, among others:
Increased regulation around the world to protect the rights of data subjects (e.g. GDPR, CCPA and Virginia's new Consumer Data Protection Act)
Growing intolerance of third-party cookies among browser vendors (e.g. Apple's full third-party cookie blocking in Safari on macOS, iOS and iPadOS)
Heightened awareness of the importance of protecting vulnerable constituencies online (e.g. from the United Nations Committee on the Rights of the Child)
As third-party data comes under fire, we are seeing a steady rise in first-party data use in its place. This is data generated by brands about their own consumers, often now shared with other brands through carefully managed data clean rooms.
This triumph of first-party data is a fundamental validation of Snowplow's founding philosophy of data ownership by brands. The original 2012 open-source version of Snowplow was quickly picked up by multinational companies including Vodafone, News Corporation, Unilever and Bauer so they could generate their own first-party behavioral data.
Our open-source model has empowered these brands to capture, understand and govern the data around their own customers' behavior in a world where adtech vendors are trying to disintermediate them.
Snowplow’s commercial offerings, too, have followed the same philosophy: we were one of the first software as a service (SaaS) vendors to deploy and manage our software inside a customer's own cloud account, preserving those key attributes of end-to-end data ownership.
First-party data is becoming increasingly mainstream, and this is going to upend many of the playbooks, technologies and products of the past decade.
3. The rise of AI and machine learning
Every year at Snowplow, we used to survey our most data-mature customers and ask them, among other things, whether they had any substantial AI or ML projects. Every year they would tell us that their AI or ML experiments were promising (or not) but hadn't made it out of the lab.
This all changed 18 months ago. Those same customers looked at us like we were crazy and told us: "Of course we have live ML projects, and we've found that the features we’ve engineered from behavioral data are highly predictive for our personalization and recommendation models".
This was amazing news to us. We did some more digging and found two things. Firstly, the richness of the data we are creating (think page scroll depths, time-on-screen, video duration watched) gave a level of behavioral insight that other technologies just couldn't reach. Secondly, our strongly schematized approach to data creation made the downstream data predictable, which made machine learning models faster to develop and, more importantly, much more resilient in production. We expect more of our customers to get on board the AI and ML train in the immediate future.
As machine learning moves from lab to live, traditional martech vendors will find themselves under increased pressure. Why create clunky rules-based audience profiles when you can use AI to build better ones (for instance with Zingg)? When is your email service provider (ESP) going to let you generate copy dynamically using ChatGPT? I expect to see more defensive acquisitions of AI technology, similar to mParticle's purchase of Vidora, but I am not convinced that the AI genie can be put back in the bottle.
What to expect from this newsletter
Putting all of this together, at Snowplow we find ourselves in the middle of a perfect storm that started with the release of Amazon Redshift and is poised to end with a 'Great Reset' of the adtech and martech stacks. We want to ensure that our customers, users and partners can rely on Snowplow behavioral data to meet their marketing and advertising needs, no matter what the world looks like after the reset.
This new Substack is a forum to explore how these technology and market trends will affect us: in our day-to-day work in marketing and growth functions, as consumers active across digital platforms, and as data-driven technologists.
Coming up first: Jonathan Mendez on Composable CDP
So what's first? This Substack will only be a success if we hear a range of opinions from people who have spent a good part of their careers steeped in the adtech and martech landscape. I want to share diverse voices and make sure we break out of any filter bubbles or technological navel-gazing.
First up, we have a post from Jonathan Mendez, an old friend and linchpin of the media and adtech scene in New York, about the craze that arguably started it all: the Composable CDP. Jon's article will be landing on Source of Truth soon!