“80% of our sales come from 20% of our customers.” Sound familiar? Companies usually use that phrase to explain that the most loyal customers buy substantially more than the occasional shoppers.
The 80/20 rule (also known as the Pareto Principle) highlights a reality of any marketplace: not all participants can or will contribute the same value. Our attention, then, should be directed toward the most valuable participants, and we can largely ignore the rest.
In a frustrated moment pondering the mass acceptance of third-party datasets, I started to wonder whether the same principle applied to a data marketplace.
Is the future of quality data at risk, given the current buying trends and the lack of acknowledgement of the 80/20 rule? Are most advertising dollars driven by 20% of the data doing 80% of the work?
If so, why do there need to be 60,000+ segments on some DMPs (data management platforms) when a small fraction of them (informed by actual consumer behavior, commonly referred to as “first-party data”) can answer the questions being asked? Questions like “Does this audience regularly buy X dollars of Y product?” and “Did my targeted shoppers actually buy the product after seeing an ad?”
Data Management Platforms
In today’s platform-centered market, we get so excited about the idea of data being available that we forget to ask how and when the platform benefits.
A DMP does benefit when you come back (based on a successful exchange of value). But are DMPs intentionally allowing thousands (if not tens of thousands) of heavily modeled datasets to swarm test-and-learn programmatic buying environments without any machine asking, “Should I even test this?”
Better access to data to enrich a decision offers incredible opportunities. But the existing platforms must remain focused on the quality of the data they offer, even if that limits the variety of marketing claims that can be made.
If you are a data supplier, you probably aren’t paying much attention to how your data is being used or what it’s being valued at. You may simply be making money with an asset that used to be just a cost. But if DMPs continue to allow access to your data without regard for how it gets built into derivative products or combined with other datasets, the shelf life of this asset is a few years, at best.
Not all DMPs play fast and loose with your data… at least not on purpose. However, most weren’t built to deliver the best direct data with permissioned use. Rather, they were built to protect your data by surrounding it with other well-seeded but questionably stretched data assets.
We’ve reached an inflection point, and data owners must demand that their data isn’t surrounded by modeled, stretched datasets of unknown collection method and questionable freshness.
A New Approach
I want to propose a different way that data might be managed and ultimately priced in the future.
Data owners will only get paid for good answers to tough advertiser questions, and not for the 80% of “almost answers” they can offer or model towards. For example, Adobe is starting to move in this direction with its device co-op to get around the walled gardens. And data providers and advertisers will get on board quickly because any moves that dilute their reliance on Facebook and Google are readily welcomed.
Let me try and illustrate my point with a hypothetical example.
Say a data provider has sales data on 30% of US consumer electronics buyers; two other providers have 20% and 10%, respectively. Beyond that top 60%, the market is made up of much smaller retailers that don’t use loyalty programs or other ways to tie consumers to their buying habits. So we have a total data scenario in which three organizations can sell data to inform advertisers that want to target behavior related to consumer electronics sales.
Assuming all three of these companies want to sell their data, a typical scenario would be that each of those data owners would approach someone to work magic to make audience segments.
Each data owner fills out its unique claims in a media buyer’s playbook and publishes it online because it covers different geographies or proprietary retail products. All are valuable but don’t individually offer a complete view on the national population of consumer electronics shoppers.
The problem is that it’s not clear what each segment is missing. They are labeled such that “Consumer Electronics Shoppers” looks a lot like “Consumer Electronics Buyers.” You get the drift. Qualities like freshness, representativeness, etc. aren’t even questioned. But it doesn’t really matter, because the brand in the playbook is big enough and they are all priced at $1.00 per thousand impressions.
Sounds great for everyone, right?
At some point, however, the campaign will get measured, and even a survey-driven methodology will start to set the datasets apart. Sales can be overstated or understated if a retailer skews toward (or away from) a particular demographic. Pricing suffers, confusion follows, and eventually, even though the segments’ roots are rich with first-party DNA, yet another dataset surfaces and buyers’ confusion about what to do only amplifies.
Forgetting the bright shiny object effect that a new dataset would normally have on a human buyer, the machines simply detect that it’s a new, well-branded (read: the right keywords in a search) segment that needs to be tested. It doesn’t, of course, but we can test everything with such small spends that it doesn’t hurt (or doesn’t appear to hurt).
The revenue payout looks something like this:
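A toy sketch of that payout dynamic in Python (the budget and segment counts are my hypothetical numbers, not figures from any real campaign): because the segments look indistinguishable and are priced at the same flat CPM, the spend splits evenly, and every proxy entrant dilutes what the genuine first-party owners earn.

```python
# Toy model, all numbers hypothetical: a fixed advertiser budget spread
# evenly across every segment priced at the same flat $1.00 CPM,
# regardless of each segment's underlying quality.

BUDGET = 100_000.0  # assumed total test-and-learn spend, in dollars

def payout_per_segment(num_segments: int, budget: float = BUDGET) -> float:
    """Indistinguishable, flat-priced segments split the budget evenly."""
    return budget / num_segments

# Three first-party datasets at launch, then modeled proxy segments
# flood in at the same price point.
for n in (3, 10, 50, 200):
    print(f"{n:>4} segments -> ${payout_per_segment(n):,.2f} per segment")
```

Nothing in the price signal distinguishes the 30%-coverage retailer from a freshly modeled proxy, so the race to zero is built in.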
If you are the leading dataset (or really any first-party dataset), you are confused about how a market could race to zero. It’s simple… we don’t have the level of transparency in place to catch it.
But what if the way we bought and sold data was a little simpler and in more of a cooperative fashion? Perhaps we do so in a marketplace where data is valued based on its actual worth.
Let me illustrate.
In this model, each data owner gets paid for the answer he or she can provide to a specific question or decision, not for how cleverly a model can be stretched.
The result is a stabilization effect that will allow the actual data owners to be appropriately compensated. The entrance of new proxy datasets will be greatly reduced, and the total revenue will grow and be distributed as an incentive for each data source to contribute a response when requested.
So, as opposed to an impression-driven model, in the new era of programmatic, we monetize at the individual response level.
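A minimal sketch of that response-level idea, assuming a hypothetical routing function and payout rate (none of the names or numbers come from a real marketplace): an owner earns only when it holds real data on the consumer in question, so purely modeled proxy segments earn nothing.

```python
# Hypothetical sketch of response-level monetization: a data owner is
# paid only when it can actually answer an advertiser's question,
# not per impression its segment happens to touch.

from dataclasses import dataclass

@dataclass
class DataOwner:
    name: str
    coverage: set          # consumer IDs the owner has real purchase data for
    earnings: float = 0.0

PRICE_PER_ANSWER = 0.05    # assumed payout per verified answer

def route_query(consumer_id: str, owners: list) -> bool:
    """Pay the first owner that can answer; owners with no real
    coverage of this consumer earn nothing."""
    for owner in owners:
        if consumer_id in owner.coverage:
            owner.earnings += PRICE_PER_ANSWER
            return True
    return False           # no one could answer, so no one gets paid

owners = [
    DataOwner("RetailerA", {"c1", "c2", "c3"}),  # 30%-share retailer
    DataOwner("RetailerB", {"c4", "c5"}),        # 20%-share retailer
    DataOwner("ProxyModeler", set()),            # modeled segment, no direct data
]

for cid in ["c1", "c2", "c4", "c9"]:  # advertiser questions about four consumers
    route_query(cid, owners)

for o in owners:
    print(o.name, f"${o.earnings:.2f}")
```

In a real marketplace the routing and payment would live inside the DMP’s query layer; the point is simply that dollars follow verified answers instead of impressions.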
This type of co-opetition among the top first-party datasets (without commingling the literal data) allows the dollars to get paid back to the right source. It also limits spending on subpar datasets pushed at inexperienced buyers or picked up by automated algorithms (which view any new dataset as a potential improvement). Ultimately, your dollars then only go to the 20% of datasets that can provide 80% of the answers.
- Better data gets rewarded and priced accurately.
- Data creators get paid for the actual value they contribute. And they’ll want to contribute more.
- It becomes easier to see and buy the right type of data because, instead of hundreds (or thousands) of options, you see only the ones that perform and deliver.
- Master segments of verified shoppers emerge instead of countless proxy segments.
- DMPs start optimizing for the outcomes sought by the advertiser (e.g., sales).
- Data owners can price based on the specific use case of the data and other factors, such as freshness, rather than the number of households the data could be stretched across.