Consumer Data Utilization: A Study in Contrasts

Preface: May 4, 2021

The following was originally written in 2016 and revised in 2018. While somewhat obviated by the death of the cookie and other developments, the comments may still provide useful context.

Introduction & Disclaimer

For over 30 years, I have been working with a variety of large organizations to help them use data to wrap themselves around consumers. What follows represents lessons learned, especially as regards emerging trends. I ask that you indulge me insofar as I will be making hypotheses and assertions without providing much supporting data. This is mainly because I would need to satisfy a high standard to share intellectual property that belongs to others, or would be divulging that which I am contractually obligated not to. Being a fallibilist with respect to knowledge, I encourage you to prove wrong any hypotheses or assertions that you hear.

I am only representing my own viewpoint and not that of my current or any former employer.

Please see my LinkedIn profile, www.linkedin.com/in/drewtalbot, for a truncated but accurate description of my qualifications and mindset.

Background

A decade of super low rates and the Fed being the buyer of last resort have resulted in many large corporations piling up debt to finance stock buybacks, thus driving up the share price (and ensuring executive bonuses) while avoiding investing in drivers of future growth. As a result, balance sheets are vulnerable to the next downturn which, if history is any guide, is all but inevitable. The likely result will be draconian cost controls across the board and a reduction or freeze on discretionary items like hiring, bonuses for the rank and file, new product development and advertising.

The demand by executives for differentiating value propositions has never been higher, but investment is low and likely to remain that way. Additionally, commoditization continues to affect more categories as price discovery and hence promotion parity are instantly available with an inexpensive handheld device. This plays in favor of embedded analytics and more rapid and executable insights. However, constraints on the use of consumer data are significant and likely to grow rather than abate.

The availability of technologies that easily store, move and manipulate unstructured data, pioneered by Google and Yahoo and now commercialized under names like Hadoop and Cassandra, has given rise to many new opportunities to both manipulate and potentially abuse consumer data.

High profile data breaches, revelations about pervasive technical back doors in most networked devices and public resentment of the indiscriminate use of government surveillance, combined with a lax enforcement environment, have led to a loss of trust in data security. Calls for increased controls and standards are growing, as is the cost of litigation in jurisdictions where the consumer’s right to privacy and control over data about them are aggressively pursued by authorities.

While transnational corporations look for efficiencies through globalization, regional and country-specific regulators may force them to refrain from collecting data of certain types, stop transferring data across borders and sunset data already in their possession.

We are living in interesting times geopolitically, financially and technically. It is very difficult to predict what will happen in the realm of consumer data, but it is easy to see that there are various countervailing trends that will produce moments of conflict and, potentially, of synthesis. The following will be a necessarily high level review of some of these.

Cross-currents

Underpinning and cutting across the discussion of several thesis, antithesis and synthesis counter-trends are issues that present both opportunity and risk. The following are some of the salient examples.

1st & 3rd Party Data Integration

Marketers at major brands want to be able to link all types of useful information that might help them connect what consumers want, do and are in the market for to what they want to sell. This means linking attitudinal, behavioral (transaction), and identity/demographic/location data. Historically this has been difficult to do and the advent of the Internet has not made it easy. Recently, sophisticated ways to connect all these types of data have become commercially available and are also known to be used by government agencies (despite claims to the contrary). The desirability of, as well as the risks associated with, connecting the dots continue to rise astronomically.

First party data is what a company has about customers and prospects. Presumably, this is information that a consumer has voluntarily granted access to. This does not mean there are no rules for permissible use. There are strict rules in most jurisdictions for PII, or information that identifies an individual and can easily be used by identity thieves or others to defraud or misrepresent that person. PII does not only refer to individual items of record but also to how they might be connected. For example, a Social Security Number is PII with or without the addition of a name and address, but when these are linked more rules apply regarding permissible use. In some cases, it is OK for a company to provide service messages to a customer (e.g. claims processing details for an insurance company) but not to mix what it knows for marketing purposes. For example, information about an individual’s disease state or credit worthiness may not be shared beyond those who have a legitimate need to know.

Third party data is information about individuals or groups and is typically collected by a handful of data aggregators and then sold to others for marketing and other uses. Examples include demographics like age or presence of children in a household. Historically, the data were drawn from warranty cards, auto registrations, subscriptions and public records including Census Bureau data and could be about individuals, households, census block groups, zip codes or other levels of aggregation. Buyers of these services may either have the vendor append data on the fly for a campaign and act as a list broker or may do mass appends to their own customer files. The vendors may also act as “safe harbor” for matching of data elements that the buyer may not be allowed to do themselves due to law or contractual covenants. There are restrictions on permissible use of PII that apply when the aggregator or compiler is collecting the data, storing the data or making it available for specific purposes. Additionally, regulations increasingly restrict the combination of aggregate level data with individual data in building models or inference engines. Data suppliers include data shops such as Acxiom as well as credit bureaus such as Experian and TransUnion. DMPs such as BlueKai that store online behavior data including cookies are also included in this category.

Given the obvious desirability of integrating online and offline data about consumers, the purveyors of same are now engaged in an armed truce with extensive coopetition. Nobody who has a big data store wants to share. Nobody who knows how to use it properly is in a prime position to use it all. This is important to realize because each in turn makes big claims about what they alone can do.

Key Performance Indicators (KPIs)

Public companies all share certain metrics of success at an enterprise level that are interesting to investors. An example is EBITDA, which along with Earnings per Share (EPS) and stock price appreciation is probably among the most important KPIs on which CEOs and other top executives are graded. While it would be useful to have an enterprise metric that shows change in the value of the franchise at the customer level, few large publicly traded companies do this. Instead, they make inferences that attempt to translate various operational and campaign metrics (e.g. Click to Buy ratio) into broader and often subjective measures of marketing success (e.g. ROMI). Over the past decade, conventional metrics such as surveyed intent to purchase and brand strength have given way to attributed sales online. This is largely because the latter is perceived to be a harder metric, and not because it necessarily captures all of the results of marketing. There is continued pressure on marketing leaders to attribute every atomic event or tactic to a sale and this has led to a large shift in emphasis as well as in the skill sets of marketers. There have been unintended consequences of this trend. Organizations have seen decays in conversion that increase more rapidly in each new channel. This is often blamed on the fragmentation of the media market and short attention spans. Another contributing factor is the emphasis on short term results at the expense of brand health. It is becoming harder to close sales because the consumer has not had the time or inclination to form a relationship with the brand. The usual solution to this is to increase the emphasis on acquiring new customers, which may throw gasoline on the fire by lowering the margin on the consumer portfolio, increasing the time to value for each new account and increasing the risk of defection by the customers who produce profit.
While improved sophistication in the analytical toolbox can mitigate these problems, some companies have already done great harm to their brands that may take years of investment to correct.

Self Service Profiles

Marketing organizations have, to one degree or another, allowed consumers to set preferences in a variety of ways for years. Whether it’s telling a hotel that you want a certain type of pillow or going on a website and saying you only want email for service notices and direct mail or nothing for marketing messages, it is becoming increasingly possible for users to tailor their experiences to their own liking. While altering a browsing experience or serving relevant content based on a user defined profile is relatively simple, more complex is allowing this information to alter the totality of the consumer’s experience across touches and time. With DMPs and other tools, organizations now have the capability to allow user defined preferences to drive experiences across multiple channels in near simultaneity. Unfortunately, in some jurisdictions this activity may be prohibited, even where consumers indicate this is what they want. Ironically, in these same jurisdictions, consumers increasingly must be given a place to go to let a company know whether they wish to be forgotten (i.e., not tracked online), have their profiles expunged or have a report of all data kept on them. This puts companies in the interesting place of having to determine if and how they might create “no-profile profiles.”

Governance

Organizations must provide for accountability of the accuracy and disposition of all consumer information they capture, store, manipulate and otherwise act upon. This is necessary for audit purposes, as a learning mechanism and, when necessary, to provide for legal defense. The process by which this is done is known as data governance. While there are other data governance functions, what concerns us now most are those that have to do with rules about who “owns the data,” as well as rules for how data are treated.

As applied to consumer data, governance means that the company knows and has assigned accountability for why each piece of data is collected, how it is used and how it will be disposed of when no longer necessary. Rules for how data elements can be combined also fall within the purview of governance. Procedures for data review and audit, as well as data security including assigning and monitoring access rights are also part of the work of data governance.
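One way to picture the accountability described above is a small registry that records, for each data element, who is accountable, why it is collected, which uses are permitted and when it must be disposed of, alongside rules about which elements may not be combined. The following is only a minimal sketch under stated assumptions: the element names, owners, retention periods and the forbidden combination are all hypothetical and do not reflect any particular law or company policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataElement:
    name: str
    owner: str                  # accountable person or team (hypothetical)
    purpose: str                # why the element is collected
    permitted_uses: frozenset   # uses the governance body has approved
    retention_days: int         # dispose of the element after this many days


# Hypothetical registry entries, for illustration only.
REGISTRY = {
    "email": DataElement(
        "email", "CRM team", "account servicing",
        frozenset({"service_notice", "marketing"}), 730),
    "health_condition": DataElement(
        "health_condition", "Claims team", "claims processing",
        frozenset({"service_notice"}), 365),
}

# Hypothetical rule: these elements may never be joined in one data set.
FORBIDDEN_COMBINATIONS = {frozenset({"health_condition", "marketing_segment"})}


def may_use(element_name: str, use: str) -> bool:
    """True if the registry lists this use as permitted for the element."""
    return use in REGISTRY[element_name].permitted_uses


def may_combine(a: str, b: str) -> bool:
    """True unless governance has forbidden combining the two elements."""
    return frozenset({a, b}) not in FORBIDDEN_COMBINATIONS


def is_due_for_disposal(element_name: str, collected_on: date, today: date) -> bool:
    """True once the element has outlived its retention period."""
    limit = timedelta(days=REGISTRY[element_name].retention_days)
    return today > collected_on + limit
```

In practice such rules would live in a governed catalog rather than in code, but the shape is the same: every element has a named owner, a stated purpose, a list of permitted uses and a disposal date that can be audited.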

Clearly, a lack of appropriate data governance produces significant risks to the enterprise. An historic example of the risk of malfeasance would be fallout over pre-approving credit for deceased persons. In the EU, if a company can’t give an up to date and complete account of the data collected and the uses for same to a requesting individual, then the cost of litigation can be extreme.

On the other hand, appropriate governance can help an organization understand and leverage the value of the data it legitimately has.

Internet of Things & Other Disasters in the Making

Amidst much hype, new technology is being created and marketed that is supposed to be revolutionary. So far the reality appears to be something very different from the PR. The Apple Watch is shaping up to be a non-event and costly failure, much like the Newton of old. Each networked near field communication device represents a very easily abused opportunity to steal data or even cause dangerous outcomes in real-time, as government security bulletins have indicated but major media have mainly ignored. Consider the comments by Samsung and Amazon that indeed, smart TVs and speakers can be used as listening devices by anyone within range who has a modicum of sophistication, and that they are not responsible for how government agencies or others might use such data. While it does not pay to handicap eventual adoption, it is prudent to evaluate each opportunity carefully and consider the risks and rewards.

Hacks & Breaches

Target, SONY and Anthem are a few of the well-known large companies whose consumer data have been compromised recently. Among these three, it is virtually certain that you or someone you know has had personal data obtained by unauthorized persons through leaks in corporate security. It is not known how much this will ultimately cost individuals and the economy because the perpetrators do not abuse the data all at once and the real implications may take years to be discovered. Target recently said the cost of the breach to them was in the tens of millions of dollars. The cost would have been much higher had regulators held them fully accountable for fraudulent use of compromised credit cards. Anthem went into crisis mode, ignoring media requests for information while hiring a specialist to manage providing identity theft protection to millions of affected consumers. In the case of SONY, the CEO of SONY Pictures was forced out and the payouts to victims could be very expensive. In the other cases, regulatory authorities have so far held the companies harmless in episodes where the most personal data, financial and purchase histories and medical data, have been compromised. At the same time, the desirability of storing information in the cloud and linking information is so great that most senior executives are paying scant attention to the very real risks they face. Enforcement in the USA has largely been of the slap on the wrist type and this is so far no more than can be borne by large financial and healthcare concerns as a cost of business.

It is not safe to assume that you or your employer will not be forced to pay penalties in future for these kinds of breaches, penalties that could literally shut a company down. The penalty may come in the form of being phased out of participation in regions by governments who suspect companies of creating back doors for exploitation by other agents, as is happening to IBM and others in China. Moreover, it is not safe to assume that safeguards like single sign-on and encryption will stop these types of breaches from occurring with greater frequency and severity. Breaches occur upstream from these measures and/or because of the ease of access to information “everywhere” in unstructured formats. The very thing that makes it “Big Data” is what makes it insecure.

I am not aware of a current solution for this problem other than segregating data by purpose into environments that are structurally more or less secure. Perversely, this may breathe new life into antiquated mainframe systems and hardwired networks that are still in use because of a failure to invest in infrastructure over many years.

Monetization v. Stewardship

Large enterprises tend to view information about their customers and prospects as company property and an asset that can be made profitable on its own. It comes as a shock to them to learn that laws and regulations increasingly contradict this notion.

The questions “who owns the customer,” and “who owns the data,” are commonplace in information technology and marketing organizations. Whether sales or some other business unit claims ownership of the customer, there may not actually be one person responsible for the accuracy and disposition of data about consumers with whom the company interacts. There are two problems here. First, the customer’s data is a fungible “asset” that needs to be maintained. This is often difficult because different profile information exists in different parts of the organization and often does not link in real-time. Even where this is desirable it can be impracticable. Second, the “owner” of the customer may be eager to use customer information for reasons that may be in breach of the company’s fiduciary responsibility, or may be ignorant of how this can come to pass by intention or accident.

The opportunity to convert data collected about consumers, with or without their consent, into profit is very real and is the primary business model or at least a strong earnings contributor for many organizations, especially those who don’t create a product with tangible value, don’t have infrastructure of their own or can’t earn much from advertising (most social media). These companies essentially exist to deliver the consumer, as product, to someone else for a fee. Consumers usually don’t know how this affects their online experience except in a vague sense.

For other organizations, selling consumer data is an ancillary service to their main business, e.g. a travel company makes data available to a car rental partner who serves offers up based on the profile they have received. In this case, the partners help each other and a referral fee often changes hands. The consumer usually does not know what is going on behind the scenes but may or may not appreciate the offer.

Laws and regulations around the world, including in the EU, Australia, the Russian Federation and increasingly the USA, are challenging these business models in important ways. International standards are emerging that coalesce around the concept of the consumer as the owner of data about him/herself. A company is at best a steward of the data and at worst a violator of law for capturing, storing or using the data in any way not explicitly authorized and consented to by the consumer. Treating the consumer as product is a pervasive reality today – but is already costing large organizations vast sums in compliance and litigation expense.

A sensible approach to dealing with this is to err on the side of asking permission and granting consumer control. Preferences should be progressive, opt-ins should be nuanced, and profiles should be guarded and audited to make sure they are not misused. Aggressive detection and correction of any errors also goes a long way. Additionally, companies should actively monitor the conversation about their brands in social media, even if they choose not to take part. In this way they can get ahead of problems. Companies need to have compliance and governance experts at the table early and often when exploring marketing and consumer communications strategies and tactics. Finally, companies need to have systems and processes for governance and capture of data that comply with the most restrictive covenants to which they are subject, anyplace they do business.

To be sure, all attempts to monetize consumer data will come under continual and intense scrutiny. Those that do so without explicit consumer consent and careful regulatory compliance run the risk of being held personally accountable for criminal and civil offenses.

Completeness v. Integrity

What analyst doesn’t want all the data? Every company with something to sell would dearly love to be able to have a golden record that identifies a consumer accurately and that links to comprehensive data about their demographics, habits and transactions with them and with others that might be predictive of purchase behavior. Legal restrictions aside, it is not always feasible for them to do this. Systems of record and systems of reference do not always sync and when they do, the data may not be stored in ways that are usable for purposes other than the intent of the system of capture. Data that are not critical for a specific function are also expected to roll off. For example, after a hotel stay it is normal for the stay information and reservation number to be overwritten, as they apply to reserving a space, not identifying a person. The system of record for a person may not be the same as that for a stay. Accordingly the most complete data may not be the most accurate and vice versa.

Companies seek to overcome limitations in stored data through the use of Master Data Management (MDM) processes that cleanse and normalize data across systems. Unfortunately, these systems have been developed primarily for analytic purposes and may not always work at the zero or very low latency speeds required for 24/7 real-time service.

Accordingly, companies face trade-offs between having the most data and the best data for a given set of purposes. These decisions are complicated by regulatory requirements but often the operational and systems issues are significant obstacles of long standing. In order that data may be accurate, which is a requirement, completeness may need to be limited. This will require more deliberate approaches to what is captured, stored and used. Analysts want to get “all the data.” This is unrealistic and the job of data science will be to work with governance and others to determine what is the appropriate latency, type, quantity, accuracy and completeness required for a given purpose.

Security v. Extensibility

The best way to secure data is to not store it in the first place. Because of increased restrictions, firms operating in the EU have stopped collecting even basic address information on customers to avoid later challenges. At the same time, current thinking says that to make systems future proof requires open architectures that can easily deal with very large amounts (petabytes) of unstructured data. Unfortunately, most of the technology that can easily handle the vast amounts of data generated online was created to support Google’s business model and would not pass muster in a mission critical environment that must securely process sensitive information, such as a payment processing company servicing multiple credit card issuing banks. As a result, there are numerous initiatives that are stalled in IT organizations at Fortune 500 companies because the working teams have not yet been able to satisfy the different stakeholders.

It is certainly true that digital has changed the game on the quantity of data sloshing around. It has always been true that for decision purposes, most of the available data are irrelevant. If anything, the new volume of unstructured data makes this more true than ever. The challenge is therefore not so much to get all of it, but to be able to pick what is important without having to store everything from the get go. Extensibility will therefore not be so much about adding capacity as about adding intelligence such as embedded analytics further up the chain.

Reliability v. Flexibility

The most reliable system is typically the one senior executives are embarrassed that they still have: a mainframe, mission critical box that is hard-wired to its inputs and outputs and is essentially bulletproof, boring and hard to maintain (because who writes this language anymore?). These systems are arguably the most secure – for all the reasons just listed as drawbacks. They are everything that is not 21st Century: they are not in the cloud, are hard to modify to accommodate new channels or tasks, notoriously lack interoperability and aren’t good with unstructured data.

In the future, as we see more and worse examples of data breaches and hacks, it is likely that what is old will again become new. This means that IT organizations will be forced to segregate data by purpose, not just by inputs and outputs. The use of unstructured data for analysis and decision support, even in real time, must not be allowed to compromise mission critical processing integrity or data security and accountability. Unfortunately, the ability to access PII and combine all customer data for the purposes of “targeting and messaging for relevance” may not advance much further in the medium term.

For marketers, this may mean getting over dreaming about the all-seeing all-knowing database and adopting a more flexible approach themselves (rather than just complaining about IT which increasingly they will own if Gartner is correct). Technology will support the business of funnel management in a more seamless way. The alternative of continuing to segregate the marketing discipline into branding, acquisition and retention/relationship management will likely produce poor results.

Privacy v. Service

For the most part, consumers expect both that companies know quite a lot about them and that they will in some way abuse this information. They tolerate the abuse because they either feel they are being compensated for it in some way or they have become fatalistic about it. Ironically, companies often know both more and less than consumers think they do. For example, big retailers should know everything you buy from them and should be able to use data to which they can subscribe to know how much you buy elsewhere and therefore be able to calculate with precision and speed what you are likely to buy in any given category, what their current share of your requirement is in the category and what your price elasticity and promotion responsiveness are. In reality, though they ought to have this data, they seldom use it this way and so they “know” much less than one expects they “should.” This has little if anything to do with regulations and has more to do with their business model and culture.
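The two calculations mentioned above, share of requirement and price elasticity, are simple arithmetic once the inputs exist. The sketch below is illustrative only: the function names are my own, the spend and price figures are made up, and the midpoint (arc) formula is just one common way to estimate elasticity from two observations, not necessarily what any given retailer uses.

```python
def share_of_requirement(own_spend: float, elsewhere_spend: float) -> float:
    """Fraction of a consumer's category spend captured by the retailer."""
    total = own_spend + elsewhere_spend
    if total == 0:
        return 0.0  # consumer does not buy the category at all
    return own_spend / total


def arc_price_elasticity(q1: float, q2: float, p1: float, p2: float) -> float:
    """Midpoint (arc) elasticity between two price/quantity observations."""
    if p1 == p2:
        raise ValueError("prices must differ to estimate elasticity")
    if q1 + q2 == 0 or p1 + p2 == 0:
        raise ValueError("midpoint is undefined when both values are zero")
    pct_change_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_change_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_change_q / pct_change_p


# Hypothetical consumer: spends 300 with the retailer and 700 elsewhere.
print(share_of_requirement(300, 700))
# Hypothetical price test: price rises from 10 to 12, units fall from 100 to 80.
print(arc_price_elasticity(100, 80, 10, 12))
```

The hard part is not the arithmetic but obtaining and linking the purchase data, which is where business model and culture come in.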

Consumers value the use of information about them in ways that will provide value to them, whether in the form of a better price, more convenience or some other intangible benefit. They fully expect that companies will proactively use information to provide a better experience for them. Sometimes this works out and sometimes it doesn’t. This may be as a result of regulation but is often purely because of how companies are organized, how much or little investment they make in “knowing the customer,” and what the limitations of their legacy systems are.

One of the ironies of the current regulatory environment is that it discourages companies from using forms of dialogue that are approved by the consumer, of value to the consumer and could provide a point of difference for the company.

Globalization v. Localization

Large multinational corporations want to standardize systems and processes worldwide to gain efficiency. They continue to make extensive use of off-shoring models to pay the lowest possible price for work, even that of a highly specialized and technical nature. Consequently, they have programming and analytics talent pools that may be concentrated in Bangalore, fulfillment in Latin America and widely distributed supply chain hubs. While it may be old hat at this point for a company like Caterpillar to use intermodal transport to get heavy equipment to a buyer on site in a central Chinese province, it is still customary to try to roll up data into common reports and store customer transaction and PII centrally. Many companies lack a distributed or federated model for data to compare with their supply chain. Or, in some cases, their distributed supply chain is dependent on and must feed data to enterprise level reporting and control systems. Unfortunately, the ERP mentality that supports supply chain engineering is at cross purposes with the current and likely future reality of consumer data.

In some jurisdictions, asking customers for PII for, say, a hotel visit and storing it outside the country is prohibited on both counts. In Germany and Russia, the law stipulates what cannot be asked as well as what cannot be migrated or stored outside the country. This may put multinationals in the position of not only complying for business done in Germany, but potentially for business done elsewhere, since the German standard places de facto limitations on data capture and storage irrespective of the consumer’s country of domicile. Whether or not this is strictly enforceable is not the point. Companies have already stopped collecting data they formerly obtained in Germany and are on their way toward treating the German and other tough EU standards as the norm across the entire enterprise.

At the end of the day, providing consumers with controls as well as knowledge about what is captured and what happens to the data is not a nice-to-do, it’s a got-to-do. Being a global player means building in the flexibility (mostly absent now) to limit data capture and retrieval and hold certain sensitive data within the country. Whereas this is technically feasible today, the means for doing so might work at cross purposes with the most robust data security, which is also being required by authorities.
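To make the idea of limiting capture and holding data in-country concrete, here is a rough sketch of a per-country policy applied at the point of capture. Everything in it is an assumption for illustration: the country codes, the prohibited field, the store names and the policy structure are hypothetical and are not a statement of what German, Russian or any other law actually requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryPolicy:
    prohibited_fields: frozenset = frozenset()  # fields that must not be captured
    store_in_country: bool = False              # True if data may not leave the country


# Hypothetical policies; real rules would come from legal and compliance review.
POLICIES = {
    "DE": CountryPolicy(frozenset({"home_address"}), store_in_country=True),
    "RU": CountryPolicy(store_in_country=True),
}
DEFAULT_POLICY = CountryPolicy()


def route_record(country: str, record: dict, central_store: str = "central"):
    """Drop prohibited fields, then pick the store the record may be written to."""
    policy = POLICIES.get(country, DEFAULT_POLICY)
    kept = {k: v for k, v in record.items() if k not in policy.prohibited_fields}
    store = f"local-{country}" if policy.store_in_country else central_store
    return store, kept


guest = {"name": "A. Guest", "home_address": "1 Example Street"}
print(route_record("DE", guest))  # address dropped, record kept in-country
print(route_record("US", guest))  # no policy defined, record goes to central store
```

A company that adopts the most restrictive standard everywhere, as described above, would in effect collapse this table to a single strict default.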

Opportunities v. CLMs

What we have just reviewed constitutes a variety of ways of peeling an onion, that being how best to balance the need for information with the need to behave as a trustworthy steward of an asset which does not really belong to us as analysts and business leaders. This touches all aspects of technology selection & management, analytics and reporting of information.

Major opportunities exist to combine data in ways that will enhance consumer experiences with brands. While this is often talked about, there is still much work to be done to overcome the shortcomings of legacy systems, entrenched processes and business silos.

The current and likely future market reality means that companies cannot take their customers for granted or continue to acquire new customers to replace defectors. Accordingly, providing real dialogue and making decisions based on the voice of the customer will continue to pay dividends and provide job security.

On the other hand, the promise of “big data” can be undone by overzealous or careless use of consumer information. Consequences can include damaging brands, in some cases irreparably, and possibly facing civil and criminal penalties that can cripple an organization or constitute a career limiting move.

Data scientists will be much in demand, but they will need to be partners in the enterprise. It is not now and will not soon be possible for them to “have it all now.” They will need to pick and choose in ways that are both artful and scientific.

The opportunities for utilization of consumer data are real, but all come with risk. A prudent course of action is to understand risks, hedge them when possible, and alter strategies when this cannot effectively be done.