AWS adds natural language search service for business intelligence from its data sets

When Amazon Web Services launched QuickSight, its business intelligence service, back in 2016 the company wanted to provide product information and customer information for business users — not just developers.

At the time, the natural language processing technologies available weren’t robust enough to give customers the tools to search databases effectively using queries in plain speech.

Now, as those technologies have matured, Amazon is coming back with a significant upgrade called QuickSight Q, which allows users to just ask a simple question and get the answers they need, according to Andy Jassy’s keynote at AWS re:Invent.

“We will provide natural language to provide what we think the key learning is,” said Jassy. “I don’t like that our users have to know which databases to access or where data is stored. I want them to be able to type into a search bar and get the answer to a natural language question.”

That’s what QuickSight Q aims to do. It’s a direct challenge to a number of business intelligence startups and another instance of the way machine learning and natural language processing are changing business processes across multiple industries.

“The way Q works. Type in a question in natural language [like]… ‘Give me the trailing twelve month sales of product X?’… You get an answer in seconds. You don’t have to know tables or have to know data stores.”

It’s a compelling use case and gets at the way AWS is integrating machine learning to provide more no-code services to customers. “Customers didn’t hire us to do machine learning,” Jassy said. “They hired us to answer the questions.”

Google Analytics update uses machine learning to surface more critical customer data

If you ever doubted the hunger brands have for more and better information about consumers, you only need to look at Twilio buying customer data startup Segment this week for $3.2 billion. Google sees this the same as everyone else, and today it introduced updates to Google Analytics to help companies understand their customers better (especially in conjunction with related Google tools).

Vidhya Srinivasan, vice president of measurement, analytics and buying platforms at Google, wrote in a company blog post introducing the new features that the company sees this changing customer-brand dynamic due to COVID, and it wants to assist by adding new features that help marketers achieve their goals, whatever those may be.

One way to achieve this is by infusing Analytics with machine learning to help highlight data automatically that’s important to marketers using the platform. “[Google Analytics] has machine learning at its core to automatically surface helpful insights and gives you a complete understanding of your customers across devices and platforms,” Srinivasan wrote in the blog post.

The idea behind the update is to give marketers more of the information they care about most by using that machine learning to surface data such as which groups of customers are most likely to buy and which are most likely to churn: the very information marketing (and sales) teams need to make proactive moves to keep customers from leaving or, conversely, to turn those ready to buy into sales.

Google Analytics predictive metrics flag which customers are likely to churn and which are most likely to convert to sales.

Image Credits: Google
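
To make the idea concrete, here is a minimal sketch of the kind of churn-propensity scoring such a feature implies, built with scikit-learn on invented behavioral data. It is purely illustrative and not Google’s implementation; the feature names, labels and thresholds are assumptions.

```python
# Hypothetical sketch of churn-propensity scoring on invented behavioral
# features. Not Google's implementation; column meanings are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Toy features per customer: days since last visit, sessions in the last
# 30 days, purchases in the last 90 days.
X = np.column_stack([
    rng.integers(0, 60, n),   # recency (days)
    rng.integers(0, 30, n),   # frequency (sessions)
    rng.integers(0, 10, n),   # purchases
])
# Toy label: customers who went quiet and rarely bought are marked as churned.
y = ((X[:, 0] > 30) & (X[:, 2] < 2)).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
churn_risk = model.predict_proba(X)[:, 1]

# "Most likely to churn" is simply the top of this ranking.
top_risk = np.argsort(churn_risk)[::-1][:10]
print("Ten customers with the highest churn risk:", top_risk)
```

Ranking customers by that probability is, in essence, what a “most likely to churn” audience segment is.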

If it works as described, it can give marketers a way to measure their performance with each customer or group of customers across their entire lifecycle, which is especially important during COVID when customer needs are constantly changing.

Of course, this being a Google product it’s designed to play nicely with Google Ads, YouTube and other tools like Gmail and Google Search, along with non-Google channels. As Srinivasan wrote:

The new approach also makes it possible to address longtime advertiser requests. Because the new Analytics can measure app and web interactions together, it can include conversions from YouTube engaged views that occur in-app and on the web in reports. Seeing conversions from YouTube video views alongside conversions from Google and non-Google paid channels, and organic channels like Google Search, social, and email, helps you understand the combined impact of all your marketing efforts.

The company is also trying to future-proof analytics with an eye toward stricter privacy laws like GDPR in Europe and CCPA in California by using modeling to fill in gaps in information when cookies or other tracking mechanisms can’t be used.

All of this is designed to help marketers, caught in trying times with a shifting regulatory landscape, to better understand customer needs and deliver them what they want when they want it — when they’re just trying to keep the customers satisfied.

These 3 factors are holding back podcast monetization

Podcast advertising growth is inhibited by three major factors:

  • Lack of macro distribution, consumption and audience data.
  • Current methods of conversion tracking.
  • Idea of a “playbook” for podcast performance marketing.

Because of these limiting factors, it’s currently more art than science to piece together disparate data from multiple sources, firms, agencies and advertisers into a somewhat conclusive argument for why brands should invest in podcast advertising.

1. Lack of macro distribution, consumption and audience data

There were several resources that released updates based on what they saw in terms of consumption when COVID-19 hit. Hosting platforms, publishers and third-party tracking platforms all put out their best guesses as to what was happening. Advertisers’ own podcast listening habits had been upended due to lockdowns; they wanted to know how broader changes in listening habits were affecting their campaigns. Were downloads going up, down or staying the same? What was happening with sports podcasts, without sports?


Read part 1 of this article, Podcast advertising has a business intelligence gap, on TechCrunch.


At Right Side Up, we receive and analyze all of the available research, from major publishers (Stitcher, Acast) to major platforms (Megaphone) and third-party research firms (Podtrac, IAB, Edison Research). However, no single entity encompasses the entire space or provides the kind of interactive, off-the-shelf, customizable SaaS product we’d prefer, and that digitally native marketers expect. Plus, nothing is published in real time; most sources publish once or twice annually.

So what did we do? We reached out to trusted publishers and partners to gather data ourselves on how consumption was shifting due to COVID-19, and determined that, though there was a short-term drop in downloads, it was neither as precipitous nor as enduring as some had feared. This was confirmed by some of the early reports available, but how were we to corroborate our own piecemeal sample against another? Moreover, how could you invest six or seven figures of marketing dollars without the firsthand intelligence we gathered and our subject matter experts on deck to make constant adjustments to your approach?

We were able to piece together trends pointing to increased download activity in recent months that surpasses February/March heights. We’ve determined that the industry is back on track for growth, with a less steep but still rising listenership trajectory. But even though more recent reports have been published, a longitudinal, objective resource has not yet emerged to document the bulk of the industry’s journey through one of the most disruptive media environments in recent history.

There is a need for a new or existing entity to create cohesive data points; a third party that collects and reports listening across all major hosts and distribution points, or “podcatchers,” as they’re colloquially called. As a small example: Wouldn’t it be nice to objectively track seasonal listening of news/talk programming and schedule media planning and flighting around that? Or to know what the demographics of that audience look like compared to other verticals?

What percentage increase in efficiency and/or volume would you gain from your marketing efforts in the channel? For most brands, would that delta justify paying a nominal or ongoing licensing or research fee?

These challenges aren’t just affecting advertisers. David Cohn, VP of Sales at Megaphone, agrees that “full transparency from the listening platforms would make our jobs easier, along with everyone else’s in the industry. We’d love to know how much of an episode is listened to, whether an ad is skipped, etc. Along the same lines, having a central source for [audience] measurement would be ideal — similar to what Nielsen has been for TV.” This would also enable us to understand cross-show ad frequency, another black box for advertisers and the industry at large.

Podcast advertising has a business intelligence gap

There are sizable, meaningful gaps in the collection and publication of podcast listening and engagement statistics. Coupled with advertising technology that is still developing because of the distributed nature of the medium, this creates uncertainty about user consumption, ad exposure and ad impact. There is also a lot of misinformation and misconception about the challenges marketers face in these channels.

All of this compounds to delay ad revenue growth for creators, publishers and networks by inhibiting new and scaling advertising investment, resulting in lost opportunity among all parties invested in the channel. There’s a viable opportunity for a collective of industry professionals to collaborate on a solution for unified, free reporting, or a new business venture that collects and publishes more comprehensive data that ultimately promotes growth for podcast advertising.

Podcasts have always had challenges when it comes to the analytics behind distribution, consumption and conversion. For an industry projected to exceed $1 billion in ad spend in 2021, it’s impressive that it’s built on RSS: a stable but decades-old technology whose acronym literally stands for “really simple syndication.” Native to the technology is a one-way data flow, which democratizes the medium from a publishing perspective and makes it easy for creators to share content, but difficult for advertisers trying to measure performance and figure out where to invest ad dollars. This is compounded by a fractured creator, server and distribution/endpoint environment unique to the medium.
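
To see just how one-way that data flow is, here is a minimal sketch using the third-party feedparser library; the feed URL is a placeholder. Anyone can pull the feed and its audio enclosures, but nothing about who listened ever flows back to the publisher beyond raw server logs.

```python
# Minimal sketch of the one-way nature of podcast RSS. The feed URL is a
# placeholder; downloading the enclosure is the only signal the publisher's
# server ever sees.
import feedparser  # pip install feedparser

feed = feedparser.parse("https://example.com/podcast/feed.xml")

print(feed.feed.get("title", "unknown show"))
for entry in feed.entries[:5]:
    # Each item points at an audio file hosted by the publisher or its host.
    for enclosure in entry.get("enclosures", []):
        print(entry.get("title"), "->", enclosure.get("href"))
```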

For creators, podcasting has begun to normalize distribution analytics through a rising consolidation of hosts like Art19, Megaphone and Simplecast, and influence from the IAB. For advertisers, though, consumption and conversion analytics still lag far behind. For the high-growth tech companies we work with, and as performance marketers ourselves, measuring the return on investment of our ad spend is paramount.

Because podcasts lag other media channels in business intelligence, it’s still an underinvested channel relative to its ability to reach consumers and impact purchasing behavior. This was evidenced when COVID-19 hit this year, as advertisers that were highly invested or highly interested in investing in podcast advertising asked a very basic question: “Is COVID-19, and its associated lifestyle shifts, affecting podcast listening? If so, how?”

The challenges of decentralized podcast ad data

We reached out to trusted partners to ask them for insights specific to their shows.

Nick Southwell-Keely, U.S. director of Sales & Brand Partnerships at Acast, said: “We’re seeing our highest listens ever even amid the pandemic. Across our portfolio, which includes more than 10,000 podcasts, our highest listening days in Acast history have occurred in [July].” Most partners provided similar anecdotes, but without centralized data there was no single firm to go to for an answer, nor one report to read that would cover 100% of the space. Perhaps more importantly, there is no third-party perspective to validate any of the anecdotal information shared with us.

Publishers, agencies and firms all scrambled to answer the question. Even still, months later, we don’t have a substantial and unifying update on exactly what, if anything, happened, or if it’s still happening, channel-wide. Rather, we’re still checking in across a wide swath of partners to identify and capitalize on microtrends. Contrast this to native digital channels like paid search and paid social, and connected, yet formerly “traditional” media (e.g., TV, CTV/OTT) that provide consolidated reports that marketers use to make decisions about their media investments.

The lasting murkiness surrounding podcast media behavior during COVID-19 is just one recent case study on the challenges of a decentralized (or nonexistent) universal research vendor/firm, and how it can affect advertisers’ bottom lines. A more common illustration: an advertiser pulls out of a campaign for fear of under-delivery on a flat-rate unit, missing out on incremental growth because it worried it couldn’t get the download reporting to confirm it got what it paid for. It’s these kinds of basic shortcomings that the ad industry needs to account for before we can hit and exceed the ad revenue heights projected for podcasting.

Advertisers may pull out of campaigns for fear of under-delivery, missing out on incremental growth because they were worried about not getting what they paid for.

If there’s a silver lining to the uncertainty in podcast advertising metrics and intelligence, it’s that super-savvy growth marketers have embraced the nascent medium and allowed it to do what it does best: personalized endorsements that drive conversions. While more data will increase demand and corresponding ad premiums, for now, podcast advertising “veterans” are enjoying the relatively low profile of the space.

As Ariana Martin, senior manager, Offline Growth Marketing at Babbel notes, “On the other hand, podcast marketing, through host read ads, has something personal to it, which might change over time and across different podcasts. Because of this personal element, I am not sure if podcast marketing can ever be transformed into a pure data game. Once you get past the understanding that there is limited data in podcasting, it is actually very freeing as long as you’re seeing a certain baseline of good results, [such as] sales attributed to podcast [advertising] via [survey based methodology], for example.”

So how do we grow from the industry feeling like a secret game-changing channel for a select few brands, to widespread adoption across categories and industries?

Below, we’ve laid out the challenges of nonuniversal data within the podcast space, and how that hurts advertisers, publishers, third-party research/tracking organizations, and broadly speaking, the podcast ecosystem. We’ve also outlined the steps we’re taking to make incremental solutions, and our vision for the industry moving forward.

Lingering misconceptions about podcast measurement

1. Download standardization

In search of an explanation for why such a buzzworthy growth channel lags behind more established media types in advertising revenue, many articles point to “listener” or “download” numbers not being normalized. As far as we can tell at Right Side Up, where we power most of the scaled programs run by direct advertisers (making us a top-three DR buying force in the industry), the majority of publishers have adopted the IAB Podcast Measurement Technical Guidelines Version 2.0.

This widespread adoption solved the “apples to apples” problem of different networks and shows valuing a variable, nonstandard “download” as an underlying component of their CPM calculations. Prior to this adoption, it simply wasn’t known whether a “download” from publisher X was equal to a “download” from publisher Y, making it difficult to aim for a particular CPM as a forecasting tool for performance marketing success.

However, the IAB 2.0 guidelines don’t completely solve the unique-user identification problem, as Dave Zohrob, CEO of Chartable, points out: “Having some sort of anonymized user identifier to better calculate audience size would be fantastic — the IAB guidelines offer a good approximation given the data we have but [it] would be great to actually know how many listeners are behind each IP/user-agent combo.”
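
As a rough illustration of what guideline-style counting looks like in practice, here is a toy sketch that deduplicates download requests by a hashed IP/user-agent pair per episode per day. The log records are invented, and the real IAB spec is considerably more involved (bot filtering, minimum bytes served, and so on).

```python
# Toy illustration of IAB-style download deduplication: the same
# IP/user-agent requesting the same episode within a day counts once.
# Log records are invented; the real guidelines also filter bots and
# require a minimum portion of the file to be served.
import hashlib
from datetime import datetime

requests = [
    ("2020-10-01T08:01:00", "203.0.113.7", "AppleCoreMedia/1.0", "ep-101.mp3"),
    ("2020-10-01T08:45:00", "203.0.113.7", "AppleCoreMedia/1.0", "ep-101.mp3"),  # duplicate
    ("2020-10-01T09:10:00", "198.51.100.4", "Spotify/8.5", "ep-101.mp3"),
    ("2020-10-02T07:30:00", "203.0.113.7", "AppleCoreMedia/1.0", "ep-101.mp3"),  # new day
]

unique = set()
for ts, ip, user_agent, episode in requests:
    day = datetime.fromisoformat(ts).date().isoformat()
    # Anonymized listener key: a hash of IP + user agent, in the spirit of the
    # identifier Chartable's CEO describes, so raw IPs never need to be stored.
    listener = hashlib.sha256(f"{ip}|{user_agent}".encode()).hexdigest()[:16]
    unique.add((day, listener, episode))

print(f"{len(requests)} requests -> {len(unique)} countable downloads")
```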

2. Proof of ad delivery

A second area of business intelligence gaps that many articles point to as a cause of inhibited growth is a lack of “proof of delivery.” Ad impressions are unverifiable, and the channel doesn’t have post logs, so for podcast advertisers the analogous evidence of spots running is access to “airchecks,” or audio clippings of the podcast ads themselves.

Legacy podcast advertisers remember when a full-time team of entry-level staffers would hassle networks via phone or email for airchecks, sometimes not receiving verification that the spot had run until a week or more after the fact. This delay in the ability to accurately report spend hampered fast-moving performance marketers and gave the illusion of podcasts being a slow, stiff, immovable media type.

Systematic aircheck collection has been a huge advance and has increased confidence in the space — not only for spend verification, but also for creative compliance and optimization. Interestingly, this capability has emerged almost as a byproduct of other development, as the companies that offer these services have different core business focuses: Magellan AI, our preferred partner, is primarily a competitive intelligence platform but pivoted to also offer airchecking services after realizing what a pain point it was for advertisers; Veritone is an AI company that has tied this service to its ad agency, Veritone One; and Podsights is a pixel-based attribution modeling solution.
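
For readers unfamiliar with pixel-based attribution, here is a deliberately simplified, hypothetical sketch of the general idea: match the IPs that were served an ad-carrying episode against IPs that later hit the advertiser’s site pixel within an attribution window. It is not any vendor’s actual methodology, and all records are invented.

```python
# Simplified, hypothetical illustration of pixel-based podcast attribution.
# NOT any vendor's actual methodology; all records are invented.
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(days=7)

ad_impressions = {            # ip -> time the ad-carrying episode was served
    "203.0.113.7": datetime(2020, 10, 1, 8, 0),
    "198.51.100.4": datetime(2020, 10, 1, 9, 0),
}
site_visits = [               # (ip, time the advertiser's pixel fired)
    ("203.0.113.7", datetime(2020, 10, 3, 20, 15)),
    ("192.0.2.99", datetime(2020, 10, 4, 11, 0)),   # never heard the ad
]

attributed = [
    ip for ip, visit_time in site_visits
    if ip in ad_impressions
    and timedelta(0) <= visit_time - ad_impressions[ip] <= ATTRIBUTION_WINDOW
]
print("Attributed visits:", attributed)   # ['203.0.113.7']
```

In practice this matching is probabilistic, since households share IPs and mobile carriers rotate them, which is part of why podcast attribution remains modeling rather than deterministic tracking.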

3. Competitive intelligence

Last, competitive intelligence and media research continue to be a challenge. Magellan AI and Podsights offer a variety of free and paid tiers and methods of reporting that show a subset of the industry’s activity. You can search a show, advertiser or category and get a less-than-whole, but still directionally useful, picture of relevant podcast advertising activity. While not perfect, there are sufficient resources to at least see the tip of the industry iceberg as a consideration point for your business decision to enter podcasts or not.

As Sean Creeley, founder of Podsights, aptly points out: “We give all Podsights research data, analysis, posts, etc. away for free because we want to help grow the space. If [a brand], as a DIY advertiser, desired to enter podcasting, it’s a downright daunting task. Research at least lets them understand what similar companies in their space are doing.”

There is also a nontech tool that publishers would find valuable. When we asked Shira Atkins, co-founder of Wonder Media Network, how she approaches research in the space, she had a not-at-all-surprising, but very refreshing response: “To be totally honest, the ‘research’ I do is texting and calling the 3-5 really smart sales people I know and love in the space. The folks who were doing radio sales when I was still in high school, and the podcast people who recognize the messiness of it all, but have been successful at scaling campaigns that work for both the publisher and the advertiser. I wish there was a true tracker of cross-industry inventory — how much is sold versus unsold. The way I track the space writ large is by listening to a sample set of shows from top publishers to get a sense for how they’re selling and what their ads are like.”

Even though podcast advertising is no longer limited by download standardization, spend verification and competitive research, there are still hurdles that the channel has not yet overcome.


The conclusion to this article, These 3 factors are holding back podcast monetization, is available exclusively to Extra Crunch subscribers.

Will automation eliminate data science positions?

“Will automation eliminate data science positions?”

This is a question I’m asked at almost every conference I attend, and it usually comes from someone from one of two groups with a vested interest in the answer: The first is current or aspiring practitioners who are wondering about their future employment prospects. The second consists of executives and managers who are just starting on their data science journey.

They have often just heard that Target can determine whether a customer is pregnant from her shopping patterns and are hoping for similarly powerful tools for their data. And they have heard the latest automated-AI vendor pitch that promises to deliver what Target did (and more!) without data scientists. We argue that automation and better data science tooling will not eliminate or even reduce data science positions (including for use cases like the Target story). If anything, they will create more of them!

Here’s why.

Understanding the business problem is the biggest challenge

The most important question in data science is not which machine learning algorithm to choose or even how to clean your data. It is the questions you need to ask before even one line of code is written: What data do you choose and what questions do you choose to ask of that data?

What is missing (or wishfully assumed) from the popular imagination is the ingenuity, creativity and business understanding that goes into those tasks. Why do we care if our customers are pregnant? Target’s data scientists had built upon substantial earlier work to understand why this was a lucrative customer demographic primed to switch retailers. Which datasets are available and how can we pose scientifically testable questions of those datasets?

Target’s data science team happened to have baby registry data tied to purchasing history and knew how to tie that to customer spending. How do we measure success? Formulating nontechnical requirements into technical questions that can be answered with data is amongst the most challenging data science tasks — and probably the hardest to do well. Without experienced humans to formulate these questions, we would not be able to even start on the journey of data science.

Making your assumptions

After formulating a data science question, data scientists need to outline their assumptions. This often manifests itself in the form of data munging, data cleaning and feature engineering. Real-world data are notoriously dirty and many assumptions have to be made to bridge the gap between the data we have and the business or policy questions we are seeking to address. These assumptions are also highly dependent on real-world knowledge and business context.

In the Target example, data scientists had to make assumptions about proxy variables for pregnancy, realistic time frame of their analyses and appropriate control groups for accurate comparison. They almost certainly had to make realistic assumptions that allowed them to throw out extraneous data and correctly normalize features. All of this work depends critically on human judgment. Removing the human from the loop can be dangerous as we have seen with the recent spate of bias-in-machine-learning incidents. It is perhaps no coincidence that many of them revolve around deep learning algorithms that make some of the strongest claims to do away with feature engineering.

So while parts of core machine learning are automated (in fact, we even teach some of the ways to automate those workflows), the data munging, data cleaning and feature engineering (which comprise 90% of the real work in data science) cannot be safely automated away.
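
A small synthetic sketch of that division of labor: the assumptions live in the feature engineering a human writes, while the model search underneath can be handed to tooling such as scikit-learn’s GridSearchCV. The columns, proxy variable and thresholds here are invented.

```python
# Synthetic sketch: human-chosen features and proxy variables feed an
# automated model/hyperparameter search. Columns and thresholds are invented.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# --- Human judgment: pick proxies, aggregation windows and labels ----------
raw = pd.DataFrame({
    "customer_id":       [1, 1, 2, 2, 3, 3, 4, 4],
    "spend":             [20, 35, 0, 5, 120, 80, 10, 0],
    "bought_baby_items": [0, 1, 0, 0, 1, 1, 0, 0],   # assumed proxy variable
})
features = raw.groupby("customer_id").agg(
    total_spend=("spend", "sum"),
    proxy_signal=("bought_baby_items", "max"),
)
target = (features["total_spend"] > 50).astype(int)   # toy label definition

# --- Automation: the machine searches models and hyperparameters -----------
pipeline = Pipeline([("scale", StandardScaler()),
                     ("clf", LogisticRegression())])
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
search.fit(features[["total_spend", "proxy_signal"]], target)
print("Best hyperparameters:", search.best_params_)
```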

A historical analogy

There is a clear precedent in history to suggest data science will not be automated away. There is another field where highly trained humans are crafting code to make computers perform amazing feats. These humans are paid a significant premium over others who are not trained in this field and (perhaps not surprisingly) there are education programs specializing in training this skill. The resulting economic pressure to automate this field is equally, if not more, intense. This field is software engineering.

Indeed, as software engineering has become easier, the demand for programmers has only grown. This paradox is not new: automation increases productivity, driving down prices and ultimately driving up demand. We’ve seen it again and again in fields ranging from software engineering to financial analysis to accounting. Data science is no exception, and automation will likely drive demand for this skill set up, not down.

As the pandemic creates supply chain chaos, Craft raises $10M to apply some intelligence

During the COVID-19 pandemic, supply chains have suddenly become hot. Who knew that would ever happen? The race to secure PPE, ventilators and minor things like food was, and still is, an enormous issue. But perhaps predictably, the world of “supply chain software” could use some updating. Most of the platforms are deployed “empty” and require clients to populate them with their own data, or “bring their own data.” The UIs can be outdated and still have to be juggled alongside manual and offline workflows. So startups working in this space are now attracting some timely attention.

Thus, Craft, the enterprise intelligence company, today announces that it has closed a $10 million Series A financing to build what it characterizes as a ‘supply chain intelligence platform’. With the new funding, Craft will expand its offices in San Francisco, London, and Minsk, and grow remote teams across engineering, sales, marketing and operations in North America and Europe.

It competes with some large incumbents, such as Dun & Bradstreet, Bureau van Dijk and Thomson Reuters. These are traditional data providers focused primarily on financial data about public companies, rather than real-time data on areas such as operating metrics, human capital and risk metrics.

The idea is to allow companies to monitor and optimize their supply chain and enterprise systems. The financing was led by High Alpha Capital, alongside Greycroft. Craft also has some high-flying angel investors, including Sam Palmisano, chairman of the Center for Global Enterprise and former CEO and chairman of IBM; Jim Moffatt, former CEO of Deloitte Consulting; Frederic Kerrest, executive vice-chairman, COO and co-founder of Okta; and Uncork Capital, which previously led Craft’s seed financing. High Alpha partner Kristian Andersen is joining Craft’s board of directors.

The problem Craft is attacking is a lack of visibility into complex global supply chains. For obvious reasons, COVID-19 disrupted global supply chains, revealing plenty of risk, structural weaknesses across industries and a lack of intelligence about how it all holds together. Craft’s solution is a proprietary data platform, API and portal that integrates into existing enterprise workflows.

While many business intelligence products require clients to bring their own data, Craft’s data platform comes pre-deployed with data from thousands of financial and alternative sources, including 300+ data points that are refreshed using both machine learning and human validation. Its open-to-the-web company profiles appear in 50 million search results, for instance.

Ilya Levtov, co-founder and CEO of Craft said in a statement: “Today, we are focused on providing powerful tracking and visibility to enterprise supply chains, while our ultimate vision is to build the intelligence layer of the enterprise technology stack.”

Kristian Andersen, partner with High Alpha commented: “We have a deep conviction that supply chain management remains an underinvested and under-innovated category in enterprise software.”

In the first half of 2020, Craft claims its revenues have grown nearly threefold, with Fortune 100 companies, government and military agencies, and SMEs among its clients.

IoT and data science will boost foodtech in the post-pandemic era

Even as e-grocery usage has skyrocketed in our coronavirus-catalyzed world, brick-and-mortar grocery stores have soldiered on. While strict in-store safety guidelines may gradually ease up, the shopping experience will still be low-touch and socially distanced for the foreseeable future.

This raises the question: With even greater challenges than pre-pandemic, how can grocers ensure their stores continue to operate profitably?

Just as micro-fulfillment centers (MFCs), dark stores and other fulfillment solutions have been helping e-grocers optimize profitability, a variety of old and new technologies can help brick-and-mortar stores remain relevant and continue churning out cash.

Today, we present three “must-dos” for post-pandemic retail grocers: rely on the data, rely on the biology and rely on the hardware.

Rely on the data

Image Credits: Pixabay/Pexels

The hallmark of shopping in a store is the consistent availability and wide selection of fresh items — often more so than online. But as the number of in-store customers continues to fluctuate, planning inventory and minimizing waste has become even more of a challenge for grocery store managers. Grocers on average throw out more than 12% of their on-shelf produce, which eats into already razor-thin margins.

While e-grocers are automating and optimizing their fulfillment operations, brick-and-mortar grocers can automate and optimize their inventory planning mechanisms. To do this, they must leverage their existing troves of customer, business and external data to glean valuable insights for store managers.

Walmart’s Eden technology is a pioneering example. Spun out of a company hackathon project, the internal tool has been deployed at over 43 distribution centers nationwide and promises to save Walmart over $2 billion in the coming years. For instance, if a batch of produce intended for a store hundreds of miles away is deemed likely to ripen soon, the tool can help divert it to the nearest store instead, using FDA standards and over 1 million images to drive its analysis.

Similarly, ventures such as Afresh Technologies and Shelf Engine have built platforms to leverage years of historical customer and sales data, as well as seasonality and other external factors, to help store managers determine how much to order and when. The results have been nothing but positive — Shelf Engine customers have increased gross margins by over 25% and Afresh customers have reduced food waste by up to 45%.
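
As a toy illustration of that kind of data-driven ordering (not Afresh’s or Shelf Engine’s actual models), the sketch below forecasts tomorrow’s demand from weekday-level sales history and orders only what the shelf needs; the sales figures, safety stock and on-hand counts are invented.

```python
# Toy sketch of data-driven produce ordering: forecast tomorrow's demand from
# recent weekday-specific sales and order only what the shelf needs.
import pandas as pd

sales = pd.DataFrame({
    "date": pd.date_range("2020-09-01", periods=28, freq="D"),
    "units_sold": [30, 32, 31, 35, 50, 80, 75] * 4,   # weekly seasonality
})
sales["weekday"] = sales["date"].dt.dayofweek

# Forecast = average sales for the same weekday over the last four weeks.
weekday_avg = sales.groupby("weekday")["units_sold"].mean()

tomorrow = sales["date"].max() + pd.Timedelta(days=1)
forecast = weekday_avg[tomorrow.dayofweek]

on_hand, safety_stock = 12, 5
order_qty = max(0, round(forecast + safety_stock - on_hand))
print(f"Forecast {forecast:.0f} units; order {order_qty}")
```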

Four steps for drafting an ethical data practices blueprint

In 2019, UnitedHealthcare’s health-services arm, Optum, rolled out a machine learning algorithm to 50 healthcare organizations. With the aid of the software, doctors and nurses were able to monitor patients with diabetes, heart disease and other chronic ailments, as well as help them manage their prescriptions and arrange doctor visits. Optum is now under investigation after research revealed that the algorithm (allegedly) recommends paying more attention to white patients than to sicker Black patients.

Today’s data and analytics leaders are charged with creating value with data. Given their skill set and purview, they are also in the organizationally unique position to be responsible for spearheading ethical data practices. Lacking an operationalizable, scalable and sustainable data ethics framework raises the risk of bad business practices, violations of stakeholder trust, damage to a brand’s reputation, regulatory investigation and lawsuits.

Here are four key practices that chief data officers/scientists and chief analytics officers (CDAOs) should employ when creating their own ethical data and business practice framework.

Identify an existing expert body within your organization to handle data risks

The CDAO must identify and execute on the economic opportunity for analytics, and with opportunity comes risk. Whether the use of data is internal — for instance, increasing customer retention or supply chain efficiencies — or built into customer-facing products and services, these leaders need to explicitly identify and mitigate risk of harm associated with the use of data.

A great way to begin to build ethical data practices is to look to existing groups, such as a data governance board, that already tackles questions of privacy, compliance and cyber-risk, to build a data ethics framework. Dovetailing an ethics framework with existing infrastructure increases the probability of successful and efficient adoption. Alternatively, if no such body exists, a new body should be created with relevant experts from within the organization. The data ethics governing body should be responsible for formalizing data ethics principles and operationalizing those principles for products or processes in development or already deployed.

Ensure that data collection and analysis are appropriately transparent and protect privacy

All analytics and AI projects require a data collection and analysis strategy. Ethical data collection must, at a minimum, include: securing informed consent when collecting data from people; ensuring legal compliance, such as adhering to GDPR; anonymizing personally identifiable information so that it cannot reasonably be reverse-engineered to reveal identities; and protecting privacy.
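
As a minimal sketch of what the anonymization step can look like in practice, the snippet below replaces a direct identifier with a keyed hash; the secret key and record are placeholders. Keyed hashing is pseudonymization rather than full anonymization, since quasi-identifiers left in the data can still enable re-identification.

```python
# Minimal sketch of pseudonymizing a direct identifier before analysis.
# Keyed hashing is pseudonymization, not full anonymization: quasi-identifiers
# left in the data can still allow re-identification.
import hashlib
import hmac

SECRET_KEY = b"rotate-me-and-store-in-a-vault"   # placeholder key

def pseudonymize(identifier: str) -> str:
    """Replace an email/user ID with a stable keyed hash."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"email": "jane@example.com", "age_band": "35-44", "churned": False}
record["user_key"] = pseudonymize(record.pop("email"))   # drop the raw email
print(record)
```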

Some of these standards, like privacy protection, do not necessarily have a hard and fast level that must be met. CDAOs need to assess the right balance between what is ethically wise and how their choices affect business outcomes. These standards must then be translated to the responsibilities of product managers who, in turn, must ensure that the front-line data collectors act according to those standards.

CDAOs also must take a stance on algorithmic ethics and transparency. For instance, should an AI-driven search function or recommender system strive for maximum predictive accuracy, providing a best guess as to what the user really wants? Is it ethical to micro-segment, limiting the results or recommendations to what other “similar people” have clicked on in the past? And is it ethical to include results or recommendations that are not, in fact, predictive, but profit-maximizing to some third party? How much algorithmic transparency is appropriate, and how much do users care? A strong ethical blueprint requires tackling these issues systematically and deliberately, rather than pushing these decisions down to individual data scientists and tech developers that lack the training and experience to make these decisions.

Anticipate – and avoid – inequitable outcomes

Division and product managers need guidance on how to anticipate inequitable and biased outcomes. Inequities and biases can arise simply from data collection imbalances — for instance, a facial recognition tool that has been trained on 100,000 male faces and 5,000 female faces will likely perform differently across genders. CDAOs must help ensure balanced and representative data sets.
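
Here is a simple balance check of the kind a CDAO might mandate before training, using the hypothetical face-dataset counts above; the reweighting shown is a stopgap, and collecting more data from the underrepresented group is usually the better fix.

```python
# Quick sketch of a group-balance check and inverse-frequency reweighting;
# counts mirror the hypothetical face dataset described above.
from collections import Counter

labels = ["male"] * 100_000 + ["female"] * 5_000
counts = Counter(labels)
total = sum(counts.values())

print("Group share of training data:")
for group, n in counts.items():
    print(f"  {group}: {n} ({n / total:.1%})")

# Weights so each group contributes equally to the loss during training.
weights = {group: total / (len(counts) * n) for group, n in counts.items()}
print("Per-example weights:", weights)
```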

Other biases are less obvious, but just as important. In 2019, Apple Card and Goldman Sachs were accused of gender bias when extending higher credit lines to men than women. Though Goldman Sachs maintained that creditworthiness — not gender — was the driving factor in credit decisions, the fact that women have historically had fewer opportunities to build credit likely meant that the algorithm favored men.

To mitigate inequities, CDAOs must help tech developers and product managers alike navigate what it means to be fair. While computer science literature offers myriad metrics and definitions of fairness, developers cannot reasonably choose one in the absence of collaborations with the business managers and external experts who can offer deep contextual understanding of how data will eventually be used. Once standards for fairness are chosen, they must also be effectively communicated to data collectors to ensure adherence.
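
As an illustration of two common definitions from that fairness literature, the sketch below computes a demographic-parity gap (difference in approval rates) and an equal-opportunity gap (difference in true-positive rates); the decisions, outcomes and groups are invented.

```python
# Sketch of two common fairness checks: demographic parity (approval-rate gap)
# and equal opportunity (true-positive-rate gap). All data is invented.
import numpy as np

approved = np.array([1, 1, 0, 1, 0, 0, 1, 0])   # model's credit decisions
qualified = np.array([1, 1, 0, 1, 1, 0, 1, 1])  # ground-truth creditworthiness
group = np.array(["m", "m", "m", "m", "f", "f", "f", "f"])

def approval_rate(mask):
    return approved[mask].mean()

# Demographic parity: compare approval rates across groups.
dp_gap = approval_rate(group == "m") - approval_rate(group == "f")

# Equal opportunity: compare approval rates among the truly qualified only.
def true_positive_rate(g):
    mask = (group == g) & (qualified == 1)
    return approved[mask].mean()

eo_gap = true_positive_rate("m") - true_positive_rate("f")
print(f"Approval-rate gap: {dp_gap:.2f}, TPR gap: {eo_gap:.2f}")
```

Which metric matters, and what gap is tolerable, is exactly the contextual judgment the article argues cannot be left to individual developers.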

Align organizational structure with the process for identifying ethical risk

CDAOs often build analytics capacity in one of two ways: via a center of excellence, in service to an entire organization, or a more distributed model, with data scientists and analytics investments committed to specific functional areas, such as marketing, finance or operations. Regardless of organizational structure, the processes and rubrics for identifying ethical risk must be clearly communicated and appropriately incentivized.

Key steps include:

  • Clearly establishing accountability by creating linkages from the data ethics body to departments and teams. This can be done by having each department or team designate its own “ethics champion” to monitor ethics issues. Champions need to be able to elevate concerns to the data ethics body, which can advise on mitigation strategies, such as augmenting existing data, improving transparency or creating a new objective function.
  • Ensuring consistent definitions and processes across teams through education and training around data and AI ethics.
  • Broadening teams’ perspectives on how to identify and remediate ethical problems by facilitating collaborations across internal teams and sharing examples and research from other domains.
  • Creating incentives — financial or other recognitions — to build a culture that values the identification and mitigation of ethical risk.

CDAOs are charged with the strategic use and deployment of data to drive revenue with new products and to create greater internal consistencies. Too many business and data leaders today attempt to “be ethical” by simply weighing the pros and cons of decisions as they arise. This short-sighted view creates unnecessary reputational, financial and organizational risk. Just as a strategic approach to data requires a data governance program, good data governance requires an ethics program. Simply put, good data governance is ethical data governance.

Google Cloud opens its Seoul region

Google Cloud today announced that its new Seoul region, its first in Korea, is now open for business. The region, which the company first talked about last April, will feature three availability zones and support for virtually all of Google Cloud’s standard services, ranging from Compute Engine to BigQuery, Bigtable and Cloud Spanner.

With this, Google Cloud now has a presence in 16 countries and offers 21 regions with a total of 64 zones. The Seoul region (with the memorable name of asia-northeast3) will complement Google’s other regions in the area, including two in Japan, as well as regions in Hong Kong and Taiwan, but the obvious focus here is on serving Korean companies with low-latency access to its cloud services.

“As South Korea’s largest gaming company, we’re partnering with Google Cloud for game development, infrastructure management, and to infuse our operations with business intelligence,” said Chang-Whan Sul, the CTO of Netmarble. “Google Cloud’s region in Seoul reinforces its commitment to the region and we welcome the opportunities this initiative offers our business.”

Over the course of this year, Google Cloud also plans to open more zones and regions in Salt Lake City, Las Vegas and Jakarta, Indonesia.

Facebook’s use of Onavo spyware faces questions in EU antitrust probe — report

Facebook’s use of the Onavo spyware VPN app it acquired in 2013 — and used to inform its 2014 purchase of the then rival WhatsApp messaging platform — is on the radar of Europe’s antitrust regulator, per a report in the Wall Street Journal.

The newspaper reports that the Commission has requested a large volume of internal documents as part of a preliminary investigation into Facebook’s data practices which was announced in December.

The WSJ cites people familiar with the matter who told it the regulator’s enquiry is focused on allegations Facebook sought to identify and crush potential rivals and thereby stifle competition by leveraging its access to user data.

Facebook announced it was shutting down Onavo a year ago — in the face of rising controversy about its use of the VPN tool as a data-gathering business intelligence dragnet that’s both hostile to user privacy and raises major questions about anti-competitive practices.

As recently as 2018 Facebook was still actively pushing Onavo at users of its main social networking app — marketing it under a ‘Protect’ banner intended to convince users that the tool would help them protect their information.

In fact the VPN allowed Facebook to monitor their activity across third party apps — enabling the tech giant to spot emerging trends across the larger mobile ecosystem. (So, as we’ve said before, ‘Protect Facebook’s business’ would have been a more accurate label for the tool.)

By the end of 2018, further details about how Facebook had used Onavo as a key intelligence lever in major acquisitions emerged when a UK parliamentary committee obtained a cache of internal documents related to a US court case brought by a third-party developer, which had filed suit alleging unfair treatment on Facebook’s app platform.

UK parliamentarians concluded that Facebook used Onavo to conduct global surveys of the usage of mobile apps by customers, apparently without their knowledge — using the intel to assess not just how many people had downloaded apps but how often they used them, which in turn helped the tech giant to decide which companies to acquire and which to treat as a threat.

The parliamentary committee went on to call for competition and data protection authorities to investigate Facebook’s business practices.

So it’s not surprising that Europe’s competition commission should also be digging into how Facebook used Onavo. The Commission has also been reviewing changes Facebook made to its developer APIs, which affected what information it made available, per the WSJ’s sources.

Internal documents published by the UK parliament also highlighted developer access issues — such as Facebook’s practice of whitelisting certain favored developers’ access to user data, raising questions about user consent to the sharing of their data — as well as fairness vis-a-vis non-whitelisted developers.

According to the newspaper’s report the regulator has requested a wide array of internal Facebook documents as part of its preliminary investigation, including emails, chat logs and presentations. It says Facebook’s lawyers have pushed back — seeking to narrow the discovery process by arguing that the request for info is so broad it would produce millions of documents and could reveal Facebook employees’ personal data.

Some of the WSJ’s sources also told it the Commission has withdrawn the original order and intends to issue a narrower request.

We’ve reached out to Facebook and the competition regulator for comment.

Back in 2017, the European Commission fined Facebook $122M for providing incorrect or misleading information at the time of the WhatsApp acquisition. Facebook had given the regulator assurances that user accounts could not be linked across the two services — which cleared the way for it to be allowed to acquire WhatsApp — only for the company to U-turn in 2016 by saying it would be linking user data.

In addition to investigating Facebook’s data practices over potential antitrust concerns, the EU’s competition regulator is also looking into Google’s data practices — announcing a preliminary probe in December.