Kleeen raises $3.8M to make front-end design for business applications easy

Building a front-end for business applications is often a matter of reinventing the wheel, but because every business’ needs are slightly different, it’s also hard to automate. Kleeen is the latest startup to attempt this, with a focus on building the user interface and experience for today’s data-centric applications. The service, which was founded by a team that previously ran a UI/UX studio in the Bay Area, uses a wizard-like interface to build the routine elements of the app and frees a company’s designers and developers to focus on the more custom elements of an application.

The company today announced that it has raised a $3.8 million seed round led by First Ray Venture Partners. Leslie Ventures, Silicon Valley Data Capital, WestWave Capital, Neotribe Ventures, AI Fund and a group of angel investors also participated in the round. Neotribe also led Kleeen’s $1.6 million pre-seed round, bringing the company’s total funding to $5.3 million.

Image Credits: Kleeen

After the startup he worked at was sold, Kleeen co-founder, CPO and President Joshua Hailpern told me, he started his own B2B design studio, which focused on front-end design and engineering.

“What we ended up seeing was the same pattern that would happen over and over again,” he said. “We would go into a client, and they would be like: ‘we have the greatest idea ever. We want to do this, this, this and this.’ And they would tell us all these really cool things and we were: ‘hey, we want to be part of that.’ But then what we would end up doing was not that. Because when building products — there’s the showcase of the product and there’s all these parts that support that product that are necessary but you’re not going to win a deal because someone loved that config screen.”

The idea behind Kleeen is that you can essentially tell the system what you are trying to do and what the users need to be able to accomplish — because at the end of the day, there are some variations in what companies need from these basic building blocks, but not a ton. Kleeen can then generate this user interface and workflow for you — and generate the sample data to make this mock-up come to life.

Once that work is done, likely after a few iterations, Kleeen can generate React code, which development teams can then take and work with directly.

Image Credits: Kleeen

As Kleeen co-founder and CEO Matt Fox noted, the platform explicitly doesn’t want to be everything to everybody.

“In the no-code space, to say that you can build any app probably means that you’re not building any app very well if you’re just going to cover every use case. If someone wants to build a Bumble-style phone app where they swipe right and swipe left and find their next mate, we’re not the application platform for you. We’re focused on really data-intensive workflows.” He noted that Kleeen is at its best when developers use it to build applications that help a company analyze and monitor information and, crucially, take action on that information within the app. It’s this last part that also clearly sets it apart from a standard business intelligence platform.

Census raises $16M Series A to help companies put their data warehouses to work

Census, a startup that helps businesses sync their customer data from their data warehouses to their various business tools like Salesforce and Marketo, today announced that it has raised a $16 million Series A round led by Sequoia Capital. Other participants in this round include Andreessen Horowitz, which led the company’s $4.3 million seed round last year, as well as several notable angels, including Figma CEO Dylan Field, GitHub CTO Jason Warner, Notion COO Akshay Kothari and Rippling CEO Parker Conrad.

The company is part of a new crop of startups that are building on top of data warehouses. The general idea behind Census is to help businesses operationalize the data in their data warehouses, which was traditionally only used for analytics and reporting use cases. But as businesses realized that all the data they needed was already available in their data warehouses and that they could use that as a single source of truth without having to build additional integrations, an ecosystem of companies that operationalize this data started to form.

The company argues that the modern data stack, with data warehouses like Amazon Redshift, Google BigQuery and Snowflake at its core, offers all of the tools a business needs to extract and transform data (like Fivetran, dbt) and then visualize it (think Looker).

Tools like Census then essentially function as a new layer that sits between the data warehouse and the business tools that can help companies extract value from this data. With that, users can easily sync their product data into a marketing tool like Marketo or a CRM service like Salesforce, for example.
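
To make that layer concrete, here is a minimal sketch of what a warehouse-to-CRM sync conceptually does. This is not Census’s actual API; the connection string, table, endpoint and field mapping are invented for illustration.

    # Conceptual sketch of a warehouse-to-CRM sync; not Census's actual API.
    # The connection string, table, endpoint and field names are hypothetical.
    import requests
    import sqlalchemy

    WAREHOUSE_URL = "snowflake://user:password@account/analytics"  # hypothetical
    CRM_ENDPOINT = "https://crm.example.com/api/contacts/upsert"   # hypothetical

    engine = sqlalchemy.create_engine(WAREHOUSE_URL)
    with engine.connect() as conn:
        rows = conn.execute(sqlalchemy.text(
            "SELECT email, plan, last_active_at FROM product_users"
        ))
        for email, plan, last_active_at in rows:
            # Map warehouse columns onto the CRM's fields and upsert one record.
            requests.post(CRM_ENDPOINT, json={
                "email": email,
                "custom_fields": {"plan": plan, "last_active_at": str(last_active_at)},
            }, timeout=10)

Products in this category typically add batching, deduplication and sync-state tracking on top of this basic flow, but the core idea is the same: read modeled data out of the warehouse and push it into the tools where teams already work.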

Image Credits: Census

“Three years ago, we were the first to ask, ‘Why are we relying on a clumsy tangle of wires connecting every app when everything we need is already in the warehouse? What if you could leverage your data team to drive operations?’ When the data warehouse is connected to the rest of the business, the possibilities are limitless,” Census explains in today’s announcement. “When we launched, our focus was enabling product-led companies like Figma, Canva, and Notion to drive better marketing, sales, and customer success. Along the way, our customers have pulled Census into more and more scenarios, like auto-prioritizing support tickets in Zendesk, automating invoices in Netsuite, or even integrating with HR systems.”

Census already integrates with dozens of different services and data tools and its customers include the likes of Clearbit, Figma, Fivetran, LogDNA, Loom and Notion.

Looking ahead, Census plans to use the new funding to launch new features like deeper data validation and a visual query experience. It also plans to launch code-based orchestration to make Census workflows versionable and easier to integrate into enterprise orchestration systems.

TigerGraph raises $105M Series C for its enterprise graph database

TigerGraph, a well-funded enterprise startup that provides a graph database and analytics platform, today announced that it has raised a $105 million Series C funding round. The round was led by Tiger Global and brings the company’s total funding to over $170 million.

“TigerGraph is leading the paradigm shift in connecting and analyzing data via scalable and native graph technology with pre-connected entities versus the traditional way of joining large tables with rows and columns,” said TigerGraph founder and CEO Yu Xu. “This funding will allow us to expand our offering and bring it to many more markets, enabling more customers to realize the benefits of graph analytics and AI.”

Current TigerGraph customers include the likes of Amgen, Citrix, Intuit, Jaguar Land Rover and UnitedHealth Group. Using a SQL-like query language (GSQL), these customers can use the company’s services to store and quickly query their graph databases. At the core of its offerings is the TigerGraphDB database and analytics platform, but the company also offers a hosted service, TigerGraph Cloud, with pay-as-you-go pricing, hosted either on AWS or Azure. With GraphStudio, the company also offers a graphical UI for creating data models and visually analyzing them.

The promise for the company’s database services is that they can scale to tens of terabytes of data with billions of edges. Its customers use the technology for a wide variety of use cases, including fraud detection, customer 360, IoT, AI, and machine learning.

Like so many other companies in this space, TigerGraph is enjoying something of a tailwind thanks to the fact that many enterprises have accelerated their digital transformation projects during the pandemic.

“Over the last 12 months with the COVID-19 pandemic, companies have embraced digital transformation at a faster pace driving an urgent need to find new insights about their customers, products, services, and suppliers,” the company explains in today’s announcement. “Graph technology connects these domains from the relational databases, offering the opportunity to shrink development cycles for data preparation, improve data quality, identify new insights such as similarity patterns to deliver the next best action recommendation.”

Use Git data to optimize your developers’ annual reviews

The end of the year is looming, and with it comes one of your most important tasks as a manager. Summarizing the performance of 10, 20 or 50 developers over the past 12 months, offering personalized advice and having the facts to back it up is no small task.

We believe that the only unbiased, accurate and insightful way to understand how your developers are working, progressing and — last but definitely not least — how they’re feeling, is with data. Data can provide more objective insights into employee activity than could ever be gathered by a human.

It’s still very hard for many managers to fully understand that all employees work at different paces and levels.

Consider this: Over two-thirds of employees say they would put more effort into their work if they felt more appreciated, and 90% want a manager who’s fair to all employees.

Let’s be honest. It’s hard to judge all of your employees fairly if you’re (1) unable to work physically side by side with them, meaning you’ll inevitably have more contact with some than with others (e.g., those you’re more friendly with); (2) relying on manual trackers to keep on top of everyone’s work, which can get lost and take a lot of effort to process and analyze; and (3) expecting engineers to self-report their progress, which is far from objective.

It’s also unlikely, especially with the quieter ones, that on top of all that you’ll have identified areas for them to expand their talents by upskilling or reskilling. But it’s that kind of personal attention that will make employees feel appreciated and able to progress professionally with you. Absent that, they’re likely to take the next best job opportunity that shows up.

So here’s a rundown of why you need data to set up a fair annual review process; if not this year, then you can kick-start it for 2021.

1. Use data to set next year’s goals

The best way to track your developers’ progress automatically is by using Git Analytics tools, which track the performance of individuals by aggregating historical Git data and then feeding that information back to managers in minute detail.

This data will clearly show you if one of your engineers is over capacity or underworked and the types of projects they excel in. If you’re assessing an engineering manager and the team members they’re responsible for have been taking longer to push their code to the shared repository, causing a backlog of tasks, it may mean that they’re not delegating tasks properly. An appropriate goal here would be to track and divide their team’s responsibilities more efficiently, which can be tracked using the same metrics, or cross-training members of other teams to assist with their tasks.

Another example is that of an engineer who is dipping their toe into multiple projects. Indicators of where they’ve performed best include churn (we’ll get to that later), coworkers repeatedly asking that same employee to assist them with new tasks and, of course, positive feedback from senior staff, which can easily be integrated into Git analytics tools. These are clear signs that next year, your engineer could be maximizing their talents in these alternative areas, and you could diversify their tasks accordingly.

Once you know what targets to set, you can use analytics tools to create automatic targets for each engineer. That means that after you’ve set them up, you’ll get regular updates on each engineer’s progress, based on indicators pulled directly from the code repository. It won’t need time-consuming input from either you or your employee, allowing you both to focus on more important tasks. As a manager, you’ll receive a full report once the task’s deadline is reached and get notified whenever metrics start dropping or the goal has been met.

This is important — you’ll be able to keep on top of those goals yourself, without having to delegate that responsibility or depend on self-reporting by the engineer. It will keep employee monitoring honest and transparent.

2. Three Git metrics can help you understand true performance quality

The easiest way for managers to “conclude” how an engineer has performed is by looking at superficial output: the number of completed pull requests submitted per week, the number of commits per day, etc. Especially for nontechnical managers, this is a grave but common error. Just because something is done doesn’t mean it’s been done well, or that it’s even productive or usable.

Instead, look at these data points to determine the actual quality of your engineer’s work:

  1. Churn is your number-one red flag, telling you how many times someone has modified their code in the first 21 days after it has been checked in. The more churn, the less of an engineer’s code is actually productive, with good longevity. Churn is a natural and healthy part of the software development process, but we’ve identified that any churn level above the normal 15%-30% indicates that an engineer is struggling with assignments.
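
For readers who want to see roughly how such a metric can be computed, below is an illustrative sketch that approximates churn from raw git log output. It is not any particular Git analytics product, and it works at file level rather than the line-level blame analysis commercial tools use, so treat the numbers as directional only.

    # Rough churn approximation: count the lines an author changes in files they
    # already touched within the previous 21 days, as a share of all lines changed.
    import subprocess
    from datetime import datetime, timedelta

    CHURN_WINDOW = timedelta(days=21)

    def author_churn(repo_path: str, author: str) -> float:
        log = subprocess.run(
            ["git", "-C", repo_path, "log", f"--author={author}",
             "--numstat", "--format=@%at", "--reverse"],
            capture_output=True, text=True, check=True,
        ).stdout

        last_touch = {}            # file path -> datetime of the author's previous change
        churned = total = 0
        when = None
        for line in log.splitlines():
            if line.startswith("@"):                  # commit timestamp line
                when = datetime.fromtimestamp(int(line[1:]))
            elif line.strip():                        # numstat line: added, deleted, path
                added, deleted, path = line.split("\t")
                if added == "-":                      # binary file, no line counts
                    continue
                lines = int(added) + int(deleted)
                total += lines
                prev = last_touch.get(path)
                if prev is not None and when - prev <= CHURN_WINDOW:
                    churned += lines                  # rework shortly after the last change
                last_touch[path] = when
        return churned / total if total else 0.0

    print(f"churn: {author_churn('.', 'jane@example.com'):.0%}")

Compared against a band like the 15%-30% cited above, a number like this is a conversation starter rather than a verdict, but it comes straight from the repository instead of self-reporting.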

AWS adds natural language search service for business intelligence from its data sets

When Amazon Web Services launched QuickSight, its business intelligence service, back in 2016, the company wanted to provide product information and customer information for business users — not just developers.

At the time, the natural language processing technologies available weren’t robust enough to give customers the tools to search databases effectively using queries in plain speech.

Now, as those technologies have matured, Amazon is coming back with a significant upgrade called QuickSight Q, which allows users to just ask a simple question and get the answers they need, according to Andy Jassy’s keynote at AWS re:Invent.

“We will provide natural language to provide what we think the key learning is,” said Jassy. “I don’t like that our users have to know which databases to access or where data is stored. I want them to be able to type into a search bar and get the answer to a natural language question.”

That’s what QuickSight Q aims to do. It’s a direct challenge to a number of business intelligence startups and another instance of the way machine learning and natural language processing are changing business processes across multiple industries.

“The way Q works. Type in a question in natural language [like]… ‘Give me the trailing twelve month sales of product X?’… You get an answer in seconds. You don’t have to know tables or have to know data stores.”

It’s a compelling use case and gets at the way AWS is integrating machine learning to provide more no-code services to customers. “Customers didn’t hire us to do machine learning,” Jassy said. “They hired us to answer the questions.”

Google Analytics update uses machine learning to surface more critical customer data

If you ever doubted the hunger brands have for more and better information about consumers, you only need to look at Twilio buying customer data startup Segment this week for $3.2 billion. Google sees this the same as everyone else, and today it introduced updates to Google Analytics to help companies understand their customers better (especially in conjunction with related Google tools).

Vidhya Srinivasan, vice president of measurement, analytics and buying platforms at Google, wrote in a company blog post introducing the new features that the company sees this changing customer-brand dynamic due to COVID, and it wants to assist by adding new features that help marketers achieve their goals, whatever those may be.

One way to achieve this is by infusing Analytics with machine learning to help highlight data automatically that’s important to marketers using the platform. “[Google Analytics] has machine learning at its core to automatically surface helpful insights and gives you a complete understanding of your customers across devices and platforms,” Srinivasan wrote in the blog post.

The idea behind the update is to give marketers access to the information they care about most by using that machine learning to surface data like which groups of customers are most likely to buy and which are most likely to churn, the very types of information marketing (and sales) teams need to make proactive moves to keep customers from leaving or, conversely, to turn those ready to buy into sales.

Google Analytics predictive metrics flag the customers most likely to churn and those most likely to convert to sales.

Image Credits: Google

If it works as described, it can give marketers a way to measure their performance with each customer or group of customers across their entire lifecycle, which is especially important during COVID when customer needs are constantly changing.

Of course, this being a Google product, it’s designed to play nicely with Google Ads, YouTube and other tools like Gmail and Google Search, along with non-Google channels. As Srinivasan wrote:

The new approach also makes it possible to address longtime advertiser requests. Because the new Analytics can measure app and web interactions together, it can include conversions from YouTube engaged views that occur in-app and on the web in reports. Seeing conversions from YouTube video views alongside conversions from Google and non-Google paid channels, and organic channels like Google Search, social, and email, helps you understand the combined impact of all your marketing efforts.

The company is also trying to futureproof analytics with an eye toward stricter privacy laws like GDPR in Europe or CCPA in California by using modeling to fill in gaps in information when you can’t use cookies or other tracking software.

All of this is designed to help marketers, caught in trying times with a shifting regulatory landscape, to better understand customer needs and deliver them what they want when they want it — when they’re just trying to keep the customers satisfied.

These 3 factors are holding back podcast monetization

Podcast advertising growth is inhibited by three major factors:

  • Lack of macro distribution, consumption and audience data.
  • Current methods of conversion tracking.
  • Idea of a “playbook” for podcast performance marketing.

Because of these limiting factors, it’s currently more of an art than a science to piece together disparate data from multiple sources, firms, agencies and advertisers into a somewhat conclusive argument to brands as to why they should invest in podcast advertising.

1. Lack of macro distribution, consumption and audience data

There were several resources that released updates based on what they saw in terms of consumption when COVID-19 hit. Hosting platforms, publishers and third-party tracking platforms all put out their best guesses as to what was happening. Advertisers’ own podcast listening habits had been upended due to lockdowns; they wanted to know how broader changes in listening habits were affecting their campaigns. Were downloads going up, down or staying the same? What was happening with sports podcasts, without sports?


Read part 1 of this article, Podcast advertising has a business intelligence gap, on TechCrunch.


At Right Side Up, we receive and analyze all of the available research, from major publishers (Stitcher, Acast) to major platforms (Megaphone) and third-party research firms (Podtrac, IAB, Edison Research). However, no single entity encompasses the entire space or provides the kind of interactive, off-the-shelf customizable SaaS product we’d prefer, and that digitally native marketers expect. Plus, there isn’t anything published in real time; most sources publish once or twice annually.

So what did we do? We reached out to trusted publishers and partners to gather data around shifting consumption due to COVID-19 ourselves, and determined that, though there was a drop in downloads in the short term, it was neither as precipitous nor as enduring as some had feared. This was confirmed by some of the early reports available, but how were we to corroborate our own piecewise sample against another? Moreover, how could you invest 6-7 figures of marketing dollars if you didn’t have the firsthand intelligence we gathered and our subject matter experts on deck to make constant adjustments to your approach?

We were able to piece together trends that point to increased download activity in recent months, surpassing February/March heights. We’ve determined that the industry is back on track for growth with a less steep, but still growing, listenership trajectory. But even though more recent reports have been published, a longitudinal, objective resource has not yet emerged to show the industry’s journey through one of the most disruptive media environments in recent history.

There is a need for a new or existing entity to create cohesive data points; a third party that collects and reports listening across all major hosts and distribution points, or “podcatchers,” as they’re colloquially called. As a small example: Wouldn’t it be nice to objectively track seasonal listening of news/talk programming and schedule media planning and flighting around that? Or to know what the demographics of that audience look like compared to other verticals?

What percentage increase in efficiency and/or volume would you gain from your marketing efforts in the channel? Would that delta be profitable against paying a nominal or ongoing licensing or research fee for most brands?

These challenges aren’t just affecting advertisers. David Cohn, VP of Sales at Megaphone, agrees that “full transparency from the listening platforms would make our jobs easier, along with everyone else’s in the industry. We’d love to know how much of an episode is listened to, whether an ad is skipped, etc. Along the same lines, having a central source for [audience] measurement would be ideal — similar to what Nielsen has been for TV.” This would also enable us to understand cross-show ad frequency, another black box for advertisers and the industry at large.

Podcast advertising has a business intelligence gap

There are sizable, meaningful gaps in the knowledge collection and publication of podcast listening and engagement statistics. Coupled with still-developing advertising technology because of the distributed nature of the medium, this causes uncertainty in user consumption and ad exposure and impact. There is also a lot of misinformation and misconception about the challenges marketers face in these channels.

All of this compounds to delay ad revenue growth for creators, publishers and networks by inhibiting new and scaling advertising investment, resulting in lost opportunity among all parties invested in the channel. There’s a viable opportunity for a collective of industry professionals to collaborate on a solution for unified, free reporting, or a new business venture that collects and publishes more comprehensive data that ultimately promotes growth for podcast advertising.

Podcasts have always had challenges when it comes to the analytics behind distribution, consumption and conversion. For an industry projected to exceed $1 billion in ad spend in 2021, it’s impressive that it’s built on RSS: a stable but decades-old technology whose name literally means Really Simple Syndication. Native to the technology is a one-way data flow, which democratizes the medium from a publishing perspective and makes it easy for creators to share content, but difficult for advertisers trying to measure performance and figure out where to invest ad dollars. This is compounded by a fractured creator, server and distribution/endpoint environment unique to the medium.

For creators, podcasting has begun to normalize distribution analytics through a rising consolidation of hosts like Art19, Megaphone and Simplecast, and through influence from the IAB. For advertisers, though, consumption and conversion analytics still lag far behind. For the high-growth tech companies we work with, and as performance marketers ourselves, measuring the return on investment of our ad spend is paramount.

Because podcasting lags other media channels in business intelligence, it’s still an underinvested channel relative to its ability to reach consumers and impact purchasing behavior. This was evidenced when COVID-19 hit this year, as advertisers that were highly invested or highly interested in investing in podcast advertising asked a very basic question: “Is COVID-19, and its associated lifestyle shifts, affecting podcast listening? If so, how?”

The challenges of decentralized podcast ad data

We reached out to trusted partners to ask them for insights specific to their shows.

Nick Southwell-Keely, U.S. director of Sales & Brand Partnerships at Acast, said: “We’re seeing our highest listens ever even amid the pandemic. Across our portfolio, which includes more than 10,000 podcasts, our highest listening days in Acast history have occurred in [July].” Most partners provided similar anecdotes, but without centralized data, there was no one, singular firm to go to for an answer, nor one report to read that would cover 100% of the space. Almost more importantly, there is no third-party perspective to validate any of the anecdotal information shared with us.

Publishers, agencies and firms all scrambled to answer the question. Even still, months later, we don’t have a substantial and unifying update on exactly what, if anything, happened, or if it’s still happening, channel-wide. Rather, we’re still checking in across a wide swath of partners to identify and capitalize on microtrends. Contrast this to native digital channels like paid search and paid social, and connected, yet formerly “traditional” media (e.g., TV, CTV/OTT) that provide consolidated reports that marketers use to make decisions about their media investments.

The lasting murkiness surrounding podcast media behavior during COVID-19 is just one recent case study on the challenges of a decentralized (or nonexistent) universal research vendor/firm, and how it can affect advertisers’ bottom lines. A more common illustration would be an advertiser pulling out of ads for fear of underdelivery on a flat-rate unit, missing out on incremental growth because they were worried they couldn’t get the download reporting to confirm they got what they paid for. It’s these kinds of basic shortcomings that the ad industry needs to account for before we can hit and exceed the ad revenue heights projected for podcasting.

Advertisers may pull out of campaigns for fear of under-delivery, missing out on incremental growth because they were worried about not getting what they paid for.

If there’s a silver lining to the uncertainty in podcast advertising metrics and intelligence, it’s that supersavvy growth marketers have embraced the nascent medium and allowed it to do what it does best: personalized endorsements that drive conversions. While increased data will increase demand and corresponding ad premiums, for now, podcast advertising “veterans” are enjoying the relatively low profile of the space.

As Ariana Martin, senior manager, Offline Growth Marketing at Babbel notes, “On the other hand, podcast marketing, through host read ads, has something personal to it, which might change over time and across different podcasts. Because of this personal element, I am not sure if podcast marketing can ever be transformed into a pure data game. Once you get past the understanding that there is limited data in podcasting, it is actually very freeing as long as you’re seeing a certain baseline of good results, [such as] sales attributed to podcast [advertising] via [survey based methodology], for example.”

So how do we grow from the industry feeling like a secret game-changing channel for a select few brands, to widespread adoption across categories and industries?

Below, we’ve laid out the challenges of nonuniversal data within the podcast space, and how that hurts advertisers, publishers, third-party research/tracking organizations, and broadly speaking, the podcast ecosystem. We’ve also outlined the steps we’re taking to make incremental solutions, and our vision for the industry moving forward.

Lingering misconceptions about podcast measurement

1. Download standardization

In search of a rationale for why such a buzzworthy growth channel lags behind more established media types’ advertising revenue, many articles point to “listener” or “download” numbers not being normalized. As far as we can tell at Right Side Up, where we power most of the scaled programs run by direct advertisers, making us a top three DR buying force in the industry, the majority of publishers have adopted the IAB Podcast Measurement Technical Guidelines Version 2.0.

This widespread adoption solved the “apples to apples” problem as it pertained to different networks/shows valuing a variable, nonstandard “download” as an underlying component of their CPM calculations. Before this widespread adoption, it simply wasn’t known whether a “download” from publisher X was equal to a “download” from publisher Y, making it difficult to aim for a particular CPM as a forecasting tool for performance marketing success.
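
As a back-of-the-envelope illustration of why a standardized download matters (all numbers below are invented), the forecasting math a performance marketer runs looks roughly like this:

    # Illustrative CPM forecast with made-up numbers; the arithmetic is only
    # comparable across publishers if a "download" means the same thing everywhere.
    downloads_per_episode = 80_000   # IAB 2.0-qualified downloads
    cpm = 25.0                       # dollars per 1,000 downloads
    spots = 4                        # ad spots booked

    spend = downloads_per_episode / 1_000 * cpm * spots
    target_cpa = 40.0                # assumed target cost per acquisition
    conversions_needed = spend / target_cpa
    print(f"spend: ${spend:,.0f}; conversions needed at ${target_cpa:.0f} CPA: {conversions_needed:,.0f}")

If publisher X’s “download” quietly counted partial fetches while publisher Y’s did not, the same CPM would buy very different audiences, which is exactly the apples-to-oranges problem the IAB 2.0 guidelines addressed.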

However, the IAB 2.0 guidelines don’t completely solve the unique-user identification problem, as Dave Zohrob, CEO of Chartable points out. “Having some sort of anonymized user identifier to better calculate audience size would be fantastic —  the IAB guidelines offer a good approximation given the data we have but [it] would be great to actually know how many listeners are behind each IP/user-agent combo.”

2. Proof of ad delivery

A second area of business intelligence gaps that many articles point to as a cause of inhibited growth is a lack of “proof of delivery.” Ad impressions are unverifiable, and the channel doesn’t have post logs, so for podcast advertisers the analogous evidence of spots running is access to “airchecks,” or audio clippings of the podcast ads themselves.

Legacy podcast advertisers remember when a full-time team of entry-level staffers would hassle networks via phone or email for airchecks, sometimes not receiving verification that the spot had run until a week or more after the fact. This delay in the ability to accurately report spend hampered fast-moving performance marketers and gave the illusion of podcasts being a slow, stiff, immovable media type.

Systematic aircheck collection has been a huge advance and has allowed for an increase in confidence in the space — not only for spend verification, but also for creative compliance and optimization. Interestingly, this capability has come about almost as a byproduct of other development, as the companies that offer these services have different core business focuses: Magellan AI, our preferred partner, is primarily a competitive intelligence platform, but pivoted to also offer airchecking services after realizing what a pain point it was for advertisers; Veritone, an AI company, has tied this service to its ad agency, Veritone One; and Podsights offers a pixel-based attribution modeling solution.

3. Competitive intelligence

Last, competitive intelligence and media research continue to be a challenge. Magellan AI and Podsights offer a variety of free and paid tiers and methods of reporting that show a subset of the industry’s activity. You can search a show, advertiser or category and get a less-than-whole, but still directionally useful, picture of relevant podcast advertising activity. While not perfect, there are sufficient resources to at least see the tip of the industry iceberg when weighing the business decision of whether to enter podcasts.

As Sean Creeley, founder of Podsights, aptly points out: “We give all Podsights research data, analysis, posts, etc. away for free because we want to help grow the space. If [a brand], as a DIY advertiser, desired to enter podcasting, it’s a downright daunting task. Research at least lets them understand what similar companies in their space are doing.”

There is also a nontech tool that publishers would find valuable. When we asked Shira Atkins, co-founder of Wonder Media Network, how she approaches research in the space, she had a not-at-all-surprising, but very refreshing response: “To be totally honest, the ‘research’ I do is texting and calling the 3-5 really smart sales people I know and love in the space. The folks who were doing radio sales when I was still in high school, and the podcast people who recognize the messiness of it all, but have been successful at scaling campaigns that work for both the publisher and the advertiser. I wish there was a true tracker of cross-industry inventory — how much is sold versus unsold. The way I track the space writ large is by listening to a sample set of shows from top publishers to get a sense for how they’re selling and what their ads are like.”

Even though podcast advertising is no longer limited by download standardization, spend verification and competitive research, there are still hurdles that the channel has not yet overcome.


Will automation eliminate data science positions?

“Will automation eliminate data science positions?”

This is a question I’m asked at almost every conference I attend, and it usually comes from someone from one of two groups with a vested interest in the answer. The first consists of current or aspiring practitioners who are wondering about their future employment prospects. The second consists of executives and managers who are just starting on their data science journey.

They have often just heard that Target can determine whether a customer is pregnant from her shopping patterns and are hoping for similarly powerful tools for their data. And they have heard the latest automated-AI vendor pitch that promises to deliver what Target did (and more!) without data scientists. We argue that automation and better data science tooling will not eliminate or even reduce data science positions, even for use cases like the Target story. If anything, they will create more of them!

Here’s why.

Understanding the business problem is the biggest challenge

The most important question in data science is not which machine learning algorithm to choose or even how to clean your data. It is the questions you need to ask before even one line of code is written: What data do you choose and what questions do you choose to ask of that data?

What is missing (or wishfully assumed) from the popular imagination is the ingenuity, creativity and business understanding that goes into those tasks. Why do we care if our customers are pregnant? Target’s data scientists had built upon substantial earlier work to understand why this was a lucrative customer demographic primed to switch retailers. Which datasets are available and how can we pose scientifically testable questions of those datasets?

Target’s data science team happened to have baby registry data tied to purchasing history and knew how to tie that to customer spending. How do we measure success? Formulating nontechnical requirements into technical questions that can be answered with data is amongst the most challenging data science tasks — and probably the hardest to do well. Without experienced humans to formulate these questions, we would not be able to even start on the journey of data science.

Making your assumptions

After formulating a data science question, data scientists need to outline their assumptions. This often manifests itself in the form of data munging, data cleaning and feature engineering. Real-world data are notoriously dirty and many assumptions have to be made to bridge the gap between the data we have and the business or policy questions we are seeking to address. These assumptions are also highly dependent on real-world knowledge and business context.

In the Target example, data scientists had to make assumptions about proxy variables for pregnancy, realistic time frame of their analyses and appropriate control groups for accurate comparison. They almost certainly had to make realistic assumptions that allowed them to throw out extraneous data and correctly normalize features. All of this work depends critically on human judgment. Removing the human from the loop can be dangerous as we have seen with the recent spate of bias-in-machine-learning incidents. It is perhaps no coincidence that many of them revolve around deep learning algorithms that make some of the strongest claims to do away with feature engineering.
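
As a toy sketch of how those judgment calls end up encoded in an analysis (the column names, proxy products and 90-day window below are invented for illustration, not Target’s actual choices), consider:

    # Toy feature-engineering sketch; every constant here is a human assumption.
    import pandas as pd

    PROXY_PRODUCTS = {"prenatal_vitamins", "unscented_lotion"}  # assumed proxy signals
    WINDOW_DAYS = 90                                            # assumed relevant time frame

    def build_features(purchases: pd.DataFrame) -> pd.DataFrame:
        # purchases: one row per line item, with customer_id, product, amount, date columns.
        cutoff = purchases["date"].max() - pd.Timedelta(days=WINDOW_DAYS)
        recent = purchases[purchases["date"] >= cutoff]
        return recent.groupby("customer_id").agg(
            proxy_purchases=("product", lambda s: s.isin(PROXY_PRODUCTS).sum()),
            total_spend=("amount", "sum"),
            distinct_products=("product", "nunique"),
        ).reset_index()

None of those choices can be read off the raw data; change the window or the proxy list and the downstream model’s conclusions change with it, which is why removing the human from this step is risky.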

So while parts of core machine learning are automated (in fact, we even teach some of the ways to automate those workflows), the data munging, data cleaning and feature engineering (which comprise 90% of the real work in data science) cannot be safely automated away.

A historical analogy

There is a clear precedent in history to suggest data science will not be automated away. There is another field where highly trained humans are crafting code to make computers perform amazing feats. These humans are paid a significant premium over others who are not trained in this field and (perhaps not surprisingly) there are education programs specializing in training this skill. The resulting economic pressure to automate this field is equally, if not more, intense. This field is software engineering.

Indeed, as software engineering has become easier, the demand for programmers has only grown. This paradox — that automation increases productivity, driving down prices and ultimately driving up demand — is not new; we’ve seen it again and again in fields ranging from software engineering to financial analysis to accounting. Data science is no exception, and automation will likely drive up demand for this skill set, not down.

As the pandemic creates supply chain chaos, Craft raises $10M to apply some intelligence

During the COVID-19 pandemic, supply chains have suddenly become hot. Who knew that would ever happen? The race to secure PPE, ventilators and minor things like food was, and still is, an enormous issue. But perhaps predictably, the world of ‘supply chain software’ could use some updating. Most of the platforms are deployed ‘empty’ and require the client to ‘bring their own data’ to populate them. The UIs can be outdated and still have to be juggled alongside manual and offline workflows. So startups working in this space are now attracting some timely attention.

Enter Craft, the enterprise intelligence company, which today announced that it has closed a $10 million Series A financing to build what it characterizes as a ‘supply chain intelligence platform’. With the new funding, Craft will expand its offices in San Francisco, London and Minsk, and grow remote teams across engineering, sales, marketing and operations in North America and Europe.

It competes with some large incumbents, such as Dun & Bradstreet, Bureau van Dijk and Thomson Reuters. These are traditional data providers focused primarily on financial data about public companies, rather than real-time data such as operating metrics, human capital and risk metrics.

The idea is to allow companies to monitor and optimize their supply chain and enterprise systems. The financing was led by High Alpha Capital, alongside Greycroft. Craft also has some high-flying angel investors, including Sam Palmisano, chairman of the Center for Global Enterprise and former CEO and chairman of IBM; Jim Moffatt, former CEO of Deloitte Consulting; and Frederic Kerrest, executive vice-chairman, COO and co-founder of Okta. Uncork Capital, which previously led Craft’s seed financing, also participated. High Alpha partner Kristian Andersen is joining Craft’s board of directors.

The problem Craft is attacking is a lack of visibility into complex global supply chains. For obvious reasons, COVID-19 disrupted global supply chains, revealing a lot of risk, structural weaknesses across industries and a lack of intelligence about how it all holds together. Craft’s solution is a proprietary data platform, API and portal that integrates into existing enterprise workflows.

While many business intelligence products require clients to bring their own data, Craft’s data platform comes pre-deployed with data from thousands of financial and alternative sources, covering 300+ data points that are refreshed using both machine learning and human validation. Its open-to-the-web company profiles appear in 50 million search results, for instance.

Ilya Levtov, co-founder and CEO of Craft said in a statement: “Today, we are focused on providing powerful tracking and visibility to enterprise supply chains, while our ultimate vision is to build the intelligence layer of the enterprise technology stack.”

Kristian Andersen, partner with High Alpha commented: “We have a deep conviction that supply chain management remains an underinvested and under-innovated category in enterprise software.”

Craft claims its revenue grew nearly threefold in the first half of 2020, with Fortune 100 companies, government and military agencies, and SMEs among its clients.