This YC-backed startup preps Chinese students for US data jobs

In recent years, data analysis has gone from an optional skill to a career that holds great promise, but demand for quantitative skills applied to business decisions has raced ahead of supply, as college curricula often lag behind the fast-changing workplace.

CareerTu, a New York-based startup launched by a former marketing manager at Amazon, aims to close that talent gap. Think of it as Codecademy for digital marketing, data analytics, product design and a whole lot of other jobs that require spotting patterns in a sea of data, insights that can potentially boost business efficiency. The six-year-old profitable business runs a flourishing community of 160,000 users and 500 recruiting partners including Amazon, Google and Alibaba, an achievement that has secured the startup a spot in Y Combinator’s latest batch plus a $150,000 check from the Mountain View-based accelerator.

In a way, CareerTu is helping fledgling tech startups on a tight budget train ready-to-use data experts. “American companies have a huge demand for digital marketing and data talents these days … but not all of them want to or can spend money on training, and that’s where we can come in,” said Xu, who made her way into Amazon after burying herself in online tutorials about digital marketing.

The gig was well paid, and Xu felt the urge to share her experience with people like her — Chinese workers and students seeking data jobs in the U.S. She took up blogging, and eventually grew it into an online school. CareerTu offers many of its classes for free while setting aside a handful of premium courses for a fee. Some 6,000 of its users are paying customers, which translated to some $500,000 in revenue last year. The virtual academy continues to blossom as many students return to become mentors, helping their Chinese peers chase the American dream.

Y Combinator founder Paul Graham (second left) with CareerTu founder Zhang Ruiwan (second right) and her team members / Photo: CareerTu

Securing a job in the U.S. can be a daunting task for international students, who must convince employers to invest the time and money in getting them a work visa. But when it comes to courting scarce data talent, the visa trap becomes less relevant.

“Companies could have hired locals to do data work, but it’s very difficult to find the right candidate,” suggested Xu. LinkedIn estimated that in 2018 the U.S. had a shortage of more than 150,000 people with “data science skills,” which find application not just in tech but also in traditional sectors like finance and logistics.

“Nationalities don’t matter in this case,” Xu continued. “Employers will happily apply for a work visa or even a green card for the right candidate who can help them save money on marketing campaigns. And many Chinese people happen to have a really strong background in data and mathematics.”

A Chinese business in the US

Though most of CareerTu’s users live in the U.S., the business is largely built upon WeChat, Tencent’s messaging app ubiquitous among Chinese users. That CareerTu sticks to WeChat for content marketing, user acquisition and tutoring is telling of the super app’s user stickiness and how overseas Chinese are helping to extend its global footprint.

And it makes increasing sense to keep CareerTu within the WeChat ecosystem after Xu noticed a surge in inquiries coming from her homeland. In 2018, only 5 percent of CareerTu’s users were living in China, many of them export sellers on Amazon. By early 2019, the ratio had shot up to 12 percent.

Xu believes there are two forces at work. For one, Chinese exporters are leaving Amazon to set up independent ecommerce sites, efforts that are in part enabled by Shopify’s entry into China in 2018. The alternative path provides merchants more control over branding, margins and access to customer insights. Breaking up with the ecommerce titan, on the other hand, requires Chinese sellers to get savvier at reaching foreign shoppers, expertise that CareerTu prides itself on.

CareerTu offers online courses via WeChat / Photo: CareerTu

Next door, large Chinese tech firms are increasingly turning abroad to fuel growth. Bytedance is possibly the most aggressive adventurer among its peers in recent years, buying up media startups around the world including Musical.ly, which would later merge with TikTok. Indeed, some of CareerTu’s recent grads have gone on to work at the popular video app. Rising interest from China eventually paved Zhang’s way home: she recently set up her first Chinese office in her hometown Chengdu, the laid-back city known for its panda parks and now witnessing a tech boom.

Just as foreign companies need crash courses on WeChat before entering China, Chinese firms going global must familiarize themselves with the marketing mechanisms of Facebook and Google despite China’s ban on the social network and search engine.

When American companies growth hack, they make long-term plans that involve “model building, A/B testing, and making discoveries from big data,” observed Xu. By comparison, Chinese companies fighting in a more competitive landscape are more agile and opportunistic, as they don’t have the time to ponder or test out the different variants in a campaign.

“Going abroad is a great thing for Chinese companies because it sets them against their American counterparts,” said Xu. “We are teaching Chinese the western way, but we are also learning the Chinese way of marketing from players like Bytedance. I’m excited to see in a few years whether any of these Chinese companies abroad will become a local favorite.”

eToro bringing crypto trading and wallet to the US

eToro, the social investing and trading platform, announced that it will finally be launching its platform in the US. The platform, which already operates in more than 140 countries, will be available in 30 states and two territories with plans to expand elsewhere in the US after receiving the necessary regulatory sign-offs.

The US platform will only support trading for crypto assets at launch, but eToro plans to add additional asset classes within the next 12 months. In eToro’s existing markets, the company’s ten million-plus users are able to trade and hold over 1,500 different assets across asset classes and markets, including stocks, bonds, cryptocurrencies, fiat currencies, commodities and more.

Though eToro supports more advanced trading strategies – including short-selling and the use of leverage – the platform’s transparency and community engagement features act as great tools for beginners to learn the capital markets and how to trade.

eToro is equal parts trading platform, social network and educational resource. Anyone who signs up for eToro can see, comment on and copy the trading activity of everyone else on the network, as well as their realized returns and losses to date (though only on a percentage basis to protect sensitive financial information). While learning from the strategies of their peers, users can opt to invest with virtual currency to practice and effectively train before actually risking their own money.

Alternatively, based on a trader’s track record, other users can choose to mimic their portfolio through eToro’s “CopyTrader” feature, which not only proportionally allocates funds to match the trader’s portfolio but can also automatically make any trade the copied investor makes. On top of that, members are also able to share, comment on, engage with or follow specific users, assets, or markets – allowing them to participate in the latest debate and news regarding their particular area of interest.
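
The proportional allocation behind CopyTrader is easy to picture. The sketch below is a toy, with a hypothetical copy_portfolio helper and made-up dollar figures rather than anything from eToro's codebase; it simply spreads a copier's budget across a trader's holdings in the same weights:

```python
# A toy sketch of proportional copy allocation; copy_portfolio and the
# figures below are hypothetical, not eToro's implementation.
def copy_portfolio(trader_positions: dict, copier_budget: float) -> dict:
    """Spread copier_budget across assets in the same weights as the trader."""
    total = sum(trader_positions.values())
    return {asset: copier_budget * value / total
            for asset, value in trader_positions.items()}

# Hypothetical trader holdings (USD) and a $1,000 copier budget.
trader = {"BTC": 6000.0, "ETH": 3000.0, "XRP": 1000.0}
print(copy_portfolio(trader, 1000.0))
# {'BTC': 600.0, 'ETH': 300.0, 'XRP': 100.0}
```

Automatically mirroring every subsequent trade would then amount to re-running the same weighting logic whenever the copied trader's positions change.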

Despite being limited to crypto at launch, almost all the same features available in eToro’s existing geographical markets will be available in the US. And alongside its trading platform, the company is also launching its digital multi-signature eToro wallet where users can store, send and receive multiple coins across a multitude of cryptocurrencies.

Using their eToro accounts, US users can now transfer cryptocurrencies to and from their trading account and can easily convert between them as well. The wallet initially will support Bitcoin, Ethereum, Litecoin, Bitcoin Cash, Ripple and Stellar for US users but the company plans to make additional currencies available in the near future.

eToro users can make transactions and share their trading activity and portfolio performance with the community, allowing users to discuss ideas that are executed using real dollars.

The expansion plan, however, doesn’t come without risk. eToro is entering a competitive marketplace – alongside other popular trading platforms like Coinbase and Robinhood – and is launching its crypto-only version in the midst of a “crypto winter” in which widespread weakness has plagued the sector.

Part of the strategy is attributable to the fact that crypto is a lighter lift from a licensing perspective relative to other asset classes in the strict and highly fragmented US regulatory environment. But eToro’s launch strategy is also firmly rooted in the company’s belief in the immense market opportunity that exists with the tokenization of assets.

“We think [the tokenization of assets] is a bigger opportunity than the internet and we have to be in the US when it happens given it’s the financial hub of the world,” eToro founder and CEO Yoni Assia said in a conversation with TechCrunch.

eToro is taking a long-term view with its strategy and isn’t thrown by the current crypto weakness. Assia equated the market softness to the dotcom bubble, where despite the crash, the internet still permeated and disrupted the economy in the long run. And just like with the internet, Assia and eToro believe there will be more than enough room for multiple winners in the broader crypto ecosystem.

The company was the first platform in its markets to support Ethereum and Ripple and believes that as similar currencies and the next generation of investors mature, eToro will be there to support them wherever they are in whatever way they need.

“When I founded eToro, I envisioned a community where people could trade, invest and share their knowledge in a simple and transparent way,” said Assia. “eToro also acts as a bridge between the old world of investing and a blockchain-powered future, helping our users navigate and benefit from the transition to crypto assets for wealth building.”

LinkedIn forced to ‘pause’ its Mentioned in the News feature in Europe after complaints about ID mix-ups

LinkedIn has been forced to ‘pause’ a feature in Europe in which the platform emails members’ connections when they’ve been ‘mentioned in the news’.

The regulatory action follows a number of data protection complaints after LinkedIn’s algorithms incorrectly matched members to news articles — triggering a review of the feature and a subsequent suspension order.

The feature appears as a case study in the ‘Technology Multinationals Supervision’ section of an annual report published today by the Irish Data Protection Commission (DPC). The report does not explicitly name LinkedIn, but we’ve confirmed it is the professional social network in question.

The data watchdog’s report cites “two complaints about a feature on a professional networking platform” after LinkedIn incorrectly associated the members with media articles that were not actually about them.

“In one of the complaints, a media article that set out details of the private life and unsuccessful career of a person of the same name as the complainant was circulated to the complainant’s connections and followers by the data controller,” the DPC writes, noting the complainant initially complained to the company itself but did not receive a satisfactory response — hence taking up the matter with the regulator.

“The complainant stated that the article had been detrimental to their professional standing and had resulted in the loss of contracts for their business,” it adds.

“The second complaint involved the circulation of an article that the complainant believed could be detrimental to future career prospects, which the data controller had not vetted correctly.”

LinkedIn appears to have been matching members to news articles by simple name matching — with obvious potential for identity mix-ups between people with shared names.
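
A toy example shows why name-only matching over-matches. The snippet below uses invented member records, not anything from LinkedIn's systems: matching on name alone flags every member who shares the name, while requiring a second corroborating attribute narrows the match:

```python
# Invented member records and article; not LinkedIn's code.
members = [
    {"id": 1, "name": "Alex Murphy", "employer": "Acme Corp"},
    {"id": 2, "name": "Alex Murphy", "employer": "Globex"},
]
article = {"headline": "Alex Murphy convicted of fraud",
           "person_name": "Alex Murphy", "employer": "Globex"}

# Name-only matching notifies every member who shares the name.
name_only = [m for m in members if m["name"] == article["person_name"]]
print(len(name_only))  # 2 -> both members' networks would get the email

# Requiring a second corroborating attribute narrows the match.
corroborated = [m for m in name_only if m["employer"] == article["employer"]]
print(len(corroborated))  # 1
```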

“It was clear from the complaints that matching by name only was insufficient, giving rise to data protection concerns, primarily the lawfulness, fairness and accuracy of the personal data processing utilised by the ‘Mentions in the news’ feature,” the DPC writes.

“As a result of these complaints and the intervention of the DPC, the data controller undertook a review of the feature. The result of this review was to suspend the feature for EU-based members, pending improvements to safeguard its members’ data.”

We reached out to LinkedIn with questions and it pointed us to this blog post where it confirms: “We are pausing our Mentioned in the News feature for our EU members while we reevaluate its effectiveness.”

LinkedIn adds that it is reviewing the accuracy of the feature, writing:

As referenced in the Irish Data Protection Commission’s report, we received useful feedback from our members about the feature and as a result are evaluating the accuracy and functionality of Mentioned in the News for all members.

The company’s blog post also points users to a page where they can find out more about the ‘mentioned in the news’ feature and get information on how to manage their LinkedIn email notification settings.

The Irish DPC’s action is not the first privacy strike against LinkedIn in Europe.

Late last year, in an earlier annual report covering the pre-GDPR portion of 2018, the watchdog revealed it had investigated complaints about LinkedIn related to it targeting non-users with adverts for its service.

The DPC found the company had obtained emails for 18 million people for whom it did not have consent to process their data. In that case LinkedIn agreed to cease processing the data entirely.

That complaint also led the DPC to audit LinkedIn. It then found a further privacy problem, discovering the company had been using its social graph algorithms to try to build suggested networks of compatible professional connections for non-members.

The regulator ordered LinkedIn to cease this “pre-compute processing” of non-members’ data and delete all personal data associated with it prior to GDPR coming into force.

LinkedIn said it had “voluntarily changed our practices as a result”.

German antitrust office limits Facebook’s data-gathering

A lengthy antitrust probe into how Facebook gathers data on users has resulted in Germany’s competition watchdog banning the social network giant from combining data on users across its own suite of social platforms without their consent.

The investigation of Facebook data-gathering practices began in March 2016.

The decision by Germany’s Federal Cartel Office, announced today, also prohibits Facebook from gathering data on users from third party websites — such as via tracking pixels and social plug-ins — without their consent.

The decision does not yet have legal force, however, and Facebook has said it’s appealing.

In both cases — i.e. Facebook collecting and linking user data from its own suite of services; and from third party websites — the Bundeskartellamt says consent must be voluntary, so cannot be made a precondition of using Facebook’s service.

The company must therefore “adapt its terms of service and data processing accordingly”, it warns.

“Facebook’s terms of service and the manner and extent to which it collects and uses data are in violation of the European data protection rules to the detriment of users. The Bundeskartellamt closely cooperated with leading data protection authorities in clarifying the data protection issues involved,” it writes, couching Facebook’s conduct as “exploitative abuse”.

“Dominant companies may not use exploitative practices to the detriment of the opposite side of the market, i.e. in this case the consumers who use Facebook. This applies above all if the exploitative practice also impedes competitors that are not able to amass such a treasure trove of data,” it continues.

“This approach based on competition law is not a new one, but corresponds to the case-law of the Federal Court of Justice under which not only excessive prices, but also inappropriate contractual terms and conditions constitute exploitative abuse (so-called exploitative business terms).”

Commenting further in a statement, Andreas Mundt, president of the Bundeskartellamt, added: “In future, Facebook will no longer be allowed to force its users to agree to the practically unrestricted collection and assigning of non-Facebook data to their Facebook user accounts.

“The combination of data sources substantially contributed to the fact that Facebook was able to build a unique database for each individual user and thus to gain market power. In future, consumers can prevent Facebook from unrestrictedly collecting and using their data. The previous practice of combining all data in a Facebook user account, practically without any restriction, will now be subject to the voluntary consent given by the users.

“Voluntary consent means that the use of Facebook’s services must not be subject to the users’ consent to their data being collected and combined in this way. If users do not consent, Facebook may not exclude them from its services and must refrain from collecting and merging data from different sources.”

“With regard to Facebook’s future data processing policy, we are carrying out what can be seen as an internal divestiture of Facebook’s data,” Mundt added. 

Facebook has responded to the Bundeskartellamt’s decision with a blog post setting out why it disagrees. The company did not respond to specific questions we put to it.

One key consideration is that Facebook also tracks non-users via third party websites. Aka, the controversial issue of ‘shadow profiles’ — which both US and EU politicians questioned founder Mark Zuckerberg about last year.

Which raises the question of how it could comply with the decision on that front, if its appeal fails, given it has no obvious conduit for seeking consent from non-users to gather their data. (Facebook’s tracking of non-users has already previously been judged illegal elsewhere in Europe.)

The German watchdog says that if Facebook intends to continue collecting data from outside its own social network to combine with users’ accounts without consent, the practice “must be substantially restricted”, suggesting a number of different restrictions are feasible — such as limits on the amount of data; purpose of use; type of data processing; additional control options for users; anonymization; processing only upon instruction by third-party providers; and limitations on data storage periods.

Should the decision come to be legally enforced, the Bundeskartellamt says Facebook will be obliged to develop proposals for possible solutions and submit them to the authority which would then examine whether or not they fulfil its requirements.

While there’s lots to concern Facebook in this decision, it isn’t all bad for the company — or, rather, it could have been worse.

The authority makes a point of saying the social network can continue to make the use of each of its messaging platforms subject to the processing of data generated by their use, writing: “It must be generally acknowledged that the provision of a social network aiming at offering an efficient, data-based business model funded by advertising requires the processing of personal data. This is what the user expects.”

Although it also does not close the door on further scrutiny of that dynamic, either under data protection law (as indeed, there is a current challenge to so-called ‘forced consent‘ under Europe’s GDPR); or indeed under competition law.

“The issue of whether these terms can still result in a violation of data protection rules and how this would have to be assessed under competition law has been left open,” it emphasizes.

It also notes that it did not investigate how Facebook subsidiaries WhatsApp and Instagram collect and use user data — leaving the door open for additional investigations of those services.

On the wider EU competition law front, in recent years the European Commission’s competition chief has voiced concerns about data monopolies — going so far as to suggest, in an interview with the BBC last December, that restricting access to data might be a more appropriate solution to addressing monopolistic platform power vs breaking companies up.

In its blog post rejecting the German Federal Cartel Office’s decision, Facebook’s Yvonne Cunnane, head of data protection for its international business, Facebook Ireland, and Nikhil Shanbhag, director and associate general counsel, make three points to counter the decision, writing that: “The Bundeskartellamt underestimates the fierce competition we face in Germany, misinterprets our compliance with GDPR and undermines the mechanisms European law provides for ensuring consistent data protection standards across the EU.”

On the competition point, Facebook claims in the blog post that “popularity is not dominance” — suggesting the Bundeskartellamt found 40 per cent of social media users in Germany don’t use Facebook. (Not that that would stop Facebook from tracking those non-users around the mainstream Internet, of course.)

In its announcement of the decision today, however, the Federal Cartel Office emphasizes that it found Facebook to have a dominant position in the German market — with (as of December 2018) 23M daily active users and 32M monthly active users, which it said constitutes a market share of more than 95 per cent (daily active users) and more than 80 per cent (monthly active users).

It also says it views social services such as Snapchat, YouTube and Twitter, and professional networks like LinkedIn and Xing, as only offering “parts of the services of a social network” — saying it therefore excluded them from its consideration of the market.

Though it adds that “even if these services were included in the relevant market, the Facebook group with its subsidiaries Instagram and WhatsApp would still achieve very high market shares that would very likely be indicative of a monopolisation process”.

The mainstay of Facebook’s argument against the Bundeskartellamt decision appears to fix on the GDPR — with the company both seeking to claim it’s in compliance with the pan-EU data-protection framework (although its business faces multiple complaints under GDPR), while simultaneously arguing that the privacy regulation supersedes regional competition authorities.

So, as ever, Facebook is underlining that its regulator of choice is the Irish Data Protection Commission.

“The GDPR specifically empowers data protection regulators – not competition authorities – to determine whether companies are living up to their responsibilities. And data protection regulators certainly have the expertise to make those conclusions,” Facebook writes.

“The GDPR also harmonizes data protection laws across Europe, so everyone lives by the same rules of the road and regulators can consistently apply the law from country to country. In our case, that’s the Irish Data Protection Commission. The Bundeskartellamt’s order threatens to undermine this, providing different rights to people based on the size of the companies they do business with.”

The final plank of Facebook’s rebuttal focuses on pushing the notion that pooling data across services enhances the consumer experience and increases “safety and security” — the latter point being the same argument Zuckerberg used last year to defend ‘shadow profiles’ (not that he called them that) — with the company claiming now that it needs to pool user data across services to identify abusive behavior online and disable accounts linked to terrorism, child exploitation and election interference.

So the company is essentially seeking to leverage (you could say ‘legally weaponize’) a smorgasbord of antisocial problems, many of which have scaled to become major societal issues in recent years, at least in part as a consequence of the size and scale of Facebook’s social empire, as arguments for defending the size and operational sprawl of its business. Go figure.

Fabula AI is using social spread to spot ‘fake news’

UK startup Fabula AI reckons it’s devised a way for artificial intelligence to help user generated content platforms get on top of the disinformation crisis that keeps rocking the world of social media with antisocial scandals.

Even Facebook’s Mark Zuckerberg has sounded a cautious note about AI technology’s capability to meet the complex, contextual, messy and inherently human challenge of correctly understanding every missive a social media user might send, well-intentioned or its nasty flip-side.

“It will take many years to fully develop these systems,” the Facebook founder wrote two years ago, in an open letter discussing the scale of the challenge of moderating content on platforms thick with billions of users. “This is technically difficult as it requires building AI that can read and understand news.”

But what if AI doesn’t need to read and understand news in order to detect whether it’s true or false?

Step forward Fabula, which has patented what it dubs a “new class” of machine learning algorithms to detect “fake news” — in the emergent field of “Geometric Deep Learning”, where the datasets to be studied are so large and complex that traditional machine learning techniques struggle to find purchase on this ‘non-Euclidean’ space.

The startup says its deep learning algorithms are, by contrast, capable of learning patterns on complex, distributed data sets like social networks. So it’s billing its technology as a breakthrough. (It’s written a paper on the approach, which can be downloaded here.)

It is, rather unfortunately, using the populist and now frowned upon badge “fake news” in its PR. But it says it’s intending this fuzzy umbrella to refer to both disinformation and misinformation. Which means maliciously minded and unintentional fakes. Or, to put it another way, a photoshopped fake photo or a genuine image spread in the wrong context.

The approach it’s taking to detecting disinformation relies not on algorithms parsing news content to try to identify malicious nonsense but instead looks at how such stuff spreads on social networks — and also therefore who is spreading it.

There are characteristic patterns to how ‘fake news’ spreads vs the genuine article, says Fabula co-founder and chief scientist, Michael Bronstein.

“We look at the way that the news spreads on the social network. And there is — I would say — a mounting amount of evidence that shows that fake news and real news spread differently,” he tells TechCrunch, pointing to a recent major study by MIT academics which found ‘fake news’ spreads differently vs bona fide content on Twitter.

“The essence of geometric deep learning is it can work with network-structured data. So here we can incorporate heterogenous data such as user characteristics; the social network interactions between users; the spread of the news itself; so many features that otherwise would be impossible to deal with under machine learning techniques,” he continues.
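
To make “network-structured data” concrete, here is a minimal sketch, built with the networkx library and entirely invented attributes, of a news cascade represented as a graph with user characteristics on the nodes and resharing times on the edges. It illustrates the shape of the data a graph-based model consumes, not Fabula's pipeline:

```python
# A toy propagation cascade with invented attributes; not Fabula's pipeline.
import networkx as nx

cascade = nx.DiGraph()

# Nodes are users, with illustrative user characteristics.
cascade.add_node("user_a", followers=12000, account_age_days=2100)
cascade.add_node("user_b", followers=340, account_age_days=90)
cascade.add_node("user_c", followers=55, account_age_days=15)

# Edges are resharing events; attributes describe how the story spread.
cascade.add_edge("user_a", "user_b", minutes_after_seed=12)
cascade.add_edge("user_b", "user_c", minutes_after_seed=45)

# Two simple spread features a graph-based model could learn from.
depth = nx.dag_longest_path_length(cascade)            # how deep the cascade goes
max_fan_out = max(d for _, d in cascade.out_degree())  # widest single fan-out
print(f"cascade depth={depth}, max fan-out={max_fan_out}")
```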

Bronstein, who is also a professor at Imperial College London, with a chair in machine learning and pattern recognition, likens the phenomenon Fabula’s machine learning classifier has learnt to spot to the way infectious disease spreads through a population.

“This is of course a very simplified model of how a disease spreads on the network. In this case network models relations or interactions between people. So in a sense you can think of news in this way,” he suggests. “There is evidence of polarization, there is evidence of confirmation bias. So, basically, there are what is called echo chambers that are formed in a social network that favor these behaviours.”

“We didn’t really go into — let’s say — the sociological or the psychological factors that probably explain why this happens. But there is some research that shows that fake news is akin to epidemics.”

The tl;dr of the MIT study, which examined a decade’s worth of tweets, was that not only does the truth spread slower but also that human beings themselves are implicated in accelerating disinformation. (So, yes, actual human beings are the problem.) Ergo, it’s not all bots doing all the heavy lifting of amplifying junk online.

The silver lining of what appears to be an unfortunate quirk of human nature is that a penchant for spreading nonsense may ultimately help give the stuff away — making a scalable AI-based tool for detecting ‘BS’ potentially not such a crazy pipe-dream.

Although, to be clear, Fabula’s AI remains in development, having so far been tested internally on Twitter data sub-sets. And the claims it’s making for its prototype model remain to be commercially tested with customers in the wild using the tech across different social platforms.

It’s hoping to get there this year, though, and intends to offer an API for platforms and publishers towards the end of this year. The AI classifier is intended to run in near real-time on a social network or other content platform, identifying BS.

Fabula envisages its own role, as the company behind the tech, as that of an open, decentralised “truth-risk scoring platform” — akin to a credit referencing agency just related to content, not cash.

Scoring comes into it because the AI generates a score for classifying content based on how confident it is it’s looking at a piece of fake vs true news.

A visualisation of a fake vs real news distribution pattern; users who predominantly share fake news are coloured red and users who don’t share fake news at all are coloured blue — which Fabula says shows the clear separation into distinct groups, and “the immediately recognisable difference in spread pattern of dissemination”.

In its own tests Fabula says its algorithms were able to identify 93 percent of “fake news” within hours of dissemination — which Bronstein claims is “significantly higher” than any other published method for detecting ‘fake news’. (Their accuracy figure uses a standard aggregate measurement of machine learning classification model performance, called ROC AUC.)
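
For readers unfamiliar with the metric, the short sketch below shows how ROC AUC is computed for a binary fake-vs-genuine classifier with scikit-learn; the labels and scores are invented for illustration and are unrelated to Fabula's data:

```python
# Invented labels and scores, for illustration only.
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = fake, 0 = genuine
y_score = [0.91, 0.12, 0.78, 0.65, 0.30, 0.44, 0.85, 0.08]  # model's confidence the item is fake

print(f"ROC AUC = {roc_auc_score(y_true, y_score):.2f}")  # 1.00 here, since every fake outranks every genuine item
```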

The dataset the team used to train their model is a subset of Twitter’s network — comprised of around 250,000 users and containing around 2.5 million “edges” (aka social connections).

For their training dataset Fabula relied on true/fake labels attached to news stories by third-party fact-checking NGOs, including Snopes and PolitiFact. And, overall, pulling together the dataset was a process of “many months”, according to Bronstein. He also says that around a thousand different stories were used to train the model, adding that the team is confident the approach works on small social networks as well as Facebook-sized mega-nets.

Asked whether he’s sure the model hasn’t been trained to identify patterns caused by bot-based junk news spreaders, he says the training dataset included some registered (and thus verified ‘true’) users.

“There is multiple research that shows that bots didn’t play a significant amount [of a role in spreading fake news] because the amount of it was just a few percent. And bots can be quite easily detected,” he also suggests, adding: “Usually it’s based on some connectivity analysis or content analysis. With our methods we can also detect bots easily.”

To further check the model, the team tested its performance over time by training it on historical data and then using a different split of test data.

“While we see some drop in performance it is not dramatic. So the model ages well, basically. Up to something like a year the model can still be applied without any re-training,” he notes, while also saying that, when applied in practice, the model would be continually updated as it keeps digesting (ingesting?) new stories and social media content.

Somewhat terrifyingly, the model could also be used to predict virality, according to Bronstein — raising the dystopian prospect of the API being used for the opposite purpose to that which it’s intended: i.e. maliciously, by fake news purveyors, to further amp up their (anti)social spread.

“Potentially putting it into evil hands it might do harm,” Bronstein concedes. Though he takes a philosophical view on the hyper-powerful double-edged sword of AI technology, arguing such technologies will create an imperative for a rethinking of the news ecosystem by all stakeholders, as well as encouraging emphasis on user education and teaching critical thinking.

Let’s certainly hope so. And, on the educational front, Fabula is hoping its technology can play an important role — by spotlighting network-based cause and effect.

“People now like or retweet or basically spread information without thinking too much about the potential harm or damage they’re doing to everyone,” says Bronstein, pointing again to the infectious diseases analogy. “It’s like not vaccinating yourself or your children. If you think a little bit about what you’re spreading on a social network you might prevent an epidemic.”

So, tl;dr, think before you RT.

Returning to the accuracy rate of Fabula’s model, while ~93 per cent might sound pretty impressive, if it were applied to content on a massive social network like Facebook — which has some 2.3BN+ users, uploading what could be trillions of pieces of content daily — even a seven percent failure rate would still make for an awful lot of fakes slipping undetected through the AI’s net.

But Bronstein says the technology does not have to be used as a standalone moderation system. Rather he suggests it could be used in conjunction with other approaches, such as content analysis, and thus function as another string to a wider ‘BS detector’s’ bow.

It could also, he suggests, further aid human content reviewers — to point them to potentially problematic content more quickly.

Depending on how the technology gets used, he says, it could do away with the need for independent third-party fact-checking organizations altogether, because the deep learning system can be adapted to different use cases.

Example use-cases he mentions include an entirely automated filter (i.e. with no human reviewer in the loop); or to power a content credibility ranking system that can down-weight dubious stories or even block them entirely; or for intermediate content screening to flag potential fake news for human attention.

Each of those scenarios would likely entail a different truth-risk confidence score. Though most — if not all — would still require some human back-up. If only to manage overarching ethical and legal considerations related to largely automated decisions. (Europe’s GDPR framework has some requirements on that front, for example.)

Facebook’s grave failures around moderating hate speech in Myanmar — which led to its own platform becoming a megaphone for terrible ethnic violence — were very clearly exacerbated by the fact that it did not have enough reviewers able to understand (the many) local languages and dialects spoken in the country.

So if Fabula’s language-agnostic, propagation- and user-focused approach proves to be as culturally universal as its makers hope, it might be able to raise flags faster than human brains that lack the necessary language skills and local knowledge to intelligently parse context.

“Of course we can incorporate content features but we don’t have to — we don’t want to,” says Bronstein. “The method can be made language independent. So it doesn’t matter whether the news are written in French, in English, in Italian. It is based on the way the news propagates on the network.”

Although he also concedes: “We have not done any geographic, localized studies.”

“Most of the news that we take are from PolitiFact so they somehow regard mainly the American political life but the Twitter users are global. So not all of them, for example, tweet in English. So we don’t yet take into account tweet content itself or their comments in the tweet — we are looking at the propagation features and the user features,” he continues.

“These will be obviously next steps but we hypothesize that it’s less language dependent. It might be somehow geographically varied. But these will be already second order details that might make the model more accurate. But, overall, currently we are not using any location-specific or geographic targeting for the model.

“But it will be an interesting thing to explore. So this is one of the things we’ll be looking into in the future.”

Fabula’s approach being tied to the spread (and the spreaders) of fake news certainly means there’s a raft of associated ethical considerations that any platform making use of its technology would need to be hyper sensitive to.

For instance, if platforms could suddenly identify and label a sub-set of users as ‘junk spreaders’ the next obvious question is how will they treat such people?

Would they penalize them with limits — or even a total block — on their power to socially share on the platform? And would that be ethical or fair given that not every sharer of fake news is maliciously intending to spread lies?

What if it turns out there’s a link between — let’s say — a lack of education and propensity to spread disinformation? As there can be a link between poverty and education… What then? Aren’t your savvy algorithmic content downweights risking exacerbating existing unfair societal divisions?

Bronstein agrees there are major ethical questions ahead when it comes to how a ‘fake news’ classifier gets used.

“Imagine that we find a strong correlation between the political affiliation of a user and this ‘credibility’ score. So for example we can tell with hyper-ability that if someone is a Trump supporter then he or she will be mainly spreading fake news. Of course such an algorithm would provide great accuracy but at least ethically it might be wrong,” he says when we ask about ethics.

He confirms Fabula is not using any kind of political affiliation information in its model at this point — but it’s all too easy to imagine this sort of classifier being used to surface (and even exploit) such links.

“What is very important in these problems is not only to be right — so it’s great of course that we’re able to quantify fake news with this accuracy of ~90 percent — but it must also be for the right reasons,” he adds.

The London-based startup was founded in April last year, though the academic research underpinning the algorithms has been in train for the past four years, according to Bronstein.

The patent for their method was filed in early 2016 and granted last July.

They’ve been funded by $500,000 in angel funding and about another $500,000 in total of European Research Council grants plus academic grants from tech giants Amazon, Google and Facebook, awarded via open research competition awards.

(Bronstein confirms the three companies have no active involvement in the business. Though doubtless Fabula is hoping to turn them into customers for its API down the line. But he says he can’t discuss any potential discussions it might be having with the platforms about using its tech.)

Focusing on spotting patterns in how content spreads as a detection mechanism does have one major and obvious drawback — in that it only works after the fact of (some) fake content spread. So this approach could never entirely stop disinformation in its tracks.

Though Fabula claims detection is possible within a relatively short time frame — of between two and 20 hours after content has been seeded onto a network.

“What we show is that this spread can be very short,” he says. “We looked at up to 24 hours and we’ve seen that just in a few hours… we can already make an accurate prediction. Basically it increases and slowly saturates. Let’s say after four or five hours we’re already about 90 per cent.”

“We never worked with anything that was lower than hours but we could look,” he continues. “It really depends on the news. Some news does not spread that fast. Even the most groundbreaking news do not spread extremely fast. If you look at the percentage of the spread of the news in the first hours you get maybe just a small fraction. The spreading is usually triggered by some important nodes in the social network. Users with many followers, tweeting or retweeting. So there are some key bottlenecks in the network that make something viral or not.”

A network-based approach to content moderation could also serve to further enhance the power and dominance of already hugely powerful content platforms — by making the networks themselves core to social media regulation, i.e. if pattern-spotting algorithms rely on key network components (such as graph structure) to function.

So you can certainly see why — even above a pressing business need — tech giants are at least interested in backing the academic research. Especially with politicians increasingly calling for online content platforms to be regulated like publishers.

At the same time, there are — what look like — some big potential positives to analyzing spread, rather than content, for content moderation purposes.

As noted above, the approach doesn’t require training the algorithms on different languages and (seemingly) cultural contexts — setting it apart from content-based disinformation detection systems. So if it proves as robust as claimed it should be more scalable.

Though, as Bronstein notes, the team have mostly used U.S. political news for training their initial classifier. So some cultural variations in how people spread and react to nonsense online at least remains a possibility.

A more certain challenge is “interpretability” — aka explaining what underlies the patterns the deep learning technology has identified via the spread of fake news.

While algorithmic accountability is very often a challenge for AI technologies, Bronstein admits it’s “more complicated” for geometric deep learning.

“We can potentially identify some features that are the most characteristic of fake vs true news,” he suggests when asked whether some sort of ‘formula’ of fake news can be traced via the data, noting that while they haven’t yet tried to do this they did observe “some polarization”.

“There are basically two communities in the social network that communicate mainly within the community and rarely across the communities,” he says. “Basically it is less likely that somebody who tweets a fake story will be retweeted by somebody who mostly tweets real stories. There is a manifestation of this polarization. It might be related to these theories of echo chambers and various biases that exist. Again we didn’t dive into trying to explain it from a sociological point of view — but we observed it.”

So while, in recent years, there have been some academic efforts to debunk the notion that social media users are stuck inside filter bubbles bouncing their own opinions back at them, Fabula’s analysis of the landscape of social media opinions suggests they do exist — albeit, just not encasing every Internet user.

Bronstein says the next steps for the startup is to scale its prototype to be able to deal with multiple requests so it can get the API to market in 2019 — and start charging publishers for a truth-risk/reliability score for each piece of content they host.

“We’ll probably be providing some restricted access maybe with some commercial partners to test the API but eventually we would like to make it useable by multiple people from different businesses,” says Bronstein. “Potentially also private users — journalists or social media platforms or advertisers. Basically we want to be… a clearing house for news.”

Facebook removes hundreds of accounts linked to fake news group in Indonesia

Facebook said today it has removed hundreds of Facebook and Instagram accounts with links to an organization that peddled fake news.

The world’s fourth largest country with a population of over 260 million, Indonesia is in an election year, alongside Southeast Asia neighbors Thailand and the Philippines. Facebook said this week it has set up an ‘election integrity’ team in Singapore, its APAC HQ, as it tries to prevent its social network from being misused in the lead-up to voting, as happened in the U.S.

This Indonesia bust is the first move announced since that task force was put in place, and it sees 207 Facebook Pages, 800 Facebook accounts, 546 Facebook Groups, and 208 Instagram accounts removed for “engaging in coordinated inauthentic behavior.”

“About 170,000 people followed at least one of these Facebook Pages, and more than 65,000 followed at least one of these Instagram accounts,” Facebook said of the reach of the removed accounts.

The groups and accounts are linked to Saracen Group, a digital media group that saw three of its members arrested by police in 2016 for spreading “incendiary material,” as Reuters reports.

Facebook isn’t saying too much about the removals other than: “we don’t want our services to be used to manipulate people.”

In January, the social network banned a fake news group in the Philippines in similar circumstances.

Despite the recent action, the U.S. company has struggled to manage the false information that flows across its services in Asia. The most extreme examples come from Myanmar, where the UN has concluded that Facebook played a key role in escalating religious hatred and fueling violence. Facebook has also been criticized for allowing manipulation in Sri Lanka and the Philippines, among other places.

Facebook to encrypt Instagram messages ahead of integration with WhatsApp, Facebook Messenger

Facebook is planning to roll out end-to-end encryption for Instagram messages, as part of a broader integration effort across the company’s messaging platforms, including WhatsApp and Facebook Messenger.

First reported by The New York Times, the plan will see the social media giant rework the underlying infrastructure of its three messaging apps so users can talk to each other more easily. The apps will reportedly remain independent of one another — with Instagram and WhatsApp bringing in 1 billion and 1.5 billion users, respectively.

In doing so, Facebook is adding end-to-end encryption to Instagram messages. That will bring a new level of security and privacy to Instagram users for the first time. Facebook will also begin encrypting Facebook Messenger by default, which has, to date, required users to manually switch on the feature.

So far, only WhatsApp messages are end-to-end encrypted by default.

The plans are part of the company’s effort to keep people on the platform for longer, the Times reported, at a time when the company has 2.2 billion users but user trust has declined following a string of privacy scandals and security incidents. End-to-end encrypted messages can’t be read by anyone other than the sender and the recipient — not even Facebook. By shutting itself out of the loop, the company reduces the amount of data it can access — data that could theoretically be stolen by hackers.
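
The principle itself is easy to demonstrate. The toy snippet below uses PyNaCl public-key boxes to show that only the two endpoints hold the keys needed to read a message, so a relaying server only ever handles ciphertext; it illustrates the concept, not the protocol Facebook plans to deploy:

```python
# A toy demonstration of end-to-end encryption with PyNaCl public-key boxes;
# it illustrates the principle, not Facebook's actual protocol.
from nacl.public import PrivateKey, Box

sender_key = PrivateKey.generate()
recipient_key = PrivateKey.generate()

# The sender encrypts to the recipient's public key.
sending_box = Box(sender_key, recipient_key.public_key)
ciphertext = sending_box.encrypt(b"meet at 6pm")

# A server relaying `ciphertext` cannot read it; only the recipient can.
receiving_box = Box(recipient_key, sender_key.public_key)
print(receiving_box.decrypt(ciphertext))  # b'meet at 6pm'
```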

“We want to build the best messaging experiences we can; and people want messaging to be fast, simple, reliable and private,” a Facebook spokesperson told TechCrunch. “We’re working on making more of our messaging products end-to-end encrypted and considering ways to make it easier to reach friends and family across networks.”

“As you would expect, there is a lot of discussion and debate as we begin the long process of figuring out all the details of how this will work,” the spokesperson said, without providing a timeline on the planned unification.

But how the integration will be met by European regulators is anybody’s guess.

Two years ago, Facebook rolled back its plans to begin sharing WhatsApp user data with the social network for advertising at the request of U.K. data protection authorities, putting the plan on ice across the European continent. Under the proposed changes to its terms and conditions, WhatsApp would have shared the user’s phone number that was used to verify their account, and the last time they used the service. That led to concerns about privacy, given that a real-world identity isn’t needed for WhatsApp, unlike Facebook, which requires users to display their real names.

Facebook acknowledged that it didn’t have answers just yet about how it plans to navigate the issue, citing the early stages of its planned integration.

The app integrations are said to be a priority for 2019, with an eye toward a 2020 release, the Times said.

Microsoft confirms Bing is down in China

Microsoft’s Bing is down in China, according to users who took to social media beginning Wednesday afternoon to complain and express concerns.

The Redmond-based behemoth has confirmed that its search engine is currently inaccessible in China and is “engaged to determine next steps,” a company spokesperson said in a statement to TechCrunch Thursday morning.

Citing sources, the Financial Times reported (paywalled) on Thursday that China Unicom, a major state-owned telecommunication company, confirmed the government had ordered a block on Bing.

Public reaction

The situation appears to be a DNS (domain name system) corruption, one method for China to block websites through its intricate censoring system called the Great Firewall. When a user enters a domain name associated with a banned IP address, the Firewall will corrupt the connection to stop the page from loading.

Several users told TechCrunch they are still able to access Bing by directly visiting its IP address as of Thursday morning.
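
That behaviour is more consistent with DNS tampering than a server outage, and the difference can be probed with a few lines of code. The rough sketch below, which assumes the dnspython package and uses a Bing domain purely as an example, compares the local resolver's answer with one from a public resolver; note that CDNs can legitimately return different addresses per region, so this is a heuristic rather than proof:

```python
# A rough check that distinguishes DNS tampering from a server outage.
# Assumes the dnspython package; the domain is just an example.
import socket
import dns.resolver

DOMAIN = "cn.bing.com"

# Answer from the local (possibly poisoned) resolver.
local_ip = socket.gethostbyname(DOMAIN)

# Answer from a public resolver outside the censored network.
resolver = dns.resolver.Resolver()
resolver.nameservers = ["8.8.8.8"]
reference_ips = {rdata.to_text() for rdata in resolver.resolve(DOMAIN, "A")}

if local_ip not in reference_ips:
    print(f"Possible DNS tampering: local answer {local_ip} not in {reference_ips}")
else:
    print("DNS answers agree; any block is not DNS-based")
```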

Other users writing on social media believe the block is a result of Bing’s server crash after a viral article (link in Chinese) attacking Baidu’s search quality directed traffic to its lesser-known American rival. Many referred to a Chinese report that says high traffic from Baidu had crashed Bing. The article, published by Jiemian, a news site under the state-owned Shanghai United Media Group, now returns a 404 error.

Tight seal

Bing had remained one of the few non-Chinese internet services with its core product still up and running in a country where Google and Facebook have long been unavailable. Another rare case is LinkedIn, which runs a filtered version of its social network for professionals and has caught flak for bending to local censorship.

Bing also censors its search service for Chinese users, so it would be odd if its inaccessibility turned out to be a case of government clampdown. That said, China appears to be further tightening control over cyberspace. Case in point: LinkedIn recently started to run strict identity checks on its China-based users.

Baidu remains the biggest search engine in China with smaller rival Sogou coming in second. Bing, which some users find is a more pleasant alternative to local options that are usually flooded with ads, is active on 320,000 unique devices monthly, according to third-party research firm iResearch. That’s dwarfed by Baidu’s 466 million and Sogou’s 43 million.

Google told the U.S. Congress in December it had no immediate plans to relaunch its search engine in China but felt “reaching out and giving users more information has a very positive impact.” The Mountain View-based firm shut down its search engine in mainland China back in 2010 under pressure over censorship but also cited cyber attacks as a factor in its decision to leave.

Facebook is reportedly testing solar-powered internet drones again — this time with Airbus

Facebook last year grounded its ambitious plan to develop a solar-powered drone to beam internet across the world, but the company isn’t done with the concept, it seems. The social media giant is working with aeronautics giant Airbus to test drones in Australia, according to a new report from Germany’s NetzPolitik.

Using a request under Australia’s Freedom of Information Act, NetzPolitik got hold of a document that shows the two companies spent last year in talks over a collaboration with test flights scheduled for November and December 2018. The duo have collaborated before on communication systems for satellite drones.

Those trials — and it isn’t clear if they took place — involved the use of Airbus’ Zephyr drone, a model that is designed for “defence, humanitarian and environmental missions.” The Zephyr is much like Facebook’s now-deceased Aquila drone blueprint; it is a HAPS — “High Altitude Pseudo Satellite” — that uses solar power and can fly for “months.”

The Model S version chosen by Facebook sports a 25-meter wingspan, can operate at up to 20km altitude and uses millimeter-wave radio to broadcast to the ground.

The Zephyr Model S and Model T as displayed on the Airbus website

The Facebook and Airbus trials were designed to test a payload from the social network — doubtless internet broadcasting gear — but, since the document covers planning and meetings prior to the tests, we don’t know what the outcome or results were.

“We continue to work with partners on High Altitude Platform System (HAPS) connectivity. We don’t have further details to share at this time,” a Facebook spokesperson told NetzPolitik.

TechCrunch contacted Facebook for further comment (06:55 am EST), but the company had not responded at the time of writing.

Facebook has a raft of projects aimed at increasing internet access worldwide, particularly in developing regions such as Asia, Africa and Latin America. The drone projects, which aim to bring connectivity to remote areas, may be its boldest, but it has also used software and existing infrastructure to try to make internet access more affordable.

That has included the controversial Internet.org project, which was outlawed in India because it violated net neutrality by selecting the websites and apps that could be used. Since renamed Free Basics — a rebrand likely prompted by the Indian setback — it has been scaled back in some markets, but Facebook said last year that the program has reached nearly 100 million people to date. Beyond that top-line number, little is known about the service, which also includes paid tiers for users.

That aside, the company also has a public-private WiFi program aimed at increasing hotspots for internet users while they are out and about.

Vietnam threatens to penalize Facebook for breaking its draconian cybersecurity law

Well, that didn’t take long. We’re less than ten days into 2019 and already Vietnam is aiming threats at Facebook for allegedly violating its draconian cybersecurity law, which came into force on January 1.

The U.S. social network stands accused of allowing users in Vietnam to post “slanderous content, anti-government sentiment and libel and defamation of individuals, organisations and state agencies,” according to a report from state-controlled media Vietnam News.

The content is said to have been flagged to Facebook which, reports say, has “delayed removing” it.

That violates the law which — passed last June — broadly forbids internet users from organizing with, or training, others for anti-state purposes, spreading false information, and undermining the nation state’s achievements or solidarity, according to reports at the time. It also requires foreign internet companies to operate a local office and store user information on Vietnamese soil. That’s something neither Google nor Facebook has complied with, despite the Vietnamese government’s recent claim that the former is investigating a local office launch.

In addition, the Authority of Broadcasting and Electronic Information (ABEI) claimed Facebook had violated online advertising rules by allowing accounts to promote fraudulent products and scams, while it is considering penalties for failure to pay tax. The Vietnamese report claimed some $235 million was spent on Facebook ads in 2018, with $152.1 million going to Google.

Facebook responded by clarifying its existing channels for reporting illegal content.

“We have a clear process for governments to report illegal content to us, and we review all these requests against our terms of service and local law. We are transparent about the content restrictions we make in accordance with local law in our Transparency Report,” a Facebook representative told TechCrunch in a statement.

TechCrunch understands that the company is in contact with the Vietnamese government and it intends to review content flagged as illegal before making a decision.

Vietnamese media reports claim that Facebook has already told the government that the content in question doesn’t violate its community standards.

It looks likely that the new law will see contact from Vietnamese government censors spike, but Facebook has acted on content before. The company’s latest transparency report covers the first half of 2018 and shows that it received 12 requests for data in Vietnam, granting just two. Facebook confirmed it has previously taken action on content that has included the alleged illegal sale of regulated products, trade of wildlife, and efforts to impersonate an individual.

Facebook did not respond to the tax liability claim.

The company previously indicated its concern at the cybersecurity law via Asia Internet Coalition (AIC) — a group that represents the social media giant as well as Google, Twitter, LinkedIn, Line and others — which cautioned that the regulations would negatively impact Vietnam.

“The provisions for data localization, controls on content that affect free speech, and local office requirements will undoubtedly hinder the nation’s fourth Industrial Revolution ambitions to achieve GDP and job growth,” AIC wrote in a statement in June.

“Unfortunately, these provisions will result in severe limitations on Vietnam’s digital economy, dampening the foreign investment climate and hurting opportunities for local businesses and SMEs to flourish inside and beyond Vietnam,” it added.

Vietnam is increasingly gaining a reputation as a growing market for startups, but the cybersecurity act threatens to impact that. One key issue is that the broad terms appear to give the government significant scope to remove content that it deems offensive.

“This decision has potentially devastating consequences for freedom of expression in Vietnam. In the country’s deeply repressive climate, the online space was a relative refuge where people could go to share ideas and opinions with less fear of censure by the authorities,” said Amnesty International.

Vietnam News reports that the authorities are continuing to collect evidence against Facebook.

“If Facebook did not take positive steps, Vietnamese regulators would apply necessary economic and technical measures to ensure a clean and healthy network environment,” the ABEI is reported to have said.