3 methods for valuing pre-revenue novel AI startups

Valuing pre-revenue tech startups is an established process today, but do the methods employed apply equally to pre-revenue companies using novel artificial intelligence? What kind of issues arise when you apply them to startups that are developing AI that can scale rapidly to millions of users? These questions are no longer academic.

This article provides a primer on the traditional methods used to value pre-revenue startups, examines some of the limitations that arise when these methods are applied to novel AI startups, and suggests ways to reduce risk.

Let’s start by looking at the three generally accepted ways of valuing pre-revenue or early-stage companies: the scorecard valuation method, the venture capital method and the Berkus Method. We’ll later delve into some of the challenges in applying these methods to an early-stage company with novel AI applications.

Scorecard valuation method

AI can scale much faster than other technologies, so what works at the beta or minimum viable product stage may not work when an AI product scales to millions of users.

This valuation method seeks to compare a startup with others in the market.

First, the median pre-money valuation of other startups in the same market is determined. This benchmark is then used to compare the startup in question, taking into account factors such as the strength of the management team, the size of the opportunity, the product or technology, the competitive environment, and marketing and sales channels.

Each of these factors is assigned a value, akin to a scorecard, though the exercise is highly subjective. If the median pre-money valuation for startups in the market is $1 million and a startup’s factor scores sum to 1.125, the two numbers are multiplied to obtain a pre-money valuation of $1.125 million.
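As a rough sketch, the arithmetic looks like this. The factor names, weights and comparison ratios below are illustrative examples only, not a fixed standard; practitioners pick their own.

```python
def scorecard_valuation(median_pre_money, factor_scores):
    """Multiply the market-median pre-money valuation by the sum of
    weighted factor scores (weight * comparison ratio)."""
    multiplier = sum(weight * ratio for weight, ratio in factor_scores.values())
    return median_pre_money * multiplier

# Each factor: (weight, how the startup compares to peers;
# 1.0 = average, >1.0 = stronger than average). Illustrative numbers.
factors = {
    "management team":     (0.30, 1.25),
    "size of opportunity": (0.25, 1.00),
    "product/technology":  (0.15, 1.00),
    "competition":         (0.10, 1.00),
    "marketing/sales":     (0.20, 1.25),
}
valuation = scorecard_valuation(1_000_000, factors)
print(round(valuation))  # 1125000
```

With these example scores the multiplier comes to 1.125, matching the $1.125 million result above.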

Venture capital method

The venture capital method seeks to determine a startup’s pre-money valuation by working backward from its projected post-money valuation: an expected exit value is discounted by the investor’s target return to arrive at a post-money figure, and the new investment is then subtracted. As with the scorecard method, you need to make assumptions by comparing the startup to benchmark companies in the same market.
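A minimal sketch of that arithmetic, with invented numbers:

```python
def vc_method_pre_money(projected_exit_value, target_roi, investment):
    """Venture capital method: post-money = exit value / target return
    multiple; pre-money = post-money minus the new investment."""
    post_money = projected_exit_value / target_roi
    return post_money - investment

# e.g. a projected $40M exit, a 10x target return, and a $1M round:
pre_money = vc_method_pre_money(40_000_000, 10, 1_000_000)
print(pre_money)  # 3000000.0
```

The exit value and target return are exactly where the benchmark-company assumptions enter, which is why the method is as sensitive to comparables as the scorecard approach.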

3 methods for valuing pre-revenue novel AI startups by Ram Iyer originally published on TechCrunch

Zappi raises $170M for its AI-based market research platform

Fundraising may be drying up for some segments of the tech industry at the moment, but one area that continues to get a lot of attention is AI, and specifically startups that are using it to build revenue-generating businesses. Today, a startup called Zappi, which has built a market research platform that uses AI and automation to speed up research (and cut costs by some 10x in the process), is announcing that it has raised a whopping $170 million to expand its business.

The majority of the investment is coming from Sumeru Equity Partners, with undisclosed co-investors. Zappi is also not commenting on its valuation with this round, but as a mark of the size of the business: Zappi had been profitable until about a year ago, when it shifted gears to grow; prior to this, it had raised only around $22 million since it was founded in 2012 (with previous backers including the likes of WPP); and at the end of last year it passed about $50 million in revenue. (Note: don’t try to use multiples of that to figure out valuation: that’s a metric that seems to be all over the place right now, ranging from 12x revenue to 20x, and I’m sure there are higher and lower examples, too.)

Zappi today has some 350 clients, including massive FMCG enterprises like PepsiCo, McDonald’s, Heineken and Reckitt, and the core of its product is that it helps these customers run surveys on ideas as they weigh up what kinds of products to develop, as well as gather early insights into how best to market them. Zappi is mostly used at the pre-product stage, CEO and co-founder Steve Phillips said in an interview. He said that the market it’s targeting is huge: $90 billion is spent annually on consumer insights and market research.

Typically, market research of this kind could cost these large FMCG companies as much as $20,000 and take four to six weeks to complete, with lengthy surveys of panels of users and then a lot of data crunching to turn the results into something that can be used. Zappi’s pitch is that for $2,000, it uses a mix of human surveys — it integrates with other companies that build those networks of respondents and might get as many as 300 or 400 people per campaign — plus a lot of other data, including its own consumer database that it says is made up of 1.2 billion data points, to get the same research done in four to six hours. That research comes complete with reports that Zappi’s clients can use in their wider in-house analytics, too.

The gap in the market that Zappi is aiming at is the fact that a lot of FMCG companies don’t have much digital DNA. You can see this in the products themselves being physical consumables, but also in the fact that the products are localized in how they are made and distributed, typically through analogue channels like shops and via third parties. This also means, Phillips said, that they are “flying blind when launching a new product.”

Companies might go as far as investing in developing new products to test out before even knowing whether they would fly with users, which means a lot of cost potentially sunk into something that might never hit the shelves of any store. “We thought, we could automate that process to make it quicker.”

AI is making big waves in the non-tech world among companies that want to tap into it to speed up how they work, a trend that preceded the arrival of the Covid-19 pandemic. (PepsiCo, one of Zappi’s big customers, for example, has been using other implementations of AI in product development for years.)

But, the pandemic was acutely felt in that world, precisely because of how fundamentally tied a lot of those products were to physical supply and distribution chains. The fact that all of that needed to be rethought definitely gave a fillip to digital transformation and opening up companies to the idea of working with companies like Zappi in areas like marketing and market research, Phillips said.

“We see a world where every single company wants to know what its customers are thinking — driving tens of billions of annual spend on researching consumer insights,” said Sanjeet Mitra, George Kadifa and Sofija Ostojic, the three principals at Sumeru who led this deal, in a joint statement. “Zappi has reimagined the entire process by using innovative technology to empower enterprises to collaborate with customers in real-time, resulting in meaningful product and advertising decisions made more efficiently and thoughtfully. We are incredibly excited to partner with the Zappi team and are proud to invest in a company with culture and community impact at the center of its strategy.” All three are joining the board with this round.

Zappi raises $170M for its AI-based market research platform by Ingrid Lunden originally published on TechCrunch

Decentralized discourse: How open source is shaping Twitter’s future

Six weeks on from Elon Musk’s $44 billion Twitter takeover, few words can fully encapsulate events as they have unfolded in the period since. “Chaotic” or “farcical” come pretty close, though, with mass layoffs, U-turns, ultimatums, resignations, crowdsourced ban-reversals, advertiser standoffs, picking fights with Apple, and a revamped verification system that has everyone and their uncle confused.

In truth, this was all mostly expected by anyone paying attention to the flip-flopping grandstanding that enveloped the six-month period up to the acquisition. But taking a step back from the entropy now enshrined at Twitter Towers, it’s worth looking at a recurring theme that has permeated the saga ever since moneyman Musk entered the picture — one that could play an instrumental part in shaping Twitter’s future.

The open source factor

Even before procuring a 9.2% stake in Twitter back in April, Musk openly posited that Twitter’s recommendation algorithm should be open source. When Twitter later accepted his offer to buy the company outright, Musk doubled down on that notion, saying in his inaugural statement that he wanted to make Twitter “better than ever,” which included “making the algorithms open source to increase trust.”

The principle behind the idea is sound enough. Insight into Twitter’s algorithms could help explain why the platform shows people a specific piece of content — and, by extension, the snowball effect this is having on society — and open-sourcing those algorithms would play a part in providing that insight.

But by most estimations, such a solution is imperfect, because viewing code doesn’t tell you how the algorithm was created and what (if any) human biases were involved in its creation, nor what data it was built on.

Musk has said little about open-sourcing Twitter’s algorithm since taking over, but he has laid off the entire “ethical AI” team that was working on the very problem he had identified: bringing more algorithmic transparency to the table.

Twitter had in fact previously committed to open-sourcing at least one of its algorithms following controversy over racial bias that was seemingly embedded into its image-cropping tech. That never quite materialized, but the fact that its ML Ethics, Transparency and Accountability (META) team is now pretty much defunct means that it could be a while before a similar program emerges from Twitter.

However, the “open source factor” is still hovering around the world of Twitter in various guises.

The “Twitter alternative”


Mastodon has emerged as the default life raft for those jumping ship from Twitter, and while it probably isn’t the Twitter 2.0 that much of the world really wants right now, it hints at what a future Twitter could look like. The so-called “open source Twitter alternative” does have Twitter-esque microblogging features, but it’s founded on an entirely different infrastructure centered around the concept of the fediverse: a decentralized network of interconnected servers that allow different platforms to communicate with each other, powered by the open ActivityPub protocol.
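For the curious, the messages fediverse servers exchange under ActivityPub are plain JSON “activities.” Below is a simplified Follow activity of the kind one server delivers to another; the actor and object URLs are hypothetical, and real implementations typically add more fields and sign their requests.

```python
import json

# A simplified ActivityPub "Follow" activity: how a user on one
# fediverse platform requests to follow a user on another.
follow = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Follow",
    "actor": "https://mastodon.example/users/alice",   # hypothetical
    "object": "https://tumblr.example/users/bob",      # hypothetical
}

# A server would POST this JSON to the target actor's inbox endpoint;
# the receiving server then replies with an "Accept" activity.
payload = json.dumps(follow)
print(payload)
```

Because the vocabulary is shared and open, any platform that speaks the protocol can interpret the same activity, which is what makes cross-platform following possible at all.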

Although it springboarded past one million and then two million active users last month, Mastodon isn’t the only platform standing to benefit from the Twitter debacle. Tumblr was already positioning itself as a “better Twitter,” and parent company Automattic’s CEO Matt Mullenweg revealed that Tumblr downloads had skyrocketed in the weeks following Musk’s arrival at Twitter.

Data from Sensor Tower backs that up, with Tumblr app installs in the U.S. alone rising 96%.


Tumblr isn’t open source or decentralized, but Mullenweg is a fan of the genre. WordPress, which he co-created, is among the top open source projects on the planet, and Automattic recently open-sourced its Pocket Casts podcast app.

Looking to capitalize on Twitter’s predicament and Mastodon’s modest rise, Mullenweg has been quick to align Tumblr with the open source sphere, confirming previously discussed plans to make Tumblr as “open source as possible.” He also solicited feedback on plans to align Tumblr with the fediverse and support related open source protocols, before revealing that Tumblr intends to support the ActivityPub protocol in the future. This could mean that users of Mastodon and Tumblr would be able to communicate directly with each other. Flickr CEO Don MacAskill later polled his Twitter followers on whether the photo-hosting platform and community should also embrace ActivityPub.

Elsewhere, open source enterprise messaging platform Rocket.Chat revealed earlier this year that it was transitioning to a similar decentralized communication protocol called Matrix.

So it’s clear that there is growing momentum in the social sphere to move away from centralization, toward an interoperable world where people aren’t tied into single-player ecosystems.

Bluesky thinking

Concept illustration depicting decentralized social network Bluesky. Image Credits: Bluesky

This is one direction Twitter could also go down. The company in fact flirted with a similar decentralized approach in its earliest days, according to one person directly involved in the project, while the prospect has reared its head again in recent times too.

Blaine Cook, one of Twitter’s founding engineers, who joined the company just months after it was created, took to Twitter recently to lament the fact that Twitter could have been a decentralized protocol from the get-go. He said it was something he had started to develop while he was chief architect at the burgeoning social network, but the project was ultimately canned shortly after he left the company in 2008.

“The [decentralized] API was very much like ActivityPub today, and ActivityPub’s lineage can be traced back to those early experiments,” Cook told TechCrunch.

According to Cook, those “early experiments” were based on XMPP, the open communication messaging protocol (formerly known as Jabber) developed by Jeremie Miller, who also now sits on Bluesky’s board (more on Bluesky below) alongside Jack Dorsey. But despite the support of some, Cook said the idea just didn’t fly.

So, what happened to this fediverse project at Twitter? “I’ve never gotten the full story,” Cook said, noting that he was outvoted on the matter and later “pushed out of the company”; the API never materialized.

Fast-forward to 2022, though, and there remains some prospect that Twitter could still embrace federation and an open source protocol.

As the Twitter acquisition crawled closer to its conclusion a few months back, cofounder and former CEO Jack Dorsey took to Twitter to say that his biggest regret was that Twitter had become a company in the first place (though it is easy to say that once you’ve made your billions). This built on other statements Dorsey had made to that effect, for example in April when he tweeted:

I’m [SIC] don’t believe any individual or institutions should own social media, or more generally media companies. It should be an open and verifiable protocol. Everything is a step toward that.

Twitter had in fact already birthed a Mastodon-esque decentralized project called Bluesky, which Dorsey introduced to the world back in 2019 while he was Twitter CEO. He said at the time that Twitter would be funding a “small independent team of up to five open source architects, engineers, and designers,” charged with building a decentralized standard for social media, and the ultimate goal was for Twitter to adopt this standard itself. But it was always going to be a long journey, with Bluesky only recently announcing a beta signup program for Bluesky Social, an app built on the new AT Protocol.

If all goes to plan, any company or developer will be able to build an app using the AT Protocol, and communicate with other apps that share that protocol (including Twitter). This means users could elect to use one specific app that presents messages in a completely different format powered by a different algorithm, and then “lift-and-shift” all their data to an alternative down the line if their requirements change.

Despite the chaos ensuing at Twitter today, the Bluesky project should remain safe from interference, insofar as it is a Public Benefit LLC that’s operationally independent from Twitter, though it was dependent on $13 million in funding from Twitter through its initial R&D phase.

With Musk now at the helm at Twitter, it’s impossible to know where this leaves Bluesky. Sure, Bluesky may be independent, but Twitter was supposed to be its big-name client, and Dorsey is no longer in charge at Twitter.

TechCrunch reached out to Bluesky lead Jay Graber, but they were unable to provide a comment at the time of writing. But on the day Musk took over Twitter, Graber did tweet to remind the world that Bluesky was independent and that, much like email, decentralized initiatives such as the AT Protocol can’t be bought.

Very curious to see where Elon is going to take Twitter. Very glad we’re independent — will keep working on building protocols that make social more resilient to rapid change. Nobody can buy “email” as a platform, and that’s a good thing.

— Jay Graber (@arcalinea) October 28, 2022

Musk has shown on more than a few occasions that he is keen on the concept that underpins Bluesky though. He is known to be a big fan of crypto (some people actually think that Musk is Bitcoin creator Satoshi Nakamoto) and decentralization. In a series of messages exchanged between Musk and Dorsey earlier this year, Musk expressed interest in Dorsey’s vision for Twitter as part of an open source protocol. But with Musk currently more concerned with trying to jumpstart Twitter and avert bankruptcy, adopting the AT Protocol might not be top of his to-do list in the immediate future.

Dorsey, meanwhile, remains on the Bluesky board, and recently said that he’s pushing for Bluesky to be a direct competitor to “any company trying to own the underlying fundamentals for social media or the data of the people using it.” And that, of course, includes Twitter.

Challenges


One of the biggest arguments against a decentralized social network is likely to come from a business perspective, as federated systems give users more choice and it’s more difficult for companies to lock users in. The so-called “network effect,” where a product’s value increases as the number of people using it increases, isn’t nearly as potent if users can download an app from one company and chat with their friends who use a different app.
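One common, though contested, way to model the network effect is Metcalfe’s law, which values a network by its number of possible pairwise connections. A quick illustration with invented numbers shows why federation blunts the advantage of a single large silo:

```python
# Metcalfe's law (illustrative): a network's value grows roughly with
# the number of possible user-to-user connections, n * (n - 1) / 2.
def connections(n):
    return n * (n - 1) // 2

# Three siloed networks of 100 users each can only connect users
# within each silo...
siloed = sum(connections(100) for _ in range(3))

# ...whereas the same 300 users on interoperable, federated networks
# can all potentially reach one another.
federated = connections(300)
print(siloed, federated)  # 14850 44850
```

Under this toy model, the value of the locked-in silos accrues separately to each company, which is exactly why incumbents resist interoperability.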

“Since the inception of Facebook, Twitter, and Instagram, network effects have trapped users on those platforms — no-one wants to go somewhere their friends aren’t,” Cook said. “The fediverse inverts the control, and allows people to choose where to host their online identity. The hope is that ultimately, this will also mean competition between — for example, Mastodon and other social software — and the evolution of features in a way Twitter was never able to support.”

Hemant Mohapatra, partner at VC firm Lightspeed India and former investor at Silicon Valley’s Andreessen Horowitz (which backed both Facebook and Twitter), said that while decentralization has its benefits, the existing centralized “web 2.0” model allows social networks to engineer “serendipity” into their content recommendations using a larger pool of data. In other words, it’s easier for companies to build something where people can find “things” that they like — people or content — and thus entice them back for more.

“In centralized systems, the algorithm decides the idea of ‘serendipity,’ based on interests, filters and so on,” Mohapatra told TechCrunch. “TikTok’s entire platform runs purely on this. When you decentralize this, depending on how the backend is built, ‘crawling’ the sharded data is that much harder. Users then have to go to what is a ‘pub/sub’ architecture — users subscribe to publishers instead of getting the platform to recommend and surface things. The surface area of random discovery, or serendipity, goes down.”
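Mohapatra’s pub/sub point can be sketched in a few lines. In the purely illustrative model below, content reaches only explicit subscribers, so there is no platform-level recommendation step to inject serendipitous discoveries:

```python
from collections import defaultdict

class PubSub:
    """Toy pub/sub feed: followers see only what the publishers they
    explicitly subscribe to have published — no recommendation layer."""
    def __init__(self):
        self.subscribers = defaultdict(set)  # publisher -> followers
        self.feeds = defaultdict(list)       # follower -> received posts

    def subscribe(self, follower, publisher):
        self.subscribers[publisher].add(follower)

    def publish(self, publisher, post):
        # Delivered only to explicit subscribers; nothing "surfaces"
        # to users who never opted in.
        for follower in self.subscribers[publisher]:
            self.feeds[follower].append((publisher, post))

hub = PubSub()
hub.subscribe("alice", "techcrunch")
hub.publish("techcrunch", "Zappi raises $170M")
hub.publish("randomblog", "unseen post")  # alice never subscribed
print(hub.feeds["alice"])  # only the techcrunch post appears
```

Contrast this with a centralized feed, where an algorithm crawling the whole data pool could have surfaced the second post anyway; that missing step is the lost “serendipity” Mohapatra describes.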

This helps to highlight that while decentralization might benefit users in terms of giving them flexibility and avoiding lock-in, there are trade-offs. Such trade-offs could also prevent Twitter, or whatever future contender, from being able to monetize as effectively — certainly at the level of the social networks of today. Super-targeted and behavioral advertising might also be off the cards in the fediverse, which would mean money will have to come from elsewhere. That could be old-school contextual advertising tailored to a specific “instance” of a social network, but it could also mean that subscriptions for “power features” become an integral part of social networks — something that Musk is focused on right now at Twitter as the advertising dollars dry up.

Ultimately, though, where there are masses of people, the innovators and entrepreneurs always figure out new ways to cash in.

“I fully expect for-profit entities to emerge, offering white-label or polished Twitter-like experiences,” Cook added. “A good analogue here would be email: many people use email daily for important business operations, and many companies are able to provide value-added services on top, despite the underlying protocols and most email software being free.”

For now, the network effect is still very much alive in today’s big social networks. But with a growing array of decentralized options, the current crop of social networks could become less sticky over time: if someone can jump ship to Tumblr and still chat with their pals over on Mastodon, there is less impetus for everyone to be in the same social space.

Cory Doctorow, author, activist, and special advisor to the Electronic Frontier Foundation (EFF), is a strong proponent of open source and interoperability — making things created by different people or companies work together, whether it’s printers and ink cartridges or, indeed, social networks.

So what does the next version of social media look like?

“A big, sprawling, pluralistic web of semi-connected systems, some better run, some worse, with both technical and legal protections for freedom of movement to let you change nodes without losing your communities, customers, family and friends,” Doctorow explained to TechCrunch.

What we’re likely talking about are lots of separate commercial (or not) apps connected by shared protocols, with none getting uncomfortably large. And if enough social networks join an open protocol, it won’t matter so much whether people leaving Twitter all join the same social network, as they will be able to choose from multiple interoperable alternatives. This could support a whole new array of smaller social networks — lots of different apps doing their own little thing, built on their own algorithms and moderation policies, with their own business models in place.

However, social networks as they stand remain virtual vortexes for the most part, by virtue of the fact that people want to be where all their friends are. True interoperability remains a pipe dream for now, but there are encouraging signs on the horizon.

Regulation time

Digital Markets Act (DMA). Image Credits: Tanaonte/Getty

There has been an array of antitrust lawsuits that have already led to some meaningful change, such as Apple being forced to allow dating app developers in the Netherlands to use alternative payment options, while Google has faced similar regulatory pressure to open up.

This helps to illustrate how Big Tech companies are being strong-armed into loosening their stranglehold on their respective platforms. In tandem, these companies have also been trying to appease regulators through more proactive measures.

In 2018, Facebook, Google, Microsoft, and Twitter joined forces for the Data Transfer Project (Apple joined later), an open source initiative to co-develop tools for transferring data between services. Not a huge amount has come from this effort so far, but there have been a few things of note — Facebook has launched a tool that lets users transfer their photos and videos to Google Photos, for example. And earlier this year, Google revealed that it would be investing $3 million in portability programs.

But none of this goes nearly far enough in terms of addressing the underlying “stickiness” embedded into these platforms. There is nothing really stopping Facebook and Twitter users from messaging each other today, beyond the technological barriers each company has chosen to implement. This is why regulators continue to look closely at these kinds of walled gardens, with Europe pushing ahead with rules to force interoperability between messaging platforms. And in the U.S., there are similar plans for an interoperable future via the ACCESS Act.

Elsewhere, Europe’s Digital Services Act, which entered into force last month, has provisions for algorithmic transparency. The European Commission recently launched the European Centre for Algorithmic Transparency (ECAT) to help support its oversight and algorithmic auditing of very large online platforms (VLOP). And earlier this year, U.S. Senators introduced the Algorithmic Accountability Act, touted as a “landmark bill” designed to bring transparency and oversight to software and automated systems that are used to “make critical decisions about nearly every aspect of Americans’ lives.”

None of this necessarily requires social platforms to open up all their algorithms for the world to see, but in light of his well-publicised obsession with open-sourcing Twitter’s recommendation algorithms, such regulations could spur Musk into releasing the code (for whatever good that would actually do).

Throw all of this together into a giant melting pot, and what we have is a fertile landscape for change: a growing array of open source protocols that can bridge myriad social networks, a push toward algorithmic transparency, and regulators forcing the long-established incumbents to participate.

But whatever promising growth metrics that Mastodon and its ilk have reported over the past month, the fact remains it’s difficult to scale a social network, which keeps Twitter in a relatively strong position for now.

“It’s true that the ‘law of small numbers’ is at play here — it’s easy to double a small number, and hard to double a large one,” Doctorow said. “And it’s likewise true that when you scale something up quickly, you discover lots of new problems, and the hard way. It’s incumbent on decentralization advocates to maintain that momentum and address those problems as they occur.”

What’s also apparent here is the emergence of multiple “competing” protocols: ActivityPub, AT Protocol (Bluesky), and Matrix, to name just a few. Off the bat, these different protocols don’t play ball with each other. But it’s far from an insurmountable hurdle, given that these protocols are not proprietary IP: they’re open and can be made interoperable.

“I think diversity of protocol is important, as is diversity of the applications built on top of the protocols,” Cook added. “That said, I strongly believe that interoperability between ActivityPub and Bluesky won’t be difficult. The only thing preventing, for example, interoperability between Twitter and Facebook’s timeline has been protectionist policies by those companies.”

There are many different analogies that can help us understand how things might evolve here. In the email realm, there are different protocols for accessing email such as IMAP and POP, while the telecommunications sector has also thrived on interoperable protocols for routing and carrying phone calls and text messages. Once upon a time it wasn’t possible to send a text message between different carriers, but today it’s something most people take for granted.

There’s no real reason why social networks developed on different protocols should be any different.

Open sesame


All this leads us to one interesting pontification: What if Twitter decided to go all-in on open source? Not just a recommendation algorithm or a protocol, but the whole shooting match — codebase, clients ’n’ all? It would certainly be a Herculean undertaking, particularly with everything else going on at Twitter right now.

It would also be an almost unprecedented move to see a $44 billion private company open its entire codebase to the world’s masses. That’s not to say that it couldn’t ever happen though, as Musk has form in making radical moves. Eight years ago Musk ripped up the patent playbook when he pledged that Tesla wouldn’t sue any company that infringed any of its patents “in good faith.” At the time, Musk said it was all about expediting electric car adoption and the infrastructure required (e.g. charging stations), an ethos that is broadly aligned with that of open source.

“Technology leadership is not defined by patents, which history has repeatedly shown to be small protection indeed against a determined competitor, but rather by the ability of a company to attract and motivate the world’s most talented engineers,” Musk wrote at the time. “We believe that applying the open source philosophy to our patents will strengthen rather than diminish Tesla’s position in this regard.”

While the “attract and motivate the world’s most talented engineers” facet stands out like a sore thumb when juxtaposed against the turmoil at Twitter today, the fact that Musk was willing to make such a left-field move with the company’s patents is notable when you consider where he finds himself today at Twitter. He clearly needs to galvanize a depleted workforce and prevent Twitter from falling apart.

But would going the whole nine yards on open source fix things at Twitter?

“What Musk did at Tesla with the patents was unprecedented,” Heather Meeker, an open source licensing specialist and partner at seed-stage VC firm OSS Capital, told TechCrunch. “But I’m not sure laying the code open would solve their maintenance problem — it might generate a lot of good will though. A lot of the maintenance effort for a company — like Twitter, or any other — is in putting together and managing the platform, not writing or maintaining code.”

Cook agreed that it would make little sense for Twitter to go fully open source, because its problems are less about the number of eyeballs on its code than about its infrastructure, as well as the strategic decisions it makes at a business level.

“Nowadays, Twitter isn’t so much a one-source repository, but an immense deployed infrastructure that would likely take weeks to set up from scratch,” Cook said. “I’m not sure outside engineers could contribute in any meaningful way. And most of Twitter’s problems these days are policy, not code per se, as much as Musk is fixated on that aspect.”

In the more immediate term, however, there are major safety and security implications at play, with chief information security officer Lea Kissner recently departing, and content moderation seemingly going out of the window.

Open source could have a part to play here, perhaps best evidenced through Community Notes, formerly known as Birdwatch until Musk decided it was time for a name change last month. According to Musk, Twitter “needs to become by far the most accurate source of information about the world,” and Community Notes is apparently what will power that mission.

Community Notes is essentially Twitter crowdsourcing information accuracy from its millions of users, with approved contributors able to rate and add “helpful context” to tweets. This was opened to everyone in the U.S. in October, and started rolling out to everyone globally over the weekend.

Community Notes in action. Image Credits: Twitter

The Community Notes ranking algorithm source code is available on GitHub for anyone to peruse, and already there are third-party developers building products on top of it, such as the open source Community Notes Dashboard which serves as a leaderboard for contributors to the Community Notes program.

The ethos behind Community Notes is sound in principle, insofar as one of Twitter’s biggest problems from a content moderation perspective has been scalability: algorithms have limited accuracy and struggle with nuance, while there can only be so many humans on-hand to help internally.
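Twitter’s actual ranking code is the open-sourced version on GitHub; a stated design goal of Community Notes is to surface notes rated helpful by raters who usually disagree. As a purely illustrative toy heuristic (not Twitter’s algorithm), one could require helpful ratings from every rater group before a note surfaces:

```python
# Toy heuristic, NOT Twitter's actual open-sourced ranking algorithm:
# surface a note only if raters from every group marked it helpful,
# roughly mimicking the goal of rewarding cross-viewpoint agreement.
def note_surfaces(ratings, min_per_group=1):
    """ratings: list of (rater_group, found_helpful) pairs."""
    groups = {}
    for group, helpful in ratings:
        groups.setdefault(group, []).append(helpful)
    return all(sum(votes) >= min_per_group for votes in groups.values())

# Helpful to raters in both groups A and B -> surfaces:
print(note_surfaces([("A", True), ("B", True), ("B", False)]))  # True
# Helpful only within group A -> does not surface:
print(note_surfaces([("A", True), ("A", True), ("B", False)]))  # False
```

Even this crude version hints at why crowdsourcing scales where in-house moderation doesn’t: the judgment work is distributed across millions of users rather than a fixed team.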

If Twitter was to travel further down the path of open source, it could help bring a little more trust back to the platform, something that has been eroded of late — just today, news emerged that Twitter had disbanded its Trust & Safety Council advisory group.

“There are many benefits of open source, external contribution to the code is just one of many,” Joseph Jacks, founder and seed-stage investor at OSS Capital, told TechCrunch. “Other benefits that would be immediately impactful — at the level of why Signal is more trusted than WhatsApp — include code transparency, and trust and privacy assurance, because the world would know how everything fundamental to the platform is implemented. Open source enables a high degree of provable trust in technology that otherwise is simply not possible.”

There is one particularly alluring aspect of federation that could also appeal to Twitter’s new owner in the near term. A decentralized infrastructure could potentially help combat spam, bots, and other bad actors — something that Musk has persistently complained about before, during, and after the acquisition closed. Different apps on a shared protocol could collaborate and share data.

“[With decentralization] I think we’ll see a whole bunch of shared code, design patterns, and eventually, shared infrastructure to help replicate and improve upon the sort of trust and safety policies that Twitter implemented,” Cook said.

So rather than relying on a single entity to manage bots or abuse, all the companies on a shared protocol could share blocklists and detection models, bypassing the inherent constraints of a single product team at Twitter or Facebook.
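As a rough illustration of what that sharing could look like, the sketch below pools domain-level blocklists published by hypothetical peer servers on a shared protocol. The instance names and blocked domains are invented for illustration; no real federation API is implied.

```python
# Hypothetical sketch: servers on a shared protocol pooling moderation data.
# Instance names and blocklists below are invented for illustration only.

def merge_blocklists(*instance_blocklists: set[str]) -> set[str]:
    """Union the domain-level blocklists published by peer instances."""
    merged: set[str] = set()
    for blocklist in instance_blocklists:
        merged |= blocklist  # any domain flagged by any peer is included
    return merged

# Two hypothetical peers publish what they have each blocked.
peer_a = {"spam.example", "bots.example"}
peer_b = {"bots.example", "abuse.example"}

shared = merge_blocklists(peer_a, peer_b)
print(sorted(shared))  # every domain any peer has flagged
```

In practice, instances would also need trust and revocation rules (a bad peer could poison a naive union), but the basic win is that detection work done once is usable everywhere.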

With his visions for Twitter 2.0, there are signs that Musk is looking to lean on other facets of the open source sphere, too, including a protocol that’s used by billions of people globally.

Signal of intent

A Signal logo is seen on a smartphone screen. Image Credits: Pavlo Gonchar/SOPA Images/LightRocket via Getty

In response to a planned privacy policy change at Facebook last year that would share some WhatsApp data with Facebook, Elon Musk publicly recommended that people ditch WhatsApp for Signal, an open source messaging alternative backed by WhatsApp cofounder Brian Acton. And it’s clear that Signal remains on Musk’s radar today.

In a response to a question posted on Twitter last month, Musk said that the goal of direct messages (DMs) is to “superset Signal,” a lofty ambition that presumably means he wants to make direct messaging on Twitter more secure than Signal.

The goal of Twitter DMs is to superset Signal

— Elon Musk (@elonmusk) November 9, 2022

But it’s easy to say things on Twitter — it’s a completely different thing to execute on such ambitious (and vague) plans. However, new evidence recently emerged that Twitter is actively working to revive a previously shelved project to introduce encryption to DMs, while a report in The Verge also detailed some of Musk’s apparent plans for encrypted DMs as part of Twitter 2.0.

The report, citing comments reportedly made at an all-hands meeting, indicates that Musk had spoken directly with Moxie Marlinspike, the cryptographer, security researcher, and creator of Signal, about helping out with Twitter’s DM encryption roadmap.

For context, Marlinspike, who left Signal back in January, co-authored the open source Signal Protocol that powers encryption in WhatsApp, Facebook Messenger, Skype, and Signal itself. Marlinspike also had a previous career at Twitter after it acquired his enterprise mobile security company Whisper Systems in 2011, with Marlinspike going on to head up Twitter’s cybersecurity operations for a time. Twitter released some of Whisper Systems’ products under an open source license, with Marlinspike subsequently leaving Twitter in 2013 to work on what would eventually become Signal.

All signs so far suggest that Twitter’s encrypted DMs plan will channel the Signal protocol in some form, serving as another nod to how open source is shaping Twitter.

Twitter 2.0

Twitter is at a major crossroads, and nobody really knows what direction Twitter will take, perhaps not even Musk himself.

In some respects, Twitter could be “too big to fail” from an existential perspective as the de facto “global town square,” but that doesn’t necessarily translate into a thriving business. Advertisers are queasy about aligning themselves with hate speech and other forms of questionable content in a light-touch moderation world, and Twitter would be unlikely to attract enough subscribers to replace its lost advertising revenue.

It’s difficult to see a path forward for Twitter as a business in its current form. It will have to evolve in a meaningful way, which may require radical moves beyond trying to grow its subscription base. With growing awareness of — and movements toward — the fediverse, alongside mounting regulatory pressure around interoperability and algorithmic transparency, it feels like significant change is coming to the world of social networking.

“The reality is that federated services are experiencing explosive growth, more growth in the past couple of weeks than in the past several years,” Doctorow said. “That is an opportunity that is ours to seize — or lose.”

But what all this means for Twitter is still anyone’s guess.

Decentralized discourse: How open source is shaping Twitter’s future by Paul Sawers originally published on TechCrunch

SEC, CFTC and SDNY attorney’s office charge FTX’s Sam Bankman-Fried with defrauding investors

The U.S. Securities and Exchange Commission (SEC) has officially charged disgraced FTX founder Sam Bankman-Fried (aka SBF) with defrauding investors, it revealed on Tuesday morning following his arrest in the Bahamas. The SEC said in a press release that in addition to being charged with fraud regarding equity investors in FTX, he’s also being investigated for other securities law violations — and noted that there are ongoing investigations pending against others involved as well.

The SEC isn’t the only one getting a hand on this ball, however: Both the Southern District of New York’s Attorney’s office and the Commodity Futures Trading Commission (CFTC) also filed charges against SBF in “parallel actions.”

The complaint from the U.S. securities regulator alleges that while Bankman-Fried presented FTX as “a safe, responsible crypto asset trading platform,” in reality the founder sometimes described as ‘crypto’s white knight’ was engaged in a “years-long fraud” designed to hide from FTX investors the fact that their funds were being redirected to SBF’s Alameda crypto hedge fund, while Alameda enjoyed a kind of favored status that protected it from the usual risk mitigation measures FTX employed. The SEC also takes issue with the degree of exposure FTX had to Alameda’s very large holdings of “illiquid assets such as FTX-affiliated tokens.”

Also included in the complaint are allegations that FTX customer funds were employed via Alameda for other expenditures including VC investments, “lavish real estate purchases,” and political donations, all of which have been documented in numerous reports and in some cases, by SBF’s own admission during his many interviews following the collapse of his businesses.

SEC Chair Gary Gensler reiterated his oft-repeated position that crypto trading platforms need to comply with existing securities laws in a quote in the release announcing the charges. This is likely to be the most impactful and significant test of that position to date, since SBF’s specific charges in this action are allegations of violations of the Securities Act of 1933 and the Securities Exchange Act of 1934. One consequence if SBF is convicted could be that he’s banned from future securities trading as an individual and prevented from acting as a corporate officer or board member, in addition to monetary penalties.

This story is developing…

SEC, CFTC and SDNY attorney’s office charge FTX’s Sam Bankman-Fried with defrauding investors by Darrell Etherington originally published on TechCrunch

China’s generative AI rules set boundaries and punishments for misuse

As text-to-image generators and intelligent chatbots keep blowing people’s minds, China has swiftly moved to lay out what people can do with the tools built on powerful AI models. The country’s regulators clearly err on the side of caution when it comes to the consequences of generative AI. That’s a contrast to the U.S., which has so far largely let the private sector make its own rules, raising ethical and legal questions.

The Cyberspace Administration of China, the country’s top internet watchdog, recently passed a regulation on “deep synthesis” technology, which it defines as “technology that uses deep learning, virtual reality, and other synthesis algorithms to generate text, images, audio, video, and virtual scenes.” The regulation applies to service providers that operate in China and will take effect on January 10.

Nothing from the set of rules stands out as a surprise as the restrictions are mostly in line with those that oversee other forms of consumer internet services in China, such as games, social media, and short videos. For instance, users are prohibited from using generative AI to engage in activities that endanger national security, damage public interest, or are illegal.

Such restrictions are made possible by China’s real-name verification apparatus. Anonymity doesn’t really exist on the Chinese internet as users are generally asked to link their online accounts to their phone numbers, which are registered with their government IDs. Providers of generative AI are similarly required to verify users using mobile phone numbers, IDs, or other forms of documentation.

China also unsurprisingly wants to censor what algorithms can generate. Service providers must audit AI-generated content and user prompts manually or through technical means. Baidu, one of the first to launch a Chinese text-to-image model, already filters politically sensitive content. Censorship is a standard practice across all forms of media in China. The question is whether content moderation will be able to keep up with the sheer volume of text, audio, images, and videos that get churned out of AI models.

The Chinese government should perhaps get some credit for stepping in to prevent the misuse of AI. For one, the rules ban people from using deep synthesis tech to generate and disseminate fake news. When the data used for AI training contains personal information, technology providers should follow the country’s personal information protection law. Platforms should also remind users to seek approval before they alter others’ faces and voices using deep synthesis technology. Lastly, this rule should alleviate some concerns around copyright infringement and academic cheating: In the case that the result of generative AI may cause confusion or misidentification by the public, the service provider should put a watermark in a prominent place to inform the public that it is a work by the machine.
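The watermark requirement boils down to stamping a visible marker onto machine-generated output. A minimal sketch of the idea, using a grayscale pixel grid standing in for an image, is below; a real provider would render legible text or a logo, and this toy only shows the mechanical step.

```python
# Minimal illustration of a visible watermark: overwrite a small patch in
# the bottom-right corner of an image (here, a grayscale grid of ints).
# Real services would render readable text; this is the idea in miniature.

def stamp_watermark(image: list[list[int]], block: int = 2, value: int = 255) -> list[list[int]]:
    """Overwrite a block-by-block patch in the bottom-right corner in place."""
    h, w = len(image), len(image[0])
    for y in range(h - block, h):
        for x in range(w - block, w):
            image[y][x] = value
    return image

img = [[0] * 6 for _ in range(6)]  # a blank 6x6 "generated image"
stamp_watermark(img)
print(img[5][5], img[0][0])  # 255 0 — corner marked, rest untouched
```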

Users in violation of these regulations will face punishments. Service operators are asked to keep records of illegal behavior and report them to the relevant authorities. On top of that, platforms should also issue warnings, restrict usage, suspend service, or even shut down the accounts of those who break the rules.

China’s generative AI rules set boundaries and punishments for misuse by Rita Liao originally published on TechCrunch

Vic.ai raises $52M, shows that automating accounting processes can be profitable

AI is an imperfect technology, but one task at which it excels is identifying patterns in vast amounts of data. That’s perhaps why a number of startups have sprung up in recent years offering AI-powered products aimed at automating accounting tasks, like redacting sensitive info in paperwork and filing forms across different departments. Simply put, it’s low-hanging fruit.

That’s not to suggest accounting-focused AI isn’t profitable — on the contrary. As something of a case in point, Vic.ai, which bills itself as an accounting automation platform, today announced that it raised $52 million in a Series A funding round led by GGV Capital and ICONIQ Growth with participation from Cowboy Ventures and Costanoa Ventures.

The new cash brings Vic.ai’s total raised to $115 million, which CEO Alexander Hagerup says is being put toward customer acquisition in North America and adding purchase order match, payment execution and “spend intelligence” capabilities to the Vic.ai platform.

“In this next stage of growth, Vic.ai will capitalize on the market’s urgent need to automate other elements of finance by expanding its AI solution to manage and analyze all these tasks,” Hagerup told TechCrunch in an email interview. “‘AI’ has been a hot concept for many years, but large enterprises are just now getting to the point where they’re ready to adopt at scale, and they’re doing so with a focus on specific functions such as accounting and finance.”

Vic.ai was founded in 2017 by Hagerup and Kristoffer Roil, both Norwegian entrepreneurs. Prior to co-launching Vic.ai, Hagerup founded the Online Backup Company, a European backup and disaster recovery service provider. Roil spearheaded the founding of Telipol, a wireless carrier in Norway that was later acquired by Hudya Group, a Nordic fintech company.

Hagerup and Roil say that they built the first iteration of Vic.ai by training the platform on historical accounting data and processes from tens of thousands of public companies. The training data set contained accounting documents and corresponding journal entries that were reviewed by accountants at consultancy firms, including PricewaterhouseCoopers. This “live usage” helped to train Vic.ai’s machine learning algorithms over time, according to Hagerup, enabling it to provide nearly “complete autonomy” for transaction processing.

Vic.ai primarily handles invoice processing, leveraging the aforementioned algorithms to select invoices and expenses that meet a certain confidence threshold and automatically send them to approvers. The platform also determines the number of steps in an invoice approval process and automatically decides which employee needs to review each step.
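Vic.ai’s actual pipeline is proprietary, but the confidence-threshold routing it describes can be sketched in a few lines. The field names and the 0.95 threshold below are assumptions for illustration, not details from the company.

```python
# Illustrative sketch of confidence-threshold routing (Vic.ai's real
# system is proprietary; the threshold and fields here are assumed).

from dataclasses import dataclass

@dataclass
class Invoice:
    vendor: str
    amount: float
    model_confidence: float  # the model's confidence in its own extraction

def route(invoice: Invoice, threshold: float = 0.95) -> str:
    """Send high-confidence invoices straight to approvers; queue the rest."""
    if invoice.model_confidence >= threshold:
        return "auto-approve"
    return "manual-review"

print(route(Invoice("Acme Supplies", 1200.0, 0.98)))  # auto-approve
print(route(Invoice("Acme Supplies", 1200.0, 0.80)))  # manual-review
```

The design choice is the usual automation trade-off: a higher threshold means fewer mistakes reach approvers but more invoices fall back to humans.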

Hagerup says that Vic.ai uses the invoices that it processes for customers to improve the performance of its algorithms. Data on the platform is retained for seven years, but Vic.ai maintains a “strict separation” of U.S. and EU data to comply with GDPR and makes an effort to discard personally identifiable information, he says.

Unlike some AI vendors, Vic.ai has the good fortune of occupying an industry that’s beginning to embrace automation. A 2021 survey of roughly 200 companies and financial institutions found that, while management priorities and IT availability remain the top blockers to automated workflows, just over a third of respondents said that they planned to spend “more or significantly more” on accounts payable automation technology within the next two years.

Vic.ai’s customer base reflects this. According to Hagerup, the company now has 60 enterprise customers, including HSB, Intercom and Armanino, with an active user base that’s grown 280% compared to 2021. Vic.ai’s contracted annual recurring revenue tripled in 2022 as compared to 2021 ($5 million), he added.

“As a true AI company, Vic.ai is changing accounts payable automation into true autonomy. While some of our competitors offer solutions based on rules and templates, our unique approach sets us apart from the status quo,” Hagerup said. “Moving operations from on-prem manual routines via email or spreadsheet into a cloud based solution with audit trails and compliance features is favorable to IT C-level managers … We’re well positioned for an economic downturn.”

Vic.ai competes against vendors such as Upflow, Glean AI and Quadient-owned YayPay in the accounts receivables management and automation space. (For context, the accounts payable automation market alone is estimated to grow from $1.9 billion in 2019 to $3.1 billion by 2024, according to MarketsandMarkets.) Tipalti is perhaps the most formidable, having raised $270 million at an $8.3 billion valuation last December.

To beat back its rivals, New York–based Vic.ai has expanded rapidly — it tripled its headcount to 106 employees this year — and invested in building out its AI-powered purchase order matching technology, which it sees as a key differentiator.

Vic.ai raises $52M, shows that automating accounting processes can be profitable by Kyle Wiggers originally published on TechCrunch

Image-generating AI can copy and paste from training data, raising IP concerns

Image-generating AI models like DALL-E 2 and Stable Diffusion can — and do — replicate aspects of images from their training data, researchers show in a new study, raising concerns as these services enter wide commercial use.

Co-authored by scientists at the University of Maryland and New York University, the research identifies cases where image-generating models, including Stable Diffusion, “copy” from the public internet data — including copyrighted images — on which they were trained.

The study hasn’t been peer reviewed yet, and the co-authors submitted it to a conference whose rules forbid media interviews until the research has been accepted for publication. But one of the researchers, who asked not to be identified by name, shared high-level thoughts with TechCrunch via email.

“Even though diffusion models such as Stable Diffusion produce beautiful images, and often ones that appear highly original and custom tailored to a particular text prompt, we show that these images may actually be copied from their training data, either wholesale or by copying only parts of training images,” the researcher said. “Companies generating data with diffusion models may need to reconsider wherever intellectual property laws are concerned. It is virtually impossible to verify that any particular image generated by Stable Diffusion is novel and not stolen from the training set.”

Images from noise

State-of-the-art image-generating systems like Stable Diffusion are what’s known as “diffusion” models. Diffusion models learn to create images from text prompts (e.g., “a sketch of a bird perched on a windowsill”) as they work their way through massive training data sets. The models — trained to “re-create” images as opposed to drawing them from scratch — start with pure noise and refine an image over time to make it incrementally closer to the text prompt.
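The iterative-refinement idea can be shown with a toy numerical sketch: start from pure noise and repeatedly nudge the sample toward the data. This is not a real diffusion model — a real one replaces the `denoise_step` stand-in below with a learned neural network conditioned on the text prompt — but the loop structure is the same.

```python
# Toy illustration of diffusion-style refinement: begin with noise and
# take many small steps toward the data. A real model learns the step.

import random

def denoise_step(x: float, target: float, strength: float = 0.2) -> float:
    # Stand-in for the learned denoiser: move part-way toward the target.
    return x + strength * (target - x)

random.seed(0)
x = random.gauss(0.0, 1.0)  # "pure noise" starting point
target = 5.0                # stand-in for the image the prompt describes

for step in range(50):      # iterative refinement
    x = denoise_step(x, target)

print(round(x, 3))  # very close to the target after many small steps
```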

It’s not very intuitive tech. But it’s exceptionally good at generating artwork in virtually any style, including photorealistic art. Indeed, diffusion has enabled a host of attention-grabbing applications, from synthetic avatars in Lensa to art tools in Canva. DeviantArt recently released a Stable Diffusion–powered app for creating custom artwork, while Microsoft is tapping DALL-E 2 to power a generative art feature coming to Microsoft Edge.

On the top are images generated by Stable Diffusion from random captions in the model’s training set. On the bottom are images that the researchers prompted to match the originals. Image Credits: Somepalli et al.

To be clear, it wasn’t a mystery that diffusion models replicate elements of training images, which are usually scraped indiscriminately from the web. Character designers like Hollie Mengert and Greg Rutkowski, whose classical painting styles and fantasy landscapes have become some of the most commonly used prompts in Stable Diffusion, have decried what they see as poor AI imitations that are nevertheless tied to their names.

But it’s been difficult to empirically measure how often copying occurs, given diffusion systems are trained on upward of billions of images that come from a range of different sources.

To study Stable Diffusion, the researchers’ approach was to randomly sample 9,000 images from a data set called LAION-Aesthetics — one of the image sets used to train Stable Diffusion — and the images’ corresponding captions. LAION-Aesthetics contains images paired with text captions, including images of copyrighted characters (e.g., Luke Skywalker and Batman), images from IP-protected sources such as iStock, and art from living artists such as Phil Koch and Steve Henderson.

The researchers fed the captions to Stable Diffusion to have the system create new images. They then wrote new captions for each, attempting to have Stable Diffusion replicate the synthetic images. After comparing the two sets of generated images — the set created from the LAION-Aesthetics captions and the set from the researchers’ prompts — using an automated similarity-spotting tool, the researchers say they found a “significant amount of copying” by Stable Diffusion across the results, including backgrounds and objects recycled from the training set.
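The article doesn’t name the exact similarity tool, but such detectors generally share one shape: embed both images as feature vectors and flag pairs whose similarity exceeds a threshold. The vectors and the 0.9 threshold below are purely illustrative.

```python
# General shape of automated copy detection: compare feature embeddings
# with cosine similarity. Embeddings and threshold here are illustrative;
# the study's actual detector is not specified in this article.

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def looks_copied(emb_generated: list[float], emb_training: list[float],
                 threshold: float = 0.9) -> bool:
    """Flag a pair whose embeddings are nearly parallel."""
    return cosine_similarity(emb_generated, emb_training) >= threshold

print(looks_copied([1.0, 0.1, 0.0], [0.9, 0.2, 0.1]))  # True: near-duplicate
print(looks_copied([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # False: unrelated
```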

One prompt — “Canvas Wall Art Print” — consistently yielded images showing a particular sofa, a comparatively mundane example of the way diffusion models associate semantic concepts with images. Others containing the words “painting” and “wave” generated images with waves resembling those in the painting “The Great Wave off Kanagawa” by Katsushika Hokusai.

Across all their experiments, Stable Diffusion “copied” from the training data set roughly 1.88% of the time, the researchers say. That might not sound like much, but considering the reach of diffusion systems today — Stable Diffusion had created over 170 million images as of October, according to one ballpark estimate — it’s tough to ignore.
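A back-of-the-envelope calculation shows why the percentage is hard to dismiss, using only the two figures quoted above (the ballpark 170 million total is an estimate, so the result is equally rough):

```python
# Rough scale of the 1.88% copy rate against the ~170 million images
# reportedly generated with Stable Diffusion as of October (an estimate).

copy_rate = 0.0188
images_generated = 170_000_000

copies = copy_rate * images_generated
print(f"{copies:,.0f}")  # roughly 3.2 million images with copied elements
```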

“Artists and content creators should absolutely be alarmed that others may be profiting off their content without consent,” the researcher said.

Implications

In the study, the co-authors note that none of the Stable Diffusion generations matched their respective LAION-Aesthetics source image and that not all models they tested were equally prone to copying. How often a model copied depended on several factors, including the size of the training data set; smaller sets tended to lead to more copying than larger sets.

One system the researchers probed, a diffusion model trained on the open source ImageNet data set, showed “no significant copying in any of the generations,” they wrote.

The co-authors also advised against excessive extrapolation from the study’s findings. Constrained by the cost of compute, they were only able to sample a small portion of Stable Diffusion’s full training set in their experiments.

More examples of Stable Diffusion copying elements from its training data set. Image Credits: Somepalli et al.

Still, they say that the results should prompt companies to reconsider the process of assembling data sets and training models on them. Vendors behind systems such as Stable Diffusion have long claimed that fair use — the doctrine in U.S. law that permits the use of copyrighted material without first having to obtain permission from the rightsholder — protects them in the event that their models were trained on licensed content. But it’s an untested theory.

“Right now, the data is curated blindly, and the data sets are so large that human screening is infeasible,” the researcher said. “Diffusion models are amazing and powerful, and have showcased such impressive results that we cannot jettison them, but we should think about how to keep their performance without compromising privacy.”

For the businesses using diffusion models to power their apps and services, the research might give pause. In a previous interview with TechCrunch, Bradley J. Hulbert, a founding partner at law firm MBHB and an expert in IP law, said he believes it’s unlikely a judge will see the copies of copyrighted works in AI-generated art as fair use — at least in the case of commercial systems like DALL-E 2. Getty Images, motivated by those same concerns, has banned AI-generated artwork from its platform.

The issue will soon play out in the courts. In November, a software developer filed a class action lawsuit against Microsoft, its subsidiary GitHub and business partner OpenAI for allegedly violating copyright law with Copilot, GitHub’s AI-powered, code-generating service. The suit hinges on the fact that Copilot — which was trained on millions of examples of code from the internet — regurgitates sections of licensed code without providing credit.

Beyond the legal ramifications, there’s reason to fear that prompts could reveal, either directly or indirectly, some of the more sensitive data embedded in the image training data sets. As a recent Ars Technica report revealed, private medical records — as many as thousands — are among the photos hidden within Stable Diffusion’s set.

The co-authors propose a solution in the form of a technique called differentially private training, which would “desensitize” diffusion models to the data used to train them — preserving the privacy of the original data in the process. Differentially private training usually harms performance, but that might be the price to pay to protect privacy and intellectual property moving forward if other methods fail, the researchers say.

“Once the model has memorized data, it’s very difficult to verify that a generated image is original,” the researcher said. “I think content creators are becoming aware of this risk.”

Image-generating AI can copy and paste from training data, raising IP concerns by Kyle Wiggers originally published on TechCrunch

California’s finance department confirms breach as LockBit claims data theft

California’s Department of Finance has confirmed it’s investigating a “cybersecurity incident” after the prolific LockBit ransomware group claims to have stolen confidential data from the agency.

The California Office of Emergency Services (Cal OES) in a statement on Monday described the threat as an “intrusion” that was “identified through coordination with state and federal security partners.”

The statement did not provide any specifics about the nature of the incident, who was involved, or whether any information had been stolen. The California Department of Finance did not respond to TechCrunch’s questions prior to publication.

“While we cannot comment on specifics of the ongoing investigation, we can share that no state funds have been compromised, and the department of finance is continuing its work to prepare the governor’s budget that will be released next month,” the statement said.

While state officials remain tight-lipped about the incident, the notorious LockBit ransomware gang on Monday claimed responsibility for the attack. In a post on its dark web leak site seen by TechCrunch, the Russia-affiliated group claims to have stolen 76GB of files from the agency, including “databases, confidential data, financial documents, certification, IT documents, and sexual proceedings in court.”

Screenshots shared by LockBit lend some weight to its claim, but the ransomware gang’s claims should still be taken with skepticism. In June, the group claimed it breached cybersecurity company Mandiant, which was later revealed as false. The ransomware group faked the incident in response to a Mandiant investigation that demonstrated significant overlaps between LockBit and the U.S.-sanctioned Evil Corp group.

LockBit has given California’s finance department a December 24 deadline to pay its as-yet unspecified ransom demand. If the agency fails to pay, the ransomware gang is threatening to leak the entire cache of stolen data.

This latest breach comes just weeks after the U.S. Department of Justice in November charged a dual Russian and Canadian citizen linked to LockBit over his alleged involvement in attacks targeting critical infrastructure and large industrial groups worldwide. At the time, the DOJ said that LockBit has claimed at least 1,000 victims in the United States and has extracted tens of millions of dollars in actual ransom payments from their victims.

California’s finance department confirms breach as LockBit claims data theft by Carly Page originally published on TechCrunch

A quick guide to all the checkmarks and badges on Twitter

Elon Musk-led Twitter is shaking up its verification system. Instead of one checkmark, now there are multicolored checkmarks to denote different things. This could be very confusing for users to track. So here’s a handy guide to all checkmarks and badges on the social network.

Checkmarks

Blue checkmark: It currently means one of two things. 1) An account with this checkmark is a legacy verified account (read: verified in the pre-Musk era). This was used to mark a notable account representing a politician, a celebrity, or an activist, and to prove that the person is indeed who they claim to be. Musk has said that the legacy checkmark will go away in a few months. 2) An account with the blue checkmark may instead belong to a Twitter Blue subscriber. The only way to tell the two apart is to click on the blue checkmark.

Image Credits: Twitter

Gold checkmark: Twitter debuted this checkmark earlier this week to denote that an account belongs to a company or an organization. The social network also said that it is working on a Twitter Blue for Business plan so companies can apply to get the checkmark.

Image Credits: Twitter

Official (Grey) checkmark: This newly introduced secondary checkmark is a way to certify certain profiles, such as accounts of governments, political parties, media houses, and brands. Twitter also says this applies to “some other public figures,” without specifying further.

The official checkmark seemingly serves the same purpose as the legacy verification system. But it can exist alongside a blue or a gold checkmark.

Labels and badges

State-affiliated media: Twitter applies this label (podium icon) to media houses that don’t have editorial independence. That means the state has editorial and financial control of that media entity. So, entities like the BBC and NPR don’t come under that purview. The label, introduced two years ago, applies to media accounts along with their prominent editors and reporters. The social network also doesn’t amplify or recommend these accounts or their tweets.

Image Credits: Twitter

Government accounts: Introduced along with the state-affiliated media labels, the government account label (flag icon) aims to signal that an account belongs to a government entity or is operated by an official. Twitter doesn’t label accounts that are not official communication channels for government personnel.

Image Credits: Twitter

Twitter says that in limited cases, where a government is suppressing people’s voices, it doesn’t recommend or amplify that account.

US election candidates: This label notes that the account belongs to a person participating in the U.S. midterms, running for the House of Representatives, the U.S. Senate, or a governorship. The company has used these labels in multiple elections now.

Image Credits: Twitter

Automated labels: Twitter started testing a label for good and useful bots last year and officially launched it this February. However, given Musk’s “war against bots,” there is no guarantee that this label will survive.

Twitter introduces a new label to identify bots on Twitter profiles

Verified phone number badge: Twitter had been testing this label before Musk took over. But recently, the company started rolling it out for users in India. Technically, Twitter doesn’t assign this label to anyone. If a user in India has verified their phone number, they can choose to show it on their profile.

Image Credits: TechCrunch

Twitter profile category: This is another self-attested label for businesses. That means professional accounts can identify themselves with labels like “Coffee Shop,” “Journalist” and “Optician” (see the example above).

As Twitter is a Musk-led company, we never know when we will see a new label or the removal of an existing one. We’ll keep this story updated as things change.

A quick guide to all the checkmarks and badges on Twitter by Ivan Mehta originally published on TechCrunch
