VFX studio Perception takes us inside the tech of ‘Black Panther: Wakanda Forever’

Visual effects artists, especially those who work on Marvel films, are in higher demand than ever and rarely get enough credit. Nearly a dozen VFX houses can contribute to a single Marvel project. Perception, the Emmy-nominated design and VFX lab that has worked on 33 Marvel films and series, was assigned the challenging task of designing most of the technology seen in “Black Panther: Wakanda Forever.”

We spoke with the team at Perception about their contributions to the movie, which include holograms, sentient AI, HUDs (head-up displays) and interfaces, along with the captivating main-on-end title sequence and the emotional tribute to the late actor Chadwick Boseman (who played Black Panther) at the beginning of the movie. Perception worked on approximately 90 shots in total, the company told us.

(Heads up that this TechCrunch story contains movie spoilers.)

According to The Hollywood Reporter, VFX supervisor Geoffrey Baumann said some 2,233 shots in “Black Panther: Wakanda Forever” required VFX.

Conceptualizing the visionary technology seen in the film was no easy feat. The African nation of Wakanda has by far the most advanced tech in the entire Marvel Cinematic Universe (MCU). For instance, the fictional country’s main resource is “vibranium,” a metallic ore that has energy-manipulating properties, absorbs sound and can be used for bulletproof gear.

Luckily, Perception has been in the industry for 20 years and played an important part in the first “Black Panther” film, designing, developing, animating and rendering Wakandan technology, such as Kimoyo beads, a cutting-edge communication device.

“Since the production of the first ‘Black Panther’ film, our team has been deeply involved with the world of Wakanda,” said Eric Daly, Director of Production at Perception. “Marvel Studios asked us to return for this film to design the characters’ technology and create the main-on-end title sequence due to our deep-rooted connections with Wakanda.”

When creating the main-on-end title sequence for “Black Panther: Wakanda Forever,” Perception wanted it to be “tied into the intense emotions and somber yet joyful tone at the end of the film,” Daly added.

The title sequence plays before the end-credit scene. It begins with a beautiful shot of Shuri’s funeral ceremonial robe igniting into flames. The cloth burns slowly and eventually reveals the Black Panther suit.

Image Credits: Marvel Studios

Throughout the film, the Princess of Wakanda, Shuri (played by Letitia Wright), is grieving the loss of her brother, T’Challa, a.k.a Black Panther. Her mother, Queen Ramonda (Angela Bassett), suggests that Shuri burn her funeral clothing in a ritual. Shuri tells her, “It won’t be just the clothes I burn. It will be the world.” The anger that Shuri has at that moment, contrasted with when she’s in the acceptance stage of grief at the end, is captured perfectly in Perception’s sequence.

“The title sequence for this movie is so emotionally resonant. It allows Shuri to have this moment where she can sit in her grief and mourn her brother, but it’s also for the audience to grieve the loss of Chadwick Boseman. It’s for the creators to grieve the loss of their friend. There were a lot of layers to the sequence that made it very emotionally powerful,” Doug Appleton, Chief Creative Director at Perception, said to TechCrunch.

Boseman, the actor who played Black Panther, died of Stage 4 colon cancer in 2020. Instead of recasting Boseman, who had been such an inspiration for fans, particularly those in the Black community, Marvel decided to incorporate the devastating death of the Black Panther star into “Black Panther: Wakanda Forever.”

In addition to the powerful main-on-end title sequence, Perception also created the opening animation of the Marvel logo, which includes clips of Boseman’s character. The animation was also featured in the first “Black Panther” movie, which can be streamed on Disney+.

Long live the King. #WakandaForever pic.twitter.com/uW1KisOkTq

— Marvel Entertainment (@Marvel) November 29, 2020

Christian Haberkern, Art Director and Cinematographer at Perception, filmed the main-on-end title sequence using a Panavised Sony Venice 2 camera fitted with a Panavision Auto Panatar Super Speed anamorphic lens, which was provided by Autumn Durald Arkapaw, Director of Photography (DP) on “Black Panther: Wakanda Forever” and “Loki.”

(Panavision is a proprietary name for a type of wide-screen camera lens. Anamorphic Panavision lenses allow filmmakers to capture a wider field of view.)

“It is rare that we get to work so closely with the DP and use the actual camera and the lenses that they used. So that’s something very unique that we don’t always have the opportunity to do,” Appleton said.

Sony’s Venice 2 camera can cost roughly $55K and is among the top choices of digital camera for acclaimed DPs because of its image quality.

The camera Haberkern used, in particular, was custom-built for films that work with fire. Haberkern noted just how “insane” it felt to film with such highly advanced equipment. “The lens is specialized to make the light and fire have this unique bokeh,” he said.

“Fire is unruly and unpredictable, so we had to invent ways to try to control the flames while we filmed,” said Greg Herman, Creative Director at Perception. “One of our methods was using butane fuel to coat the fabric and direct the flame to ignite a certain way. With this, we were able to create a simulation of a flame so we could capture precise and detailed shots.”

Appleton chimed in, saying, “While shooting the sequence, we made sure to get as much footage as possible because we didn’t know exactly what pieces would go where, and when working with something as unpredictable as billowing fabric and fire we wanted to embrace the unexpected moments that we could never have planned for.”

Several pieces of software were used in the editing process. For the majority of the shots, the team used Premiere and then After Effects for final color. Meanwhile, the shot of the Black Panther suit was CG: it was created in Cinema 4D, composited in Nuke and then brought into After Effects, Appleton explained.

As for other scenes in the film, Perception had a part in shaping the entire story from start to finish, collaborating with director Ryan Coogler, executive producer Nate Moore and the film’s other filmmakers, writers and VFX team over a total of two years. Perception claims to have developed “every facet of each piece of technology you see on screen.”

In the opening scene, we see Shuri in the lab trying to recreate the heart-shaped herb to save her brother, T’Challa, who is dying. Shuri interacts with a helix structure, touching LED balls that glow red and green as she gives Griot, the sentient AI, various commands. The physical form of the helix was done by another VFX studio, Rise; however, Perception helped form that idea, plus all the other tech, like Griot, the head-up displays and other graphics.

Perception also helped design Riri’s HUD for when she’s fighting in her superhero suit. Riri (played by Dominique Thorne) is an MIT student and brilliant innovator known as “Ironheart” in the comics. The co-founders of Perception, Jeremy Lasky and Danny Gonzalez, pointed out to us that “Iron Man 2” was the first big feature they worked on with Marvel, which likely informed their approach to Riri’s Ironheart suit.

Another cool idea that Perception conceptualized was the hydro bombs that the Talokans used as weapons throughout the movie. Marvel wanted them to look like “a lake compressed into a ball,” Appleton explained to TechCrunch. “So we did a little bit of work on that.”

Perception was founded by Lasky and Gonzalez in 2001. The New Jersey-based VFX studio has worked on technology and title sequences for many Marvel titles, such as “The Avengers,” “Thor: The Dark World,” “Captain America: The Winter Soldier,” “Doctor Strange,” “Spider-Man: Homecoming,” “Black Widow,” “WandaVision,” “Loki,” “Moon Knight,” and lots more. Perception is also confirmed to be working on the upcoming films “Ant-Man and The Wasp: Quantumania” and “Guardians of the Galaxy Vol. 3.”

“Black Panther: Wakanda Forever” premiered in theaters on November 11. If you haven’t had the chance to see it yet, the movie will most likely stream on Disney+ sometime in January, though no streaming date has been officially announced. The movie grossed over $770 million at the box office worldwide. It also scored two Golden Globes nominations.

VFX studio Perception takes us inside the tech of ‘Black Panther: Wakanda Forever’ by Lauren Forristal originally published on TechCrunch

Holiday shipping is easier this year, but the tech is still lagging

Compared to last year’s holiday season, major maritime trade routes are operating relatively smoothly, while shipping rates are returning to Earth and ports are moving cargo at a steady clip.

Now that’s all good news for businesses and consumers worried about inflation and talk of recession, but those improvements are misleading.

A deeper look reveals global shipping speeds aren’t back to pre-pandemic levels, and serious challenges persist in supply chains that foreshadow even bigger problems. If we don’t act and improve shipping technologies, the logjams we’ve had to endure for the past two years will become commonplace.

New season, new problems

This year’s improvements in shipping largely reflect a pullback in consumption rather than any improvement in the underlying infrastructure.

Businesses are still taking too long to ship their goods from Asia to the United States or Europe. While it’s better than the record delays we saw during the height of the pandemic, it still takes 69 days for businesses to ship goods from China to U.S. ports, nearly double the time it took before the pandemic. This is happening against the backdrop of a slower economy, and shipping company Maersk is forecasting a 2% to 4% drop in world demand for containers this year.

A host of other problems, both new and familiar, are plaguing global logistics in 2022, such as volatile fuel prices, protracted labor negotiations and worker shortages.

Meanwhile, businesses large and small are signaling problems. Retail giants such as Target and Walmart are struggling with inventory build-ups ahead of the holiday season, as many companies were hit by a surge in imports that suddenly arrived after shipping delays eased up. This will squeeze profits and cause unrest among investors while creating pain for small and medium businesses, which are finding it hard to reserve space at warehouses already overrun with goods from big retailers.

Holiday shipping is easier this year, but the tech is still lagging by Ram Iyer originally published on TechCrunch

Even the FBI says you should use an ad blocker

This holiday season, consider giving the gift of security with an ad blocker.

That’s the takeaway message from an unlikely source — the FBI — which this week issued an alert warning that cybercriminals are using online ads in search results with the ultimate goal of stealing or extorting money from victims.

In a pre-holiday public service announcement, the FBI said that cybercriminals are buying ads to impersonate legitimate brands, like cryptocurrency exchanges. These ads are often placed at the top of search results with “minimum distinction” between the ads and the organic results, the feds say, and can look identical to the websites of the brands the cybercriminals are impersonating. Malicious ads are also used to trick victims into installing malware disguised as genuine apps, which can steal passwords and deploy file-encrypting ransomware.

One of the FBI’s recommendations for consumers is to install an ad blocker.

As the name suggests, ad blockers are web browser extensions that broadly block online ads from loading in your browser, including in search results. By blocking ads, would-be victims are not shown any ads at all, making it easier to find and access the websites of legitimate brands.

Ad blockers don’t just remove the enormous bloat from websites, like auto-playing video and splashy ads that take up half the page, which make your computer fans run like jet engines. Ad blockers are also good for privacy, because they prevent the tracking code within ads from loading. That means the ad companies, like Google and Facebook, cannot track you as you browse the web, or learn which websites you visit, or infer what things you might be interested in based on your web history.
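To make the mechanism concrete, here is a minimal Python sketch of the core idea behind filter-list-based blocking: a request is checked against a list of rules and dropped if its host matches a blocked domain or one of its subdomains. The rule syntax loosely mirrors common filter-list conventions, and the domains and helper names are purely illustrative assumptions; real extensions like uBlock Origin implement far more sophisticated matching inside the browser.

```python
from urllib.parse import urlparse

# Toy filter list: "||example.com^" means "block this domain and its subdomains",
# loosely mirroring the syntax used by popular filter lists. Illustrative only;
# "adservice.example" is a hypothetical ad/tracking domain.
FILTER_RULES = [
    "||doubleclick.net^",
    "||googlesyndication.com^",
    "||adservice.example^",
]

def blocked_domains(rules):
    """Extract the bare domains from ||domain^ style rules."""
    return [r.removeprefix("||").removesuffix("^") for r in rules]

def should_block(url, rules=FILTER_RULES):
    """Return True if the request's host matches a blocked domain or a subdomain of it."""
    host = urlparse(url).hostname or ""
    for domain in blocked_domains(rules):
        if host == domain or host.endswith("." + domain):
            return True
    return False

if __name__ == "__main__":
    print(should_block("https://ad.doubleclick.net/some/ad.js"))  # True: matches a rule
    print(should_block("https://techcrunch.com/article"))         # False: allowed through
```

Blocking at the request level like this is also why the tracking code bundled with ads never loads in the first place.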

The good news is that some of the best ad blockers out there are free, and can be installed and largely forgotten.

If you’re looking for a widely recommended ad blocker, uBlock Origin is a simple, low-memory ad blocker that works for most browsers, like Google Chrome, Mozilla Firefox, Microsoft Edge and Opera, plus the extension is open-source so anyone can look at the code and make sure it’s safe to run.

You can also get content blockers for Android and iOS, which block ads from loading on your device.

Of course, you can switch your ad blocker off any time you want, and even allow or deny ads for entire websites. Ads are still an important part of what keeps the internet largely free and accessible, including TechCrunch, even as subscriptions and paywalls are increasingly becoming the norm.

While you’re here, if an ad blocker sounds right for you, consider these other web browser extensions and features that make browsing the web safer and more private.

Even the FBI says you should use an ad blocker by Zack Whittaker originally published on TechCrunch

Okta confirms another breach after hackers steal source code

Okta has confirmed that it’s responding to another major security incident after a hacker accessed its source code following a breach of its GitHub repositories.

The identity and authentication giant said in a statement on Wednesday that it was informed by GitHub about “suspicious access” to its code repositories earlier this month. Okta has since concluded that hackers used this malicious access to copy code repositories associated with Workforce Identity Cloud (WIC), the organization’s enterprise-facing security solution.

“As soon as Okta learned of the possible suspicious access, we promptly placed temporary restrictions on access to Okta GitHub repositories and suspended all GitHub integrations with third-party applications,” Okta said in a statement.

When asked by TechCrunch, Okta declined to say how attackers managed to gain access to its private repositories.

Okta says there was no unauthorized access to the Okta service or customer data, and products related to Auth0 — which it acquired in 2021 — are not impacted. “Okta does not rely on the confidentiality of its source code for the security of its services. The Okta service remains fully operational and secure,” Okta said.

The company said that since it was alerted to the breach, it has reviewed recent access to Okta software repositories, reviewed all recent commits to Okta software repositories, and rotated GitHub credentials. Okta said it has also notified law enforcement.

Okta did not explicitly say if it has the technical means, such as logs, to detect what, if any, of its own systems were accessed or what other data may have been exfiltrated.

The company’s latest incident was first reported by Bleeping Computer earlier this week, prior to Okta’s announcement.

Earlier this year, Okta was targeted by the now-notorious Lapsus$ extortion group, which gained access to the account of a customer support engineer at Sykes, one of Okta’s third-party service providers, and posted screenshots of Okta’s apps and systems. Okta experienced a second compromise in August this year after it was targeted by another hacking campaign that breached more than 100 organizations, including Twilio and DoorDash.

Okta confirms another breach after hackers steal source code by Carly Page originally published on TechCrunch

The future of milk is … milk?

Milk is polarizing: To some, it’s a refreshing beverage that pairs well with cookies. For others, it’s a cursed liquid that causes tummy troubles. Even with the shelves of alt milks crowding grocery store coolers these days, the U.S. milk industry is a $15 billion category with 90% penetration, according to John Talbot, the CEO of the California Milk Advisory Board.

However, the industry is not known for sustainability or breakthrough creations. Milk processing plants focus on one thing, doing it well and efficiently, but don’t do small runs or get involved in product development or innovation. It’s taken a while for the industry to realize that there’s a need for ingenuity, but it’s now embracing fresh ideas, Talbot said.

He believes some of that has to do with the fact that the milk category is declining for a number of reasons: Fewer children, the biggest milk drinkers, are being born, and more people are choosing faster breakfasts over sitting down for a bowl of cereal. That’s not to mention the aforementioned number of alt milks out there. But it’s not just dairy alternatives — water plays a big role in people’s drink choices, too, Talbot added.

Seeing a need for innovation, the California Milk Advisory Board turned to those who do it best: startups.

The organization has been hosting the Real California Milk Excelerator for the last four years to find interesting use cases for dairy and to connect entrepreneurs with processors so that there is engagement on both sides.

“We now rely on ideas coming in from these entrepreneurs to help processing plants through the innovation process,” Talbot added.

The future of milk is … milk? by Christine Hall originally published on TechCrunch

YouTube secures NFL Sunday Ticket in landmark streaming deal

YouTube and the National Football League announced on Thursday that the two have reached a deal for NFL Sunday Ticket. Starting next season, NFL Sunday Ticket will be available on two of YouTube’s subscription businesses: as an add-on package on YouTube TV and as a standalone, a la carte offering on YouTube Primetime Channels. NFL Sunday Ticket has been available in the U.S. exclusively via DirecTV since 1994, but that will soon change in a major shakeup of sports streaming.

The Wall Street Journal reports that YouTube is paying $2 billion per season in a multi-year agreement for the NFL Sunday Ticket package, according to people familiar with the matter.

“YouTube has long been a home for football fans, whether they’re streaming live games, keeping up with their home team, or watching the best plays in highlights,” said YouTube CEO Susan Wojcicki in a press release. “Through this expanded partnership with the NFL, viewers will now also be able to experience the game they love in compelling and innovative ways through YouTube TV or YouTube Primetime Channels. We’re excited to continue our work with the NFL to make YouTube a great place for sports lovers everywhere.”

The deal will give YouTube viewers the ability to stream nearly all of the NFL games played on Sundays beginning in the 2023 season, with the exception of those aired on traditional television in their local markets. During the week, games will continue to be available on other networks, including ESPN, ABC and Amazon Prime Video.

Image Credits: YouTube

The news comes after Apple, which was considered the front-runner for the Sunday Ticket, reportedly backed out. Now, the package has gone to one of its main rivals.

“We’re excited to bring NFL Sunday Ticket to YouTube TV and YouTube Primetime Channels and usher in a new era of how fans across the United States watch and follow the NFL,” said NFL Commissioner Roger Goodell in a statement. “For a number of years we have been focused on increased digital distribution of our games and this partnership is yet another example of us looking towards the future and building the next generation of NFL fans.”

YouTube’s deal marks a major shift in the media industry, which has been leaning more and more toward streaming over the past few years. Although consumers switched from cable to streaming services like Netflix and Hulu, sports fans still watched games via traditional television. But that has slowly started to change, as Apple has gained the rights to Major League Baseball and Major League Soccer games, and Amazon made a deal last year for the rights to Thursday night NFL games. YouTube’s new deal cements this shift even more.

YouTube secures NFL Sunday Ticket in landmark streaming deal by Aisha Malik originally published on TechCrunch

Automotus raises $9M to scale automated curb management tech

The new mobility landscape has made curb space in cities a hot commodity. No longer are curbs just for buses, taxis, deliveries and parking. Now those traditional use cases have to contend with bike lanes, ride-hail, same-day deliveries, dockless vehicles and more. As a result, cities and investors are starting to prioritize software that helps manage curb space.

Enter Automotus, a four-year-old startup that has just closed a $9 million seed round to advance its automated curb management solution. The company says its tech can reduce congestion and emissions by up to 10%; reduce double-parking hazards by 64%; increase parking turnover by 26%; and increase parking revenue for cities by over 500%.

Automotus works with cities like Santa Monica, Los Angeles, Pittsburgh, Omaha and Bethlehem to automate payments for vehicle unloading and parking, enforce curb violations and manage preferred loading zones and discounted rates for commercial EVs, the startup said.

“We also integrate with other mobility services providers to help cities get a more comprehensive view of how the public right of way is being used and by which modes for planning, policy, and pricing efforts,” Jordan Justus, CEO of Automotus, told TechCrunch.

In March 2021, Automotus raised $1.2 million in seed funding, so the company has managed to tack on an additional $7.8 million in the intervening year and a half. The most recent funds came from City Rock Ventures, Quake Capital, Bridge Investments, Unbridled Ventures, Keiki Capital, NY Angels, Irish Angels, SUM Ventures and the Los Angeles Cleantech Incubator’s Impact Fund.

“The bulk of the funding will be used to execute and support deployments in at least 15 new cities coming online in 2023,” said Justus. “We have a big year of launches ahead of us and are laser-focused on delivering the best possible solutions for our clients and continuing to scale up previous pilots.”

While Automotus is largely offering a Software-as-a-Service product, installing the right hardware is an important element in collecting data. In its partner cities, the startup deploys cellular-enabled cameras equipped with Automotus’s proprietary computer vision technology. The cameras are mounted onto traffic and street lights in areas where you might see plenty of loading and unloading or in zero emissions delivery zones.

With Automotus’s tech, there’s no need to download mobile apps or use meters. The cameras capture images of license plates and automatically collect data, issue invoices for parking or send out citations if a vehicle is non-compliant with the city’s regulations. The technology blurs any faces and de-identifies data to ensure the privacy of street users.

Automotus raises $9M to scale automated curb management tech by Rebecca Bellan originally published on TechCrunch

Avoid 3 common sales mistakes startups make during a downturn

More than 150,000 workers have lost their jobs this year as layoffs have swept across the tech landscape since June. Constant news cycles have analyzed every aspect of these staff reductions for meaning and lessons. How did we get here? How are companies managing employees? Are there more layoffs on the way?

And, critically, what’s next for tech? Investors are now demanding profitability over growth. This extreme change in the business model investors want has left companies with difficult decisions ahead and no playbook. Without the liberty a low-cost capital environment affords, new ventures that promise uncertain returns are, for investors, a thing of the past, or at least a much smaller focus.

What every company needs now is efficient sales.

But there is a big difference between knowing that you need efficient revenue and knowing how to get it. Leaner teams, fewer resources, and a tough macro environment mean that CROs are forced to make big changes to budgets, staffing and how they market and sell.

But maintaining revenue while the CFO is cutting costs by 5%-20% is not an easy task for anyone — and doing more of the same won’t get you there.

The unfortunate truth is that unless you move beyond the same old buying group, you won’t move the needle.

The biggest mistakes to avoid

Preliminary data from Databook shows that an unusually high percentage of companies globally are in the midst of shifting their strategic priorities. Since these are typically multiyear commitments, this unprecedented shift dramatically changes the sales landscape for tech startups.

Holding tight to traditional sales incentives and levers won’t yield the step change that is needed to win.

Don’t raise pricing

Most startups are reliant on VC funding, and in today’s market, VCs are looking for a clear path to profitability. One seemingly “easy” way to improve margins is to increase pricing.

This is a fix you can only try once; you don’t want to keep raising prices in a competitive market. This is a temporary workaround at best, and it can easily backfire, as higher prices during a downturn can erode customer trust over the long run. It can also result in fewer renewals when there is less budget available.

Avoid 3 common sales mistakes startups make during a downturn by Ram Iyer originally published on TechCrunch

A brief history of diffusion, the tech at the heart of modern image-generating AI

Text-to-image AI exploded this year as technical advances greatly enhanced the fidelity of art that AI systems could create. Controversial as systems like Stable Diffusion and OpenAI’s DALL-E 2 are, platforms including DeviantArt and Canva have adopted them to power creative tools, personalize branding and even ideate new products.

But the tech at the heart of these systems is capable of far more than generating art. Called diffusion, it’s being used by some intrepid research groups to produce music, synthesize DNA sequences and even discover new drugs.

So what is diffusion, exactly, and why is it such a massive leap over the previous state of the art? As the year winds down, it’s worth taking a look at diffusion’s origins and how it advanced over time to become the influential force that it is today. Diffusion’s story isn’t over — refinements on the techniques arrive with each passing month — but the last year or two especially brought remarkable progress.

The birth of diffusion

You might recall the trend of deepfaking apps several years ago — apps that inserted people’s portraits into existing images and videos to create realistic-looking substitutions of the original subjects in that target content. Using AI, the apps would “insert” a person’s face — or in some cases, their whole body — into a scene, often convincingly enough to fool someone on first glance.

Most of these apps relied on an AI technology called generative adversarial networks, or GANs for short. GANs consist of two parts: a generator that produces synthetic examples (e.g. images) from random data and a discriminator that attempts to distinguish between the synthetic examples and real examples from a training dataset. (Typical GAN training datasets consist of hundreds to millions of examples of things the GAN is expected to eventually capture.) Both the generator and discriminator improve in their respective abilities until the discriminator is unable to tell the real examples from the synthesized examples with better than the 50% accuracy expected of chance.
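For readers who want to see that two-player setup in code, here is a minimal, self-contained PyTorch sketch of a GAN training loop on toy 2D data. It is purely illustrative — the data, network sizes and hyperparameters are arbitrary stand-ins rather than anything from the systems mentioned in this piece — but it shows the generator/discriminator tug-of-war described above.

```python
import torch
import torch.nn as nn

# Minimal GAN sketch: a generator learns to mimic samples from a simple
# "real" distribution (2D points clustered around (2, 2)), while a
# discriminator learns to tell real points from generated ones.
torch.manual_seed(0)
latent_dim, data_dim = 8, 2

generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

def real_batch(n=64):
    # Stand-in "training dataset": noisy points near (2, 2)
    return torch.randn(n, data_dim) * 0.5 + 2.0

for step in range(2000):
    # --- Train the discriminator: label real samples 1, fakes 0 ---
    real = real_batch()
    fake = generator(torch.randn(real.size(0), latent_dim)).detach()
    d_loss = bce(discriminator(real), torch.ones(real.size(0), 1)) + \
             bce(discriminator(fake), torch.zeros(fake.size(0), 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # --- Train the generator: try to make the discriminator predict 1 ---
    fake = generator(torch.randn(64, latent_dim))
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

print(generator(torch.randn(5, latent_dim)))  # samples should drift toward (2, 2)
```

The instability the next paragraph describes shows up exactly in loops like this one: if the generator finds a single output the discriminator accepts, it can keep producing near-copies of it (mode collapse).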

Sand sculptures of Harry Potter and Hogwarts, generated by Stable Diffusion. Image Credits: Stability AI

Top-performing GANs can create, for example, snapshots of fictional apartment buildings. StyleGAN, a system Nvidia developed a few years back, can generate high-resolution head shots of fictional people by learning attributes like facial pose, freckles and hair. Beyond image generation, GANs have been applied to the 3D modeling space and vector sketches, showing an aptitude for outputting video clips as well as speech and even looping instrument samples in songs.

In practice, though, GANs suffered from a number of shortcomings owing to their architecture. The simultaneous training of generator and discriminator models was inherently unstable; sometimes the generator “collapsed” and outputted lots of similar-seeming samples. GANs also needed lots of data and compute power to run and train, which made them tough to scale.

Enter diffusion.

How diffusion works

Diffusion was inspired by physics — being the process in physics where something moves from a region of higher concentration to one of lower concentration, like a sugar cube dissolving in coffee. Sugar granules in coffee are initially concentrated at the top of the liquid, but gradually become distributed.

Diffusion systems borrow from diffusion in non-equilibrium thermodynamics specifically, where the process increases the entropy — or randomness — of the system over time. Consider a gas — it’ll eventually spread out to fill an entire space evenly through random motion. Similarly, data like images can be transformed into a uniform distribution by randomly adding noise.

Diffusion systems slowly destroy the structure of data by adding noise until there’s nothing left but noise.

In physics, diffusion is spontaneous and irreversible — sugar diffused in coffee can’t be restored to cube form. But diffusion systems in machine learning aim to learn a sort of “reverse diffusion” process to restore the destroyed data, gaining the ability to recover the data from noise.
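As a rough illustration of that forward “noising” process, here is a short sketch in the style of DDPM-class diffusion models: a closed-form formula adds progressively more Gaussian noise to a toy image, and the (omitted) denoising network would be trained to predict the noise that was added. The schedule values are common defaults chosen for illustration, not details from any specific system discussed here.

```python
import torch

# DDPM-style forward ("noising") process sketch: data is gradually corrupted
# with Gaussian noise over T steps until it is essentially pure noise.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # noise schedule (common default values)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal-retention factor

def noisy_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) in closed form: a mix of the original data and noise."""
    eps = torch.randn_like(x0)
    x_t = alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * eps
    return x_t, eps

x0 = torch.rand(3, 32, 32)          # a toy "image"
x_mid, _ = noisy_sample(x0, 500)    # partially destroyed
x_end, _ = noisy_sample(x0, T - 1)  # almost indistinguishable from pure noise

# Training objective (sketch): a network eps_theta(x_t, t) would be optimized so that
# ||eps - eps_theta(x_t, t)||^2 is small, i.e. it learns to undo the corruption.
```

At generation time, the learned model runs this process in reverse, starting from pure noise and removing a little of it at each step until an image emerges.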

Image Credits: OpenBioML

Diffusion systems have been around for nearly a decade. But a relatively recent innovation from OpenAI called CLIP (short for “Contrastive Language-Image Pre-Training”) made them much more practical in everyday applications. CLIP classifies data — for example, images — to “score” each step of the diffusion process based on how likely it is to be classified under a given text prompt (e.g. “a sketch of a dog in a flowery lawn”).

At the start, the data has a very low CLIP-given score, because it’s mostly noise. But as the diffusion system reconstructs data from the noise, it slowly comes closer to matching the prompt. A useful analogy is uncarved marble — like a master sculptor telling a novice where to carve, CLIP guides the diffusion system toward an image that gives a higher score.
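To show what that guidance loop might look like, below is a heavily simplified Python sketch: at each denoising step, the current image estimate is scored against a text embedding and nudged up the gradient of that score. The “encoders” here are tiny random stand-ins rather than a real CLIP model, and the denoiser is a placeholder, so treat this as a conceptual sketch of guidance, not how DALL-E 2 or Stable Diffusion is actually implemented.

```python
import torch
import torch.nn.functional as F

# Guidance sketch: score the current image estimate against a text prompt and
# nudge it toward a higher score. The two "encoders" below are tiny stand-ins
# for a real CLIP image/text encoder pair.
torch.manual_seed(0)
embed_dim = 64
image_encoder = torch.nn.Linear(3 * 32 * 32, embed_dim)  # stand-in for an image tower
text_embedding = torch.randn(embed_dim)                   # stand-in for the encoded prompt

def clip_like_score(image):
    """Cosine similarity between the (stand-in) image and text embeddings."""
    img_emb = image_encoder(image.flatten())
    return F.cosine_similarity(img_emb, text_embedding, dim=0)

def guided_step(x, denoise_fn, guidance_scale=5.0):
    """One denoising step, nudged up the gradient of the image-text score."""
    x = x.detach().requires_grad_(True)
    score = clip_like_score(x)
    grad = torch.autograd.grad(score, x)[0]          # direction of a higher score
    return denoise_fn(x).detach() + guidance_scale * grad

denoise_fn = lambda x: 0.98 * x                       # placeholder "denoiser"
x = torch.randn(3, 32, 32)                            # start from pure noise
for _ in range(50):
    x = guided_step(x, denoise_fn)                    # each step drifts toward the prompt
```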

OpenAI introduced CLIP alongside the image-generating system DALL-E. Since then, it’s made its way into DALL-E’s successor, DALL-E 2, as well as open source alternatives like Stable Diffusion.

What can diffusion do?

So what can CLIP-guided diffusion models do? Well, as alluded to earlier, they’re quite good at generating art — from photorealistic art to sketches, drawings and paintings in the style of practically any artist. In fact, there’s evidence suggesting that they problematically regurgitate some of their training data.

But the models’ talent — controversial as it might be — doesn’t end there.

Researchers have also experimented with using guided diffusion models to compose new music. Harmonai, an organization with financial backing from Stability AI, the London-based startup behind Stable Diffusion, released a diffusion-based model that can output clips of music by training on hundreds of hours of existing songs. More recently, developers Seth Forsgren and Hayk Martiros created a hobby project dubbed Riffusion that uses a diffusion model cleverly trained on spectrograms — visual representations — of audio to generate ditties.
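The trick that makes an image model usable for audio is the spectrogram round trip, sketched below with librosa: a waveform is turned into a mel spectrogram (a 2D, image-like array a diffusion model can be trained on), and a generated spectrogram can be approximately inverted back into sound. This is a simplified illustration of the general idea; Riffusion’s actual pipeline differs.

```python
import librosa

# Waveform -> mel spectrogram (image-like 2D array) -> waveform.
sr = 22050
y = librosa.tone(440, sr=sr, duration=2.0)                    # toy input: a 440 Hz tone

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)  # 2D "image" of the audio
# ... an image diffusion model would be trained on (and sample) arrays like `mel` ...

y_reconstructed = librosa.feature.inverse.mel_to_audio(mel, sr=sr)  # approximate inversion
```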

Beyond the music realm, several labs are attempting to apply diffusion tech to biomedicine in the hopes of uncovering novel disease treatments. Startup Generate Biomedicines and a University of Washington team trained diffusion-based models to produce designs for proteins with specific properties and functions, as MIT Tech Review reported earlier this month.

The models work in different ways. Generate Biomedicines’ model adds noise by unraveling the amino acid chains that make up a protein and then puts random chains together to form a new protein, guided by constraints specified by the researchers. The University of Washington model, on the other hand, starts with a scrambled structure and uses information, provided by a separate AI system trained to predict protein structure, about how the pieces of a protein should fit together.

Image Credits: PASIEKA/SCIENCE PHOTO LIBRARY/Getty Images

They’ve already achieved some success. The model designed by the University of Washington group was able to find a protein that can attach to the parathyroid hormone — the hormone that controls calcium levels in the blood — better than existing drugs.

Meanwhile, over at OpenBioML, a Stability AI-backed effort to bring machine learning-based approaches to biochemistry, researchers have developed a system called DNA-Diffusion to generate cell-type-specific regulatory DNA sequences — segments of nucleic acid molecules that influence the expression of specific genes within an organism. DNA-Diffusion will — if all goes according to plan — generate regulatory DNA sequences from text instructions like “A sequence that will activate a gene to its maximum expression level in cell type X” and “A sequence that activates a gene in liver and heart, but not in brain.”

What might the future hold for diffusion models? The sky may well be the limit. Already, researchers have applied it to generating videos, compressing images and synthesizing speech. That’s not to suggest diffusion won’t eventually be replaced with a more efficient, more performant machine learning technique, as GANs were with diffusion. But it’s the architecture du jour for a reason; diffusion is nothing if not versatile.

A brief history of diffusion, the tech at the heart of modern image-generating AI by Kyle Wiggers originally published on TechCrunch
