Text-to-image AI exploded this year as technical advances dramatically improved the fidelity of the artwork AI systems can produce. Controversial as systems like Stable Diffusion and OpenAI's DALL-E 2 are, platforms including DeviantArt and Canva have adopted them to power creative tools, personalize branding and even ideate new products.
But the core technology of these systems is capable of doing more than creating works of art. Called diffusion, it is being used by some intrepid research groups to make music, synthesize DNA sequences, and even discover new drugs.
So what exactly is diffusion, and why is it such a leap over the previous state of the art? As the year winds down, it's worth looking at diffusion's origins and how it evolved over time to become the influential force it is today. Diffusion's story isn't over yet (refinements arrive with each passing month), but the last year or two in particular brought remarkable progress.
The Birth of Diffusion
You might recall the wave of deepfaking apps from a few years ago: apps that inserted people's portraits into existing images and videos to create realistic-looking substitutions of the original subjects. Using AI, the apps would "insert" a person's face, or in some cases their whole body, into a scene, often convincingly enough to fool someone at first glance.
Most of these apps relied on an AI technology called generative adversarial networks, or GANs. A GAN consists of two parts: a generator that produces synthetic examples (e.g. images) from random data, and a discriminator that tries to distinguish the synthetic examples from real examples in a training dataset. (A typical GAN training dataset contains anywhere from hundreds to millions of examples of the thing the GAN is expected to eventually capture.) The generator and discriminator improve at their respective tasks until the discriminator can no longer tell the real examples from the synthesized ones with better than the 50% accuracy expected of chance.
Top-performing GANs can create, for example, snapshots of fictional apartment buildings. StyleGAN, a system Nvidia developed a few years ago, can generate high-resolution headshots of fictional people by learning attributes like facial pose, freckles and hair. Beyond image generation, GANs have been applied to 3D modeling and vector sketches, and they've shown an aptitude for outputting video clips as well as speech and even looping instrumental samples in songs.
In practice, however, GANs suffer from several shortcomings owing to their architecture. Training the generator and discriminator simultaneously is inherently unstable; sometimes the generator "collapses" and outputs lots of similar-seeming samples. GANs also need lots of data and compute to work well, which makes them hard to scale.
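The adversarial dynamic described above can be sketched in miniature. The toy below is a pure-Python illustration, not any production GAN: the one-dimensional "data" distribution, learning rate, step counts and the simplified train-the-discriminator-then-the-generator schedule are all assumptions chosen for the demo (real GANs alternate both updates throughout training).

```python
import math
import random

random.seed(0)

def sigmoid(x):
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))

def sample_real():
    # "Real" data: a Gaussian centered at 4.
    return random.gauss(4.0, 0.5)

# Generator maps noise z ~ N(0, 1) to a*z + b; it starts far from the data.
a, b = 1.0, 0.0
# Discriminator is a logistic classifier: D(x) = sigmoid(w*x + c).
w, c = 0.0, 0.0
lr = 0.05

# Phase 1: train the discriminator to tell real samples (label 1)
# from generated ones (label 0), ascending log D(real) + log(1 - D(fake)).
for _ in range(500):
    x_r, x_f = sample_real(), a * random.gauss(0, 1) + b
    d_r, d_f = sigmoid(w * x_r + c), sigmoid(w * x_f + c)
    w += lr * ((1 - d_r) * x_r - d_f * x_f)
    c += lr * ((1 - d_r) - d_f)

d_real_avg = sum(sigmoid(w * sample_real() + c) for _ in range(200)) / 200
d_fake_avg = sum(sigmoid(w * (a * random.gauss(0, 1) + b) + c)
                 for _ in range(200)) / 200

# Phase 2: train the generator to fool the (now frozen) discriminator,
# ascending log D(fake); the gradient pulls b toward the real data.
for _ in range(300):
    z = random.gauss(0, 1)
    d_f = sigmoid(w * (a * z + b) + c)
    g = (1 - d_f) * w
    a += lr * g * z
    b += lr * g

print(round(d_real_avg, 2), round(d_fake_avg, 2), round(b, 2))
```

Even this toy hints at why joint training is delicate: each player's gradient depends on the other's current parameters, so simultaneous updates can end up chasing each other instead of settling.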
How Diffusion Works
Diffusion takes its inspiration from physics, where diffusion is the process by which something moves from a region of higher concentration to one of lower concentration, like a sugar cube dissolving in coffee. Sugar granules in coffee are initially concentrated at the top of the liquid, but gradually become distributed throughout.
Diffusion systems borrow from diffusion in non-equilibrium thermodynamics specifically, where the process increases the entropy, or randomness, of the system over time. Consider a gas: it will eventually spread out to fill an entire space evenly through random motion. Similarly, data like images can be transformed into a uniform distribution by successively adding random noise.
A diffusion system slowly destroys the structure of data by adding noise until nothing but noise is left.
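In DDPM-style systems, this forward, structure-destroying process has a simple closed form: at step t the sample is a mix of the original data and Gaussian noise, with the mix ratio set by a noise schedule. The sketch below is pure Python; the linear schedule values and the toy 64-pixel "image" are assumptions for illustration. It shows the signal fraction shrinking to nearly zero by the final step.

```python
import math
import random

random.seed(0)

T = 1000
# Linear beta (noise) schedule: small noise early, more noise later.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar_t = product of (1 - beta) up to t: the fraction of signal left.
alpha_bars = []
prod = 1.0
for beta in betas:
    prod *= 1.0 - beta
    alpha_bars.append(prod)

def q_sample(x0, t):
    """Closed-form forward step: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps."""
    a_bar = alpha_bars[t]
    return [math.sqrt(a_bar) * x + math.sqrt(1 - a_bar) * random.gauss(0, 1)
            for x in x0]

x0 = [1.0] * 64             # a toy "image": 64 identical pixels
early = q_sample(x0, 10)    # barely noised, still close to the data
late = q_sample(x0, T - 1)  # essentially pure noise

print(alpha_bars[-1])       # near zero: almost no signal remains
```

By the last step `alpha_bars[-1]` is vanishingly small, so `late` is statistically indistinguishable from pure Gaussian noise, which is exactly the state a reverse process has to start from.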
In physics, diffusion is spontaneous and irreversible; sugar that has diffused into coffee cannot be restored to cube form. But diffusion systems in machine learning aim to learn a sort of "reverse diffusion" process to restore the destroyed data, gaining the ability to recover the data from noise.
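The algebra of "undoing" the noise is easy to see in the same parameterization: if a network can predict the noise that was added, the original data falls out by rearranging the forward equation. In the sketch below (pure Python, with an assumed signal fraction `a_bar`), we cheat and reuse the true noise in place of a trained predictor, purely to show that the reconstruction step itself is exact.

```python
import math
import random

random.seed(1)

a_bar = 0.3  # assumed cumulative signal fraction at some step t

x0 = [random.uniform(-1, 1) for _ in range(16)]  # toy data
eps = [random.gauss(0, 1) for _ in range(16)]    # the noise that was added

# Forward: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps
xt = [math.sqrt(a_bar) * x + math.sqrt(1 - a_bar) * e
      for x, e in zip(x0, eps)]

# Reverse: a trained network would *predict* eps from x_t; here we reuse
# the true eps to show that subtracting the noise recovers the data.
x0_hat = [(x - math.sqrt(1 - a_bar) * e) / math.sqrt(a_bar)
          for x, e in zip(xt, eps)]

err = max(abs(u - v) for u, v in zip(x0, x0_hat))
print(err)  # floating-point-level error only
```

The entire difficulty of diffusion models lives in the part this sketch skips: training a network whose noise predictions are good enough to run this step thousands of times in a row.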
Diffusion systems have been around for nearly a decade. But a relatively recent innovation from OpenAI called CLIP (short for "Contrastive Language-Image Pre-training") made them far more practical for everyday applications. CLIP classifies data, such as images, to "score" each step of the diffusion process based on how likely the data is to be classified under a given text prompt (e.g. "a sketch of a dog on a flowery lawn").
At the start, the data has a very low CLIP score, because it's mostly noise. But as the diffusion system reconstructs data from the noise, it slowly comes closer to matching the prompt. A useful analogy is uncarved marble: like a master sculptor telling a novice where to carve, CLIP guides the diffusion system toward an image that yields a higher score.
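Guidance can be illustrated with a stand-in scorer. In the toy below, a hypothetical `score` function plays the role of CLIP, rating how well a two-number "image" matches a fixed "prompt" target; each sampling step nudges the image along the scorer's gradient while the injected noise shrinks. Real CLIP guidance differs in every particular (it backpropagates through learned image and text embeddings), so treat this as a cartoon of the idea, not the method.

```python
import random

random.seed(0)

# Hypothetical stand-in for CLIP: higher score = closer match to the "prompt".
target = (3.0, -2.0)

def score(img):
    return -sum((p - t) ** 2 for p, t in zip(img, target))

def grad(img, h=1e-4):
    """Finite-difference gradient of the score (a real system uses autodiff)."""
    g = []
    for i in range(len(img)):
        up, dn = list(img), list(img)
        up[i] += h
        dn[i] -= h
        g.append((score(up) - score(dn)) / (2 * h))
    return g

img = [random.gauss(0, 1) for _ in range(2)]  # start from pure noise
for step in range(200):
    noise_scale = 1.0 - step / 200  # noise shrinks as sampling proceeds
    g = grad(img)
    img = [p + 0.05 * gi + 0.05 * noise_scale * random.gauss(0, 1)
           for p, gi in zip(img, g)]

print(score(img))  # near zero: the sample now matches the "prompt"
```

The shrinking noise term mirrors the sampler's schedule: early steps explore, late steps commit, and the scorer's gradient steers the whole trajectory toward the prompt.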
OpenAI introduced CLIP alongside its image-generating system DALL-E. CLIP has since made its way into DALL-E's successor, DALL-E 2, as well as open source alternatives like Stable Diffusion.
What can diffusion do?
So what can CLIP-guided diffusion models do? Well, as mentioned earlier, they're quite good at generating art, from photorealistic images to sketches, drawings and paintings in the style of practically any artist. In fact, there's evidence to suggest that they problematically regurgitate some of their training data.
But the models' talents, debatable as they may be, don't end there.
Researchers have also experimented with using guided diffusion models to compose new music. Harmonai, an organization with financial backing from Stability AI, the London-based startup behind Stable Diffusion, released a diffusion-based model that can output clips of music by training on hundreds of hours of existing songs. More recently, developers Seth Forsgren and Hayk Martiros created a hobby project called Riffusion that uses a diffusion model cleverly trained on spectrograms (visual representations of audio) to generate short musical clips.
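Riffusion's trick works because a spectrogram turns audio into an image-shaped grid that a diffusion model can treat like any other picture. The sketch below computes a magnitude spectrogram in pure Python; the frame size, hop and Hann window are conventional but assumed choices. A pure tone shows up as a ridge at its frequency bin in every frame.

```python
import math

def spectrogram(signal, frame=64, hop=32):
    """Magnitude spectrogram: |DFT| of overlapping, windowed frames."""
    frames = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = signal[start:start + frame]
        # Hann window reduces spectral leakage at the frame edges.
        windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * n / (frame - 1)))
                    for n, s in enumerate(chunk)]
        mags = []
        for k in range(frame // 2 + 1):  # keep non-negative frequencies only
            re = sum(w * math.cos(-2 * math.pi * k * n / frame)
                     for n, w in enumerate(windowed))
            im = sum(w * math.sin(-2 * math.pi * k * n / frame)
                     for n, w in enumerate(windowed))
            mags.append(math.hypot(re, im))
        frames.append(mags)
    return frames  # time x frequency grid: the "image" a model trains on

# A sine wave that completes exactly 8 cycles per 64-sample frame.
sig = [math.sin(2 * math.pi * 8 * n / 64) for n in range(440)]
spec = spectrogram(sig)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)  # frames, bins, peak at bin 8
```

Going the other direction, from a generated spectrogram back to audio, is lossy (the phase information is discarded here), which is part of why spectrogram-based audio generation is a clever hack rather than a free lunch.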
Beyond music, several labs are attempting to apply diffusion tech to biomedicine in hopes of discovering novel treatments for diseases. The startup Generate Biomedicines and a University of Washington team both trained diffusion-based models to produce designs for proteins with specific properties and functions, as MIT Tech Review reported earlier this month.
The models work in different ways. Generate Biomedicines' model adds noise by unraveling the amino acid chains that make up a protein, then puts random chains together to form a new protein, guided by constraints specified by the researchers. The University of Washington model, by contrast, starts with a scrambled structure and uses information about how the pieces of a protein should fit together, provided by a separate AI system trained to predict protein structure.
They've already achieved some success. The model designed by the University of Washington group was able to find a protein that can attach to parathyroid hormone (the hormone that controls calcium levels in the blood) better than existing drugs.
Meanwhile, over at OpenBioML, a Stability AI-backed effort to bring machine learning-based approaches to biochemistry, researchers have developed a system called DNA-Diffusion to generate cell-type-specific regulatory DNA sequences: segments of nucleic acid molecules that influence the expression of specific genes within an organism. DNA-Diffusion will, if all goes according to plan, generate regulatory DNA sequences from text instructions like "a sequence that will activate a gene to its maximum expression level in cell type X" and "a sequence that activates a gene in liver and heart, but not in brain."
What might the future hold for diffusion models? The sky may well be the limit. Already, researchers have applied diffusion to generating videos, compressing images and synthesizing speech. That's not to say diffusion won't eventually be replaced by a more efficient, more performant machine learning technique, the same way GANs were replaced by diffusion. But it's the architecture du jour for a reason: diffusion is nothing if not versatile.