Stanford University Open-Sources Controllable Generative Language AI Diffusion-LM

Researchers at Stanford University have open-sourced Diffusion-LM, a non-autoregressive generative language model that allows fine-grained control over the model's output text. When evaluated on controllable text generation tasks, Diffusion-LM outperforms existing methods.

The model and experiments were described in a paper published on arXiv. Diffusion-LM uses a plug-and-play control scheme, which keeps the language model fixed and steers its generation with an external classifier that determines how well the generated text matches the desired parameters. Users can specify various characteristics of the desired output, including required parts of speech, syntax structure, or sentence length. During generation, Diffusion-LM iteratively denoises a sequence of latent vectors, with the external classifier providing gradient updates that direct the latent vectors toward the desired output. When evaluated on a range of control tasks, Diffusion-LM outperformed the baseline methods "significantly." According to the research team:

We find the complex controls enabled by Diffusion-LM compelling, and we are excited about how Diffusion-LM is a substantial departure from the current discrete autoregressive generation paradigm.

Many generative language models (LMs), such as GPT-3, are autoregressive; that is, they generate text recursively by predicting the next word in a sequence, appending that word to the existing sequence, and using the updated sequence as input for the next prediction. These models can generate text indistinguishable from text written by humans, and they can be applied to a wide variety of problems, from answering questions to interactive chat. However, it is difficult to give the user control over the generated output; for example, a desired sentence length, structure, or sentiment.
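The autoregressive loop described above can be sketched with a toy next-word predictor standing in for a learned model (the lookup table here is purely illustrative, not how GPT-3 works internally):

```python
# Minimal sketch of autoregressive generation: a stand-in "model"
# predicts the next word from the current sequence, and each
# prediction is appended and fed back in until a stop token appears.

def toy_next_word(sequence):
    # Stand-in for a learned model: a fixed lookup on the last word.
    table = {"the": "cat", "cat": "sat", "sat": "down"}
    return table.get(sequence[-1], "<eos>")

def generate(prompt, max_words=10):
    sequence = list(prompt)
    for _ in range(max_words):
        word = toy_next_word(sequence)
        if word == "<eos>":
            break
        sequence.append(word)  # the prediction becomes part of the input
    return sequence

print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```

Note that the loop offers no handle for constraining the output as a whole, which is why steering such models is hard.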

One possible solution to this problem is to fine-tune the LM to accept an additional control input, but this retraining can be computationally intensive and does not generalize to handle multiple control parameters. Another solution is a plug-and-play technique, which keeps the parameters of the LM frozen and controls generation with an external classifier that evaluates how close the generated output is to the desired parameters. However, attempts to steer autoregressive models this way have proved challenging.

Rather than try to steer an autoregressive LM, the Stanford researchers chose to use a new language generation technique: a diffusion model. These models have shown good results in computer vision and other continuous domains; however, they had not previously been applied to text generation, which is a discrete domain. According to the team, Diffusion-LM is the first diffusion model for text generation.

To make Diffusion-LM work, the team modified the standard diffusion model in two ways. First, they defined an embedding function that maps words into vectors in the continuous latent space of the diffusion model. Second, they defined a “rounding method” to reduce these vectors to discrete words. To generate text, the model starts with a random vector in the latent space; this is treated as a noisy version of the output sentence embedding. The model then denoises it iteratively; at each step, the embedding is passed to an external classifier, which produces a gradient update of the embedding for the next step of the iteration. When the iterations are complete, the rounding method assigns the final embedding to a text output.
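The control loop described above can be illustrated with a toy sketch (this is not the authors' implementation; the two-word vocabulary, 2-D embeddings, and update coefficients are invented for illustration):

```python
import math
import random

# Illustrative sketch of classifier-guided diffusion over a continuous
# latent: start from noise, alternately "denoise" and nudge the latent
# toward the embedding the external classifier prefers, then "round"
# the final vector to the nearest word embedding.

random.seed(0)

# Hypothetical vocabulary with fixed 2-D word embeddings.
vocab = {"yes": (1.0, 0.0), "no": (-1.0, 0.0)}
target = vocab["yes"]  # the output the external classifier prefers

# Start from a random vector in the latent space.
x = [random.gauss(0, 1), random.gauss(0, 1)]
for _ in range(50):
    # Denoising step (toy: shrink the latent toward the origin) ...
    x = [0.9 * v for v in x]
    # ... plus a classifier-style gradient update toward the target.
    x = [v + 0.1 * (t - v) for v, t in zip(x, target)]

def nearest_word(vec):
    # "Rounding method": map a latent vector to the closest word embedding.
    return min(vocab, key=lambda w: math.dist(vocab[w], vec))

print(nearest_word(x))  # "yes"
```

The real model denoises a whole sequence of word embeddings with a learned network and uses actual classifier gradients, but the shape of the loop, denoise then steer then round, is the same.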

Diffusion-LM Architecture

The Stanford team evaluated Diffusion-LM on five classifier-guided text generation control tasks and compared its performance to baseline methods built on a GPT-2 autoregressive LM, using both plug-and-play control and fine-tuning. On all five tasks, Diffusion-LM outperformed the other plug-and-play methods; it also outperformed fine-tuning on two tasks, with "similar" performance on the other three. The team also evaluated Diffusion-LM on an unguided text-infilling task against three different baseline models; it outperformed two of them, achieving "comparable" performance with an autoregressive model trained specifically for infilling.

The team did find that Diffusion-LM was slower than other models, both in training and in runtime decoding. Its output also scored worse on perplexity. In a Twitter thread about the work, lead author Xiang Lisa Li commented:

Diffusion-LM shows strong performance in controllable generation, but it remains an open question whether it can match autoregressive LMs in [perplexity] and speed.

The Diffusion-LM code is available on GitHub.
