Existing generative models, such as GANs, VAEs, and flow-based models, suffer from issues like mode collapse or must rely on surrogate objectives to approximate maximum likelihood training. Diffusion models overcome many of these constraints and have emerged as a new paradigm for generative modeling, theoretically grounded in non-equilibrium thermodynamics and score matching. The most significant advances have been made in domains with continuous signals, such as vision and audio.
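Concretely, a continuous diffusion model corrupts data with a fixed Gaussian forward process and learns to reverse it. Below is a minimal NumPy sketch of the standard closed-form forward step, sampling a noised version of a vector at timestep t; the linear noise schedule and variable names are illustrative, not specific to any one paper.

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form: the cumulative product
    of (1 - beta) tells us how much of the original signal survives."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

betas = np.linspace(1e-4, 0.02, 1000)  # a common linear schedule
rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)           # a toy 16-dim "data point"
xT = forward_noise(x0, 999, betas, rng)
# by the final timestep, almost no signal remains: x_T is near-pure noise
```

A denoising network is then trained to invert these steps, which is what makes sampling possible.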
However, extending continuous diffusion models to natural language remains challenging because text is inherently discrete. Existing efforts customize diffusion models to text either directly in discrete space or by generating entirely in continuous space, but only for unconditional language modeling (illustrated in Figure 1(a)), i.e., free text generation. Diffusion-LM (depicted in Figure 1(b)) models text in continuous space and proposes using an extra-trained classifier as guidance (i.e., the condition signal x) to impose subtle changes (typically complex, fine-grained constraints) on generated sentences.
However, these models do not naturally generalize to conditional language modeling (i.e., the model assigns probabilities p(w|x) to word sequences w given x). Applying Diffusion-LM in the more general sequence-to-sequence (SEQ2SEQ) setting, where the condition x is also a sequence of words, is difficult: classifiers are attribute-oriented, and it is impractical to train the hundreds of thousands of classifiers that would be needed to capture the semantic relationship between conditions and generated sentences. SEQ2SEQ is an essential NLP setting that covers many important tasks, such as open-ended sentence generation, dialogue, paraphrasing, and text style transfer. This paper proposes DIFFUSEQ (in Figure 1(c)), a classifier-free diffusion model that supports SEQ2SEQ text generation tasks.
One advantage of DIFFUSEQ is that a single model both fits the data distribution and exploits conditional guidance, rather than relying on a separate classifier: it directly models the conditional probability of the target sentence w given context x. The researchers conduct experiments on four SEQ2SEQ tasks to validate the effectiveness of DIFFUSEQ. Compared to autoregressive (AR) and non-autoregressive (NAR) models, which suffer from the "degeneration" problem and depend heavily on decoding strategies, DIFFUSEQ achieves significant sentence-level diversity without sacrificing quality.
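The paper realizes this single-model conditioning through what it calls partial noising: the embeddings of the condition x and the target w are concatenated, and the forward diffusion process corrupts only the target positions, leaving x clean as an anchor for denoising. The sketch below illustrates the idea under simplified assumptions (plain NumPy arrays stand in for learned embeddings; the function name and schedule are illustrative, not the authors' code).

```python
import numpy as np

def partial_noise(z, tgt_mask, t, betas, rng):
    """Partial noising (sketch): z is the concatenated embedding of
    condition x and target w; only positions where tgt_mask is True
    are corrupted, so the condition part stays intact at every step."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.standard_normal(z.shape)
    z_t = np.sqrt(alpha_bar) * z + np.sqrt(1.0 - alpha_bar) * noise
    return np.where(tgt_mask[:, None], z_t, z)  # keep the x-part unchanged

betas = np.linspace(1e-4, 0.02, 1000)
rng = np.random.default_rng(0)
z = rng.standard_normal((8, 4))                 # 8 positions, 4-dim embeddings
tgt_mask = np.array([False] * 3 + [True] * 5)   # first 3 positions = condition x
z_t = partial_noise(z, tgt_mask, 500, betas, rng)
```

Because the condition is never destroyed, the denoiser can always attend to it, which is what makes the model classifier-free.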
To summarize, the researchers make several technical and conceptual contributions:
- They are the first to apply diffusion models to SEQ2SEQ text generation. The proposed DIFFUSEQ is a conditional language model trained end-to-end in a classifier-free manner.
- They establish a theoretical link between AR, NAR, and DIFFUSEQ models and justify DIFFUSEQ as an extension of iterative-NAR models.
- They demonstrate the great potential of diffusion models in complex conditional language generation tasks.
The researchers propose DIFFUSEQ to approach SEQ2SEQ tasks in a diffusion manner, offering a favorable trade-off between generation quality and diversity. In other words, DIFFUSEQ can generate a wide range of sentences while maintaining a high level of quality. This capability also allows DIFFUSEQ to further improve final results by leveraging a minimum Bayes risk (MBR) decoding algorithm. Given the limited progress of current diffusion models on text generation, this research highlights the promising results of a new sequence-to-sequence learning paradigm.
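MBR decoding exploits exactly this diversity: from a set of generated candidates, it selects the one with the lowest expected risk, i.e., the highest average similarity to the other candidates. A toy sketch of the selection rule is below; Jaccard token overlap stands in for the BLEU-style similarity typically used in practice, and the function names are illustrative.

```python
def token_overlap(a, b):
    # toy similarity: Jaccard overlap of token sets (BLEU is the usual choice)
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def mbr_select(candidates, similarity):
    """Pick the candidate with the highest total similarity to the
    other candidates (minimum risk under a 1 - similarity loss)."""
    def score(c):
        return sum(similarity(c, o) for o in candidates if o is not c)
    return max(candidates, key=score)

cands = ["the cat sat", "the cat sat down", "a dog ran"]
best = mbr_select(cands, token_overlap)
# the outlier "a dog ran" shares no tokens with the others and is never chosen
```

The more diverse the candidate pool, the better MBR can separate consensus outputs from outliers, which is why it pairs well with a diffusion generator.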
This Article is written as a research summary article by Marktechpost Staff based on the research paper 'DIFFUSEQ: SEQUENCE TO SEQUENCE TEXT GENERATION WITH DIFFUSION MODELS'. All Credit For This Research Goes To Researchers on This Project. Check out the paper and github link.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest lies in image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.