Meta Introduces Advanced AI Models for Text and Image Creation

Meta has released five new artificial intelligence (AI) research models, including one that can generate both text and images and another that can detect AI-generated speech within longer audio clips.

Meta’s Fundamental AI Research (FAIR) team made the models publicly available on Tuesday (June 18), the company said in a press release.

“By publicly sharing this research, we hope to inspire iterations and ultimately help advance AI in a responsible way,” Meta said in the release.

Chameleon, one of the new releases, is a family of mixed-modal models that can understand and generate both images and text, according to the press release. The models accept inputs that combine text and images and can output a combination of the two. Meta suggested the capability could be used to generate captions or to create new scenes from a mix of text prompts and images.

Meta also released pretrained models for code completion. According to the press release, these models were trained with Meta’s new multi-token prediction approach, which trains large language models (LLMs) to predict multiple future words at once rather than one word at a time, as is conventional.
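To make the difference concrete, the sketch below shows how training targets change when a model is asked to predict several future tokens instead of one. It is an illustration under stated assumptions, not Meta’s training code: the function names `multi_token_targets` and `next_token_targets` are hypothetical, and tokens are represented as plain integer IDs.

```python
from typing import List


def multi_token_targets(tokens: List[int], n: int) -> List[List[int]]:
    """For each position t, return the next n tokens (t+1 .. t+n).

    Hypothetical helper for illustration only. Positions too close to the
    end of the sequence to have n future tokens are dropped, so every
    training example has exactly n targets.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tokens[t + 1 : t + 1 + n] for t in range(len(tokens) - n)]


def next_token_targets(tokens: List[int]) -> List[List[int]]:
    # Conventional next-token prediction is the special case n = 1.
    return multi_token_targets(tokens, 1)


if __name__ == "__main__":
    seq = [10, 11, 12, 13, 14]
    print(next_token_targets(seq))      # [[11], [12], [13], [14]]
    print(multi_token_targets(seq, 2))  # [[11, 12], [12, 13], [13, 14]]
```

One common way to use targets like these is to attach n output heads to a shared model, with head k trained to predict the k-th future token; the exact architecture Meta used is not described in the press release.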

The third new model, JASCO, offers more control over AI music generation. According to the press release, rather than relying mainly on text input, the model can accept a variety of conditioning inputs, including chords and beats. This lets a single text-to-music model incorporate both symbolic inputs and audio.

Another new model, AudioSeal, uses audio watermarking to enable localized detection of AI-generated speech, meaning it can pinpoint AI-generated segments within a longer audio clip, the press release states. According to Meta, the model detects AI-generated speech up to 485 times faster than previous methods.
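“Localized” detection means flagging where in a clip the generated speech occurs, not just whether it occurs. The sketch below illustrates that idea under an explicit assumption: a detector has already produced one watermark-presence score per fixed-length audio frame. The function `flag_segments`, the 0.5 threshold and the frame length are hypothetical choices for illustration and are not AudioSeal’s actual API.

```python
from typing import List, Optional, Tuple


def flag_segments(
    scores: List[float], frame_sec: float, threshold: float = 0.5
) -> List[Tuple[float, float]]:
    """Merge consecutive frames scoring at or above `threshold` into
    (start_sec, end_sec) spans.

    Hypothetical helper: assumes `scores[i]` is a detector's probability
    that frame i (each `frame_sec` seconds long) is AI-generated.
    """
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a flagged run begins
        elif score < threshold and start is not None:
            segments.append((start * frame_sec, i * frame_sec))
            start = None  # the flagged run ends
    if start is not None:  # close a run that reaches the end of the clip
        segments.append((start * frame_sec, len(scores) * frame_sec))
    return segments


if __name__ == "__main__":
    frame_scores = [0.1, 0.2, 0.9, 0.95, 0.8, 0.1, 0.7]
    print(flag_segments(frame_scores, frame_sec=0.5))  # [(1.0, 2.5), (3.0, 3.5)]
```

Reporting spans in seconds, rather than a single yes/no verdict, is what lets a tool point to the specific part of a recording that appears to be synthetic.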

The fifth release from Meta’s FAIR team aims to increase geographic and cultural diversity in text-to-image generation systems, according to the press release. To that end, the company released code and annotations for evaluating geographic disparities in text-to-image models.

Meta said in its April earnings call that it expects 2024 capital expenditures of $35 billion to $40 billion, up from its earlier forecast of $30 billion to $37 billion, as it accelerates infrastructure investment to support its AI roadmap.

“We’re building a number of different AI services, from our AI assistant to augmented reality apps and glasses, to APIs [application programming interfaces] that help creators engage their communities and that fans can interact with, to business AIs that we think every business eventually on our platform will use,” Meta CEO Mark Zuckerberg said April 24 during the company’s quarterly earnings call.