Meta Introduces Generative AI Speech Generator

Meta has a dedicated team that researches AI for its products, building tools that make Meta's apps easier to use. Its latest innovation is Voicebox, an AI tool for speech generation.

Voicebox

Voicebox helps users edit, sample, and stylize text inputs using text-to-speech. It can also produce high-quality audio clips and edit pre-recorded audio, removing background noise such as traffic to preserve the recording's quality.

Through in-context text-to-speech synthesis, users can input an audio sample for the tool to analyze. Voicebox then matches the style of that sample when it generates speech from text.

Parts of the audio that have been interrupted by background noise can also be clipped and edited. For instance, after separating a segment, users can instruct Voicebox to regenerate it without the noise, so that the entire speech is clear.

Meta also made it possible to correct misspoken or mispronounced words. Users can edit the portion containing the mistake and generate the right words, which saves them from deleting the entire recording and starting over.

The feature works in six languages: English, French, German, Spanish, Polish, and Portuguese. This means it can understand text written in those languages and generate sample speech from it. It can even handle a mix of these languages.

The social media giant believes this ability could be used in the future to help people communicate in a "natural, authentic way," even if they don't speak the same language. It also helps that Voicebox can generate speech that sounds the way people talk in real life.

Meta claims that the technology could be used in the future to help creators edit audio tracks. It could also make content more accessible to visually impaired people by letting them hear written messages from friends in those friends' voices. This may mean that the tool will be integrated into Meta products.

Meta's ImageBind

Meta has already made progress on other AI tools as well, such as ImageBind. Much like well-known AI image generators such as Midjourney and Stable Diffusion, it can generate scenes, although it is not limited to text prompts.

It accepts input from users in the form of text, images, audio, and video. It can also take in data such as measurements, temperature, and motion. The company is also set to open-source the tool.

As reported by Engadget, this means that the AI model does not need to be trained on every possible scenario by studying available data. Impressively, it can create animations using only available content such as static images and audio prompts.

For example, a user can upload an image of a dog, and it will generate sounds associated with the animal. It also works the other way around: a user can upload audio of car horns, and it can generate photos of cars.