Screenshot of the openai.com website

“Baby daikon radish in tutu walks dog”: new AI creates images from simple text description

A new deep learning model can create images from a simple text description. It is a giant leap that fascinates illustrators and creators as much as it worries them, as they could find themselves facing a new form of unfair competition.

The company OpenAI has just released DALL-E, an artificial intelligence that creates images from a simple text description. When asked to draw an avocado-shaped chair, for example, the model produced some pretty stunning illustrations.

A cousin of GPT-3

OpenAI had already made a name for itself in recent months with another of its deep learning applications, GPT-3, a language model capable of generating coherent text on its own.

Trained on billions of absorbed articles, the model estimates how likely a given sequence of words is. It then chooses the most likely next word or character to continue a sentence in a given context.
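The "most likely next word" mechanism can be illustrated with a deliberately tiny sketch: a bigram model over a made-up corpus, standing in for the billion-parameter network and the billions of articles mentioned above (everything here is illustrative, not OpenAI's actual code):

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the billions of absorbed articles.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows another (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def most_likely_next(word):
    """Return the most probable next word given the previous one."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(most_likely_next("the"))  # "cat": it follows "the" most often here
```

GPT-3 does the same kind of thing, but conditions on a long context rather than a single previous word, and learns its probabilities with a neural network instead of raw counts.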

Many applications developed from its API (Application Programming Interface)  have caused a stir, such as:

☞ An intelligent search engine that directly gives the answer to the question asked

☞ Applications that automatically generate computer code in JavaScript or SQL

☞ An article published by the Guardian entirely written by GPT-3

☞ A short film whose script was written by an AI (the dialogues are absurd but fun)

☞ A fake blog which attracted nearly 26,000 readers in two weeks

OpenAI engineers were even able to develop a tool that reconstructs incomplete images, by applying the same next-element prediction technique to pixels instead of words:

DALL-E, the misunderstood surrealist artist

DALL-E, pronounced "Dali" (a double nod to Salvador Dalí and the little robot from Disney’s WALL-E 🙃), is a version of GPT-3 capable of generating images that match a text description.

The model has absorbed billions of images and their descriptions from the internet in order to be able to link words and images.

And the results are rather impressive, such as this image of an avocado-chair generated from the description “an armchair shaped like an avocado” or “an armchair looking like an avocado”.

The avocado chair generated by DALL-E – Source: OpenAI blog

What surprises even its designers is its ability to combine truly unrelated concepts into a more or less sensible picture, as Aditya Ramesh (one of the designers of DALL-E) told the MIT Technology Review:

“The thing that surprised me the most is that the model can take two unrelated concepts and put them together in a way that results in something kind of functional”

Another rather striking example is this series of images produced by DALL-E from the description “a baby daikon radish in a tutu walking a dog”.

Images generated by DALL-E from the description “a baby daikon radish in a tutu walking a dog” – Source: OpenAI blog

Not every attempt was as successful, as shown by this cross between a turtle and a giraffe. But the effort is there.

Images generated by DALL-E from the description “a giraffe looking like a turtle” – Source: OpenAI blog

For Ilya Sutskever, chief scientist at OpenAI, the long-term goal of such a project is to improve machines’ overall understanding of language:

“We live in a visual world. In the long run, you’re going to have models which understand both text and images. AI will be able to understand language better because it can see what words and sentences mean”

CLIP

The OpenAI team also presented another model called CLIP, a little less impressive on paper, but one that contributes to this long-term ambition of a holistic understanding of language.

CLIP, short for Contrastive Language–Image Pre-training, can select the image that best matches a text description from a database of more than 30,000 images. Once DALL-E has generated its images, CLIP can rank the results in order of relevance.

It differs from the usual image recognition models (such as those used for facial recognition) in that it did not learn from hand-labelled images in a curated database but from images and descriptions collected from the Internet in their natural context, which is a technical feat.
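The ranking step can be sketched in a few lines. CLIP encodes both the text and each image into vectors in a shared space and ranks images by similarity to the text; here, hand-made stand-in vectors replace the learned encoders, and all file names and numbers are purely illustrative:

```python
import math

# Hypothetical embeddings standing in for CLIP's learned encoders,
# which map text and images into the same vector space.
text_embedding = [0.9, 0.1, 0.3]          # e.g. "a dog in the park"
image_embeddings = {
    "dog_photo.jpg":   [0.8, 0.2, 0.4],
    "cat_photo.jpg":   [0.1, 0.9, 0.2],
    "chair_photo.jpg": [0.2, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Rank images by similarity to the text, most relevant first.
ranking = sorted(
    image_embeddings,
    key=lambda name: cosine(text_embedding, image_embeddings[name]),
    reverse=True,
)
print(ranking)  # the dog photo comes out on top
```

The real model learns these embeddings contrastively, pulling matching image–text pairs together and pushing mismatched ones apart, so that this simple similarity ranking becomes meaningful.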

Although the results are still mixed, the rate of progress of GPT-3 suggests that automated production of high-quality images is not far off. This raises several questions.

Firstly, there is the risk of head-on competition with the work of illustrators, photographers and cartoonists. There is also the question of copyright: if there is no human author, would there, in theory, be no copyright to assign?

We are clearly in a legal grey area that will have to be clarified so as not to destabilise an entire sector of the economy.

A second point of doubt is the status of OpenAI itself. Founded in 2015 by Elon Musk and Sam Altman (president of Y Combinator, a prestigious start-up accelerator based in California), the company was initially a non-profit organisation. In March 2019, it became a capped-profit company.

In essence, this status allows it to offer its private investors a capped return on investment, which its previous non-profit status did not.

To justify its decision, OpenAI cites the budgets needed to attract the best talent (in direct competition with private giants such as Google’s DeepMind or Amazon) and the development costs of its projects (an estimate was made by an Internet user on the Reddit forum).

But this naturally raises the question of the governance of the company and the risk of privatising such discoveries.

In September 2020, Microsoft negotiated exclusive access to the GPT-3 code for an undisclosed amount in order to include it in its future products (developing applications based on the API remains open to all).

The governance issue is essential for the AI sector: private companies should not be the only ones to decide the path taken by these discoveries, which have a very broad impact on society.

There is also the question of keeping such technologies under control. For Nick Bostrom, the Swedish philosopher and Oxford professor who founded the Future of Humanity Institute and wrote Superintelligence, a book analysing the long-term issues surrounding AI, the risks are real:

“there is a potential risk that AI itself may do something different from what was intended and could then be detrimental”. 

One step forward on this issue: the European Commission has just released a report on the regulation of AI which could establish a legal framework for its use in Europe, somewhat as the GDPR now regulates the use of personal data.

Legal protection that could inspire the rest of the world and put Europe back at the forefront of these issues.
