Monday, April 27, 2020

A robot language?

Around 2008 I started getting interested in constructed languages. The first one was obviously Esperanto. But after a while, I started to notice and deplore, like many, its flaws. I then looked into one of its spin-offs, Ido, which resolved some of those issues. That language, created in 1907, may have been well suited to its time, but sadly it too has aged badly, being far too Eurocentric in a globalizing world.

And so, unsatisfied, I sat down and decided to create my own international language, Poplengwo (which I will shorten to Ppl), the latest version of which was published in 2015.

In April 2020, I discovered another international language, published ten years earlier, which was to me the best international language I had ever seen: Lingwa de planeta (or LdP).
If you're interested in promoting a good international language, I definitely recommend you have a look at LdP. It has the advantage of already having a community, and it is surely better known than Poplengwo will ever be!

When reading about LdP, I was struck to see how similar some aspects were with Poplengwo. For example:
  • When creating words, we try to maximize the number of people in the world who will understand them naturally. As a result, the two languages look like a mix of English, Romance languages and Chinese.
  • Both languages are predominantly isolating languages. "The word form never changes."
  • In both languages, we try to make the part of speech recognizable from the ending of a word, but when that is not convenient, we favor similarity with the source language. "There are no fixed endings for the word classes; there are preferable ones, though. Thus most verbs end in i, but there are some exceptions."
  • We use the principle of necessity: "The use of special particle is optional if its meaning is clear from the context."

All the quotations above are taken from LdP's Wikipedia page.

If we focus on the differences between the two languages, we see that LdP chose to stay as close as possible to existing natural languages, and thus tends to feel more natural and convenient. For example, like many languages, it has possessive pronouns (you say "my" instead of "of me"). Subjects and direct objects don't need to be introduced by a preposition (unlike in Japanese) if they appear in the default SVO order.

Ppl, on the other hand, tends to apply logical rules more strictly, probably influenced by Lojban and by my personal interest in mathematics and formal logic.
This is visible in the alphabet and pronunciation, where exactly one letter matches one sound; in the rules of phonological stress, which allow words to be unambiguously told apart; and in the grammar, whose syntax could be strictly defined in Backus-Naur form.

As a result, Ppl is probably more suitable than most conlangs for a robot, because a machine can apply systematic algorithms to parse sounds into words and words into sentences. This means that using this language would help humans communicate more efficiently with robots, reducing ambiguities and the risk of misinterpretation, for example when making requests to your smart-home appliances.
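
To make this concrete, here is a minimal Python sketch of such systematic parsing. The grammar and all the words are invented for the example (they are not actual Poplengwo); the point is only that a small unambiguous grammar can be parsed by a few lines of deterministic code, with no statistical guessing.

    # Toy recursive-descent check for a hypothetical SVO grammar:
    #   sentence ::= noun verb noun
    # All words below are invented for this sketch.
    NOUNS = {"roboto", "homo"}
    VERBS = {"vidi", "helpi"}

    def parse_sentence(words):
        """Parse a three-word SVO sentence, or raise ValueError."""
        if len(words) != 3:
            raise ValueError("expected subject verb object")
        subj, verb, obj = words
        if subj not in NOUNS or obj not in NOUNS:
            raise ValueError("subject and object must be nouns")
        if verb not in VERBS:
            raise ValueError("the middle word must be a verb")
        return {"subject": subj, "verb": verb, "object": obj}

    print(parse_sentence("roboto helpi homo".split()))
    # {'subject': 'roboto', 'verb': 'helpi', 'object': 'homo'}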

Currently, communication between humans and robots (more generally, computers) sits at two extremes. On one side, humans must adapt completely to computers when writing code in programming languages, which incidentally have the disadvantage of being only written, never spoken; and when there's a bug, it's the human's job to fix it. Note also that humans would never choose a programming language to talk to each other. On the other side, computers must adapt completely to humans: voice-controlled systems must understand humans in their natural languages, in all their varieties, and when the robot gets something wrong, the human gets angry and it's all Siri's or Alexa's fault. Note that computers likewise never use human languages to talk to each other, because it would be highly inefficient for them.

I believe an intermediate language could exist, where both the human and the robot have to make an effort. The human would phrase well-defined, explicit requests, and the robot would still have to deal with the fuzziness of speech, as voice recognition already does, and could interact with the human to give them opportunities to refine and confirm their requests. At that point, maybe robots could even use such a language to talk to each other, and maybe humans could too.
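
As a toy sketch of that interaction, here is a confirm-and-refine loop. The two commands are invented for the example; a real system would plug speech recognition and a proper parser in front of it.

    # Toy confirm-and-refine loop between a human and a smart-home robot.
    # The command vocabulary is invented for this sketch.
    COMMANDS = {"lumo on": "turn the lights on", "lumo off": "turn the lights off"}

    def dialogue():
        while True:
            request = input("Request: ").strip().lower()
            if request not in COMMANDS:
                print("Unknown request. Known forms:", ", ".join(COMMANDS))
                continue  # let the human refine the request
            if input(f"Confirm '{COMMANDS[request]}'? (yes/no) ") == "yes":
                print("Done:", COMMANDS[request])
                return

    dialogue()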

Saturday, April 25, 2020

Deep learning ideas


Here are some of my ideas for amazing things that could be developed with deep learning, which I will probably never have time to work on myself. So I'll just keep waiting for deep learning experts to come up with the same ideas and solve them.

 

1. Phonetics-based speech synthesis

Existing text-to-speech applications take as input a text in a given (or auto-detected) language. The given language is used to select the appropriate trained model, which is then run to generate the sound. That model is therefore language-specific.
I don't know whether modern systems still go through the intermediate step of looking up the input words in a dictionary of phonemes and then feeding those to the speech synthesis module, or whether the text is given directly as input to a neural network.
Let's assume it is the former, and that the representation of those phonemes is not language-specific but international (1), e.g. the IPA.
The idea is to pack those tools into a program that takes as input a text written as a sequence of phonetic symbols, and outputs the synthesized speech.
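
To sketch the proposed interface: the class and method below are hypothetical (not an existing API), and the acoustic model is a stub returning silence where a trained, language-independent network would run.

    import numpy as np

    # Hypothetical phoneme-based synthesis interface (not an existing API).
    class PhonemeSynthesizer:
        SAMPLE_RATE = 22050

        def ipa_to_waveform(self, ipa: str) -> np.ndarray:
            """Map a sequence of IPA symbols to audio samples.
            A real system would run a trained, language-independent
            acoustic model here; this stub returns silence."""
            duration_s = 0.15 * len(ipa)   # crude placeholder timing
            return np.zeros(int(self.SAMPLE_RATE * duration_s))

    synth = PhonemeSynthesizer()
    audio = synth.ipa_to_waveform("bonʒuʁ")   # IPA input, any language
    print(len(audio), "samples at", synth.SAMPLE_RATE, "Hz")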

Such a program would allow automatically generating speech for user-defined languages, whether lesser-known languages or entirely new ones. This would be useful for language-learning apps, which could automatically generate audio for each word without requiring the course creator to record audio files themselves, and it would also reduce storage space. It would also find an application in generating audio samples for new constructed languages, and could for example be used in a video game where the characters speak an imaginary language.

 

2. Photorealistic image from drawing

Nowadays, deep photo style transfer is a very popular problem and its solutions are pretty good: we choose a photo and a drawing or painting style, and we obtain the restyled picture.

The reverse problem is mentioned less often: based on a painting or a line drawing, we would like to generate a photorealistic image.
In 2017, the app Pix2Pix went viral, letting you generate a face with photorealistic textures from a simple black-line drawing.
In 2019, NVidia released GauGAN, which generates photorealistic landscapes from areas of flat color representing different types of terrain.

Now, we would like a generalization of NVidia's work: the algorithm should learn to recognize landscape elements and objects from a painting or a drawing, without any color code previously agreed on.
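
Here is a sketch of that generalization as a two-stage pipeline, with both networks replaced by stubs; the functions and the label values are made up, and a real system would load trained models instead.

    import numpy as np

    # 1. A segmentation network infers a semantic label map from the drawing.
    # 2. A GauGAN-style generator renders a photo from that label map.
    # Both functions below are stubs standing in for trained models.
    def segment_drawing(drawing: np.ndarray) -> np.ndarray:
        """Stub: labels every pixel 0 ('sky'); a real model would
        predict a terrain class per pixel."""
        return np.zeros(drawing.shape[:2], dtype=np.int64)

    def render_photo(label_map: np.ndarray) -> np.ndarray:
        """Stub: returns a flat gray image instead of a photo."""
        h, w = label_map.shape
        return np.full((h, w, 3), 127, dtype=np.uint8)

    drawing = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
    photo = render_photo(segment_drawing(drawing))
    print(photo.shape)   # (256, 256, 3)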

 

3. Description-based image synthesis

The solutions to the problem of describing images are getting pretty good.

What is quite unheard of, however, is the reverse problem: synthesizing a photorealistic image based on a text description. I would very much like to see what GANs would be able to come up with to solve this.
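
As a sketch of the shape such a system could take, here is the same stub pattern with a text encoder in front of a generator; everything below is a placeholder for trained models.

    import numpy as np

    def embed_text(description: str, dim: int = 128) -> np.ndarray:
        """Stub text encoder: a deterministic pseudo-embedding."""
        rng = np.random.default_rng(abs(hash(description)) % 2**32)
        return rng.standard_normal(dim)

    def generate_image(embedding: np.ndarray) -> np.ndarray:
        """Stub generator: maps the embedding to a 64x64 RGB image."""
        rng = np.random.default_rng(int(abs(embedding[0]) * 1e6))
        return rng.integers(0, 255, (64, 64, 3), dtype=np.uint8)

    img = generate_image(embed_text("a snowy mountain at sunset"))
    print(img.shape)   # (64, 64, 3)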

 

4. 3D scene from 2D image

Rendering a 3D scene onto a 2D image is an extremely common problem, arising every time a frame has to be rendered in a video game or a 3D animated movie.

The reverse problem is more challenging, and some work exists for it too. Existing solutions involve depth estimation, possibly supplemented by some background filling, for example to create a Ken Burns effect out of a still picture. If more than one picture is allowed, the NeRF algorithm published in March 2020 showed some stunning results. (2)
But all this research mostly focuses on generating just one side of the scene. In December 2019, NVidia published a paper looking at a different aspect of the problem: identifying the main object in the picture and generating a 3D model of it, textured on all sides.
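
To make the depth-estimation step concrete, here is a minimal sketch that unprojects a depth map into a 3D point cloud using a pinhole camera model; the intrinsics are made-up example values.

    import numpy as np

    # Unproject a depth map into a 3D point cloud (pinhole camera model).
    H, W = 240, 320
    fx = fy = 300.0                  # focal lengths in pixels (made up)
    cx, cy = W / 2, H / 2            # principal point

    depth = np.ones((H, W))          # stand-in for a predicted depth map
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - cx) / fx * depth        # back-project each pixel
    y = (v - cy) / fy * depth
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    print(points.shape)              # (76800, 3): one 3D point per pixel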

I would like an AI that mixes these two approaches and takes more liberties, filling in any missing parts of the scene, maybe even deep-dreaming a 360-degree panorama, so that the 3D scene can be viewed from all sides.

 

5. Description-based 3D scene synthesis

Finally, once the problems of the last two sections are solved, put them together and you can synthesize an entire 3D scene solely from a text description. Alternatively, it would probably be a better idea to bypass the 2D step, so that the information contained in the description can influence the content of the 3D scene directly.
Congratulations! You have now become a world creator, akin to Atrus creating new ages from the tip of his quill in the Myst game series.

 

6. Summary

To sum up, AIs must be able to perform any of the conversions indicated by the arrows in the graph below:
 

7. Another dimension

You can even extend all the previous problems into another dimension: time. The problems that dealt with images will then deal with videos, and 3D scenes will then include 3D animations. This too is the topic of a lot of ongoing research.


1. These are definitely wrong assumptions, because the speech synthesis module needs to be informed of what language it is reading, at least to use the correct intonation and every little detail that will make it sound less like a robot.
2. But then we fall into the category of photogrammetry, which is a whole different ball game.