Notebook for ideas and thoughts

Monday 27 April 2020

A robot language?

Around 2008 I started getting interested in constructed languages. The first one was obviously Esperanto. But after a while, I started to notice and deplore, like many, its flaws. I then looked into one of its spin-offs, Ido, which resolved some of those issues. That language, created in 1907, was maybe well suited to its time, but sadly it too has aged badly, being far too Eurocentric for a globalizing world.

And so, unsatisfied, I sat down and decided to create my own international language, Poplengwo (which I will shorten to Ppl), the latest version of which was published in 2015.

In April 2020, I discovered another international language, which had already been published for 10 years and which was to me the best international language I had ever seen: Lingwa de planeta (or LdP).
If you're interested in promoting a good international language, I would definitely recommend you have a look at LdP. It has the advantage of already having a community, and it is surely better known than Poplengwo will ever be!

When reading about LdP, I was struck by how similar some of its aspects were to Poplengwo. For example:

When creating words, we try to maximize the number of people in the world who will understand it naturally. As a result, the two languages look like a mix of English, Romance languages and Chinese.
Both languages are predominantly isolating languages. "The word form never changes."
In both languages, we try to make the part-of-speech recognizable by the ending of the words, but if it is not convenient, we favor the similarity with the original language. "There are no fixed endings for the word classes, there are preferable, though. Thus most verbs end in i, but there are some exceptions"
We use the principle of necessity: "The use of special particle is optional if its meaning is clear from the context."

All the citations above are taken from LdP's Wikipedia page.

If we focus on the differences between the two languages, we see that LdP chose to stay as close as possible to existing natural languages, and thus tends to feel more natural and convenient. For example, like many languages, it has possessive pronouns (you say "my" instead of "of me"). Subjects and direct objects don't need to be introduced by a preposition (unlike in Japanese) if they are placed in the default SVO order.

Ppl, on the other hand, tends to apply logical rules more strictly, probably influenced by Lojban and by my personal interest in mathematics and formal logic.
This is visible in the alphabet and pronunciation, where one letter matches exactly one sound; in the rules of phonological stress, which allow words to be told apart unambiguously; and in the grammar, where the syntax could be strictly defined in Backus-Naur form.

As a result, Ppl is probably more suitable than most conlangs for a robot, because the robot can apply systematic algorithms to parse sounds into words and words into sentences. This means that using this language would help humans communicate more efficiently with robots, reducing ambiguities and the risk of misinterpretation, for example when making requests to your smart-home appliances.
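To give a rough idea of what "systematic algorithms" could look like, here is a minimal sketch of a recursive-descent parser for a grammar written in Backus-Naur form. The grammar, the word endings and the example words are all invented for this illustration; they are not the actual Poplengwo grammar or vocabulary.

```python
# Toy grammar (hypothetical, NOT the real Poplengwo grammar):
#   <sentence>    ::= <noun-phrase> <verb> <noun-phrase>
#   <noun-phrase> ::= <noun> | <adjective> <noun-phrase>
# The word class is read from the last letter of each word.
# The endings below are made up for the example.
ENDINGS = {"o": "noun", "a": "adjective", "i": "verb"}


def word_class(word):
    """Return the word class implied by the final letter of a word."""
    try:
        return ENDINGS[word[-1]]
    except KeyError:
        raise ValueError(f"unknown ending in {word!r}") from None


def parse_noun_phrase(words, pos):
    """Parse <noun-phrase> starting at index pos; return (tree, next_pos)."""
    if pos >= len(words):
        raise ValueError("unexpected end of sentence")
    word = words[pos]
    kind = word_class(word)
    if kind == "noun":
        return ("NP", word), pos + 1
    if kind == "adjective":
        inner, next_pos = parse_noun_phrase(words, pos + 1)
        return ("NP", word, inner), next_pos
    raise ValueError(f"expected a noun phrase, got {word!r}")


def parse_sentence(text):
    """Parse <sentence> and return a nested tuple, or raise ValueError."""
    words = text.lower().split()
    subject, pos = parse_noun_phrase(words, 0)
    if pos >= len(words) or word_class(words[pos]) != "verb":
        raise ValueError("expected a verb after the subject")
    verb = words[pos]
    obj, pos = parse_noun_phrase(words, pos + 1)
    if pos != len(words):
        raise ValueError("unexpected words after the object")
    return ("S", subject, ("V", verb), obj)
```

Each grammar rule maps to one function, so a sentence either yields exactly one parse tree or is rejected with an explicit error, which is the property that makes such a language attractive for a machine.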

Currently, communication between humans and robots (more generally computers) exists at two extremes. On one side, humans must adapt completely to computers when writing code in programming languages, which incidentally have the disadvantage of being only written and not spoken, and when there's a bug, it's the human's job to fix it. Also note that humans would never choose to use a programming language to talk to each other. On the other side, computers must adapt completely to humans: systems controlled by voice must understand humans in their natural languages, in all their varieties, and when the robot gets something wrong, the human gets angry and it's all Siri's or Alexa's fault. Note that computers also never use human languages to talk to each other because it would be highly inefficient for them.

I believe an intermediate language could exist, where both the human and the robot have to make an effort. The human would make an effort to make well-defined and explicitly phrased requests, and the robot would still have to deal with the fuzziness of speech, as voice recognition already does, and could interact with the human to give them opportunities to refine and confirm their requests. At this point, maybe robots could even use such a language to talk to each other, and maybe humans could too.

Saturday 25 April 2020

Deep learning ideas

Here are some of my ideas of amazing things that could be developed with deep learning, and that I will probably never have time to work on myself. So I'll just keep waiting for the deep learning experts to also come up with those ideas themselves and solve them.

1. Phonetics-based speech synthesis

Currently existing text-to-speech applications are based on a text in a given (or auto-detected) language. The given language is used to select the appropriate training model, which is then run to generate the sound. That model is therefore language-specific.
I don't know whether modern systems still go through the intermediate step of looking up the input words in a dictionary of phonemes and feeding those to the speech synthesis module, or whether the text is given directly as input to a neural network.
Let's assume it is the former, and that the representation of those phonemes is not language-specific but international (1), e.g. the IPA.
The idea is to pack those tools in a program that will expect as input a text as a sequence of phonetic symbols, and output the synthesized speech.
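As a minimal sketch of the input side of such a program, the function below splits a string of phonetic symbols into individual phonemes by greedy longest match, so that two-character symbols are kept together. The phoneme inventory is a hypothetical sample, and the synthesis step itself is left out because it would depend on a trained model.

```python
# Hypothetical inventory of phonemes the synthesizer would support.
INVENTORY = {"p", "l", "e", "n", "g", "w", "o", "a", "i", "tʃ", "aɪ"}


def tokenize_ipa(text, inventory=INVENTORY):
    """Split an IPA string into phonemes using greedy longest match."""
    longest = max(len(p) for p in inventory)
    phonemes = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            phonemes.append(" ")  # keep word boundaries as pause markers
            i += 1
            continue
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in inventory:
                phonemes.append(chunk)
                i += size
                break
        else:
            raise ValueError(f"unsupported symbol at position {i}: {text[i]!r}")
    return phonemes
```

The resulting list is what would be handed to the language-independent synthesis module; rejecting unknown symbols early avoids silently mispronouncing a user-defined language.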

Such a program would allow automatically generating speech for user-defined languages, either lesser-known languages or entirely new ones. This would be useful for language-learning apps to automatically generate audio for each word without requiring the course creator to record audio files themselves, and would also reduce storage space. This would also have an application generating audio samples for new constructed languages, and could for example be used in a video game where the characters speak an imaginary language.

2. Photorealistic image from drawing

Nowadays, deep photo style transfer is a very popular problem and its solutions are pretty good. We can choose a photo and a drawing or painting style and we obtain the restyled picture.

The reverse problem is mentioned less often. Based on a painting or a line drawing, we would like to generate a photorealistic image.
In 2017, the app Pix2Pix went viral and allowed you to generate a photo of a face with photorealistic textures from a simple black line drawing.
In 2019, NVIDIA released GauGAN, which generates photorealistic landscapes from areas of flat colors representing diverse types of terrain.

Now, we would like a generalization of NVIDIA's work: the algorithm should learn to recognize landscape elements and objects from a painting or a drawing, without any color code agreed on beforehand.

3. Description-based image synthesis

The solutions to the problem of describing images are getting pretty good.

What is quite unheard of, however, is the reverse problem: synthesizing a photorealistic image based on a text description. I would very much like to see what GANs would be able to come up with to solve this.

4. 3D scene from 2D image

Rendering a 3D scene onto a 2D image is an extremely common problem, arising every time a frame has to be rendered in a video game or in a 3D animated movie.

The reverse problem is more challenging, and some work exists for it too. The existing solutions involve depth estimation, possibly supplemented by some background filling, for example to create a Ken Burns effect out of a still picture. If we allow more than one picture, the NeRF algorithm published in March 2020 showed some stunning results. (2)
But all this research mostly focuses on generating just one side of the scene. In December 2019, NVIDIA published a paper looking at a different aspect of the problem: identifying the main object in the picture and generating a 3D model of it, textured on all sides.
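To make the link between rendering and depth estimation concrete, here is a minimal sketch assuming an idealized pinhole camera looking along the z axis. The function names and the `focal_length` parameter are illustrative assumptions, not taken from any of the works mentioned above.

```python
def project(point, focal_length=1.0):
    """Project a 3D point (x, y, z) in camera space onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must be in front of the camera")
    return (focal_length * x / z, focal_length * y / z)


def unproject(pixel, depth, focal_length=1.0):
    """Recover a 3D point from image coordinates and an estimated depth."""
    u, v = pixel
    return (u * depth / focal_length, v * depth / focal_length, depth)
```

Every point along the same viewing ray projects to the same pixel, which is why the depth has to be estimated separately, and why only the surfaces visible from the camera can be recovered this way.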

I would like an AI that mixes these two approaches, and that takes its freedom further, filling any missing parts of the scene, maybe even deep-dreaming a 360 degree panorama, so that the 3D scene can be viewed from all sides.

5. Description-based 3D scene synthesis

Finally, if the problems of the last two paragraphs are solved, then put them together and you are able to synthesize an entire 3D scene, solely based on a text description. Alternatively, it would probably be a better idea to bypass the 2D step, to allow the information contained in the description to influence the content of the 3D scene directly.
Congratulations! You have now become a world creator, akin to Atrus creating new ages from the tip of his quill in the Myst game series.

6. Summary

To sum up, AIs must be able to perform any of the conversions indicated by arrows in the graph below:

7. Another dimension

You can even extend all the previous problems to another dimension: time. The problems that were dealing with images will then deal with videos, and 3D scenes will then include 3D animations. This is also the topic of a lot of ongoing research.

1. These are definitely wrong assumptions, because the speech synthesis module needs to know what language it is reading, at least to use the correct intonation and every little detail that will make it sound less like a robot.
2. But then we fall into the category of photogrammetry, which is a whole different ball game.

Monday 23 September 2019

Most complex spelling bias

In English as well as in French, spelling is hard because it doesn't just rely on a definite set of rules. People often make mistakes, swapping one letter for another or sometimes omitting a letter. An omission has the advantage of simplifying things, and one could imagine that the search for simplicity was at least partly the reason for the mistake. For this reason, a simplifying mistake could be more easily excused than a complicating one. (My advice, by the way: if you hesitate equally between two things, pick the simpler one.) But interestingly and unintuitively, incorrect additions of letters or symbols are also very common.

For example, some French people, having seen both the spellings "faites" and "faîtes" somewhere, prefer to write "vous faîtes" ("you guys do") with an accent, which is wrong - "faîtes" being an uncommon word meaning "roof" - over the simpler option, which is correct.
Last month, I saw an even weirder choice of spelling on a magazine cover. The big title was an injunction to say something: it should have read "Dites..." but it read "Dîtes" with the same accent as above. This time, it really seemed like the author could be given no other excuse for their fantasy than an overflowing imagination. Thinking a little more, however, we realize that this other spelling does also exist. It is the same verb, conjugated in the same person, but in an archaic, literary past tense: the passé simple.

In short, when people are unsure about the spelling of a word, given the choice between two possible spellings, it seems they tend to prefer the more complex one. This is what I call the most complex spelling bias.

Thursday 26 April 2018

Comme quoi

Saturday 8 October 2016

SharePoint rant

We were forced to switch from Google Documents to Microsoft SharePoint at work. This article is just a rant to vent my deep frustration. 😁

In short, SharePoint is a crappy piece of software, and here is why:

First, let's start by mentioning what happened to some folders during the migration. Some of them got duplicated, yielding one version with the correct name and another one with a cropped name. See below - the top part is what the folders looked like in Google Documents and the bottom part is what they became in SharePoint:

Lots of commands are hidden in the FILES and LIBRARY tabs. It took me a long time to find them. To copy a file, you have to go to FILES and Send to. Come on, couldn’t you call it “copy” as in “copy and paste”? There is no browse option when you copy, so you have to know the exact file path. There is no proper command to move a file. You have to copy it and then remove the original. You can't rename the extension of a file.

The search is very unhelpful. Have a look at the example below. The required file only appears in 28th position, although none of the other file names contains anything remotely related.

The Word web app is way less elaborate than Word. Collaboration with other users is just indescribably buggy. When closing a file, it doesn’t remember if it has been saved so it warns you that you may lose data, although you obviously won’t because you just saved it. In some cases you may even come across this kind of curious error message:

As we just mentioned, SharePoint is a collaborative document editing system. When someone adds comments to documents you've shared with them, you would expect to receive these comments in some way. Good news: you can set up email alerts. Bad news: you have to set them up yourself. Here's how: go to folder view (1), find the file you want an alert for (yes, of course you have to do this for each file), tick the checkbox next to it (2), open the FILES toolbar, click Alert, set up the alert in the dialog, click OK, receive a confirmation by mail and you're done! For comparison, in Google Drive, all this is only a single step: you're done. If you've managed to go through all the hassle, you will finally receive notifications saying that your file has changed. Click on the link to your file and you're welcome to read the whole document and its comments to find out by yourself where the changes are.

(1) Going back to folder view from file view: there is a tiny link in the top left corner of the file view. Only it is very misleading. The text is the name of the top directory of your SharePoint file system. However, when you click on it, it directs you to the containing folder.

(2) Selecting files in the folder view: on the left of each item in the list of files, there is a checkbox to select it. Only the developers decided it was a good idea to make those checkboxes invisible unless you hover over them.

Finally, here's a funny graphical bug I saw in Excel. It's not exactly related to SharePoint, but it's still Microsoft, so I might as well share it here. The weird light yellow shape is meant to be a comment box related to the cell on the left:

In conclusion, if you ever have to choose a document management and storage system, don't choose SharePoint.

Sunday 19 January 2014

Mobile app to meet one another

Edit: No, it's not Tinder.

Two people who know each other want to meet up. They both have a smartphone with a GPS device.

There could be an app to spare them planning a meeting point, and that would instead calculate for each of them the shortest path from one person to the other and update it every now and then.
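A minimal sketch of the core computation, assuming the map is available as a weighted graph of street intersections: Dijkstra's algorithm finds the shortest route, and the app would simply call it again whenever one of the two positions changes. The `STREETS` graph and its node names are invented for the example.

```python
import heapq

# Hypothetical street graph: node -> {neighbor: distance}.
STREETS = {
    "A": {"B": 2, "C": 5},
    "B": {"A": 2, "C": 1, "D": 4},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; return (distance, path) or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + weight, neighbor, path + [neighbor]))
    return None
```

For instance, `shortest_path(STREETS, "A", "D")` gives the route for one person at A and the other at D; when the second person moves to B, calling `shortest_path(STREETS, "A", "B")` refreshes it. A real app would snap each GPS position to the nearest node first.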

Saturday 18 January 2014

Communication AI

I have an idea for an AI with which you could communicate to ask questions, teach facts and give orders, and that would have an interactive behavior.

In a console you could write a sentence using a specific syntax. This sentence would be analyzed and interpreted in one of three ways. If it ends with a period, the AI will recognize it as a statement that it should learn and store in its database. If it ends with a question mark, the AI will interpret it as a question and try to answer it. If it ends with an exclamation mark, the AI will interpret it as a command and will have to execute it.
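This first dispatch step could be sketched as follows. The function name and the returned labels are illustrative choices, since the syntax is not decided yet.

```python
# Map the final punctuation mark to the kind of sentence.
KINDS = {".": "statement", "?": "question", "!": "command"}


def classify(sentence):
    """Return (kind, content) based on the last character of the sentence."""
    text = sentence.strip()
    if not text:
        raise ValueError("empty sentence")
    try:
        kind = KINDS[text[-1]]
    except KeyError:
        raise ValueError("a sentence must end with '.', '?' or '!'") from None
    return kind, text[:-1].strip()
```

The content returned without its punctuation is what the later stages (learning, answering or executing) would receive.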

To answer questions, it would query its own local knowledge database, as well as online semantic web databases, that is, convert your sentence into SPARQL to query the databases. For example the sentence what has-family Crambidae? (the syntax is not decided yet) would return the list of animals that belong to the Crambidae. The application would implement question words such as what, who, when, where, how-many...
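As a rough sketch of that conversion, the function below turns a three-word question into a SPARQL query string. It assumes a subject-predicate-object word order, and the `ex:` prefix with its `http://example.org/` namespace is a placeholder: a real knowledge base would need its own identifiers for `has-family` and `Crambidae`. Counting questions such as how-many are not handled here.

```python
QUESTION_WORDS = {"what", "who", "when", "where"}


def to_sparql(question, prefix="http://example.org/"):
    """Convert '<subject> <predicate> <object>?' into a SPARQL SELECT query."""
    parts = question.strip().rstrip("?").split()
    if len(parts) != 3:
        raise ValueError("expected: <subject> <predicate> <object>?")
    subject, predicate, obj = parts

    def term(word):
        # Question words become variables, everything else a prefixed name.
        if word.lower() in QUESTION_WORDS:
            return f"?{word.lower()}"
        return f"ex:{word}"

    variables = [term(w) for w in (subject, obj) if term(w).startswith("?")]
    if not variables:
        raise ValueError("the question contains no question word")
    return (
        f"PREFIX ex: <{prefix}>\n"
        f"SELECT {' '.join(variables)} "
        f"WHERE {{ {term(subject)} ex:{predicate} {term(obj)} . }}"
    )
```

The string can then be sent to the local database or to an online SPARQL endpoint by whatever client the application uses.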

What do I mean by interactive behavior?

The syntax of statements would allow some flexibility: you could either ask for something very precise, with all the details needed to avoid any ambiguity, or omit some details which are obvious or which have a predefined default value. For every other missing piece of information, the AI would ask for it before executing your command, a bit like a human would do. For example you could ask:

> send email to george

and it would answer something like:

You are not connected to a mail service. Do you want to log in?

> yes

Please enter your email address:

> edward@gmail.com

Please enter your password:

> *********

Do you want to send it to George Jones or George Smith?

> george smith

Ok. Please enter your subject:

and so on.
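The exchange above can be sketched as a small function that looks at what is already known about the command and returns the next question to ask. The contact list, the slot names and the wording of the prompts are assumptions made for the example, and the login step is left out.

```python
# Hypothetical address book.
CONTACTS = ["George Jones", "George Smith", "Anna Lee"]


def match_contacts(name, contacts=CONTACTS):
    """Return every contact whose name contains the given text."""
    needle = name.lower()
    return [c for c in contacts if needle in c.lower()]


def respond(state, contacts=CONTACTS):
    """Return the next question, or None when the email is ready to send."""
    recipient = state.get("recipient")
    if recipient is None:
        return "Who do you want to send it to?"
    matches = match_contacts(recipient, contacts)
    if not matches:
        return f"I don't know anyone called {recipient}. Who do you mean?"
    if len(matches) > 1:
        # Ambiguous recipient: ask the user to choose.
        return "Do you want to send it to " + " or ".join(matches) + "?"
    if "subject" not in state:
        return "Ok. Please enter your subject:"
    if "body" not in state:
        return "Please enter your message:"
    return None
```

Each answer from the user would update `state`, and the loop would continue until `respond` returns `None`, at which point the command can be executed.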

The guiding rule would be: if someone said that sentence in real life, what would another human answer? But the AI should implement this not by trying to be as realistic as possible, like some chatbots do to pass the Turing test, but rather by aiming for the best behavior expected from a human: nice, useful, intelligent, obedient...

The whole thing would be developed in Python or Perl because those are powerful languages that I think are well suited to that kind of application.