I apologize to the big fans of my English articles for not writing one for four weeks.
I simply forgot to write Opus 340 in English.
Sorry.

Today's article continues the previous English article (Opus 330), which discussed TM, or Translation Memory.
Now, let's discuss Machine Translation (MT).

Machine Translation is a fairly old computer technology, but I think it is not yet very popular.
Many people probably think its output is horrible, unnatural, and useless.

But the technology is now improving quickly, and people are "learning" how to use it.
You can buy a stand-alone commercial application (which is rather expensive).
But nowadays, you can also use online (or "cloud") services.

Let's use one of the most popular services: Google Translate.

Let's get the English translation of the Japanese sentence "橋下市長の軽率な発言は批判されるべきだ。"

Google Translate rendered my Japanese sentence as "Thoughtless remarks Hashimoto mayor should be criticized."
What I meant was "Mayor Hashimoto's thoughtless remarks should be criticized."

Google's translation is wrong, but it is not useless.
I could pick up the English terms "thoughtless", "remarks", "mayor", and "criticized".
Then I could reorder the MT output and produce correct English very quickly.
As a result, I wrote the English sentence faster than I would have from scratch.

Having a human correct raw MT output is called post-editing.
Post-editing is thought to be quicker than translating from scratch.
That is true if the MT quality is good enough.

Let's try EnJa (English-to-Japanese) translation.

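Google's Japanese output is "市長橋本軽率な発言は批判されるべきである。" (roughly "Mayor Hashimoto thoughtless remarks should be criticized", with the possessive particle missing).
It is not fluent.
But I can read it and understand what it means.
My post-edit is "橋本市長の軽率な発言は批判されるべきである。" ("Mayor Hashimoto's thoughtless remarks should be criticized.").
Post-editing it was much faster than translating from scratch.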

To evaluate the quality of MT, you can use the Levenshtein distance, also known as edit distance (Wikipedia).
(Let's abbreviate it as EdD.)
It measures how different the MT output is from its human post-edit: the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other.

In other words, you can use EdD to measure MT quality if you accept the premise that the human post-edit is the ideal sentence, and that MT output is better the closer it is to that post-edit.

Let's calculate the EdD between Google's Japanese translation and my post-edit.
You can do this with the online calculator provided by Planetcalc.

The result is 4.
I think that is a good score: only four edits for a 22-character sentence.
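If you would rather compute EdD yourself than use an online calculator, here is a minimal Python sketch of the standard dynamic-programming algorithm for Levenshtein distance. The function name `levenshtein` is my own choice. Given two strings, it compares them character by character, which seems to be what the Planetcalc calculator does as well.

```python
def levenshtein(source, target):
    """Minimum number of single-element insertions, deletions, and
    substitutions needed to turn `source` into `target`.
    Strings are compared character by character."""
    # previous[j] = distance between the first i-1 items of source
    # and the first j items of target
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,             # delete s
                current[j - 1] + 1,          # insert t
                previous[j - 1] + (s != t),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


mt = "市長橋本軽率な発言は批判されるべきである。"         # Google's output
post_edit = "橋本市長の軽率な発言は批判されるべきである。"  # my post-edit
print(levenshtein(mt, post_edit))  # 4
```

Only two rows of the table are kept in memory, so even long sentences need little space.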

Whatever MT program or service you use, you can measure its quality with EdD.
But to obtain reliable data, you should translate as many sentences as you can.
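A raw count such as 4 means more for a short sentence than for a long one, so when pooling many sentences it can help to normalize. This minimal sketch divides the total number of edits by the total length of the post-edits. That normalization, and the name `corpus_edd`, are my own assumptions for illustration, not something the calculator above does. The list holds the one sentence pair from this article; you would add your own pairs.

```python
def levenshtein(source, target):
    # Two-row dynamic-programming edit distance (insert/delete/substitute).
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (s != t)))
        previous = current
    return previous[-1]


def corpus_edd(pairs):
    """Total EdD over all (mt, post_edit) pairs divided by the total
    length of the post-edits. 0.0 means MT needed no corrections."""
    total_edits = sum(levenshtein(mt, pe) for mt, pe in pairs)
    total_length = sum(len(pe) for _, pe in pairs)
    if total_length == 0:
        raise ValueError("need at least one non-empty post-edit")
    return total_edits / total_length


pairs = [
    ("市長橋本軽率な発言は批判されるべきである。",
     "橋本市長の軽率な発言は批判されるべきである。"),
    # ...add more (MT output, post-edit) pairs here
]
print(round(corpus_edd(pairs), 3))  # 0.182 (4 edits / 22 characters)
```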

Planetcalc's service seems to calculate EdD on characters.
That suits Japanese sentences, which are written without spaces between words.
If you want to calculate the EdD of English or other languages that separate words with spaces, you should look for a program that calculates EdD on words.
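Here is a minimal sketch of word-based EdD in Python. Because the edit-distance algorithm works on any sequence, the only change is that the inputs are lists of words instead of strings. I assume words can be found by splitting on whitespace; this is a simplification, since real tokenizers also separate punctuation and may ignore case. The names `levenshtein` and `word_edd` are hypothetical. The example compares Google's English output from earlier with my intended sentence.

```python
def levenshtein(source, target):
    # Two-row dynamic-programming edit distance; works on any sequence,
    # so lists of words are compared word by word.
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (s != t)))
        previous = current
    return previous[-1]


def word_edd(mt, post_edit):
    # Splitting on whitespace keeps punctuation attached ("criticized."
    # is one word) and is case-sensitive ("Mayor" != "mayor").
    return levenshtein(mt.split(), post_edit.split())


mt = "Thoughtless remarks Hashimoto mayor should be criticized."
post_edit = "Mayor Hashimoto's thoughtless remarks should be criticized."
print(word_edd(mt, post_edit))  # 4
```

Here the first four words all differ and the last three match, so four word substitutions are needed.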

Levenshtein distance is a general technique that makes it easy to compare sequences of data.
For example, you could compare human DNA with gorilla DNA, and human DNA with chimpanzee DNA, and find out which of the two apes is closer to humans, I think :-D

Let's talk more about MT next time.

