complexity of languages

Guest   Sat Sep 08, 2007 3:18 am GMT
<<I agree that French spelling is easier than English. Given a particular French word, there is usually only one or two possible pronunciations. To go from a spoken word to the written form though is much harder. >>

Yeah but the Spanish one is the easiest one of all.
Sam II   Sat Sep 08, 2007 5:30 am GMT
<<<Why shouldn't we consider the speech of people in cities, and why do you think that people in cities have more complex speech than those who live in rural areas? >>>

I think that in the last 150 years, life and language have been massively changed by modern technology and media (TV, newspapers, industry), by institutions, and by intensified contact with people from around the world. In rural areas one can hope to find people speaking a purer language or dialect, one that is more representative of, and closer to, the language of our ancestors.
The speech of people in cities is not more complex than that of people who live in rural areas; on the contrary! It is just less pure and more polluted by modern trends.
Guest   Sat Sep 08, 2007 5:52 am GMT
<<Yeah but the Spanish one is the easiest one of all. >>

Granted that Spanish spelling is more phonetic and regular than English spelling, but it still has its difficulties. On this side of the pond at least, there's the 's' vs 'c/z' and 'b' vs 'v' dilemmas, when you try to look up a new word that you've heard.

Are there any languages with a true 1-1 mapping between pronunciation and writing, at least in most mainstream dialects? These would be languages where you couldn't have spelling bees, I suppose :)
Adolfo   Sat Sep 08, 2007 1:01 pm GMT
"Granted that Spanish spelling is more phonetic and regular than English spelling, but it still has its difficulties. On this side of the pond at least, there's the 's' vs 'c/z' and 'b' vs 'v' dilemmas, when you try to look up a new word that you've heard.
"

European Spanish still distinguishes between s and c/z. Spanish is very close to being a completely phonetic language. Some people even pronounce b and v differently, although it may sound somewhat pedantic.
K. T.   Sun Sep 09, 2007 1:34 am GMT
Well, I rarely hear anyone say that Spanish is difficult. LOL! It's not like someone's Mom gets compliments like "OOOH, your son is learning Spanish. That's SOOOOOOOOOOOOOO difficult." The only complaint I hear is that the natives (i.e. Cubans and Mexicans) speak too fast for American speakers of English.

I actually do NOT think Spanish is difficult or too fast. I also do not think it is the closest one to perfection. I think each language is a little puzzle to learn. Spanish is an easy little jigsaw.
Tiffany   Sun Sep 09, 2007 5:53 am GMT
I'd nominate standard Italian for a near 1-to-1 mapping. The only exception I can think of is the word "negligenza" (neh-glee-jen-zah). The "gli" sound in all other cases gives the sound "ly" (ex. negli [nel-yee], luglio [lewl-yoh], tagliare [tal-yar-reh]). Otherwise, everything is spelled as it sounds and no letters sound the same. You just have to get used to double consonants (not hard IMO).
greg   Sun Sep 09, 2007 10:38 am GMT
It's true that <gli> can represent three successive single letters (three graphemes, i.e. letters, and three sounds), or a digraph <gl> (two graphemes and one sound) joined to the single letter <i> (one grapheme and one sound), or a trigraph (three graphemes and a single sound), or finally a trigraph (three graphemes, but for two geminate sounds).

Examples:

<gloria> {glory} → /glɔrja/ → the phoneme /g/ rendered on its own by the single letter <g>, the phoneme /l/ rendered on its own by the single letter <l>, and the phoneme /ɔ/ rendered on its own by the single letter <o>.

<gli> {the} → /ʎi/ → the phoneme /ʎ/ rendered on its own by the digraph <gl>, and the phoneme /i/ rendered on its own by the single letter <i>.

<figlio> {son} → /fiʎo/ → the phoneme /ʎ/ rendered on its own by the trigraph <gli>, and the phoneme /o/ rendered on its own by the single letter <o>.

<famiglia> {family} → /famiʎːʎa/ → the two geminate phonemes /ʎː/ & /ʎ/ rendered together by the trigraph <gli>, and the phoneme /a/ rendered on its own by the single letter <a>.

One particular point is worth noting: when you hear /ʎ/, the grapheme <g> (part of the digraph <gl> or the trigraph <gli>) is not etymological. For example:
It <gli> ← La <illi>; Es <los>, not *<glos>; Fr <les>, not *<gles>
It <figlio> ← La <filius>; Fr <fils>, not *<figls>; Piedmontese <fieul>, not *<fieugl>
It <famiglia> ← La <familia>; Es <familia>, not *<famiglia>; Fr <famille>, not *<famigle>.

Another point: <gli> is pronounced /ʎi/ when the <i> is stressed. That is obvious with It <gli> = Fr <les>, since it is a monosyllable. Perhaps it is the only Italian word in that situation? What do you think, Tiffany?

In It /glɔrja/, the first syllable carries the stress (and the <g> is etymological).
In It /fiʎo/, it is the first as well.
In It /famiʎːʎa/, it is the second that carries it.

As Tiffany said, written Italian is not a phonologically transparent code: zones of opacity remain between the spoken language and the conventional way of rendering it in writing. That said, written Italian is far less opaque than written English or written French.
greg   Sun Sep 09, 2007 10:47 am GMT
Sorry, but It <gloria> was not a good example. Replace it with It <negligere> = Fr <négliger>.

In It <negligere>, it is <gli> /gli/ that is stressed. Moreover, the <g> is etymological (hence no /ʎ/ sound).

Besides, La <negligere> ← La <neg> + La <legere>.
furrykef   Sun Sep 09, 2007 12:16 pm GMT
<< I'd nominate standard Italian for a near 1-to-1 mapping. >>

Italian doesn't always mark stress, the letter 'z' can be either voiced or devoiced, and whether a vowel is open or closed is not always marked, so I'd say Italian is about on par with Spanish as far as being "phonetic", just in the other direction: with Spanish, it's trivial to know how to say any word given its spelling; with Italian, it's trivial to know how to write a word given its pronunciation. Both are still pretty easy going the other way around, but there is more room for error.
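
To make the two directions concrete, here is a minimal Python sketch. The correspondence lists are a tiny hand-picked toy sample based only on the cases mentioned in this thread (Latin American Spanish 's' vs 'c/z' and 'b' vs 'v'; Italian 'z' and open vs closed vowels), not a full description of either orthography. Stress marking is left out, and the function name `ambiguity` is just an illustrative choice.

```python
from collections import defaultdict

# Toy (spelling, sound) correspondences; assumptions for illustration only.
SPANISH_LATAM = [
    ("s", "s"), ("z", "s"), ("c before e/i", "s"),  # three spellings, one sound
    ("b", "b"), ("v", "b"),                         # two spellings, one sound
]
ITALIAN = [
    ("z", "ts"), ("z", "dz"),  # 'z' can be voiceless or voiced
    ("e", "e"), ("e", "ɛ"),    # closed vs open 'e' is not always marked
    ("o", "o"), ("o", "ɔ"),    # closed vs open 'o' is not always marked
]

def ambiguity(pairs):
    """Return (reading, writing) ambiguity for a list of (spelling, sound) pairs.

    reading: spellings that map to more than one sound
    writing: sounds that map to more than one spelling
    """
    sounds_for = defaultdict(set)
    spellings_for = defaultdict(set)
    for spelling, sound in pairs:
        sounds_for[spelling].add(sound)
        spellings_for[sound].add(spelling)
    reading = {g: s for g, s in sounds_for.items() if len(s) > 1}
    writing = {p: g for p, g in spellings_for.items() if len(g) > 1}
    return reading, writing

if __name__ == "__main__":
    for name, pairs in [("Spanish", SPANISH_LATAM), ("Italian", ITALIAN)]:
        reading, writing = ambiguity(pairs)
        print(name, "ambiguous when reading:", sorted(reading))
        print(name, "ambiguous when writing:", sorted(writing))
```

On this toy sample, Spanish shows no ambiguity when reading but two ambiguous sounds when writing, and Italian shows the reverse, which is the asymmetry described above.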

- Kef
Guest   Fri Sep 14, 2007 6:30 pm GMT
Is it possible to "measure" the complexity of a language by noting how difficult it is to get computers to extract the meaning? Also how about the relative difficulty of outputting words from an extracted meaning. (consider computer language translators, etc.)
furrykef   Fri Sep 14, 2007 11:23 pm GMT
<< Is it possible to "measure" the complexity of a language by noting how difficult it is to get computers to extract the meaning? >>

As far as I know, computers are completely unable to extract meaning from text. Automatic translators do little to figure out meaning... they just apply a hopelessly limited set of rules for transforming one text into another.

The problem isn't really in the extraction of meaning, but in handling meaning in the first place. In other words, the reason why computers suck at understanding language is because they don't understand the world. A brick is a simple object, but how can you tell a computer what it is? Here's an excerpt from an essay by Chris Crawford, a game designer:

* * *

The real problem, the killer, is this: if you want to put English into a computer, you must also squeeze in the rest of the universe.

Consider, for example, the simple word, "brick". Five letters long; you compaction freaks could probably squeeze it into a couple of bytes. The problem comes when you try to use the word. Do you realize just how much intellectual baggage comes along with that simple word? A brick is hard; if you intend to talk about bricks you had better understand what hardness means. It has flat sides; make sure that your program understands flatness, too. It is massive, so you'd better include all the laws of physics regarding the behavior of massive objects if you want to understand sentences such as, "Can you tie the paper to the brick and throw it across the chasm between us?" Oh, yes, it also has color, so be sure to include everything about color and while you're at it, be sure to include such things as texture and thermal conductivity, as well as magnetic susceptibility and ignition temperatures and combustibility...my, my, it does get out of hand quickly, doesn't it?

Language does not exist in isolation from reality. Language mirrors reality. We live in an immensely complex world, and short, simple words such as "brick" carry much of that complexity with them.

* * * END QUOTE

Now it may seem silly that you have to "squeeze in the rest of the universe" in this way, but this is how the human brain parses information. If I say something like "Time flies like an arrow", what is the meaning? We understand what the meaning is, but the computer could think that you mean "Time flies in the manner that you would time an arrow" (i.e., that "time" is being used as a verb and "flies" is a noun), or even "Time flies are pleased by an arrow" (i.e., that there is a species of fly called the "time fly", and they like arrows.) How can the computer tell which meaning is the correct one? It would have to use common sense. But it doesn't have any. Common sense is that "rest of the universe" element.
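
The syntactic half of that problem can be sketched in a few lines of Python. The grammar and lexicon below are a made-up toy, just big enough for this one sentence, and the names `GRAMMAR`, `LEXICON` and `count_parses` are assumptions for illustration. The sketch only counts how many parse trees the toy grammar allows; it does nothing to pick the sensible one, which is exactly the part that needs common sense.

```python
from functools import lru_cache

# Toy phrase-structure rules (assumed for illustration; no unit-rule cycles).
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],              # statement, or imperative
    "NP": [("N",), ("N", "N"), ("Det", "N")],  # "time", "time flies", "an arrow"
    "VP": [("V", "NP"), ("V", "PP"), ("V", "NP", "PP")],
    "PP": [("P", "NP")],
}

# Each word may belong to several parts of speech.
LEXICON = {
    "time": {"N", "V"},
    "flies": {"N", "V"},
    "like": {"V", "P"},
    "an": {"Det"},
    "arrow": {"N"},
}

def count_parses(sentence, start="S"):
    """Count the distinct parse trees of `sentence` under the toy grammar."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways `symbol` can cover words[i:j].
        total = 0
        if j - i == 1 and symbol in LEXICON.get(words[i], ()):
            total += 1
        for rhs in GRAMMAR.get(symbol, ()):
            total += derive_seq(rhs, i, j)
        return total

    @lru_cache(maxsize=None)
    def derive_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` can cover words[i:j].
        if len(rhs) == 1:
            return derive(rhs[0], i, j)
        head, rest = rhs[0], rhs[1:]
        total = 0
        # Leave at least one word for each remaining symbol.
        for k in range(i + 1, j - len(rest) + 1):
            left = derive(head, i, k)
            if left:
                total += left * derive_seq(rest, k, j)
        return total

    return derive(start, 0, len(words))

if __name__ == "__main__":
    print(count_parses("time flies like an arrow"))  # prints 3
```

The three trees correspond to the ordinary reading, the "time an arrow" imperative reading, and the "time flies" (a kind of fly) reading.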

But I think once computers can really handle meaning, then extraction of meaning from text will be simple, no matter the language's complexity.

- Kef
Sam II   Mon Sep 17, 2007 4:02 pm GMT
<<<Is it possible to "measure" the complexity of a language by noting how difficult it is to get computers to extract the meaning? Also how about the relative difficulty of outputting words from an extracted meaning. (consider computer language translators, etc.) >>>

If by "extract the meaning" you mean the process of translation, let's say from language x to a standard basic language, then I think this method should be a very good way to measure the inherent degree of logic of a language.
Guest   Tue Sep 18, 2007 11:39 am GMT
What could a computer-optimized, and consequently logical, "standard basic language" look like?
Guest   Tue Sep 18, 2007 6:37 pm GMT
?(language(a
,and(optimized(computer),logical(consequently))
,standard
,basic
)
could_look
(like(how))
)
furrykef   Tue Sep 18, 2007 6:55 pm GMT
I doubt that. You didn't even match the number of opening and closing parentheses correctly. The computer would then see total gibberish! ;)

Lojban is an artificial language that's easier for computers to work with than natural languages, since it's designed to remove syntactic ambiguity. Semantic ambiguity is still possible, though, but you're much more aware of it with Lojban. The basic principle of Lojban is that any given sentence can only be parsed exactly one way, but the information in that sentence can still be incomplete or even nonsensical. (Indeed, some information must be omitted, because infinite precision requires infinite words.) You still have the problem that the computer understands nothing about the universe, but at least now it doesn't have to in order to parse things correctly. It will still have to if it wants to understand any of the information contained in the sentence, though.

It does make for a poor intermediate language for translation purposes, though. Doing something like English -> Lojban -> Spanish will produce worse results than doing English -> Spanish, because it doesn't eliminate any of the problems in interpreting the English text, while it introduces problems of its own.

- Kef