11-13-2007, 11:11 PM | #1 (permalink) |
Insane
Location: New Zealand
|
An Internet Language
I had this thought while walking home and thinking about Mandarin/Cantonese. They are technically dialects of the same language, but are diverse enough to prevent verbal conversation between them. However, they are united by written Chinese. A written sentence will be pronounced totally differently in each dialect but will mean the same thing (actually, not quite, but it usually works well enough).
I thought it'd be great if you could do this to unite totally different languages, and with the globe-spanning power of the internet, why not? So I began thinking about a visual-only internet language. A cyberlanguage would probably have the following properties:
1) It never has to be pronounced, so we can use symbolic pictographs to make its words universally recognisable.
2) Because it never has to be written with a pen, its symbols (unlike Blissymbols) can be more detailed and less stylised, which makes recognition more natural.
3) Like Blissymbols, it would be much faster to learn than a natural language because there is no pronunciation to learn at the same time.
4) If the language has an isolating grammar (one that keeps symbols separate, unlike Earth Language, which merges them), it can be easily encoded as a font and presented in normal text fields.
5) The language can be designed around electronic input methods. We can take into account things like ease of use with qwerty keyboards and phone keypads, suitability for writing on-the-fly software translators, and generalising to be as natural-language-inclusive as possible.
6) A simple, logic-based grammar means ambiguity is minimised.
7) Mathematical operators (conveniently also globally consistent) like +, -, [], {} etc. can be used when long sentences begin to degrade readability. Compare "1 + 2 x 3 + 4" with "1 + [2 x 3] + 4". A simple example, yes: both are correct, but one is more readable.
8) Borrow domain and namespace ideas from computer science, in particular the internet-relevant bits of it like ., / and :. Although these stem from computer jargon, most internet users have an idea of what they represent.
9) If we think beyond normal text fields, we could actually construct sentences in two or more dimensions, like in sign languages.
10) We could incorporate colour or text size for greater flexibility and as a digital analogue of implicit communication like tone of voice and body language, i.e. sarcasm, intensity, etc.
All of these things suggest the possibility of a modern, forward-thinking, tech-savvy international language with a focus on ease of learning and ease of typing. Pretty much all general constructed languages have been conceived under a misty-eyed dream of a united world language. I would consider this more like a short-term solution: not appropriate for a united world language because it can't be spoken (although there's no reason it can't be signed), but also because the speech would be logically constructed and probably stilted and ugly. It would also probably be long-winded. In the long term, a naturalistic language would be the better option, but this is the internet age, man! Everything is wired these days and the only thing stopping absolutely anyone from interacting is language. It might even be possible to design the language to be easier to write natural language translators for, either creating a pivot point to go to other languages or just something which can be read by anyone. The possibilities are pretty exciting, and the thing that would set this system apart from other efforts is the total embrace of technology, which I think should be a championed point of this system.
Please discuss! I'd really value criticism from speakers of languages that aren't English. Being a native English speaker I will naturally have a bias towards English ideas and grammars, and English grammar is terrifyingly convoluted. Key points I'd like to focus on are:
- complete exploitation of present and future technologies (multi-touch surfaces anyone?)
- simplicity and unambiguity (hopefully these will stem naturally from logic foundations)
- symbols to embody concepts that should at least be easy to remember, if not totally intuitive
- malleability: can we construct new words on the fly, and can others work out what they mean? (think Latin-derived English words like 'arachnophobia' and 'psychology')
I really look forward to discussing this. Mods, please feel free to flick this to a more appropriate section (I couldn't pick one). Oh, and if anyone wants to help me set up a website or find a space for collaborative development of this, I'd really appreciate that too.
__________________
ignorance really is bliss. |
11-15-2007, 09:13 PM | #2 (permalink) |
Custom User Title
|
A very interesting idea. How to get from ground zero to actually using this is beyond me, but it is intriguing. But why restrict it to the internet? Yes, develop it primarily for the internet, but since it's a print-only language, why not use it as a rudimentary language to allow travelers in foreign countries to better understand signage and the like?
|
11-15-2007, 10:14 PM | #3 (permalink) |
Insane
Location: New Zealand
|
Yeah the technicalities are coming along OK but propagation will be the tricky bit.
As far as usage outside the internet goes, it's perfectly possible to print, project, decal etc., but not as easy to write by hand. Ease of handwriting is something Blissymbols are good at that I wanted to move away from, but printability is certainly retained.
11-15-2007, 11:01 PM | #5 (permalink) |
Young Crumudgeon
Location: Canada
|
Propagation is the only issue, but it's nigh insurmountable.
It's relatively easy to develop an artificial language. A trade language can be a form of pidgin with simplified grammar and syntax; while it won't have the subtlety of an organic language, it's a trade-off for ease of learning and use. However, unlike an organic language, an artificial language is very hard to sell because there's no group that identifies with it. German exists today and will continue to exist as long as there are Germans to speak it. Same for English, Polish, French, Japanese, etc. Your internet language, on the other hand, has no ethnic group associated with it, and very few people will be willing to learn it when the language(s) they speak already serve perfectly well for the vast majority of their needs. Even those whose native tongue doesn't allow them to communicate effectively (i.e., immigrants who have yet to learn the local language) are better served by learning a language that's already in use. In other words, nobody wants to learn a constructed language because nobody speaks it, and nobody speaks it because nobody wants to learn it. It seems you're aware that this idea has been presented many times throughout history, but every attempt has inevitably hit the same stumbling block. The technical side is fine, but propagation just doesn't occur.
EDIT - for further thoughts. The downside to using a pictographic language is that one must necessarily have an individual symbol for every concept. This is mitigated to a degree in written pictographic languages by modifying existing symbols to embody new concepts; however, when one attempts to design a language within the constraints of currently widely accepted technology, this problem becomes a major hurdle.
You're right that by using individual symbols you could create a language as a font; if we ignore the software side of it (which has its own technical issues I'm not going to get into here) we still have a problem: if we attempt to adapt a standard 104-key QWERTY keyboard, using alt, shift and ctrl as modifiers and space as a delimiter, we have (104-7)x4 or 388 available keystrokes to assign. If each keystroke represents a separate and distinct idea, one must find a way to limit the number of symbols to the available keystrokes without causing the meaning of each symbol to become so vague as to be useless in any practical application. Note as well that this is using every available key minus the system-specific ones, which isn't really practical. If we assume that keys such as Esc and the lock keys (Caps lock, Scroll lock and Num lock) are out of bounds, the number of available keystrokes is even further reduced. One could, of course, assign symbols to individual phonemes; however, this defeats the purpose of designing the language for the internet, as one then has to decide what phonemes in what combinations carry which meanings (in other words, one must design a verbal language). I have no idea how one would surmount this problem; to my knowledge, nobody to date has attempted to design a language that's used in such a fashion.
EDIT 2 - Lojban was developed phonetically and thus is not an internet-only language. Any language based on phonemes can be spoken. Whether or not anybody actually speaks Lojban, I have no idea; it's another example of how constructed languages inevitably fail, because nobody has any motivation to use them. The most successful constructed language to date that I'm aware of is Esperanto, and even that is really more of a curiosity than anything else; upper limits put the number of speakers at about two million, which means that aside from obscure tribal dialects it is quite probably one of the least-spoken languages in use today.
Personally, I think even two million is probably wildly optimistic, and expect that globally the number of people who are actually fluent is somewhere in the high tens or low hundreds of thousands.
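For anyone who wants to check the keystroke count from the first edit above, here is a rough sketch in Python. It rests on assumptions that aren't spelled out in the post: that the seven reserved keys are two each of Shift, Ctrl and Alt plus the space bar, and that the four layers are the plain key plus one modifier at a time. The function name is made up for illustration.

```python
def available_keystrokes(total_keys=104, reserved_keys=7, layers=4):
    """Count assignable keystrokes: every non-reserved key, once per layer.

    Assumed reading of (104-7)x4: 7 reserved keys (2 Shift, 2 Ctrl, 2 Alt,
    space) and 4 layers (plain, +Shift, +Ctrl, +Alt).
    """
    return (total_keys - reserved_keys) * layers


# The figure quoted in the post.
print(available_keystrokes())  # 388

# Also ruling out Esc, Caps lock, Scroll lock and Num lock (4 more keys).
print(available_keystrokes(reserved_keys=7 + 4))  # 372
```

Either way the budget is a few hundred symbols, which is the scale of the hurdle described above if every symbol needs its own keystroke.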
__________________
I wake up in the morning more tired than before I slept I get through cryin' and I'm sadder than before I wept I get through thinkin' now, and the thoughts have left my head I get through speakin' and I can't remember, not a word that I said - Ben Harper, Show Me A Little Shame Last edited by Martian; 11-15-2007 at 11:43 PM.. |
11-16-2007, 02:06 AM | #6 (permalink) | |
Insane
Location: New Zealand
|
A comprehensive post, Martian, and some good points, but I think you're missing some key ideas here:
Propagation is a big ask. No argument there. I really really don't have any actual expectation that this will succeed. I'm more interested in the development of the language for the sheer academic sport of it. So, very well put. However: Quote:
As far as input methods go, I imagine that the user would only have to know the grammar, and type words in his own native language. When I type "tree" and press the spacebar, it turns it into the symbol for tree. I don't know if you've ever typed in Japanese, but it would be something similar to that, where when you finish a word it pops up a list of kanji that match it and you choose one, or leave it alone. So if I were to type "Man :[cut tree ++]" the parser will:
- render 'man'
- leave ":[" cos it's grammatical
- ask me if I want 'to cut', 'cuts', 'cutting', 'cuttable' etc
- render 'tree'
- render '++' as 'many'. (alternatively I could type 'many' or, if the program is advanced enough, use 'trees' instead of 'tree'.)
Pretty straightforward, all I have to know is the right way to build the sentence. The parser ought to be specific to the language of the person using it. Kinda esoteric stuff, I know. It's all in good fun tho. Also yeah, Lojban is very refined and unambiguous, as I intend this project to be, but it's phonetic; vocabulary must be memorised and phonemes mastered. It is therefore quite a bit harder than an unpronounced language.
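The steps above could be sketched roughly like this in Python. Everything here is an assumption for illustration: the three-word lexicon, the *...* placeholders standing in for real symbols, and the token pattern are all made up, and the "ask which inflection" step is skipped entirely.

```python
import re

# Hypothetical mini-lexicon: English word -> symbol name (placeholder only).
LEXICON = {"man": "man", "cut": "cut", "tree": "tree"}
# Grammar tokens are passed through untouched; shorthand gets expanded.
GRAMMAR = {":[", "]"}
SHORTHAND = {"++": "many"}

# Match grammar tokens and shorthand first, then plain words.
TOKEN_RE = re.compile(r":\[|\]|\+\+|[A-Za-z']+")


def render(sentence):
    """Turn typed text into a string of *symbol* placeholders."""
    out = []
    for tok in TOKEN_RE.findall(sentence):
        if tok in GRAMMAR:
            out.append(tok)                      # leave grammar as typed
        elif tok in SHORTHAND:
            out.append(f"*{SHORTHAND[tok]}*")    # '++' becomes 'many'
        else:
            word = tok.lower()
            # Unknown words are left as typed rather than guessed at.
            out.append(f"*{LEXICON[word]}*" if word in LEXICON else tok)
    return " ".join(out)


print(render("Man :[cut tree ++]"))  # *man* :[ *cut* *tree* *many* ]
```

A real version would swap the dictionary for one lexicon per input language, which is what "specific to the language of the person using it" would amount to in practice.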
|
11-16-2007, 09:42 PM | #7 (permalink) |
Young Crumudgeon
Location: Canada
|
Okay, I'm a little more clear on that. My mistake was in thinking that you expected the user to do his or her own parsing. Of course, if it's a language designed strictly for use on the internet, there's no reason that's necessary, which as you note allows the user interface to be adapted or altered as necessary; so long as the parser is there to act as an intermediate layer, the type of interface is almost wholly irrelevant.
However, I do still see an issue, which is in deciding how to render certain concepts so that they're universally agreed upon. You claim English as your native language, but seem familiar with Eastern languages as well. Do you speak anything other than English? I myself am bilingual and speak French as well as English. As such, I'm all too aware that some concepts that are easy to verbalize and/or differentiate in one language can be much harder to distinguish in another. Even if you do only speak English, you only have to watch one poorly dubbed Japanese film to realize exactly how true this is. But perhaps a more concrete example. French actually does have a fair amount in common grammatically with English, although they're by no means identical. However, there are some words in English that can translate into multiple different words in French, depending on context, and the same goes the other way. The French verb faire, for example, can mean either to do or to make, contextually, which are two related concepts that are nonetheless distinct in English. Essentially, in English one does not say that one does a shirt (at least, it's not grammatically proper to do so), whereas in French it would be perfectly correct to say exactly that (je fais une chemise). This is a grade school level example, but it serves to illustrate, I think. With human interpretation of the language, this is hardly an insurmountable issue; one simply translates the word contextually to its correct analogue. Machine parsers, however, are somewhat poor at reading context, which in turn leads to mistranslations; at worst, this can render an output that says something markedly different from the input. While hardly an insurmountable problem, I'd imagine one may have difficulty creating a parser and interface that is robust enough to allow for this without it becoming unwieldy.
11-17-2007, 12:35 PM | #8 (permalink) |
Insane
Location: New Zealand
|
Deciding which concepts to render and which ones to throw away is proving to be properly confusing. English distinguishes a lot of concepts which are absent from other languages. Also there are a few elegant concepts which English bungled or implements implicitly, which makes them difficult for me to turn into nice symbols because my language makes it hard for me to grasp them properly. What I need are people with a decent understanding of a variety of languages to provide representation for me.
I am natively English speaking, and while I wouldn't call myself bilingual I have some experience with Japanese and a passing knowledge of Chinese. I'm delving deeper into Chinese because I think the grammar would be useful for this project, but studying Lojban and Interlingua is also proving quite insightful. Your 'faire' example I think is valid, but even a little tame. What we're dealing with is an idiom - in French the concepts 'create' and 'do' are not the same thing, they just have the same word. English is fucking terrifying for this sort of thing. Not only does it have an enviable collection of homonyms like that, but also a vast array of, when you think about it, really stupid idioms. Consider 'go to sleep', 'go fishing' and 'make some time'. None of these literally mean what they should, but are nonetheless the most correct way to say them. These are the things that cloud my brain when I'm trying to build my simplified grammar. As for interpretation, my first goal is human readability. Machine readability is a distant second on that list, but would certainly be very useful, for example as a step in an automatic universal translator, using this new language as a pivot. Something to consider further along tho, definitely.
11-17-2007, 01:31 PM | #10 (permalink) |
Lover - Protector - Teacher
Location: Seattle, WA
|
I think developing an input device would also be difficult; what option would be available for people who utilize their machines for productive work in their native language (writing documents in English in Microsoft Word), for example, who also want to communicate using this "international" Internet language?
What is available to them? Some sort of complicated keyboard switching? For those who can't touch type, the labeling of the keys is essential.
__________________
"I'm typing on a computer of science, which is being sent by science wires to a little science server where you can access it. I'm not typing on a computer of philosophy or religion or whatever other thing you think can be used to understand the universe because they're a poor substitute in the role of understanding the universe which exists independent from ourselves." - Willravel |
11-17-2007, 06:59 PM | #11 (permalink) | |
Insane
Location: New Zealand
|
Quote:
CASE 1: The word matches one symbol
When the user hits the spacebar, it turns the natural word into the symbol.
CASE 2: The word matches multiple symbols
In the case where the input word is used for more than one concept, eg: the English 'break' as in 'destroy' and 'break' as in 'recess', the spacebar activates a small drop-down box containing possible symbols, which can be selected with cursor keys and spacebar/return key. Reminiscent of typing Kanji in Japanese.
CASE 3: The word matches no known symbols
There are a few reasons the parser may not recognise a word, for example if it's too specific for the programmer to have anticipated. In any of these cases it'll say "No, I don't know this one" and the user will go "Oh," and be presented with the option to leave the word as it appears (example: for personal or place names) or to start the word over again, where they will use simpler words to build a compound symbol to embody the concept they want.
Example (bear with the ASCII; symbols are shown in *...*): "I knew a guy who played guitar."
User: I <space>
Parser: *self*
User: knew <space>
Parser: *know*+(past inflection)
User: man <space>
Parser: *man*
User: :[played <space>
Parser: play ... *play a game or sport*, *play about by oneself* (like kids do) or *play music/instrument*?
User: (select *play music*)
Parser: *play music*+(past inflection)
User: guitar] <space>
Parser: *guitar*]
If you can make sense of that, the user typed "I knew man :[played guitar]" in English, and got "*self* *know*(past) *man* :[*play music*(past) *guitar*]", which could be read by someone who knows not a single word of English. That's the idea anyway.
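The three cases boil down to a lookup that can return one symbol, several, or none. A minimal sketch in Python, assuming a made-up lexicon and made-up case labels ("render", "choose", "unknown"); the drop-down box and the compound-symbol builder are left out:

```python
# Hypothetical lexicon: each English word maps to every symbol it could mean.
LEXICON = {
    "guitar": ["guitar"],
    "break": ["destroy", "recess"],
    "play": ["play a game or sport", "play about by oneself", "play music"],
}


def lookup(word):
    """Return (case, payload) for a typed word, mirroring the three cases."""
    candidates = LEXICON.get(word.lower(), [])
    if len(candidates) == 1:
        return ("render", candidates[0])   # CASE 1: swap in the symbol
    if len(candidates) > 1:
        return ("choose", candidates)      # CASE 2: let the user pick
    return ("unknown", word)               # CASE 3: keep as typed or rebuild


for typed in ("guitar", "break", "Wellington"):
    print(typed, "->", lookup(typed))
```

Inflected forms like 'knew' or 'played' would need a stemming step before the lookup, which this sketch does not attempt.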
|
11-17-2007, 09:42 PM | #13 (permalink) | |
Young Crumudgeon
Location: Canada
|
Quote:
The parsing, however, is still going to be a problem I think. Your language has to be machine-readable, because it has to be machine-writable. Have you ever used Babel Fish? Ever done any round-trip translations? Machines just are not currently able to effectively render languages. If you could actually design a program that translates properly from one language to another with no errors, you'd be the first. The problem, as you've so rightly recognized, is largely idiomatic.
|
11-17-2007, 09:56 PM | #14 (permalink) |
Insane
Location: New Zealand
|
That is, in fact, a much more concise way of explaining it.
My parser will not have to read human language in the same way that Babel Fish and its sibling Google Translate have to, because the user enters natural words, but not natural sentences. The only thing the user should really have to learn is the grammar. The grammar is constructed with symbols, which I plan to make not dissimilar to mathematical notation and symbols commonly found in internet culture like : and /. The parser does not have to read "I knew a guy who played guitar", which is not machine-friendly, but simply "I knew man :[played guitar]", which is (or should be). The key points are minimising the learning curve and exploiting internet-age technology. The user will need some time to get used to the symbols, but it should be fairly intuitive, and will have to learn the grammar, which will be fairly simple. There is no vocab to memorise, no twisted conjugation to master and no pronunciation to stumble over. In theory, at least.
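To give a feel for why the bracketed form is the machine-friendly one, here is a small Python sketch that turns it into a nested list with nothing more than a stack. The function name and the choice to represent a ":[ ... ]" clause as a sub-list are assumptions for illustration, not a worked-out grammar.

```python
def parse(sentence):
    """Split on spaces and nest anything between ':[' and ']' into a sub-list."""
    # Pad the grammar tokens so a plain split() separates them from words.
    spaced = sentence.replace(":[", " :[ ").replace("]", " ] ")
    root = []
    stack = [root]
    for tok in spaced.split():
        if tok == ":[":
            clause = []
            stack[-1].append(clause)   # attach the clause where it opened
            stack.append(clause)
        elif tok == "]":
            if len(stack) == 1:
                raise ValueError("unmatched ']'")
            stack.pop()
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unclosed ':['")
    return root


print(parse("I knew man :[played guitar]"))
# ['I', 'knew', 'man', ['played', 'guitar']]
```

No guessing about context is needed to find the clause boundaries, which is the difference from feeding "I knew a guy who played guitar" to a translator.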
11-21-2007, 03:54 AM | #15 (permalink) |
Insane
Location: New Zealand
|
Holy crap, my fucking brain. I've spent the last few days knee-deep in Wikipedia, but I've made a lot of progress from where I started. I have to admit that I'm kinda disappointed at the mediocre response my thread received, because what I really need is lots of people to bounce ideas off. I'm not blaming anyone, if it's just not interesting then I totally understand, but how about this:
Does anyone know where I can find a community of people who would be interested enough to keep up an ongoing discussion so I can get this project critiqued and refined in the good old open-source style? Appreciated. EDIT: Big thanks to Martian tho! I appreciate your participation, it helped.
12-03-2007, 07:38 PM | #16 (permalink) | |
Eat your vegetables
Super Moderator
Location: Arabidopsis-ville
|
Quote:
Lak, this is an awesome idea. I wish I could contribute in some way. Unfortunately I know nil about creating a universal symbolic written method of communicating.
__________________
"Sometimes I have to remember that things are brought to me for a reason, either for my own lessons or for the benefit of others." Cynthetiq "violence is no more or less real than non-violence." roachboy Last edited by genuinegirly; 12-03-2007 at 07:41 PM.. |
|
Tags |
internet, language |
|
|