Who invented char sets?

muzzy:
I don't quite understand the question. Is it who invented the currently used charsets, or who invented the concept of mapping a number to a symbol?

As for the concept, it already exists in the alphabet itself. There is a rather intuitive mapping of 1, 2, 3 onto A, B, C and so on. Even before the Latin alphabet there were the Phoenicians, and so on. So the concept of a "character map" goes way back; its use in computers is just a natural adaptation of it.

Regarding the currently used charsets, the Wikipedia article on ASCII is a good starting place: http://en.wikipedia.org/wiki/Ascii

Typically, "charset" refers to a mapping of 8 bits to a character, and that's where the mess comes from. ASCII only defines characters for 7 bits, so half of the space is free for others to use. As a result, zillions of different mappings exist so that each nationality can have its own special characters. It's quite a mess.
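
Just to make the mess concrete, here's a five-line sketch (the two ISO charsets mentioned in the comment are picked arbitrarily as examples):

    /* Writes one raw byte (0xE4) to stdout. What glyph you actually see
     * depends entirely on the charset the terminal assumes: 'ä' if it
     * expects ISO-8859-1, Cyrillic 'ф' if it expects ISO-8859-5, and a
     * broken character if it expects UTF-8. The byte itself carries no
     * hint of which mapping was meant. */
    #include <stdio.h>

    int main(void)
    {
        putchar(0xE4);
        putchar('\n');
        return 0;
    }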

Thankfully, there is an ASCII-compatible way to use 8-bit units to represent a symbol space larger than 8 bits, and it's called UTF-8. Multiple bytes are used to encode characters above the ASCII range, and in theory no other character sets would be needed at all.
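
Roughly, the encoding works like this. Below is a minimal sketch of a UTF-8 encoder (illustration only; real code would also reject surrogates, overlong forms and values above U+10FFFF):

    /* Encode one code point as UTF-8; returns the number of bytes written. */
    #include <stdio.h>

    static int utf8_encode(unsigned long cp, unsigned char out[4])
    {
        if (cp < 0x80) {                    /* plain ASCII: 1 byte        */
            out[0] = cp;
            return 1;
        } else if (cp < 0x800) {            /* 110xxxxx 10xxxxxx          */
            out[0] = 0xC0 | (cp >> 6);
            out[1] = 0x80 | (cp & 0x3F);
            return 2;
        } else if (cp < 0x10000) {          /* 1110xxxx 10xxxxxx 10xxxxxx */
            out[0] = 0xE0 | (cp >> 12);
            out[1] = 0x80 | ((cp >> 6) & 0x3F);
            out[2] = 0x80 | (cp & 0x3F);
            return 3;
        } else {                            /* 4 bytes, up to U+10FFFF    */
            out[0] = 0xF0 | (cp >> 18);
            out[1] = 0x80 | ((cp >> 12) & 0x3F);
            out[2] = 0x80 | ((cp >> 6) & 0x3F);
            out[3] = 0x80 | (cp & 0x3F);
            return 4;
        }
    }

    int main(void)
    {
        unsigned char buf[4];
        int i, n = utf8_encode(0x20AC, buf);   /* U+20AC, the euro sign */
        for (i = 0; i < n; i++)
            printf("%02X ", buf[i]);           /* prints: E2 82 AC */
        printf("\n");
        return 0;
    }

Note that ASCII bytes encode as themselves, which is exactly why UTF-8 stays compatible with existing ASCII text.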

So, screw the whole charset crap and move to UTF-8. It's the wave of the future; Unicode will take over the world!

cymon:
And thankfully that's possible, because Unicode has support for all the different alphabets.

Calum:
In my opinion they should just have one huge international 16-bit convention with every conceivable letter in it, and then there'd be no problem ever for anybody.

So essentially the idea of character mapping is an integral part of literate numeracy, then?

Pathos:
Hmm, but what if a new mathematics is developed that adds a whole new range of characters?

I do agree that we have to move away from ASCII and make a multi-byte character set standard on all operating systems. Linux could do it; Windows probably couldn't (too many legacy apps).

muzzy:
16 bits isn't enough in the long run, IMO. I think 21 bits of the 32-bit Unicode code space are currently used, which is what UTF-8 can cover with 4 octets. The 21-bit space should be large enough, but UTF-8 has room for extension if necessary.
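
To show where the 21 bits come from, just count the payload bits (marked x) in the 4-octet UTF-8 form:

    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx   ->   3 + 6 + 6 + 6 = 21 bits

which comfortably covers U+10FFFF, the top of the current Unicode code space.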

Also, it's silly to say that Windows couldn't do it, because the Windows kernel has been fully Unicode for years already. Linux, meanwhile, has a lot of issues with multibyte character sequences, because a lot of things starting from strlen() tend to break. All manually implemented string processing breaks, and there's a lot of that in most C applications. A lot of the same issues apply to Windows too, although there's a compatibility layer that tries to make things somewhat transparent, and it works fine a lot of the time.
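
For instance, here's a quick sketch of the strlen() issue (the byte-skipping helper below is a throwaway illustration, not from any real library):

    #include <stdio.h>
    #include <string.h>

    /* Naive code point counter: count every byte that is NOT a UTF-8
     * continuation byte (continuation bytes look like 10xxxxxx). */
    static size_t utf8_strlen(const char *s)
    {
        size_t n = 0;
        for (; *s; s++)
            if (((unsigned char)*s & 0xC0) != 0x80)
                n++;
        return n;
    }

    int main(void)
    {
        const char *s = "p\xC3\xA4iv\xC3\xA4";   /* "päivä" encoded as UTF-8 */
        printf("strlen:      %lu\n", (unsigned long)strlen(s));      /* 7 bytes      */
        printf("code points: %lu\n", (unsigned long)utf8_strlen(s)); /* 5 characters */
        return 0;
    }

Any code that assumes "one byte = one character" (truncation, column widths, buffer math) makes the same mistake.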

Everything in Windows is Unicode (if we disregard Win9x), and the Win32 API provides separate versions of all API functions for Unicode and 8-bit strings. If you're a programmer, you've noticed CreateFileA vs CreateFileW and things like that; A and W stand for ANSI and wide char, respectively.
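
For example (just a sketch, and the path is made up), the widechar version takes UTF-16 strings written as L"..." literals:

    #include <windows.h>

    int main(void)
    {
        /* Wide-character API call: the file name is a WCHAR string. */
        HANDLE h = CreateFileW(L"C:\\temp\\example.txt",
                               GENERIC_READ,
                               FILE_SHARE_READ,
                               NULL,                  /* default security */
                               OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL,
                               NULL);
        if (h != INVALID_HANDLE_VALUE)
            CloseHandle(h);
        return 0;
    }

If you call plain CreateFile, the headers map it to either the A or the W version depending on whether UNICODE is defined.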

Basically, people just need to learn to write new applications using Unicode strings. This would be significantly easier in a high-level language that has some sort of string abstraction, so it's time to abandon C as the primary application programming language. It was about time, too :)
