Unicode is a character encoding standard that is widely used across many areas of IT. It is an international standard whose development began in 1987 as an alternative to ASCII and other character sets. As of March 2020, the current version of the Unicode character set is 13.0, which contains 143,859 characters from different languages and alphabets. Unicode currently covers 154 modern and historic scripts, along with symbol sets and emoji.
Unicode Versions and History
Unicode is a very popular and widely used encoding standard. The latest version is 13.0, which was released in March 2020.
- Although the standard was first drafted in 1988, version 1.0 was not released until October 1991. It contains 7,129 characters and supports alphabets such as Arabic, Bengali, Greek, Lao, Latin, and Tibetan.
- Unicode 2.0 was released in July 1996. It contains 38,885 characters and includes updates to existing alphabets such as Hangul and Tibetan.
- Unicode 3.0 was released in September 1999 and contains 49,194 characters. This version added alphabets such as Cherokee, Ethiopic, Khmer, and Mongolian.
- Unicode 4.0 was released in April 2003 and contains 96,382 characters.
- Unicode 5.0 was released in July 2006.
- Unicode 6.0 was released in October 2010.
- Unicode 7.0 was released in June 2014.
- Unicode 8.0 was released in June 2015.
- Unicode 9.0 was released in June 2016.
- Unicode 10.0 was released in June 2017.
- Unicode 11.0 was released in June 2018.
- Unicode 12.0 was released in March 2019.
- Unicode 13.0 was released in March 2020.
Unicode Encoding Standard
The Unicode standard was created to unify different character sets into a single, standardized, and unambiguous character set. It is implemented in many technologies, such as operating systems, XML, Java, PHP, Python, and .NET. Unicode characters can be stored and transmitted using different character encodings, such as UTF-8, UTF-16, and UTF-32.
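To illustrate the difference between a Unicode character and its encodings, here is a minimal Python sketch. The helper name `encode_all` is made up for this example; it encodes one character (the euro sign, U+20AC) in three encodings and checks that each byte sequence decodes back to the same character.

```python
def encode_all(ch: str) -> dict:
    """Encode one character in three Unicode encodings (big-endian, no BOM)."""
    return {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

euro = "\u20ac"  # EURO SIGN, code point U+20AC
for name, raw in encode_all(euro).items():
    # The bytes differ per encoding, but they all decode to the same character.
    print(f"{name:10} {raw.hex(' ')}")
    assert raw.decode(name) == euro
```

The code point stays the same in every case; only the byte representation changes.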
Unicode Standard Encoding Formats
The Unicode Standard defines multiple encoding formats that use different numbers of bytes per character.
UTF-8 is the most compact encoding format for ASCII-heavy text and uses from 1 to 4 bytes per character. UTF-8 is the most popular Unicode encoding and is used by around 94% of websites. The first 128 characters are the ASCII characters, and each of them is encoded as a single byte with the same value as in ASCII.
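The variable length of UTF-8 can be checked with a short Python sketch. The function name `utf8_length` and the sample characters are chosen only for illustration.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))

# One sample character for each possible UTF-8 length (1 to 4 bytes).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} byte(s) in UTF-8")

# ASCII text produces identical bytes in ASCII and UTF-8.
assert "Hello".encode("utf-8") == "Hello".encode("ascii")
```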
UTF-16 and UTF-32 are the other encoding formats. UTF-16 uses 2 or 4 bytes per character, and UTF-32 always uses 4 bytes per character. All three formats can represent every Unicode character.
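As a rough illustration of the wider encoding formats, the following Python sketch measures how many bytes a character needs in UTF-16 and UTF-32. The helper `encoded_size` is a hypothetical name, and the little-endian codec variants are used so that no byte order mark is counted.

```python
def encoded_size(ch: str, encoding: str) -> int:
    """Return the byte length of one character in the given encoding."""
    return len(ch.encode(encoding))

for ch in ["A", "\u20ac", "\U0001F600"]:
    utf16 = encoded_size(ch, "utf-16-le")  # 2 bytes, or 4 for a surrogate pair
    utf32 = encoded_size(ch, "utf-32-le")  # always 4 bytes
    print(f"U+{ord(ch):04X}: UTF-16 = {utf16} bytes, UTF-32 = {utf32} bytes")
```

Characters outside the Basic Multilingual Plane, such as most emoji, need two 16-bit code units in UTF-16, while UTF-32 stays at a fixed width.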
Unicode Encoding/Character Set Usage and Adoption
The Unicode standard is very popular and has been widely adopted across different technologies.
In order to use Unicode, the operating system must support it. Operating systems in the Windows NT family, such as Windows 2000, Windows XP, Windows Vista, Windows 7, Windows 8, and Windows 10, support UTF-8 and UTF-16. Modern Linux distributions and macOS also support UTF-8 and UTF-16.
Programming languages and platforms such as Java, Python, PHP, and .NET support both UTF-8 and UTF-16 for reading and writing files.
The web standards consortium W3C has recommended Unicode as the document character set since HTML 4.0. Web browsers such as Google Chrome, Mozilla Firefox, Microsoft Edge, Opera, and Safari have supported UTF-8 for many years.
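As one possible example in Python, the sketch below writes a string to a temporary file and reads it back using both UTF-8 and UTF-16. The function name `round_trip` and the file name `sample.txt` are placeholders for this example only.

```python
import tempfile
from pathlib import Path

def round_trip(text: str, encoding: str) -> str:
    """Write text to a temporary file in the given encoding and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"
        path.write_text(text, encoding=encoding)
        return path.read_text(encoding=encoding)

text = "Unicode: Ελληνικά, العربية, বাংলা"
for enc in ("utf-8", "utf-16"):
    # The text survives the round trip as long as the same encoding is used.
    assert round_trip(text, enc) == text
    print(f"{enc}: OK")
```

Passing the encoding explicitly avoids relying on the platform default, which can differ between operating systems.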
Unicode Support For Emoji
The Unicode standard supports emoji, which are widely used in today's text messaging and chat applications. Unicode emoji are also used in comments and regular text on websites and forums. Below you can see different emoji and their related Unicode values.
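The Unicode value of an emoji can also be looked up programmatically. The following is a minimal Python sketch using the standard `unicodedata` module; the helper `emoji_info` and the selection of emoji are only illustrative.

```python
import unicodedata

def emoji_info(emoji: str) -> tuple:
    """Return the code point (as U+XXXX) and Unicode name of a single emoji."""
    return (f"U+{ord(emoji):04X}", unicodedata.name(emoji, "UNKNOWN"))

# Emoji written as escapes so the source file stays plain ASCII.
for emoji in ["\U0001F600", "\U0001F44D", "\U0001F680"]:
    code_point, name = emoji_info(emoji)
    print(emoji, code_point, name)
```

This sketch handles single code point emoji only; emoji built from several code points (for example, with skin tone modifiers) would need to be processed one code point at a time.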