Unicode Frequently Asked Questions

Q: How was the encoding of the Tamil script in the Unicode Standard established?

The encoding of the Tamil script in the Unicode Standard was originally based on ISCII (1988). ISCII was the culmination of extensive work by many experts, including linguists, programmers, typographers, and standards experts, although it was constrained by the 8-bit character encodings prevalent in India at that time. Like the rest of Unicode, the encoding of Tamil is identical to that used in the International Standard ISO/IEC 10646.

Q: Are there shipping implementations of Unicode Tamil?

Unicode support for Tamil is implemented in all major desktop and mobile operating systems and browsers.

Q: Are there special issues with sorting Tamil text in Unicode?

Sorting order can almost never be handled by character placement in a code chart. That is true even for English; after all, in the Unicode code charts "Z" comes before "a". Furthermore, languages using the same script often sort differently, so sorting order is always handled separately from the encoding.

The Unicode Standard has an extensive associated standard, UTS #10, Unicode Collation Algorithm, devoted entirely to specifying mechanisms for collation. The Unicode Common Locale Data Repository (CLDR) then provides the language-specific sorting data for individual languages.

Issues or improvements associated with the sorting or matching of characters in Tamil should be addressed in the context of the Unicode Collation Algorithm and the CLDR sort order specifications.
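As a minimal illustration of why code-chart order is not sorting order, the sketch below (Python, standard library only) compares raw code-point order with the traditional order of the 18 native Tamil consonants. The TRADITIONAL_CONSONANTS table and the traditional_sort helper are hypothetical and handle only isolated consonant letters; real applications should rely on UCA/CLDR-based collation, for example through ICU, rather than a hand-written table.

```python
import unicodedata

# Hypothetical, deliberately incomplete ordering table: the 18 native Tamil
# consonants in traditional order (KA NGA CA NYA TTA NNA TA NA PA MA YA RA
# LA VA LLLA LLA RRA NNNA). Real collation data comes from UCA + CLDR.
TRADITIONAL_CONSONANTS = (
    "\u0B95\u0B99\u0B9A\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8\u0BAA"
    "\u0BAE\u0BAF\u0BB0\u0BB2\u0BB5\u0BB4\u0BB3\u0BB1\u0BA9"
)

def code_point_sort(letters):
    """Sort by raw code point, i.e. by position in the code chart."""
    return sorted(letters)

def traditional_sort(letters):
    """Sort isolated consonant letters by their traditional position."""
    return sorted(letters, key=TRADITIONAL_CONSONANTS.index)

letters = ["\u0BA9", "\u0BAA", "\u0BB1", "\u0BB2"]  # NNNA, PA, RRA, LA
for label, result in [("code point", code_point_sort(letters)),
                      ("traditional", traditional_sort(letters))]:
    print(label, [unicodedata.name(c) for c in result])
```

Code-point order puts NNNA (ன) first because U+0BA9 sits between NA and PA in the chart, whereas the traditional order places it last.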

Q: Are details of the encoding important for natural language processing?

When considering the requirements of natural language processing, it is important to recognize the main purpose of the Unicode Standard: it is a plain text encoding, aimed at the problem of simple representation of textual content in traditional orthographies. There is of course nothing in the Unicode Standard that prevents researchers from developing higher-level protocols, such as markup schemes, to represent other aspects of textual content, including linguistic structure not directly evident from the ordinary writing system. Nor does the Unicode Standard preclude the development of alternative textual encodings for special-purpose processing such as automated NLP.

However, it would run counter to Unicode encoding principles to attempt to incorporate such higher-level protocols or alternative, special-purpose encodings directly into the Unicode Standard itself. A syllable-based re-encoding of Tamil, if aimed at NLP issues, is therefore essentially out of scope for the Unicode Standard. This is simply a matter of the appropriate level for representing data in a plain text character encoding, over and above the crucial issue of maintaining the stability of the standard for existing implementations.
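A higher-level process can derive syllable-like units from ordinary Unicode text without any re-encoding. As a rough sketch, the hypothetical naive_syllables function below attaches each combining mark (general category M*) to the preceding character. This is a simplification for illustration, not the extended grapheme cluster algorithm of UAX #29, and it does not treat conjuncts such as க்ஷ as single units.

```python
import unicodedata

def naive_syllables(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper for illustration only; not UAX #29 segmentation.
    """
    units = []
    for ch in text:
        # Tamil vowel signs and the virama (pulli) have category Mn or Mc.
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units

word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # the word "Tamil" in Tamil script
print(naive_syllables(word))
```

For this word the function returns three units: TA alone, MA with VOWEL SIGN I, and LLLA with VIRAMA.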

Q: Is Unicode encoding efficient for Tamil?

Efficiency of processing is not a simple matter of running a few test cases aimed at one or two processes. The Unicode encoding of any script, including Tamil, is meant to have a good overall efficiency in many kinds of text processing. Furthermore, efficiency considerations have to be balanced against many other considerations, including algorithmic complexity, legacy interoperability, and parallelism in support of multiple scripts and fonts.

In particular, comparisons of raw text size in bytes, under various encoding assumptions, are really only relevant to a limited number of operations involving pure plain text. Most real-world applications with text involve embedding of text in larger contexts of markup, graphics, and other data, and in such cases, efficiency concerns are dominated by the size of the other content, rather than that of the plain text content per se. There is thus no cause for advocating new encodings of scripts already encoded in the Unicode Standard based merely on comparisons of encoded text size. This is particularly true given that the resulting costs and impact of destabilizing the standard would far outweigh any marginal gains in processing speed in some limited contexts.
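For reference, the sketch below measures the same short Tamil string under three Unicode encoding forms. Characters in the main Tamil block (U+0B80 to U+0BFF) take three bytes each in UTF-8, two in UTF-16, and four in UTF-32; as argued above, such raw counts say little about overall application efficiency.

```python
def encoded_sizes(text):
    """Byte length of text in each Unicode encoding form (no byte order mark)."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),
        "UTF-32": len(text.encode("utf-32-le")),
    }

word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # the word "Tamil", five code points
print(len(word), encoded_sizes(word))
```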

Moreover, compression schemes and other secondary techniques can be used to achieve greater efficiency in text processing speed, storage, and data retrieval for specific applications in specific languages. General-purpose compression such as ZIP works well.
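As a small demonstration of the general-purpose approach, the sketch below uses Python's zlib module, which implements DEFLATE, the compression method most commonly used in ZIP archives. The repeated sample text is artificial, so the ratio it prints is not representative of real Tamil corpora.

```python
import zlib

def compress_text(text, level=9):
    """UTF-8 encode and DEFLATE-compress text; return (raw, compressed) bytes."""
    raw = text.encode("utf-8")
    return raw, zlib.compress(raw, level)

def decompress_text(data):
    """Inverse of compress_text: inflate, then decode from UTF-8."""
    return zlib.decompress(data).decode("utf-8")

# Artificial sample: the word "Tamil" in Tamil script plus a space, 200 times.
sample = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD " * 200
raw, packed = compress_text(sample)
print(len(raw), "bytes raw,", len(packed), "bytes compressed")
```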

Q: What about using private-use characters for encoding Tamil pure consonants and syllables?

This is a fine solution for internal processing, if an alternate representation is useful for the particular process. For example, a text-to-speech program might use a private-use encoding for English in which letters are distinguished according to pronunciation, with the 'o' in 'love', 'rove', and 'move' each getting a different private-use character.

However, such implementations have limited usefulness. Private-use characters may overlap between different implementations, so general-purpose programs cannot assume any particular interpretation of such characters. In general interchange, such as in search engines, private-use characters are typically treated as unknown characters or ignored. As a result, private-use characters are inappropriate for open interchange.
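The pattern described above (private-use code points internally, standard sequences at the interchange boundary) might look like the following sketch. The mapping of ஹோ to U+E000 is an arbitrary, hypothetical choice; the only facts relied on are that U+E000 to U+F8FF is the BMP Private Use Area and that its code points have general category Co and no character names.

```python
import unicodedata

# Hypothetical internal-only table: standard NFC sequence -> private-use code point.
TO_INTERNAL = {"\u0BB9\u0BCB": "\uE000"}  # HA + VOWEL SIGN OO -> U+E000
FROM_INTERNAL = {pua: seq for seq, pua in TO_INTERNAL.items()}

def to_internal(text):
    """Normalize to NFC, then swap known sequences for private-use characters."""
    text = unicodedata.normalize("NFC", text)
    for seq, pua in TO_INTERNAL.items():
        text = text.replace(seq, pua)
    return text

def to_interchange(text):
    """Restore standard Unicode sequences before text leaves the process."""
    for pua, seq in FROM_INTERNAL.items():
        text = text.replace(pua, seq)
    return text

decomposed = "\u0BB9\u0BC7\u0BBE"  # HA + VOWEL SIGN EE + VOWEL SIGN AA
internal = to_internal(decomposed)
print([hex(ord(c)) for c in internal], unicodedata.category(internal))
print(unicodedata.name(internal, "<no name>"))
```

Only to_interchange output should ever be written to files, URLs, or network messages.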

Q: How are the Tamil pure consonants and syllables represented in Unicode?

Section 12.6, Tamil in The Unicode Standard contains a full table for Tamil, documenting all of the pure consonants and all of the syllables, showing exactly how they should appear and the precise sequence of Unicode code points used for each. The table is arranged in traditional Tamil syllable order, which is also important for understanding how Tamil should be sorted. We recommend use of that table as a starting point for discussions about Tamil in Unicode, because it makes it easier to understand how all Tamil consonants and syllables are represented in Unicode. Starting from the Unicode code chart itself can lead to misunderstandings.
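To make the idea of such a table concrete, the sketch below generates one row (the pure consonant, then the consonant with each of the twelve vowels in traditional order) and prints the code points of each form. The VOWEL_SIGNS list and the syllable_row helper are illustrative assumptions covering only the basic consonant-plus-vowel pattern; the table in Section 12.6 remains the authoritative reference.

```python
VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA (pulli): marks a pure consonant
# Vowel signs in traditional order a, aa, i, ii, u, uu, e, ee, ai, o, oo, au.
# The inherent vowel a has no sign, hence the empty string.
VOWEL_SIGNS = ["", "\u0BBE", "\u0BBF", "\u0BC0", "\u0BC1", "\u0BC2",
               "\u0BC6", "\u0BC7", "\u0BC8", "\u0BCA", "\u0BCB", "\u0BCC"]

def syllable_row(consonant):
    """Pure consonant, then consonant + each vowel sign (precomposed NFC signs)."""
    return [consonant + VIRAMA] + [consonant + sign for sign in VOWEL_SIGNS]

def code_points(s):
    """Format a string as comma-separated U+XXXX code points."""
    return ", ".join(f"U+{ord(c):04X}" for c in s)

for form in syllable_row("\u0B95"):  # TAMIL LETTER KA
    print(form, f"<{code_points(form)}>")
```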

Q: Do the Tamil pure consonants and syllables also have Unicode names?

Named sequences have been added for all Tamil pure consonants and syllables.
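In Python, for example, unicodedata.lookup accepts named sequences as well as character names (since Python 3.3), so these names can be resolved directly. The two names below are taken from the Unicode named-sequence data; which names are available depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

def resolve(name):
    """Return the code points for a character name or named sequence."""
    seq = unicodedata.lookup(name)
    return [f"U+{ord(c):04X}" for c in seq]

for name in ["TAMIL SYLLABLE MOO", "TAMIL CONSONANT KSS"]:
    print(name, resolve(name))
```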

Q: If there are missing characters for Tamil, will those be encoded?

The Unicode Consortium has been very interested in the feedback it has received regarding missing characters, usage corrections, and improvements to the Tamil script block description. This feedback has markedly improved the coverage for Tamil in successive versions. For example, many historic fractions and other signs for Tamil have been added in the Tamil Supplement block. Continued input will lead to further improvements in the future.

The Unicode Standard can also add additional specifications of the behavior of sequences of Tamil characters. Such specifications can encompass many of the perceived advantages of a separate new encoding for Tamil, without requiring a disruptive change in the encoding.

Both the state of Tamil Nadu and the Government of India have participated as members of the Unicode Consortium, and the Consortium looks especially to them to help further improve the ability of Unicode to address the worldwide Tamil community.

Q: How is the Tamil syllable ஹோ (hō) encoded in Unicode?

This syllable can be encoded in two ways:

A. <U+0BB9, U+0BCB> (TAMIL LETTER HA, TAMIL VOWEL SIGN OO)

B. <U+0BB9, U+0BC7, U+0BBE> (TAMIL LETTER HA, TAMIL VOWEL SIGN EE, TAMIL VOWEL SIGN AA)

The consonant always comes first in either case. The vowel sign U+0BCB in sequence A canonically decomposes into the two vowel signs of sequence B, U+0BC7 followed by U+0BBE, but either way the order in memory is the same: consonant first, then vowel sign(s), even though part of the vowel sign is displayed to the left of the consonant. In both cases the appearance is ஹோ. This is clearly documented in Section 12.6, Tamil in The Unicode Standard.

Note that sequence A is in Normalization Form C (NFC), which is the preferred form for most applications, including HTML, XML, and content on the web. For more information, see UAX #15, Unicode Normalization Forms.
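The relationship between the two sequences can be checked with any Unicode normalization library; the sketch below uses Python's standard unicodedata module. NFC maps sequence B to sequence A, NFD maps A to B, and the two compare equal once both are normalized to the same form. The same_text helper is an illustrative name, not a library function.

```python
import unicodedata

seq_a = "\u0BB9\u0BCB"        # HA + VOWEL SIGN OO (NFC)
seq_b = "\u0BB9\u0BC7\u0BBE"  # HA + VOWEL SIGN EE + VOWEL SIGN AA (NFD)

def same_text(x, y):
    """Compare strings by canonical equivalence (via NFC), not raw code points."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)

print(seq_a == seq_b)           # False: different code point sequences
print(same_text(seq_a, seq_b))  # True: canonically equivalent
```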

Q: Where can I find out more about Tamil Digit Zero?

"Tamil Digit Zero" is a modern innovation. A character for it, U+0BE6 TAMIL DIGIT ZERO, was added in Unicode 4.1 (2005) for implementations that need to support it. For more information on Tamil digits, please see Unicode Technical Note #21, "Tamil Numbers".

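Because U+0BE6 and the Tamil digits one through nine (U+0BE7 to U+0BEF) have general category Nd (decimal digit), general-purpose libraries can often treat them as ordinary positional digits; Python's int() accepts them, for instance. Traditional number signs such as U+0BF0 TAMIL NUMBER TEN have category No instead and are not positional digits. The sketch below only inspects character properties; it does not implement the traditional Tamil numbering system described in UTN #21.

```python
import unicodedata

def describe(ch):
    """Name, general category, and numeric value of one character."""
    return unicodedata.name(ch), unicodedata.category(ch), unicodedata.numeric(ch)

for ch in ["\u0BE6", "\u0BE7", "\u0BF0"]:  # digit zero, digit one, number ten
    print(f"U+{ord(ch):04X}", describe(ch))

# Positional decimal notation: TAMIL DIGIT ONE followed by TAMIL DIGIT ZERO.
print(int("\u0BE7\u0BE6"))
```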
Q: What is the correct encoding for Tamil ligature shri?

Prior to Unicode 4.1, the best mapping to represent the ligature shri was to the sequence <U+0BB8, U+0BCD, U+0BB0, U+0BC0>. Unicode 4.1 added the character U+0BB6 TAMIL LETTER SHA and as a consequence, the best mapping became <U+0BB6, U+0BCD, U+0BB0, U+0BC0>. Both representations are widespread in existing text. Therefore, treating both representations as equivalent sequences is recommended, particularly in identifiers, such as domain names.
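The two shri sequences are not canonically equivalent (U+0BB8 TAMIL LETTER SA and U+0BB6 TAMIL LETTER SHA have no decomposition relationship), so Unicode normalization alone will not make them match. One way to treat them as equivalent, sketched below, is to fold the older sequence to the newer one before comparing identifiers. fold_shri and same_identifier are hypothetical helpers, not part of any standard API.

```python
import unicodedata

OLD_SHRI = "\u0BB8\u0BCD\u0BB0\u0BC0"  # SA + VIRAMA + RA + VOWEL SIGN II
NEW_SHRI = "\u0BB6\u0BCD\u0BB0\u0BC0"  # SHA + VIRAMA + RA + VOWEL SIGN II

def fold_shri(text):
    """Normalize to NFC, then map the pre-4.1 shri sequence to the newer one."""
    return unicodedata.normalize("NFC", text).replace(OLD_SHRI, NEW_SHRI)

def same_identifier(a, b):
    """Compare identifiers after NFC plus shri folding."""
    return fold_shri(a) == fold_shri(b)

print(unicodedata.normalize("NFC", OLD_SHRI) == NEW_SHRI)  # False
print(same_identifier(OLD_SHRI, NEW_SHRI))                 # True
```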

Q: Where can I find more about other scripts of India and South Asia?

See Indic Scripts.
