[TF-AIDN] [Ext] Arabic Script LGR 2 report by TF-AIDN

Ali M. AlHoshaiyan ahoshaiyan at citc.gov.sa
Tue Jun 18 11:22:13 CEST 2019


Dear Dr. Sarmad,

These are just examples of the changes made to the team's original work. Please refer to the LGR sent by the team, as it defines the rules needed to generate variants for the Arabic language.

The blocking of variants such as the Alef variants is acceptable, and these variants are needed.

Also, for example, the variant U+0648 of the code point U+0624 is incorrect, as U+0624 (ؤ) is in fact a variant of U+0621 (ء) in the Arabic language.

So, I strongly advise referring back to the TF-AIDN work on the second-level LGR, as many thoughtful hours have been put into making it:
http://lists.meswg.org/pipermail/tf-aidn/2018-August/002531.html

Sincerely,

From: Sarmad Hussain <sarmad.hussain at icann.org>
Date: Thursday, June 13, 2019 at 2:33 PM
To: "Ali M. AlHoshaiyan" <ahoshaiyan at citc.gov.sa>, Ahmed Bakht Baloch <ahmedbakhat at gmail.com>, TF-AIDN ICANN <tf-aidn at meswg.org>
Subject: Re: [Ext] Arabic Script LGR 2 report by TF-AIDN

1) Let us start with U+0622 (ARABIC LETTER ALEF WITH MADDA ABOVE), it used to have the following variants:

U+0623 - allocatable
U+0625 - allocatable
U+0627 - activated

Now it has the following:

U+0623 - blocked
U+0625 - blocked
U+0627 - allocatable

This change greatly affects variant generation. For example, if we take the label آنترنت and generate its variants according to the old rules, we get إنترنت as a variant, which is a perfectly valid one. But according to the new rules, إنترنت will be blocked.
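As a rough illustration of the point above, the following Python sketch encodes only the two disposition tables quoted for U+0622 and looks up what happens to the variant إنترنت. The names OLD_RULES, NEW_RULES and leading_alef_variants are hypothetical, and the sketch only substitutes a leading U+0622; it is not the actual LGR tooling.

```python
# Dispositions of the variants of U+0622 as listed above (old vs. new).
OLD_RULES = {"\u0623": "allocatable", "\u0625": "allocatable", "\u0627": "activated"}
NEW_RULES = {"\u0623": "blocked", "\u0625": "blocked", "\u0627": "allocatable"}


def leading_alef_variants(label, rules):
    """Map each variant label, obtained by replacing a leading U+0622,
    to the disposition of the substitution that produced it."""
    if not label.startswith("\u0622"):
        return {}
    return {target + label[1:]: disposition for target, disposition in rules.items()}


label = "\u0622\u0646\u062a\u0631\u0646\u062a"    # آنترنت
variant = "\u0625\u0646\u062a\u0631\u0646\u062a"  # إنترنت

print(leading_alef_variants(label, OLD_RULES)[variant])
print(leading_alef_variants(label, NEW_RULES)[variant])
```

With these tables, the same variant label is allocatable under the old rules and blocked under the new ones.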

>> Is there a significant requirement for آ being used as a variant of أ in the Arabic language?
>> Is there a significant requirement for آ being used as a variant of إ in the Arabic language?
 (Making all Alefs allocatable variants will create too many variant labels; a word with two Alefs will have 16 variant labels, and one with three Alefs will have 64 variant labels; creating blocked variants helps contain such cases)
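The counts quoted above follow from each Alef position independently taking any of the four forms U+0622, U+0623, U+0625 and U+0627, which gives 4^n labels (the original included) for a label with n Alefs. A minimal sketch, assuming all four forms are mutually allocatable; the helper name all_alef_labels and the sample strings are hypothetical:

```python
from itertools import product

# The four Alef forms discussed in this thread.
ALEFS = ["\u0622", "\u0623", "\u0625", "\u0627"]


def all_alef_labels(label):
    """Enumerate every label obtained by independently swapping each Alef
    for any of the four Alef forms (the original label is included)."""
    choices = [ALEFS if ch in ALEFS else [ch] for ch in label]
    return ["".join(combo) for combo in product(*choices)]


two_alefs = "\u0627\u0628\u0627"    # sample string with two Alefs
three_alefs = "\u0627\u0627\u0627"  # sample string with three Alefs

print(len(all_alef_labels(two_alefs)))
print(len(all_alef_labels(three_alefs)))
```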

2) Code point U+0622 has a variant U+0627 that was changed from activated to allocatable, and it must be activated-extended for international reachability

>> 0622 --> 0627 change from “allocatable” to “activated”
(activated-extended is not suggested as this may be needed within Arabic language community, not just for international reachability)

3) Code points U+0623 and U+0625 have the same issue: U+0627 has been changed to allocatable while the rest have been blocked.

>> Are there real cases of إ being used as a variant of أ in the Arabic language?
(Making all Alefs allocatable variants will create too many variant labels; a word with two Alefs will have 16 variant labels, and one with three Alefs will have 64 variant labels; creating blocked variants helps contain such cases)

4) Code point U+0623 has a variant U+0627 that was changed from activated to allocatable, and it must be activated-extended for international reachability

>> 0623 --> 0627 change from “allocatable” to “activated”
(activated-extended is not suggested as this may be needed within Arabic language community, not just for international reachability)

5) Code point U+0625 has a variant U+0627 that was changed from activated to allocatable, and it must be activated-extended for international reachability

>> 0625 --> 0627 change from “allocatable” to “activated”
(activated-extended is not suggested as this may be needed within Arabic language community, not just for international reachability)

6) Code point U+0624 has a newly added variant, U+0648, which is incorrect; it should not have any variants.

If both of these characters are needed in Urdu, they would be variants in Urdu, which would make them variants in the Arabic language as well.  Please reconfirm whether both of these code points should be included for Urdu. Currently 0624 is not included.

7) Code point U+0626 has the following variants:

U+0649
U+064A
U+06CC

These variants were not requested by TF-AIDN. Also, U+06D3 must be added as a blocked variant, and the code point must not be limited to specific positions, i.e. initial-or-medial-position.

>> remove 0626 from the variant set for Yehs
Why do we need to include 06D2 and 06D3? If being considered for international reachability, non-Arabic language users may not confuse 06D2 and 06D3 with other Yeh variant set members.  Is there a language which would confuse such cases?

8) Code point U+0627 now has all its variants blocked, which has the same effect mentioned before; for example, انترنت used to have إنترنت as an allocatable variant, but it is now blocked.

>> changed to activated in one direction, see above
>> In the other direction, this will make U+0622, U+0623, U+0625 and U+0627 all variants of each other through transitivity, causing too many variant labels; also see 1) above.
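The transitivity argument above can be sketched with a small union-find: once U+0622, U+0623 and U+0625 are each made a two-way variant of U+0627, all four code points fall into a single variant set. This is only an illustration; the function name variant_sets is hypothetical, and the sketch assumes variant relations are treated as symmetric and transitive.

```python
def variant_sets(pairs):
    """Group code points into variant sets, treating each pair as a
    symmetric relation and closing it under transitivity (union-find)."""
    parent = {}

    def find(cp):
        parent.setdefault(cp, cp)
        while parent[cp] != cp:
            parent[cp] = parent[parent[cp]]  # path halving
            cp = parent[cp]
        return cp

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for cp in list(parent):
        groups.setdefault(find(cp), set()).add(cp)
    return sorted(sorted(group) for group in groups.values())


# Each Alef form paired only with U+0627.
alef_pairs = [(0x0622, 0x0627), (0x0623, 0x0627), (0x0625, 0x0627)]

for group in variant_sets(alef_pairs):
    print(["U+%04X" % cp for cp in group])
```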

For the following variants in U+0622, U+0623, U+0625 and U+0627:

U+0671 - blocked
U+0672 - blocked
U+0673 - blocked

These are script variants, and deleting them has no effect on Arabic variant generation. The question here is: why were they removed? They were added for future use, so that when a new language that uses the Arabic script joins and uses these code points, we don't need to study their relations with the current code points again, as that is already done.

>> 0671-0673 can be added when relevant languages are added.  They are redundant for Arabic language at this time

9) Code point U+0629 has two missing blocked variants, U+06BE and U+06D5

>> Add 06BE to the variant set for Heh/Teh Marbuta – activated-extended
>> 06D5 can be added when relevant languages are added.  It is redundant for Arabic language at this time

10) U+062A should have U+063E and U+067A added as blocked variants

>> U+063E and U+067A can be added when relevant languages are added.  They are redundant for Arabic language at this time

11) U+062B should have U+063F, U+067D and U+06BD added as blocked variants

>> U+063F, U+067D and U+06BD can be added when relevant languages are added.  They are redundant for Arabic language at this time

12) Code point U+0641 has a missing blocked variant U+06A7

>> 06A7 is a variation of Arabic orthography which also uses 06A2 and makes 0641 and 0642 also variants of each other, as has been done for Arabic script LGR.  06A2 and 06A7 are excluded from the core Arabic language LGR to prevent making 0641 and 0642 variants

13) Code point U+0643 has a missing blocked variant U+06AA

>> U+06AA can be added when relevant languages are added.  It is redundant for Arabic language at this time

14) Code point U+0646 has a missing blocked variant U+06BA

>> Add 0646 and 06BA as a variant set

15) For code point U+0647 (ARABIC LETTER HEH), the variant U+0629 (ARABIC LETTER TEH MARBUTA) has been blocked. This affects generation for labels such as مكه and السعوديه, whose variants مكة and السعودية respectively are now blocked even though they are still valid variants. This variant must be allocatable.

>> Make 0629 and 0647 allocatable variants in either direction
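As a small sketch of what "allocatable in either direction" means for the labels mentioned in 15), the Python below builds a symmetric table from the single pair U+0647/U+0629 and generates the variants obtained by one substitution. The helper names symmetric_table and single_substitution_variants are hypothetical, and the sketch ignores dispositions and positional rules.

```python
# HEH (U+0647) and TEH MARBUTA (U+0629) as a two-way variant pair.
PAIRS = [("\u0647", "\u0629")]


def symmetric_table(pairs):
    """Build a lookup in which every pair maps in both directions."""
    table = {}
    for a, b in pairs:
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)
    return table


def single_substitution_variants(label, table):
    """Return the labels obtained by replacing exactly one code point
    with one of its variants."""
    variants = set()
    for i, ch in enumerate(label):
        for alt in table.get(ch, ()):
            variants.add(label[:i] + alt + label[i + 1:])
    return variants


table = symmetric_table(PAIRS)
makkah_heh = "\u0645\u0643\u0647"  # مكه
makkah_teh = "\u0645\u0643\u0629"  # مكة

print(single_substitution_variants(makkah_heh, table))
print(single_substitution_variants(makkah_teh, table))
```

Because the table is symmetric, each of the two spellings yields the other as its variant.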

16) Code point U+0647 has two missing variants U+06BE (activated-extended) and U+06D5 (blocked)

>> discussed above

17) Code point U+0648 has a variant U+0624 which was not requested by TF-AIDN

>> discussed above

18) For code point U+0649 (ARABIC LETTER ALEF MAKSURA), the variant U+064A (ARABIC LETTER YEH) has been blocked, which affects Arabic-language variant generation. For example, على should have the variant علي, but it is now blocked.

>> Make 0649 and 064A allocatable variants in either direction

20) The same goes the other way for U+064A, as its variant U+0649 is blocked

>> See 18)

21) Both code points U+0649 and U+064A have an incorrect variant U+0626 (Not requested by TF-AIDN)

>> See 7)

22) Both code points U+0649 and U+064A should have the following blocked variants:

U+066E
U+067B
U+06CD
U+06D0
U+06D2

These changes alter the way Arabic variants are generated and do not conform to what the team has agreed upon.

>> U+066E, U+067B, U+06CD and U+06D0 can be added when relevant languages are added.  They are redundant for Arabic language at this time
>> for 06D2 see comment in 7) above
