[TF-AIDN] [Ext] Arabic Script LGR 2 report by TF-AIDN

Ahmed Bakht Baloch ahmedbakhat at gmail.com
Sat Jun 15 23:17:10 CEST 2019


Dear Sir,

Many thanks for the detailed analysis. We will analyze it and respond to you soon.

On Thu, Jun 13, 2019 at 4:32 PM Sarmad Hussain <sarmad.hussain at icann.org>
wrote:

> Dear Ali, TF-AIDN members,
>
>
>
> Thank you again for the detailed analysis shared.  Kindly find below our
> initial responses (highlighted in the message below) based on the feedback
> in the trailing email.  We would like to seek your further input on the
> responses marked in yellow to finalize these points.  For the points
> highlighted in green, we will update the Arabic language LGR as suggested
> in the feedback.
>
>
>
> As you review these comments for further input, please note there are two
> over-arching motivations for the comments in yellow.  As you are all aware,
> RFC6912 suggests multiple principles which we need to adhere to in the
> LGR design, even at the second level.
>
>
>
>    - Code points in the LGR are being limited to those which are needed
>    at this time.  Those which will be needed in the future can be added in
>    future enhancements of this LGR.  This makes the LGR conservative and
>    minimal.  Code points are added for international reachability only if
>    the corresponding language code point sets are currently included in
>    the LGR definition (Pashto, Persian and Urdu at this time) and they would
>    be confused with Arabic code points by speakers of these languages.
>    - The Security and Stability Advisory Committee of ICANN, in SAC060
>    <https://www.icann.org/en/system/files/files/sac-060-en.pdf>
>    (recommendation 14), advises minimizing the labels which are activated,
>    cautioning that a large number of variant labels can cause management
>    problems for registrars and registrants.  Having too many allocatable (or
>    activated) variant labels causes this problem, in many cases over-producing
>    variant labels.
>
>
>
> Based on your responses to the highlighted comments below, we can update
> the solution further. Please feel free to raise any additional points.  We
> can also schedule a call for further discussion, as needed.
>
>
>
> Again, please note that we intend to iterate to get to a solution which is
> agreeable with TF-AIDN while balancing the constraints suggested by the
> technical community, before proceeding further to a public comment.
>
>
>
> We look forward to your further input.
>
>
>
> Regards,
> Sarmad
>
>
>
> 1) Let us start with U+0622 (ARABIC LETTER ALEF WITH MADDA ABOVE), it used
> to have the following variants:
>
>
>
> U+0623 - allocatable
>
> U+0625 - allocatable
>
> U+0627 - activated
>
>
>
> Now it has the following:
>
>
>
> U+0623 - blocked
>
> U+0625 - blocked
>
> U+0627 - allocatable
>
>
>
> This change affects variant generation greatly. For example, if we take
> the label آنترنت and get its variants according to the old rules, we will
> get إنترنت as a variant, which is a perfectly valid one. But according to the
> new rules إنترنت will be blocked.
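
[Illustrative sketch, not part of the original message.] The effect described in item 1 can be reproduced with a small Python example. The two tables are copied from the old and new variant listings for U+0622 above. The function name variant_disposition and the precedence order (the most restrictive mapping type used decides the label) are assumptions made for illustration, not something the LGR drafts in this thread confirm.

```python
# Variant mappings for U+0622, copied from the two listings in item 1.
OLD_RULES = {
    ("\u0622", "\u0623"): "allocatable",
    ("\u0622", "\u0625"): "allocatable",
    ("\u0622", "\u0627"): "activated",
}
NEW_RULES = {
    ("\u0622", "\u0623"): "blocked",
    ("\u0622", "\u0625"): "blocked",
    ("\u0622", "\u0627"): "allocatable",
}

# Assumption: the most restrictive mapping type used decides the label.
PRECEDENCE = ["blocked", "allocatable", "activated"]


def variant_disposition(original, variant, rules):
    """Return the disposition of `variant` relative to `original`, or None
    if a differing position has no variant mapping in `rules`."""
    if len(original) != len(variant):
        return None
    types = []
    for src, dst in zip(original, variant):
        if src == dst:
            continue
        mapping_type = rules.get((src, dst))
        if mapping_type is None:
            return None
        types.append(mapping_type)
    if not types:
        return "original"
    # Pick the type that appears earliest in PRECEDENCE.
    return min(types, key=PRECEDENCE.index)


label = "\u0622\u0646\u062A\u0631\u0646\u062A"    # the label from item 1
variant = "\u0625\u0646\u062A\u0631\u0646\u062A"  # same label starting with U+0625
print(variant_disposition(label, variant, OLD_RULES))  # allocatable
print(variant_disposition(label, variant, NEW_RULES))  # blocked
```
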
>
>
>
> >> Is there a significant requirement for آ to be used as a variant of أ
> in the Arabic language?
>
> >> Is there a significant requirement for آ to be used as a variant of إ
> in the Arabic language?
>
>  (Making all Alefs allocatable variants will create too many variant
> labels; a word with two Alefs will have 16 variant labels, and one with
> three Alefs will have 64 variant labels; creating blocked variants helps
> contain such cases)
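
[Illustrative sketch, not part of the original message.] The figures of 16 and 64 follow from four interchangeable Alef forms (U+0622, U+0623, U+0625, U+0627): a label with n Alefs has 4^n permutations, counting the original label itself. The Python sketch below checks this by enumeration. The sample labels and the function name alef_permutations are hypothetical.

```python
from itertools import product

# The four Alef forms discussed in this thread.
ALEF_SET = ["\u0622", "\u0623", "\u0625", "\u0627"]


def alef_permutations(label):
    """List every label obtained by replacing each Alef form with any member
    of ALEF_SET. The original label is included in the result."""
    choices = [ALEF_SET if ch in ALEF_SET else [ch] for ch in label]
    return ["".join(chars) for chars in product(*choices)]


# Hypothetical labels: U+0627 separated by U+0628, with two and three Alefs.
two_alefs = "\u0627\u0628\u0627"
three_alefs = "\u0627\u0628\u0627\u0628\u0627"

print(len(alef_permutations(two_alefs)))    # 16
print(len(alef_permutations(three_alefs)))  # 64
```
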
>
>
>
> 2) Code point U+0622 has a variant U+0627 that was changed from activated
> to allocatable, and it must be activated-extended for international
> reachability
>
>
>
> >> 0622 → 0627 change from “allocatable” to “activated”
>
> (activated-extended is not suggested as this may be needed within Arabic
> language community, not just for international reachability)
>
>
>
> 3) Code points U+0623 and U+0625 have the same issue: U+0627 has been
> changed to allocatable while the rest have been blocked.
>
>
>
> >> Are there real cases of إ being used as a variant of أ in the Arabic
> language?
>
> (Making all Alefs allocatable variants will create too many variant
> labels; a word with two Alefs will have 16 variant labels, and one with
> three Alefs will have 64 variant labels; creating blocked variants helps
> contain such cases)
>
>
>
> 4) Code point U+0623 has a variant U+0627 that was changed from activated
> to allocatable, and it must be activated-extended for international
> reachability
>
>
>
> >> 0623 → 0627 change from “allocatable” to “activated”
>
> (activated-extended is not suggested as this may be needed within Arabic
> language community, not just for international reachability)
>
>
>
> 5) Code point U+0625 has a variant U+0627 that was changed from activated
> to allocatable, and it must be activated-extended for international
> reachability
>
>
>
> >> 0625 → 0627 change from “allocatable” to “activated”
>
> (activated-extended is not suggested as this may be needed within Arabic
> language community, not just for international reachability)
>
>
>
> 6) Code point U+0624 has a new variant U+0648 added, which is incorrect;
> it should not have any variants
>
>
>
> If both these characters are needed in Urdu, they would be variants in
> Urdu, which makes them variants in the Arabic language as well.  Please
> reconfirm whether both these code points should be included for Urdu.
> Currently 0624 is not included.
>
>
>
> 7) Code point U+0626 has the following variants:
>
>
>
> U+0649
>
> U+064A
>
> U+06CC
>
>
>
> and these variants were not requested by TF-AIDN. Also, U+06D3 must be
> added as a blocked variant, and the code point must not be limited to
> specific positions, i.e. initial-or-medial-position
>
>
>
> >> remove 0626 from the variant set for Yehs
>
> Why do we need to include 06D2 and 06D3? If they are being considered for
> international reachability, non-Arabic language users may not confuse 06D2
> and 06D3 with other members of the Yeh variant set.  Is there a language
> whose users would confuse such cases?
>
>
>
> 8) Code point U+0627 now has all its variants blocked, which has the same
> effect mentioned before. For example, انترنت used to have إنترنت allocatable
> but it is now blocked
>
>
>
> >> changed to activated in one direction, see above
>
> >> In the other direction, this will make U+0622, U+0623, U+0625 and
> U+0627 all variants of each other through transitivity, causing too many
> variant labels; also see 1) above.
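
[Illustrative sketch, not part of the original message.] The transitivity argument can be illustrated by taking the symmetric and transitive closure of the three pairs that involve U+0627. The Python sketch below uses a simple union-find. The function name variant_sets is hypothetical, and the code says nothing about how the actual LGR tooling computes variant sets.

```python
def variant_sets(pairs):
    """Group code points into variant sets by taking the symmetric and
    transitive closure of the given variant pairs (simple union-find)."""
    parent = {}

    def find(cp):
        parent.setdefault(cp, cp)
        while parent[cp] != cp:
            parent[cp] = parent[parent[cp]]  # path halving
            cp = parent[cp]
        return cp

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for cp in list(parent):
        groups.setdefault(find(cp), set()).add(cp)
    return sorted(sorted(group) for group in groups.values())


# Each of U+0622, U+0623 and U+0625 paired with U+0627 only.
pairs = [(0x0622, 0x0627), (0x0623, 0x0627), (0x0625, 0x0627)]
sets = variant_sets(pairs)
print([[f"U+{cp:04X}" for cp in s] for s in sets])
# [['U+0622', 'U+0623', 'U+0625', 'U+0627']]
```

Although only three pairs are declared, the closure yields a single set of four code points, which is why every Alef form becomes a variant of every other.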
>
>
>
> For the following variants in U+0622, U+0623, U+0625 and U+0627:
>
>
>
> U+0671 - blocked
>
> U+0672 - blocked
>
> U+0673 - blocked
>
>
>
> These are script variants and deleting them has no effect on Arabic
> variant generation. The question here is, why were they removed? They were
> added for future use: when a new language that uses the Arabic script
> joins and uses these code points, we will not need to study their
> relations with the current code points again, as that is already done.
>
>
>
> >> 0671-0673 can be added when relevant languages are added.  They are
> redundant for Arabic language at this time
>
>
>
> 9) Code point U+0629 has two missing blocked variants, U+06BE and U+06D5
>
>
>
> >> Add 06BE to the variant set for Heh/Teh Marbuta – activated-extended
>
> >> 06D5 can be added when relevant languages are added.  It is
> redundant for Arabic language at this time
>
>
>
> 10) U+062A should have U+063E and U+067A added as blocked variants
>
>
>
> >> U+063E and U+067A can be added when relevant languages are added.  They
> are redundant for Arabic language at this time
>
>
>
> 11) U+062B should have U+063F, U+067D and U+06BD added as blocked variants
>
>
>
> >> U+063F, U+067D and U+06BD can be added when relevant languages are
> added.  They are redundant for Arabic language at this time
>
>
>
> 12) Code point U+0641 has a missing blocked variant U+06A7
>
>
>
> >> 06A7 is a variation of Arabic orthography which also uses 06A2 and
> makes 0641 and 0642 also variants of each other, as has been done for
> Arabic script LGR.  06A2 and 06A7 are excluded from the core Arabic
> language LGR to prevent making 0641 and 0642 variants
>
>
>
> 13) Code point U+0643 has a missing blocked variant U+06AA
>
>
>
> >> U+06AA can be added when relevant languages are added.  It is
> redundant for Arabic language at this time
>
>
>
> 14) Code point U+0646 has a missing blocked variant U+06BA
>
>
>
> >> Add 0646 and 06BA as a variant set
>
>
>
> 15) For code point U+0647 (ARABIC LETTER HEH), variant U+0629 (ARABIC
> LETTER TEH MARBUTA) has been blocked. This affects generation for
> labels such as مكه and السعوديه, whose variants مكة and السعودية
> respectively are now blocked although they are still valid. This variant
> must be allocatable
>
>
>
> >> Make 0629 and 0647 allocatable variants in either direction
>
>
>
> 16) Code point U+0647 has two missing variants U+06BE (activated-extended)
> and U+06D5 (blocked)
>
>
>
> >> discussed above
>
>
>
> 17) Code point U+0648 has a variant U+0624 which was not requested by
> TF-AIDN
>
>
>
> >> discussed above
>
>
>
> 18) For code point U+0649 (ARABIC LETTER ALEF MAKSURA), variant U+064A
> (ARABIC LETTER YEH) has been blocked, which affects Arabic language variant
> generation. For example, على should have a variant علي, but it is now
> blocked
>
>
>
> >> Make 0649 and 064A allocatable variants in either direction
>
>
>
> 20) The same goes the other way for U+064A, as its variant U+0649 is blocked
>
>
>
> >> See 18)
>
>
>
> 21) Both code points U+0649 and U+064A have an incorrect variant U+0626
> (Not requested by TF-AIDN)
>
>
>
> >> See 7)
>
>
>
> 22) Both code points U+0649 and U+064A should have the following blocked
> variants:
>
>
>
> U+066E
>
> U+067B
>
> U+06CD
>
> U+06D0
>
> U+06D2
>
>
>
> These changes alter the way Arabic variants are generated and do not
> conform to what the team has agreed upon.
>
>
> >> U+066E, U+067B, U+06CD and U+06D0 can be added when relevant languages
> are added.  They are redundant for Arabic language at this time
>
> >> for 06D2 see comment in 7) above
>
>
>
>
> *From: *Sarmad Hussain <sarmad.hussain at icann.org>
> *Date: *Thursday, May 23, 2019 at 4:47 PM
> *To: *"Ali M. AlHoshaiyan" <ahoshaiyan at citc.gov.sa>, Ahmed Bakht Baloch <
> ahmedbakhat at gmail.com>
> *Cc: *TF-AIDN ICANN <tf-aidn at meswg.org>
> *Subject: *RE: [Ext] Arabic Script LGR 2 report by TF-AIDN
>
>
>
> Dear Ali,
>
>
>
> Thank you for the detailed feedback.
>
>
>
> We will incorporate these suggestions and share another version.  We will
> get back to you in case we have any questions.
>
>
>
> Regards,
> Sarmad
>
>
>
> *From:* Ali M. AlHoshaiyan <ahoshaiyan at citc.gov.sa>
> *Sent:* Thursday, May 23, 2019 4:33 PM
> *To:* Sarmad Hussain <sarmad.hussain at icann.org>; Ahmed Bakht Baloch <
> ahmedbakhat at gmail.com>
> *Cc:* TF-AIDN ICANN <tf-aidn at meswg.org>
> *Subject:* Re: [Ext] Arabic Script LGR 2 report by TF-AIDN
>
>
>
> Dear Dr. Sarmad,
>
>
>
> After comparing the Arabic language LGR with what was sent on 31 Aug (
> http://lists.meswg.org/pipermail/tf-aidn/2018-August/002531.html), I have
> found many changes in the code points and their variants.
>
>
>
> I will list some of the most important changes discovered:
>
>
>
> 1) Let us start with U+0622 (ARABIC LETTER ALEF WITH MADDA ABOVE), it used
> to have the following variants:
>
>
>
> U+0623 - allocatable
>
> U+0625 - allocatable
>
> U+0627 - activated
>
>
>
> Now it has the following:
>
>
>
> U+0623 - blocked
>
> U+0625 - blocked
>
> U+0627 - allocatable
>
>
>
> This change affects variant generation greatly. For example, if we take
> the label آنترنت and get its variants according to the old rules, we will
> get إنترنت as a variant, which is a perfectly valid one. But according to
> the new rules إنترنت will be blocked.
>
>
>
> 2) Code point U+0622 has a variant U+0627 that was changed from activated
> to allocatable, and it must be activated-extended for international
> reachability
>
>
>
> 3) Code points U+0623 and U+0625 have the same issue: U+0627 has been
> changed to allocatable while the rest have been blocked.
>
>
>
> 4) Code point U+0623 has a variant U+0627 that was changed from activated
> to allocatable, and it must be activated-extended for international
> reachability
>
>
>
> 5) Code point U+0625 has a variant U+0627 that was changed from activated
> to allocatable, and it must be activated-extended for international
> reachability
>
>
>
> 6) Code point U+0624 has a new variant U+0648 added, which is incorrect;
> it should not have any variants
>
>
>
> 7) Code point U+0626 has the following variants:
>
>
>
> U+0649
>
> U+064A
>
> U+06CC
>
>
>
> and these variants were not requested by TF-AIDN. Also, U+06D3 must be
> added as a blocked variant, and the code point must not be limited to
> specific positions, i.e. initial-or-medial-position
>
>
>
> 8) Code point U+0627 now has all its variants blocked, which has the same
> effect mentioned before. For example, انترنت used to have إنترنت
> allocatable but it is now blocked
>
>
>
> For the following variants in U+0622, U+0623, U+0625 and U+0627:
>
>
>
> U+0671 - blocked
>
> U+0672 - blocked
>
> U+0673 - blocked
>
>
>
> These are script variants and deleting them has no effect on Arabic
> variant generation. The question here is, why were they removed? They were
> added for future use: when a new language that uses the Arabic script
> joins and uses these code points, we will not need to study their
> relations with the current code points again, as that is already done.
>
>
>
> 9) Code point U+0629 has two missing blocked variants, U+06BE and U+06D5
>
>
>
> 10) U+062A should have U+063E and U+067A added as blocked variants
>
>
>
> 11) U+062B should have U+063F, U+067D and U+06BD added as blocked variants
>
>
>
> 12) Code point U+0641 has a missing blocked variant U+06A7
>
>
>
> 13) Code point U+0643 has a missing blocked variant U+06AA
>
>
>
> 14) Code point U+0646 has a missing blocked variant U+06BA
>
>
>
> 15) For code point U+0647 (ARABIC LETTER HEH), variant U+0629 (ARABIC
> LETTER TEH MARBUTA) has been blocked. This affects generation for
> labels such as مكه and السعوديه, whose variants مكة and السعودية
> respectively are now blocked although they are still valid. This variant
> must be allocatable
>
>
>
> 16) Code point U+0647 has two missing variants U+06BE (activated-extended)
> and U+06D5 (blocked)
>
>
>
> 17) Code point U+0648 has a variant U+0624 which was not requested by
> TF-AIDN
>
>
>
> 18) For code point U+0649 (ARABIC LETTER ALEF MAKSURA), variant U+064A
> (ARABIC LETTER YEH) has been blocked, which affects Arabic language variant
> generation. For example, على should have a variant علي, but it is now
> blocked
>
>
>
> 20) The same goes the other way for U+064A, as its variant U+0649 is blocked
>
>
>
> 21) Both code points U+0649 and U+064A have an incorrect variant U+0626
> (Not requested by TF-AIDN)
>
>
>
> 22) Both code points U+0649 and U+064A should have the following blocked
> variants:
>
>
>
> U+066E
>
> U+067B
>
> U+06CD
>
> U+06D0
>
> U+06D2
>
>
>
> These changes alter the way Arabic variants are generated and do not
> conform to what the team has agreed upon.
>
>
>
>
> ------------------------------
>
> *From:* tf-aidn-bounces at meswg.org <tf-aidn-bounces at meswg.org> on behalf
> of Sarmad Hussain <sarmad.hussain at icann.org>
> *Sent:* Tuesday, May 14, 2019 9:14 AM
> *To:* Ahmed Bakht Baloch
> *Cc:* TF-AIDN ICANN
> *Subject:* Re: [TF-AIDN] [Ext] Arabic Script LGR 2 report by TF-AIDN
>
>
>
> Dear Ahmed, all,
>
>
>
> We would like to thank TF-AIDN again for sharing the second level LGRs for
> the various languages and scripts.
>
>
>
> Please find attached two second level LGRs, which have been reviewed and
> edited to make them symmetric and transitive, while still aiming to capture
> what was suggested by TF-AIDN:
>
>
>
>    - Arabic script
>
>    - Arabic language
>
>
>
> For Arabic language, we have introduced an “activated-extended” type to
> distinguish it from the “activated” type, the former being specifically for
> international reachability.  This is set by default to “allocatable”, with
> notes for the registry that it can be made “blocked” or “activated” based
> on the registry’s policy preference.  Kindly note that SSAC in its SAC060
> report asks for reducing the number of variant labels generated for use,
> as these can cause management issues.
>
>
>
> Following our subsequent discussion for clarification, for the Arabic
> script, we have used the script based LGR by Arabic GP and added digits and
> hyphen.  Please let us know if this is sufficient or any further additions
> are needed.
>
>
>
> These are initial drafts, based on input from TF-AIDN.  May we request you
> to review these and share any additional changes you would like.  Once we
> have a version which is agreeable to TF-AIDN, we will take it to public
> comment for feedback from the wider community.
>
>
>
> We will also work on the remaining second level LGRs shared by TF-AIDN
> after these are finalized.
>
>
>
> We thank TF-AIDN for their effort and look forward to your detailed
> review.
>
>
>
> Regards,
> Sarmad
>
>
>
> *From:* Sarmad Hussain
> *Sent:* Sunday, September 2, 2018 11:12 AM
> *To:* Ahmed Bakht Baloch <ahmedbakhat at gmail.com>
> *Cc:* TF-AIDN ICANN <tf-aidn at meswg.org>; Fahd Batayneh <
> fahd.batayneh at icann.org>; Zied BOUZIRI <zied.bouziri at gmail.com>; Inam
> Ullah (torwalpk at yahoo.com) <torwalpk at yahoo.com>
> *Subject:* RE: [Ext] Arabic Script LGR 2 report by TF-AIDN
>
>
>
> Dear Ahmed, TF-AIDN colleagues,
>
>
>
> We thank you for developing the second level LGRs and sharing them with us.
> We will get back to you in case of any queries.
>
>
>
> As next steps, we will take this as input towards developing the reference
> LGRs for the second level
> <https://www.icann.org/resources/pages/second-level-lgr-2015-06-21-en>,
> which are being published by ICANN.  We will keep you updated as we proceed.
>
>
>
> Regards,
> Sarmad
>
>
>
> *From:* Ahmed Bakht Baloch <ahmedbakhat at gmail.com>
> *Sent:* Saturday, September 1, 2018 1:40 AM
> *To:* Sarmad Hussain <sarmad.hussain at icann.org>
> *Cc:* TF-AIDN ICANN <tf-aidn at meswg.org>; Fahd Batayneh <
> fahd.batayneh at icann.org>; Zied BOUZIRI <zied.bouziri at gmail.com>; Inam
> Ullah (torwalpk at yahoo.com) <torwalpk at yahoo.com>
> *Subject:* [Ext] Arabic Script LGR 2 report by TF-AIDN
>
>
>
> Dear Dr. Sarmad,
>
>
>
> TF-AIDN has deliberated on the completion of the Label Generation Rules for
> the second level (LGR2) and processed five files: one combined Arabic script
> languages file and four individual language files for Arabic, Urdu, Persian
> and Pashto, prepared by the relevant language communities.
>
>
>
> The report containing details of the files and the work done by TF-AIDN,
> and the language table files in XML and PDF formats, are provided for
> incorporation in the LGR 2 and for your consideration.
>
>
>
>
>
> Best Regards,
>
> Ahmed Bakht
>


-- 
Best Regards,
Ahmed Bakht