[TF-AIDN] Urgent: IP feedback on Arabic LGR Proposal and updated version

Sarmad Hussain sarmad.hussain at icann.org
Sat Nov 14 10:54:10 CET 2015


Dear All, 

 

FYI. Here is the output of running the LGR being proposed on the test data we shared.  

 

A summary of the stats on this data:

 

Total strings: 497

Total labels generated (including variants): 109721

Average no. of labels per string: 220

Average no. of allocatable labels: 2  (max 23: "‎أمينة‎" (0623 0645 064A 0646 0629))

Average no. of blocked labels: 203 (max 12388: "‎السياسية‎" (0627 0644 0633 064A 0627 0633 064A 0629))

Average no. of invalid labels: 15 (max 1200: "‎القذافي‎" (0627 0644 0642 0630 0627 0641 064A))
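
For readers checking the data: the hex sequences in parentheses are the Unicode code points of each label, in logical order. A minimal Python sketch of how such a sequence can be derived is given here; the helper name code_points is an illustrative assumption and is not part of the LGR tooling used for the test run.

```python
def code_points(label: str) -> str:
    """Return the label's code points as space-separated 4-digit hex values."""
    return " ".join(f"{ord(ch):04X}" for ch in label)


# The label reported above as having the most allocatable variants.
label = "\u0623\u0645\u064A\u0646\u0629"
print(code_points(label))  # 0623 0645 064A 0646 0629
```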

 

We can infer the following from the data:

 

-          Significant no. of variant labels are possible (average 220 labels per string)

-          The blocked variant mapping is contributing significantly to reducing the allocatable labels (203 / 220)

-          The WLE rules are also contributing significantly in containing the allocatable labels (15 / 220)

-          We are getting very few allocatable labels on average ( 2 / 220)
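
The ratios quoted in the points above follow directly from the summary figures. The sketch below only redoes that arithmetic in Python, using the numbers reported in this email; the variable names are ours.

```python
total_strings = 497
total_labels = 109_721

# About 220.8 labels per string; the summary reports this as 220.
avg_labels = total_labels / total_strings

# Reported per-string averages by disposition (these sum to 220).
averages = {"allocatable": 2, "blocked": 203, "invalid": 15}
reported_avg = sum(averages.values())

for disposition, avg in averages.items():
    share = avg / reported_avg
    print(f"{disposition}: {avg} / {reported_avg} = {share:.1%}")
```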

 

Strings with more than 1,000 variants (most of them blocked, of course):

 

12800 variants generated for "‎السياسية‎" (0627 0644 0633 064A 0627 0633 064A 0629)

8192 variants generated for "‎یونیورسٹی‎" (06CC 0648 0646 06CC 0648 0631 0633 0679 06CC)

6400 variants generated for "‎العراقية‎" (0627 0644 0639 0631 0627 0642 064A 0629)

5120 variants generated for "‎ننګیالی‎" (0646 0646 06AB 06CC 0627 0644 06CC)

5120 variants generated for "‎نهاية‎" (0646 0647 0627 064A 0629)

3200 variants generated for "‎القذافي‎" (0627 0644 0642 0630 0627 0641 064A)

2560 variants generated for "‎همسایه‎" (0647 0645 0633 0627 06CC 0647)

1920 variants generated for "‎کڽبۑیا‎" (06A9 06BD 0628 06D1 06CC 0627)

1600 variants generated for "‎ابراهیم‎" (0627 0628 0631 0627 0647 06CC 0645)

1600 variants generated for "‎ابراہیم‎" (0627 0628 0631 0627 06C1 06CC 0645)

1600 variants generated for "‎افغانۍ‎" (0627 0641 063A 0627 0646 06CD)

1600 variants generated for "‎الحياة‎" (0627 0644 062D 064A 0627 0629)

1600 variants generated for "‎الخارجية‎" (0627 0644 062E 0627 0631 062C 064A 0629)

1600 variants generated for "‎السياسي‎" (0627 0644 0633 064A 0627 0633 064A)

1600 variants generated for "‎روښانتيا‎" (0631 0648 069A 0627 0646 062A 064A 0627)

1440 variants generated for "‎کارکردگی‎" (06A9 0627 0631 06A9 0631 062F 06AF 06CC)

1280 variants generated for "‎الوطنية‎" (0627 0644 0648 0637 0646 064A 0629)

1280 variants generated for "‎دیوانه‎" (062F 06CC 0648 0627 0646 0647)

1280 variants generated for "‎گیاه‎" (06AF 06CC 0627 0647)
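
The counts above grow quickly because the number of variant labels is the product of the number of alternatives available at each position; counts such as 8192 (2^13) and 12800 (2^9 x 5^2) are consistent with products of small per-code-point set sizes. The Python sketch below illustrates the mechanism only. The VARIANTS table is a HYPOTHETICAL example, not the mapping in the proposal's XML file, and the sketch ignores dispositions (allocatable/blocked) and WLE rules.

```python
from itertools import product
from math import prod

# HYPOTHETICAL variant sets, for illustration only. The real mappings are
# defined in the proposal's XML file and are not reproduced here.
VARIANTS = {
    "\u064A": ["\u064A", "\u06CC", "\u06D0"],  # three alternatives
    "\u0643": ["\u0643", "\u06A9"],            # two alternatives
}


def variant_labels(label: str) -> list[str]:
    """Enumerate every label obtainable by substituting variants per position."""
    choices = [VARIANTS.get(ch, [ch]) for ch in label]
    return ["".join(combo) for combo in product(*choices)]


def variant_count(label: str) -> int:
    """Count the labels without enumerating them."""
    return prod(len(VARIANTS.get(ch, [ch])) for ch in label)


label = "\u0643\u064A\u064A"  # nonsense test string
labels = variant_labels(label)
print(len(labels), variant_count(label))  # 18 18, i.e. 2 * 3 * 3
```

The enumeration includes the original label itself, so a string with no variants yields a count of 1.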

 

Regards,
Sarmad

 

From: Sarmad Hussain 
Sent: Friday, November 13, 2015 7:19 PM
To: 'tf-aidn at meswg.org' <tf-aidn at meswg.org>
Subject: Urgent: IP feedback on Arabic LGR Proposal and updated version

 

Dear All,

 

The IP have done another detailed review of the work submitted by TF-AIDN and have provided some feedback.  There are no substantive changes, but they have recommended some corrections in references and significant structural changes in the proposal document to make it more accessible to a general readership.  In addition, they have also identified some small typos/edits.  As these documents become part of a permanent record for the Root Zone LGR, the IP recommend that TF-AIDN make the suggested changes.

 

The feedback of the IP is given below.  We have updated the proposal and XML files to address it (as per the details in red after each comment).  Kindly review all these changes and let us know if you have any feedback by 18 Nov.  

 

Here are the comments shared by the IP members:

 

1) The main Proposal document needs to be streamlined so that it is easier to locate the actual and final content. There are many excellent suggestions on how this could be achieved in the review comments below. Luckily, most of them do not require detailed rewrites, only block moves.

a.       Section 2.4 is way too long as a main body part. Most of it should be pushed to an annex. 

b.      Section 3.3 Code point repertoire excluded. That section should be moved into an annex. The format is too similar to that of the section before, and it is easy to get confused. The alternative is to merge 3.2 and 3.3 and clearly indicate the status included/excluded in a new column.

c.       Section 4. Initial Analysis of Variant Mappings and Types. That section should be pushed to an Annex. … I have in the past sometimes mistaken it for the real variant table (it is not obvious at first glance that it is preliminary, especially if you skip its first page). 

 

>> The three sections listed in a, b, and c have been moved to the appendices and references to them added to the main body of the text.


2) 3.1 Summary of code point repertoire included and excluded should be replaced with a better version

 

>> A new cleaner version of this image has been added to the document, replacing the older version

 

3) 3.2 Code point repertoire included

All entries have a column with an extract of the Unicode file UnicodeData.txt, which contains much redundant or, even worse, confusing information. There is no need to repeat the code point (already provided in the 2nd column). The properties that matter are the same for all Arabic letters (Lo,0,AL). The rest of the information is confusing: there is no need to include the canonical decomposition because NFC uses the shortest form, and the Unicode 1.0 names have no value in this context. In other words, that column should just contain the name.

 

>> This change has not been made as the extra information only adds detail and is not impacting the work
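
For reference, the three properties mentioned in comment 3 (general category, canonical combining class, bidirectional class) can be read from the copy of the Unicode Character Database bundled with Python. This is only a sketch: the Unicode version it reflects depends on the Python build and is not tied to the MSR, and the three sample code points are simply ones mentioned in this thread.

```python
import unicodedata

# Sample code points taken from this thread.
for cp in (0x0627, 0x0697, 0x0751):
    ch = chr(cp)
    print(
        f"U+{cp:04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),       # general category
        unicodedata.combining(ch),      # canonical combining class
        unicodedata.bidirectional(ch),  # bidirectional class
    )
```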

 

4) References

U+06A2: ref#132 missing (both doc and xml)

U+06B5: ref#116 in doc is incorrect, should be 106 (xml is correct)

U+0751: ref#130 is incorrect, should be 128 (both doc and xml)

U+0760: ref# 122 is incorrect, should be 121 (both doc and xml)

Bad link for ref#120, should be http://paul-timothy.net/pages/ajamisenegal/primers/je_sais_le_wolofal_harmattan_20-oct-2015_a4.pdf

 

>> The references have been edited and updated

 

5) Variants that should have comments: Table 16a; Table 2a: 06C1<->06C2 and 06D5<->06C0; for these pairs it is again a matter of simplification with more or fewer ornaments, but the languages affected are not known to me and should be indicated as for the other examples.  Table 13a: 06AF<->0763 (for this case the comment is hidden in the table header when it should really be in the variant table itself)

 

>> The relevant textual explanation has been added for these allocatable variant pairs.  Please check the relevant rows in Table 16a and Table 2a.

 

 

Regards,
Sarmad

 

 

 

 

 

From: Sarmad Hussain 
Sent: Saturday, November 07, 2015 7:29 PM
To: 'tf-aidn at meswg.org' <tf-aidn at meswg.org>
Subject: RE: [TF-AIDN] Final call for comments on Arabic LGR Proposal by 5 Nov.

 

Dear All, 

 

These files have been handed over to the Integration Panel for evaluation of the proposal.  Thank you all for your contributions.

 

Regards,
Sarmad

 

 

From: Sarmad Hussain 
Sent: Thursday, November 05, 2015 9:48 PM
To: 'tf-aidn at meswg.org' <tf-aidn at meswg.org>
Subject: RE: [TF-AIDN] Final call for comments on Arabic LGR Proposal by 5 Nov.

 

Dear All,

 

Please find attached the final set of files being submitted to the IP on behalf of the TF-AIDN.  Version 3.2 of the document includes versioning information and references to XML and test data files.  In addition, it has some minor edits (capitalization of main headings).  

 

Thanks to Alireza for the updates and reference checks.  

 

This is the final version being submitted to the IP for final review and approval.  

 

Regards,
Sarmad

 

 

From: Sarmad Hussain 
Sent: Wednesday, November 04, 2015 1:56 PM
To: tf-aidn at meswg.org
Subject: RE: [TF-AIDN] Final call for comments on Arabic LGR Proposal by 5 Nov.

 

Dear All,

 

Here is an updated version of the proposal document with the latest version of XML for your review.  

 

In the current version some references have been edited or their numbering adjusted.  Raed’s name has been added to the acknowledgements list.

 

Kindly note that this document is in its final call for comments till 5 Nov. 2015.  Please let us know if you have any further feedback.  This will be submitted to the IP for final evaluation after this date.

 

Regards,
Sarmad

 

 

 

 

From: Abdulrahman I. ALGhadir [mailto:aghadir at citc.gov.sa] 
Sent: Saturday, October 31, 2015 6:59 PM
To: Sarmad Hussain <sarmad.hussain at icann.org>; tf-aidn at meswg.org
Subject: RE: [TF-AIDN] Final call for comments on Arabic LGR Proposal by 5 Nov.

 

Dear Sarmad,

I would suggest adding Raed Al-Fayez; as you know, both he and Dr. Al-Zoman provided us with useful information.

Best wishes,

  _____  

From: tf-aidn-bounces at meswg.org on behalf of Sarmad Hussain [sarmad.hussain at icann.org]
Sent: Friday, October 30, 2015 2:47 PM
To: tf-aidn at meswg.org
Subject: Re: [TF-AIDN] Final call for comments on Arabic LGR Proposal by 5 Nov.

Dear All,

 

Kindly note that I have added an Acknowledgement section after the list of TF-AIDN members in the relevant Appendix.  Please suggest any names which should be added to this list, in case we are missing them.

 

Regards,
Sarmad

 

 

From: Sarmad Hussain 
Sent: Friday, October 30, 2015 4:26 PM
To: 'tf-aidn at meswg.org' <tf-aidn at meswg.org>
Subject: RE: Final call for comments on Arabic LGR Proposal by 5 Nov.

 

Dear All,

 

Please find attached the sample file containing invalid labels.  Please let us know if you have any feedback or want to suggest any additions.

 

This includes at least one example of each WLE and some random examples of out of repertoire code points.

 

I suggest adding this and the other test word list we had developed earlier as appendices to the proposal, so that the test data is also part of the proposal, instead of submitting separate Excel files (the latter being less manageable).  This allows anybody who has the proposal document to also have access to the test data.   

 

Regards,
Sarmad

 

 

From: Sarmad Hussain 
Sent: Thursday, October 29, 2015 12:17 AM
To: 'tf-aidn at meswg.org' <tf-aidn at meswg.org>
Subject: Final call for comments on Arabic LGR Proposal by 5 Nov.

 

Dear All,

 

The IP has accepted the additional evidence provided by Meikal for the two code points, so we are OK to proceed with finalizing the proposal.  I have updated the references in the repertoire table, addressed other edits suggested by the IP, and added the text in response to Patrik’s comment, shared in the email below.  Alireza has updated the reference list in the XML file and added it in the repertoire table and at the end of the document.  Therefore, the attached proposal document and XML now completely address the public comments received.  

 

The proposal document and the XML file are being shared with the final call for any comment by TF-AIDN members.  Please carefully review the attached files and let us know if you have any feedback or would like any further changes.  Please respond by 5 Nov. 2015.  After that time we will finalize the proposal and formally hand it over to the Integration Panel for final evaluation.

 

I will update the test data files and share with all of you soon as well.

 

Thanks all for the great work.  Looking forward to any final suggestions.

 

Regards,
Sarmad

 

 

From: Sarmad Hussain 
Sent: Tuesday, October 27, 2015 2:04 PM
To: 'tf-aidn at meswg.org' <tf-aidn at meswg.org>
Subject: RE: Important: Agenda for our meeting this Monday

 

Dear All,

 

Here is a summary of our call yesterday. 

 

1.       Meikal has reached out to experts and shared additional sources in support of the two code points 0697 and 0751, as per details below.  We will share the additional information with the IP to get their feedback on whether this addresses the concerns regarding additional evidence.

a.       Lorna Evans provided the (T)chadian standard (attached) for 0697; calendar for 2012 is also shared with this letter (supporting recent use) – given in the Mbe calendar image attached (in the first line of text)

b.      Ngom provided a reference to additional discussion in support of the current use of 0751 in Wolof (see pg. 357, second paragraph, of the attached text “From Dust to Digital”)

c.       0751 also confirmed by Mbaye who shares the reference picture using this code point at http://eap.bl.uk/database/large_image.a4d?digrec=771119;catid=132903;r=6334 (e.g. see the first word: جلب in the first line of the second column in the two column text portion, the ب has three dots above it)

2.       I will develop a list of test labels to check each of the WLE rules (and variants, as needed).  Some of these labels will be nonsense strings as real world examples are not available for cases where one has to switch the keyboards.  Others can also contribute to this list.

3.       In response to Patrik’s comment regarding the IAB statement, we will add the following text at the end of the Repertoire section of the proposal (please feel free to edit and suggest changes):

a.       The current work is limited to the analysis of the repertoire short-listed in the MSR, as per the procedure.  TF-AIDN realizes that the current MSR is limited to Unicode version 6.3 and will keep following later versions of the MSR as it is updated to include later versions of the Unicode standard.  If a later MSR includes additional Arabic script code points, TF-AIDN will analyze them and make additional proposal(s) for adding the relevant subset to the repertoire currently being proposed, also considering any security and stability implications

b.      TF-AIDN is also aware of the recent IAB statement.  As TF-AIDN has included only code points which represent the composed forms, and no combining marks, in its proposal for the Root Zone LGR, the task force considers that the issue is not relevant in the context of this proposal

4.       We will update the proposal to address the editorial changes and share with TF-AIDN to review

5.       The updated proposal will be shared with TF-AIDN and last call will be made for one week for members to review.  Based on the review, the proposal will be finalized and submitted to the Integration Panel for their final evaluation.  
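
Regarding item 3(b) above: the position rests on the repertoire containing only precomposed letters and no combining marks. As a rough illustration (not part of the proposal or its test tooling), the Python sketch below shows how a label could be checked for combining marks and for NFC stability; the helper names has_combining_mark and is_nfc are assumptions made for this example.

```python
import unicodedata


def has_combining_mark(label: str) -> bool:
    """True if any code point is a mark (general category Mn, Mc or Me)."""
    return any(unicodedata.category(ch).startswith("M") for ch in label)


def is_nfc(label: str) -> bool:
    """True if the label is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", label) == label


composed = "\u0623"        # ALEF WITH HAMZA ABOVE, precomposed
sequence = "\u0627\u0654"  # ALEF followed by combining HAMZA ABOVE

print(has_combining_mark(composed), is_nfc(composed))  # False True
print(has_combining_mark(sequence), is_nfc(sequence))  # True False
```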

 

Please let us know if you have any queries or would like to suggest any changes on the proposed next steps.  

 

Regards,
Sarmad

 

 

From: Sarmad Hussain 
Sent: Thursday, October 22, 2015 3:09 PM
To: 'tf-aidn at meswg.org' <tf-aidn at meswg.org>
Subject: Important: Agenda for our meeting this Monday

 

Dear All,

 

Suggested agenda for our meeting on this Monday:  We will review public comments on our LGR process and aim to address them:

 

1.       Two characters still do not have sufficient evidence 

a.       0697 ڗ ARABIC LETTER REH WITH TWO DOTS ABOVE

b.      0751 ݑ ARABIC LETTER BEH WITH DOT BELOW AND THREE DOTS ABOVE

2.       Additional test cases for WLE rules and variant sets

3.       Response to Patrik’s comments on IAB statement

4.       Editorial updates to the proposal document

5.       Any other items 

 

 

Regards,
Sarmad

 

 



-------------- next part --------------
A non-text attachment was scrubbed...
Name: test results.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 23006 bytes
Desc: not available
URL: <http://lists.meswg.org/pipermail/tf-aidn/attachments/20151114/ffe91ee0/attachment-0001.docx>