added "todo" dictionaries

aarne
2014-04-06 19:19:51 +00:00
parent 82a333c602
commit 37c3afa9b4
17 changed files with 77456 additions and 17 deletions

@@ -0,0 +1,136 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.org">
<TITLE>Checking GF Translation Dictionaries</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<CENTER>
<H1>Checking GF Translation Dictionaries</H1>
<FONT SIZE="4"><I>Aarne Ranta</I></FONT><BR>
<FONT SIZE="4">April 2014</FONT>
</CENTER>
<H2>Call for contributions: the generic translation dictionaries of GF</H2>
<P>
<B>Wanted</B>: manual checking of TopDict???.gf files in
<A HREF="http://www.grammaticalframework.org/lib/src/translator/todo">this directory</A>.
</P>
<P>
<B>Abstract syntax</B>: <A HREF="./TopDict.gf">TopDict</A>, the top 7000 English words from the British National Corpus, as sorted by frequency
<A HREF="http://www.kilgarriff.co.uk/BNClists/lemma.num">here</A>.
</P>
<P>
<B>Usage</B>: part of the general translation dictionaries, used for instance in the
<A HREF="http://www.grammaticalframework.org/demos/translation.html">GF translation demo</A>. The full dictionaries are the Dictionary* modules
in the <A HREF="../">parent directory</A>.
</P>
<P>
<B>Who</B>: anyone with good knowledge of the target language and with reasonable knowledge of the GF resource grammar paradigms for it.
</P>
<H2>How to do it</H2>
<P>
Follow these steps for your language. The examples below use TopDictFre.gf; substitute the code of any other language in this directory for Fre.
</P>
<OL>
<LI>Make sure to download the latest version of the file.
<LI>Make sure you can compile the original file:
<PRE>
gf TopDictFre.gf +RTS -K64M
</PRE>
<LI>Edit the <CODE>lin</CODE> rules line by line, starting from the beginning. Follow the guidelines in the next section.
<LI>Mark the last rule you edit with "---- END edits by AR", where AR stands for your initials.
<LI>Make sure the resulting file compiles again.
<LI>Run <CODE>diff</CODE> on the old and the new file to make sure your changes look reasonable.
<LI>Submit your edits: commit them to darcs if you have access to it, post them to GF Contributions, or email them to Aarne Ranta. In the last case,
it is enough to send the lin rules that you have processed.
<LI>Inform the gf-dev list that you have done this.
</OL>
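<P>
If you submit by email, the lin rules you have processed can be collected automatically. The following Python sketch is not part of GF: the function names <CODE>lin_rules</CODE> and <CODE>processed_rules</CODE> are hypothetical, and it assumes that every rule sits on a single line of the form <CODE>lin name = ... ;</CODE>.
</P>

```python
def lin_rules(text):
    """Map each function name to its complete 'lin' line."""
    rules = {}
    for line in text.splitlines():
        parts = line.split()
        # Assumed rule shape: lin name = ... ; -- comment
        if len(parts) >= 3 and parts[0] == "lin" and parts[2] == "=":
            rules[parts[1]] = line.rstrip()
    return rules


def processed_rules(old_text, new_text):
    """Return the lin lines of new_text that differ from old_text, in file order."""
    old = lin_rules(old_text)
    return [line for name, line in lin_rules(new_text).items()
            if old.get(name) != line]
```

<P>
Reading the old and the new file into strings and printing the returned lines gives the rules to paste into the email. A rule that you accepted unchanged still shows up, because its tail comment now carries your initials.
</P>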
<P>
A reasonable batch of revisions is 500 words or more, which should be doable in less than 2 hours. To avoid conflicts and overlapping work,
don't spend more than one day on a batch of work.
</P>
<H2>Guidelines</H2>
<P>
When editing a lin rule, do one of the following:
</P>
<UL>
<LI><B>accept the rule as it is</B>: replace the tail comment after the rule's terminating semicolon, if there is one, with your initials,
written the same way every time. For example:
<PRE>
lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- tocheck
</PRE>
becomes
<PRE>
lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- AR
</PRE>
<LI><B>change the linearization</B>: if the result is OK for you, just leave your initials as a comment. For example,
<PRE>
lin obviously_Adv = variants{} ;
</PRE>
becomes
<PRE>
lin obviously_Adv = mkAdv "évidemment" ; -- AR
</PRE>
<LI><B>suggest a split of senses</B>: add a comment prefixed by "--- split" that lists the additional senses and explains them. For example,
<PRE>
lin labour_N = mkN "accouchement" masculine | mkN "ouvrage" masculine ; -- tocheck
</PRE>
might become
<PRE>
lin labour_N = mkN "travail" "travaux" masculine ; --- split mkN "accouchement" childbirth labour -- AR
</PRE>
<LI><B>report an anomaly</B>: change or leave the rule, and add a comment prefixed by "---- ". For example,
<PRE>
lin back_Adv = variants{} ;
</PRE>
might become
<PRE>
lin back_Adv = mkAdv "en retour" ; ---- no exact translation in Fre -- AR
</PRE>
</UL>
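<P>
The comment conventions above are regular enough to be checked mechanically. The following Python sketch is only an illustration, not a GF tool: the names <CODE>tail_comment</CODE> and <CODE>classify</CODE> and the status labels are hypothetical. It assumes one rule per line, and that string literals contain no escaped double quotes.
</P>

```python
def tail_comment(line):
    """Return the text after the rule's terminating semicolon, stripped.

    The terminating semicolon is taken to be the first ';' outside a
    double-quoted string. Returns None if there is no such semicolon.
    """
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif ch == ";" and not in_string:
            return line[i + 1:].strip()
    return None


def classify(line):
    """Classify a lin rule by its tail comment."""
    tail = tail_comment(line)
    if tail is None:
        return "malformed"      # no terminating semicolon
    if tail == "":
        return "unchecked"      # no tail comment at all
    if tail.startswith("----"):
        return "anomaly"        # four dashes: reported anomaly
    if tail.startswith("---"):
        return "split"          # three dashes: suggested sense split
    if tail == "-- tocheck":
        return "tocheck"        # still waiting for a checker
    if tail.startswith("--"):
        return "accepted"       # two dashes followed by initials
    return "malformed"          # text after ';' that is not a comment
```

<P>
Counting the labels over all lines that start with <CODE>lin</CODE> gives a quick overview of how far the checking of a file has progressed.
</P>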
<P>
As general guidelines,
</P>
<UL>
<LI><B>Don't just do nothing</B>, but do one of the things above, until the point where you quit checking.
<LI><B>Maintain the format</B> of one rule per line, prefixed by <CODE>lin</CODE>.
<LI><B>Put the most standard translation as the first variant</B>, with deprecated and slang words later.
<LI><B>Make sure that the morphology comes out correct</B>.
<LI><B>Make sure to return correct verb subcategorization</B>.
<LI><B>Don't split senses</B> if the difference is very small, e.g. a difference in style rather than in meaning.
</UL>
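<P>
The first guideline can be verified before submitting: no rule above the "---- END edits by" marker should be left untouched. The Python sketch below is a hypothetical helper, not part of GF. It assumes that an untouched rule either ends with the semicolon or still carries the "-- tocheck" comment, and that the marker appears either on the last edited rule's line or on a line of its own directly after it.
</P>

```python
END_MARKER = "---- END edits by"


def unprocessed_before_end(text):
    """Names of lin rules up to the END marker that still look unchecked."""
    lines = text.splitlines()
    if not any(END_MARKER in line for line in lines):
        raise ValueError("no END marker found")
    names = []
    for line in lines:
        stripped = line.strip()
        parts = stripped.split()
        # Assumed rule shape: lin name = ... ; -- comment
        if len(parts) >= 3 and parts[0] == "lin" and parts[2] == "=":
            if stripped.endswith(";") or stripped.endswith("-- tocheck"):
                names.append(parts[1])
        if END_MARKER in line:
            break  # rules below the marker are outside this batch
    return names
```

<P>
An empty result means that every rule in the batch has been accepted, changed, split or reported as an anomaly.
</P>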
<!-- html code generated by txt2tags 2.6 (http://txt2tags.org) -->
<!-- cmdline: txt2tags -thtml check-dictionary.t2t -->
</BODY></HTML>