<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.org">
<TITLE>Checking GF Translation Dictionaries</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<CENTER>
<H1>Checking GF Translation Dictionaries</H1>
<FONT SIZE="4"><I>Aarne Ranta</I></FONT><BR>
<FONT SIZE="4">April 2014</FONT>
</CENTER>
<P>
<B>News</B>
</P>
<P>
9/5 Link to the current status: <A HREF="https://docs.google.com/spreadsheets/d/1NuLRp86UPjd298LxjhCAGlHsoPypxKpcBJfDab0De90/edit#gid=0">https://docs.google.com/spreadsheets/d/1NuLRp86UPjd298LxjhCAGlHsoPypxKpcBJfDab0De90/edit#gid=0</A>
</P>
<P>
9/5/2014 Removed many bogus subcats revealed by dictionary authors and by FrameNet. Please upgrade your TopDictionary from darcs or GitHub!
</P>
<H2>Call for contributions: the generic translation dictionaries of GF</H2>
<P>
<B>Wanted</B>: manual checking of TopDictionary???.gf files in
<A HREF="http://www.grammaticalframework.org/lib/src/translator/todo">this directory</A>.
</P>
<P>
<B>Abstract syntax</B>: <A HREF="./TopDictionary.gf">TopDictionary</A>, the top-7000 English words from the British National Corpus, as sorted by frequency
<A HREF="http://www.kilgarriff.co.uk/BNClists/lemma.num">here</A>.
</P>
<P>
<B>Usage</B>: part of the general translation dictionaries, used for instance in the
<A HREF="http://www.grammaticalframework.org/demos/translation.html">GF translation demo</A>. The full dictionaries are the Dictionary* modules
in the <A HREF="../">parent directory</A>.
</P>
<P>
<B>Who</B>: anyone with good knowledge of the target language and with reasonable knowledge of the GF resource grammar paradigms for it.
</P>
<H2>How to do it</H2>
<P>
Follow these steps for your language. The examples use ToCheckFre.gf; substitute the code of your own language, as used in this directory, for Fre.
</P>
<OL>
<LI>Make sure to download the latest version of the file.
<LI>Make sure you can compile the original file:
<PRE>
gf ToCheckFre.gf +RTS -K64M
</PRE>
<LI>Edit the <CODE>lin</CODE> rules line by line, starting from the beginning. Follow the guidelines in the next section.
<LI>Mark the last rule you edit with "---- END edits by AR", replacing AR with your initials.
<LI>Put, as the first line of the file, a comment indicating your last edited rule:
<PRE>
---- checked by AR till once_Adv in the BNC order
</PRE>
<LI>Make sure the resulting file compiles again.
<LI>Run <CODE>diff</CODE> on the old and the new file, just to make sure your changes look reasonable.
<LI>Commit your edits to darcs if you have access to it; otherwise submit them to GF Contributions or send them by email to Aarne Ranta. In the last case,
it is enough to send those lin rules that you have processed.
<LI>Inform the gf-dev list that you have done this.
</OL>
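<P>
The <CODE>diff</CODE> step can be sketched with a small Python helper, which is not part of the GF tools. It lists the <CODE>lin</CODE> rules that differ between the old and the new file; these are also the rules to send when submitting by email. The function name <CODE>changed_rules</CODE> is an illustrative assumption, and the sample rules are taken from the examples on this page.
</P>

```python
import difflib

def changed_rules(old_text, new_text):
    """Return the lin rules of new_text that are new or edited relative to old_text."""
    delta = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes lines that occur only in the new file with "+ ".
    return [line[2:] for line in delta if line.startswith("+ lin ")]

# Sample data: three rules before and after checking (the third is untouched).
old_text = "\n".join([
    'lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- tocheck',
    'lin obviously_Adv = variants{} ; --',
    'lin labour_N = mkN "accouchement" masculine | mkN "ouvrage" masculine ; -- tocheck',
])
new_text = "\n".join([
    'lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ;',
    'lin obviously_Adv = mkAdv "évidemment" ;',
    'lin labour_N = mkN "accouchement" masculine | mkN "ouvrage" masculine ; -- tocheck',
])

for rule in changed_rules(old_text, new_text):
    print(rule)
```

<P>
To run it on real files, read the old and the new version of the file into the two strings. The helper only reports rules; it does not check that they compile.
</P>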
<P>
A reasonable batch of revisions is 500 words or more, which should be doable in less than 2 hours. To avoid conflicts and overlapping work,
don't spend more than one day on a batch of work.
</P>
<P>
The already split senses are explained <A HREF="../senses-in-Dictionary.txt">here</A>.
</P>
<H2>Guidelines</H2>
<P>
When editing a lin rule, do one of the following:
</P>
<UL>
<LI><B>accept the rule as it is</B>: remove the tail comment after the rule's terminating semicolon, if there is one. For example:
<PRE>
lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- tocheck
</PRE>
becomes
<PRE>
lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ;
</PRE>
<LI><B>change the linearization</B>: if the result is OK for you, also delete the comment. For example,

<PRE>
lin obviously_Adv = variants{} ; --
</PRE>

becomes

<PRE>
lin obviously_Adv = mkAdv "évidemment" ;
</PRE>

<LI><B>suggest a split of sense</B>: add a comment prefixed by "--- split" that lists the additional senses and explains them. For example,
<PRE>
lin labour_N = mkN "accouchement" masculine | mkN "ouvrage" masculine ; -- tocheck
</PRE>
might become
<PRE>
lin labour_N = mkN "travail" "travaux" masculine ; --- split mkN "accouchement" childbirth labour
</PRE>
To check the meanings of senses that have already been split (by using numbers, e.g. <CODE>time_1_N</CODE>), look up the explanations in
<A HREF="../Dictionary.gf">Dictionary.gf</A>.
<P></P>
<LI><B>report an anomaly</B>: change or leave the rule, and add a comment prefixed by "---- ". For example,
<PRE>
lin back_Adv = variants{} ;
</PRE>
might become
<PRE>
lin back_Adv = mkAdv "en retour" ; ---- no exact translation in Fre
</PRE>
<P></P>
<LI><B>report a bad subcategory instance</B> (a common special case of anomaly):
add a comment prefixed by "---- subcat" to say that the current verb subcat instance
doesn't make sense to you. This can happen because the subcategories have partly been extracted automatically. It is still good to
put a suitable verb in place. For example,
<PRE>
lin come_VS = variants {} ;
</PRE>
might become
<PRE>
lin come_VS = mkVS I.venir_V ; ---- subcat
</PRE>
</UL>
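<P>
The comment conventions above can be checked mechanically. The following Python sketch, which is not part of the GF tools, classifies one <CODE>lin</CODE> rule by the comment after its terminating semicolon. The function name <CODE>classify</CODE> and the category labels are illustrative assumptions; any comment with exactly three dashes is treated as a split suggestion, and a leftover two-dash comment as an unchecked rule.
</P>

```python
def classify(rule):
    """Classify one lin rule by the comment that follows its semicolon."""
    # Assumption: the linearization itself contains no semicolon.
    _, separator, tail = rule.partition(";")
    if not separator:
        return "unknown"  # no terminating semicolon on this line
    tail = tail.strip()
    if not tail:
        return "accepted"
    # Longer dash prefixes must be tested before shorter ones.
    if tail.startswith("---- END edits"):
        return "end marker"
    if tail.startswith("---- subcat"):
        return "bad subcat"
    if tail.startswith("----"):
        return "anomaly"
    if tail.startswith("---"):
        return "split suggestion"
    if tail.startswith("--"):
        return "unchecked"
    return "unknown"

rules = [
    'lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ;',
    'lin back_Adv = mkAdv "en retour" ; ---- no exact translation in Fre',
    'lin come_VS = mkVS I.venir_V ; ---- subcat',
    'lin obviously_Adv = variants{} ; --',
]
for rule in rules:
    print(classify(rule), "|", rule)
```

<P>
Counting the rules classified as unchecked above the END marker is one way to see whether any rule in a batch was skipped.
</P>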
<P>
As general guidelines,
</P>
<UL>
<LI><B>Don't just do nothing</B>: do one of the things above for every rule, up to the point where you quit checking.
<LI><B>Maintain the format</B> of one rule per line, prefixed by <CODE>lin</CODE>.
<LI><B>Put the most standard translation as the first variant</B>, deprecated and slang words later.
<LI><B>Make sure that the morphology comes out correct</B>.
<LI><B>Make sure to return correct verb subcategorization</B>.
<LI><B>Don't split senses</B> if the difference is very small, e.g. stylistic rather than semantic.
</UL>
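<P>
The format guideline (one rule per line, prefixed by <CODE>lin</CODE>) can also be checked before committing. This is a minimal Python sketch, not part of the GF tools; the function name <CODE>format_problems</CODE> and the messages are illustrative assumptions, and the translation given for <CODE>once_Adv</CODE> in the sample is made up. It only inspects lines that start with <CODE>lin</CODE> and assumes that "--" never occurs inside a string literal.
</P>

```python
def format_problems(text):
    """Return (line number, message) pairs for lin rules that break the one-rule-per-line format."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith("lin "):
            continue  # headers, comments and blank lines are not checked
        # Assumption: "--" never occurs inside a string literal of a rule.
        code = stripped.split("--", 1)[0].rstrip()
        if not code.endswith(";"):
            problems.append((number, "rule does not end on this line"))
        elif code.count(";") != 1:
            problems.append((number, "more than one rule on this line"))
    return problems

# Sample data: the second rule spills over two lines, the last line holds two rules.
sample = "\n".join([
    'lin once_Adv = mkAdv "une fois" ;',
    'lin back_Adv = mkAdv "en retour"',
    '  ; ---- no exact translation in Fre',
    'lin come_VS = mkVS I.venir_V ; lin obviously_Adv = mkAdv "évidemment" ;',
])
for number, message in format_problems(sample):
    print(number, message)
```

<P>
The check is purely textual: it says nothing about morphology or subcategorization, which still have to be verified by compiling the file and inspecting the results.
</P>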
<!-- html code generated by txt2tags 2.6 (http://txt2tags.org) -->
<!-- cmdline: txt2tags -t html check-dictionary.t2t -->
</BODY></HTML>