added "todo" dictionaries

2026-06-08 09:36:31 -06:00 · 2014-04-06 19:19:51 +00:00
parent 82a333c602
commit 37c3afa9b4
17 changed files with 77456 additions and 17 deletions
--- a/lib/src/translator/DictionaryBul.gf
+++ b/lib/src/translator/DictionaryBul.gf
@@ -1,5 +1,3 @@
--# -path=.:../bulgarian:../abstract:../common
-
 concrete DictionaryBul of Dictionary = CatBul ** open MorphoBul, ResBul, (S = StructuralBul), ParadigmsBul, Prelude in {

 flags
--- a/lib/src/translator/DictionaryChi.gf
+++ b/lib/src/translator/DictionaryChi.gf
@@ -1,5 +1,3 @@
--# -path=.:alltenses
-
 concrete DictionaryChi of Dictionary = CatChi ** open ParadigmsChi,
  (S = StructuralChi),
  (L = LexiconChi),
--- a/lib/src/translator/DictionaryFre.gf
+++ b/lib/src/translator/DictionaryFre.gf
@@ -1,6 +1,3 @@
--# -coding=utf8
--# -path=.:../english
-
 concrete DictionaryFre of Dictionary = CatFre ** open ParadigmsFre,
  (S = StructuralFre),
  (L = LexiconFre),
--- a/lib/src/translator/DictionaryHin.gf
+++ b/lib/src/translator/DictionaryHin.gf
@@ -1,7 +1,7 @@
 -- Hindi lexicon for GF, produced from:
 -- Pushpak Bhattacharyya's Hindi WordNet
 -- GF version generated by hdict2gf, Shafqat Virk March 2012
--# -path=.:../english:../abstract:../common:../hindustani
+
 concrete DictionaryHin of Dictionary = CatHin ** open ParadigmsHin, Prelude, (S = StructuralHin), NounHin,ParamX,CommonHindustani in {
 flags
 coding=utf8 ;
--- a/lib/src/translator/bncswe.txt
+++ b/lib/src/translator/bncswe.txt
@@ -1,12 +1,3 @@
-lin listen_V = mkV "lyssnar" ; -- comment=2
-lin show_N = mkN "show" "shower" ;
-lin responsibility_N = mkN "ansvar" neutrum | mkN "tillräknelighet" ; -- SaldoWN -- comment=7
-lin significant_A = mkA "signifikant" "signifikant" ;
-lin deal_N = mkN "affär" "affärer" ;
-lin prime_A = mkA "primär" ; -- comment=8
-lin economy_N = mkN "sparsamhet" | mkN "besparing" ; -- SaldoWN -- comment=6
-lin economy_2_N = mkN "sparsamhet" | mkN "besparing" ;
-lin economy_1_N = mkN "ekonomi" "ekonomin" "ekonomier" "ekonomierna" ;
 lin element_N = mkN "grundämne" | mkN "element" neutrum ; -- SaldoWN -- comment=8
 lin finish_VA = mkVA (mkV "fullfölja") ; -- status=guess, src=wikt status=guess, src=wikt status=guess, src=wikt
 lin finish_V2 = dirV2 (partV (mkV "putsar")"av"); -- comment=3
--- a/lib/src/translator/todo/TopDict.gf
+++ b/lib/src/translator/todo/TopDict.gf
--- a/lib/src/translator/todo/TopDictBul.gf
+++ b/lib/src/translator/todo/TopDictBul.gf
--- a/lib/src/translator/todo/TopDictChi.gf
+++ b/lib/src/translator/todo/TopDictChi.gf
--- a/lib/src/translator/todo/TopDictFin.gf
+++ b/lib/src/translator/todo/TopDictFin.gf
--- a/lib/src/translator/todo/TopDictFre.gf
+++ b/lib/src/translator/todo/TopDictFre.gf
--- a/lib/src/translator/todo/TopDictGer.gf
+++ b/lib/src/translator/todo/TopDictGer.gf
--- a/lib/src/translator/todo/TopDictHin.gf
+++ b/lib/src/translator/todo/TopDictHin.gf
--- a/lib/src/translator/todo/TopDictIta.gf
+++ b/lib/src/translator/todo/TopDictIta.gf
--- a/lib/src/translator/todo/TopDictSpa.gf
+++ b/lib/src/translator/todo/TopDictSpa.gf
--- a/lib/src/translator/todo/TopDictSwe.gf
+++ b/lib/src/translator/todo/TopDictSwe.gf
--- a/lib/src/translator/todo/check-dictionary.html
+++ b/lib/src/translator/todo/check-dictionary.html
@@ -0,0 +1,136 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
+<HTML>
+<HEAD>
+<META NAME="generator" CONTENT="http://txt2tags.org">
+<TITLE>Checking GF Translation Dictionaries</TITLE>
+</HEAD><BODY BGCOLOR="white" TEXT="black">
+<CENTER>
+<H1>Checking GF Translation Dictionaries</H1>
+<FONT SIZE="4"><I>Aarne Ranta</I></FONT><BR>
+<FONT SIZE="4">April 2014</FONT>
+</CENTER>
+
+
+<H2>Call for contributions: the generic translation dictionaries of GF</H2>
+
+<P>
+<B>Wanted</B>: manual checking of TopDict???.gf files in
+<A HREF="http://www.grammaticalframework.org/lib/src/translator/todo">this directory</A>.
+</P>
+<P>
+<B>Abstract syntax</B>: <A HREF="./TopDict.gf">TopDict</A>, the top 7000 English words from the British National Corpus, as sorted by frequency
+<A HREF="http://www.kilgarriff.co.uk/BNClists/lemma.num">here</A>. 
+</P>
+<P>
+<B>Usage</B>: part of the general translation dictionaries, used for instance in the 
+<A HREF="http://www.grammaticalframework.org/demos/translation.html">GF translation demo</A>. The full dictionaries are the Dictionary* modules
+in the <A HREF="../">parent directory</A>.
+</P>
+<P>
+<B>Who</B>: anyone with good knowledge of the target language and with reasonable knowledge of the GF resource grammar paradigms for it.
+</P>
+
+<H2>How to do it</H2>
+
+<P>
+Follow these steps for your language. The examples use TopDictFre.gf; for another language in this directory, substitute its code for Fre.
+</P>
+
+<OL>
+<LI>Make sure to download the latest version of the file.
+<LI>Make sure you can compile the original file:
+
+<PRE>
+    gf TopDictFre.gf +RTS -K64M
+</PRE>
+
+<LI>Edit the <CODE>lin</CODE> rules line by line, starting from the beginning. Follow the guidelines in the next section.
+<LI>Mark the last rule you edit with "---- END edits by AR", where AR is your initials.
+<LI>Make sure the resulting file compiles again.
+<LI>Run <CODE>diff</CODE> on the old and the new file to make sure your changes look reasonable.
+<LI>Submit your edits: commit them to darcs if you have access to it, send them to GF Contributions, or email them to Aarne Ranta. In the last case,
+  it is enough to send the lin rules that you have processed.
+<LI>Inform the gf-dev list that you have done this.
+</OL>
+
+<P>
+A reasonable batch of revisions is 500 words or more, which should be doable in less than 2 hours. To avoid conflicts and overlapping work,
+don't spend more than one day on a batch of work.
+</P>
+
+<H2>Guidelines</H2>
+
+<P>
+When editing a lin rule, do one of the following:
+</P>
+
+<UL>
+<LI><B>accept the rule as it is</B>: replace the tail comment after the rule's terminating semicolon, if there is one, with your initials,
+  always written the same way. For example:
+
+<PRE>
+    lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- tocheck
+</PRE>
+
+   becomes
+
+<PRE>
+    lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- AR
+</PRE>
+
+  <UL>
+  <LI>change the linearization and, if the result is OK for you, just leave your initials as a comment. For example,
+
+<PRE>
+    lin obviously_Adv = variants{} ;
+</PRE>
+
+  becomes
+
+<PRE>
+    lin obviously_Adv = mkAdv "évidemment" ; -- AR
+</PRE>
+
+  </UL>
+<LI><B>suggest a split of senses</B>: add a comment prefixed by "--- split" that lists the additional senses and explains them. For example,
+
+<PRE>
+    lin labour_N = mkN "accouchement" masculine | mkN "ouvrage" masculine ; -- tocheck
+</PRE>
+
+  might become
+
+<PRE>
+    lin labour_N = mkN "travail" "travaux" masculine ; --- mkN "accouchement" childbirth labour -- AR
+</PRE>
+
+<LI><B>report an anomaly</B>: change or leave the rule, and add a comment prefixed by "---- ". For example,
+
+<PRE>
+    lin back_Adv = variants{} ;
+</PRE>
+
+  might become
+
+<PRE>
+    lin back_Adv = mkAdv "en retour" ; ---- no exact translation in Fre -- AR
+</PRE>
+
+</UL>
+
+<P>
+As general guidelines,
+</P>
+
+<UL>
+<LI><B>Don't just do nothing</B>, but do one of the things above, until the point where you quit checking.
+<LI><B>Maintain the format</B> of one rule per line, prefixed by <CODE>lin</CODE>.
+<LI><B>Put the most standard translation as the first variant</B>, deprecated and slang words later.
+<LI><B>Make sure that the morphology comes out correct</B>.
+<LI><B>Make sure that the verb subcategorization is correct</B>.
+<LI><B>Don't split senses</B> if the difference is very small, e.g. a difference of style rather than of meaning.
+</UL>
+
+<!-- html code generated by txt2tags 2.6 (http://txt2tags.org) -->
+<!-- cmdline: txt2tags -thtml check-dictionary.t2t -->
+</BODY></HTML>
--- a/lib/src/translator/todo/check-dictionary.t2t
+++ b/lib/src/translator/todo/check-dictionary.t2t
@@ -0,0 +1,108 @@
+Checking GF Translation Dictionaries
+Aarne Ranta
+April 2014
+
+==Call for contributions: the generic translation dictionaries of GF==
+
+**Wanted**: manual checking of TopDict???.gf files in
+[this directory http://www.grammaticalframework.org/lib/src/translator/todo].
+
+**Abstract syntax**: [TopDict ./TopDict.gf], the top 7000 English words from the British National Corpus, as sorted by frequency
+[here http://www.kilgarriff.co.uk/BNClists/lemma.num]. 
+
+**Usage**: part of the general translation dictionaries, used for instance in the 
+[GF translation demo http://www.grammaticalframework.org/demos/translation.html]. The full dictionaries are the Dictionary* modules
+in the [parent directory ../].
+
+**Who**: anyone with good knowledge of the target language and with reasonable knowledge of the GF resource grammar paradigms for it.
+
+
+==How to do it==
+
+Follow these steps for your language. The examples use TopDictFre.gf; for another language in this directory, substitute its code for Fre.
+ Make sure to download the latest version of the file.
+ Make sure you can compile the original file:
+```
+  gf TopDictFre.gf +RTS -K64M
+```
+ Edit the ``lin`` rules line by line, starting from the beginning. Follow the guidelines in the next section.
+ Mark the last rule you edit with "---- END edits by AR", where AR is your initials.
+ Make sure the resulting file compiles again.
+ Run ``diff`` on the old and the new file to make sure your changes look reasonable.
+ Submit your edits: commit them to darcs if you have access to it, send them to GF Contributions, or email them to Aarne Ranta. In the last case,
+  it is enough to send the lin rules that you have processed.
+ Inform the gf-dev list that you have done this.
+
+
+A reasonable batch of revisions is 500 words or more, which should be doable in less than 2 hours. To avoid conflicts and overlapping work,
+don't spend more than one day on a batch of work.
+
+
+
+
+
+
+
+
+
+==Guidelines==
+
+When editing a lin rule, do one of the following:
+- **accept the rule as it is**: replace the tail comment after the rule's terminating semicolon, if there is one, with your initials,
+  always written the same way. For example:
+```
+  lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- tocheck
+```    
+   becomes
+```
+  lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- AR
+```    
+  - change the linearization and, if the result is OK for you, just leave your initials as a comment. For example,
+```
+  lin obviously_Adv = variants{} ;
+```    
+  becomes
+```
+  lin obviously_Adv = mkAdv "évidemment" ; -- AR
+```    
+- **suggest a split of senses**: add a comment prefixed by "--- split" that lists the additional senses and explains them. For example,
+```
+  lin labour_N = mkN "accouchement" masculine | mkN "ouvrage" masculine ; -- tocheck
+```
+  might become
+```
+  lin labour_N = mkN "travail" "travaux" masculine ; --- mkN "accouchement" childbirth labour -- AR
+```
+  To check the meanings of senses that have already been split (by using numbers, e.g. ``time_1_N``), look up the explanations in
+  [Dictionary.gf ../Dictionary.gf].
+- **report an anomaly**: change or leave the rule, and add a comment prefixed by "---- ". For example,
+```
+  lin back_Adv = variants{} ;
+``` 
+  might become
+```
+  lin back_Adv = mkAdv "en retour" ; ---- no exact translation in Fre -- AR
+``` 
+
+
+As general guidelines,
+- **Don't just do nothing**, but do one of the things above, until the point where you quit checking.
+- **Maintain the format** of one rule per line, prefixed by ``lin``.
+- **Put the most standard translation as the first variant**, deprecated and slang words later.
+- **Make sure that the morphology comes out correct**.
+- **Make sure that the verb subcategorization is correct**.
+- **Don't split senses** if the difference is very small, e.g. a difference of style rather than of meaning.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
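
Note on the checking workflow documented in check-dictionary.t2t: the guidelines mark each processed rule with the checker's initials, leave unprocessed rules tagged "-- tocheck", and leave untranslated rules as "variants{}". A checker may want a quick count of how far a batch has progressed before running diff. The following is a minimal sketch of such a count, assuming a POSIX shell and grep; the function name count_rules and the temporary sample file are hypothetical illustrations and not part of GF or of this commit. The sample rules are copied from the examples in the guidelines.

```shell
#!/bin/sh
# Sketch only: summarise checking progress in a TopDict file.
# count_rules and the sample file are hypothetical, not part of GF.

# Print "total checked tocheck empty" for file $1 and reviewer initials $2.
count_rules() {
    file=$1
    initials=$2
    total=$(grep -c -e '^lin ' "$file")              # all lin rules
    checked=$(grep -c -e "-- $initials\$" "$file")   # rules ending in "-- AR"
    tocheck=$(grep -c -e '-- tocheck' "$file")       # rules still tagged tocheck
    empty=$(grep -c -e 'variants{}' "$file")         # rules with no translation
    echo "$total $checked $tocheck $empty"
}

# Self-contained sample built from rules quoted in the guidelines.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- AR
lin labour_N = mkN "accouchement" masculine | mkN "ouvrage" masculine ; -- tocheck
lin obviously_Adv = variants{} ;
lin back_Adv = variants{} ;
EOF

summary=$(count_rules "$sample" AR)
echo "total checked tocheck empty: $summary"
```

On a real file the call would be something like count_rules TopDictFre.gf AR, with AR replaced by your own initials. The count relies on the one-rule-per-line format that the guidelines ask checkers to maintain; rules spread over several lines would be miscounted.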