added "todo" dictionaries

aarne
2014-04-06 19:19:51 +00:00
parent 82a333c602
commit 37c3afa9b4
17 changed files with 77456 additions and 17 deletions

@@ -1,5 +1,3 @@
--# -path=.:../bulgarian:../abstract:../common
concrete DictionaryBul of Dictionary = CatBul ** open MorphoBul, ResBul, (S = StructuralBul), ParadigmsBul, Prelude in {
flags

@@ -1,5 +1,3 @@
--# -path=.:alltenses
concrete DictionaryChi of Dictionary = CatChi ** open ParadigmsChi,
(S = StructuralChi),
(L = LexiconChi),

@@ -1,6 +1,3 @@
--# -coding=utf8
--# -path=.:../english
concrete DictionaryFre of Dictionary = CatFre ** open ParadigmsFre,
(S = StructuralFre),
(L = LexiconFre),

@@ -1,7 +1,7 @@
-- Hindi lexicon for GF, produced from:
-- Pushpak Bhattacharyya's Hindi WordNet
-- GF version generated by hdict2gf, Shafqat Virk March 2012
--# -path=.:../english:../abstract:../common:../hindustani
concrete DictionaryHin of Dictionary = CatHin ** open ParadigmsHin, Prelude, (S = StructuralHin), NounHin,ParamX,CommonHindustani in {
flags
coding=utf8 ;

@@ -1,12 +1,3 @@
lin listen_V = mkV "lyssnar" ; -- comment=2
lin show_N = mkN "show" "shower" ;
lin responsibility_N = mkN "ansvar" neutrum | mkN "tillräknelighet" ; -- SaldoWN -- comment=7
lin significant_A = mkA "signifikant" "signifikant" ;
lin deal_N = mkN "affär" "affärer" ;
lin prime_A = mkA "primär" ; -- comment=8
lin economy_N = mkN "sparsamhet" | mkN "besparing" ; -- SaldoWN -- comment=6
lin economy_2_N = mkN "sparsamhet" | mkN "besparing" ;
lin economy_1_N = mkN "ekonomi" "ekonomin" "ekonomier" "ekonomierna" ;
lin element_N = mkN "grundämne" | mkN "element" neutrum ; -- SaldoWN -- comment=8
lin finish_VA = mkVA (mkV "fullfölja") ; -- status=guess, src=wikt status=guess, src=wikt status=guess, src=wikt
lin finish_V2 = dirV2 (partV (mkV "putsar")"av"); -- comment=3

File diff suppressed because it is too large (10 files)

@@ -0,0 +1,136 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.org">
<TITLE>Checking GF Translation Dictionaries</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<CENTER>
<H1>Checking GF Translation Dictionaries</H1>
<FONT SIZE="4"><I>Aarne Ranta</I></FONT><BR>
<FONT SIZE="4">April 2014</FONT>
</CENTER>
<H2>Call for contributions: the generic translation dictionaries of GF</H2>
<P>
<B>Wanted</B>: manual checking of TopDict???.gf files in
<A HREF="http://www.grammaticalframework.org/lib/src/translator/todo">this directory</A>.
</P>
<P>
<B>Abstract syntax</B>: <A HREF="./TopDict.gf">TopDict</A>, the top-7000 English words from the British National Corpus, as sorted by frequency
<A HREF="http://www.kilgarriff.co.uk/BNClists/lemma.num">here</A>.
</P>
<P>
<B>Usage</B>: part of the general translation dictionaries, used for instance in the
<A HREF="http://www.grammaticalframework.org/demos/translation.html">GF translation demo</A>. The full dictionaries are the Dictionary* modules
in the <A HREF="../">parent directory</A>.
</P>
<P>
<B>Who</B>: anyone with good knowledge of the target language and with reasonable knowledge of the GF resource grammar paradigms for it.
</P>
<H2>How to do it</H2>
<P>
Follow these steps for your language. The examples below use ToCheckFre.gf for French; replace Fre with the code of any other language in this directory.
</P>
<OL>
<LI>Make sure to download the latest version of the file.
<LI>Make sure you can compile the original file:
<PRE>
gf ToCheckFre.gf +RTS -K64M
</PRE>
<LI>Edit the <CODE>lin</CODE> rules line by line, starting from the beginning. Follow the guidelines in the next section.
<LI>Mark the last rule you edit with "---- END edits by AR", with AR replaced by your own initials.
<LI>Make sure the resulting file compiles again.
<LI>Perform <CODE>diff</CODE> with the old and the new file, just to make sure your changes look reasonable.
<LI>Commit your edits into darcs, if you have access to it, or to GF Contributions, or send them by email to Aarne Ranta. In the last case,
it is enough to send the lin rules that you have processed.
<LI>Inform the gf-dev list that you have done this.
</OL>
<P>
A reasonable batch of revisions is 500 words or more, which should be doable in less than 2 hours. To avoid conflicts and overlapping work,
don't spend more than one day on a batch of work.
</P>
<H2>Guidelines</H2>
<P>
When editing a lin rule, do one of the following:
</P>
<UL>
<LI><B>accept the rule as it is</B>: replace the tail comment after the rule's terminating semicolon, if there is one, by your initials
in a systematic way. For example:
<PRE>
lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- tocheck
</PRE>
becomes
<PRE>
lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- AR
</PRE>
<UL>
<LI><B>change the linearization</B>: once you are satisfied with the result, leave only your initials as the comment. For example,
<PRE>
lin obviously_Adv = variants{} ;
</PRE>
becomes
<PRE>
lin obviously_Adv = mkAdv "évidemment" ; -- AR
</PRE>
</UL>
<LI><B>suggest a split of senses</B>: add a comment prefixed by "--- split" that lists the additional senses and explains them. For example,
<PRE>
lin labour_N = mkN "accouchement" masculine | mkN "ouvrage" masculine ; -- tocheck
</PRE>
might become
<PRE>
lin labour_N = mkN "travail" "travaux" masculine ; --- mkN "accouchement" childbirth labour -- AR
</PRE>
<LI><B>report an anomaly</B>: change or leave the rule, and add a comment prefixed by "---- ". For example,
<PRE>
lin back_Adv = variants{} ;
</PRE>
might become
<PRE>
lin back_Adv = mkAdv "en retour" ; ---- no exact translation in Fre -- AR
</PRE>
</UL>
<P>
As general guidelines,
</P>
<UL>
<LI><B>Don't just do nothing</B>, but do one of the things above, until the point where you quit checking.
<LI><B>Maintain the format</B> of one rule per line, prefixed by <CODE>lin</CODE>.
<LI><B>Put the most standard translation as the first variant</B>, deprecated and slang words later.
<LI><B>Make sure that the morphology comes out correct</B>.
<LI><B>Make sure to return correct verb subcategorization</B>.
<LI><B>Don't split senses</B> if the difference is very small, e.g. a difference of style rather than of meaning.
</UL>
<!-- html code generated by txt2tags 2.6 (http://txt2tags.org) -->
<!-- cmdline: txt2tags -thtml check-dictionary.t2t -->
</BODY></HTML>

@@ -0,0 +1,108 @@
Checking GF Translation Dictionaries
Aarne Ranta
April 2014
==Call for contributions: the generic translation dictionaries of GF==
**Wanted**: manual checking of TopDict???.gf files in
[this directory http://www.grammaticalframework.org/lib/src/translator/todo].
**Abstract syntax**: [TopDict ./TopDict.gf], the top-7000 English words from the British National Corpus, as sorted by frequency
[here http://www.kilgarriff.co.uk/BNClists/lemma.num].
**Usage**: part of the general translation dictionaries, used for instance in the
[GF translation demo http://www.grammaticalframework.org/demos/translation.html]. The full dictionaries are the Dictionary* modules
in the [parent directory ../].
**Who**: anyone with good knowledge of the target language and with reasonable knowledge of the GF resource grammar paradigms for it.
==How to do it==
Follow these steps for your language. The examples below use ToCheckFre.gf for French; replace Fre with the code of any other language in this directory.
+ Make sure to download the latest version of the file.
+ Make sure you can compile the original file:
```
gf ToCheckFre.gf +RTS -K64M
```
+ Edit the ``lin`` rules line by line, starting from the beginning. Follow the guidelines in the next section.
+ Mark the last rule you edit with "---- END edits by AR", with AR replaced by your own initials.
+ Make sure the resulting file compiles again.
+ Perform ``diff`` with the old and the new file, just to make sure your changes look reasonable.
+ Commit your edits into darcs, if you have access to it, or to GF Contributions, or send them by email to Aarne Ranta. In the last case,
it is enough to send the lin rules that you have processed.
+ Inform the gf-dev list that you have done this.
A reasonable batch of revisions is 500 words or more, which should be doable in less than 2 hours. To avoid conflicts and overlapping work,
don't spend more than one day on a batch of work.
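The ``diff`` review in step 6 can also be narrowed to the rules themselves. The following is a minimal sketch in Python, not part of the GF tooling: the names ``lin_rules`` and ``changed_rules`` are hypothetical, and the sketch assumes the one-rule-per-line format with function names made of letters, digits and underscores.

```python
import re

# Assumption: every rule sits on one line of the form "lin name = ... ;".
LIN_RULE = re.compile(r"^lin\s+(\w+)\s*=")


def lin_rules(text):
    """Map each function name to its full `lin` line."""
    rules = {}
    for line in text.splitlines():
        match = LIN_RULE.match(line)
        if match:
            rules[match.group(1)] = line.rstrip()
    return rules


def changed_rules(old_text, new_text):
    """Return {name: (old_line, new_line)} for rules that differ."""
    old, new = lin_rules(old_text), lin_rules(new_text)
    return {
        name: (old.get(name), line)
        for name, line in new.items()
        if old.get(name) != line
    }
```

Calling ``changed_rules`` on the contents of the old and the new file lists only the rules that were touched, which is also the set worth sending if the edits go by email.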
==Guidelines==
When editing a lin rule, do one of the following:
- **accept the rule as it is**: replace the tail comment after the rule's terminating semicolon, if there is one, by your initials
in a systematic way. For example:
```
lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- tocheck
```
becomes
```
lin maintain_V2 = mkV2 (mkV I.entretenir_V2) | mkV2 (mkV I.maintenir_V2) ; -- AR
```
- **change the linearization**: once you are satisfied with the result, leave only your initials as the comment. For example,
```
lin obviously_Adv = variants{} ;
```
becomes
```
lin obviously_Adv = mkAdv "évidemment" ; -- AR
```
- **suggest a split of senses**: add a comment prefixed by "--- split" that lists the additional senses and explains them. For example,
```
lin labour_N = mkN "accouchement" masculine | mkN "ouvrage" masculine ; -- tocheck
```
might become
```
lin labour_N = mkN "travail" "travaux" masculine ; --- mkN "accouchement" childbirth labour -- AR
```
To check the meanings of senses that have already been split (by using numbers, e.g. ``time_1_N``), look up the explanations in
[Dictionary.gf ../Dictionary.gf].
- **report an anomaly**: change or leave the rule, and add a comment prefixed by "---- ". For example,
```
lin back_Adv = variants{} ;
```
might become
```
lin back_Adv = mkAdv "en retour" ; ---- no exact translation in Fre -- AR
```
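The comment conventions above are regular enough to be read mechanically. As a rough sketch in Python (the helper names ``tail_comment`` and ``classify`` are hypothetical, and escaped quotes inside GF strings are ignored), the state of a rule can be derived from what follows its terminating semicolon:

```python
def tail_comment(rule):
    """Return the text after the first semicolon outside a string literal."""
    in_string = False
    for index, char in enumerate(rule):
        if char == '"':
            in_string = not in_string
        elif char == ";" and not in_string:
            return rule[index + 1:].strip()
    return ""


def classify(rule, initials):
    """Label a rule as anomaly, split, checked or unchecked."""
    tail = tail_comment(rule)
    if tail.startswith("----"):   # four dashes: reported anomaly
        return "anomaly"
    if tail.startswith("---"):    # three dashes: suggested sense split
        return "split"
    if tail == "-- " + initials:  # plain comment holding only the initials
        return "checked"
    return "unchecked"            # no comment, "-- tocheck", or anything else
```

The order of the tests matters, since a four-dash comment also starts with three dashes. A rule without any comment counts as unchecked here, which is an assumption of the sketch rather than a rule stated above.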
As general guidelines,
- **Don't just do nothing**, but do one of the things above, until the point where you quit checking.
- **Maintain the format** of one rule per line, prefixed by ``lin``.
- **Put the most standard translation as the first variant**, deprecated and slang words later.
- **Make sure that the morphology comes out correct**.
- **Make sure to return correct verb subcategorization**.
- **Don't split senses** if the difference is very small, e.g. a difference of style rather than of meaning.
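The one-rule-per-line format can be checked before compiling. The Python sketch below is only an illustration: ``broken_rules`` is a hypothetical name, and the check assumes that string literals contain no escaped quotes. It reports every ``lin`` line that has no terminating semicolon before its comment, which usually means that the rule has spilled onto the next line.

```python
import re


def broken_rules(text):
    """Return 1-based numbers of `lin` lines without a terminating semicolon."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.startswith("lin "):
            continue
        # Blank out string literals, then cut off the tail comment.
        code = re.sub(r'"[^"]*"', '""', line).split("--", 1)[0]
        if ";" not in code:
            bad.append(number)
    return bad
```

This does not replace compiling the file with ``gf``; it only points at the lines to look at first.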