Commit Graph

232 Commits

Author SHA1 Message Date
kr.angelov
90c3304147 remove the pgf2yaml tool which was both broken and redundant. The declarations for generic programming from data.c are removed as well 2013-02-11 13:51:12 +00:00
kr.angelov
10ef298fa0 the grammar reader in the C runtime is completely rewritten and it doesn't use the generic programming API 2013-02-11 10:16:58 +00:00
kr.angelov
5e2474e346 This patch removes Gregoire's parse_tokens function in the python binding and adds another implementation which builds on the existing API for lexers in the C runtime. Now it is possible to write incremental Lexers in Python 2013-02-01 09:29:43 +00:00
gregoire.detrez
0aae4702ed Python binding: add a parsing function that directly accepts a list of tokens. It allows defining a tokenizer in Python (or using an existing one, from nltk for instance). 2013-01-24 13:31:34 +00:00
kr.angelov
2c169406fc a new reasoner in the C runtime. It supports tabling which makes it decidable for propositional logic. dependent types and higher-order types are not supported yet. The generation is still in decreasing probability order 2013-01-07 12:50:32 +00:00
kr.angelov
75696808a7 bugfix: the linearizer should not generate extra space at the end of the sentence 2012-12-19 11:18:34 +00:00
kr.angelov
87360ccc34 bugfix for linearization of metavariables at the root of a tree 2012-12-19 10:03:05 +00:00
kr.angelov
a28ccc965c rename linearize.{h/c} to linearizer.{h/c} which follows the convention used in parser.c and reasoner.c 2012-12-19 09:17:24 +00:00
kr.angelov
490a3f2286 a major reimplementation of the linearizer in the C runtime 2012-12-19 09:07:05 +00:00
kr.angelov
403420be2b the C runtime now can read abstract expressions with literals and meta variables 2012-12-18 12:29:30 +00:00
kr.angelov
d12c604f9a debugging infrastructure in the reasoner 2012-12-14 21:25:00 +00:00
kr.angelov
16a2c38f38 bugfix for the reasoner in the C runtime 2012-12-14 21:24:17 +00:00
kr.angelov
20aaa4a989 The first prototype for exhaustive generation in the C runtime. The trees are always listed in decreasing probability order. There is also an API for generation from Python 2012-12-14 15:32:49 +00:00
kr.angelov
f7a5eb0df1 bugfix in the lexer from the C runtime. the input sentence doesn't have to terminate with whitespace 2012-12-13 16:45:44 +00:00
kr.angelov
14e721dda9 a top-level API for parsing in the C runtime 2012-12-13 14:44:33 +00:00
kr.angelov
68249a11d2 bugfix: the outside probability of a PgfItemConts must always be initialized to zero 2012-12-13 11:11:45 +00:00
kr.angelov
2dc8236170 bugfix: pgf_read_expr no longer requires a semicolon at the end of an abstract expression 2012-12-13 11:09:26 +00:00
kr.angelov
aa13090b66 started an official API to the C runtime 2012-12-12 11:25:58 +00:00
kr.angelov
5779887f96 bugfix for robust parsing with multi-word units 2012-12-11 12:57:22 +00:00
kr.angelov
e174f37940 added experimental script for chunking in the C runtime 2012-12-03 10:07:54 +00:00
kr.angelov
5e3b23325e remove the duplicated definition of PgfProductionIdx in parser.c 2012-11-19 14:16:31 +00:00
kr.angelov
954d7a7ff5 bugfix for the building of bottom-up filter in the C runtime 2012-11-16 13:27:15 +00:00
kr.angelov
5c52eaf0b7 revised heuristic in the statistical parser 2012-11-14 12:34:22 +00:00
kr.angelov
468464faca bugfix in the statistical parser 2012-11-13 09:48:23 +00:00
kr.angelov
d1044b202a two simple heuristics which speed up the statistical parser more than seven times. 2012-11-12 22:17:40 +00:00
kr.angelov
182e366f5d a simple refactoring in the statistical parser 2012-11-12 21:48:22 +00:00
kr.angelov
7ad4436502 more counters in the profiler for the statistical parser 2012-11-12 15:36:21 +00:00
kr.angelov
9b2487243e now we store the state instead of the offset for every continuation in the chart for the statistical parser 2012-11-12 14:04:52 +00:00
kr.angelov
c28056c4e5 in the statistical parser: move the outside probability from the parse items to their continuation. this makes the value slot shared between many items 2012-11-12 13:43:43 +00:00
kr.angelov
56f3ff8202 small refactoring in the C runtime 2012-11-12 13:05:35 +00:00
kr.angelov
cce22a7f7a use size_t consistently as the type for constituent indices in the C runtime 2012-11-12 12:51:27 +00:00
kr.angelov
c679b08b38 use prob_t instead of float in a few places 2012-10-29 08:52:56 +00:00
kr.angelov
118333eee8 forgot to add one #ifdef 2012-10-25 18:37:22 +00:00
kr.angelov
d185938952 a major refactoring in the robust parser: bottom-up filtering and garbage collection for the chart 2012-10-25 14:42:53 +00:00
kr.angelov
bf49f3c246 now the meta probability for a category is explicitly specified in the statistical model instead of computed internally. this avoids rounding errors while computing the sum of a large number of small values. 2012-09-24 09:37:21 +00:00
kr.angelov
8b28b89ffc in the robust parser we don't have to care about trees which yield empty strings. this makes the parser a lot faster 2012-09-24 09:30:20 +00:00
kr.angelov
a307ed6c75 the C runtime now has a type prob_t which is used only for probability values 2012-09-18 09:18:48 +00:00
kr.angelov
86b5ec7447 bugfix in the C parser 2012-09-06 14:52:19 +00:00
kr.angelov
3ad5493758 Use a separate tag for meta productions in the robust parser. This cleans up the code a lot 2012-06-13 05:49:30 +00:00
kr.angelov
c9c5675e1d now there is a limit of 2000000 items in the chart of the robust parser. This prevents an explosion in memory size but it will also prevent us from parsing some sentences. 2012-06-12 11:30:01 +00:00
kr.angelov
b27a440ef3 now the robust parser is purely top-down and the meta rules compete on a fair basis with the grammar rules 2012-06-12 09:29:51 +00:00
kr.angelov
06f9965d27 the viterbi probability for the epsilon categories is now updated properly 2012-05-25 07:30:35 +00:00
kr.angelov
f4c17cb7aa another attempt to port the robust parser to MacOS 2012-05-16 15:18:44 +00:00
kr.angelov
a6800fc0da a new unbiased statistical parser. it is still far from perfect; use it at your own risk. 2012-05-08 12:13:28 +00:00
kr.angelov
931066f6fc yet another fix for parsing literals 2012-04-18 15:50:55 +00:00
kr.angelov
17bc8e5c89 some fixes in the robust parser and a new API for literals 2012-04-12 06:55:25 +00:00
kr.angelov
6644d93ec2 simple cleanup in the robust parser 2012-04-02 19:01:18 +00:00
kr.angelov
230f309317 libpgf: a new implementation for literals which also allows custom literals. the same mechanism is now used for the metavariables 2012-03-12 14:25:51 +00:00
kr.angelov
1726995921 libpgf: added simple lexer 2012-03-09 09:14:44 +00:00
kr.angelov
ed5de8335b libpgf: implementation for built in literal categories 2012-03-07 16:39:29 +00:00