java.lang.Object
  ch.uzh.ifi.attempto.chartparser.Grammar
public class Grammar
This class represents a grammar that is needed to run the chart parser. A grammar can be created either directly in Java or on the basis of a file in the Codeco format.

Simple grammar rules in Codeco look like common Prolog DCG rules, with the operator "-->" replaced by "=>":

    vp => v, np.
    v => [does, not], verb.

Note that terminal categories are in square brackets.
Complex grammar rules in Codeco differ from common Prolog DCG rules in that they use features rather than arguments with fixed positions. Arguments are recognized not by their position but by their name:

    vp(num:Num,neg:Neg) => v(num:Num,neg:Neg,type:tr), np(case:acc).
    v(neg:plus,type:Type) => [does, not], verb(type:Type).

Every feature has the form Name:Value, where Name has to be an atom and Value can be a variable or an atom (but not a compound term). Terminal categories (in square brackets) can also have features, and they can be used as pre-terminals that are instantiated with the concrete word forms outside of the parser:

    np => [noun(text:Noun)].
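
The name-based matching of features described above can be sketched in Java. This is a minimal illustration, not the parser's actual implementation: the class `FeatureUnifier` and its methods are hypothetical, features are stored as a map from name to value, a value counts as a variable if it starts with an upper-case letter or an underscore (as in Prolog), and a feature that appears on only one side is assumed to impose no constraint. Variable names are assumed to be distinct between the two categories being matched.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: matching two categories whose features are identified by name. */
class FeatureUnifier {

    /** Prolog-style convention: a variable starts with an upper-case letter or "_". */
    static boolean isVariable(String value) {
        char c = value.charAt(0);
        return Character.isUpperCase(c) || c == '_';
    }

    /** Follows variable bindings until an atom or an unbound variable is reached. */
    static String resolve(String value, Map<String, String> bindings) {
        while (isVariable(value) && bindings.containsKey(value)) {
            value = bindings.get(value);
        }
        return value;
    }

    /**
     * Unifies two feature maps (feature name -> atom or variable).
     * Returns the resulting variable bindings, or null if two atoms clash.
     * Assumption: a feature present on only one side imposes no constraint.
     */
    static Map<String, String> unify(Map<String, String> a, Map<String, String> b) {
        Map<String, String> bindings = new HashMap<>();
        for (Map.Entry<String, String> e : a.entrySet()) {
            if (!b.containsKey(e.getKey())) continue;
            String x = resolve(e.getValue(), bindings);
            String y = resolve(b.get(e.getKey()), bindings);
            if (x.equals(y)) continue;
            if (isVariable(x)) bindings.put(x, y);
            else if (isVariable(y)) bindings.put(y, x);
            else return null; // two different atoms cannot be unified
        }
        return bindings;
    }

    /** Convenience helper: features("num", "Num", "type", "tr") builds {num=Num, type=tr}. */
    static Map<String, String> features(String... namesAndValues) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i < namesAndValues.length; i += 2) {
            m.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        // v(num:Num,neg:Neg,type:tr) matched against v(neg:plus,type:Type):
        // the order of the features and the missing "num" feature do not matter.
        Map<String, String> body = features("num", "Num", "neg", "Neg", "type", "tr");
        Map<String, String> head = features("neg", "plus", "type", "Type");
        Map<String, String> bindings = unify(body, head);
        System.out.println("Neg bound to " + bindings.get("Neg"));   // Neg bound to plus
        System.out.println("Type bound to " + bindings.get("Type")); // Type bound to tr
    }
}
```

Because the features are looked up by name, the two categories in `main` unify although they list their features in a different order and with a different number of entries.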
Codeco provides special support for anaphoric references. Anaphoric references are used in (controlled) natural languages to refer to objects mentioned earlier in the sentence. For example, in the sentence

    A country contains an area that is not controlled by the country.

the anaphoric reference "the country" refers to the antecedent "a country". In Codeco, anaphoric references are defined by the special categories ">>>" and "<<<". ">>>" marks a position in the text to which anaphoric references can refer (such positions are called "antecedents"). "<<<" refers back to the closest possible antecedent. An example is shown here:

    np => [a, noun(text:Noun)], >>>(type:noun, noun:Noun).
    ref => [the, noun(text:Noun)], <<<(type:noun, noun:Noun).

Furthermore, the special category "</<" can be used to ensure that there is no matching antecedent. This can be used, for example, for variables to ensure that no variable is defined twice:

    np => [var(text:Var)], </<(type:var, var:Var), >>>(type:var, var:Var).
    ref => [var(text:Var)], <<<(type:var, var:Var).

In order to be predicted correctly by the chart parser, the back-referring categories "<<<" and "</<" should be immediately preceded by a terminal category.
Anaphoric references are only allowed if the previous text contains a matching antecedent that is accessible. For example, in the case of the partial sentence

    A country does not contain a river and borders ...

one can refer to "a country", but not to "a river", because being in the scope of a negation makes it inaccessible.

In order to define the accessibility constraints needed for anaphoric references, we distinguish two types of grammar rules: accessible rules "=>" and inaccessible rules "~>". The following example shows an inaccessible rule:

    vp(num:Num,neg:plus) ~> v(num:Num,neg:plus,type:tr), np(case:acc).

Inaccessible rules are handled in the same way as accessible rules, with the only exception that the components that are in the scope of the rule are not accessible for subsequent anaphoric references. This can be visualized by the introduction of a special node "~" in the syntax tree whenever an inaccessible rule is used. For the partial sentence introduced before, the syntax tree could look as follows:

[Figure: syntax tree of the partial sentence, with "*" marking the position at which the sentence continues]

In this case, several accessible rules and exactly one inaccessible rule (indicated by the "~"-node) have been used. All preceding components that can be reached through the syntax tree without traversing a "~"-node in the top-down direction are accessible. Thus, "a country" is accessible from the position "*", but "a river" is not. Furthermore, "a country" would be accessible from the position of "a river" because the "~"-node is in this case traversed only in the bottom-up direction.
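
The accessibility rule can be made concrete with a short Java sketch. Everything in it is an assumption made for illustration: the class `AccessibilityDemo`, its nested `Node` class, and the shape of the example tree are hypothetical and only approximate the tree described above. The sketch walks from a position up to the root (which may pass through a "~"-node) and, at each level, descends into the preceding siblings without ever entering a "~"-node.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the accessibility check on a syntax tree. */
class AccessibilityDemo {

    /** Syntax-tree node; the label "~" marks the use of an inaccessible rule. */
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                k.parent = this;
                children.add(k);
            }
        }

        /** Collects the leaves below this node without entering a "~"-node top-down. */
        void collectLeaves(List<String> out) {
            if (label.equals("~")) return;
            if (children.isEmpty()) out.add(label);
            for (Node c : children) c.collectLeaves(out);
        }

        /** Returns the preceding leaves that are accessible from this node, in text order. */
        List<String> accessibleLeaves() {
            List<String> result = new ArrayList<>();
            // Walk bottom-up; passing through a "~"-node in this direction is allowed.
            for (Node child = this; child.parent != null; child = child.parent) {
                List<String> found = new ArrayList<>();
                for (Node sibling : child.parent.children) {
                    if (sibling == child) break;   // only preceding siblings count
                    sibling.collectLeaves(found);  // top-down, stops at "~"
                }
                result.addAll(0, found);           // earlier text goes first
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Node river = new Node("a river");
        Node star = new Node("*");
        // Assumed tree for "A country does not contain a river and borders *".
        new Node("s",
            new Node("np", new Node("a country")),
            new Node("vp",
                new Node("~", new Node("vp", new Node("does not contain"), river)),
                new Node("and"),
                new Node("vp", new Node("borders"), star)));
        System.out.println(star.accessibleLeaves());  // [a country, and, borders]
        System.out.println(river.accessibleLeaves()); // [a country, does not contain]
    }
}
```

From "*", the subtree below the "~"-node is skipped, so "a river" is not listed; from "a river", the walk leaves the "~"-node in the bottom-up direction and still reaches "a country".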
A Java class can be generated from a grammar in Codeco format like this:

    swipl -s generate_java.pl -g "generate_java('my_codeco_grammar.pl', 'my.package', 'MyJavaGrammar', 'my_start_category')" -t halt

Note that the SWI Prolog command might be different on your machine (e.g. "plcon" or "pl").

The Prolog DCG file can be generated like this:

    swipl -s generate_dcg.pl -g "generate_dcg('my_codeco_grammar.pl', 'my_dcg_grammar.pl')" -t halt

Constructor Summary | |
---|---|
Grammar(Nonterminal startCategory) | Creates an empty grammar with the given start category. |
Grammar(java.lang.String startCategoryName) | Creates an empty grammar with a start category of the given name. |

Method Summary | | |
---|---|---|
void | addRule(Rule rule) | Adds the rule to the grammar. |
java.util.ArrayList<Rule> | getEpsilonRules() | Returns all the rules that have no body categories. |
java.util.ArrayList<Rule> | getRulesByHeadName(java.lang.String name) | Returns the rules whose head category has the given name. |
Nonterminal | getStartCategory() | Returns the start category. |
java.lang.String | toString() | |

Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |

Constructor Detail |
---|

public Grammar(Nonterminal startCategory)
Parameters: startCategory - The start category for the grammar.

public Grammar(java.lang.String startCategoryName)
Parameters: startCategoryName - The name of the start category for the grammar.

Method Detail |
---|

public Nonterminal getStartCategory()

public void addRule(Rule rule)
Parameters: rule - The rule to be added.

public java.util.ArrayList<Rule> getRulesByHeadName(java.lang.String name)
Parameters: name - The name of the head category.

public java.util.ArrayList<Rule> getEpsilonRules()

public java.lang.String toString()
Overrides: toString in class java.lang.Object
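
To show how the methods listed above relate to each other, here is a self-contained Java sketch. It does not use the chartparser package: the nested `Grammar` and `Rule` classes are simplified stand-ins that only mirror the documented method names, the `Rule` constructor is a hypothetical one invented for this example, and categories are reduced to plain names (so `getStartCategory()` returns a `String` here rather than a `Nonterminal`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Self-contained stand-ins that only mirror the documented method names. */
class GrammarSketch {

    /** Stand-in rule: a head category name and zero or more body category names. */
    static class Rule {
        final String headName;
        final List<String> bodyNames;

        Rule(String headName, String... bodyNames) { // hypothetical constructor
            this.headName = headName;
            this.bodyNames = Arrays.asList(bodyNames);
        }
    }

    /** Stand-in grammar: a start category name plus the rules added so far. */
    static class Grammar {
        private final String startCategoryName;
        private final ArrayList<Rule> rules = new ArrayList<>();

        Grammar(String startCategoryName) {
            this.startCategoryName = startCategoryName;
        }

        String getStartCategory() {
            return startCategoryName;
        }

        void addRule(Rule rule) {
            rules.add(rule);
        }

        /** Returns the rules whose head category has the given name. */
        ArrayList<Rule> getRulesByHeadName(String name) {
            ArrayList<Rule> result = new ArrayList<>();
            for (Rule r : rules) {
                if (r.headName.equals(name)) result.add(r);
            }
            return result;
        }

        /** Returns all the rules that have no body categories. */
        ArrayList<Rule> getEpsilonRules() {
            ArrayList<Rule> result = new ArrayList<>();
            for (Rule r : rules) {
                if (r.bodyNames.isEmpty()) result.add(r);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Grammar grammar = new Grammar("s");
        grammar.addRule(new Rule("s", "np", "vp"));
        grammar.addRule(new Rule("vp", "v", "np"));
        grammar.addRule(new Rule("vp", "v"));
        grammar.addRule(new Rule("adv")); // a rule without body categories
        System.out.println(grammar.getRulesByHeadName("vp").size()); // 2
        System.out.println(grammar.getEpsilonRules().size());        // 1
    }
}
```

The real classes carry more information (for example features and the distinction between accessible and inaccessible rules), so this sketch should be read only as an outline of what the lookup methods return.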