ch.uzh.ifi.attempto.chartparser
Class Grammar

java.lang.Object
  extended by ch.uzh.ifi.attempto.chartparser.Grammar
Direct Known Subclasses:
ACEGrammar, ExampleGrammar, StandardGrammar

public class Grammar
extends java.lang.Object

This class represents a grammar that is needed to run the chart parser. A grammar can be created either directly in Java or on the basis of a file in the Codeco format.

Codeco Format

Codeco stands for "COncrete and DEclarative grammar notation for COntrolled natural languages" and uses Prolog notation to represent grammars in a readable way. Simple grammar rules in Codeco look almost the same as common Prolog DCG rules; the only difference is that the operator "-->" is replaced by "=>":
 vp => v, np.
 v => [does, not], verb.
Note that terminal categories are in square brackets.

Complex grammar rules in Codeco differ from common Prolog DCG rules in that they use features rather than arguments with fixed positions. Arguments are recognized not by their position but by their name:

 vp(num:Num,neg:Neg) => v(num:Num,neg:Neg,type:tr), np(case:acc).
 v(neg:plus,type:Type) => [does, not], verb(type:Type).
Every feature has the form Name:Value where Name has to be an atom and Value can be a variable or an atom (but not a compound term). Terminal categories (in square brackets) can also have features, so they can be used as pre-terminals that are instantiated with the concrete word forms outside of the parser:
 np => [noun(text:Noun)].
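
The name-based matching of features described above can be illustrated with a small Java sketch. It is not the parser's actual implementation: the class FeatureUnifierSketch and its methods are hypothetical, feature structures are modelled as plain string maps, values starting with an uppercase letter are treated as variables (following Prolog convention), and a feature that is missing on one side is assumed to be unconstrained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not part of ch.uzh.ifi.attempto.chartparser.
class FeatureUnifierSketch {

    // Assumption: values starting with an uppercase letter are variables.
    static boolean isVariable(String value) {
        return !value.isEmpty() && Character.isUpperCase(value.charAt(0));
    }

    // Follows variable bindings until an atom or an unbound variable is reached.
    static String resolve(String value, Map<String, String> bindings) {
        while (isVariable(value) && bindings.containsKey(value)) {
            value = bindings.get(value);
        }
        return value;
    }

    // Unifies two feature structures by feature name, not by position.
    // Returns the variable bindings on success, or null if two atoms clash.
    // Simplification: both sides share one variable namespace.
    static Map<String, String> unify(Map<String, String> a, Map<String, String> b) {
        Map<String, String> bindings = new HashMap<>();
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other == null) continue; // feature absent on one side: unconstrained
            String x = resolve(e.getValue(), bindings);
            String y = resolve(other, bindings);
            if (x.equals(y)) continue;
            if (isVariable(x)) bindings.put(x, y);
            else if (isVariable(y)) bindings.put(y, x);
            else return null; // two different atoms cannot be unified
        }
        return bindings;
    }

    public static void main(String[] args) {
        // v(num:Num,neg:Neg,type:tr) matched against v(neg:plus,type:Type)
        Map<String, String> body = Map.of("num", "Num", "neg", "Neg", "type", "tr");
        Map<String, String> head = Map.of("neg", "plus", "type", "Type");
        System.out.println(unify(body, head));
    }
}
```

Because the features are looked up by name, the order in which they are written and the absence of "num" in the second category do not affect the result.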

Codeco provides special support for anaphoric references. Anaphoric references are used in (controlled) natural languages to refer to objects earlier in the sentence. For example, in the sentence

A country contains an area that is not controlled by the country.
the anaphoric reference "the country" refers to the antecedent "a country". In Codeco, anaphoric references are defined by the special categories ">>>" and "<<<". The category ">>>" marks a position in the text to which anaphoric references can refer (such positions are called "antecedents"). The category "<<<" refers back to the closest possible antecedent. An example is shown here:
 np => [a, noun(text:Noun)], >>>(type:noun, noun:Noun).
 ref => [the, noun(text:Noun)], <<<(type:noun, noun:Noun).
Furthermore, the special category "</<" can be used to ensure that there is no matching antecedent. This can be used for example for variables to ensure that no variable is defined twice:
 np => [var(text:Var)], </<(type:var, var:Var), >>>(type:var, var:Var).
 ref => [var(text:Var)], <<<(type:var, var:Var).
In order to be predicted correctly by the chart parser, the back-referring categories "<<<" and "</<" should be immediately preceded by a terminal category.
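
The behaviour of the three special categories can be sketched in Java as follows. This is a simplified model and not the chart parser's own code: the class AntecedentSketch and its method names are hypothetical, feature values are ground atoms only (no variables), and accessibility constraints are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of ch.uzh.ifi.attempto.chartparser.
class AntecedentSketch {

    // Antecedents in text order, each a feature map such as {type=noun, noun=country}.
    private final List<Map<String, String>> antecedents = new ArrayList<>();

    // Models ">>>": registers a position that later references may refer to.
    void forward(Map<String, String> features) {
        antecedents.add(features);
    }

    // An antecedent matches if it agrees on every feature of the reference.
    private static boolean matches(Map<String, String> antecedent, Map<String, String> reference) {
        for (Map.Entry<String, String> e : reference.entrySet()) {
            if (!e.getValue().equals(antecedent.get(e.getKey()))) return false;
        }
        return true;
    }

    // Models "<<<": returns the index of the closest matching antecedent, or -1.
    int backward(Map<String, String> features) {
        for (int i = antecedents.size() - 1; i >= 0; i--) {
            if (matches(antecedents.get(i), features)) return i;
        }
        return -1;
    }

    // Models "</<": succeeds only if there is no matching antecedent.
    boolean noMatch(Map<String, String> features) {
        return backward(features) < 0;
    }

    public static void main(String[] args) {
        AntecedentSketch text = new AntecedentSketch();
        text.forward(Map.of("type", "noun", "noun", "country")); // "a country"
        text.forward(Map.of("type", "noun", "noun", "area"));    // "an area"
        // "the country" resolves to the first antecedent.
        System.out.println(text.backward(Map.of("type", "noun", "noun", "country")));
        // A variable "X" has not been introduced yet, so "</<" would succeed.
        System.out.println(text.noMatch(Map.of("type", "var", "var", "X")));
    }
}
```

Searching from the end of the list is what makes the reference resolve to the closest antecedent rather than to the first one in the text.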

Anaphoric references are only allowed if the previous text contains a matching antecedent that is accessible. For example, in the case of the partial sentence

A country does not contain a river and borders ...
one can refer to "a country", but not to "a river" because being in the scope of a negation makes it inaccessible.

In order to define the accessibility constraints needed for anaphoric references, we distinguish two types of grammar rules: accessible rules "=>" and inaccessible rules "~>". The following example shows an inaccessible rule:

 vp(num:Num,neg:plus) ~> v(num:Num,neg:plus,type:tr), np(case:acc).
Inaccessible rules are handled in the same way as accessible rules, except that the components in the scope of the rule are not accessible for subsequent anaphoric references.

This can be visualized by the introduction of a special node "~" in the syntax tree whenever an inaccessible rule is used. For the partial sentence introduced before, the syntax tree could look as follows:

[Figure: example syntax tree]

In this case, several accessible rules and exactly one inaccessible rule (indicated by the "~"-node) have been used. All preceding components that can be reached through the syntax tree without traversing a "~"-node in the top-down direction are accessible. Thus, "a country" is accessible from the position "*", but "a river" is not. Furthermore, "a country" would be accessible from the position of "a river" because the "~"-node is in this case traversed only in the bottom-up direction.
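
The accessibility condition can be restated as follows: a preceding component is accessible from a position if every "~"-node above the component is also above that position. The Java sketch below checks this on an explicit tree. It is only an illustration under stated assumptions: the class AccessibilitySketch, the Node type, and the exact shape of the tree are hypothetical, and the check does not verify that the target actually precedes the position in the text.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not part of ch.uzh.ifi.attempto.chartparser.
class AccessibilitySketch {

    static final class Node {
        final String label;
        final Node parent;
        final boolean inaccessible; // true for a "~"-node

        Node(String label, Node parent, boolean inaccessible) {
            this.label = label;
            this.parent = parent;
            this.inaccessible = inaccessible;
        }
    }

    // Returns true if 'target' can be reached from 'from' without entering
    // a "~"-node in the top-down direction, i.e. if every "~"-node above
    // 'target' is also an ancestor of 'from'.
    static boolean isAccessible(Node target, Node from) {
        Set<Node> ancestorsOfFrom = new HashSet<>();
        for (Node n = from; n != null; n = n.parent) {
            ancestorsOfFrom.add(n);
        }
        for (Node n = target.parent; n != null; n = n.parent) {
            if (n.inaccessible && !ancestorsOfFrom.contains(n)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed tree for "A country does not contain a river and borders *"
        Node s = new Node("s", null, false);
        Node country = new Node("a country", s, false);
        Node vp = new Node("vp", s, false);
        Node tilde = new Node("~", vp, true);
        Node river = new Node("a river", tilde, false);
        Node star = new Node("*", vp, false);

        System.out.println(isAccessible(country, star));  // "a country" from "*"
        System.out.println(isAccessible(river, star));    // "a river" from "*"
        System.out.println(isAccessible(country, river)); // "a country" from "a river"
    }
}
```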

Transformations

Codeco grammars can be translated automatically into a Java class or into a Prolog DCG using the SWI-Prolog programs "generate_java.pl" or "generate_dcg.pl", respectively. These programs can be found in the directory "src/ch/uzh/ifi/attempto/chartparser/util" of the source code of this package. The Java class can be generated like this:
 swipl -s generate_java.pl -g "generate_java('my_codeco_grammar.pl', 'my.package', 'MyJavaGrammar', 'my_start_category')" -t halt
Note that the SWI-Prolog command might be different on your machine (e.g. "plcon" or "pl"). The Prolog DCG file can be generated like this:
 swipl -s generate_dcg.pl -g "generate_dcg('my_codeco_grammar.pl', 'my_dcg_grammar.pl')" -t halt

Author:
Tobias Kuhn

Constructor Summary
Grammar(Nonterminal startCategory)
          Creates an empty grammar with the given start category.
Grammar(java.lang.String startCategoryName)
          Creates an empty grammar with a start category of the given name.
 
Method Summary
 void addRule(Rule rule)
          Adds the rule to the grammar.
 java.util.ArrayList<Rule> getEpsilonRules()
          Returns all the rules that have no body categories.
 java.util.ArrayList<Rule> getRulesByHeadName(java.lang.String name)
          Returns the rules whose head category has the given name.
 Nonterminal getStartCategory()
          Returns the start category.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Grammar

public Grammar(Nonterminal startCategory)
Creates an empty grammar with the given start category.

Parameters:
startCategory - The start category for the grammar.

Grammar

public Grammar(java.lang.String startCategoryName)
Creates an empty grammar with a start category of the given name.

Parameters:
startCategoryName - The name of the start category for the grammar.
Method Detail

getStartCategory

public Nonterminal getStartCategory()
Returns the start category.

Returns:
The start category.

addRule

public void addRule(Rule rule)
Adds the rule to the grammar.

Parameters:
rule - The rule to be added.

getRulesByHeadName

public java.util.ArrayList<Rule> getRulesByHeadName(java.lang.String name)
Returns the rules whose head category has the given name.

Parameters:
name - The name of the head category.
Returns:
A list of rules.

getEpsilonRules

public java.util.ArrayList<Rule> getEpsilonRules()
Returns all the rules that have no body categories.

Returns:
A list of rules.
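
To make the contract of the methods above concrete, the following self-contained Java sketch mirrors their names on a stand-in class. It is not the source of Grammar: MiniGrammarSketch and MiniRule are hypothetical, categories are reduced to plain strings, and the internal indexing by head name is an assumption about one reasonable way to implement such lookups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; not the actual Grammar class of this package.
class MiniGrammarSketch {

    // A rule with a head category name and zero or more body category names.
    static final class MiniRule {
        final String head;
        final List<String> body;

        MiniRule(String head, String... body) {
            this.head = head;
            this.body = List.of(body);
        }
    }

    private final String startCategory;
    private final Map<String, List<MiniRule>> rulesByHead = new HashMap<>();
    private final List<MiniRule> epsilonRules = new ArrayList<>();

    MiniGrammarSketch(String startCategory) {
        this.startCategory = startCategory;
    }

    String getStartCategory() {
        return startCategory;
    }

    // Indexes the rule by its head name; rules without body categories
    // are additionally collected as epsilon rules.
    void addRule(MiniRule rule) {
        rulesByHead.computeIfAbsent(rule.head, k -> new ArrayList<>()).add(rule);
        if (rule.body.isEmpty()) epsilonRules.add(rule);
    }

    List<MiniRule> getRulesByHeadName(String name) {
        return rulesByHead.getOrDefault(name, new ArrayList<>());
    }

    List<MiniRule> getEpsilonRules() {
        return epsilonRules;
    }

    public static void main(String[] args) {
        MiniGrammarSketch g = new MiniGrammarSketch("s");
        g.addRule(new MiniRule("vp", "v", "np")); // vp => v, np.
        g.addRule(new MiniRule("adv"));           // a rule with an empty body
        System.out.println(g.getRulesByHeadName("vp").size());
        System.out.println(g.getEpsilonRules().size());
    }
}
```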

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object


Copyright 2008-2009, Attempto Group, University of Zurich (see http://attempto.ifi.uzh.ch)