Package ch.uzh.ifi.attempto.codeco

This package contains transformation programs for the Codeco notation.


These programs are written in SWI-Prolog rather than Java, although Java code is among the output formats they can generate. The package consists of programs that transform Codeco grammars into a Java class, a Prolog DCG grammar, and HTML and LaTeX representations.

The sections below briefly describe the Codeco notation and show how to run the different transformation programs.

The doctoral thesis "Controlled English for Knowledge Representation" describes the Codeco notation in detail.

Codeco Notation

Codeco stands for "COncrete and DEclarative grammar notation for COntrolled natural languages" and is a Prolog-based notation that allows for defining grammars for controlled natural languages in a convenient way. The most important features of Codeco are introduced below.

Simple grammar rules in Codeco look almost the same as common Prolog DCG rules. The only difference is that the operator "=>" is used instead of "-->":

 vp => v, np.
 v => ['does not'], verb.

Terminal categories are represented in square brackets.
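
For comparison, the two rules above correspond roughly to the plain Prolog DCG below. This is a hand-written sketch for illustration, not the output of the DCG generator. The lexical entries for verb and np are assumptions that are not part of the original example; they are added only so that the grammar can be run.

```prolog
% Hand-written DCG counterpart of the two Codeco rules above:
% "=>" becomes "-->", everything else stays the same.
vp --> v, np.
v --> ['does not'], verb.

% Hypothetical lexicon, added only to make the sketch runnable.
verb --> [contain].
np --> [a, country].
```

With these clauses loaded in SWI-Prolog, the query phrase(vp, ['does not', contain, a, country]) succeeds.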

Complex grammar rules in Codeco differ from common Prolog DCG rules in that they use features rather than arguments with fixed positions. Arguments are identified by their name, not by their position:

 vp(num:Num,neg:Neg) => v(num:Num,neg:Neg,type:tr), np(case:acc).
 v(neg:plus,type:Type) => [does, not], verb(type:Type).

Every feature has the form Name:Value, where Name has to be an atom and Value can be a variable or an atom (but not a compound term).
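
To see what named features save, the following sketch writes the same two rules as an ordinary DCG with positional arguments. The argument order and the lexical entries are arbitrary assumptions made for this sketch; it is not necessarily what the DCG generator produces.

```prolog
% Positional DCG sketch of the feature-based rules above.
% Argument order chosen for this sketch only:
%   vp(Num, Neg), v(Num, Neg, Type), verb(Type), np(Case)
vp(Num, Neg) --> v(Num, Neg, tr), np(acc).

% The Codeco rule for v does not mention "num" at all. With fixed
% positions, the unused slot has to be written out explicitly.
v(_Num, plus, Type) --> [does, not], verb(Type).

% Hypothetical lexicon, added only to make the sketch runnable.
verb(tr) --> [contain].
np(_Case) --> [a, country].
```

The query phrase(vp(Num, Neg), [does, not, contain, a, country]) binds Neg to plus and leaves Num unbound. In the positional version every rule has to agree on the number and order of arguments, whereas in Codeco a rule mentions only the features it constrains.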

Codeco has special support for pre-terminal categories. Such categories are marked with the dollar sign "$" and can expand only to terminal categories:

 np => [a], $noun(text:Noun).
 $noun(text:country) => [country].

Codeco also provides special support for anaphoric references. Anaphoric references are used in (controlled) natural languages to refer to objects mentioned earlier in the sentence. For example, in the sentence

A country contains an area that is not controlled by the country.

the anaphoric reference "the country" refers to the antecedent "a country". In Codeco, anaphoric references are defined by the special categories ">" and "<". The category ">" marks a position in the text to which anaphoric references can refer (such positions are called "antecedents"). The category "<" refers back to the closest possible antecedent. An example is shown here:

 np => [a], $noun(text:Noun), >(type:noun, noun:Noun).
 ref => [the], $noun(text:Noun), <(type:noun, noun:Noun).

Furthermore, the special category "/<" can be used to ensure that there is no matching antecedent. This can be used, for example, for variables to ensure that no variable is introduced twice:

 np => $var(text:Var), /<(type:var, var:Var), >(type:var, var:Var).

The back-referring categories "<" and "/<" have to immediately follow a terminal or pre-terminal category.
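
The behaviour of ">", "<" and "/<" can be approximated in a plain DCG by threading a list of antecedents through the rules. The sketch below is only an illustration of the idea, not the way the Codeco programs implement it: the ante/2 terms, the extra rule arguments, the lexicon and the sentence rule are all assumptions, and any further restrictions Codeco places on which antecedents are accessible are ignored.

```prolog
% Each rule takes the antecedents seen so far (most recent first)
% and returns the antecedents available afterwards.

% ">" : an indefinite noun phrase introduces a new antecedent.
np(A0, [ante(noun, Noun)|A0]) --> [a], noun(Noun).
% "/<" followed by ">" : a variable may be introduced only if no
% matching antecedent exists yet; it then becomes an antecedent.
np(A0, [ante(var, Var)|A0]) -->
    variable(Var),
    { \+ memberchk(ante(var, Var), A0) }.

% "<" : a definite reference needs a matching antecedent. The list is
% ordered most recent first, so memberchk/2 picks the closest match.
ref(A, A) --> [the], noun(Noun), { memberchk(ante(noun, Noun), A) }.

% Hypothetical lexicon and sentence rule, added to make the sketch runnable.
noun(country) --> [country].
noun(area) --> [area].
variable('X') --> ['X'].
variable('Y') --> ['Y'].

sentence --> np([], A1), [contains], object(A1, _).
object(A0, A) --> ref(A0, A).
object(A0, A) --> np(A0, A).
```

With these clauses, phrase(sentence, [a, country, contains, the, country]) succeeds because "the country" finds its antecedent, while phrase(sentence, [a, country, contains, the, area]) fails because no area has been introduced. Likewise, phrase(sentence, ['X', contains, 'Y']) succeeds, but phrase(sentence, ['X', contains, 'X']) fails because the variable X would be introduced twice.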

Codeco has some more features, which are explained in the doctoral thesis mentioned above.


Codeco grammars can be translated automatically into a Java class or into a Prolog DCG grammar, using the SWI-Prolog programs "" or "", respectively. These programs can be found in the directory "src/ch/uzh/ifi/attempto/codeco" of the AceWiki system and are linked above. The Java class can be generated like this:

 swipl -s -g "generate_java('', 'my.package.MyGrammarClass')" -t halt

Note that the SWI-Prolog command might be different on your machine (e.g. "plcon" or "pl").

The Prolog DCG file can be generated like this:

 swipl -s -g "generate_dcg('', '')" -t halt

Furthermore, this package provides the programs "" and "", which can be used to generate HTML and LaTeX representations of Codeco grammars. These programs are used as follows:

 swipl -s -g "generate_html('', 'my_html_file.html')" -t halt
 swipl -s -g "generate_latex('', 'my_latex_file.tex')" -t halt

Tobias Kuhn

Copyright 2008-2012, AceWiki developers