A Multilingual Semantic Wiki Based on Attempto Controlled English and Grammatical Framework
Kaarel Kaljurand and Tobias Kuhn
Institute of Computational Linguistics, University of Zurich
ESWC 2013, Montpellier
2013-05-30

Existing wiki systems

  • wiki
    • user-friendly collaborative environment for knowledge management
    • content is typically unconstrained natural language (NL) and therefore not easily processed automatically
    • powered by software, e.g. MediaWiki
    • e.g. Wikipedia
  • semantic wiki (= wiki + formal semantics)
    • provides a richer query language and consistency checking (via automatic reasoning)
    • content typically NL + typed links (i.e. RDF triples)
    • software: Semantic MediaWiki, ...
  • controlled natural language (CNL) based semantic wiki
    • semantic wiki using CNL for the content
    • formal languages are hidden from the user (so more expressive formal languages can be used)
    • software: AceWiki
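To make "typed links (i.e. RDF triples)" concrete, here is a minimal Python sketch that stores a few hypothetical typed links as (subject, predicate, object) tuples and answers a simple pattern query. The page names, the predicates `type` and `borders`, and the helpers `find_triples` and `neighbours_of` are illustrative assumptions; a real semantic wiki would use an RDF store and a query language such as SPARQL.

```python
# Hypothetical typed links from a few wiki pages, stored as
# (subject, predicate, object) triples in the style of RDF.
TRIPLES = {
    ("France", "type", "Country"),
    ("Spain", "type", "Country"),
    ("Portugal", "type", "Country"),
    ("France", "borders", "Spain"),
    ("Spain", "borders", "Portugal"),
}


def find_triples(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return {
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    }


def neighbours_of(country):
    """Two-pattern query: all X such that `country borders X` and X is a Country."""
    bordered = {o for (_, _, o) in find_triples(TRIPLES, s=country, p="borders")}
    return sorted(x for x in bordered if find_triples(TRIPLES, s=x, p="type", o="Country"))


print(neighbours_of("Spain"))  # ['Portugal']
```

Because plain triples carry no formal semantics of their own, France is not returned as a neighbour of Spain: nothing states that `borders` is symmetric. Inferences of this kind are what the reasoning behind a semantic wiki adds.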

Presenter Notes

Shortcomings: content cannot be copied from one language version to another, users cannot ask questions, and there is no way to check that the versions of an article in different languages are about the same thing.

Multilingual CNL-based Semantic Wiki

  • multiple languages
    • natural: English, German, ...
    • formal: first-order logic, OWL, ...
    • languages for content vs user interface
  • CNL-based
    • backed by formal grammar(s)
    • formal languages are hidden
  • semantic
    • content automatically kept in sync via precise translation
    • consistency checking, question answering, ... (depending on the domain)
  • wiki
    • user-friendly
    • collaborative

Possible use cases

  • multilingual ontology editor
    • e.g. an environment where users agree on the content and multilingual vocabulary of an OWL-style geography ontology
  • tourist phrasebook
    • book structure (chapters and sections)
    • multilingual content presented in parallel
  • catalog of museum objects (paintings, painters)
    • each object on its own wiki page
    • rich queries (e.g. "which Dutch painter painted which French painter?")
  • SWIFTTT systems (see: David Karger's talk)

Background technologies

Attempto Controlled English (ACE)

  • subset of natural English
    • conjunction, disjunction, negation, if-then, ...
    • anaphoric references: pronouns, definite noun phrases, variables
    • quantifiers: every, no, at least 3, ...
    • content words: proper names, common nouns, verbs, adjectives, ...
  • grammar is fixed, but users can change content words
  • deterministic ambiguity handling
    • anaphora resolution (France borders Spain and it borders Portugal.)
    • quantifier scope (Every country borders a country.)
    • attachment (Every EU-country borders a country that is an EU-country and is a NATO-country.)
  • well-defined translation to and from first-order logic, OWL, ...
  • end-user documentation: construction and interpretation rules, as restrictions of English

ACE reasoning via translation to OWL

Every country that does not border a sea is a landlocked-country.

SubClassOf(
   ObjectIntersectionOf(
      :country
      ObjectComplementOf(
         ObjectSomeValuesFrom(
            :border
            :sea
         )
      )
   )
   :landlocked-country
)

Which country is a landlocked-country?

ObjectIntersectionOf(
    :country
    :landlocked-country
)
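The OWL class expressions above have a set-based meaning. The Python sketch below evaluates them in one small, hypothetical interpretation (the individuals and the `:border` pairs are made up) to show what the axiom and the query denote. It only checks a single finite model: real OWL reasoning asks what holds in all models of an ontology and needs a reasoner, and under OWL's open-world semantics a country is not inferred to border no sea merely because no such link is stated.

```python
# Hypothetical finite interpretation: one fixed "world", not an OWL ontology.
DOMAIN = {"France", "Switzerland", "Austria", "Mediterranean"}
CLASSES = {
    ":country": {"France", "Switzerland", "Austria"},
    ":sea": {"Mediterranean"},
    ":landlocked-country": {"Switzerland", "Austria"},
}
ROLES = {
    ":border": {
        ("France", "Mediterranean"),
        ("France", "Switzerland"), ("Switzerland", "France"),
        ("Switzerland", "Austria"), ("Austria", "Switzerland"),
    },
}


def ext(c):
    """Set of individuals that class expression c denotes in this interpretation."""
    if isinstance(c, str):  # named class such as ":country"
        return CLASSES[c]
    op, *args = c
    if op == "ObjectIntersectionOf":
        return set.intersection(*(ext(a) for a in args))
    if op == "ObjectComplementOf":
        return DOMAIN - ext(args[0])
    if op == "ObjectSomeValuesFrom":
        role, filler = args
        targets = ext(filler)
        return {x for (x, y) in ROLES[role] if y in targets}
    raise ValueError(f"unsupported constructor: {op}")


def sub_class_of_holds(sub, sup):
    """True if SubClassOf(sub sup) is satisfied by this single interpretation."""
    return ext(sub) <= ext(sup)


# Every country that does not border a sea is a landlocked-country.
AXIOM_SUB = ("ObjectIntersectionOf", ":country",
             ("ObjectComplementOf", ("ObjectSomeValuesFrom", ":border", ":sea")))
# Which country is a landlocked-country?
QUERY = ("ObjectIntersectionOf", ":country", ":landlocked-country")

print(sub_class_of_holds(AXIOM_SUB, ":landlocked-country"))  # True
print(sorted(ext(QUERY)))  # ['Austria', 'Switzerland']
```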

AceWiki

  • expressive semantic wiki system
  • front-end language ACE; background reasoning language OWL
  • monolingual, fixed grammar, no ambiguity handling

Grammatical Framework (GF)

  • framework for multilingual grammar engineering
    • functional programming language optimized to handle natural language
    • resource grammar library implementing common morphological and syntactic structures
    • mildly context-sensitive
  • grammar = language-neutral abstract syntax + multiple concrete syntaxes that implement the abstract functions and categories, specifying words, word order, agreement, etc.
    • border : Country -> Country -> Relation
    • English: border x y = x!Nom ++ "borders" ++ y!Nom
    • Estonian: border x y = x!Gen ++ "naaber on" ++ y!Nom
  • translation = parse a string in language A to tree(s) + linearize these tree(s) as strings in language B
  • parsing (translation, look-ahead, ...) based on Parallel Multiple Context-Free Grammars
  • various tools + bindings to Python, Java, JavaScript, Prolog, ...
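The abstract/concrete split and "translation = parse + linearize" can be illustrated with a small Python sketch of the `border` example. The `Border` class, the lexicons and the brute-force `parse` function are hypothetical stand-ins: GF itself compiles grammars to PMCFG and parses directly rather than enumerating trees. In this tiny lexicon the Estonian genitive and nominative forms happen to coincide, so only the word order differs visibly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Border:
    """Abstract syntax function  border : Country -> Country -> Relation."""
    x: str  # abstract Country constant, e.g. "france"
    y: str


# Hypothetical lexicons mapping abstract constants to case forms.
ENG = {"france": {"Nom": "France"}, "spain": {"Nom": "Spain"}}
EST = {"france": {"Nom": "Prantsusmaa", "Gen": "Prantsusmaa"},
       "spain": {"Nom": "Hispaania", "Gen": "Hispaania"}}


def lin_eng(t):
    # English concrete syntax: nominative x, then "borders", then nominative y
    return " ".join([ENG[t.x]["Nom"], "borders", ENG[t.y]["Nom"]])


def lin_est(t):
    # Estonian concrete syntax: genitive x, then "naaber on", then nominative y
    return " ".join([EST[t.x]["Gen"], "naaber on", EST[t.y]["Nom"]])


def parse(text, lin, lexicon):
    """Naive parser: every tree whose linearization equals text (brute-force enumeration)."""
    return [Border(x, y) for x in lexicon for y in lexicon if lin(Border(x, y)) == text]


def translate(text, lin_from, lex_from, lin_to):
    # translation = parse in language A to tree(s), then linearize them in language B
    return [lin_to(t) for t in parse(text, lin_from, lex_from)]


print(translate("France borders Spain", lin_eng, ENG, lin_est))
# ['Prantsusmaa naaber on Hispaania']
```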

GF Resource Grammar Library (RGL)

  • morphology and syntax for ~30 languages via a language-neutral API
  • developers do not need detailed knowledge of the languages that they want to support in their application

Implementation of AceWiki-GF

  • integrate ACE with GF (ACE-in-GF)
    • implement a multilingual grammar of ACE in the GF framework
    • cover the languages supported by the GF Resource Grammar Library (RGL)
    • not fine-tuned to any particular language (apart from ACE)
  • integrate AceWiki with GF (AceWiki-GF)
    • implement a connection to GF tools (GF Webservice / Cloud Service)
    • add support for managing multilinguality, ambiguity and the grammar

ACE-in-GF (main idea)

An ACE grammar implemented in GF adds multiple natural languages as front-ends to ACE. As a result, these languages can be mapped to and from various formal languages already supported by ACE.

ACE-in-GF (main idea)

German

Jedes Land, das nicht an ein Meer grenzt, ist ein Binnenland.

ACE-in-GF tree

baseText (sText (s (vpS (everyNP (relCN (cn_as_VarCN country_CN)
  (neg_predRS which_RP (v2VP border_V2 (thereNP_as_NP
   (aNP (cn_as_VarCN sea_CN))))))) (npVP (thereNP_as_NP
    (aNP (cn_as_VarCN landlocked_country_CN)))))))

ACE

Every country that does not border a sea is a landlocked-country.

OWL

SubClassOf(
   ObjectIntersectionOf(
      :country
      ObjectComplementOf(
         ObjectSomeValuesFrom( :border :sea )
      )
   )
   :landlocked-country
)
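The bracketed tree above is easier to inspect once it is parsed. A minimal Python sketch, assuming the notation contains only function names and parentheses (no string literals or metavariables), turns it into nested (function, arguments) pairs and lists its lexical leaves. The helpers `parse_tree` and `leaves` are hypothetical and not part of the GF tools.

```python
import re


def parse_tree(text):
    """Parse bracketed GF abstract-tree notation into (function, [arguments]) pairs."""
    tokens = re.findall(r"[()]|[^\s()]+", "(" + text + ")")  # wrap the top-level application
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":  # a bare function name with no arguments
            pos += 1
            return (tokens[pos - 1], [])
        fun = tokens[pos + 1]
        pos += 2
        args = []
        while tokens[pos] != ")":
            args.append(node())
        pos += 1  # consume ")"
        return (fun, args)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def leaves(tree):
    """Zero-argument functions (lexical items), from left to right."""
    fun, args = tree
    return [fun] if not args else [leaf for a in args for leaf in leaves(a)]


TREE = """baseText (sText (s (vpS (everyNP (relCN (cn_as_VarCN country_CN)
  (neg_predRS which_RP (v2VP border_V2 (thereNP_as_NP
   (aNP (cn_as_VarCN sea_CN))))))) (npVP (thereNP_as_NP
    (aNP (cn_as_VarCN landlocked_country_CN)))))))"""

print(leaves(parse_tree(TREE)))
# ['country_CN', 'which_RP', 'border_V2', 'sea_CN', 'landlocked_country_CN']
```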

ACE in GF

  • implementation of the ACE syntax
    • extension of Angelov and Ranta (CNL 2009)
    • focus on the subset of ACE that can be mapped to OWL
    • almost 100% coverage at almost 0% ambiguity
    • no direct generation of discourse representation structures (DRS)
  • support for most RGL languages
    • Bulgarian, Catalan, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Italian, Latvian, Norwegian, Polish, Romanian, Russian, Spanish, Swedish, Thai, Urdu
    • the RGL-based design means that quality and language coverage improve automatically as the RGL improves over time
  • status
    • some precision problems, e.g. anaphoric references do not obey DRS accessibility constraints
    • ambiguity and coverage problems in some languages

Presenter Notes

More development effort has gone into German, Spanish and Finnish. The implementations for the other languages have gaps in their coverage of those ACE constructs that the RGL does not provide.

ACE-in-GF translation example

ACE: every person that speaks a language X does not forget X .
Bul: всеки човек който говори език X не забравя X .
Cat: cada persona que parla una llengua X no oblida X .
Chi: 说 一 种 X 语 言 的 每 个 人 没 忘 X 。
Dan: hver person , som taler et sprog X glemmer ikke X .
Dut: elke persoon , dat een taal X spreekt vergeet niet X .
Fin: jokainen henkilö , joka puhuu kieltä X ei unohda X:ää .
Fre: chaque personne qui parle une langue X n' oublie pas X .
Ger: jede Person , die eine Sprache X spricht vergißt X nicht .
Gre: κάθε πρόσωπο που μιλά μία γλώσσα τον X δεν ξεχνά τον X .
Hin: हर [person_CN] , जो [language_CN] X बोलता है X नहीं भूलता है .
Ita: ogni persona che parla una lingua X non dimentica X .
Lav: ikviena persona , kas saka valodu X neaizmirst X .
Nor: hver person , som snakker et språk X glemmer ikke X .
Pol: każda osoba , która rozmawia z językiem X nie zapomina X .
Ron: orice persoană care vorbeşte o limbă X nu îl uită pe X .
Rus: каждый лицo , который говорит на языке X не забывает X .
Spa: cada persona que habla una lengua X no olvida X .
Swe: varje person , som talar ett språk X glömmer inte X .
Tha: บุคคล ทุก คน ที่ พูด ภาษา X ไม่ ลืม X
Urd: ہر شخص , جو زبان X بولتا ہے X نہیں بھولتا ہے

AceWiki integration with GF

  • wiki content is based on a (single) GF grammar
    • provided by GF Webservice / Cloud service
    • optimized for ACE-in-GF (but other GF grammars can also be used)
  • each wiki entry is a set of GF abstract trees
    • viewed via linearization(s)
    • can represent ambiguity
  • multilingual viewing and editing of wiki content
    • grammar-based look-ahead editing that shows next possible tokens
    • ambiguity resolution via another concrete language
  • grammar integrated into the wiki
    • GF grammars are very modular
    • grammar modules as wiki articles (wiki-linking of grammar and content)
    • grammar can be changed while editing the wiki
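Grammar-based look-ahead editing can be pictured with a deliberately naive Python sketch: enumerate every sentence of a tiny, non-recursive, hypothetical grammar and collect the tokens that can follow the current prefix. The grammar, `expand` and `lookahead` are assumptions for illustration; GF computes completions from its parser state and also handles recursive grammars, which this enumeration cannot.

```python
from itertools import product

# Hypothetical toy grammar: each category maps to alternative sequences of
# categories (keys of GRAMMAR) or terminal tokens (anything else).
GRAMMAR = {
    "S": [["NP", "VP", "."]],
    "NP": [["France"], ["Spain"], ["every", "country"], ["a", "sea"]],
    "VP": [["borders", "NP"], ["is", "a", "landlocked-country"]],
}


def expand(symbol):
    """Yield every token sequence derivable from symbol (grammar must be non-recursive)."""
    if symbol not in GRAMMAR:
        yield [symbol]
        return
    for alternative in GRAMMAR[symbol]:
        # Cartesian product of the expansions of each symbol in the alternative
        for parts in product(*(list(expand(s)) for s in alternative)):
            yield [tok for part in parts for tok in part]


def lookahead(prefix):
    """Sorted list of tokens that may directly follow the given token prefix."""
    n = len(prefix)
    return sorted({sent[n] for sent in expand("S") if len(sent) > n and sent[:n] == prefix})


print(lookahead(["France"]))  # ['borders', 'is']
```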

ACE-based geography article

Presenter Notes

Depicted are the ACE version and the German version (containing the look-ahead editor).

Note that the UI is language-dependent.

Ambiguity resolution


Presenter Notes

Ambiguity between an object relative clause and a subject relative clause, which occurs in German and Dutch. Wiki users can choose the intended tree by viewing the tree set in a language other than German, e.g. DisambGer (if it exists).
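The subject/object ambiguity can be reproduced with a toy Python sketch. Two hypothetical abstract trees for one relative clause get the same German linearization, because the feminine relative pronoun and noun phrases have identical nominative and accusative forms, while English word order tells them apart; an English (or DisambGer-style) view therefore lets the user pick the intended tree. The lexicons and the functions `lin_ger`, `lin_eng` and `ambiguous` are illustrative assumptions, not AceWiki-GF code.

```python
from collections import defaultdict

# Hypothetical trees: a relative clause in which the relative pronoun is
# either the subject ("subjRel") or the object ("objRel") of the verb.
TREES = [("subjRel", "woman", "see", "cat"), ("objRel", "woman", "see", "cat")]

GER_NP = {"woman": {"Nom": "die Frau", "Acc": "die Frau"},
          "cat": {"Nom": "die Katze", "Acc": "die Katze"}}
GER_REL_FEM = {"Nom": "die", "Acc": "die"}  # feminine relative pronoun
GER_V = {"see": "sieht"}
ENG_NP = {"woman": "the woman", "cat": "the cat"}
ENG_V = {"see": "sees"}


def lin_ger(tree):
    kind, head, verb, other = tree
    rel_case, other_case = ("Nom", "Acc") if kind == "subjRel" else ("Acc", "Nom")
    # Verb-final relative clause: relative pronoun, other NP, verb.
    return f"{GER_NP[head]['Nom']}, {GER_REL_FEM[rel_case]} {GER_NP[other][other_case]} {GER_V[verb]}"


def lin_eng(tree):
    kind, head, verb, other = tree
    if kind == "subjRel":
        return f"{ENG_NP[head]} that {ENG_V[verb]} {ENG_NP[other]}"
    return f"{ENG_NP[head]} that {ENG_NP[other]} {ENG_V[verb]}"


def ambiguous(trees, lin):
    """Group trees by linearization; keep only strings shared by two or more trees."""
    groups = defaultdict(list)
    for t in trees:
        groups[lin(t)].append(t)
    return {s: ts for s, ts in groups.items() if len(ts) > 1}


for text, readings in ambiguous(TREES, lin_ger).items():
    print(text)  # die Frau, die die Katze sieht
    for t in readings:
        print("  ->", lin_eng(t))  # one distinct English string per reading
```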

Grammar module page

Presenter Notes

GF source editing is provided by the GF Cloud Service; AceWiki-GF simply reflects this functionality in the wiki. Some types of errors can be pinpointed.

Automatic question answering


Evaluation of ACE-in-GF

Design

  • generate ~100 ACE sentences/questions and automatically translate them to all the languages
    • full coverage of all the grammar functions
    • large coverage of OWL axiom structures (subclass, range, domain, transitivity, ...)
  • measure translation accuracy from ACE to other languages
  • use Google Translate as the baseline
  • 20 human evaluators (2 per language) as the gold standard

Results

  • participants preferred ACE-in-GF translations to Google translations and post-edited them less
  • many edits were stylistic (e.g. users preferred elliptical sentences)

Evaluation of ACE-in-GF


Evaluation of AceWiki-GF

Design

  • develop a 500-word geography lexicon
    • 3 languages: English, German and Spanish
    • 3 authors (incl. native speakers of German and Spanish, and a GF engineer)
  • ask users of different languages to supply the wiki with sentences and tag each as true or false
  • then ask them to evaluate others' sentences as true or false
  • measure the user (dis)agreement and how much it is influenced by the automatic translation

Results

  • 30 participants entered 316 sentences, covering all syntactic functions
  • AceWiki-GF user interface was found to be easy to use
  • agreement level was ~83% with no significant influence from the translation
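One plausible way to compute the agreement level, and to compare judgements made on the original sentence with judgements made on a translation, is sketched below in Python. The `Judgement` record and the definition of agreement as the share of judgements matching the author's tag are assumptions; the evaluation's exact measure and its significance test are not reproduced here, and no study data is included.

```python
from typing import NamedTuple


class Judgement(NamedTuple):
    """One participant's true/false judgement of another participant's sentence."""
    sentence_id: int
    author_tag: bool     # truth value assigned by the sentence's author
    evaluator_tag: bool  # truth value assigned by the evaluator
    translated: bool     # did the evaluator see an automatic translation?


def agreement(judgements):
    """Fraction of judgements that match the author's tag (None if there are none)."""
    if not judgements:
        return None
    return sum(j.author_tag == j.evaluator_tag for j in judgements) / len(judgements)


def agreement_by_translation(judgements):
    """Agreement for judgements made on the original vs. on a translation."""
    return {
        "original": agreement([j for j in judgements if not j.translated]),
        "translated": agreement([j for j in judgements if j.translated]),
    }
```

A clearly lower "translated" value would hint that the automatic translation affects how sentences are understood; a similar value is consistent with the reported absence of such an influence.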

Future work

  • generalize to handle other types of grammars and reasoning
  • improve collaborative grammar editing features
  • improve ambiguity management (e.g. automatic reasoning-based ambiguity resolution)
  • use the wiki content to automatically generate documentation, grammar fragments, look-ahead editor customizations, etc. for novice users

Links

Thank You!
