Link Grammar Visualized with Python

Introduction

Today we will discuss an approach to study of language syntax called Link Grammar theory, and then visualize the example of link grammar dictionary.

Link grammar was originally developed by Davy Temperley and Daniel Sleator (see [1]). The link grammar is based on making relations (connections) between pairs of words and in some way similar to categorical grammar.

Definitions

The basic notions of link grammar are words, linking requirements, sentences and a dictionary with rules for each word, which describe it’s linking requirements.

Words form a set of terminal symbols of the grammar. Each word has it’s linking requirements describing which other word can be connected to the left or right side of the word.

For a given word, several linking requirements may have to be satisfied at the same time, and we call it a rule.

The sequence of words form a sentence of the language, if a valid combination of rules is satisfied for this sentence, meaning each word in the sequence obeys certain rule. Otherwise, we say that the sentence is not part of language defined by link grammar.

A dictionary contains set of rules corresponding to each word in the language.

To graphically illustrate linking requirements, authors of [1] use connector notion which corresponds to linking requirement for a word. Each connector has it’s type and direction, as we can see on this picture:

To write down the rules of the link grammar, authors of link grammar theory suggested formal language. This language is used to define formulas and apply them for text parsing by a machine. For a given word, consider a formula:

We see groups of symbols:

  • Operators & and or
  • Connector type names A, D, B, O, S
  • Suffixes + and -
  • Braces ), ( (to arrange operation priority)
  • () - special notion for the empty formula with no connectors satisfied

Operators & and or are reminiscent of operators in Boolean algebra, but this formal language obeys different rules and corresponds to a different algebraic structure. Definition of formula satisfaction is recursive:

  1. Formula A & B is satisfied if both formulas A and B satisfied.
  2. Formula C or D is satisfied, if only C or D is satisfied and no connectors are satisfied in the other formula.

Considering this formal representation of words and connectors and also their graphical representation, we can give another definition of sentence in link grammar:

A sequence of words is a sentence of the language defined by the grammar if there exists a way to draw links among the words so as to satisfy each word’s formula, and the following meta-rules:

  1. Planarity: The links are drawn above the sentence and do not cross.
  2. Connectivity: The link ssuffice to connect all the words of the sequence together.
  3. Ordering: When the connectors of a formula are traversed from left to right, the words to which they connect proceed from near to far. In other words, consider a word, and consider two links connecting that word towards to its left. The link connecting the nearer word (the shorter link) must satisfy a connector appearing to the left (in the formula) of that of the other word. Similarly, a link to the right must satisfy a connector to the left (in the formula) of a longer link to the right.
  4. Exclusion: No two links may connect the same pair of words.

Visualization Task

Given the link grammar dictionary achieved with unsupervized learning techniques, we may want to visualize relationships between words, connectors, word clusters and rules they obey to get some insights of learned language structure. There are already exists javascript-based demo visualization of relations in link grammar dictionary available here.

The piece of such a dictionary can be seen below:

We will use sample dictionary provided by OpenCog project for our visualization. To render image we will use PyQtGraph library, built upon popular Qt framework for building GUIs. The task was broken into the 3 parts:

  1. Parsing of the dictionary
  2. Data serialization to feed into the PyQtGraph data structures
  3. Performing visualization with PyQtGraph

The result can be seen below:

Implementation of this visualization is publicly available at Github here.

Future work

Qt provides cross-platform solution to build GUI applications, and it is a major benefit to use it. It also have flexible event-handling system that allows to add more interactivity to visualization. So the next step may be to adapt code for interactive usage and convenient exploration of dictionary parts.

References

[1] Parsing English with a Link Grammar: http://www.cs.cmu.edu/afs/cs.cmu.edu/project/link/pub/www/papers/ps/tr91-196.pdf

[2] Vizualizer code: https://github.com/Buzzefall/LGVizor

[3] An Introduction to the Link Grammar Parser: https://www.abisource.com/projects/link-grammar/dict/introduction.html

Student of NSU at BDA&AI Master's Program