About the OLD

The Online Linguistic Database (OLD) is software for linguistic fieldwork.

Features

Collaboration and data sharing
Advanced search
Automatic morpheme cross-referencing
Configurable validation
Morphological parser & phonology builder
Text creation
User access control
Documentation
Open source
Graphical User Interface: Dative
RESTful JSON API

Technical

The OLD is software for creating RESTful web services that facilitate collaborative linguistic fieldwork and language documentation. An OLD RESTful web service receives inputs and returns outputs in JSON. The OLD is written in Python using the Pylons web framework and MySQL/SQLAlchemy. Its source code can be found on GitHub and releases can be accessed on PyPI and installed with either pip or easy_install.

Adding & Updating Resources

Adding a form to an OLD involves sending a POST request to the path /forms at the OLD's URL. The request body must contain a JSON object whose attributes match the keys of the following Python dict.

form_template = {
    "grammaticality": "",
    "transcription": "",
    "morpheme_break": "",
    "morpheme_gloss": "",
    "narrow_phonetic_transcription": "",
    "phonetic_transcription": "",
    "translations": [],
    "comments": "",
    "speaker_comments": "",
    "date_elicited": "",
    "speaker": None,
    "source": None,
    "elicitation_method": None,
    "elicitor": None,
    "syntax": "",
    "semantics": "",
    "status": "",
    "syntactic_category": None,
    "verifier": None
}

The following snippet illustrates the creation of a form.

import copy
form = copy.deepcopy(form_template)
form['transcription'] = 'Arma virumque cano.'
form['translations'].append({
    'transcription': 'I sing of arms and a man.',
    'grammaticality': ''
})
r = s.post('%sforms' % url, data=json.dumps(form)).json()
assert r['transcription'] == 'Arma virumque cano.'
print '%s %s created a new form with id %d on %s.' % (
    r['enterer']['first_name'],
    r['enterer']['last_name'],
    r['id'],
    r['datetime_entered'])
# Joel Dunham created a new form with id 1014 on 2016-01-25T06:41:06.

Updating

To update a form, issue a PUT request to /forms/<id>, where <id> is the id of the form to be updated.

The object (dict) returned by a successful create request can be modified and sent back in an update request. However, its relational attributes (e.g., elicitor and speaker) must first be converted to id values, and its date_elicited value must be converted to MM/DD/YYYY format.

r['morpheme_break'] = 'arm-a vir-um-que can-o'
r['morpheme_gloss'] = 'arm-ACC.PL man-ACC.PL-and sing-1SG.PRS'
r = s.put('%sforms/%d' % (url, r['id']),
    data=json.dumps(r)).json()
print 'Form %d modified on %s.' % (r['id'],
    r['datetime_modified'])
# Form 1014 modified on 2016-01-25T07:12:51.
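The conversion described above can be wrapped in a small helper. This is a minimal sketch, not part of the OLD itself: the name `to_update_payload` is hypothetical, and it assumes that the OLD returns relational attributes as nested objects with an `id` key, `tags` and `files` as arrays of such objects, and `date_elicited` as an ISO 8601 date string (YYYY-MM-DD).

```python
import copy
from datetime import datetime

# Assumption: the OLD returns these attributes as nested objects (dicts with
# an 'id' key) but expects integer ids in create and update requests.
RELATIONAL_ATTRS = ('speaker', 'source', 'elicitation_method', 'elicitor',
                    'syntactic_category', 'verifier')
# Assumption: these array attributes must be sent as lists of ids.
RELATIONAL_ARRAYS = ('tags', 'files')


def to_update_payload(form):
    """Return a copy of a retrieved form dict that is ready for a PUT request."""
    payload = copy.deepcopy(form)
    # Replace each nested object with its id; None values are left alone.
    for attr in RELATIONAL_ATTRS:
        value = payload.get(attr)
        if isinstance(value, dict):
            payload[attr] = value['id']
    # Replace arrays of nested objects with arrays of ids.
    for attr in RELATIONAL_ARRAYS:
        if attr in payload:
            payload[attr] = [v['id'] if isinstance(v, dict) else v
                             for v in payload[attr] or []]
    # Assumption: date_elicited comes back as 'YYYY-MM-DD'; the OLD expects
    # 'MM/DD/YYYY' when a form is updated.
    date_elicited = payload.get('date_elicited')
    if date_elicited:
        payload['date_elicited'] = datetime.strptime(
            date_elicited[:10], '%Y-%m-%d').strftime('%m/%d/%Y')
    return payload
```

With the session `s` and `url` from the Log in example, an update would then be sent as `s.put('%sforms/%d' % (url, r['id']), data=json.dumps(to_update_payload(r)))`. The helper works on a deep copy, so the dict you received from the OLD is left unchanged.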

Searching Over Resources

The OLD accepts search expressions as JSON arrays. The following array represents a search that will return all forms whose transcription value begins with an “a”.

["Form", "transcription", "regex", "^a"]

Here is how you would request the forms that match the above search, by making a POST request to forms/search:

url = 'https://projects.linguistics.ubc.ca/demoold/'
search_expr = json.dumps({
    "query": {
        "filter": ["Form", "transcription", "regex", "^a"]}})
r = s.post('%sforms/search' % url,
    data=search_expr).json()
print '\n'.join(t['transcription'] for t in r[:3])
# anteferre
# abacus
# abacus

Pagination & Ordering

Search requests can also contain pagination and ordering parameters. The following search request performs the same search as above, except that now we are asking that the results be sorted by transcription in descending order and we want the first three forms only.

search_expr = json.dumps({
    "query": {
        "filter": ["Form", "transcription", "regex", "^a"],
        "order_by": ["Form", "transcription", "desc"]
    },
    "paginator": {
        "page": 1,
        "items_per_page": 3
    }
})
r = s.post('%sforms/search' % url,
    data=search_expr).json()
print r['paginator']['count']
# 720
print '\n'.join(t['transcription'] for t in r['items'])
# azymum
# axitiosus
# axitiosus
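Because the response's paginator reports the total `count`, a client can walk through every page of a search. The generator below is a hypothetical sketch (the name `iter_search_results`, its `path` parameter and the default page size are assumptions) that requests successive pages from the search path until the pages fetched so far cover `count` results.

```python
import json


def iter_search_results(session, url, query, path='forms', items_per_page=100):
    """Yield every item matching ``query``, fetching one page at a time.

    Sketch only: ``session`` is assumed to behave like a requests.Session
    (a ``post`` method returning an object with a ``json()`` method), and
    ``url`` is the OLD's base URL ending in a slash.
    """
    page = 1
    while True:
        body = json.dumps({'query': query,
                           'paginator': {'page': page,
                                         'items_per_page': items_per_page}})
        response = session.post('%s%s/search' % (url, path), data=body).json()
        items = response['items']
        for item in items:
            yield item
        count = response['paginator']['count']
        # Stop once the pages requested so far cover all ``count`` matches,
        # or if the OLD returns an empty page.
        if not items or page * items_per_page >= count:
            return
        page += 1
```

For example, `for form in iter_search_results(s, url, {"filter": ["Form", "transcription", "regex", "^a"]}): ...` would visit all 720 matching forms in pages of 100 without holding every page in memory at once.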

Conjunction

You can conjoin two or more search expressions using the syntax illustrated below. Here we are adding the requirement that all returned forms must be grammatical, i.e., have an empty string for their grammaticality value.

search_expr = json.dumps({
    "query": {
        "filter": ["and",
            [
                ["Form", "transcription", "regex", "^a"],
                ["Form", "grammaticality", "=", ""]
            ]
        ]}})
r = s.post('%sforms/search' % url,
    data=search_expr).json()
print '\n'.join(t['transcription'] for t in r[:3])
# anteferre
# abacus
# abacus

Disjunction

To create a disjunction of search expressions, use "or" as the first element of the array, as shown below. Here we are looking for all forms that contain "cat" in their morpheme gloss or comments fields or in any one of their translation values.

search_expr = json.dumps({
    "query": {
        "filter": ["or",
            [
                ["Form", "morpheme_gloss", "regex", "cat"],
                ["Form", "comments", "regex", "cat"],
                ["Form", "translations", "transcription",
                    "regex", "cat"],
            ]
        ]}})
r = s.post('%sforms/search' % url,
    data=search_expr).json()
print ('transcription:  %s\n'
       'morpheme gloss: %s\n'
       'comments:       %s\n'
       'translation(s): %s' % (
           r[0]['transcription'],
           r[0]['morpheme_gloss'],
           r[0]['comments'],
           '\n                '.join(
               t['transcription'] for t in r[0]['translations'])))
# transcription:  abdico
# morpheme gloss:
# comments:
# translation(s): resign, abdicate
#                 abolish
#                 disinherit
#                 renounce, reject, expel, disapprove of

Negation

A search expression can be negated by making it the second element in an array whose first element is "not". In the search below, the first conjunct is negated and is used to ensure that no forms with the ungrammatical value “*” are included in the search results.

The second conjunct shows the use of “!=”, meaning “not equals”, to filter out forms whose status value is set to “requires testing”.

The next four conjuncts ensure that all forms that match this search were entered by someone with the last name “Dunham”, contain a progressive aspectual construction in their translation, contain a specific morpheme “á-”, and were entered some time before noon on January 1, 2011.

search_expr = json.dumps({
    "query": {
        "filter": ["and",
            [
                ["not", ["Form", "grammaticality", "=", "*"]],
                ["Form", "status", "!=", "requires testing"],
                ["Form", "enterer", "last_name", "=", "Dunham"],
                ["Form", "translations", "transcription",
                    "regex", "( )(is|am|are)( ).+ing( |$)"],
                ["Form", "morpheme_break", "regex", "(^| |-)á-"],
                ["Form", "datetime_entered", "<",
                    "2011-01-01T12:00:00"]
            ]
        ]}})
r = s.post('%sforms/search' % url,
    data=search_expr).json()
print ('transcription:  %s\n'
       'morpheme break: %s\n'
       'morpheme gloss: %s\n'
       'translation(s): %s\n'
       'entered:        %s\n'
       'enterer:        %s' % (
            r[0]['transcription'],
            r[0]['morpheme_break'],
            r[0]['morpheme_gloss'],
            '\n                '.join(
                t['transcription'] for t in r[0]['translations']),
            r[0]['datetime_entered'],
            '%s %s' % (
                r[0]['enterer']['first_name'],
                r[0]['enterer']['last_name'])))

# transcription:  oma sinai'koan nitááwaanika otsin'iihka'siimii
# morpheme break: om-wa sina-ikoan nit-á-waanii-ok-wa ot-inihka'sim-yi
# morpheme gloss: DEM-PROX cree-being 1-DUR-say-INV-3SG 3-name-IN.SG
# translation(s): that Cree man is telling me his name
# entered:        2008-02-25T00:00:00
# enterer:        Joel Dunham
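Because filters are just nested arrays, they are easy to generate programmatically. The helpers below are hypothetical: none of `field`, `related`, `and_`, `or_`, `not_` or `search_body` is part of the OLD. They simply build the same arrays used in the examples above, which can make larger queries less error-prone than writing the nesting by hand.

```python
import json


# Hypothetical helpers; the OLD only ever sees the JSON arrays they produce.
def field(model, attribute, relation, value):
    """Filter on an attribute, e.g. ["Form", "status", "!=", "requires testing"]."""
    return [model, attribute, relation, value]


def related(model, attribute, sub_attribute, relation, value):
    """Filter on an attribute of a related resource, e.g. the enterer's last name."""
    return [model, attribute, sub_attribute, relation, value]


def and_(*filters):
    """Conjunction: every filter must match."""
    return ['and', list(filters)]


def or_(*filters):
    """Disjunction: at least one filter must match."""
    return ['or', list(filters)]


def not_(filter_):
    """Negation of a single filter."""
    return ['not', filter_]


def search_body(filter_, order_by=None, paginator=None):
    """Serialize a filter, plus optional ordering and pagination, as JSON."""
    query = {'filter': filter_}
    if order_by is not None:
        query['order_by'] = order_by
    body = {'query': query}
    if paginator is not None:
        body['paginator'] = paginator
    return json.dumps(body)
```

For instance, `search_body(and_(not_(field('Form', 'grammaticality', '=', '*')), related('Form', 'enterer', 'last_name', '=', 'Dunham')))` produces a request body that could be posted to forms/search exactly like `search_expr` above.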

Getting started

Once you have access to an OLD instance (cf. installation), you can interact with it using standard RESTful patterns and your programming language of choice. Here are some examples that show how you can interact with an OLD using the Python Requests library.

Log in

To log in to an OLD and create a session that remembers that you've logged in, do the following. Note: all other examples on this web site assume you have logged in using a session object s as shown below.

import requests, json
s = requests.Session()
url = 'https://projects.linguistics.ubc.ca/demoold/'
s.headers.update({'Content-Type': 'application/json'})
resp = s.post('%slogin/authenticate' % url,
    data=json.dumps({
        'username': 'myusername',
        'password': 'mypassword'}))
assert resp.json().get('authenticated') == True

Fetch your data (i.e., forms)

The following will fetch all of the forms in your OLD as a list of dicts.

forms = s.get('%sforms' % url).json()
for form in forms[:3]:
    print '%s "%s"' % (form['transcription'],
        form['translations'][0]['transcription'])
# Qaⱡa kin in? "Who are you?"
# Hun upxni qaⱡa hin in. "I know who you are."
# Qaⱡa ku in? "Who am I?"

Fetch with pagination

To fetch a subset of your forms, supply a paginator dict with items_per_page and page keys. The return value will then be a dict whose paginator value is your paginator with an added count value, i.e., the total number of forms in your database. Its items value is a list of the requested forms (at most items_per_page of them).

paginator = {'items_per_page': 10, 'page': 1}
forms = s.get('%sforms' % url, params=paginator).json()
print forms['paginator']['count']
# 800
print len(forms['items'])
# 10
print forms['items'][0]['transcription']
# maⱡtin haqapsi kanuhus nanas

Fetch collections

Request all of your collections as a list of dicts.

collections = s.get('%scollections' % url).json()
for collection in collections[:3]:
    print '%s. %s' % (collection['title'],
        collection['description'])
# Martina's Fall. Group 4
# Is This A...?. 
# Cup Game. Group 1

Note that usage of the paginator parameter, as described above, works for requests on any OLD resource type, including forms, collections, corpora, etc.

Fetch a collection and its forms

In order to request a collection as well as all of the forms that it contains, do the following. Note that the order of forms in a collection is defined by the order of references in the collection's contents field. This is why we must use a regular expression to identify the form references in the following code example.

import re
collection = s.get('%scollections/1' % url).json()
coll_forms = collection['forms']
print 'Collection %s contains %d forms.\n' % (
    collection['title'], len(coll_forms))
# Collection Going for a visit ... contains 9 forms.
# Index the forms by id, then visit them in the order in which the
# contents field references them (references look like form[123]).
forms_by_id = dict((f['id'], f) for f in coll_forms)
for form_id in map(int, re.findall(r'form\[(\d+)\]',
        collection['contents'])):
    form = forms_by_id[form_id]
    print u'%s\n%s\n' % (form['transcription'],
        form['translations'][0]['transcription'])
# ȼan qakiʔni: xman qawakaxi ka akitⱡa kuȼxaⱡ upxnamnaⱡa
# John says: Come to me for a visit! (Come over to my ...
#
# James qakiʔni: Waha! Huȼxaⱡ qataⱡ ȼ̓inaⱡ upxnisni at ...
# James says: No! I can't go to you for a visit. I ...
#
# ...

Language-specific OLD Applications

The following 21 languages are being collaboratively documented and analyzed using the OLD. These applications are accessible (to their registered users) via any Dative user interface, including the one being served at app.dative.ca.

Data in OLD Applications

This bar graph shows how much data are in the existing OLD applications, broken down by data type (e.g., audio files or words).

Activity in OLD Applications

This line graph shows how much data have been created and modified in these OLD applications over time. Because of many recent import and migration activities, it may be difficult to discern real user activity.

Installing the OLD

You can download and install the OLD on your own computer or on a server and use it to build and serve as many OLD instances as you want.

To avoid dependency conflicts, the OLD should be installed in a Python virtual environment using virtualenv. To install virtualenv, then create and activate a virtual environment, run:

pip install virtualenv
virtualenv env
source env/bin/activate

Then, install the OLD with pip:

pip install onlinelinguisticdatabase

Or, install it with easy_install:

easy_install onlinelinguisticdatabase

Or, install from source:

git clone https://github.com/jrwdunham/old.git
cd old
python setup.py develop

Build an OLD Instance

Once you have the OLD installed, you can create as many OLD instances as you want.

mkdir myold
cd myold
paster make-config onlinelinguisticdatabase production.ini
paster setup-app production.ini

Serve your OLD

paster serve production.ini

Use MySQL

By default, the OLD uses SQLite as its relational database management system (RDBMS). This is fine for testing and personal use; however, in production environments where an OLD is served to multiple users, MySQL should be used.

Assuming you have both MySQL and Python's MySQLdb installed (pip install MySQL-python), create the MySQL database and user as follows.

CREATE DATABASE myold DEFAULT CHARACTER SET utf8;
CREATE USER 'myuser'@'localhost' IDENTIFIED BY 'mypassword';
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP ON myold.* TO 'myuser'@'localhost';

Then edit the config file that you created earlier with paster make-config (e.g., production.ini): comment out the SQLite line, uncomment the MySQL lines, and insert the correct database name, username and password:

# sqlalchemy.url = sqlite:///production.db
sqlalchemy.url = mysql://myuser:mypassword@localhost:3306/myold?charset=utf8
sqlalchemy.pool_recycle = 3600

Soft Dependencies

The OLD will work fine if you've followed the installation instructions above. However, in order to use all of the OLD's functionality, additional command-line programs, Python modules and system libraries should also be installed. Exactly how to do this depends on your operating system. There is an OLD install script (which currently works only on Ubuntu 10.04) that may help with installing these soft dependencies.

Detailed Installation Instructions

For more detailed instructions for installing the OLD and building and serving OLD instances, please see either the Build OLD project or the Official OLD Documentation.

Frequently Asked Questions


How can I use the OLD to document my language?

If a group is already using the OLD to document your language, you can ask an administrator of that OLD for access. Otherwise, you can install and serve your own OLD on a web server.

Do I need to know how to build a web site in order to use the OLD?

We are working on a single-click installer for the OLD (and for Dative/OLD systems). Check back soon.

At present, in order to download, install, and serve an OLD, you will need a basic understanding of the Linux/Unix command line. The Build OLD project contains install and build/serve scripts that may help you. Currently these scripts only work on Ubuntu servers.

Is there a demo?

Not yet. We will be releasing a demo Dative/OLD soon so you can play around with it.

What is Dative and how is it different from the OLD?

Dative is a graphical user interface (GUI) that works with the OLD. Dative makes it easy for people who aren't programmers to work with the OLD.

The OLD is software for building web services. An OLD web service is designed so that programs (and programmers) can talk to it. If you are a non-technical user, then you will probably want to use the OLD with a Dative interface.

Can I use the OLD offline, i.e., in the field?

Not yet. Offline functionality is a planned feature. Check back soon.

Does it have documentation?

Yes. See the OLD Documentation. See also Dunham (2014).

Does it have a graphical interface?

Yes, see Dative.

How can I migrate my data to the OLD?

You can use the OLD's RESTful JSON API to programmatically upload your existing data structures to the OLD.

If your data are stored in a LingSync corpus, the LingSync-to-OLD migrator program may work for you.

Is it open source? How do I get the code?

Yes, the OLD is open source and is licensed under the Apache License 2.0. Its source code is available on GitHub.

How do I install it?

For detailed instructions on installing the OLD as well as its soft dependencies, see the Documentation or the Ubuntu install script.

For the impatient, either pip install onlinelinguisticdatabase or easy_install onlinelinguisticdatabase should work.

Can I search across multiple OLDs at the same time?

Yes. See the Cross-OLD Searches script.

Documentation

The Official OLD documentation
Dunham (2014) is a PhD dissertation that describes and argues for the OLD. It was written by the OLD's primary developer.

Resources

The table below lists all resources that an OLD exposes. The Path column gives the name of the resource as it appears in URL paths when making RESTful HTTP requests (cf. API). The Extra actions column indicates which requests, beyond the standard retrieve, create, update, delete and search requests, can be made against the resource.

Resource	Path	Searchable	Read-only	Extra actions
App settings	applicationsettings
Collections	collections
Collection bks	collectionbackups
Corpora	corpora
Corpus bks	corpusbackups
Elicitation methods	elicitationmethods
Files	files
Forms	forms
Form bks	formbackups
Form searches	formsearches
Languages	languages
Morpheme LMs	morphemelanguagemodels
Morpheme LM bks	morphemelanguagemodelbackups
Morphological parsers	morphologicalparsers
Morphological parser bks	morphologicalparserbackups
Morphologies	morphologies
Morphology bks	morphologybackups
Orthographies	orthographies
Pages	pages
Phonologies	phonologies
Phonology bks	phonologybackups
Remembered forms	rememberedforms
Sources	sources
Speakers	speakers
Syntactic categories	syntacticcategories
Tags	tags
Users	users

Form resources

The form is the most commonly accessed OLD resource. It represents a linguistic form, i.e., a morpheme, word or sentence. When issuing a request to create or update a form, the request body must contain a JSON object that has certain attributes whose values must match certain criteria. When the OLD returns a form, it is also in JSON format; however, it may contain additional attributes that the OLD generates, such as enterer. The table below summarizes these attributes, the types of, and restrictions on, their values, and whether they are read-only, i.e., specified only by the OLD.

Attribute	Type	Requirements/Description
transcription	string	not empty, max 255 characters
phonetic_transcription	string	max 255 characters
narrow_phonetic_transcription	string	max 255 characters
morpheme_break	string	max 255 characters
grammaticality	string	one of grammaticalities listed in app settings
morpheme_gloss	string	max 255 characters
translations	array	array containing at least one object with transcription and grammaticality values
comments	string
speaker_comments	string
syntax	string	max 1023 characters
semantics	string	max 1023 characters
status	string	either “tested” or “requires testing”
elicitation_method	integer	id of existing OLD elicitation method resource
syntactic_category	integer	id of existing OLD syntactic category resource
speaker	integer	id of existing OLD speaker resource
elicitor	integer	id of existing OLD user resource
verifier	integer	id of existing OLD user resource
source	integer	id of existing OLD source resource
tags	array	array of OLD tag resource ids
files	array	array of OLD file resource ids
date_elicited	string	date in MM/DD/YYYY format
id	integer	RDBMS-generated identifier
UUID	string	OLD-generated identifier
datetime_entered	string	date/time in ISO 8601 format
datetime_modified	string	date/time in ISO 8601 format
syntactic_category_string	string	sequence of syntactic categories corresponding to user-specified morphological analysis
morpheme_break_ids	array	cache of matches between morphemes and existing lexical forms
morpheme_gloss_ids	array	cache of matches between glosses and existing lexical forms
break_gloss_category	string	morpheme shape, gloss and category information zipped together
enterer	object	OLD user who entered the form (returned as a user object with attributes such as first_name and last_name)
modifier	object	OLD user who last modified the form
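Some of these requirements can be checked before a request is sent, which gives quicker feedback than waiting for the OLD to reject a form. The sketch below is hypothetical (the name `form_errors` and its error messages are illustrative) and covers only a subset of the constraints in the table: lengths, status, translations and the date format. Checks that depend on the database, such as whether an id refers to an existing resource or whether a grammaticality is listed in the app settings, are left to the OLD.

```python
import re

# Maximum lengths taken from the form attribute table above.
MAX_LENGTHS = {
    'transcription': 255,
    'phonetic_transcription': 255,
    'narrow_phonetic_transcription': 255,
    'morpheme_break': 255,
    'morpheme_gloss': 255,
    'syntax': 1023,
    'semantics': 1023,
}
STATUSES = ('tested', 'requires testing')
# Checks the MM/DD/YYYY shape only, not whether the date actually exists.
DATE_RE = re.compile(r'^\d{2}/\d{2}/\d{4}$')


def form_errors(form):
    """Return a dict mapping attribute names to error messages (empty if none found)."""
    errors = {}
    if not form.get('transcription'):
        errors['transcription'] = 'must not be empty'
    for attr, limit in MAX_LENGTHS.items():
        if len(form.get(attr) or '') > limit:
            errors[attr] = 'must be at most %d characters' % limit
    # Assumption: an empty status (as in form_template) is left for the OLD
    # to accept or reject.
    if form.get('status') and form['status'] not in STATUSES:
        errors['status'] = 'must be one of: %s' % ', '.join(STATUSES)
    translations = form.get('translations') or []
    if not translations:
        errors['translations'] = 'must contain at least one translation'
    elif not all('transcription' in t and 'grammaticality' in t
                 for t in translations):
        errors['translations'] = ('each translation needs transcription '
                                  'and grammaticality values')
    date_elicited = form.get('date_elicited')
    if date_elicited and not DATE_RE.match(date_elicited):
        errors['date_elicited'] = 'must be in MM/DD/YYYY format'
    return errors
```

A client might call `form_errors(form)` just before `s.post('%sforms' % url, ...)` and only send the request when the returned dict is empty.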

REST API

The OLD exposes a RESTful API with data communicated in JSON format.

Standard API

The table below summarizes the standard OLD API, i.e., how to retrieve, delete, create and update OLD resources. The OLD provides access to many “resources”, such as (linguistic) forms, collections (i.e., texts), tags, users, speakers, etc. Replace “forms” with the name of another resource, e.g., “tags”, in order to manipulate that resource.

HTTP method	Path	Effect	Parameters
GET	/forms	Retrieve all forms	Optional pagination and ordering parameters
GET	/forms/id	Retrieve form with id=id
GET	/forms/new	Get data needed to create a new form	Optional parameters controlling what is retrieved
GET	/forms/id/edit	Get data needed to edit form with id=id	Optional parameters controlling what is retrieved
DELETE	/forms/id	Delete form with id=id
POST	/forms	Create a new form	JSON object
PUT	/forms/id	Update form with id=id	JSON object
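Because every resource follows the same pattern, a small wrapper can cover them all. The class below is a hypothetical sketch, not part of the OLD or of the Requests library; it assumes a logged-in session like `s` from the Log in example, a base URL ending in a slash, and that each endpoint (including DELETE) responds with JSON.

```python
import json


class OLDResource(object):
    """Hypothetical thin wrapper around the standard OLD API for one resource.

    ``session`` is assumed to behave like a logged-in requests.Session;
    ``url`` is the OLD's base URL ending in a slash; ``path`` is a resource
    path such as 'forms' or 'tags'.
    """

    def __init__(self, session, url, path):
        self.session = session
        self.base = '%s%s' % (url, path)

    def index(self, **params):
        # GET /<path>, with optional pagination/ordering query parameters.
        return self.session.get(self.base, params=params).json()

    def show(self, id_):
        # GET /<path>/<id>
        return self.session.get('%s/%d' % (self.base, id_)).json()

    def new(self):
        # GET /<path>/new: data needed to create a new resource.
        return self.session.get('%s/new' % self.base).json()

    def edit(self, id_):
        # GET /<path>/<id>/edit: data needed to edit an existing resource.
        return self.session.get('%s/%d/edit' % (self.base, id_)).json()

    def create(self, data):
        # POST /<path> with a JSON object.
        return self.session.post(self.base, data=json.dumps(data)).json()

    def update(self, id_, data):
        # PUT /<path>/<id> with a JSON object.
        return self.session.put('%s/%d' % (self.base, id_),
                                data=json.dumps(data)).json()

    def delete(self, id_):
        # DELETE /<path>/<id>
        return self.session.delete('%s/%d' % (self.base, id_)).json()
```

For example, `OLDResource(s, url, 'tags').index(page=1, items_per_page=10)` sends the same request as `s.get('%stags' % url, params={'page': 1, 'items_per_page': 10})`.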

Search API

Certain OLD resources can be searched (cf. the “searchable” column in the table on the Resources page). As the table below indicates, there are two ways to request a search, one using the non-standard HTTP method SEARCH and the other using the POST method.

HTTP method	Path	Effect	Parameters
SEARCH	/forms	Search over forms	JSON object
POST	/forms/search	Search over forms	JSON object
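With the Requests library, the non-standard SEARCH method can be sent through `Session.request`, which accepts an arbitrary method name. The function below is a hypothetical sketch that issues either form of search request with the same JSON body; the function name and its `use_search_method` flag are illustrative assumptions.

```python
import json


def search(session, url, path, query, paginator=None, use_search_method=False):
    """Search over an OLD resource via POST /<path>/search or SEARCH /<path>.

    Sketch: ``session`` is assumed to be a requests.Session-like object whose
    ``request(method, url, data=...)`` method accepts any method name.
    """
    body = {'query': query}
    if paginator is not None:
        body['paginator'] = paginator
    if use_search_method:
        method, target = 'SEARCH', '%s%s' % (url, path)
    else:
        method, target = 'POST', '%s%s/search' % (url, path)
    return session.request(method, target, data=json.dumps(body)).json()
```

POST is used by default here because some proxies and HTTP clients may not pass non-standard methods through; both variants send an identical body.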

Pagination

When retrieving or searching over an OLD resource, pagination and ordering parameters may be supplied. The pagination parameters are page and items_per_page. In a GET request, these are supplied in the URL's query string. Thus, a request to https://my-old-url.org/forms?page=99&items_per_page=10 would return the 981st through 990th forms, since pages are numbered from 1 and the first 98 pages hold 98 × 10 = 980 forms.
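The pagination arithmetic can be sketched as follows. Assuming pages are numbered from 1 (as in the examples above), page p with n items per page covers items (p - 1) * n + 1 through p * n, and a result set of count items spans ceil(count / n) pages. The helper names are hypothetical.

```python
def page_bounds(page, items_per_page):
    """Return the 1-based positions of the first and last items on a page.

    Assumes pages are numbered from 1. The last page may hold fewer items
    than items_per_page if the total count is not a multiple of it.
    """
    first = (page - 1) * items_per_page + 1
    last = page * items_per_page
    return first, last


def page_count(count, items_per_page):
    """Number of pages needed for ``count`` items (ceiling division)."""
    return -(-count // items_per_page)
```

For example, `page_bounds(99, 10)` gives `(981, 990)`, and `page_count(800, 10)` gives `80` for the 800 forms in the earlier pagination example.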

In a search request (using either the POST or SEARCH method), the pagination parameters are supplied in the JSON body of the request. The JSON body should be an object containing both query and paginator attributes, with the value of the paginator attribute being another object with its own page and items_per_page attributes. For example:

{
    "query": {"filter": [], "order_by": []},
    "paginator": {
        "page": 99,
        "items_per_page": 10
    }
}

Ordering

When retrieving or searching over an OLD resource, you can control the ordering of the resources returned. The ordering parameters are order_by_model (i.e., resource), order_by_attribute and order_by_direction. In a GET request, these are once again supplied in the URL's query string. Thus, a request to https://my-old-url.org/forms?order_by_attribute=transcription&order_by_model=form&order_by_direction=desc would return all of the forms in the database sorted by their transcriptions in descending order.

In a search request, the ordering parameters are supplied in the JSON body of the request: the query attribute is an object with an order_by attribute whose value is an array containing model, attribute and direction strings.

{
    "query": {
        "filter": [],
        "order_by": ["Form", "transcription", "desc"]},
    "paginator": {}
}
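Since the GET parameters carry the same information as the search body, one can translate between them. The function below is a hypothetical sketch that turns an order_by array and a paginator object, shaped as in search requests, into the equivalent GET query string. It passes the model name through unchanged; note that the GET example above uses `form` while search bodies use `Form`.

```python
try:  # Python 3
    from urllib.parse import urlencode
except ImportError:  # Python 2
    from urllib import urlencode


def get_query_string(order_by=None, paginator=None):
    """Build a GET query string from search-style ordering/pagination values.

    ``order_by`` is a [model, attribute, direction] list; ``paginator`` is a
    dict with page and items_per_page keys. Both are optional.
    """
    params = []
    if order_by is not None:
        model, attribute, direction = order_by
        params += [('order_by_model', model),
                   ('order_by_attribute', attribute),
                   ('order_by_direction', direction)]
    if paginator is not None:
        params += [('page', paginator['page']),
                   ('items_per_page', paginator['items_per_page'])]
    # A list of pairs keeps the parameters in a predictable order.
    return urlencode(params)
```

For example, `s.get('%sforms?%s' % (url, get_query_string(['form', 'transcription', 'desc'])))` would request all forms sorted by transcription in descending order; with Requests you can equally pass the same pairs through the `params` argument.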