About the OLD
The Online Linguistic Database (OLD) is software for linguistic fieldwork.
Features
- Collaboration and data sharing
- Advanced search
- Automatic morpheme cross-referencing
- Configurable validation
- Morphological parser & phonology builder
- Text creation
- User access control
- Documentation
- Open source
- Graphical User Interface: Dative
- RESTful JSON API
Technical
The OLD is software for creating RESTful web services that facilitate collaborative linguistic fieldwork and language documentation. An OLD RESTful web service receives inputs and returns outputs in JSON. The OLD is written in Python using the Pylons web framework and MySQL/SQLAlchemy. Its source code can be found on GitHub and releases can be accessed on PyPI and installed with either pip or easy_install.
Adding & Updating Resources
Adding a form to an OLD involves sending a POST request to the path /forms at the OLD's URL. The request body must contain a JSON object whose attributes match the keys of the following Python dict.
form_template = {
    "grammaticality": "",
    "transcription": "",
    "morpheme_break": "",
    "morpheme_gloss": "",
    "narrow_phonetic_transcription": "",
    "phonetic_transcription": "",
    "translations": [],
    "comments": "",
    "speaker_comments": "",
    "date_elicited": "",
    "speaker": None,
    "source": None,
    "elicitation_method": None,
    "elicitor": None,
    "syntax": "",
    "semantics": "",
    "status": "",
    "syntactic_category": None,
    "verifier": None
}
The following snippet illustrates the creation of a form.
import copy
form = copy.deepcopy(form_template)
form['transcription'] = 'Arma virumque cano.'
form['translations'].append({
    'transcription': 'I sing of arms and a man.',
    'grammaticality': ''
})
r = s.post('%sforms' % url, data=json.dumps(form)).json()
assert r['transcription'] == 'Arma virumque cano.'
print '%s %s created a new form with id %d on %s.' % (
    r['enterer']['first_name'], r['enterer']['last_name'], r['id'],
    r['datetime_entered'])
# Joel Dunham created a new form with id 1014 on 2016-01-25T06:41:06.
Updating
To update a form, issue a PUT request to /forms/<id>, where <id> is the id of the form to be updated.
The object/dict received from a successful create request can be modified and sent back in the update request. However, its relational attributes (e.g., elicitor, speaker, etc.) must first be converted to id values, and its date_elicited value must be converted to mm/dd/yyyy format.
r['morpheme_break'] = 'arm-a vir-um-que can-o'
r['morpheme_gloss'] = 'arm-ACC.PL man-ACC.PL-and sing-1SG.PRS'
r = s.put('%sforms/%d' % (url, r['id']), data=json.dumps(r)).json()
print 'Form %d modified on %s.' % (r['id'], r['datetime_modified'])
# Form 1014 modified on 2016-01-25T07:12:51.
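The conversion that an update requires can be sketched as a small helper. This is only an illustration under stated assumptions: the function name prepare_form_for_update is hypothetical, related resources are assumed to come back as nested dicts with an id key, and date_elicited is assumed to come back as an ISO 8601 date (YYYY-MM-DD). Check both assumptions against the responses of your own OLD. The tags and files arrays are not handled here.

```python
from datetime import datetime

# Relational attributes taken from the form template; treating exactly these
# as the ones needing conversion is an assumption.
RELATIONAL_ATTRS = ('elicitation_method', 'elicitor', 'source', 'speaker',
                    'syntactic_category', 'verifier')


def prepare_form_for_update(form):
    """Return a copy of a returned form dict that is ready for a PUT request."""
    prepared = dict(form)
    for attr in RELATIONAL_ATTRS:
        value = prepared.get(attr)
        # Assumption: related resources are returned as dicts with an 'id'.
        if isinstance(value, dict):
            prepared[attr] = value['id']
    date_elicited = prepared.get('date_elicited')
    if date_elicited:
        # Assumption: dates are returned as YYYY-MM-DD.
        parsed = datetime.strptime(date_elicited, '%Y-%m-%d')
        prepared['date_elicited'] = parsed.strftime('%m/%d/%Y')
    return prepared


# Made-up response data, used only to exercise the helper.
returned = {'id': 1014, 'transcription': 'Arma virumque cano.',
            'speaker': {'id': 7, 'first_name': 'Jane'},
            'elicitor': None, 'date_elicited': '2016-01-25'}
update = prepare_form_for_update(returned)
```

The result could then be passed to json.dumps and sent with s.put in place of the unmodified response.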
Searching Over Resources
The OLD accepts search expressions as JSON arrays. The following array represents a search that will return all forms whose transcription value begins with an “a”.
["Form", "transcription", "regex", "^a"]
Here is how you would request the forms that match the above search, by making a POST request to forms/search:
url = 'https://projects.linguistics.ubc.ca/demoold/'
search_expr = json.dumps({
    "query": {
        "filter": ["Form", "transcription", "regex", "^a"]}})
r = s.post('%sforms/search' % url, data=search_expr).json()
print '\n'.join(t['transcription'] for t in r[:3])
# anteferre
# abacus
# abacus
Pagination & Ordering
Search requests can also contain pagination and ordering parameters. The following search request performs the same search as above, except that now we are asking that the results be sorted by transcription in descending order and we want the first three forms only.
search_expr = json.dumps({
    "query": {
        "filter": ["Form", "transcription", "regex", "^a"],
        "order_by": ["Form", "transcription", "desc"]
    },
    "paginator": {
        "page": 1,
        "items_per_page": 3
    }
})
r = s.post('%sforms/search' % url, data=search_expr).json()
print r['paginator']['count']
# 720
print '\n'.join(t['transcription'] for t in r['items'])
# azymum
# axitiosus
# axitiosus
Conjunction
You can conjoin two or more search expressions using the syntax illustrated below. Here we are adding the requirement that all returned forms must be grammatical, i.e., have an empty string for their grammaticality value.
search_expr = json.dumps({
    "query": {
        "filter": ["and", [
            ["Form", "transcription", "regex", "^a"],
            ["Form", "grammaticality", "=", ""]
        ]
    ]}})
r = s.post('%sforms/search' % url, data=search_expr).json()
print '\n'.join(t['transcription'] for t in r[:3])
# anteferre
# abacus
# abacus
Disjunction
To create a disjunction of search expressions, use "or" as the first element of the array, as shown below. Here we are looking for all forms that contain "cat" in their morpheme gloss or comments fields or in any one of their translation values.
search_expr = json.dumps({
    "query": {
        "filter": ["or", [
            ["Form", "morpheme_gloss", "regex", "cat"],
            ["Form", "comments", "regex", "cat"],
            ["Form", "translations", "transcription", "regex", "cat"],
        ]
    ]}})
r = s.post('%sforms/search' % url, data=search_expr).json()
print ('transcription: %s\n'
       'morpheme gloss: %s\n'
       'comments: %s\n'
       'translation(s): %s' % (
    r[0]['transcription'],
    r[0]['morpheme_gloss'],
    r[0]['comments'],
    '\n '.join(
        t['transcription'] for t in r[0]['translations'])))
# transcription: abdico
# morpheme gloss:
# comments:
# translation(s): resign, abdicate
# abolish
# disinherit
# renounce, reject, expel, disapprove of
Negation
A search expression can be negated by making it the second element in an array whose first element is "not". In the search below, the first conjunct is negated and is used to ensure that no forms with the ungrammatical value “*” are included in the search results.
The second conjunct shows the use of “!=”, meaning “not equals”, to filter out forms whose status value is set to “requires testing”.
The next four conjuncts ensure that all forms that match this search were entered by someone with the last name “Dunham”, contain a progressive aspectual construction in their translation, contain a specific morpheme “á-”, and were entered some time before January 1, 2011.
search_expr = json.dumps({
    "query": {
        "filter": ["and", [
            ["not", ["Form", "grammaticality", "=", "*"]],
            ["Form", "status", "!=", "requires testing"],
            ["Form", "enterer", "last_name", "=", "Dunham"],
            ["Form", "translations", "transcription", "regex",
             "( )(is|am|are)( ).+ing( |$)"],
            ["Form", "morpheme_break", "regex", "(^| |-)á-"],
            ["Form", "datetime_entered", "<", "2011-01-01T12:00:00"]
        ]
    ]}})
r = s.post('%sforms/search' % url, data=search_expr).json()
print ('transcription: %s\n'
       'morpheme break: %s\n'
       'morpheme gloss: %s\n'
       'translation(s): %s\n'
       'entered: %s\n'
       'enterer: %s' % (
    r[0]['transcription'],
    r[0]['morpheme_break'],
    r[0]['morpheme_gloss'],
    '\n '.join(
        t['transcription'] for t in r[0]['translations']),
    r[0]['datetime_entered'],
    '%s %s' % (
        r[0]['enterer']['first_name'],
        r[0]['enterer']['last_name'])))
# transcription: oma sinai'koan nitááwaanika otsin'iihka'siimii
# morpheme break: om-wa sina-ikoan nit-á-waanii-ok-wa ot-inihka'sim-yi
# morpheme gloss: DEM-PROX cree-being 1-DUR-say-INV-3SG 3-name-IN.SG
# translation(s): that Cree man is telling me his name
# entered: 2008-02-25T00:00:00
# enterer: Joel Dunham
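Because search expressions are plain nested arrays, they can be composed with a few small functions rather than written out by hand. The sketch below is one possible way to do this; the names and_, or_, not_ and search_body are hypothetical helpers, not part of the OLD. Only the array shapes (["and", [...]], ["or", [...]], ["not", ...]) and the query/paginator layout come from the examples above.

```python
import json


def and_(*filters):
    """Conjoin search expressions: ["and", [f1, f2, ...]]."""
    return ['and', list(filters)]


def or_(*filters):
    """Disjoin search expressions: ["or", [f1, f2, ...]]."""
    return ['or', list(filters)]


def not_(filter_):
    """Negate a search expression: ["not", f]."""
    return ['not', filter_]


def search_body(filter_, order_by=None, paginator=None):
    """Serialize a filter, with optional ordering and pagination, to JSON."""
    query = {'filter': filter_}
    if order_by is not None:
        query['order_by'] = order_by
    body = {'query': query}
    if paginator is not None:
        body['paginator'] = paginator
    return json.dumps(body)


filter_ = and_(
    not_(['Form', 'grammaticality', '=', '*']),
    ['Form', 'status', '!=', 'requires testing'])
body = search_body(filter_,
                   order_by=['Form', 'transcription', 'desc'],
                   paginator={'page': 1, 'items_per_page': 3})
```

The resulting string can be passed as the data argument of s.post('%sforms/search' % url, ...), exactly like search_expr in the examples above.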
Getting started
Once you have access to an OLD instance (cf. installation), you can interact with it using standard RESTful patterns and your programming language of choice. Here are some examples that show how you can interact with an OLD using the Python Requests library.
Log in
To log in to an OLD and create a session that remembers that you've logged in, do the following. Note: all other examples on this web site assume you have logged in using a session object s as shown below.
import requests, json
s = requests.Session()
url = 'https://projects.linguistics.ubc.ca/demoold/'
s.headers.update({'Content-Type': 'application/json'})
resp = s.post('%slogin/authenticate' % url, data=json.dumps({
    'username': 'myusername', 'password': 'mypassword'}))
assert resp.json().get('authenticated') == True
Fetch your data (i.e., forms)
The following will fetch all of the forms in your OLD as a list of dicts.
forms = s.get('%sforms' % url).json()
for form in forms[:3]:
    print '%s "%s"' % (form['transcription'],
                       form['translations'][0]['transcription'])
# Qaⱡa kin in? "Who are you?"
# Hun upxni qaⱡa hin in. "I know who you are."
# Qaⱡa ku in? "Who am I?"
Fetch with pagination
To fetch a subset of your forms, supply a paginator dict with items_per_page and page keys. The return value will be a dict with two keys. The paginator key holds your supplied paginator with an added count value, which tells you how many forms are in your database. The items key holds a list of the N form objects that you requested.
paginator = {'items_per_page': 10, 'page': 1}
forms = s.get('%sforms' % url, params=paginator).json()
print forms['paginator']['count']
# 800
print len(forms['items'])
# 10
print forms['items'][0]['transcription']
# maⱡtin haqapsi kanuhus nanas
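To walk through every form without loading them all in one request, the paginated response can be consumed page by page. The following is a minimal sketch, assuming pages are numbered from 1 and that the response has the paginator/items shape described above. The names iter_forms, fetch_page and fake_fetch are hypothetical; fake_fetch stands in for a real OLD so that the example runs without a network connection.

```python
def iter_forms(fetch_page, items_per_page=10):
    """Yield every form, requesting one page at a time.

    fetch_page is any callable that takes a paginator dict and returns the
    decoded JSON response, for example:
        lambda p: s.get('%sforms' % url, params=p).json()
    """
    page = 1
    while True:
        resp = fetch_page({'items_per_page': items_per_page, 'page': page})
        for form in resp['items']:
            yield form
        # Stop on an empty page or once the reported count has been covered.
        seen = page * items_per_page
        if not resp['items'] or seen >= resp['paginator']['count']:
            break
        page += 1


# Stand-in data source with 25 made-up forms.
FAKE_FORMS = [{'id': i} for i in range(1, 26)]


def fake_fetch(paginator):
    start = (paginator['page'] - 1) * paginator['items_per_page']
    items = FAKE_FORMS[start:start + paginator['items_per_page']]
    return {'paginator': dict(paginator, count=len(FAKE_FORMS)),
            'items': items}


all_ids = [form['id'] for form in iter_forms(fake_fetch)]
```

Passing the callable in, rather than hard-coding s.get, keeps the loop independent of the session object and makes it easy to try out offline.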
Fetch collections
Request all of your collections as a list of dicts.
collections = s.get('%scollections' % url).json()
for collection in collections[:3]:
    print '%s. %s' % (collection['title'], collection['description'])
# Martina's Fall. Group 4
# Is This A...?.
# Cup Game. Group 1
Note that usage of the paginator parameter, as described above, works for requests on any OLD resource type, including forms, collections, corpora, etc.
Fetch a collection and its forms
In order to request a collection as well as all of the forms that it contains, do the following. Note that the order of forms in a collection is defined by the order of references in the collection's contents field. This is why we must use a regular expression to identify the form references in the following code example.
collection = s.get('%scollections/1' % url).json()
coll_forms = collection['forms']
print 'Collection %s contains %d forms.\n' % (
    collection['title'], len(coll_forms))
# Collection Going for a visit ... contains 9 forms.
import re
for form_id in map(int, re.findall('form\[(\d+)\]', collection['contents'])):
    form = [f for f in coll_forms if f['id'] == form_id][0]
    print u'%s\n%s\n' % (form['transcription'],
                         form['translations'][0]['transcription'])
# ȼan qakiʔni: xman qawakaxi ka akitⱡa kuȼxaⱡ upxnamnaⱡa
# John says: Come to me for a visit! (Come over to my ...
#
# James qakiʔni: Waha! Huȼxaⱡ qataⱡ ȼ̓inaⱡ upxnisni at ...
# James says: No! I can't go to you for a visit. I ...
#
# ...
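The same reordering can be written as a reusable function that indexes the forms by id once instead of scanning the list for every reference. This is a sketch under the assumptions already visible above: references in contents look like form[123], and the collection dict has forms and contents keys. The function name forms_in_contents_order and the sample data are made up for illustration.

```python
import re


def forms_in_contents_order(collection):
    """Return a collection's forms in the order its contents reference them."""
    by_id = {form['id']: form for form in collection['forms']}
    ids = [int(m) for m in re.findall(r'form\[(\d+)\]',
                                      collection['contents'])]
    # Skip references to forms that are missing from the 'forms' list.
    return [by_id[i] for i in ids if i in by_id]


# Made-up collection used only to exercise the function.
sample = {
    'title': 'Sample text',
    'contents': 'Intro\nform[12]\nform[3]\nSome commentary\nform[7]',
    'forms': [{'id': 3, 'transcription': 'b'},
              {'id': 7, 'transcription': 'c'},
              {'id': 12, 'transcription': 'a'}],
}
ordered = forms_in_contents_order(sample)
```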
Language-specific OLD Applications
The following 21 languages are being collaboratively documented and analyzed using the OLD. These applications are accessible (to their registered users) via any Dative user interface, including the one being served at app.dative.ca.
- Blackfoot (bla) https://projects.linguistics.ubc.ca/blaold/
- Chuj (cac) https://projects.linguistics.ubc.ca/cacold/
- Coeur d'Alene (crd) https://projects.linguistics.ubc.ca/crdold/
- Georgian (kat) https://projects.linguistics.ubc.ca/katold/
- Gitksan (git) https://projects.linguistics.ubc.ca/gitold/
- Kabyle (kab) https://projects.linguistics.ubc.ca/kabold/
- Kartuli (kat) https://projects.linguistics.ubc.ca/batumi_kartuliold/
- Khmer (khm) https://projects.linguistics.ubc.ca/khmold/
- Ktunaxa (kut) https://projects.linguistics.ubc.ca/kutold/
- Kwak'wala (kwk) https://projects.linguistics.ubc.ca/kwkold/
- Marka (rkm) https://projects.linguistics.ubc.ca/rkmold/
- Medumba (byv) https://projects.linguistics.ubc.ca/byvold/
- Mi'gmaq (mic) https://projects.linguistics.ubc.ca/micold/
- Moro (mor) https://projects.linguistics.ubc.ca/morold/
- Nata (ntk) https://projects.linguistics.ubc.ca/ntkold/
- Nepali (nep) https://projects.linguistics.ubc.ca/nepold/
- Okanagan (oka) https://projects.linguistics.ubc.ca/okaold/
- Plains Cree (crk) https://projects.linguistics.ubc.ca/crkold/
- Scottish Gaelic (gla) https://projects.linguistics.ubc.ca/glaold/
- Shona (sna) https://projects.linguistics.ubc.ca/snaold/
- Tlingit (tli) https://projects.linguistics.ubc.ca/tliold/
Data in OLD Applications
This bar graph shows how much data are in the existing OLD applications. Click the squares in the legend to show or hide various data types, e.g., audio files or words.

Activity in OLD Applications
This line graph shows how data have been created and modified in these OLD applications over time. Because of many recent import and migration activities, it may be difficult to view real user activity. You can click-and-drag a selection of the graph to view it in greater detail.

Installing the OLD
You can download and install the OLD on your own computer or on a server and use it to build and serve as many OLD instances as you want.
To avoid dependency conflicts, the OLD should be installed in a Python virtual environment using virtualenv. To install virtualenv and create a virtual environment, run:
pip install virtualenv
virtualenv env
source env/bin/activate
Then, install the OLD with pip:
pip install onlinelinguisticdatabase
Or, install it with easy_install:
easy_install onlinelinguisticdatabase
Or, install from source:
git clone https://github.com/jrwdunham/old.git
cd old
python setup.py develop
Build an OLD Instance
Once you have the OLD installed, you can create as many OLD instances as you want.
mkdir myold
cd myold
paster make-config onlinelinguisticdatabase production.ini
paster setup-app production.ini
Serve your OLD
paster serve production.ini
Use MySQL
By default, the OLD uses SQLite as its relational database (RDBMS). For testing and personal use, this is fine; however, for production environments when serving an OLD to multiple users, MySQL should be used.
Assuming you have both MySQL and Python's MySQLdb installed (pip install MySQL-python), create the MySQL database and user as follows.
CREATE DATABASE myold DEFAULT CHARACTER SET utf8;
CREATE USER 'myuser'@'localhost' IDENTIFIED BY 'mypassword';
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP ON myold.* TO 'myuser'@'localhost';
Then edit the config file that you created earlier by running paster make-config (e.g., production.ini): comment out the SQLite line, uncomment the MySQL lines, and insert the correct database name, username and password:
# sqlalchemy.url = sqlite:///production.db
sqlalchemy.url = mysql://myuser:mypassword@localhost:3306/myold?charset=utf8
sqlalchemy.pool_recycle = 3600
Soft Dependencies
The OLD will work fine if you've followed the installation instructions above. However, in order to use all of the OLD's functionality, the following command-line programs, Python modules and system libraries should also be installed. Exactly how to do this depends on your operating system. There is an OLD install script (that currently only works on Ubuntu 10.04), which may offer some help in installing these soft dependencies.
Detailed Installation Instructions
For more detailed instructions for installing the OLD and building and serving OLD instances, please see either the Build OLD project or the Official OLD Documentation.
Frequently Asked Questions
-
How can I use the OLD to document my language?
If a group is already using the OLD to document your language, you can ask an administrator of that OLD for access. Otherwise, you can install and serve your own OLD on a web server.
-
Do I need to know how to build a web site in order to use the OLD?
We are working on a single-click installer for the OLD (and for Dative/OLD systems). Check back soon.
At present, in order to download, install, and serve an OLD you will need a basic understanding of the Linux/Unix command line. The Build OLD project contains install and build/serve scripts that may help you. Currently these scripts only work on Ubuntu servers.
-
Is there a demo?
Not yet. We will be releasing a demo Dative/OLD soon so you can play around with it.
-
What is Dative and how is it different from the OLD?
Dative is a graphical user interface (GUI) that works with the OLD. Dative makes it easy for people (who aren't programmers) to work with the OLD.
The OLD is software for building web services. An OLD web service is designed so that programs (and programmers) can talk to it. If you are a non-technical user, then you will probably want to use the OLD with a Dative interface.
-
Can I use the OLD offline, i.e., in the field?
Not yet. Offline functionality is a planned feature. Check back soon.
-
Does it have documentation?
Yes. See the OLD Documentation. See also Dunham (2014).
-
Does it have a graphical interface?
Yes, see Dative.
-
How can I migrate my data to the OLD?
You can use the OLD's RESTful JSON API to programmatically upload your existing data structures to the OLD.
If your data are stored on a LingSync corpus, the LingSync-to-OLD migrator program may work for you.
-
Is it open source? How do I get the code?
Yes, the OLD is open source and is licensed under the Apache License 2.0. Its source code is available on GitHub.
-
How do I install it?
For detailed instructions on installing the OLD as well as its soft dependencies, see the Documentation or the Ubuntu install script.
For the impatient, either pip install onlinelinguisticdatabase or easy_install onlinelinguisticdatabase should work.
-
Can I search across multiple OLDs at the same time?
Yes. See the Cross-OLD Searches script.
Documentation
- The Official OLD documentation
- Dunham (2014) is a PhD dissertation that describes and argues for the OLD. It was written by the OLD's primary developer.
Resources
The table below lists all resources that an OLD exposes. The Path column indicates the name of the resource in URL paths when making RESTful HTTP requests (cf. API). The Extra actions column indicates the requests that can be made against this resource beyond the standard retrieve, create, update, delete and search requests.
Resource | Path | Searchable | Read-only | Extra actions |
---|---|---|---|---|
App settings | applicationsettings | |||
Collections | collections | |||
Collection bks | collectionbackups | |||
Corpora | corpora | |||
Corpus bks | corpusbackups | |||
Elicitation methods | elicitationmethods | |||
Files | files | |||
Forms | forms | |||
Form bks | formbackups | |||
Form searches | formsearches | |||
Languages | languages | |||
Morpheme LMs | morphemelanguagemodels | |||
Morpheme LM bks | morphemelanguagemodelbackups | |||
Morphological parsers | morphologicalparsers | |||
Morphological parser bks | morphologicalparserbackups | |||
Morphologies | morphologies | |||
Morphology bks | morphologybackups | |||
Orthographies | orthographies | |||
Pages | pages | |||
Phonologies | phonologies | |||
Phonology bks | phonologybackups | |||
Remembered forms | rememberedforms | |||
Sources | sources | |||
Speakers | speakers | |||
Syntactic categories | syntacticcategories | |||
Tags | tags | |||
Users | users |
Form resources
The form is the most commonly accessed OLD resource. It represents a linguistic form, i.e., morpheme, word or sentence. When issuing a request to create or update a form, the request body must contain a JSON object that has certain attributes whose values must match certain criteria. When the OLD returns a form it is also in JSON format; however, it may contain additional attributes that the OLD generates, such as enterer. The table below summarizes these attributes, the types of, and restrictions on, their values, and whether they are read-only, i.e., specified only by the OLD.
Attribute | Type | Requirements/Description | Read-only |
---|---|---|---|
transcription | string | not empty, max 255 characters | |
phonetic_transcription | string | max 255 characters | |
narrow_phonetic_transcription | string | max 255 characters | |
morpheme_break | string | max 255 characters | |
grammaticality | string | one of grammaticalities listed in app settings | |
morpheme_gloss | string | max 255 characters | |
translations | array | array containing at least one object with transcription and grammaticality values | |
comments | string | ||
speaker_comments | string | ||
syntax | string | max 1023 characters | |
semantics | string | max 1023 characters | |
status | string | either “tested” or “requires testing” | |
elicitation_method | integer | id of existing OLD elicitation method resource | |
syntactic_category | integer | id of existing OLD syntactic category resource | |
speaker | integer | id of existing OLD speaker resource | |
elicitor | integer | id of existing OLD user resource | |
verifier | integer | id of existing OLD user resource | |
source | integer | id of existing OLD source resource | |
tags | array | array of OLD tag resource ids | |
files | array | array of OLD file resource ids | |
date_elicited | string | date in MM/DD/YYYY format | |
id | integer | RDBMS-generated identifier | |
UUID | string | OLD-generated identifier | |
datetime_entered | string | date/time in ISO 8601 format | |
datetime_modified | string | date/time in ISO 8601 format | |
syntactic_category_string | string | sequence of syntactic categories corresponding to user-specified morphological analysis | |
morpheme_break_ids | array | cache of matches between morphemes and existing lexical forms | |
morpheme_gloss_ids | array | cache of matches between glosses and existing lexical forms | |
break_gloss_category | string | morpheme shape, gloss and category information zipped together | |
enterer | integer | id of OLD user who entered the form | |
modifier | integer | id of OLD user who last modified the form |
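Several of the restrictions in the table can be checked on the client before a form is sent, which gives quicker feedback than waiting for the OLD to reject the request. The sketch below is not the OLD's own validation; validate_form is a hypothetical helper that covers only the length limits, the translations requirement, the status values and the date format listed above. It does not check grammaticality (which depends on the app settings) or whether referenced ids exist, and it assumes that an empty status is acceptable, as in the form template.

```python
from datetime import datetime

# Maximum lengths copied from the table above.
MAX_LENGTHS = {
    'transcription': 255, 'phonetic_transcription': 255,
    'narrow_phonetic_transcription': 255, 'morpheme_break': 255,
    'morpheme_gloss': 255, 'syntax': 1023, 'semantics': 1023}


def validate_form(form):
    """Return a list of problems found in a form dict (empty if none)."""
    errors = []
    if not form.get('transcription'):
        errors.append('transcription must not be empty')
    for attr, limit in sorted(MAX_LENGTHS.items()):
        if len(form.get(attr) or '') > limit:
            errors.append('%s exceeds %d characters' % (attr, limit))
    translations = form.get('translations') or []
    if not any(isinstance(t, dict) and 'transcription' in t
               and 'grammaticality' in t for t in translations):
        errors.append('at least one translation is required')
    # Assumption: an empty or missing status is tolerated.
    if form.get('status') not in (None, '', 'tested', 'requires testing'):
        errors.append('status must be "tested" or "requires testing"')
    if form.get('date_elicited'):
        try:
            datetime.strptime(form['date_elicited'], '%m/%d/%Y')
        except ValueError:
            errors.append('date_elicited must be in MM/DD/YYYY format')
    return errors


good = {'transcription': 'Arma virumque cano.',
        'translations': [{'transcription': 'I sing of arms and a man.',
                          'grammaticality': ''}],
        'status': 'tested', 'date_elicited': '01/25/2016'}
bad = {'transcription': '', 'translations': [], 'status': 'done',
       'date_elicited': '2016-01-25'}
good_errors = validate_form(good)
bad_errors = validate_form(bad)
```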
REST API
The OLD exposes a RESTful API with data communicated in JSON format.
Standard API
The table below summarizes the standard OLD API. The OLD provides access to many “resources”, such as (linguistic) forms, collections (i.e., texts), tags, users, speakers, etc. The table below shows how to retrieve, delete, create and update OLD resources. Replace “forms” with the name of another resource, e.g., “tags”, in order to manipulate that resource.
HTTP method | Path | Effect | Parameters |
---|---|---|---|
GET | /forms | Retrieve all forms | Optional pagination and ordering parameters |
GET | /forms/id | Retrieve form with id=id | |
GET | /forms/new | Get data needed to create a new form | Optional parameters controlling what is retrieved |
GET | /forms/id/edit | Get data needed to edit form with id=id | Optional parameters controlling what is retrieved |
DELETE | /forms/id | Delete form with id=id | |
POST | /forms | Create a new form | JSON object |
PUT | /forms/id | Update form with id=id | JSON object |
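Since every resource follows the same pattern, the table can be captured in one function that returns the HTTP method and path for a given action. This is an illustrative sketch: the function name standard_request and the action labels (index, show, new, edit, delete, create, update) are hypothetical and are not names used by the OLD; only the methods and paths come from the table.

```python
# Actions that address the whole resource: (HTTP method, path suffix).
COLLECTION_ACTIONS = {
    'index': ('GET', ''),
    'new': ('GET', '/new'),
    'create': ('POST', ''),
}

# Actions that address a single resource by id.
MEMBER_ACTIONS = {
    'show': ('GET', ''),
    'edit': ('GET', '/edit'),
    'delete': ('DELETE', ''),
    'update': ('PUT', ''),
}


def standard_request(action, resource, id_=None):
    """Return (HTTP method, path) for a standard action on a resource."""
    if action in COLLECTION_ACTIONS:
        method, suffix = COLLECTION_ACTIONS[action]
        return method, '/%s%s' % (resource, suffix)
    if action not in MEMBER_ACTIONS:
        raise ValueError('unknown action: %s' % action)
    if id_ is None:
        raise ValueError('action %s requires an id' % action)
    method, suffix = MEMBER_ACTIONS[action]
    return method, '/%s/%d%s' % (resource, id_, suffix)


edit_form = standard_request('edit', 'forms', 1014)
delete_tag = standard_request('delete', 'tags', 3)
```

With the Requests session used elsewhere on this site, the returned pair could be passed to s.request(method, url.rstrip('/') + path).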
Search API
Certain OLD resources can be searched (cf. the “searchable” column in the table on the Resources page). As the table below indicates, there are two ways to request a search, one using the non-standard HTTP method SEARCH and the other using the POST method.
HTTP method | Path | Effect | Parameters |
---|---|---|---|
SEARCH | /forms | Search over forms | JSON object |
POST | /forms/search | Search over forms | JSON object |
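The two ways of requesting a search differ only in method and path. The sketch below builds both requests without sending them, using the standard library so that it runs without a network connection. The function name build_search_requests is hypothetical, and the base URL is the placeholder used in the Pagination section. With a Requests session s, the SEARCH variant would be issued as s.request('SEARCH', '%sforms' % url, data=body).

```python
import json
import urllib.request


def build_search_requests(base_url, resource, body):
    """Build, but do not send, the SEARCH and POST variants of a search."""
    data = body.encode('utf-8')
    headers = {'Content-Type': 'application/json'}
    via_search = urllib.request.Request(
        '%s%s' % (base_url, resource),
        data=data, headers=headers, method='SEARCH')
    via_post = urllib.request.Request(
        '%s%s/search' % (base_url, resource),
        data=data, headers=headers, method='POST')
    return via_search, via_post


body = json.dumps(
    {'query': {'filter': ['Form', 'transcription', 'regex', '^a']}})
# Placeholder URL; replace it with the URL of your own OLD.
via_search, via_post = build_search_requests(
    'https://my-old-url.org/', 'forms', body)
```

Some proxies and HTTP clients reject non-standard methods, so the POST variant is the safer default when SEARCH does not get through.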
Pagination
When retrieving or searching over an OLD resource, pagination and ordering parameters may be supplied. The pagination parameters are page and items_per_page. In a GET request, these are supplied in the URL's query string. Thus, with pages numbered from 1, a request to https://my-old-url.org/forms?page=99&items_per_page=10 would return the 981st through 990th forms.
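The relationship between a page number and the items it covers is simple arithmetic. The sketch below assumes that pages are numbered from 1, as the "page": 1 requests in the search examples suggest; page_bounds and page_count are hypothetical helper names.

```python
def page_bounds(page, items_per_page):
    """Return the 1-based positions of the first and last item on a page."""
    first = (page - 1) * items_per_page + 1
    last = page * items_per_page
    return first, last


def page_count(count, items_per_page):
    """Return how many pages are needed to hold count items."""
    # Ceiling division without floating point.
    return -(-count // items_per_page)


bounds = page_bounds(99, 10)
pages = page_count(800, 10)
```

The count value needed by page_count is the one that the OLD returns under the paginator key of a paginated response.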
In a search request (using either the POST or SEARCH method), the pagination parameters are supplied in the JSON body of the request. The JSON body should be an object containing both query and paginator attributes, with the value of the paginator attribute being another object with its own page and items_per_page attributes. For example:
{ "query": {"filter": [], "order_by": []}, "paginator": { "page": 99, "items_per_page": 10 } }
Ordering
When retrieving or searching over an OLD resource, you can control the ordering of the resources returned. The ordering parameters are order_by_model (i.e., resource), order_by_attribute and order_by_direction. In a GET request, these are once again supplied in the URL's query string. Thus, a request to https://my-old-url.org/forms?order_by_attribute=transcription&order_by_model=form&order_by_direction=desc would return all of the forms in the database sorted by their transcriptions in descending order.
In a search request, the ordering parameters are supplied in the JSON body of the request: the query attribute is an object with an order_by attribute whose value is an array containing model, attribute and direction strings.
{ "query": { "filter": [], "order_by": ["Form", "transcription", "desc"]}, "paginator": {} }
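For GET requests, the ordering and pagination parameters described above travel in the query string, which the standard library can assemble. This is a minimal sketch: the function name resource_url is hypothetical, and the base URL is the placeholder used above. The parameter names (order_by_model, order_by_attribute, order_by_direction, page, items_per_page) are the ones given in the text.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def resource_url(base_url, resource, order_by=None, paginator=None):
    """Build a GET URL with optional ordering and pagination parameters."""
    params = {}
    if order_by is not None:
        model, attribute, direction = order_by
        params['order_by_model'] = model
        params['order_by_attribute'] = attribute
        params['order_by_direction'] = direction
    if paginator is not None:
        params.update(paginator)
    query = urlencode(params)
    return '%s%s%s' % (base_url, resource, '?' + query if query else '')


# Placeholder URL; replace it with the URL of your own OLD.
get_url = resource_url(
    'https://my-old-url.org/', 'forms',
    order_by=('form', 'transcription', 'desc'),
    paginator={'page': 1, 'items_per_page': 3})
parsed = parse_qs(urlsplit(get_url).query)
```

When using a Requests session, the same parameters can instead be passed as a dict through the params argument of s.get, which encodes them in the same way.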