Vegetation survey research in Australia lacks formal definitions for concepts such as cover, community, or even something as seemingly obvious as height. It is not uncommon for ecologists to engage in heated debates over ecological terminology.

Preface: I am not an ecologist, but I work with ecologists and ecological data as a software developer. The following are simply toy examples for illustrative purposes.

Let's say you are reading a recipe online for a sponge cake, and it instructs you to add a "handful" of flour. You put the cake in the oven and let it bake. When you take it out, it immediately collapses. You revisit the website hosting the recipe and notice comments from upset readers reporting the same collapsed cake. You visit the recipe author's home page and realise he is a giant of a man, with hands likely twice the size of yours. This is a classic problem of misinterpreting a term's definition. Because a handful is subjective, the recipe will not produce the same cake from one baker to the next, which makes it unusable for many people. The same problem exists in ecology.

An ecologist from Western Australia may have a very different definition of the word height when compared to an ecologist from Queensland. Without controlled vocabularies, person A may measure a tree's height from the base of the trunk to its tallest branch, whereas person B may only measure the length of the trunk.

Height of a tree ..?
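The disagreement between person A and person B can be made concrete with a toy sketch. Everything here is hypothetical: the `Tree` class, the two function names, and the measurements are invented for illustration, and "trunk length" is assumed to mean the distance from the base to the first major fork.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    # Hypothetical measurements in metres, invented for illustration only.
    trunk_length_m: float   # assumed: base of the trunk to the first major fork
    crown_extent_m: float   # assumed: first major fork to the tallest branch


def height_person_a(tree: Tree) -> float:
    """Person A: base of the trunk to the tallest branch."""
    return tree.trunk_length_m + tree.crown_extent_m


def height_person_b(tree: Tree) -> float:
    """Person B: length of the trunk only."""
    return tree.trunk_length_m


tree = Tree(trunk_length_m=6.0, crown_extent_m=4.5)
print(height_person_a(tree))  # 10.5
print(height_person_b(tree))  # 6.0
```

Both values would end up in a column called "height", yet they describe different things about the same tree. Without a shared definition attached to the data, nobody downstream can tell which one they are looking at.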

The Terrestrial Ecosystem Research Network (TERN) aggregates vegetation survey datasets from many sources across Australia. Although each project works to collect vegetation data for Australian research, the protocols the projects follow to collect that data are hardly ever the same. To use the data effectively, researchers need clear definitions of things such as the collection method and the way the height of a tree was measured.

Currently, these definitions are buried in manuals describing each project's protocols, usually in PDF form. The first step in creating well-defined controlled vocabularies is to use the SKOS (Simple Knowledge Organization System) model, a W3C recommendation built on Semantic Web technologies, and to make the concepts protocol-agnostic, reusable, and easily accessible to all.

The SKOS model enables data architects to capture the definitions of concepts within domains such as ecology and to express hierarchical relationships between those concepts. The vision is to capture all the terminology within each project and form these hierarchical relationships with SKOS so that concepts from different projects can interoperate.
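As a rough sketch of what that might look like, the two notions of height from earlier can be written as SKOS concepts that both sit under a more general "height" concept. The SKOS and RDF property URIs are the real ones; the `http://example.org/vocab/` namespace, the concept names, and the `narrower_of` helper are made up for this example, and plain Python tuples stand in for a proper RDF library.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/vocab/"  # hypothetical namespace, not a real vocabulary

# Each statement is a (subject, predicate, object) triple.
triples = [
    (EX + "height", RDF_TYPE, SKOS + "Concept"),
    (EX + "height", SKOS + "prefLabel", "height"),

    (EX + "total-height", RDF_TYPE, SKOS + "Concept"),
    (EX + "total-height", SKOS + "prefLabel", "total height"),
    (EX + "total-height", SKOS + "definition",
     "Distance from the base of the trunk to the tallest branch."),
    (EX + "total-height", SKOS + "broader", EX + "height"),

    (EX + "trunk-length", RDF_TYPE, SKOS + "Concept"),
    (EX + "trunk-length", SKOS + "prefLabel", "trunk length"),
    (EX + "trunk-length", SKOS + "definition",
     "Length of the trunk only."),
    (EX + "trunk-length", SKOS + "broader", EX + "height"),
]


def narrower_of(concept, triples):
    """Return the concepts that name `concept` as their skos:broader parent."""
    return sorted(s for s, p, o in triples
                  if p == SKOS + "broader" and o == concept)


for uri in narrower_of(EX + "height", triples):
    print(uri)
```

Here `skos:broader` carries the hierarchy: both specific concepts point up to the general one, so a dataset that records "total height" and another that records "trunk length" can each say exactly what they mean while still being discoverable under "height".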

Once ecology-specific concepts have been defined, they can be published online as Linked Data. Linked Data is a set of technologies for making data modelled in RDF accessible over the web. This will allow ecologists, not just in Australia but all over the world, to reuse each other's vocabularies and take a more unified approach to ecological research.
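To give a feel for what "data modelled in RDF" looks like on the wire, here is a minimal sketch that writes a couple of SKOS statements out as N-Triples, one of the standard RDF text formats. The `http://example.org/vocab/` namespace, the `Literal` class, and the `to_ntriples` function are hypothetical helpers written for this example; a real project would more likely use an RDF library and serve the result from a web server.

```python
from dataclasses import dataclass

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/vocab/"  # hypothetical namespace, not a real vocabulary


@dataclass(frozen=True)
class Literal:
    """A language-tagged string, as opposed to a URI."""
    value: str
    lang: str = "en"


def escape(text: str) -> str:
    """Escape the characters that N-Triples requires escaping in literals."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples) -> str:
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            obj = f'"{escape(o.value)}"@{o.lang}'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines) + "\n"


triples = [
    (EX + "total-height", SKOS + "prefLabel", Literal("total height")),
    (EX + "total-height", SKOS + "broader", EX + "height"),
]
print(to_ntriples(triples), end="")
```

Because every concept is identified by a URI, anyone who retrieves these statements can follow the links to the definitions, and can reference the same URIs in their own datasets.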