Rhodri Tomos did a short piece on apertium-cy that appeared on BBC Cymru's news programme on 15 May (the related webpage is here). A transcription and translation, along with apertium-cy's attempted translation, are available here.
Apertium was successful in the Google Summer of Code 2009. One of the projects, undertaken by Gabriel Synnaeve from Grenoble in France, will look at improving translation output by combining apertium-cy's output with that of other open-source translation engines. The details are in this 12 May press release.
Fran Tyers and I have had a paper on apertium-cy published in the Prague Bulletin of Mathematical Linguistics. In it we evaluate the quality of the translated output using standard metrics, and draw some conclusions about the benefits of open-source development for marginalised languages.
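To give a flavour of what such a metric looks like, here is a minimal sketch of word error rate (WER), one widely used measure of translation quality: the number of word insertions, deletions and substitutions needed to turn the machine's output into a reference translation, divided by the length of the reference. Whether WER is among the metrics reported in the paper is an assumption here, and the function name and sample sentences are purely illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference,
    divided by the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative sentences only, not taken from the paper.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A lower score is better, and a score of 0 means the output matches the reference word for word. Because the score is divided by the length of the reference, it can exceed 1 when the output is much longer than the reference.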
Geriaoueg (Breton for "vocabulary") is a new language tool developed by Fran Tyers and based on Apertium language resources. Unknown words in the language you are reading pop up in your language in a tooltip. At the moment, Breton-French and Welsh-English are supported, but this could be extended to any other language in the Apertium range.
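The tooltip idea can be sketched in a few lines. This is only an illustration under stated assumptions, not how Geriaoueg itself is implemented: the tiny word list and the function name add_tooltips are hypothetical, and the sketch simply wraps each word it finds in the list in an HTML span whose title attribute carries the gloss (browsers show a title as a tooltip).

```python
import html
import re

# Hypothetical toy Welsh-English word list, for illustration only.
GLOSSARY = {"ci": "dog", "llyfr": "book"}


def add_tooltips(text, glossary):
    """Wrap every word found in the glossary in a <span> whose title
    attribute holds the gloss; everything else is passed through escaped."""
    pieces = []
    # The capturing group keeps the words in the result, so spacing and
    # punctuation between them are preserved.
    for token in re.split(r"(\w+)", text):
        escaped = html.escape(token)
        gloss = glossary.get(token.lower())
        if gloss is None:
            pieces.append(escaped)
        else:
            pieces.append(
                '<span title="{}">{}</span>'.format(html.escape(gloss), escaped)
            )
    return "".join(pieces)


if __name__ == "__main__":
    print(add_tooltips("Mae ci yma.", GLOSSARY))
```

An exact-match lookup like this would miss mutated or inflected forms, so a real tool for Welsh or Breton would need to analyse each word before looking it up.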
apertium-cy is a free (GPL) Welsh-to-English translator (an English-to-Welsh translator is also being developed). "Free (GPL)" means that it is licensed under the Free Software Foundation's General Public License, so not only is it free of charge, but you also have the freedom to study it, modify it and share it without breaking the law. Free (GPL) software is now being used all over the world by governments, public sector bodies, companies and individuals, because of the many benefits it offers.
apertium-cy is part of a larger machine translation project called Apertium, producing software to convert text in one language into text in a different language. Apertium was developed by Mikel Forcada's Transducens research group at the University of Alacant and Prompsit Language Engineering in the Region of Valencia in Spain. So far, the multinational Apertium team has released automatic translators for 14 other language pairs (Catalan-English, Spanish-French, etc), and is working on a dozen others. apertium-cy is the first Apertium translator to be released that does not include a Romance language such as Catalan or Spanish. The Apertium software can be downloaded from the Apertium site, which also contains a wiki giving information on installation, etc.
apertium-cy was developed over the past 9 months by Francis Tyers and Kevin Donnelly. Francis, the lead developer, is part of the Transducens research group, and Kevin has been working independently on free (GPL) Welsh-language software for the past 5 years. apertium-cy builds on two of his projects, Eurfa (a Welsh dictionary) and Klebran (a grammar-checker based on Kevin Scannell's Irish grammar-checker, Gramadóir). apertium-cy currently contains around 10,000 words, and about 150 grammatical rules.
An important benefit of using Apertium is that the work done on other language pairs can be re-used to give us a head start when we come to build other translators for Welsh - see this paper (in Spanish) for more details. For instance, the Spanish translators could be used to help create a Welsh-Spanish translator. If there are any Welsh-speakers in Patagonia who long for a Welsh-Spanish translator, we'd be glad to hear from them!
Don't expect this initial version of apertium-cy to produce perfect translations! On the test page there are 21 sample passages to try. These short passages cover poetry, official statements, novels, newspaper articles and non-fiction, and have not been edited except for punctuation. They give a good indication of apertium-cy's current strengths and weaknesses. Alternatively, you can type in your own passages, but note the limitations listed on the test page.
apertium-cy is being continuously improved, and over the next few months we hope to refine the grammatical rules (particularly for subordinate clauses), expand the dictionaries, and release an initial version of a similar English-to-Welsh translator. One of the key tenets of free (GPL) software development is that you should release software as soon as it works, so that you can take advantage of user feedback to improve the software.
apertium-cy is currently good enough for you to get the gist of a passage (provided there are not too many unknown words in it), so it may be useful for:
In the longer term, and especially when the English-to-Welsh translator is available, apertium-cy could be used by public sector bodies, companies, voluntary groups, etc to provide a "first-pass" translation of publicity material, documents, etc, thus improving the productivity of human translators.
You can give us feedback on our progress so far. You can help test new versions. You can add words to the dictionary, and help to develop new rules to improve the grammar conversion (this will be especially important for the English-to-Welsh translator).
If you are a public body or company that produces bilingual text, you can help by giving us digital copies of it. We don't want to republish it; we just want to store its sentences in a database, which will help us generate tags and rules.
You can also help by asking your Assembly Member to help ensure that resources that receive public funds are made available under a free licence. The sad thing is that the development of these translators could proceed much more quickly if we didn't have to create freely-distributable word lists for them from scratch. Public money has gone into compiling Welsh dictionaries and lists of terms, and yet apertium-cy and similar projects cannot use these because they are not available under terms which allow them to be freely redistributed. Every minute we spend adding words is a minute we can't spend writing software. It would be a tremendous help to our work if the Welsh Language Board could look again at this issue - after all, if developers in Spain and the USA are prepared to spend time on Welsh, we should surely give them all the help we can!
If you want to see how apertium-cy produces the translation, a viewer is now available here. This allows you to see the results of the various processing stages.