In this digital age, technology is developing at lightning speed, and thanks to the internet and the IoT, the line between fact and opinion has become significantly blurred. Modern technologies bring real benefits and increase productivity and research speed; however, it is still easy to get lost in the abundance of information.
When it comes to linguistics and trying to sift out how different words and phrases are used in various languages, the sheer amount of information can be overwhelming. That's where translation tools come into play.
Translation tools that increase a translator's productivity and speed include:
· Search Engines
· CAT Tools
· QA Tools
· Speech Recognition Software
· Electronic Corpora etc.
One tool that is gaining ground is the electronic corpus, which is steadily becoming popular among language professionals.
What is a Text Corpus?
A text corpus is a massive collection of written or recorded (and then transcribed) language, presented in electronic form.
Experts examine this collection of texts to work out how the language, and the words and phrases associated with it, are typically used. Examples of popular text corpora include:
● American National Corpus (22 million words)
● Corpus of Contemporary American English (560 million words)
● British National Corpus (100 million words) etc.
Search engines such as Google and Bing can also serve as linguistic corpora. While they can be helpful, translators can quickly become overwhelmed by results that are ranked according to the search engine's algorithms and strategy. These results aren't always useful for translators, because they reflect content that is popular with a much larger audience.
Electronic corpora help simplify this search process. Linguistic corpora are excellent tools to boost the translator's overall productivity and speed.
A monolingual corpus is one of the most commonly used translation tools, and it contains texts in only one language. This corpus is usually tagged for parts of speech and is generally used for the following:
· Checking the correct usage of a word
· Looking up the most natural word combinations
· Scientific use
· Identifying frequent patterns or trends in language
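As a minimal sketch of looking up natural word combinations, the snippet below counts which words appear just before a target word. The four-sentence `CORPUS` and the helper names `tokenize` and `left_neighbours` are assumptions made for illustration; a real monolingual corpus holds millions of words and would normally be queried through a dedicated corpus tool.

```python
import re
from collections import Counter

# Hypothetical mini-corpus used only for illustration.
CORPUS = [
    "The court will make a decision tomorrow.",
    "She had to make a decision quickly.",
    "They make a decision after every meeting.",
    "He will take a decision on the matter.",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def left_neighbours(sentences, target, window=2):
    """Count the words found up to `window` tokens before `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token == target:
                counts.update(tokens[max(0, i - window):i])
    return counts


neighbours = left_neighbours(CORPUS, "decision")
print(neighbours.most_common(3))
```

In this toy data "make" precedes "decision" three times and "take" only once, which is the kind of evidence a translator might use when choosing between two plausible verbs.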
A parallel corpus consists of two monolingual corpora, where one corpus is the translation of the other. These corpora are generally used to study the nature of the translation process and to compare language A with language B, and they prove to be a valuable tool for solving numerous translation problems in practice.
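To illustrate how a parallel corpus can help with a practical translation problem, here is a small sketch that looks up how a source-language term was rendered in aligned target sentences. The English-French pairs in `PARALLEL` and the function `find_translations` are hypothetical; real parallel corpora are aligned at sentence level on a far larger scale.

```python
# Hypothetical sentence-aligned English-French pairs (source, target).
PARALLEL = [
    ("The contract enters into force today.",
     "Le contrat entre en vigueur aujourd'hui."),
    ("The law is in force.",
     "La loi est en vigueur."),
    ("He opened the door by force.",
     "Il a ouvert la porte de force."),
]


def find_translations(pairs, source_term):
    """Return the target sentences whose source side contains the term."""
    term = source_term.lower()
    return [target for source, target in pairs if term in source.lower()]


hits = find_translations(PARALLEL, "force")
for sentence in hits:
    print(sentence)
```

Reading the matches side by side shows that the same English word is rendered differently depending on context, which is exactly what a translator would want to check before settling on an equivalent.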
Corpora for Better Perception and Natural Use of Languages
Electronic corpora improve translation quality by allowing translators to do several things, such as:
· Sort language data by domain
· Search by the intended purpose
· Search by authors
· Filter content by genre
· Sort by time-frame and more
This allows translators to get to the information they are seeking quickly and efficiently. In fact, the current focus is turning towards smaller corpora of specialized text. These corpora can be compiled in as little as ten minutes with tools such as the free BootCaT toolkit and Sketch Engine. They can bring together a variety of materials to help linguists determine associations between words, identify the most fitting connotations and synonyms, and understand grammatical structures.
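The sorting and filtering options listed above can be sketched as a simple metadata query. Everything here is assumed for illustration: the `Document` fields, the sample entries and the `select` function are hypothetical and do not reflect the interface of BootCaT, Sketch Engine or any other tool.

```python
from dataclasses import dataclass


@dataclass
class Document:
    text: str
    domain: str
    genre: str
    author: str
    year: int


# Hypothetical corpus entries with metadata attached.
DOCS = [
    Document("Dosage must be adjusted in renal impairment.",
             "medicine", "guideline", "A. Author", 2019),
    Document("The patient was discharged after two days.",
             "medicine", "case report", "B. Author", 2012),
    Document("The lessee shall pay rent monthly.",
             "law", "contract", "C. Author", 2020),
    Document("Adverse events were mild and transient.",
             "medicine", "journal article", "B. Author", 2021),
]


def select(docs, domain=None, genre=None, author=None, since=None):
    """Keep documents matching every given criterion, newest first."""
    matches = [
        d for d in docs
        if (domain is None or d.domain == domain)
        and (genre is None or d.genre == genre)
        and (author is None or d.author == author)
        and (since is None or d.year >= since)
    ]
    return sorted(matches, key=lambda d: d.year, reverse=True)


recent_medical = select(DOCS, domain="medicine", since=2015)
for doc in recent_medical:
    print(doc.year, doc.genre, doc.text)
```

Narrowing the material this way before searching means every example the translator sees comes from the relevant domain and period.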
Sound Like an Expert with Linguistic Corpora
Using corpora, translators can gauge the competence of the authors of the language content, immerse themselves in real communicative situations, and identify how frequently a word or term is used within a narrow discipline. Corpora make it possible to write like an expert by taking as a basis the content written by experts themselves.
Corpora can serve as an excellent tool for language professionals, helping them meet the ever-growing speed and quality demands of modern times by reducing the time spent researching the materials of a particular domain and by offering better solutions to terminology and readability issues.