Presentation on the topic: "Building a parallel corpus for translation research and much more" Ana Frankenberg-Garcia. Presentation transcript:
"Building a parallel corpus for translation research and much more" Ana Frankenberg-Garcia
The study of human translation: traditionally not a hard science; difficult to be systematic. With the advances of corpus linguistics, things can change …
What is a corpus? A large collection of naturally occurring, machine-readable texts, selected according to specific criteria and searched with text-retrieval software.
Advantages of using corpora to study human translation: an enormous amount of translated text; systematic analyses; quantifiable results.
Corpora used in translation practice and research:
1. Bilingual comparable corpora: Farmhouse holidays (EN) & Agroturismo (IT)
2. Monolingual comparable corpora: Translational English Corpus (EN)
3. Simple parallel corpora: Tectra (EN-GL)
4. Bidirectional parallel corpora: COMPARA (PT-EN and EN-PT)
Building parallel corpora: text selection. Genre (scientific, imaginative, technical, etc.); Mode (oral? written?); Variety (standard? regional?); Time (contemporary? older?); Languages (which? just two or more?); Translations (professional? native speakers? different translators?). Simple or bidirectional? Are there translations?
Building parallel corpora: example of interrelated factors. Languages: PT-EN. Genre: scientific, academic, tourism, literature, politics (EP), oral, popular. Direction: PT-EN or EN-PT, or bidirectional (PT-EN ↔ EN-PT).
Building parallel corpora: personal use or shared use? Personal use: no copyright hassle. Shared use: copyright permissions needed; results verifiable; more users and uses.
Building parallel corpora: copyright. Two permissions, double the work. Publishers, authors and translators generally don’t know what a corpus is. Protect. Advertise.
Building parallel corpora: alignment. Text? Paragraph? Sentence? Clause? Word? Which parts of the source text (ST) and the target text (TT) match?
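The choice of alignment unit can be pictured with a small Python sketch. It assumes paragraphs are already matched, pairs sentences one to one when both sides have the same number of sentences, and otherwise falls back to a single many-to-many unit. The names `AlignmentUnit`, `split_sentences` and `align_paragraph` are hypothetical, the sentence splitter is deliberately naive, and this is not the procedure used to align any particular corpus.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class AlignmentUnit:
    """One aligned unit: zero or more ST sentences matched to zero or more TT sentences."""
    source: List[str]
    target: List[str]


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def align_paragraph(src_par: str, tgt_par: str) -> List[AlignmentUnit]:
    """Align one ST paragraph with its TT paragraph (illustrative heuristic only)."""
    src = split_sentences(src_par)
    tgt = split_sentences(tgt_par)
    if len(src) == len(tgt):
        # Same sentence count: assume a one-to-one correspondence.
        return [AlignmentUnit([s], [t]) for s, t in zip(src, tgt)]
    # Counts differ: keep the whole paragraph as one unit for manual revision.
    return [AlignmentUnit(src, tgt)]


if __name__ == "__main__":
    for unit in align_paragraph("He nodded and left.", "Ele acenou. Depois saiu."):
        print(unit.source, "<->", unit.target)
```

A translator may split or merge sentences, so any automatic pairing of this kind would still need human revision before the corpus is used for research.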
Building parallel corpora: tags. Alignment tags. Optional tags, e.g. textual, grammatical, semantic. What do we want tags for? More pre-processing, less post-processing.
EN: Joe watched Robin climb into the trailer and man-handle the calves one by one towards the ramp, their winglike ears pierced with plastic identity tags.
PT: Joe ficou a ver Robin subir para o atrelado e encaminhar as vitelas uma a uma para a rampa, com as suas orelhas, que faziam lembrar asas, furadas e umas etiquetas de plástico a identificá-las.
Our options for COMPARA, a bidirectional parallel corpus of English and Portuguese. Funding: Portuguese Government and European Union (FEDER and FSE), contract ref. POSC/339/1.3/C/NAC. Project leaders: Ana Frankenberg-Garcia & Diana Santos. Research assistants: Pedro Sousa, Rosário Silva & Susana Inácio.
Corpus structure: PT source texts with EN translations (parallel); EN source texts with PT translations (parallel); the two directions together make the corpus bidirectional.
Genre: published fiction. EXTENSIBLE to other genres.
Portuguese-language authors. Portugal: Camilo Castelo Branco, Eça de Queirós, José Cardoso Pires, José Saramago, Jorge de Sena, Lídia Jorge, Mário de Carvalho, Sá Carneiro. Brazil: Aluísio Azevedo, Autran Dourado, Chico Buarque, Jô Soares, José de Alencar, Machado de Assis, Manuel Antônio de Almeida, Marcos Rey, Patrícia Melo, Paulo Coelho, Rubem Fonseca. Mozambique: Mia Couto. Angola: José Eduardo Agualusa.
English-language authors. British Isles: David Lodge, Ian McEwan, Julian Barnes, Joseph Conrad, Joanna Trollope, Kazuo Ishiguro, Lewis Carroll, Mary Shelley, Oscar Wilde. United States: Henry James, Edgar Allan Poe, Richard Zimler. South Africa: Nadine Gordimer.
Portuguese translators: Ana Maria Amador, Ana Falcão Bastos, Ana Luísa Faria, Aníbal Fernandes, Carlos Grifo Babo, Cristina Ferreira de Almeida, Cristina Rodriguez, Eduardo Guerra Carneiro, Fernanda Pinto Rodrigues, Geraldo Galvão Ferraz, Helena Cardoso, Januário Leite, José Viera Lima, J. Teixeira de Aguilar, Lídia Cavalcante-Luther, Lucinda Santos Silva, Luís Lobo, Manuel João Gomes, M. F. Gonçalves de Azevedo, Maria Carlota Pracana, Maria do Carmo Figueira, Mário Martins de Carvalho, Nina Videira, Paula Reis, Yolanda Artiaga.
English translators: Adria Frizzi, Alan Clarke, Alexis Levitin, Alice Clemente, Cliff Landers, David Brookshaw, David Rosenthal, Elizabeth Lowe, Ellen Watson, Helen Caldwell, Giovanni Pontiero, Graeme Mac Nicoll, Gregory Rabassa, Isabel Burton, John Gledson, John Parker, John Byrne, John Vetch, Margaret Jull Costa, Mary Fitton, Natália Costa, Peter Bush, Richard Zenith, Ronald W. Sousa.
Can any text be included in the corpus? Only published source texts and translations. Only English translated directly from Portuguese, and Portuguese translated directly from English. Only human translations!
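The inclusion criteria above can be read as a checklist applied to each candidate text pair. A minimal Python sketch follows; the `Candidate` record and its field names are hypothetical, introduced only for illustration, and do not describe how the corpus metadata is actually stored.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical metadata for one source text and one of its translations."""
    source_lang: str             # "pt" or "en"
    target_lang: str             # "pt" or "en"
    source_published: bool       # the source text is a published work
    translation_published: bool  # the translation is a published work
    direct_translation: bool     # translated straight from the source, no pivot language
    human_translation: bool      # produced by a human translator


def is_eligible(c: Candidate) -> bool:
    """Apply the four criteria from the slide to one candidate pair."""
    # The pair must be Portuguese to English or English to Portuguese.
    direction_ok = {c.source_lang, c.target_lang} == {"pt", "en"}
    return (
        direction_ok
        and c.source_published
        and c.translation_published
        and c.direct_translation
        and c.human_translation
    )


if __name__ == "__main__":
    pair = Candidate("en", "pt", True, True, True, True)
    print(is_eligible(pair))
```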
Size: 1,549,551 words in English; 1,436,493 words in Portuguese. Possibly the largest existing edited parallel corpus.
Interface: free; easy to use by people who have never heard of corpora before; a powerful and flexible tool for experienced corpus users; results good for research and education. www.linguateca.pt/COMPARA/
Distribution of “nodded” in source texts and translations
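A distribution of this kind compares how often a word occurs in texts originally written in a language with how often it occurs in texts translated into that language. The Python sketch below shows the counting step on a toy corpus. The sentences in `TOY_CORPUS` are invented for illustration and are not taken from COMPARA, and the names `count_word` and `distribution` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Invented toy data, NOT real corpus excerpts.
TOY_CORPUS: Dict[str, List[str]] = {
    "EN source texts": ["He nodded and smiled.", "She nodded."],
    "EN translations": ["He shook his head.", "She nodded slowly."],
}


def count_word(texts: List[str], word: str) -> Tuple[int, int]:
    """Return (occurrences of `word`, total word tokens) over a list of texts."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = sum(len(pattern.findall(t)) for t in texts)
    tokens = sum(len(re.findall(r"\w+", t)) for t in texts)
    return hits, tokens


def distribution(corpus: Dict[str, List[str]], word: str) -> Dict[str, Dict[str, float]]:
    """Raw and relative frequency of `word` in each subcorpus."""
    result = {}
    for label, texts in corpus.items():
        hits, tokens = count_word(texts, word)
        # Normalise per 1,000 tokens so subcorpora of different sizes are comparable.
        per_1000 = 1000 * hits / tokens if tokens else 0.0
        result[label] = {"hits": hits, "tokens": tokens, "per_1000": per_1000}
    return result


if __name__ == "__main__":
    for label, stats in distribution(TOY_CORPUS, "nodded").items():
        print(label, stats)
```

Normalising by subcorpus size matters because the source-text and translation halves of a corpus rarely contain the same number of words.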
Users and uses. Language learners and anyone working with PT-EN: bilingual dictionary with examples. Language teachers: exercises and tests. Translators: language equivalents. Translation lecturers: exercises & problems. Translation theorists: test translation hypotheses. Lexicographers: bilingual dictionaries. Computational linguists and language engineers: machine translation and other applications.
Text tags.
EBJB1.pt: ele revelou-me o seu interesse por Gosse Edmund William Gosse (1849-1928), crítico inglês e pela sociedade literária inglesa dos finais do século passado.
EBDL2T1.en: When we sat on the sofa together to watch News at Ten
Text tags.
EBDL1T1.pt: passou-me uma receita de Valium
EBJB1.en: the white bear, thalassarctos maritimus, is the aristocrat of bears...
EBDL1T1.pt: acaba por se esquecer de ter medo, até que acaba por verificar que não há de que ter medo.
Other tags. People interested in creating specific tags for their research can do so, as long as they do the tag insertion and revision work, e.g. semantic tag for colour (Inácio et al. 2007): "I did, too --changed over to the knitted tie at a red light." Specific tag revision interface underway (Sousa, in preparation).
Research work: studies unthinkable before corpora.
1. Observing source texts and translations
2. Contrasting Portuguese and English
3. Comparing translated and untranslated language
4. Examining the characteristics of translated texts
Many other studies possible! www.linguateca.pt/COMPARA/ComparaPublications.html