"Building a parallel corpus for translation research and much more", Ana Frankenberg-Garcia.

Presentation transcript:

1 "Building a parallel corpus for translation research and much more", Ana Frankenberg-Garcia

2 The study of human translation: traditionally not a hard science; difficult to be systematic. With the advances of corpus linguistics, things can change …

3 What is a corpus? A large collection of naturally occurring texts, selected according to specific criteria, held in machine-readable form and searched with text-retrieval software.

4 Advantages of using corpora to study human translation: an enormous amount of translated texts; systematic analyses; quantifiable results.

5 Corpora used in translation practice and research: 1. Bilingual comparable corpora, e.g. Farmhouse holidays (EN) & Agroturismo (IT); 2. Monolingual comparable corpora, e.g. Translational English Corpus (EN); 3. Simple parallel corpora, e.g. Tectra (EN-GL); 4. Bidirectional parallel corpora, e.g. COMPARA (PT-EN and EN-PT).

6 Building parallel corpora: text selection. Genre (scientific, imaginative, technical, etc.); mode (oral? written?); variety (standard? regional?); time (contemporary? older?); languages (which? just two or more?); translations (professional? native speakers? different translators?); simple or bidirectional? Are there translations?

7 Building parallel corpora: example of interrelated factors. [Diagram: languages (PT-EN); direction (PT-EN or EN-PT, or bidirectional PT-EN ↔ EN-PT); genre (scientific, academic, tourism, literature, politics (EP), oral, popular).]

8 Building parallel corpora: personal use or shared use? Personal use: no copyright hassle. Shared use: copyright permissions needed, but results are verifiable and there are more users and uses.

9 Building parallel corpora: copyright. Two permissions, double the work. Publishers, authors and translators generally don’t know what a corpus is. Protect. Advertise.

10 Building parallel corpora: alignment. Text? Paragraph? Sentence? Clause? Word? Which parts of ST and TT match?

11 Building parallel corpora: tags. Alignment tags. Optional tags, e.g. textual, grammatical, semantic. What do we want tags for? More pre-processing, less post-processing. Example (EN): "Joe watched Robin climb into the trailer and man-handle the calves one by one towards the ramp, their winglike ears pierced with plastic identity tags." Example (PT): "Joe ficou a ver Robin subir para o atrelado e encaminhar as vitelas uma a uma para a rampa, com as suas orelhas, que faziam lembrar asas, furadas e umas etiquetas de plástico a identificá-las."

12 Our options for COMPARA, a bidirectional parallel corpus of English and Portuguese. Funding: Portuguese Government and European Union (FEDER and FSE), contract ref. POSC/339/1.3/C/NAC. Project leaders: Ana Frankenberg-Garcia & Diana Santos. Research assistants: Pedro Sousa, Rosário Silva & Susana Inácio.

13 Corpus structure. PT source texts with EN translations (parallel); EN source texts with PT translations (parallel); the two halves together are bidirectional parallel.

14 [Diagram: source texts (ST) and translations (TT) in PT and EN, with labels PT 1, PT 2, EN 1, EN 2.]

15 Language varieties. Portuguese: Portugal, Brazil, Angola, Mozambique. English: UK, US, South Africa. Unbalanced distribution!

16 Publication dates

17 Genre: published fiction. Extensible to other genres.

18 Portuguese authors. Portugal: Camilo Castelo Branco, Eça de Queirós, José Cardoso Pires, José Saramago, Jorge de Sena, Lídia Jorge, Mário de Carvalho, Sá Carneiro. Brazil: Aluísio Azevedo, Autran Dourado, Chico Buarque, Jô Soares, José de Alencar, Machado de Assis, Manuel Antônio de Almeida, Marcos Rey, Patrícia Melo, Paulo Coelho, Rubem Fonseca. Mozambique: Mia Couto. Angola: José Eduardo Agualusa.

19 English authors. British Isles: David Lodge, Ian McEwan, Julian Barnes, Joseph Conrad, Joanna Trollope, Kazuo Ishiguro, Lewis Carroll, Mary Shelley, Oscar Wilde. United States: Henry James, Edgar Allan Poe, Richard Zimler. South Africa: Nadine Gordimer.

20 Portuguese translators: Ana Maria Amador, Ana Falcão Bastos, Ana Luísa Faria, Aníbal Fernandes, Carlos Grifo Babo, Cristina Ferreira de Almeida, Cristina Rodriguez, Eduardo Guerra Carneiro, Fernanda Pinto Rodrigues, Geraldo Galvão Ferraz, Helena Cardoso, Januário Leite, José Viera Lima, J. Teixeira de Aguilar, Lídia Cavalcante-Luther, Lucinda Santos Silva, Luís Lobo, Manuel João Gomes, M. F. Gonçalves de Azevedo, Maria Carlota Pracana, Maria do Carmo Figueira, Mário Martins de Carvalho, Nina Videira, Paula Reis, Yolanda Artiaga.

21 English translators: Adria Frizzi, Alan Clarke, Alexis Levitin, Alice Clemente, Cliff Landers, David Brookshaw, David Rosenthal, Elizabeth Lowe, Ellen Watson, Helen Caldwell, Giovanni Pontiero, Graeme Mac Nicoll, Gregory Rabassa, Isabel Burton, John Gledson, John Parker, John Byrne, John Vetch, Margaret Jull Costa, Mary Fitton, Natália Costa, Peter Bush, Richard Zenith, Ronald W. Sousa.

22 Can any text be included in the corpus? Only published source texts and translations. Only English translated directly from Portuguese and Portuguese translated directly from English. Only human translations!
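The three inclusion rules on this slide can be read as a simple predicate over candidate text pairs. The sketch below is only an illustration: the `Candidate` record and its field names are hypothetical and are not taken from the project itself.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A source text and one translation of it (hypothetical record layout)."""
    source_lang: str             # "PT" or "EN"
    target_lang: str             # "PT" or "EN"
    source_published: bool
    translation_published: bool
    direct: bool                 # translated straight from the source language
    human: bool                  # produced by a human translator


def eligible(c: Candidate) -> bool:
    """Apply the slide's rules: published, direct PT<->EN, human-made."""
    languages_ok = {c.source_lang, c.target_lang} == {"PT", "EN"}
    return (languages_ok
            and c.source_published
            and c.translation_published
            and c.direct
            and c.human)


# A published, direct, human PT->EN translation qualifies.
print(eligible(Candidate("PT", "EN", True, True, True, True)))
# A translation made via a third language does not.
print(eligible(Candidate("EN", "PT", True, True, False, True)))
```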

23 Texts: 72 source texts (extracts), 75 translations.

24 Size: 1,549,551 words in English; 1,436,493 words in Portuguese. Possibly the largest existing edited parallel corpus.

25 Interface: free. Easy to use by people who have never heard of corpora before. Powerful and flexible tool for experienced corpus users. Results good for research and education.

26

27 “nodded”

28

29

30 Distribution of “nodded” in source texts and translations
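A distribution like this one comes down to counting a word separately in the source-text and translation halves of the corpus, then normalising by the size of each half. The sketch below shows the idea on a few invented English sentences. The data, the tokenisation rule and the function name `count_word` are assumptions made for illustration, not the corpus interface.

```python
import re
from typing import List, Tuple


def count_word(texts: List[str], word: str) -> Tuple[int, int]:
    """Return (occurrences of word, total tokens), ignoring case."""
    tokens = [t for text in texts
              for t in re.findall(r"[a-z']+", text.lower())]
    hits = sum(1 for t in tokens if t == word.lower())
    return hits, len(tokens)


# Invented sample data standing in for the two halves of the corpus.
english_sources = ["She nodded and left.", "He nodded."]
english_translations = ["He nodded his head in agreement."]

for label, texts in [("source texts", english_sources),
                     ("translations", english_translations)]:
    hits, total = count_word(texts, "nodded")
    # Normalise so halves of different sizes can be compared.
    print(f"{label}: {hits} of {total} tokens ({hits / total:.1%})")
```

Comparing relative rather than raw counts matters here because the English and Portuguese halves, and the source and translation halves, are not the same size.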

31 Users and uses. Language learners and anyone working with PT-EN: bilingual dictionary with examples. Language teachers: exercises and tests. Translators: language equivalents. Translation lecturers: exercises & problems. Translation theorists: test translation hypotheses. Lexicographers: bilingual dictionaries. Computational linguists and language engineers: machine translation and other applications.

32 Backstage options

33 Text tags. EBJB1.pt: ele revelou-me o seu interesse por Gosse Edmund William Gosse ( ), crítico inglês e pela sociedade literária inglesa dos finais do século passado. EBDL2T1.en: When we sat on the sofa together to watch News at Ten

34 Text tags. EBDL1T1.pt: passou-me uma receita de Valium EBJB1.en: the white bear, thalassarctos maritimus, is the aristocrat of bears... EBDL1T1.pt: acaba por se esquecer de ter medo, até que acaba por verificar que não há de que ter medo.

35

36

37 Alignment options and tags. 1 alignment unit = 1 source-text sentence. [Diagram of ST-to-TT sentence correspondences, labelled S, S2, SS(+S), S½ and Ø.]
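The rule "1 alignment unit = 1 source-text sentence" can be sketched as a small data structure in which each source sentence carries zero, one or several target sentences. This is a minimal illustration, not the corpus's storage format: the class name `AlignmentUnit`, the labels returned by `kind()` and the sample sentences are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlignmentUnit:
    """One source-text sentence plus whatever it was translated as."""
    source: str
    targets: List[str] = field(default_factory=list)

    def kind(self) -> str:
        # Hypothetical labels; the slide's own tag names are not reproduced.
        if not self.targets:
            return "omitted"       # source sentence has no translation
        if len(self.targets) == 1:
            return "one-to-one"
        return "split"             # one source sentence became several


# Invented sentence pairs, for illustration only.
units = [
    AlignmentUnit("It was late.", ["Era tarde."]),
    AlignmentUnit("He nodded.", []),
    AlignmentUnit("It was late, and they left.",
                  ["Era tarde.", "Foram-se embora."]),
]

for unit in units:
    print(unit.kind(), "|", unit.source)
```

Anchoring every unit on the source sentence keeps the unit count fixed, so departures such as omissions or splits show up as properties of a unit rather than as extra rows.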

38

39

40 Grammar tags. Portuguese: PALAVRAS. Petrus/PROP pediu/V_fmc a/DETartd especialidade/N da/PRP+DETartd casa/N --/PU uma/DETarti paella/N valenciana/ADJ --/PU que/SPECrel comemos/V em/PRP silêncio/N ,/PU acompanhados/V apenas/ADV do/PRP+DETartd saboroso/ADJ vinho/N Rioja/PROP ./PU

41 [pos="V.*"] "silêncio"
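The query on this slide reads like CQP-style syntax: a token whose part-of-speech tag matches `V.*`, immediately followed by the word "silêncio". Assuming that reading, where each element matches exactly one consecutive token, the matching logic can be sketched over `word/TAG` text as below. The function names and the tuple-based pattern format are hypothetical, and the input is a fragment of the tagged sentence from the grammar-tags slide.

```python
import re
from typing import List, Optional, Tuple

Token = Tuple[str, str]                           # (word, tag)
Constraint = Tuple[Optional[str], Optional[str]]  # (word regex, tag regex)


def parse_tagged(text: str) -> List[Token]:
    """Split whitespace-separated 'word/TAG' items at the last slash."""
    tokens = []
    for item in text.split():
        word, _, tag = item.rpartition("/")
        tokens.append((word, tag))
    return tokens


def find_matches(tokens: List[Token], pattern: List[Constraint]) -> List[str]:
    """Return every run of consecutive tokens satisfying the pattern."""
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all((w_re is None or re.fullmatch(w_re, word))
               and (t_re is None or re.fullmatch(t_re, tag))
               for (word, tag), (w_re, t_re) in zip(window, pattern)):
            hits.append(" ".join(word for word, _ in window))
    return hits


fragment = parse_tagged("que/SPECrel comemos/V em/PRP silêncio/N ,/PU")

# Verb directly followed by "silêncio": the preposition blocks a match.
print(find_matches(fragment, [(None, "V.*"), ("silêncio", None)]))
# Allowing a preposition in between finds "comemos em silêncio".
print(find_matches(fragment,
                   [(None, "V.*"), (None, "PRP"), ("silêncio", None)]))
```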

42

43 Grammar tags. English: CLAWS (coming soon). Petrus/NP1 asked/VVD for/IF the/AT specialty/NN1 of/IO the/AT house/NN1 -- a/AT1 Valencia/NP1 paella/NN1 -- which/DDQ we/PPIS2 ate/VVD in/II silence/NN1 ./.

44 Other tags. Example: "I did, too -- changed over to the knitted tie at a red light." People interested in creating specific tags for their research can do so, as long as they do the tag insertion and revision work, e.g. a semantic tag for colour (Inácio et al. 2007). A specific tag revision interface is underway (Sousa, in preparation).

45

46 Research work: 1. Observing source texts and translations; 2. Contrasting Portuguese and English; 3. Comparing translated and untranslated language; 4. Examining the characteristics of translated texts. Studies unthinkable before corpora. Many other studies possible!

