Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed 03-02-2016 1 Using subtitles to deal with.

Presentation transcript:

Using subtitles to deal with Out-of-Domain interactions
Daniel Magarreiro, Luísa Coheur, Francisco S. Melo
INESC-ID / Instituto Superior Técnico, Lisbon, Portugal

Index
Introduction
Building the Subtle Corpus
The Say Something Smart Engine
– Corpora indexing and candidate extraction
– Choosing the answer
Evaluation
Meet Filipe
Conclusions and Future Work

Motivation
Users often insist on confronting domain-specialized virtual assistants with Out-Of-Domain (OOD) inputs.

Motivation
Considering that:
– people become more engaged with these applications if OOD requests are addressed (Bickmore and Cassell, 2000; Patel et al., 2006)
– system designers are not able to successfully anticipate all possible OOD requests

Motivation
A possible approach:
– explore the (semi-)automatic creation/enrichment of the knowledge base of virtual assistants/chatbots, taking advantage of the vast amount of dialogues available on the web.

Motivation
We will focus on movie subtitles (for now)

Motivation
Movie subtitles
– the web offers a vast number of repositories with a comprehensive archive of subtitle files
    this allows data redundancy
    example:
    – How are you? Fine
    – So, how are you? Fine
    – How are you? Fine
    – How are you? I’m dying
– subtitles are often available in multiple languages

Motivation
Our approach
– Build a corpus of interactions from the subtitles
    the Subtle corpus
– Test a set of techniques to select an adequate response (from Subtle) to a user request
    Deployed in the Say Something Smart engine
– Evaluate the plausibility of the selected answers

Building the Subtle Corpus
The Subtle corpus will be a set of interactions
– Like Edgar’s knowledge base
Each interaction is a pair of turns (T, A):
– T is the trigger
– A is an answer (to the trigger)
Example:
– (T: So how old are you?, A: That’s none of your business)

Building the Subtle Corpus
Problem:
– Extracting interactions from subtitle files
– Example:
    01:01:05,537 --> 01:01:08,905
    And makes an offer so ridiculous,
    01:01:09,082 --> 01:01:11,881
    the farmer is forced to say yes
    01:01:12,752 --> 01:01:15,494
    We gonna offer to buy Candyland?

Building the Subtle Corpus
Starting point:
– 2 GB of subtitles in Portuguese and English from OpenSubtitles
Building Subtle:
– Cleaning data
    Example: [TIRES SCREECHING]
– Finding real turns
    Based on handcrafted rules (previous example)
    The user can configure the maximum time allowed between two slots for them to be considered part of a dialogue
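The two steps above (cleaning and turn finding) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slides do not spell out the handcrafted rules, so the sketch assumes that bracketed annotations are dropped and that two consecutive subtitle cues form an interaction when the gap between them does not exceed a configurable limit. The names `Cue`, `parse_srt`, `extract_interactions` and `max_gap_ms`, as well as the sample timestamps, are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "01:01:05,537 --> 01:01:08,905".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)
# Bracketed sound descriptions such as "[TIRES SCREECHING]".
ANNOTATION = re.compile(r"\[[^\]]*\]")


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def to_ms(h, m, s, ms):
    """Convert an hours/minutes/seconds/milliseconds group to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(raw):
    """Parse SRT text into cleaned cues, dropping cues that end up empty."""
    cues = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                groups = match.groups()
                text = ANNOTATION.sub("", " ".join(lines[i + 1:]))
                text = " ".join(text.split())  # collapse leftover whitespace
                if text:
                    cues.append(Cue(to_ms(*groups[:4]), to_ms(*groups[4:]), text))
                break
    return cues


def extract_interactions(cues, max_gap_ms=1000):
    """Pair consecutive cues as (T, A, Diff) when the gap is small enough.

    Assumption: adjacency plus a time limit stands in for the handcrafted rules.
    """
    interactions = []
    for trigger, answer in zip(cues, cues[1:]):
        diff = answer.start_ms - trigger.end_ms
        if 0 <= diff <= max_gap_ms:
            interactions.append((trigger.text, answer.text, diff))
    return interactions


# Hypothetical sample file; the timestamps are made up for illustration.
SAMPLE = """1
00:00:01,000 --> 00:00:02,000
So how old are you?

2
00:00:02,500 --> 00:00:04,000
That's none of your business.

3
00:00:04,200 --> 00:00:05,000
[TIRES SCREECHING]

4
00:00:30,000 --> 00:00:31,000
Where do you live?
"""

for t, a, diff in extract_interactions(parse_srt(SAMPLE)):
    print(f"Diff - {diff} T - {t} A - {a}")
```

In this sketch the annotation-only cue disappears during cleaning, and the last cue is not paired with anything because it starts long after the previous line ended.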

Building the Subtle Corpus
SubId
DialogId - 1
Diff
T - What a son!
A - How about my mother?

SubId
DialogId - 2
Diff - 80
T - How about my mother?
A - Tell me, did my mother fight you?

SubId
DialogId - 3
Diff
T - Tell me, did my mother fight you?
A - Did she fight me?

Building the Subtle Corpus
Language     # Movies   # Movies Ok   # Interactions   # Average
English      5,764      5,665         5,693,811        1,005
Portuguese   3,701      3,598         3,322,

The Say Something Smart engine
The Say Something Smart engine (SSS) uses the Subtle corpus to get an answer to a given user request.
User: Where do you live?
SSS: Anywhere I feel like!
Subtle:
(T10: What was your mother’s name?, A10: The mother’s name isn’t important.)
(T121: Where do you live?, A121: Beaver Creek, off the Route 10.)

The Say Something Smart engine
Problem:
– Because we compute the distance between the user request and the interactions in the Subtle corpus, we need to limit the number of interactions considered.

The Say Something Smart engine
SSS main steps:
– Corpora indexing
– Candidate extraction
– Choosing the answer

The Say Something Smart engine
SSS main steps:
– Corpora indexing
– Candidate extraction
    Tokenizers, stemmers, and stop-word filters
    – the default ones for English
    – snowball analyzer for the Portuguese language
    The number of retrieved interactions can be configured
– Choosing the answer
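The indexing and candidate-extraction steps above can be illustrated with a toy in-memory inverted index. This is a stand-in for Lucene, not the Lucene API and not the authors' code: it shows tokenization, stop-word filtering and a configurable retrieval limit, and it omits stemming. The stop-word list and the names `ToyIndex`, `tokenize` and `limit` are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (not Lucene's default list).
STOP_WORDS = {"a", "an", "the", "do", "does", "you", "your", "to", "is", "are", "i", "my"}


def tokenize(text):
    """Lowercase, split into word tokens, and drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in STOP_WORDS]


class ToyIndex:
    """Inverted index over the triggers of (trigger, answer) interactions."""

    def __init__(self, interactions):
        self.interactions = list(interactions)
        self.postings = defaultdict(set)
        for idx, (trigger, _answer) in enumerate(self.interactions):
            for term in set(tokenize(trigger)):
                self.postings[term].add(idx)

    def search(self, request, limit=25):
        """Return up to `limit` interactions, ranked by number of shared terms."""
        hits = Counter()
        for term in set(tokenize(request)):
            for idx in self.postings.get(term, ()):
                hits[idx] += 1
        ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
        return [self.interactions[idx] for idx, _count in ranked[:limit]]


interactions = [
    ("You don't have to go brother.", "I'm not your brother."),
    ("You have a brother?", "Yeah, I've got a brother, man. You know that."),
    ("What was your mother's name?", "The mother's name isn't important."),
]
index = ToyIndex(interactions)
for trigger, answer in index.search("Do you have a brother?"):
    print(f"(T: {trigger}, A: {answer})")
```

Ranking by shared-term count is a simplification chosen for brevity; a real search library scores matches with a more elaborate relevance function.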

The Say Something Smart engine
(T4: You don’t have to go brother., A4: I’m not your brother.)
(T5: You have a brother?, A5: Yeah, I’ve got a brother, man. You know that.)
(T6: Joe doesn’t have a brother?, A6: No brother.)
(T7: Brother, do you have tooth paste?, A7: What brother?)
(T8: Have you seen my brother?, A8: He’s not your brother anymore.)
u: Do you have a brother?

The Say Something Smart engine
Being given:
– A user request u
– The set of interactions, U, retrieved by Lucene
For each (T_i, A_i) in U, a weighted combination of the measures is computed:
\[ \mathrm{score}(u, T_i, A_i) = \sum_{j} w_j \, M_j(u, T_i, A_i) \]
where w_j is the weight assigned to measure M_j.
Measures M_1, M_2 and M_3 are based on Jaccard similarity:
J(A, B) = |A ∩ B| / |A ∪ B|
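The Jaccard similarity and the weighting of measures described above can be sketched as follows. The tokenization (lowercased word tokens compared as sets) and the function names are assumptions made for illustration; the slides do not say how sentences are turned into sets.

```python
import re


def tokens(text):
    """Turn a sentence into a set of lowercased word tokens (an assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """J(A, B) = |A ∩ B| / |A ∪ B| over the token sets of two sentences."""
    set_a, set_b = tokens(a), tokens(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def weighted_score(measure_values, weights):
    """Combine measure values M_j with weights w_j as a weighted sum."""
    if len(measure_values) != len(weights):
        raise ValueError("one weight per measure is required")
    return sum(w * m for w, m in zip(weights, measure_values))


request = "What's your mother's name?"
trigger = "What was your mother's name?"
# Shared tokens: your, mother's, name (3) out of 6 distinct tokens.
print(jaccard(request, trigger))
```

A set-based measure ignores word order and repetition, which keeps it cheap to compute over every retrieved candidate.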

The Say Something Smart engine
M_1: Jaccard similarity between the user request and the trigger
(T9: How nice. What’s your mother’s name?, …)
(T10: What was your mother’s name?, A10: The mother’s name isn’t important.)
(T11: What’s your name?, …)
(T12: What’s the name your mother and father gave you?, …)
(T13: Your mother? how dare you to call my mother’s name?, …)
u: What’s your mother’s name?

The Say Something Smart engine
M_2: a higher score is given to the most “frequent” answer (Jaccard)
(T14: Where do you live?, A14: Right here.)
(T15: Where are you living?, A15: Right here.)
(T16: Where do you live?, A16: New York City.)
(T17: Where do you live?, A17: Dune Road.)
u: How are you?
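One possible reading of the “most frequent answer” measure is sketched below. The slides only say that the measure is Jaccard-based, so the exact formula here is an assumption: each candidate answer is scored by its mean Jaccard similarity to the answers of the other retrieved candidates. The function name `answer_frequency_scores` is hypothetical.

```python
import re


def tokens(text):
    """Turn a sentence into a set of lowercased word tokens (an assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """J(A, B) = |A ∩ B| / |A ∪ B| over the token sets of two sentences."""
    set_a, set_b = tokens(a), tokens(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def answer_frequency_scores(answers):
    """Score each answer by its mean Jaccard similarity to the other answers."""
    scores = []
    for i, answer in enumerate(answers):
        others = [other for j, other in enumerate(answers) if j != i]
        if not others:
            scores.append(0.0)
            continue
        scores.append(sum(jaccard(answer, other) for other in others) / len(others))
    return scores


answers = ["Right here.", "Right here.", "New York City.", "Dune Road."]
scores = answer_frequency_scores(answers)
for answer, score in zip(answers, scores):
    print(f"{answer!r}: {score:.2f}")
```

Under this reading, the repeated answer “Right here.” scores higher than the two answers that occur only once, and near-duplicates would also reinforce each other.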

The Say Something Smart engine
M_3: Jaccard similarity between the user request and the answer
(T9: How nice. What’s your mother’s name?, A9: Vickie.)
(T10: What was your mother’s name?, A10: The mother’s name isn’t important.)
u: What’s your mother’s name?

The Say Something Smart engine
M_4: Time difference between trigger and answer
(T: You're a joke! You're a joke!, A: Linda Kasabian gives birth to a son. She names the child Angel.)
u: Are you joking?

Evaluation
Evaluation setup
– Filipe, online since September 2013
– 103, user requests
    20 were randomly selected

Evaluation
Experiment 1: Are subtitles adequate?
– Three human annotators
– First 25 interactions returned by Lucene for each of the 20 requests
– Question: is there at least one plausible answer among the 25 candidates?

Evaluation
Experiment 1: Are subtitles adequate?
Results
– Evaluator 1: “What country do you live?” not ok;
– Evaluator 3 considered “it depends” a plausible answer
– Evaluator 2: “What country do you live?” not ok; “Are you a loser?” not ok;
– Evaluators 2 and 3 considered that “So what? You want to hit me?” or “Shut up.” were plausible answers
– Evaluator 3: “Where is the capital of Japan?” not ok;
– Evaluators 1 and 2 considered that “58% don’t know” was a plausible answer

Evaluation
Experiment 1: Are subtitles adequate?
The three annotators agreed that 17 out of 20 turns had a plausible answer

Evaluation
Experiment 2: Answer selection
– Settings (S1, ..., S5):
    S1 – Only takes into account M1.
    S2 – Only takes into account M2.
    S3 – Takes into account M1 and M2.
    S4 – Takes into account M1, M2 and M3.
    S5 – Takes into account all four measures.
– Weights:
    S1–S4: the same weight was given to the measures.
    S5:
    – 40% weight for M1
    – 30% weight for M2
    – 20% weight for M3
    – 10% weight for M4
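The five settings above can be written down as weight vectors over (M1, M2, M3, M4). The sketch below assumes that “the same weight” in S1 to S4 means equal weights that sum to 1 over the measures in use, and that the answer with the highest weighted sum is selected. The candidate measure values are made up purely to exercise the code; they are not results from the study.

```python
# Weight vectors over (M1, M2, M3, M4); equal-weight normalization is an assumption.
SETTINGS = {
    "S1": (1.0, 0.0, 0.0, 0.0),
    "S2": (0.0, 1.0, 0.0, 0.0),
    "S3": (0.5, 0.5, 0.0, 0.0),
    "S4": (1 / 3, 1 / 3, 1 / 3, 0.0),
    "S5": (0.4, 0.3, 0.2, 0.1),
}


def score(measures, setting):
    """Weighted sum of the four measure values under the given setting."""
    return sum(w * m for w, m in zip(SETTINGS[setting], measures))


def choose_answer(candidates, setting):
    """Return the answer whose measure values give the highest score."""
    return max(candidates, key=lambda candidate: score(candidate[1], setting))[0]


# Hypothetical candidates: (answer, (M1, M2, M3, M4)) with invented values.
candidates = [
    ("Right here.", (0.5, 0.9, 0.0, 0.2)),
    ("New York City.", (0.8, 0.1, 0.1, 0.9)),
]

for name in SETTINGS:
    print(name, "->", choose_answer(candidates, name))
```

Keeping the weights in one table makes it straightforward to try other combinations of the measures.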

Evaluation
Experiment 2: Answer selection
– 21 people evaluated the response returned for each of the 20 requests

Evaluation
Experiment 2: Answer selection
– Results
    S1       S2       S3       S4       S5
    39.29%   45.24%   46.90%   61.67%   51.19%
    S4 – Takes into account M1, M2 and M3.

Meet Filipe (or “Filaipe”)

Conclusions and Future Work
We have built the Subtle corpus (PT and EN)
Tested several techniques to extract a plausible answer in the Say Something Smart engine
Still much room for improvement
– Organizing data
    Detecting paraphrases
    …
– Text processing
    Synonyms
    Named entities
– Combining the measures
– Adding other corpora
– Taking context into consideration
– …
