Felipe Rodrigues da Silva Embrapa Recursos Genéticos e Biotecnologia


1 SUCEST: the sugarcane genome project
Felipe Rodrigues da Silva, Embrapa Recursos Genéticos e Biotecnologia

2 Volume of publicly available data

3 Volume of publicly available data

4 Complete genomes of organisms

5

6

7 Alphabet SOUP... A T G C

8 Sugarcane
Cultivated in more than 90 countries
Occupying about 20 million hectares
Grass family (Poaceae)
Sugarcane is an important industrial crop of tropical and subtropical regions, cultivated in more than 90 countries on close to 20 million hectares (FAO). It belongs to the grass family (Poaceae), an economically important seed plant family that includes cereals such as maize, wheat, rice and sorghum as well as many forage crops.

9 Sugarcane in Brazil
25% of world production
300 million tons
5 million hectares planted
14.5 million tons of sugar
15.3 billion liters of alcohol
350 mills
50 thousand growers
1.4 million direct jobs
3.6 million indirect jobs

10 Origin and size
Saccharum officinarum (2n = 80) X Saccharum spontaneum (2n = 64 or 2n = 112)
10 – 25%
S. barberi, S. sinense, S. robustum
2C = Mbp
2n =
Non-redundant set = 930 Mbp
Sorghum = 760 Mbp
Rice = 430 Mbp
The non-replicated genome size of a somatic sugarcane cell (2C) is estimated to be 7,440 mega base-pairs (Mbp) in S. officinarum. Since S. officinarum is octoploid, the size of a complete non-redundant chromosome set should be eight-fold smaller, around 930 Mbp (D'Hont and Glaszmann 2001). This value is comparable to that of sorghum (~760 Mbp) and about twice that of rice (~430 Mbp).
D'Hont, A. and Glaszmann, J. C. (2001) Sugarcane genome analysis with molecular markers, a first decade of research. Proc Int Soc Sugarcane Technol 24:
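The size estimate on this slide is a single division, which can be checked with a few lines of Python. The variable names are illustrative only; the figures are the ones quoted on the slide.

```python
# Genome sizes quoted on the slide, in mega base-pairs (Mbp).
two_c_mbp = 7440      # 2C value of a somatic S. officinarum cell
ploidy = 8            # S. officinarum is treated as octoploid
sorghum_mbp = 760
rice_mbp = 430

# One non-redundant chromosome set is 1/8 of the 2C value.
monoploid_mbp = two_c_mbp / ploidy

print(f"non-redundant set: {monoploid_mbp:.0f} Mbp")
print(f"ratio to sorghum:  {monoploid_mbp / sorghum_mbp:.2f}")
print(f"ratio to rice:     {monoploid_mbp / rice_mbp:.2f}")
```

The ratios make the slide's comparison explicit: the non-redundant set is a little larger than the sorghum genome and roughly twice the rice genome.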

11 Genome project
Structural: complete sequencing of the genome (genic and intergenic regions)
Functional: EST (Expressed Sequence Tag), the regions that code for proteins (genes)

12 Complete sequencing
Genomic DNA
BAC library
Physical map
BAC to be sequenced
Shotgun clones
Sequences:
...ATGTTGGGCCACAGTTGACCATTGAAACTG
GTTGACCATTGAAACTGACCTTGACGTAACGTGGTA....
Assembly:
...ATGTTGGGCCACAGTTGACCATTGAAACTGACCTTGACGTAACGTGGTA...
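The assembly step on this slide can be sketched as a suffix/prefix overlap merge. This is a minimal illustration using the two reads shown on the slide; the function name `merge_overlap` and the `min_overlap` parameter are assumptions, and real assemblers also tolerate sequencing errors and handle both strands.

```python
def merge_overlap(left, right, min_overlap=10):
    """Join two reads on the longest exact overlap between the end of
    `left` and the start of `right`; return None if none is long enough."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


# The two shotgun reads shown on the slide.
read_1 = "ATGTTGGGCCACAGTTGACCATTGAAACTG"
read_2 = "GTTGACCATTGAAACTGACCTTGACGTAACGTGGTA"

contig = merge_overlap(read_1, read_2)
print(contig)
```

With these two reads the merged sequence is the same as the assembled sequence printed on the slide.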

13 EST: Expressed Sequence Tag
Central dogma of biology
DNA (gene A, gene B): ACCTGATGGCATTTCCATCAAGCTGACCTGGAAATCGTTGGCC
mRNA
Protein (NH2 ... COOH)
cDNA
Insertion into a vector
Cloning in E. coli
Sequencing

14

15 GenBank - dbEST, March 1998
Total entries ,528,715
Homo sapiens ,015 (63.4%)
Plants (total) ,087 (4.8%)
Mus musculus + domesticus (mouse) 306,544
Caenorhabditis elegans ,521
Arabidopsis thaliana ,173
Drosophila melanogaster ,625
Oryza sativa (rice) ,844
Rattus sp. (rat) ,311
Brugia malayi (parasitic nematode) ,641
Toxoplasma gondii ,671
Emericella nidulans ,787
Schistosoma mansoni ,659
Trypanosoma brucei rhodesiense ,519
Danio rerio (zebrafish) ,373
Saccharomyces cerevisiae ,042
Zea mays (maize) ,783
Leishmania major ,692
Saccharum sp
Others ~ 20,000

16 The objectives of the SUCEST project
Identify unique genes (or sequence ESTs)
Develop a database for sugarcane
Make this database available to data mining groups
Functional analysis of the ESTs

17 The schedule
Date Goal
Jul/1999 Distribution of the first clones
Dec/ ,000 ESTs
Jul/ ,000 ESTs
Dec/ ,000 ESTs
Jul/ ,000 ESTs
Dec/ ,000 ESTs
Jul/ ,000 ESTs
Dec/ ,000 ESTs
Jul/ ,000 ESTs

18 The cDNA libraries
Tissues / organs:
Root
Meristem
Stalk
Seeds
Flowers
Leaf roll
Leaf-root transition zone
Lateral bud
Calli
Immature plantlets
Plantlets infected with Herbaspirillum rubrisubalbicans
Plantlets infected with Gluconacetobacter diazotrophicus
Varieties: SP SP SP RB RB PB5211 X P

19 The sequencing laboratories
UNIVAP (SJ) (1)
UNESP (JB) (2)
UFSCAR (AR)
ESALQ (PI) (3)
USP (RP) (1)
IAC (CA) (1)
UFSCAR (SC) (1)
BIOINFORMATICS UNICAMP (CA)
UNICAMP (CA) (1)
UMC (MC) (1)
USP (SP) (3)
IAC (CO) (1)
UNESP (BT) (2)
UNESP (RC) (1)
UNAERP (RP) (1)
USP (SC) (1)
ABI
PERNAMBUCO
ALAGOAS
RIO DE JANEIRO

20 EST: Expressed Sequence Tag
Central dogma of biology
DNA (gene A, gene B): ACCTGATGGCATTTCCATCAAGCTGACCTGGAAATCGTTGGCC
mRNA, protein (NH2 ... COOH), cDNA, insertion into a vector, cloning in E. coli, sequencing
clones, 384 plates, clones, reads, clones
Library Status Date No. of plates No. of reads
LR1 START 6/6/99 15:
CL1 TEST 21/7/99 16:
CL2 TEST 21/7/99 16:
LR2 START 2/9/99 15:
RZ1 START 2/9/99 15:
CL4 START 14/10/99 16:
CL3 START 18/10/99 13:
RZ2 START 22/10/99 10:
LB1 START 19/11/99 10:
RT1 START 8/12/99 11:
RT2 START 23/12/99 12:
CL5 TEST 27/1/00 11:
ST2 TEST 27/1/00 11:
ST1 START 4/2/00 14:
CL6 START 4/2/00 14:
CL7 TEST 1 4/2/00 17:
AM1 START 23/2/00 15:
HR1 START 29/2/00 15:
AM2 START 16/3/00 12:
RZ3 START 21/3/00 11:
ST3 START 5/4/00 12:
AD1 START 3/5/00 17:
SB1 START 3/5/00 17:
FL1 START 16/5/00 10:
FL4 START 24/5/00 12:
FL3 START 8/6/00 17:
FL5 START 14/6/00 10:
RT3 START 28/7/00 10:
LB2 START 28/7/00 10:
SD1 START 21/8/00 11:
SD2 START 21/8/00 12:
FL8 START 30/8/00 18:
LV1 START 30/8/00 18:
FL6 START 31/8/00 15:
NR1 START 29/9/00 14:
NR2 START 29/9/00 14:
FL2 START 3/1/01 12:

21 Cleaning the sequences
Removal of ribosomal sequences
Removal of vector sequences
Removal of the poly-A region
Quality trimming
Elimination of slippage

22 Poly-A
AGGGGAGAATTTATGATCCCCTAGTACACCCGGCAGGACCGGTCCGGAATTCCCCGGTCGACCCAC
GCGTCCGCTACAACAACAGCAGCAGCTTCCATTTACCTTGTCGGCTGTTGCAACCGCTGCTGCCTA
CCACCAGCAACTACAGCTGCTACCAGTTAACCCATTGGCACTGGCTAACCCATTGGCTGCTGCCTT
CCTGCAGCAGCAACAATTGCTGCCATTCAACCAGATGTCTTTGATGAACCCTGCCTTGTCGTGGTA
GCAACCCATCGTTGGAGGTGCCATCTTCTAGAATACAAATGAGTTGTACTTGATAACAATGTTCTT
GTGTCGGCGTGTGCAACTTCCCAGAAATAATCAATACATTGATTGAGATTTANAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAATATAATTAAAATAAAAAAATTTATAAAAAAAAAAAAATAATT
TTTTTTTATAAAAAATAAATATAAAATAAAAAGGGGGGGCCGTTTTAAAGGAACAAAGTTTAAGAC
CGGGGGTATGAAAGGGAAAATTTTTTTATATAGGGCCCCAAAATTAAATACATGGGCCGGTGTTAA
CAACGGCGGGAGGGAAAAAACCTGGGGGTTACCAATTTAAAGCCGTGGAAAAAATCCCTTTTTTCA
AGTGGGGTAAAAAGAAAAGGCCCCACCCATCGCCCTTCCAAAAATTGCCCCCCTTAAAGGAAAAAG
GACACCCCCTTTTGGGCGCATATAACCGGGGGGGTGGGGGTACCCCCAAGGGAACTTATATTTTTC
AGGCCTCATAGCCCTTTTTTTTTTTTTTTTTTTTTTTTTCAAGGTAGCGGGTTTCCCAGGAAAATT
AAAAGGGGGGTCCTTTTGGGTAATAATGTTTTN
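In the read above, everything after the long run of A's is low-complexity noise, which is why the poly-A region is cut. A minimal sketch of that cut follows. The slides do not describe the rule the SUCEST pipeline actually applied, so the function `trim_poly_a` and its `min_run` threshold of 15 are assumptions made for illustration.

```python
import re


def trim_poly_a(seq, min_run=15):
    """Cut the read at the first run of at least `min_run` consecutive A's
    and discard the run plus everything after it."""
    match = re.search("A{%d,}" % min_run, seq)
    return seq if match is None else seq[:match.start()]


# Toy read: transcript sequence, a 25-base poly-A tail, then noise.
read = "GATTGAGATTT" + "A" * 25 + "TATAATTAAAAT"
print(trim_poly_a(read))
```

A read with no long run of A's is returned unchanged, so the function is safe to apply to every read.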

24 Quality trimming 754 bases
CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGACCCACGCGTCCGGCCG CCGCCACCGCATCCCTTGCAGCCCCAATCCCCCACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGCCACCCCGATCGACAAC TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAGTCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCGCCGCCTCACCGGCCAC TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCCGACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTTCGTCGGCCACGAGAAG GACGTCCTCTCCGTCGCCTTCTCCGTCGACAACCGCCAGATCGTCTCCGCGTCCCGCGACAAGACC ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGACCTCGGCGGCGGGGAG GGCCACAACGGGTGGGTCTCCTGCGTCAGGTTCTTCCCCAACACCTTTCAGGCCACCATTGTCTCC GGATTCTGGGACCGCACCGTCAGGTCTGGAACCTTACCAACTGCAAGCTGCGATGCACTCTCGATG CCCACGCGGCTATGTTAACGCCGTCGCC ACGTX: <10 ACGTX: >=10 and <15 ACGTX: >=15 and <20 ACGTX: >=20 and <25 ACGTX: >=25 and <30 ACGTX: >=30 754 bases

25 Quality trimming 753 bases
CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGACCCACGCGTCCGGCCG CCGCCACCGCATCCCTTGCAGCCCCAATCCCCCACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGCCACCCCGATCGACAAC TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAGTCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCGCCGCCTCACCGGCCAC TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCCGACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTTCGTCGGCCACGAGAAG GACGTCCTCTCCGTCGCCTTCTCCGTCGACAACCGCCAGATCGTCTCCGCGTCCCGCGACAAGACC ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGACCTCGGCGGCGGGGAG GGCCACAACGGGTGGGTCTCCTGCGTCAGGTTCTTCCCCAACACCTTTCAGGCCACCATTGTCTCC GGATTCTGGGACCGCACCGTCAGGTCTGGAACCTTACCAACTGCAAGCTGCGATGCACTCTCGATG CCCACGCGGCTATGTTAACGCCGTCGC 753 bases ACGTX: <10 ACGTX: >=10 and <15 ACGTX: >=15 and <20 ACGTX: >=20 and <25 ACGTX: >=25 and <30 ACGTX: >=30

26 Quality trimming 618 bases
CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGACCCACGCGTCCGGCCG CCGCCACCGCATCCCTTGCAGCCCCAATCCCCCACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGCCACCCCGATCGACAAC TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAGTCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCGCCGCCTCACCGGCCAC TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCCGACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTTCGTCGGCCACGAGAAG GACGTCCTCTCCGTCGCCTTCTCCGTCGACAACCGCCAGATCGTCTCCGCGTCCCGCGACAAGACC ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGACCTCGGCGGCGGGGAG GGCCACAACGGGTGGGTCTCCTGCGT 618 bases ACGTX: <10 ACGTX: >=10 and <15 ACGTX: >=15 and <20 ACGTX: >=20 and <25 ACGTX: >=25 and <30 ACGTX: >=30
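The slides colour each base by phred quality class (<10 up to >=30) and keep only the good stretch of the read. One common way to do this is to keep the window whose summed (quality minus threshold) is largest. The sketch below assumes that approach; the slides do not state which algorithm SUCEST used, and the quality values here are invented for the example.

```python
def quality_trim(quals, threshold=20):
    """Return (start, end) of the window maximising sum(q - threshold).
    The interval is half-open; (0, 0) means nothing passed."""
    best_sum, best = 0, (0, 0)
    run_sum, run_start = 0, 0
    for i, q in enumerate(quals):
        run_sum += q - threshold
        if run_sum <= 0:
            # The window so far is not worth keeping; restart after base i.
            run_sum, run_start = 0, i + 1
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


# Hypothetical read and per-base phred qualities.
seq = "TTACGTACGAA"
quals = [5, 8, 30, 35, 40, 12, 38, 36, 9, 6, 4]

start, end = quality_trim(quals)
print(start, end, seq[start:end])
```

A single weak base inside a good region (quality 12 in the example) is kept, while the poor ends are removed. Raising or lowering `threshold` moves the cut points, which is what the slides on determining the quality threshold explore.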

27 blastX result: trimmed read
>gi| |sp|P49027|GBLP_ORYSA GUANINE NUCLEOTIDE-BINDING PROTEIN BETA SUBUNIT-LIKE PROTEIN (GPB-LR) (RWD) pir||T03764 protein RWD - rice dbj|BAA | (D38231) RWD [Oryza sativa] Length = 334 Score = 315 bits (798), Expect = 4e-85 Identities = 150/170 (88%), Positives = 156/170 (91%) Frame = +1 Query: 109 MAGAQESLSLVGTMRGHNGEVTAIATPIDNSPFIVSSSRDKSVLVWDLQNPVHSTPESGA 288 MAGAQESL L G M GHN VTAIATPIDNSPFIVSSSRDKS+LVWDL NPV + E Sbjct: 1 MAGAQESLVLAGVMHGHNDVVTAIATPIDNSPFIVSSSRDKSLLVWDLTNPVQNVGEGAG 60 Query: 289 TADYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHEKDV 468 ++YGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGH+KDV Sbjct: 61 ASEYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHDKDV 120 Query: 469 LSVAFSVDNRQIVSASRDKTIKLWNTLGECKYTIGGDLGGGEGHNGWVSC 618 LSVAFSVDNRQIVSASRD+TIKLWNTLGECKYTIGGDLGGGEGHNGWVSC Sbjct: 121 LSVAFSVDNRQIVSASRDRTIKLWNTLGECKYTIGGDLGGGEGHNGWVSC 170

28 blastX result: full read
>gi| |sp|P49027|GBLP_ORYSA GUANINE NUCLEOTIDE-BINDING PROTEIN BETA SUBUNIT-LIKE PROTEIN (GPB-LR) (RWD) pir||T03764 protein RWD - rice dbj|BAA | (D38231) RWD [Oryza sativa] Length = 334 Score = 352 bits (893), Expect(2) = e-100 Identities = 168/192 (87%), Positives = 175/192 (90%) Frame = +1 Query: 109 MAGAQESLSLVGTMRGHNGEVTAIATPIDNSPFIVSSSRDKSVLVWDLQNPVHSTPESGA 288 MAGAQESL L G M GHN VTAIATPIDNSPFIVSSSRDKS+LVWDL NPV + E Sbjct: 1 MAGAQESLVLAGVMHGHNDVVTAIATPIDNSPFIVSSSRDKSLLVWDLTNPVQNVGEGAG 60 Query: 289 TADYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHEKDV 468 ++YGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGH+KDV Sbjct: 61 ASEYGVPFRRLTGHSHFVQDVVLSSDGQFALSGSWDGELRLWDLSTGVTTRRFVGHDKDV 120 Query: 469 LSVAFSVDNRQIVSASRDKTIKLWNTLGECKYTIGGDLGGGEGHNGWVSCVRFFPNTFQA 648 LSVAFSVDNRQIVSASRD+TIKLWNTLGECKYTIGGDLGGGEGHNGWVSCVRF PNTFQ Sbjct: 121 LSVAFSVDNRQIVSASRDRTIKLWNTLGECKYTIGGDLGGGEGHNGWVSCVRFSPNTFQP 180 Query: 649 TIVSGFWDRTVR 684 TIVSG WDRTV+ Sbjct: 181 TIVSGSWDRTVK 192

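The two blastX results can be compared numerically. Because blastX translates the nucleotide query, three query bases correspond to one aligned amino acid, so the query coordinates give the alignment length directly. The sketch below uses only numbers printed in the two results; the helper names are illustrative.

```python
def aligned_residues(query_start, query_end):
    """Amino acids covered by a blastX hit, from nucleotide coordinates."""
    return (query_end - query_start + 1) // 3


def percent_identity(identical, aligned):
    """Identity percentage, truncated to a whole number."""
    return 100 * identical // aligned


# (query start, query end, identical residues) from the two slides.
trimmed_read = (109, 618, 150)
full_read = (109, 684, 168)

for name, (q_start, q_end, ident) in (("trimmed", trimmed_read),
                                      ("full", full_read)):
    length = aligned_residues(q_start, q_end)
    print(name, length, "aa aligned,", percent_identity(ident, length), "% identity")
```

The full read extends the alignment from 170 to 192 residues, which means the bases between 618 and 684 still carry real homology even though quality trimming at the stricter threshold removed them.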
29 Determining the quality threshold

30 Quality trimming 754 bases
CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGACCCACGCGTCCGGCCG CCGCCACCGCATCCCTTGCAGCCCCAATCCCCCACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGCCACCCCGATCGACAAC TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAGTCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCGCCGCCTCACCGGCCAC TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCCGACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTTCGTCGGCCACGAGAAG GACGTCCTCTCCGTCGCCTTCTCCGTCGACAACCGCCAGATCGTCTCCGCGTCCCGCGACAAGACC ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGACCTCGGCGGCGGGGAG GGCCACAACGGGTGGGTCTCCTGCGTCAGGTTCTTCCCCAACACCTTTCAGGCCACCATTGTCTCC GGATTCTGGGACCGCACCGTCAGGTCTGGAACCTTACCAACTGCAAGCTGCGATGCACTCTCGATG CCCACGCGGCTATGTTAACGCCGTCGCC ACGTX: <10 ACGTX: >=10 and <15 ACGTX: >=15 and <20 ACGTX: >=20 and <25 ACGTX: >=25 and <30 ACGTX: >=30 754 bases

31 Quality trimming 618 bases
CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGACCCACGCGTCCGGCCG CCGCCACCGCATCCCTTGCAGCCCCAATCCCCCACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGCCACCCCGATCGACAAC TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAGTCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCGCCGCCTCACCGGCCAC TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCCGACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTTCGTCGGCCACGAGAAG GACGTCCTCTCCGTCGCCTTCTCCGTCGACAACCGCCAGATCGTCTCCGCGTCCCGCGACAAGACC ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGACCTCGGCGGCGGGGAG GGCCACAACGGGTGGGTCTCCTGCGT 618 bases ACGTX: <10 ACGTX: >=10 and <15 ACGTX: >=15 and <20 ACGTX: >=20 and <25 ACGTX: >=25 and <30 ACGTX: >=30

32 Quality trimming CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGACCCACGCGTCCGGCCG CCGCCACCGCATCCCTTGCAGCCCCAATCCCCCACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGCCACCCCGATCGACAAC TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAGTCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCGCCGCCTCACCGGCCAC TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCCGACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTTCGTCGGCCACGAGAAG GACGTCCTCTCCGTCGCCTTCTCCGTCGACAACCGCCAGATCGTCTCCGCGTCCCGCGACAAGACC ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGACCTCGGCGGCGGGGAG GGCCACAACGGGTGGGTCTCCTGCGTCAGGTTCTTCCCCAACACCTTTCAGGCCACCATTGTCTCC GGATTCTGGGACCGCACCGTCAGGTCTGGAACCTTACCAACTGCAAGCTGCGATGCACTCTCGATG CCCACGCGGCTATGTTAACGCCGTCGCC ACGTX: <10 ACGTX: >=10 and <15 ACGTX: >=15 and <20 ACGTX: >=20 and <25 ACGTX: >=25 and <30 ACGTX: >=30 base 684

33 Quality trimming: 719 bases
CGGAAGACTGGAGTCGTCGCTGCGGCACCGGTCCGGAATTCCCGGGTCGACCCACGCGTCCGGCCG
CCGCCACCGCATCCCTTGCAGCCCCAATCCCCCACGGCGACCATGGCCGGCGCGCAGGAGTCCCTG
TCCCTGGTGGGCACGATGCGTGGCCACAACGGCGAGGTGACGGCGATCGCCACCCCGATCGACAAC
TCGCCGTTCATCGTCTCCTCCTCCCGCGACAAGTCCGTGCTGGTGTGGGACCTGCAAAACCCGGTC
CACTCCACCCCGGAATCCGGCGCCACCGCCGACTACGGCGTCCCCTTCCGCCGCCTCACCGGCCAC
TCCCACTTCGTCCAGGACGTCGTCCTCAGCTCCGACGGCCAGTTCGCCCTCTCCGGCTCCTGGGAC
GGCGAGCTCCGCCTCTGGGACCTCTCCACCGGCGTCACCACCCGCCGCTTCGTCGGCCACGAGAAG
GACGTCCTCTCCGTCGCCTTCTCCGTCGACAACCGCCAGATCGTCTCCGCGTCCCGCGACAAGACC
ATCAAGCTCTGGAACACCCTCGGTGAGTGCAAGTACACCATTGGTGGCGACCTCGGCGGCGGGGAG
GGCCACAACGGGTGGGTCTCCTGCGTCAGGTTCTTCCCCAACACCTTTCAGGCCACCATTGTCTCC
GGATTCTGGGACCGCACCGTCAGGTCTGGAACCTTACCAACTGCAAGCTGCGATGCACTCTCGATG
CCCACGCGGCTATGTTAACGCCGTCGCC
719 bases
Before: trimmed at base 618, homology difference -66
Homology ends at base 684
After: trimmed at base 719, homology difference +35
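The before/after figures on this slide follow from comparing each trim point with the last base that still aligns in the blastX hit of the full read (base 684). The short check below reproduces them. Reading "dif. homol." as "trim point minus end of homology" is an interpretation of the slide, not something it states outright.

```python
HOMOLOGY_END = 684  # last query base aligned by blastX on the full read

trim_points = {"before": 618, "after": 719}

# Negative: homologous bases were thrown away.
# Positive: bases beyond the homology were kept.
differences = {label: cut - HOMOLOGY_END for label, cut in trim_points.items()}

for label, diff in differences.items():
    print(f"{label}: cut at {trim_points[label]}, difference {diff:+d}")
```

Under this reading, the relaxed threshold stops discarding 66 informative bases at the cost of keeping 35 bases past the end of the detectable homology.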

34 Determining the quality threshold

35 Example of slippage

36
All reads: 291,689 reads; mean length 864.5 ± 186.3; mean number of bases with quality >= 20 per read 399.5 ± 161.3
283,216 reads: removal of ribosomal sequences
283,216 reads: vector search
275,436 reads: trimming of vector + poly-A
273,728 reads: quality trimming
273,728 reads: trimming of vector at the ends
258,107 reads: trimming of slippage
256,101 reads: trimming of poly-A at the ends
237,954 reads: removal of low-quality sequences
Trimmed reads: 237,954 reads; mean read size 642.6 ± 139.8; mean number of bases with quality >= 20 per read 397.8 ± 120.1
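The read counts on this slide can be turned into per-step losses. The sketch assumes that each count is the number of reads remaining after the step named beside it; the flattened slide text does not make the pairing certain, so treat the step labels as approximate.

```python
# (step, reads remaining), in the order the counts appear on the slide.
PIPELINE = [
    ("all reads", 291689),
    ("removal of ribosomal sequences", 283216),
    ("vector search", 283216),
    ("trimming of vector + poly-A", 275436),
    ("quality trimming", 273728),
    ("trimming of vector at the ends", 273728),
    ("trimming of slippage", 258107),
    ("trimming of poly-A at the ends", 256101),
    ("removal of low-quality sequences", 237954),
]

# Reads lost at each step relative to the previous count.
losses = [
    (step, before - after)
    for (_, before), (step, after) in zip(PIPELINE, PIPELINE[1:])
]

for step, lost in losses:
    print(f"{step:35s} -{lost}")

total_lost = sum(lost for _, lost in losses)
print("total discarded:", total_lost)
```

Under this pairing, the largest single loss is associated with low-quality removal and the second largest with slippage trimming.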

37 Cluster size (reads)
Columns: HS X phrap CAP3 X HS total common
1 32202 13731 18535 11634 16838 14296 10744
2 12440 5617 9207 4869 7665 4852 3792
3 6752 2402 5192 2151 4193 1984 1441
4 4225 1239 3329 1145 2709 992 697
5 2856 676 2360 700 1872 521 344
6 2098 442 1806 482 1452 354 231
7 1582 288 1362 317 1115 220 144
8 1245 202 1091 242 862 153 99
9 974 156 913 186 720 113 72
10 776 105 752 143 634 74 44
11 639 76 607 511 54 30
12 492 71 547 429 46 32
13 437 47 454 90 400 40 25
14 366 42 391 341 26
15 306 31 390 50 295 18
16 273 279 35 275
17 225 23 235 177 227 191
19 124 176
>=20 1192 1814 87 2228
total 69381 25222 49706 22425 43141 23805 17748

38 Internal discrepancy
>gi|169818|gb|M |RICRGHA Rice 25S ribosomal RNA gene Length = 3377 Score = 1011 bits (510), Expect = 0.0 Identities = 540/550 (98%) Strand = Plus / Plus

39 Internal discrepancy
>gi| |pir||T03241 G-box binding factor 1A - rice >gi|435942|gb|AAC | (U04295) DNA-binding factor of bZIP class [Oryza sativa] Length = 390

40 Internal consistency test

41 Internal consistency test

42 Internal consistency test

43 External consistency test

44 External consistency test

45 External consistency test

46 Overall numbers
Total sequences 291,689
cDNA clones sequenced (5' or 3') 260,352
5' end sequences 259,325
3' end sequences 32,364
Total high-quality sequences 237,954
Success index (%) 81.6
Average insert size (bp) 1,250
Average sequence size (bp) 864 / 642
Bases with phred quality >= 20 per read 399

47 Overall numbers
Total sequences analyzed 237,954
Number of contigs 26,803
Number of singletons 16,338
Number of sugarcane assembled sequences (SAS) 43,141
Number of assembled sequences matching known genes 27,833 (64.5%)
Number of clones with full-length inserts 14,409 (
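The totals on these two slides are related by simple sums and ratios, which the following check makes explicit. All figures come from the slides; only the variable names are new.

```python
# Slide 46
total_sequences = 291689
five_prime = 259325
three_prime = 32364
high_quality = 237954

# Slide 47
contigs = 26803
singletons = 16338
sas = 43141
sas_with_known_match = 27833

success_index = 100 * high_quality / total_sequences
matched_share = 100 * sas_with_known_match / sas

print("5' + 3' reads:        ", five_prime + three_prime)
print("success index (%):    ", round(success_index, 1))
print("contigs + singletons: ", contigs + singletons)
print("SAS with a match (%): ", round(matched_share, 1))
```

The 5' and 3' reads add up to the total number of sequences, and contigs plus singletons add up to the number of SAS, so the two slides are internally consistent.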

48 Specific contribution per library
Library, number of ESTs, SAS, contigs, singletons, % contribution
AD1 8,137 1,474 1,200 3.4
AM1 5,991 841 664 1.9
AM2 6,629 982 705 2.3
CL6 3,511 595 467 1.4
FL1 8,412 1,753 1,465 4.1
FL3 5,714 840 667
FL4 7,289 1,082 886 2.5
FL5 5,115 861 744 2.0
FL8 3,362 378 337 0.9
HR1 5,070 717 519 1.7
LB1 3,699 459 369 1.1
LB2 5,402 790 650 1.8
LR1 6,653 984 819
LR2 2,329 299 254 0.7
LV1 3,068 384 327
RT1 4,227 569 484 1.3
RT2 5,819 942 728 2.2
RT3 4,356 614 478
RZ1 2,012 205 175 0.5
RZ2 3,177 385 301
RZ3 6,528 929 752 2.1
SB1 7,407 1,313 1,132 3.0
SD1 4,459 792 642
SD2 4,099 857 632
ST1 4,359 645 523 1.5
ST3 4,519 507 418 1.2
47% of the SAS are formed by reads from a single library
38% of the tissue-specific SAS are singletons
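The two summary percentages on this slide can be recovered from the table. The sketch assumes that the second numeric column is the number of SAS built only from that library and the third is the number of singletons; the flattened header does not make this certain. The total of 43,141 SAS comes from the "Overall numbers" slide.

```python
# library, ESTs, library-specific SAS (assumed), singletons (assumed)
TABLE = """
AD1 8137 1474 1200
AM1 5991 841 664
AM2 6629 982 705
CL6 3511 595 467
FL1 8412 1753 1465
FL3 5714 840 667
FL4 7289 1082 886
FL5 5115 861 744
FL8 3362 378 337
HR1 5070 717 519
LB1 3699 459 369
LB2 5402 790 650
LR1 6653 984 819
LR2 2329 299 254
LV1 3068 384 327
RT1 4227 569 484
RT2 5819 942 728
RT3 4356 614 478
RZ1 2012 205 175
RZ2 3177 385 301
RZ3 6528 929 752
SB1 7407 1313 1132
SD1 4459 792 642
SD2 4099 857 632
ST1 4359 645 523
ST3 4519 507 418
"""
TOTAL_SAS = 43141

rows = [line.split() for line in TABLE.strip().splitlines()]
specific_sas = sum(int(r[2]) for r in rows)
singleton_total = sum(int(r[3]) for r in rows)

share_specific = 100 * specific_sas / TOTAL_SAS
share_singletons = 100 * singleton_total / TOTAL_SAS

print(len(rows), "libraries")
print("library-specific SAS:", specific_sas, f"({share_specific:.1f}% of all SAS)")
print("singletons:          ", singleton_total, f"({share_singletons:.1f}% of all SAS)")
```

Under these assumptions the column sums reproduce the slide's 47% figure, and the singleton column adds up to the 16,338 singletons reported for the whole project. The 38% figure matches singletons taken as a share of all SAS. Dividing a library's SAS count by 43,141 also reproduces most values in the % contribution column, for example 1,474 / 43,141 = 3.4% for AD1.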

49 Functional classification
Examination of the primary BLAST matches revealed 3 major groups of SAS with varying potential to predict their cellular function. SAS in the first group, matching sequences of known proteins with strong and nominal similarity, are likely to be transcripts of genes with similar functions (36% of the SAS). The function of the BLAST match was used to assign cellular roles to this group. The second class is formed by 15% of the total SAS (6,614, or 24.9% of the categorized SAS in Figure 1); this group matched 'unknown protein', 'hypothetical protein' or 'putative protein' with no indication of the function of the gene product. Most of these were ESTs from other species that had been entered into the GenBank non-redundant database. In the third group are the SAS (35.6% of the total SAS) with no matches in the GenBank nr database. Almost 50% of all SAS annotated in SUCEST are associated with five broad role categories: (1) cellular dynamics (biogenesis, organization and structure of the cell; cell division; cell growth; motility); (2) stress response (cell rescue activities, disease, virulence and defense); (3) protein metabolism (folding and stabilization, modification, synthesis, targeting, sorting and translocation, proteolysis); (4) bioenergetics (C-compound and carbohydrate metabolism, photosynthesis); and (5) cellular communication/signal transduction (Figure 1).

50 Percentage per organ
The 26 SUCEST cDNA libraries were grouped according to the sugarcane organs used as the source for mRNA extraction: infected plantlets (AD1, HR1), meristem (AM1, AM2, LB1, LB2), callus (CL6), inflorescence (FL1, FL3, FL4, FL5, FL8), leaf (LV1, LR1, LR2), root (RT1, RT2, RT3), seed (SD1, SD2) and stalk (ST1, ST3, SB1, RZ1, RZ2, RZ3). The abundance of transcripts in each category was calculated by dividing the number of high-quality sequences in each category by the total number of high-quality sequences in each organ (the sum of high-quality sequences in each library associated with the plant organ; Table 2 (???), Supplementary Material). The average percentage of SAS per category does not add up to 100% because some SAS appear in more than one category. SD is the standard deviation.
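The abundance calculation described above can be sketched as follows. The library-to-organ grouping is taken from the slide; the function `category_abundance` and the sequence counts in the usage example are hypothetical, since the per-category counts are not part of this transcript.

```python
ORGAN_LIBRARIES = {
    "infected plantlets": ["AD1", "HR1"],
    "meristem": ["AM1", "AM2", "LB1", "LB2"],
    "callus": ["CL6"],
    "inflorescence": ["FL1", "FL3", "FL4", "FL5", "FL8"],
    "leaf": ["LV1", "LR1", "LR2"],
    "root": ["RT1", "RT2", "RT3"],
    "seed": ["SD1", "SD2"],
    "stalk": ["ST1", "ST3", "SB1", "RZ1", "RZ2", "RZ3"],
}


def category_abundance(organ, category_counts, library_totals):
    """Percentage of an organ's high-quality sequences that fall in one
    functional category, pooling all libraries made from that organ."""
    libraries = ORGAN_LIBRARIES[organ]
    in_category = sum(category_counts.get(lib, 0) for lib in libraries)
    total = sum(library_totals[lib] for lib in libraries)
    return 100.0 * in_category / total


# Hypothetical counts, for illustration only.
library_totals = {"SD1": 4000, "SD2": 6000}
stress_response = {"SD1": 300, "SD2": 200}

print(category_abundance("seed", stress_response, library_totals))
```

Pooling the libraries before dividing weights each organ by its total number of sequences, so a large library counts for more than a small one.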

51 Tissue-specific SAS
Number of ESTs, best hit, library
360 (Y17556) alpha kafirin [Sorghum bicolor] SD
103 (A23207) zein zA1 [Zea mays]
42 (AF232008) beta-glucosidase aggregating factor precursor [Zea mays] RT
24 (AC007789) putative low molecular early light-inducible protein [Oryza sativa]
22 (AP002820) putative peroxidase [Oryza sativa]
19 (X56337) alpha-amylase [Oryza sativa] CL
18 (AP000374) cyclopropane fatty acid synthase [Arabidopsis thaliana] FL

52 GenBank - dbEST, March 1998
Total entries ,528,715
Homo sapiens ,015 (63.4%)
Plants (total) ,087 (4.8%)
Mus musculus + domesticus (mouse) 306,544
Caenorhabditis elegans ,521
Arabidopsis thaliana ,173
Drosophila melanogaster ,625
Oryza sativa (rice) ,844
Rattus sp. (rat) ,311
Brugia malayi (parasitic nematode) ,641
Toxoplasma gondii ,671
Emericella nidulans ,787
Schistosoma mansoni ,659
Trypanosoma brucei rhodesiense ,519
Danio rerio (zebrafish) ,373
Saccharomyces cerevisiae ,042
Zea mays (maize) ,783
Leishmania major ,692
Saccharum sp
Others ~ 20,000

53 GenBank - dbEST, March 2001
Total entries ,692,809
Homo sapiens ,369,459 (43.8%)
Plants (total) ,099,102 (14.3%)
Glycine max (soybean) ,500
Arabidopsis thaliana 113,000
Medicago truncatula (barrel medic) 112,458
Lycopersicon esculentum (tomato) 107,226
Zea mays (maize) ,999
Oryza sativa (rice) ,657
Hordeum vulgare (barley) ,480
Chlamydomonas reinhardtii ,973
Sorghum bicolor ,642
Triticum aestivum (wheat) ,141
Pinus taeda (loblolly pine) ,896
Lotus japonicus ,078
Solanum tuberosum (potato) ,177
Gossypium arboreum ,978
Sorghum propinquum ,974
Mesembryanthemum (ice plant) ,033
Gossypium hirsutum (cotton) ,438
Secale cereale ,123
Saccharum sp
Other plants (67 spp.)

54 GenBank - dbEST, September 2002
Total entries ,845,578
Homo sapiens ,691,979 (36.5%)
Plants (total) ,279,170 (17.4%)
Glycine max (soybean) ,714
Triticum aestivum (wheat) ,593
Hordeum vulgare (barley) ,882
Zea mays (maize) ,587
Arabidopsis thaliana ,624
Medicago truncatula (barrel medic) ,500
Lycopersicon esculentum (tomato) 148,346
Chlamydomonas reinhardtii ,324
Oryza sativa (rice) ,429
Solanum tuberosum (potato) ,420
Sorghum bicolor ,712
Lactuca sativa (lettuce) ,188
Pinus taeda (loblolly pine) ,226
Physcomitrella patens ,250
Helianthus annuus (sunflower) ,961
Gossypium arboreum (cotton) 38,894
Lotus japonicus ,096
Sorghum propinquum ,387
Saccharum sp
Other plants (78 spp.)
At the Plant Cell Biotechnology Laboratory (LBCV) of the Instituto de Tecnologia Química e Biológica, Universidade Nova, located in Oeiras at Quinta do Marquês, this technology is used daily to maintain several lines of different species of herbaceous and woody plants. One of the plants under study is Medicago truncatula (an annual medic, a pasture legume used to produce fresh feed or forage for livestock). Under suitable conditions, this plant undergoes an in vitro differentiation process called somatic embryogenesis. In this process, the cells of the plant's leaflets multiply, giving rise to cells that, through multiplication and differentiation, produce embryos identical in every respect to those that develop from a zygote (resulting from the fusion of the germinated pollen grain with the ovule).

55 Genetics and Molecular Biology
The libraries that made SUCEST
Bioinformatics of the sugarcane EST project
Trimming and clustering sugarcane ESTs
The sugarcane signal transduction (SUCAST) catalogue: prospecting signal transduction in sugarcane
In silico characterization and expression analyses of sugarcane putative sucrose non-fermenting-1 (SNF1) related kinases
Identification of like protein in sugarcane (Saccharum officinarum)
A search for homologues of plant photoreceptor genes and their signaling partners in the sugarcane expressed sequence tag (Sucest) database
Phylogenetic relationships between Arabidopsis and sugarcane bZIP transcriptional regulatory factors
Identification of sugarcane cDNAs encoding components of the cell cycle machinery
Dissecting the sugarcane expressed sequence tag (SUCEST) database: unraveling flower-specific genes
Molecular chaperone genes in the sugarcane expressed sequence database (SUCEST)
Oxidative stress response in sugarcane
In silico differential display of defense-related expressed sequence tags from sugarcane tissues infected with diazotrophic endophytes
Mechanisms of sugarcane response to herbivory
Base excision repair in sugarcane
Preliminary analysis of microsatellite markers derived from sugarcane expressed sequence tags (ESTs)
Sequence polymorphism from EST data in sugarcane: a fine analysis of 6-phosphogluconate dehydrogenase genes
A search for markers of sugarcane evolution
Sugarcane genes related to mitochondrial function
Mitochondrial and chloroplast localization of FtsH-like proteins in sugarcane based on their phylogenetic profile
Patterns of expression of cell wall related genes in sugarcane
Expression of sugarcane genes induced by inoculation with Gluconacetobacter diazotrophicus and Herbaspirillum rubrisubalbicans
Identifying sugarcane expressed sequences associated with nutrient transporters and peptide metal chelators
Prospecting sugarcane genes involved in aluminum tolerance
N-glycosylation in sugarcane
Sugarcane expressed sequences tags (ESTs) encoding enzymes involved in lignin biosynthesis pathways
Biosynthesis of secondary metabolites in sugarcane
Identification of sugarcane genes involved in the purine synthesis pathway
A new member of the chalcone synthase (CHS) family in sugarcane
Classification, expression pattern and comparative analysis of sugarcane expressed sequences tags (ESTs) encoding glycine-rich proteins (GRPs)
Identification, classification and expression pattern analysis of sugarcane cysteine proteinases
Identification of metalloprotease gene families in sugarcane
Sugarcane phytocystatins: identification, classification and expression pattern analysis
DNA repair-related genes in sugarcane expressed sequence tags (ESTs)
Distribution of DNA repair-related ESTs in sugarcane
Survey of transposable elements in sugarcane expressed sequence tags (ESTs)

56 Genetics and Molecular Biology

57 The SUCEST group

58 Part of the LBI

59 Part of the LBI

60 The trimmers

61 Genome Group - CBMEG

62 Genome Group - CBMEG felipes@cenargen.embrapa.br

63

64

65

