2 Celera: 65 people, 175 thousand reads per day, 4 groups
1- Transforms bacteria, plates them, and picks colonies
2- Mini-prep
3- Sequencing reaction and ethanol precipitation
4- Loads the ABI Prism 3700 sequencers
3 The human body has approximately 100 trillion cells. Inside each cell is the nucleus, which contains the genome (46 human chromosomes) that directs human development.
4 Each chromosome is a long strand of DNA. Chromosomes are made up of millions of copies of the four letters of the genetic code (A, C, G, T), the DNA bases, in which genes and non-coding sections are arranged. Finding the order, or sequence, of these four letters is the goal of the genome project. The entire human genome is composed of approximately 3.5 billion bases.
5 To read the DNA, the chromosomes are cut into tiny pieces, and each piece is read individually. Once all the segments have been read, they are assembled in the correct order.
6 Methods. Two methods were used:
(1) The DNA is fragmented and then assembled in the correct order (Celera)
(2) The chromosomes are assembled before the sequence is decoded (Public Consortium)
7 BAC to BAC Sequencing vs. Whole Genome Shotgun Sequencing
BAC to BAC: The BAC to BAC approach first creates a crude physical map of the whole genome before sequencing the DNA. Constructing a map requires cutting the chromosomes into large pieces and figuring out the order of these big chunks of DNA before taking a closer look and sequencing all the fragments.
Whole Genome Shotgun: The shotgun sequencing method goes straight to the job of decoding, bypassing the need for a physical map. Therefore, it is much faster.
8 BAC to BAC Sequencing vs. Whole Genome Shotgun Sequencing
BAC to BAC: Several copies of the genome are randomly cut into pieces that are about 150,000 base pairs (bp) long.
Whole Genome Shotgun: Multiple copies of the genome are randomly shredded into pieces that are 2,000 bp long by squeezing the DNA through a pressurized syringe. This is done a second time to generate pieces that are 10,000 bp long.
9 BAC to BAC Sequencing vs. Whole Genome Shotgun Sequencing
BAC to BAC: Each of these 150,000 bp fragments is inserted into a BAC, a bacterial artificial chromosome. The whole collection of BACs containing the entire human genome is called a BAC library.
Whole Genome Shotgun: Each 2,000 and 10,000 bp fragment is inserted into a plasmid. The two collections of plasmids containing 2,000 and 10,000 bp chunks of human DNA are known as plasmid libraries.
10 BAC to BAC Sequencing vs. Whole Genome Shotgun Sequencing
BAC to BAC: These pieces are fingerprinted to give each piece a unique identification tag that determines the order of the fragments. Fingerprinting involves cutting each BAC fragment with a single enzyme and finding common sequence landmarks in overlapping fragments that determine the location of each BAC along the chromosome. Overlapping BACs with markers every 100,000 bp then form a map of each chromosome.
Whole Genome Shotgun: This step is not needed in shotgun sequencing.
11 BAC to BAC Sequencing vs. Whole Genome Shotgun Sequencing
BAC to BAC: Each BAC is then broken randomly into 1,500 bp pieces and placed in another artificial piece of DNA called M13. This collection is known as an M13 library.
Whole Genome Shotgun: This step is not needed in shotgun sequencing.
12 BAC to BAC Sequencing vs. Whole Genome Shotgun Sequencing
BAC to BAC: All the M13 libraries are sequenced. 500 bp from one end of the fragment are sequenced, generating millions of sequences.
Whole Genome Shotgun: Both the 2,000 and the 10,000 bp plasmid libraries are sequenced. 500 bp from each end of each fragment are decoded, generating millions of sequences. Sequencing both ends of each insert is critical for assembling the entire chromosome.
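The paired-end step on the shotgun side can be sketched as follows. This is a minimal illustration, not the pipeline's actual software: the function names are hypothetical, the insert is random simulated DNA, and it assumes that the second read comes off the opposite strand, so it is stored as a reverse complement.

```python
import random

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def paired_end_reads(insert, read_len=500):
    """Read read_len bases from each end of one plasmid insert.

    The forward read is the first read_len bases; the reverse read is the
    last read_len bases, reported on the opposite strand (an assumption).
    """
    forward = insert[:read_len]
    reverse = reverse_complement(insert[-read_len:])
    return forward, reverse

# Simulated 2,000 bp insert (illustrative data only).
random.seed(0)
insert = "".join(random.choice("ACGT") for _ in range(2000))
fwd, rev = paired_end_reads(insert)
print(len(fwd), len(rev))
```

Because both reads come from the same insert, the distance between them is known approximately, which is what the later scaffolding slides rely on.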
13 BAC to BAC Sequencing vs. Whole Genome Shotgun Sequencing
BAC to BAC: These sequences are fed into a computer program called PHRAP that looks for common sequences that join two fragments together.
Whole Genome Shotgun: Computer algorithms assemble the millions of sequenced fragments into a continuous stretch resembling each chromosome (Assembler).
14 INFORMATICS (Assembler)
1- Check the sequence quality: average accuracy of 99.5% (1 error in 200), with a target of 99.99%
2- Vector removal
3- BLAST to remove mitochondrial sequences (2114) and non-human sequences, i.e. vector and E. coli genome (713)
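The accuracy figures in step 1 can be restated on the Phred quality scale, Q = -10 log10(P_error). The slide does not say that Phred scores were used, so this conversion is an assumption added for illustration, and the function name is hypothetical.

```python
import math

def phred_quality(accuracy):
    """Convert a per-base accuracy (e.g. 0.995) to a Phred-style quality score."""
    error_probability = 1.0 - accuracy
    return -10.0 * math.log10(error_probability)

average_q = phred_quality(0.995)   # 1 error in 200 bases
target_q = phred_quality(0.9999)   # 1 error in 10,000 bases
print(round(average_q, 1), round(target_q, 1))
```

On this scale the average accuracy corresponds to roughly Q23 and the target to Q40.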
15 The Assembler compares the millions of fragments against each other, finding all common segments between two fragments that are at least 40 letters long. These overlaps could not have occurred by chance, and they become the foundation of assembly. Of these overlaps, some are "true" and some are "repeat-induced".
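A minimal sketch of the overlap test, assuming exact suffix-to-prefix matches only. A production assembler must also tolerate sequencing errors and use an index rather than brute force; the function name and the simulated data are hypothetical.

```python
import random

def overlap_length(a, b, min_len=40):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the longest such match is shorter than min_len.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

# Two simulated 100-letter fragments that share 50 letters of a random genome.
random.seed(1)
genome = "".join(random.choice("ACGT") for _ in range(150))
frag_a = genome[0:100]
frag_b = genome[50:150]
print(overlap_length(frag_a, frag_b))
```

With a 40-letter threshold, the short accidental matches that any two fragments may share are ignored.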
16 The assembler now searches for groups of overlapping fragments that (1) together spell a common sequence, and (2) do not overlap fragments with sequences that dispute, or contest, the common sequence. Such uncontested groups of fragments are assembled into what are called "unitigs". Each unitig contains on average about 30 fragments.
17 The assembler identifies incorrectly assembled unitigs that spell repeats by looking at the "depth" of the total number of fragments in the unitig. A statistic called the Discriminator is used to find stacks of fragments that are suspiciously high. Correctly assembled unitigs are called U-unitigs ("U" for unique), and all other unitigs are set aside.
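One way to picture a depth-based discriminator is a log-odds score under a simple Poisson model. If fragments land uniformly on the genome, a unique unitig of length L is expected to hold about L*F/G fragments (F fragments in total, genome size G), and a collapsed two-copy repeat about twice as many. The log of the ratio of the two Poisson probabilities for an observed count k is then L*F/G - k*ln(2). This is a sketch under those assumptions, not necessarily the exact statistic Celera used; the function name and the example numbers are illustrative.

```python
import math

def unique_log_odds(unitig_length, fragments_in_unitig, total_fragments, genome_size):
    """Natural-log odds that a unitig is single-copy rather than a two-copy repeat.

    Positive values favor a unique unitig; negative values mean the stack of
    fragments is deeper than a unique region would be expected to produce.
    """
    expected = unitig_length * total_fragments / genome_size
    return expected - fragments_in_unitig * math.log(2)

# Illustrative numbers: a 10,000-letter unitig, 27.27 million fragments, 2.9 Gbp genome.
normal_depth = unique_log_odds(10_000, 90, 27.27e6, 2.9e9)
suspicious_depth = unique_log_odds(10_000, 200, 27.27e6, 2.9e9)
print(normal_depth > 0, suspicious_depth > 0)
```

A unitig with a clearly positive score would be kept as a U-unitig; the others would be set aside.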
18 The Scaffolding stage begins. Critical to this stage is the fact that most of the fragments were grabbed from the genome in pairs during sequencing. Known as mate pairs, these fragments are always separated by the same number of letters, either about 1,000 or about 9,000. A contiguous sequence of ordered unitigs is a contig. During scaffolding, the assembler orients contigs using mates. Mate pairs stick together and remain the same distance apart. If mates from the same pair lie on different contigs, for instance, the contigs are likely to be neighbors about 99% of the time.
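The idea that mates on different contigs mark those contigs as neighbors can be sketched by counting mate links between contig pairs. The data structures and names here are hypothetical, and the sketch ignores orientation and distance, which a real scaffolder must also check.

```python
from collections import Counter

def contig_links(read_to_contig, mate_pairs):
    """Count mate pairs whose two reads were placed on different contigs."""
    links = Counter()
    for left, right in mate_pairs:
        a = read_to_contig.get(left)
        b = read_to_contig.get(right)
        # Skip unplaced reads and pairs that fall inside a single contig.
        if a is None or b is None or a == b:
            continue
        links[tuple(sorted((a, b)))] += 1
    return links

# Toy placement of eight reads on three contigs (illustrative data only).
read_to_contig = {
    "r1": "c1", "r2": "c2", "r3": "c1", "r4": "c2",
    "r5": "c2", "r6": "c3", "r7": "c1", "r8": "c1",
}
mate_pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6"), ("r7", "r8")]
links = contig_links(read_to_contig, mate_pairs)
print(dict(links))
```

Contig pairs supported by several independent mate pairs are the most trustworthy candidates for neighbors in a scaffold.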
19 As the assembler compares more and more mates, the contig geography becomes apparent. Sets of contigs that are ordered and oriented using enforcing pairs are called scaffolds. At this point, the scaffolding is continuous except for gaps. Some of these gaps are due to missing sequence; this is unavoidable. Other gaps contain repetitive sequence that can now be closed using the unitigs that were set aside earlier by the Discriminator.
20 The assembler classifies repeat sequences by size and reliability, calling the largest and most reliable repeats "rocks". Rocks are tossed into the gaps first, to be followed by the lesser "stones," and finally the smallest and least reliable pieces, "pebbles". Rocks must be linked to the contigs on either side of a gap by two or more mates.
21 Stones are linked to the contigs by only one mate. Their position in a gap is confirmed by overlaps. Pebbles are placed in a gap based on the quality of the overlaps between each other and the adjoining contigs.
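The rock, stone, and pebble ordering can be sketched with the mate-link rule alone. This is a simplification: the slides say the classification also depends on size and reliability, which are not modeled here, and the names and sample data are hypothetical.

```python
def classify_gap_filler(mate_links):
    """Classify a set-aside repeat unitig by how many mates link it to flanking contigs."""
    if mate_links >= 2:
        return "rock"    # linked by two or more mates
    if mate_links == 1:
        return "stone"   # one mate; position still needs overlap confirmation
    return "pebble"      # no mates; placed on overlap quality alone

def placement_order(unitigs):
    """Order (name, mate_links) tuples so rocks go first, then stones, then pebbles."""
    rank = {"rock": 0, "stone": 1, "pebble": 2}
    return sorted(unitigs, key=lambda item: rank[classify_gap_filler(item[1])])

candidates = [("u7", 0), ("u2", 3), ("u5", 1), ("u9", 2)]
ordered = placement_order(candidates)
print([name for name, _ in ordered])
```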
22 Article: CELERA GENOMICS COMPLETES THE FIRST ASSEMBLY OF THE HUMAN GENOME. ROCKVILLE, MD, June 26, 2000. Assembled Genome Has 3.12 Billion Base Pairs.
23 Celera's paired end-sequencing strategy has produced paired sequence reads that cover the human genome 35.6 times. The calculation to perform the assembly involved 500 million trillion base to base comparisons, requiring over 20,000 CPU hours on Celera's supercomputer. The method used by Celera has determined the genetic code of five individuals: three females and two males who have identified themselves as Hispanic, Asian, Caucasian, or African American.
24 Celera vs. Public
Celera: 27.27 million reads; average of 543 base pairs per read; 16 libraries from 5 donors. Assuming a genome of 2.9 Gbp, the coverage was 5.1 times in terms of sequence and 38.7 times in terms of clones.
Public: 4.44 Gbp of sequence (vector sequence at the ends, and junk); 2.6 Mbp phase 3; 61 Mbp phase 1; 16 Mbp phase 0; 20% finished; Total; % draft; 4.36 Gbp; % unique sequences.
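The Celera sequence coverage quoted on this slide follows from the read count, the average read length, and the assumed genome size. A short check of that arithmetic (the function name is hypothetical):

```python
def fold_coverage(n_reads, mean_read_length, genome_size):
    """Sequence coverage: total bases read divided by genome size."""
    return n_reads * mean_read_length / genome_size

# Figures from the slide: 27.27 million reads, 543 bp average, 2.9 Gbp genome.
celera_coverage = fold_coverage(27.27e6, 543, 2.9e9)
print(round(celera_coverage, 1))  # prints 5.1
```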
25 Phases of the Public Genome
Phase: Reads (the BAC is cut and covered 1X)
Phase: Read to read and BAC to BAC: (1) BAC contigs go to GenBank; (2) BACs ordered into larger files
Phase: BACs ordered and complete