©Silberschatz, Korth and Sudarshan, Database System Concepts. Chapter 13: Query Processing.

1 Chapter 13: Query Processing

2 Chapter 13: Query Processing
- Overview
- Measures of Query Cost
- Selection Operation
- Sorting
- Join Operation

3 Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation

4 Basic Steps in Query Processing (Cont.)
- Parsing and translation
  - Translate the query into its internal form, which is then translated into relational algebra.
  - The parser checks syntax and verifies relations.
- Evaluation
  - The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.
- Query optimization: among all equivalent evaluation plans, choose the one with the lowest cost.
  - Cost is estimated using statistical information from the database catalog, e.g. number of tuples in each relation, size of tuples, etc.

5 Basic Steps in Query Processing: Optimization
- A relational algebra expression may have many equivalent expressions.
  - E.g., $\sigma_{\text{balance} < 2500}(\Pi_{\text{balance}}(\text{account}))$ is equivalent to $\Pi_{\text{balance}}(\sigma_{\text{balance} < 2500}(\text{account}))$.
- Each relational algebra operation can be evaluated using one of several different algorithms.
  - Correspondingly, a relational-algebra expression can be evaluated in many ways.
- An annotated expression specifying a detailed evaluation strategy is called an evaluation plan.
  - E.g., can use an index on balance to find accounts with balance < 2500,
  - or can perform a complete relation scan and discard accounts with balance ≥ 2500.

6 Equivalent execution plans
  select d.customer-name
  from branch b, account a, depositor d
  where b.branch-name = a.branch-name
    and a.account-number = d.account-number
    and b.branch-city = 'Brooklyn'

7 Measures of Query Cost
- Cost is generally measured as the total elapsed time for answering a query.
  - Many factors contribute to time cost: disk accesses, CPU, or even network communication.
- Typically disk access is the predominant cost, and is also relatively easy to estimate. It is measured by taking into account:
  - number of seeks * average-seek-cost
  - number of blocks read * average-block-read-cost
  - number of blocks written * average-block-write-cost
- The cost to write a block is greater than the cost to read a block, because data is read back after being written to ensure that the write was successful (plus the cost of transactions!).
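The three terms of the disk cost measure can be added up directly. The sketch below is only an illustration: the function name and the per-operation costs (in microseconds) are assumptions chosen for the example, not figures from the slides.

```python
def disk_cost(seeks, blocks_read, blocks_written,
              seek_cost, read_cost, write_cost):
    """Estimated disk cost: one term per kind of disk operation."""
    return (seeks * seek_cost
            + blocks_read * read_cost
            + blocks_written * write_cost)

# Hypothetical costs in microseconds; writing is assumed to cost twice a read
# because the block is read back to verify the write.
SEEK_US, READ_US, WRITE_US = 4000, 100, 200

plan_cost = disk_cost(seeks=10, blocks_read=100, blocks_written=50,
                      seek_cost=SEEK_US, read_cost=READ_US, write_cost=WRITE_US)
print(plan_cost)
```

An optimizer would compute such an estimate for each candidate plan and keep the cheapest one.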

8 SORTING

9 Sorting
- We may build an index on the relation, and then use the index to read the relation in sorted order. This may lead to one disk block access for each tuple.
- For relations that fit in memory, techniques like quicksort can be used.
- For relations that don't fit in memory, external sort-merge is a good choice.

10 External Sort-Merge
Let M denote memory size (in pages).
1. Create sorted runs. Let i be 0 initially. Repeatedly do the following till the end of the relation:
   (a) Read M blocks of the relation into memory.
   (b) Sort the in-memory blocks.
   (c) Write the sorted data to run R_i; increment i.
   Let the final value of i be N.
2. Merge the runs (next slide).

11 External Sort-Merge (Cont.)
2. Merge the runs (N-way merge). We assume (for now) that N < M.
   1. Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the first block of each run into its buffer page.
   2. repeat
      1. Select the first record (in sort order) among all buffer pages.
      2. Write the record to the output buffer. If the output buffer is full, write it to disk.
      3. Delete the record from its input buffer page. If the buffer page becomes empty, then read the next block (if any) of the run into the buffer.
   3. until all input buffer pages are empty.

12 External Sort-Merge (Cont.)
- If N ≥ M, several merge passes are required.
  - In each pass, contiguous groups of M - 1 runs are merged.
  - A pass reduces the number of runs by a factor of M - 1, and creates runs longer by the same factor.
  - E.g. if M = 11 and there are 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runs.
  - Repeated passes are performed till all runs have been merged into one.
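The run-creation and multi-pass merge steps can be sketched in memory as follows. This is a simplified illustration, not the textbook's implementation: Python lists stand in for disk runs, `heapq.merge` stands in for the buffered (M - 1)-way merge, and the function name and the `block_size` parameter are assumptions.

```python
import heapq

def external_sort(records, block_size, M):
    """Sort as if only M blocks of `block_size` records fit in memory.

    Returns the sorted list and the number of merge passes performed.
    """
    assert M >= 3, "need at least 2 input buffers and 1 output buffer"
    # Step 1: create sorted runs of at most M blocks each.
    run_len = M * block_size
    runs = [sorted(records[i:i + run_len])
            for i in range(0, len(records), run_len)]
    # Step 2: merge contiguous groups of M - 1 runs until one run remains.
    passes = 0
    fan_in = M - 1
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1
    return (runs[0] if runs else []), passes

# 990 one-record blocks with M = 11 give 90 initial runs, as in the slide's
# example: the first pass leaves 9 runs, the second pass leaves 1.
data = [(i * 37) % 990 for i in range(990)]
sorted_data, merge_passes = external_sort(data, block_size=1, M=11)
print(merge_passes)
```

With these inputs the loop runs twice, matching the 90 to 9 to 1 reduction described above.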

13 Example: External Sorting Using Sort-Merge

14 JOIN

15 Nested-Loop (NL) Join
To compute the theta join $r \bowtie_\theta s$:
  for each tuple t_r in r do begin
    for each tuple t_s in s do begin
      test pair (t_r, t_s) to see if they satisfy the join condition θ
      if they do, add t_r · t_s to the result.
    end
  end
- r is called the outer relation and s the inner relation of the join.
- Requires no indices and can be used with any kind of join condition.
- Expensive, since it examines every pair of tuples in the two relations.
- The inner table should be the smaller one because of LRU (so that it always stays in memory).
- An index can be used to avoid scanning the inner table.
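The nested-loop pseudocode translates almost line for line into Python. In this sketch, tuples are plain Python tuples and the join condition is passed in as a function; the name `nested_loop_join` and the sample relations are assumptions made for the example.

```python
def nested_loop_join(r, s, theta):
    """Theta join of r (outer) and s (inner): tests every pair of tuples."""
    result = []
    for tr in r:                 # outer relation
        for ts in s:             # inner relation, rescanned for each tr
            if theta(tr, ts):
                result.append(tr + ts)   # concatenate the two tuples
    return result

r = [(1, "a"), (2, "b")]
s = [(2, "x"), (3, "y")]
# Works with any join condition, here a non-equality one: r.key < s.key
pairs = nested_loop_join(r, s, lambda tr, ts: tr[0] < ts[0])
print(pairs)
```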

16 Block Nested-Loop Join
Variant of nested-loop join in which every block of the inner relation is paired with every block of the outer relation.
  for each block B_r of r do begin
    for each block B_s of s do begin
      for each tuple t_r in B_r do begin
        for each tuple t_s in B_s do begin
          check if (t_r, t_s) satisfy the join condition
          if they do, add t_r · t_s to the result.
        end
      end
    end
  end
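A sketch of the block variant, assuming relations are lists that are cut into fixed-size blocks by a hypothetical helper `blocks`. The counter of inner-block reads is added only to show why pairing blocks rather than tuples reduces inner scans; it is not part of the algorithm on the slide.

```python
def blocks(relation, block_size):
    """Yield consecutive blocks (slices) of a relation."""
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]

def block_nested_loop_join(r, s, theta, block_size):
    """Join r and s block by block; also count reads of inner blocks."""
    result = []
    inner_block_reads = 0
    for br in blocks(r, block_size):
        for bs in blocks(s, block_size):
            inner_block_reads += 1      # one read of an s block per r block
            for tr in br:
                for ts in bs:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, inner_block_reads

r = [(i,) for i in range(4)]   # 2 blocks of 2 tuples
s = [(i,) for i in range(6)]   # 3 blocks of 2 tuples
matches, reads = block_nested_loop_join(
    r, s, lambda tr, ts: tr[0] == ts[0], block_size=2)
print(matches, reads)
```

Here the inner relation's 3 blocks are read once per outer block (2 times), instead of once per outer tuple (4 times) as in the plain nested-loop join.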

17 Merge-Join
1. Sort both relations on their join attribute (if not already sorted on the join attributes).
2. Merge the sorted relations to join them.
   1. The join step is similar to the merge stage of the sort-merge algorithm.
   2. The main difference is the handling of duplicate values in the join attribute: every pair with the same value on the join attribute must be matched.
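The two steps, including the duplicate handling, can be sketched as below for an equi-join. The function name, the key-extractor parameters, and the sample data are assumptions for illustration; everything is held in memory rather than read block by block.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join by sorting both inputs and merging them."""
    r = sorted(r, key=key_r)     # step 1: sort both relations
    s = sorted(s, key=key_s)
    result = []
    i = j = 0
    while i < len(r) and j < len(s):   # step 2: merge
        kr, ks = key_r(r[i]), key_s(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Duplicates: find the whole group of s tuples with this key...
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == kr:
                j_end += 1
            # ...and pair it with every r tuple that has the same key.
            while i < len(r) and key_r(r[i]) == kr:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

r = [(4, "d"), (2, "b"), (1, "a"), (2, "c")]
s = [(2, "x"), (3, "z"), (2, "y"), (4, "w")]
joined = merge_join(r, s, key_r=lambda t: t[0], key_s=lambda t: t[0])
print(joined)
```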

18 Hash-Join Algorithm
The hash-join of r and s is computed as follows.
1. Partition the relation s using hashing function h. When partitioning a relation, one block of memory is reserved as the output buffer for each partition.
2. Partition r similarly.
3. For each i:
   (a) Load s_i into memory and build an in-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier one, h.
   (b) Read the tuples in r_i from the disk one by one. For each tuple t_r, locate each matching tuple t_s in s_i using the in-memory hash index. Output the concatenation of their attributes.
Relation s is called the build input and r is called the probe input.

19 Hash-Join (Cont.)
- r tuples in r_i need only be compared with s tuples in s_i. They need not be compared with s tuples in any other partition, since:
  - an r tuple and an s tuple that satisfy the join condition have the same value for the join attributes;
  - if that value is hashed to some value i, the r tuple has to be in r_i and the s tuple in s_i.
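A compact in-memory sketch of the partition, build, and probe phases. The names `hash_join` and `n_partitions` are assumptions, `hash(k) % n_partitions` plays the role of h, and a Python dict stands in for the in-memory hash index (a simplification: the slide asks for a hash function different from h, which the dict only approximates through its own internal bucketing).

```python
from collections import defaultdict

def hash_join(r, s, key_r, key_s, n_partitions=4):
    """Equi-join of probe input r and build input s via hash partitioning."""
    h = lambda k: hash(k) % n_partitions
    # Steps 1 and 2: partition s and r with the same function h.
    s_parts, r_parts = defaultdict(list), defaultdict(list)
    for ts in s:
        s_parts[h(key_s(ts))].append(ts)
    for tr in r:
        r_parts[h(key_r(tr))].append(tr)
    result = []
    for i in range(n_partitions):
        # Step 3(a): build an in-memory index on partition s_i.
        index = defaultdict(list)
        for ts in s_parts[i]:
            index[key_s(ts)].append(ts)
        # Step 3(b): probe with each tuple of r_i; other partitions are skipped.
        for tr in r_parts[i]:
            for ts in index.get(key_r(tr), []):
                result.append(tr + ts)
    return result

r = [(1, "a"), (2, "b"), (2, "c"), (5, "e")]
s = [(2, "x"), (5, "y"), (7, "z")]
joined = sorted(hash_join(r, s, lambda t: t[0], lambda t: t[0]))
print(joined)
```

The output is sorted only to make it independent of the partition order.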

20 Hash-Join (Cont.)

21 Indexes: review
What an index is:
- An auxiliary structure for accessing information.
- An index defined on a key K gives fast access to the records containing the key K.
- An index has 2 components:
  - one that locates the key K within the index;
  - one that locates the data of the records that contain the key K.
- The key may be composed of several attributes.
- Several indexes may exist on the same table/attributes.
- Changing, removing, or inserting the value of K for any record in the table requires reordering the index structure.

22 Indexes - 1
[Figure: index file with index entries (define the key search algorithm) and data entries (contain the disk address of the data blocks with a given key); tables with data records]
Index entries: the key search algorithm can be:
- Sequential
- Hash table
- B-tree

23 Indexes - 2
[Figure: index file with index entries (define the key search algorithm) and data entries (contain the disk address of the data blocks); tables with data records]
Data entries contain one of 3 alternatives:
- The data record itself, with key k

24 Indexes - 3
[Figure: index file with index entries (define the key search algorithm) and data entries (contain the disk address of the data blocks); tables with data records]
Data records:
- Contain the actual data.
- May:
  - not be physically ordered by the key k (as in the figure above);
  - be physically ordered by the key k (as in the figure on the next slide).

25 Indexes - 4
[Figure: index file with index entries (define the key search algorithm) and data entries (contain the disk address of the data blocks); tables with data records]
Data records:
- When the records are physically ordered, the table/index is said to be "clustered".

26 Types of Queries
1. Point Query
   SELECT balance FROM accounts WHERE number = 1023;
2. Multipoint Query
   SELECT balance FROM accounts WHERE branchnum = 100;
3. Range Query
   SELECT number FROM accounts WHERE balance > 10000;
4. Prefix Match Query
   SELECT * FROM employees WHERE name = 'Jensen' and firstname = 'Carl' and age < 30;

27 Types of Queries
5. Function Query
   SELECT * FROM accounts WHERE balance = (SELECT max(balance) FROM accounts);
6. Ordering Query
   SELECT * FROM accounts ORDER BY balance;
7. Grouping Query
   SELECT branchnum, avg(balance) FROM accounts GROUP BY branchnum;
8. Join Query
   SELECT distinct branch.adresse FROM accounts, branch WHERE accounts.branchnum = branch.number and accounts.balance > 10000;

28 Heuristics for creating indexes (1)
- When the queries are simple.
  - Otherwise the indexes are not used! For example, if an attribute appears as the argument of a function, indexes are normally not used.
- When the results represent a small proportion of the whole table (< 20%)?
- When the distinct values of a column represent a small proportion of all the values in the column (< 5%)?
- If there are conditions on multiple attributes:
  - Composite keys must be declared in the same order in which they are used.
  - Composite keys can be used to include the answer to the query in the data entries themselves.

29 Heuristics for creating indexes (2)
- When the key is used in conditions in WHERE clauses.
  - Range conditions suggest sequential or B-tree indexes.
  - Equality conditions suggest hashed indexes.
- When the table is large (the number of data records is large).
- Only if there are not many other indexes on the same table.

30 Heuristics for creating indexes (3)
- An index on the key K:
  - speeds up reads of records identified by k;
  - speeds up updates of attributes (that are not part of K) of records identified by k;
  - slows down insertions and deletions of records;
  - slows down changes to the attributes that belong to the key K.
- It is therefore only worthwhile when the number of accesses in the first two scenarios is considerably higher than the number of accesses in the last two.
- The default rule of thumb is 1 to 5!

31 Heuristics for creating "clustered" indexes
- Clustered indexes:
  - require physically reordering the data whenever new records are inserted into the table;
  - optimize range (interval) searches.
- So they are only worthwhile when:
  - there are important queries with range conditions in their WHERE clauses;
  - insertions/modifications of the key are infrequent (or else the fill factor is very low).

32 Heuristics for creating "bitmap" indexes
- Bitmap indexes:
  - require creating one array of bits for each discrete value that the column takes.
- So they are only worthwhile when:
  - the distinct values of a column are a small percentage of the total values in the column (< 0.1%);
  - insertions/modifications are infrequent.
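To make "one array of bits per distinct value" concrete, here is a small sketch in which each bitmap is a Python integer whose bit i is set when row i holds that value. The function names and the sample columns are assumptions for illustration only.

```python
def build_bitmap_index(column):
    """Map each distinct value to an integer bitmap (bit i set = row i matches)."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps

def rows_of(bitmap):
    """Row numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

gender = ["F", "M", "F", "F", "M"]     # low-cardinality columns
region = ["N", "N", "S", "N", "S"]
by_gender = build_bitmap_index(gender)
by_region = build_bitmap_index(region)

# A conjunction of two equality conditions is a bitwise AND of two bitmaps.
female_north = rows_of(by_gender["F"] & by_region["N"])
print(female_north)
```

The index needs one bitmap per distinct value and every insertion touches a bitmap, which is why the heuristics above restrict it to columns with few distinct values and few updates.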

33 Clustering several tables together
- It is possible to define a clustered index over several tables simultaneously.
- Example 1: Pessoa(BI, Nome, Morada), Carro(Mat, marca, BI). A clustered index on BI over Pessoa and Carro would be stored as follows:
  BI=100
    Joao, Lisboa
    Bx-23-45, Fiat
    00-23-IP, Rover
  BI=110
    Pedro, Sintra
    XX-23-45, Fiat
    00-23-II, Opel
    33-23-II, Opel
    00-23-II, Opel
  BI=200
    Pedro, Sintra
  BI=201
    Rui, Almada
- Source tables:
  Pessoa:
    100, Joao, Lisboa
    110, Pedro, Sintra
    200, Pedro, Sintra
    201, Rui, Almada
  Carro:
    Bx-23-45, Fiat, 100
    00-23-IP, Rover, 100
    XX-23-45, Fiat, 110
    00-23-II, Opel, 110
    33-23-II, Opel, 110
    00-23-II, Opel, 110

34 Indexes with Composite Search Keys
- Composite search keys: search on a combination of fields.
  - Equality query: every field value is equal to a constant value. E.g. with respect to an index on age and sal: age=20 and sal=75.
  - Range query: some field value is not a constant. E.g.: age=20; or age=20 and sal > 10.
- Data entries in the index are sorted by search key to support range queries.
  - Lexicographic order, or
  - Spatial order.

Examples of composite key indexes using lexicographic order.

Data records sorted by name:
  name  age  sal
  bob   12   10
  cal   11   80
  joe   12   20
  sue   13   75

Data entries in index sorted by (age, sal): 11,80  12,10  12,20  13,75
Data entries in index sorted by (sal, age): 10,12  20,12  75,13  80,11
Data entries sorted by age: 11  12  12  13
Data entries sorted by sal: 10  20  75  80
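Lexicographic ordering of composite keys can be illustrated with sorted Python tuples and binary search. This is a sketch under assumptions: the sample (name, age, sal) records are illustrative, the sorted list stands in for the index's data entries, and the helper names are made up for the example.

```python
from bisect import bisect_left, bisect_right

# Illustrative records: (name, age, sal)
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

# Data entries of a composite index on (age, sal), in lexicographic order.
age_sal = sorted((age, sal) for _name, age, sal in records)

def equality_lookup(entries, age, sal):
    """Equality query: both fields are constants (age = ? AND sal = ?)."""
    i = bisect_left(entries, (age, sal))
    return i < len(entries) and entries[i] == (age, sal)

def age_eq_sal_gt(entries, age, sal):
    """Range query: age = ? AND sal > ?, one contiguous slice of the index."""
    lo = bisect_right(entries, (age, sal))
    hi = bisect_right(entries, (age, float("inf")))
    return entries[lo:hi]

print(age_sal)
print(equality_lookup(age_sal, 12, 20))
print(age_eq_sal_gt(age_sal, 12, 10))
```

Because entries with equal age are adjacent and ordered by sal, the range query touches a single contiguous slice. With the key order reversed to (sal, age), entries with the same age are scattered, so the same condition would require scanning the index.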

35 Multi-Attribute Index Keys
- To retrieve Emp records with age=30 AND sal=4000, an index on <age,sal> would be better than an index on age or an index on sal.
  - Such indexes are also called composite or concatenated indexes.
  - The choice of index key is orthogonal to clustering etc.
- If the condition is 20<age<30 AND 3000<sal<5000:
  - A clustered tree index on <age,sal> or <sal,age> is best.
- If the condition is age=30 AND 3000<sal<5000:
  - A clustered <age,sal> index is much better than a <sal,age> index!
- Composite indexes are larger and are updated more often.

36 Example 1
  SELECT E.ename, D.mgr
  FROM Emp E, Dept D
  WHERE D.dname='Toy' AND E.dno=D.dno
- A hash index on D.dname supports the 'Toy' selection.
  - Given this, an index on D.dno is not needed.
- A hash index on E.dno allows us to get matching (inner) Emp tuples for each selected (outer) Dept tuple.
- What if WHERE included "... AND E.age=25"?
  - Could retrieve Emp tuples using an index on E.age, then join with Dept tuples satisfying the dname selection. Comparable to the strategy that used the E.dno index.
  - So, if an E.age index is already created, this query provides much less motivation for adding an E.dno index.

37 Example 2
  SELECT E.ename, D.mgr
  FROM Emp E, Dept D
  WHERE E.sal BETWEEN 10000 AND 20000
    AND E.hobby='Stamps' AND E.dno=D.dno
- Clearly, Emp should be the outer relation.
  - Suggests that we build a hash index on D.dno.
- What index should we build on Emp?
  - A B+ tree on E.sal could be used, OR an index on E.hobby could be used. Only one of these is needed, and which is better depends upon the selectivity of the conditions.
  - As a rule of thumb, equality selections are more selective than range selections.
- As both examples indicate, our choice of indexes is guided by the plan(s) that we expect an optimizer to consider for a query. Have to understand optimizers!

38 Examples of Clustering
  SELECT E.dno FROM Emp E WHERE E.age>40
- A B+ tree index on E.age can be used to get qualifying tuples.
  - How selective is the condition?
  - Is the index clustered?

  SELECT E.dno, COUNT (*) FROM Emp E WHERE E.age>10 GROUP BY E.dno
- Consider the GROUP BY query.
  - If many tuples have E.age > 10, using the E.age index and sorting the retrieved tuples may be costly.
  - A clustered E.dno index may be better!

  SELECT E.dno FROM Emp E WHERE E.hobby='Stamps'
- Equality queries and duplicates:
  - Clustering on E.hobby helps!

39 Clustering and Joins
  SELECT E.ename, D.mgr
  FROM Emp E, Dept D
  WHERE D.dname='Toy' AND E.dno=D.dno
- Clustering is especially important when accessing inner tuples.
  - Should make the index on E.dno clustered.
- Suppose that the WHERE clause is instead: WHERE E.hobby='Stamps' AND E.dno=D.dno
  - If many employees collect stamps, Sort-Merge join may be worth considering. A clustered index on D.dno would help.
- Summary: clustering is useful whenever many tuples are to be retrieved.

40 Index-Only Plans
A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.
  SELECT D.mgr FROM Dept D, Emp E WHERE D.dno=E.dno
  SELECT D.mgr, E.eid FROM Dept D, Emp E WHERE D.dno=E.dno
  SELECT E.dno, COUNT (*) FROM Emp E GROUP BY E.dno
  SELECT E.dno, MIN (E.sal) FROM Emp E GROUP BY E.dno
  SELECT AVG (E.sal) FROM Emp E WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000
[Slide annotations beside the queries: "Tree index!", "Tree index!", "... or ... Tree!"]

