A apresentação está carregando. Por favor, espere

A apresentação está carregando. Por favor, espere

Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed 29-06-10Frontiers of GPU Computing 20101.

Apresentações semelhantes


Apresentação em tema: "Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed 29-06-10Frontiers of GPU Computing 20101."— Transcrição da apresentação:

1 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed 29-06-10Frontiers of GPU Computing 20101 Efficient Independent Component Analysis on a GPU Rui Ramalho, Pedro Tomás, Leonel Sousa 1

2 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed 29-06-10Frontiers of GPU Computing 20102 Outline Motivation Independent Component Analysis FastICA Algorithm Experimental Results Conclusions 2

3 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Blind Source Separation Blind Source Separation (BSS) is a signal processing technique that separates a set of signals (sources) from a set of mixed signals. Little is known about the original signals or the mixing process, only that the original signals are uncorrelated. 29-06-10Frontiers of GPU Computing 201033 mixsep

4 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Blind Source Separation: Cocktail Party A classical example of blind source separation is the cocktail party problem. –A number of people are talking simultaneously in a crowded room (at cocktail party). –Despite all the noise and cross talking, a human brain has little difficulty following a conversation. –Machines have to rely on blind source separation. 29-06-10Frontiers of GPU Computing 201044

5 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Blind Source Separation: Applications Blind Source Separation has also been used in several other domains: –EEG/MEG measurements (each sensor picks up a mixture of brain electrical activity and BSS can be used to separate and identify them). –Denoising images (by treating the noise as an independent source it is possible to separate it from the images original components). –Financial analysis (BSS can be used to uncover hidden factors in financial data). 29-06-10Frontiers of GPU Computing 20105

6 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Independent Component Analysis Independent Component Analysis (ICA) is a special case of Blind Source Separation. The mixed signals sources are assumed to be statistically independent (BSS only assumes the sources are statistically uncorrelated). 29-06-10Frontiers of GPU Computing 20106

7 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Independent Component Analysis: ICA Model Under the ICA model, the observed variables are assumed to be a linear combination of several independent sources/signals. The objective of ICA is to find the matrix W that inverts the mixing operation performed by the matrix A, without knowledge of A or s. 29-06-10Frontiers of GPU Computing 20107

8 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Independent Component Analysis: Measuring Statistical Independence One of the ways of measuring statistical independence is through negentropy: –H(y) is the differential information entropy of y: In practice J(y) needs to be estimated. The estimator used by FastICA is: –G is a nonquadratic nonlinear function – is a Gaussian variable of zero mean and unit variance 29-06-10Frontiers of GPU Computing 20108

9 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed FastICA Algorithm The procedure for computing the independent components can be divided in 3 stages: –pre-processing Allows a number of simplifications on the FastICA algorithm. –weight vector computation The FastICA algorithm itself. –decorrelation Prevents the algorithm from converging to the same solutions. 29-06-10Frontiers of GPU Computing 20109

10 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed FastICA Algorithm: Preprocessing & Weight Vector Computation Preprocessing includes general tasks such as centering, whitening or filtering the data. The computation of each of the weight vectors is done by: –g is the derivative of the nonlinear contrast function J –This algorithm can be modified to compute all the ICs simultaneously (a symmetric approach). 29-06-10Frontiers of GPU Computing 201010

11 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed FastICA Algorithm: GPU Implementation The preprocessing stage is generally inexpensive and was implemented on the CPU. The FastICA algorithm is composed mostly of matrix operations that can be efficiently implemented using CUBLAS. –The computation of the non-linear function g and g have no dependencies. –The expected value is computed using hierarchical additions, storing the intermediate results in the GPUs shared memory. 29-06-10Frontiers of GPU Computing 201011

12 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed FastICA Algorithm: Decorrelation To keep the estimated weight vectors from converging to the same results, they need to be decorrelated: –After estimating p independent components, subtract the projections of the previous p components from the p+1 estimate: –An alternative is to apply a symmetric decorrelation after every iteration: 29-06-10Frontiers of GPU Computing 201012

13 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Decorrelation: The Tricky Bit The computation of (WW T ) 1/2 is complex and can be done using the eigenvalues of (WW T ). –This can be done using the already available CPU-based high performance libraries (LAPACK). –Alternatively, the eigenvalues can be computed directly on the GPU 29-06-10Frontiers of GPU Computing 201013

14 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Jacobi Eigenvalue Algorithm The Jacobi Eigenvalue Algorithm successively uses Jacobi rotations to annihilate the off-diagonal elements of a given matrix A. A Jacobi rotation is given by: –J is a Jacobi rotation matrix –c = cos( ) –s = sin( ) 29-06-10Frontiers of GPU Computing 201014

15 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Jacobi Eigenvalue Algorithm Each Jacobi rotation only changes two columns and two rows of the matrix A. By carefully choosing the order of the rotations, up to N/2 rotations can be done simultaneously. The matrix J is a very sparse matrix, making CUBLAS unsuitable for this algorithm. 29-06-10Frontiers of GPU Computing 201015

16 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Decorrelation: Iterative Algorithm Another alternative to the eigenvalue problem is to avoid its computation altogether. Algorithm 4 converges to the decorrelation expression presented earlier. 29-06-10Frontiers of GPU Computing 201016

17 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Decorrelation: Comparison of Decorrelation Algorithms Experimental results show that the proposed GPU-based Jacobi eigenvalue algorithm is outperformed by a CPU based LAPACK eigenvalue algorithm using multiple relatively robust representations (MRRR). However, avoiding the explicit computation of the eigenvalues is still the fastest process. 29-06-1017Frontiers of GPU Computing 2010

18 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Experimental Results: Experimental Setup Experimental Setup –A hyperbolic tangent was chosen as a typical non-linear function g –The iterative decorrelation algorithm that avoids the explicit computation of the eigenvalues is used in the decorrelation step. 29-06-10Frontiers of GPU Computing 201018 CPUGPU AMD Opteron 170NVidea GeForce 8800 GTX Number of cores2128 Clock Frequency2 GHz1.35 GHz Main Memory2 GB768 MB

19 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Experimental Results: Single Core CPU Vs GPU The accelerated portion of the algorithm (loop) is spedup up to 110x, for estimating 256 ICs with 10 000 samples. As the accelerated portion gets faster, so grows the influence of the unaccelerated part of the algorithm (the preprocessing stage). This noticeably reduces the global speedup. 29-06-10Frontiers of GPU Computing 201019

20 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Experimental Results: Single Core CPU Vs GPU The accelerated loop component ceases to be the bottleneck. The additional penalty of transferring data to and from the GPU is negligible. 29-06-10Frontiers of GPU Computing 201020

21 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Experimental Results: Multicore CPU Vs GPU The parallelized GPU algorithm was also tested on a more powerful Geforce GTX 285, with 240 cores. This implementation was compared with a CPU based implementation on an Intel Core 2 Quad Q9950 (@2.83GHz) using Intels high performance MKL library. It was possible to attain a speedup of around 12x 29-06-10Frontiers of GPU Computing 201021

22 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed Conclusions By using a GPU it was possible to speedup the FastICA algorithm by 55x for estimating 256 ICs with 1000 samples each, in comparison with a serial version running on a single core of a CPU. These results can be further improved as the current bottleneck lies in the preprocessing stage, which is still done on the CPU. 29-06-10Frontiers of GPU Computing 201022

23 01-06-08IV Jornadas sobre Sistemas Reconfiguráveis - REC'200823 technology from seed 29-06-1023Frontiers of GPU Computing 2010


Carregar ppt "Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed 29-06-10Frontiers of GPU Computing 20101."

Apresentações semelhantes


Anúncios Google