Computational Finite Element Methods In Nanotechnology Pdf

16.08.2019



General-purpose computing on graphics processing units (GPGPU, rarely GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU).^[1]^[2]^[3]^[4] The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.^[5] In addition, even a single GPU-CPU framework provides advantages that multiple CPUs on their own do not offer due to the specialization in each chip.^[6]

Essentially, a GPGPU pipeline is a kind of parallel processing between one or more GPUs and CPUs that analyzes data as if it were in image or other graphic form. While GPUs operate at lower frequencies, they typically have many times the number of cores. Thus, GPUs can process far more pictures and graphical data per second than a traditional CPU. Migrating data into graphical form and then using the GPU to scan and analyze it can create a large speedup.


GPGPU pipelines were developed at the beginning of the 21st century for graphics processing (e.g., for better shaders). These pipelines were found to fit scientific computing needs well, and have since been developed in this direction.


History

In principle, any arbitrary boolean function, including addition, multiplication, and other mathematical functions, can be built up from a functionally complete set of logic operators. In 1987, Conway's Game of Life became one of the first examples of general-purpose computing using an early stream processor called a blitter to invoke a special sequence of logical operations on bit vectors.^[7]

General-purpose computing on GPUs became more practical and popular after about 2001, with the advent of both programmable shaders and floating point support on graphics processors. Notably, problems involving matrices and/or vectors – especially two-, three-, or four-dimensional vectors – were easy to translate to a GPU, which acts with native speed and support on those types. The scientific computing community's experiments with the new hardware began with a matrix multiplication routine (2001); one of the first common scientific programs to run faster on GPUs than CPUs was an implementation of LU factorization (2005).^[8]


These early efforts to use GPUs as general-purpose processors required reformulating computational problems in terms of graphics primitives, as supported by the two major APIs for graphics processors, OpenGL and DirectX. This cumbersome translation was obviated by the advent of general-purpose programming languages and APIs such as Sh/RapidMind, Brook and Accelerator.^[9]^[10]

These were followed by Nvidia's CUDA, which allowed programmers to ignore the underlying graphical concepts in favor of more common high-performance computing concepts.^[8] Newer, hardware vendor-independent offerings include Microsoft's DirectCompute and Apple/Khronos Group's OpenCL.^[8] This means that modern GPGPU pipelines can leverage the speed of a GPU without requiring full and explicit conversion of the data to a graphical form.

Implementations

Any language that allows the code running on the CPU to poll a GPU shader for return values can be used to create a GPGPU framework.

As of 2016, OpenCL is the dominant open general-purpose GPU computing language, and is an open standard defined by the Khronos Group.^[11] OpenCL provides a cross-platform GPGPU platform that additionally supports data-parallel compute on CPUs. OpenCL is actively supported on Intel, AMD, Nvidia, and ARM platforms. The Khronos Group is also involved in the development of SYCL, which has two implementations: ComputeCPP, developed by Codeplay and currently supported only on Linux, and SYCL STL, hosted by the Khronos Group on GitHub, which can be compiled for any modern operating system.

The dominant proprietary framework is Nvidia CUDA.^[12] Nvidia launched CUDA in 2006 as a software development kit (SDK) and application programming interface (API) that allows algorithms to be coded in the C programming language for execution on GeForce 8 series and later GPUs.

Programming standards for parallel computing include OpenCL (vendor-independent), OpenACC, and OpenHMPP. Mark Harris, the founder of GPGPU.org, coined the term GPGPU.

The Xcelerit SDK,^[13] created by Xcelerit,^[14] is designed to accelerate large existing C++ or C# code-bases on GPUs with minimal effort. It provides a simplified programming model, automates parallelisation, manages devices and memory, and compiles to CUDA binaries. Additionally, multi-core CPUs and other accelerators can be targeted from the same source code.

OpenVIDIA was developed at the University of Toronto between 2003 and 2005,^[15] in collaboration with Nvidia.

Altimesh Hybridizer,^[16] created by Altimesh,^[17] compiles Common Intermediate Language to CUDA binaries. It supports generics and virtual functions.^[18] Debugging and profiling are integrated with Visual Studio and Nsight.^[19] It is available as a Visual Studio extension on the Visual Studio Marketplace.

Microsoft introduced the DirectCompute GPU computing API, released with the DirectX 11 API.

Alea GPU,^[20] created by QuantAlea,^[21] introduces native GPU computing capabilities for the Microsoft .NET languages F#^[22] and C#. Alea GPU also provides a simplified GPU programming model based on GPU parallel-for and parallel aggregate using delegates and automatic memory management.^[23]

MATLAB supports GPGPU acceleration using the Parallel Computing Toolbox and MATLAB Distributed Computing Server,^[24] and third-party packages like Jacket.

GPGPU processing is also used by physics engines to simulate Newtonian physics.^[25] Commercial implementations include Havok Physics, FX and PhysX, both of which are typically used for computer and video games.

Close to Metal, now called Stream, is AMD's GPGPU technology for ATI Radeon-based GPUs.

C++ Accelerated Massive Parallelism (C++ AMP) is a library that accelerates execution of C++ code by exploiting data-parallel hardware such as GPUs.

Caches

Historically, CPUs have used hardware-managed caches, but earlier GPUs provided only software-managed local memories. However, as GPUs are increasingly used for general-purpose applications, state-of-the-art GPUs are being designed with hardware-managed multi-level caches,^[29] which have helped GPUs move towards mainstream computing. For example, GeForce 200 series GT200 architecture GPUs did not feature an L2 cache, the Fermi GPU has a 768 KiB last-level cache, the Kepler GPU has a 1.5 MiB last-level cache,^[29]^[30] the Maxwell GPU has a 2 MiB last-level cache, and the Pascal GPU has a 4 MiB last-level cache.

Register file

GPUs have very large register files, which allow them to reduce context-switching latency. Register file size is also increasing across GPU generations; for example, the total register file sizes on Maxwell (GM200) and Pascal GPUs are 6 MiB and 14 MiB, respectively.^[31]^[32] By comparison, the register file on a CPU is small, typically tens or hundreds of kilobytes.^[31]

Energy efficiency

Several research projects have compared the energy efficiency of GPUs with that of CPUs and FPGAs.^[33]

Stream processing

GPUs are designed specifically for graphics and thus are very restrictive in operations and programming. Due to their design, GPUs are only effective for problems that can be solved using stream processing and the hardware can only be used in certain ways.

The following discussion referring to vertices, fragments and textures concerns mainly the legacy model of GPGPU programming, where graphics APIs (OpenGL or DirectX) were used to perform general-purpose computation. With the introduction of the CUDA (Nvidia, 2007) and OpenCL (vendor-independent, 2008) general-purpose computing APIs, in new GPGPU codes it is no longer necessary to map the computation to graphics primitives. The stream processing nature of GPUs remains valid regardless of the APIs used. (See e.g.,^[34])

GPUs can only process independent vertices and fragments, but can process many of them in parallel. This is especially effective when the programmer wants to process many vertices or fragments in the same way. In this sense, GPUs are stream processors – processors that can operate in parallel by running one kernel on many records in a stream at once.

A stream is simply a set of records that require similar computation. Streams provide data parallelism. Kernels are the functions that are applied to each element in the stream. On GPUs, vertices and fragments are the elements in streams, and vertex and fragment shaders are the kernels to be run on them. For each element, a kernel can only read from the input, perform operations on it, and write to the output. It is permissible to have multiple inputs and multiple outputs, but never a piece of memory that is both readable and writable.

Arithmetic intensity is defined as the number of operations performed per word of memory transferred. It is important for GPGPU applications to have high arithmetic intensity; otherwise, memory access latency will limit computational speedup.^[35]
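As a rough illustration of the definition above, the following Python sketch computes arithmetic intensity for two familiar workloads. The operation and traffic counts are idealized assumptions (each operand is read once, each result written once, no caching effects), and the function names are hypothetical.

```python
def arithmetic_intensity(operations: int, words_transferred: int) -> float:
    """Operations performed per word of memory transferred."""
    return operations / words_transferred


def saxpy_intensity(n: int) -> float:
    # y[i] = a * x[i] + y[i]: one multiply and one add per element,
    # reading x[i] and y[i] and writing y[i] (3 words per element).
    return arithmetic_intensity(2 * n, 3 * n)


def matmul_intensity(n: int) -> float:
    # Dense n x n product: n*n outputs, each needing n multiplies and n adds.
    # Idealized traffic: read both inputs once, write the output once.
    return arithmetic_intensity(2 * n**3, 3 * n**2)


if __name__ == "__main__":
    print("SAXPY, n = 1000:", saxpy_intensity(1000))
    print("Matrix multiply, n = 300:", matmul_intensity(300))
```

Under these assumptions the vector update stays below one operation per word regardless of size, while the intensity of the matrix product grows with n, which is one reason dense linear algebra tends to map well onto GPUs.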

Ideal GPGPU applications have large data sets, high parallelism, and minimal dependency between data elements.

GPU programming concepts

Computational resources

There are a variety of computational resources available on the GPU:

Programmable processors – vertex, primitive, fragment and mainly compute pipelines allow the programmer to run kernels on streams of data
Rasterizer – creates fragments and interpolates per-vertex constants such as texture coordinates and color
Texture unit – read-only memory interface
Framebuffer – write-only memory interface

In fact, a program can substitute a write-only texture for output instead of the framebuffer. This is done either through Render to Texture (RTT), Render-To-Backbuffer-Copy-To-Texture (RTBCTT), or the more recent stream-out.

Textures as stream

The most common form for a stream to take in GPGPU is a 2D grid because this fits naturally with the rendering model built into GPUs. Many computations naturally map into grids: matrix algebra, image processing, physically based simulation, and so on.

Since textures are used as memory, texture lookups are then used as memory reads. Certain operations can be done automatically by the GPU because of this.

Kernels

Compute kernels can be thought of as the body of loops. For example, a programmer operating on a grid on the CPU might write a nested loop that visits every cell of the grid and applies the same function to each element.

On the GPU, the programmer only specifies the body of the loop as the kernel and what data to loop over by invoking geometry processing.
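The loop-body view of a kernel can be sketched in Python as follows. This is a CPU-only illustration under stated assumptions: `do_some_hard_work` is a hypothetical per-element function, and `launch` is a hypothetical stand-in for a GPU dispatch, which on real hardware would process elements in parallel and in no particular order.

```python
def do_some_hard_work(value: float) -> float:
    # Hypothetical per-element computation; this is the "loop body".
    return value * value + 1.0


def transform_grid_cpu(grid):
    """CPU style: the programmer writes the loops and the body together."""
    out = [[0.0] * len(row) for row in grid]
    for x in range(len(grid)):
        for y in range(len(grid[x])):
            out[x][y] = do_some_hard_work(grid[x][y])
    return out


def launch(kernel, grid):
    """GPU style: only the kernel and the data to cover are specified.

    Here the iteration is simulated sequentially; the kernel must not
    depend on the order in which elements are visited.
    """
    return [[kernel(value) for value in row] for row in grid]


if __name__ == "__main__":
    grid = [[1.0, 2.0], [3.0, 4.0]]
    print(transform_grid_cpu(grid))
    print(launch(do_some_hard_work, grid))
```

Both calls produce the same grid; the difference is only in who owns the iteration.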

Flow control

In sequential code it is possible to control the flow of the program using if-then-else statements and various forms of loops. Such flow control structures have only recently been added to GPUs.^[36] On earlier GPUs, conditional writes could be performed using a properly crafted series of arithmetic/bit operations, but looping and conditional branching were not possible.

Recent GPUs allow branching, but usually with a performance penalty. Branching should generally be avoided in inner loops, whether in CPU or GPU code, and various methods, such as static branch resolution, pre-computation, predication, loop splitting,^[37] and Z-cull^[38] can be used to achieve branching when hardware support does not exist.
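Predication, one of the techniques listed above, can be illustrated with a small Python sketch. It is only an analogy for what a GPU does in hardware or what a shader author does by hand: both candidate results are computed and a 0/1 mask selects one, so no branch is taken. The function names are hypothetical, and the arithmetic form shown assumes finite numeric inputs.

```python
def clamp_branching(x, limit):
    """Reference version using an explicit branch."""
    if x > limit:
        return limit
    return x


def clamp_predicated(x, limit):
    """Branch-free version: the comparison becomes a 0/1 mask
    and the result is an arithmetic blend of both candidates."""
    mask = int(x > limit)  # 1 when the condition holds, otherwise 0
    return mask * limit + (1 - mask) * x


if __name__ == "__main__":
    for value in (-2, 3, 7):
        print(value, clamp_branching(value, 3), clamp_predicated(value, 3))
```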

GPU methods

Map

The map operation simply applies the given function (the kernel) to every element in the stream. A simple example is multiplying each value in the stream by a constant (increasing the brightness of an image). The map operation is simple to implement on the GPU. The programmer generates a fragment for each pixel on screen and applies a fragment program to each one. The result stream of the same size is stored in the output buffer.
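A minimal Python sketch of the map operation, using the brightness example from the paragraph above. The helper names, the gain of 1.5, and the clamp to 255 (assuming 8-bit pixel values) are illustrative assumptions.

```python
def map_stream(kernel, stream):
    """Apply the kernel independently to every element of the stream.

    The output stream has the same length as the input stream.
    """
    return [kernel(element) for element in stream]


def brighten(pixel, gain=1.5):
    # Scale one 8-bit pixel value and clamp it to the valid range.
    return min(255, int(pixel * gain))


if __name__ == "__main__":
    print(map_stream(brighten, [0, 100, 200]))
```

Because each output depends on exactly one input, every element could be processed by a separate GPU thread or fragment.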

Reduce

Some computations require calculating a smaller stream (possibly a stream of only 1 element) from a larger stream. This is called a reduction of the stream. Generally, a reduction can be performed in multiple steps. The results from the prior step are used as the input for the current step and the range over which the operation is applied is reduced until only one stream element remains.
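The multi-step reduction described above can be sketched in Python as repeated pairwise passes. This is a sequential simulation under the assumption that the operator is associative; within one pass every pair is independent and could be combined in parallel. The function name is hypothetical.

```python
def reduce_stream(op, stream):
    """Reduce a non-empty stream to one element in repeated passes.

    Each pass combines adjacent pairs, roughly halving the stream.
    Returns the final value and the number of passes taken.
    """
    current = list(stream)
    if not current:
        raise ValueError("cannot reduce an empty stream")
    passes = 0
    while len(current) > 1:
        current = [
            op(current[i], current[i + 1]) if i + 1 < len(current) else current[i]
            for i in range(0, len(current), 2)
        ]
        passes += 1
    return current[0], passes


if __name__ == "__main__":
    total, passes = reduce_stream(lambda a, b: a + b, [1, 2, 3, 4, 5])
    print(total, "in", passes, "passes")
```

For n elements the number of passes grows roughly as the base-2 logarithm of n, which is what makes this formulation attractive on parallel hardware.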

Stream filtering

Stream filtering is essentially a non-uniform reduction. Filtering involves removing items from the stream based on some criteria.
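One common way to implement stream filtering on parallel hardware is stream compaction: flag the elements to keep, take an exclusive prefix sum of the flags to obtain output positions, then scatter the kept elements. The text does not prescribe this approach; the Python sketch below is a sequential illustration of it, with a hypothetical function name.

```python
def filter_stream(predicate, stream):
    """Keep the elements that satisfy the predicate, preserving order."""
    items = list(stream)

    # Step 1 (map): one flag per element.
    flags = [1 if predicate(item) else 0 for item in items]

    # Step 2 (exclusive scan): output slot for each kept element.
    positions = []
    running = 0
    for flag in flags:
        positions.append(running)
        running += flag

    # Step 3 (scatter): write each kept element to its slot.
    out = [None] * running
    for item, flag, position in zip(items, flags, positions):
        if flag:
            out[position] = item
    return out


if __name__ == "__main__":
    print(filter_stream(lambda v: v % 2 == 0, [5, 2, 8, 3, 6]))
```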

Scan

The scan operation, also termed parallel prefix sum, takes in a vector (stream) of data elements and an (arbitrary) associative binary function '+' with an identity element 'i'. If the input is [a0, a1, a2, a3, ..], an exclusive scan produces the output [i, a0, a0 + a1, a0 + a1 + a2, ..], while an inclusive scan produces the output [a0, a0 + a1, a0 + a1 + a2, a0 + a1 + a2 + a3, ..] and does not require an identity to exist. While at first glance the operation may seem inherently serial, efficient parallel scan algorithms are possible and have been implemented on graphics processing units. The scan operation has uses in e.g., quicksort and sparse matrix-vector multiplication.^[34]^[39]^[40]^[41]
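The two scan variants defined above can be sketched in Python as follows, together with a step-doubling formulation that shows why the operation is not inherently serial. The doubling scheme shown is the one usually attributed to Hillis and Steele; it is named here as an illustrative choice rather than the method used by any particular GPU library, and all function names are hypothetical.

```python
def exclusive_scan(op, identity, values):
    """[i, a0, a0+a1, ...]: each output excludes its own input."""
    out = []
    accumulated = identity
    for value in values:
        out.append(accumulated)
        accumulated = op(accumulated, value)
    return out


def inclusive_scan(op, values):
    """[a0, a0+a1, ...]: no identity element is needed."""
    out = []
    for value in values:
        out.append(value if not out else op(out[-1], value))
    return out


def doubling_scan(op, values):
    """Inclusive scan in about log2(n) passes.

    In each pass, every element combines with the element `offset`
    positions to its left; all updates in a pass are independent.
    """
    current = list(values)
    offset = 1
    while offset < len(current):
        current = [
            current[i] if i < offset else op(current[i - offset], current[i])
            for i in range(len(current))
        ]
        offset *= 2
    return current


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    add = lambda a, b: a + b
    print(exclusive_scan(add, 0, data))
    print(inclusive_scan(add, data))
    print(doubling_scan(add, data))
```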

Scatter

The scatter operation is most naturally defined on the vertex processor. The vertex processor is able to adjust the position of the vertex, which allows the programmer to control where information is deposited on the grid. Other extensions are also possible, such as controlling how large an area the vertex affects.

The fragment processor cannot perform a direct scatter operation because the location of each fragment on the grid is fixed at the time of the fragment's creation and cannot be altered by the programmer. However, a logical scatter operation may sometimes be recast or implemented with another gather step. A scatter implementation would first emit both an output value and an output address. An immediately following gather operation uses address comparisons to see whether the output value maps to the current output slot.

In dedicated compute kernels, scatter can be performed by indexed writes.
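The two formulations above can be sketched in Python: a direct scatter using indexed writes, and the gather-based recast in which each output slot compares addresses to find its value. Both are sequential illustrations with hypothetical names. When two elements target the same slot, this sketch lets the later one win; real hardware generally gives no such ordering guarantee.

```python
def scatter(values, indices, size, fill=None):
    """Indexed writes: element k is written to out[indices[k]]."""
    out = [fill] * size
    for value, index in zip(values, indices):
        out[index] = value
    return out


def scatter_via_gather(values, indices, size, fill=None):
    """Recast scatter as a gather over (value, address) pairs.

    Each output slot scans the emitted addresses and keeps the
    value whose address matches its own position.
    """
    out = []
    for slot in range(size):
        chosen = fill
        for value, index in zip(values, indices):
            if index == slot:
                chosen = value
        out.append(chosen)
    return out


if __name__ == "__main__":
    print(scatter(["a", "b", "c"], [2, 0, 1], 3))
    print(scatter_via_gather(["a", "b", "c"], [2, 0, 1], 3))
```

The gather-based form does far more work per output slot, which is why it is only a fallback when direct indexed writes are unavailable.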

Gather

Gather is the reverse of scatter: after scatter reorders elements according to a map, gather can restore the original order of the elements using the same map. In dedicated compute kernels, gather may be performed by indexed reads. In other shaders, it is performed with texture lookups.
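A minimal Python sketch of gather as indexed reads, including a round trip that shows gather undoing a scatter performed with the same map. The helper names are hypothetical, and the map is assumed to be a permutation (every slot written exactly once).

```python
def scatter(values, indices):
    """Indexed writes: out[indices[k]] = values[k]."""
    out = [None] * len(values)
    for value, index in zip(values, indices):
        out[index] = value
    return out


def gather(source, indices):
    """Indexed reads: out[k] = source[indices[k]]."""
    return [source[index] for index in indices]


if __name__ == "__main__":
    values = ["a", "b", "c", "d"]
    permutation = [2, 0, 3, 1]
    scattered = scatter(values, permutation)
    print(scattered)
    print(gather(scattered, permutation))
```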

Sort

The sort operation transforms an unordered set of elements into an ordered set of elements. The most common GPU implementations use radix sort for integer and floating-point data, and coarse-grained merge sort and fine-grained sorting networks for general comparable data.^[42]^[43]
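To illustrate how a radix sort can be assembled from data-parallel building blocks, the Python sketch below sorts non-negative integers one bit at a time. Each pass is a stable split computed from a prefix count followed by a scatter. This is a simplified, sequential illustration with hypothetical names; production GPU sorts process several bits per pass and are considerably more elaborate.

```python
def split_by_bit(keys, bit):
    """Stable partition: keys with a 0 in `bit` first, then keys with a 1."""
    flags = [(key >> bit) & 1 for key in keys]

    # Exclusive prefix count of zero flags (a scan in a parallel setting).
    zeros_before = []
    zero_count = 0
    for flag in flags:
        zeros_before.append(zero_count)
        zero_count += 1 - flag

    # Scatter every key to its destination slot.
    out = [0] * len(keys)
    for i, (key, flag) in enumerate(zip(keys, flags)):
        if flag == 0:
            destination = zeros_before[i]
        else:
            ones_before = i - zeros_before[i]
            destination = zero_count + ones_before
        out[destination] = key
    return out


def radix_sort(keys, bits=32):
    """Least-significant-bit-first radix sort for integers in [0, 2**bits)."""
    out = list(keys)
    for bit in range(bits):
        out = split_by_bit(out, bit)
    return out


if __name__ == "__main__":
    print(radix_sort([5, 3, 8, 1, 3, 0], bits=4))
```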

Search

The search operation allows the programmer to find a given element within the stream, or possibly find neighbors of a specified element. The GPU is not used to speed up the search for an individual element, but instead is used to run multiple searches in parallel. The search method used is mostly binary search on sorted elements.
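The idea of running many independent binary searches over the same sorted data can be sketched in Python with the standard `bisect` module. The batch is processed sequentially here; on a GPU each query would typically be handled by its own thread. The function names and the convention of returning -1 for a missing element are assumptions for illustration.

```python
from bisect import bisect_left


def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if absent."""
    index = bisect_left(sorted_values, target)
    if index < len(sorted_values) and sorted_values[index] == target:
        return index
    return -1


def batch_search(sorted_values, targets):
    """Run one independent binary search per target."""
    return [binary_search(sorted_values, target) for target in targets]


if __name__ == "__main__":
    print(batch_search([1, 3, 5, 7, 9], [7, 2, 1, 10]))
```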

Data structures

A variety of data structures can be represented on the GPU:

Dense arrays
Sparse matrices (sparse array) – static or dynamic
Adaptive structures (union type)

Applications

The following are some of the areas where GPUs have been used for general purpose computing:

Automatic parallelization^[44]^[45]
Computer clusters or a variant of parallel computing (using GPU cluster technology) for highly calculation-intensive tasks:
- High-performance computing (HPC) clusters, often termed supercomputers
  - including cluster technologies like Message Passing Interface, and single-system image (SSI), distributed computing, and Beowulf
- Grid computing (a form of distributed computing) (networking many heterogeneous computers to create a virtual computer architecture)
- Load-balancing clusters, sometimes termed a server farm
Physically based simulation and physics engines^[25] (usually based on Newtonian physics models)
- Conway's Game of Life, cloth simulation, incompressible fluid flow by solution of Euler equations (fluid dynamics)^[46] or Navier–Stokes equations^[47]
Statistical physics
- Ising model^[48]
Lattice gauge theory
Segmentation – 2D and 3D^[49]
CT reconstruction^[50]
Fast Fourier transform^[51]
GPU learning – machine learning and data mining computations, e.g., with software BIDMach
k-nearest neighbor algorithm^[52]
Fuzzy logic^[53]
Audio signal processing^[54]
- Audio and sound effects processing, to use a GPU for digital signal processing (DSP)
Video processing^[55]
- Hardware accelerated video decoding and post-processing
  - Motion compensation (mo comp)
  - Inverse discrete cosine transform (iDCT)
  - Variable-length decoding (VLD), Huffman coding
  - Inverse quantization (IQ; not to be confused with intelligence quotient)
  - In-loop deblocking
  - Bitstream processing (CAVLC/CABAC) using special purpose hardware for this task because this is a serial task not suitable for regular GPGPU computation
  - Deinterlacing
    - Spatial-temporal deinterlacing
  - Noise reduction
  - Edge enhancement
  - Color correction
- Hardware accelerated video encoding and pre-processing
Global illumination – ray tracing, photon mapping, radiosity among others, subsurface scattering
Geometric computing – constructive solid geometry, distance fields, collision detection, transparency computation, shadow generation
Scientific computing
- Monte Carlo simulation of light propagation^[56]
- Molecular modeling on GPU^[57]
- Quantum mechanical physics
- Astrophysics^[58]
Bioinformatics^[59]^[60]
Clinical decision support system (CDSS)^[61]
Computer vision^[62]
Digital signal processing / signal processing
Control engineering
Operations research^[63]^[64]^[65]
- Implementations of: the GPU Tabu Search algorithm solving the Resource Constrained Project Scheduling problem is freely available on GitHub;^[66] the GPU algorithm solving the Nurse Rerostering problem is freely available on GitHub.^[67]
Database operations^[68]^[69]^[70]
Computational Fluid Dynamics especially using Lattice Boltzmann methods
Cryptography^[71] and cryptanalysis
Performance modeling: computationally intensive tasks on GPU^[57]
- Implementations of: MD6, Advanced Encryption Standard (AES),^[72]^[73] Data Encryption Standard (DES), RSA,^[74] elliptic curve cryptography (ECC)
- Password cracking^[75]^[76]
- Cryptocurrency transactions processing ('mining') (Bitcoin mining)
Electronic design automation^[77]^[78]
Antivirus software^[79]^[80]
Intrusion detection^[81]^[82]
Increase computing power for distributed computing projects like SETI@home, Einstein@home

Bioinformatics

GPGPU usage in Bioinformatics:^[57]^[83]

Application	Description	Supported features	Expected speed-up†	GPU‡	Multi-GPU support	Release status
BarraCUDA	DNA, including epigenetics, sequence mapping software^[84]	Alignment of short sequencing reads	6–10x	T 2075, 2090, K10, K20, K20X	Yes	Available now, version 0.7.107f
CUDASW++	Open source software for Smith-Waterman protein database searches on GPUs	Parallel search of Smith-Waterman database	10–50x	T 2075, 2090, K10, K20, K20X	Yes	Available now, version 2.0.8
CUSHAW	Parallelized short read aligner	Parallel, accurate long read aligner – gapped alignments to large genomes	10x	T 2075, 2090, K10, K20, K20X	Yes	Available now, version 1.0.40
GPU-BLAST	Local search with fast k-tuple heuristic	Protein alignment according to blastp, multi CPU threads	3–4x	T 2075, 2090, K10, K20, K20X	Single only	Available now, version 2.2.26
GPU-HMMER	Parallelized local and global search with profile hidden Markov models	Parallel local and global search of hidden Markov models	60–100x	T 2075, 2090, K10, K20, K20X	Yes	Available now, version 2.3.2
mCUDA-MEME	Ultrafast scalable motif discovery algorithm based on MEME	Scalable motif discovery algorithm based on MEME	4–10x	T 2075, 2090, K10, K20, K20X	Yes	Available now, version 3.0.12
SeqNFind	A GPU accelerated sequence analysis toolset	Reference assembly, blast, Smith–Waterman, hmm, de novo assembly	400x	T 2075, 2090, K10, K20, K20X	Yes	Available now
UGENE	Opensource Smith–Waterman for SSE/CUDA, suffix array based repeats finder and dotplot	Fast short read alignment	6–8x	T 2075, 2090, K10, K20, K20X	Yes	Available now, version 1.11
WideLM	Fits numerous linear models to a fixed design and response	Parallel linear regression on multiple similarly-shaped models	150x	T 2075, 2090, K10, K20, K20X	Yes	Available now, version 0.1-1

Molecular dynamics

Application	Description	Supported features	Expected speed-up†	GPU‡	Multi-GPU support	Release status
Abalone	Models molecular dynamics of biopolymers for simulations of proteins, DNA and ligands	Explicit and implicit solvent, hybrid Monte Carlo	4–120x	T 2075, 2090, K10, K20, K20X	Single only	Available now, version 1.8.88
ACEMD	GPU simulation of molecular mechanics force fields, implicit and explicit solvent	Written for use on GPUs	160 ns/day GPU version only	T 2075, 2090, K10, K20, K20X	Yes	Available now
AMBER	Suite of programs to simulate molecular dynamics of biomolecules	PMEMD: explicit and implicit solvent	89.44 ns/day JAC NVE	T 2075, 2090, K10, K20, K20X	Yes	Available now, version 12 + bugfix9
DL-POLY	Simulate macromolecules, polymers, ionic systems, etc. on a distributed memory parallel computer	Two-body forces, link-cell pairs, Ewald SPME forces, Shake VV	4x	T 2075, 2090, K10, K20, K20X	Yes	Available now, version 4.0 source only
CHARMM	MD package to simulate molecular dynamics of biomolecules	Implicit (5x), explicit (2x) solvent via OpenMM	TBD	T 2075, 2090, K10, K20, K20X	Yes	In development Q4/12
GROMACS	Simulate biochemical molecules with complex bond interactions	Implicit (5x), explicit (2x) solvent	165 ns/Day DHFR	T 2075, 2090, K10, K20, K20X	Single only	Available now, version 4.6 in Q4/12
HOOMD-Blue	Particle dynamics package written from the ground up for GPUs	Written for GPUs	2x	T 2075, 2090, K10, K20, K20X	Yes	Available now
LAMMPS	Classical molecular dynamics package	Lennard-Jones, Morse, Buckingham, CHARMM, tabulated, coarse-grain SDK, anisotropic Gay-Berne, RE-squared, 'hybrid' combinations	3–18x	T 2075, 2090, K10, K20, K20X	Yes	Available now
NAMD	Designed for high-performance simulation of large molecular systems	100M atom capable	6.44 ns/days STMV 585x 2050s	T 2075, 2090, K10, K20, K20X	Yes	Available now, version 2.9
OpenMM	Library and application for molecular dynamics for HPC with GPUs	Implicit and explicit solvent, custom forces	Implicit: 127–213 ns/day; Explicit: 18–55 ns/day DHFR	T 2075, 2090, K10, K20, K20X	Yes	Available now, version 4.1.1

† Expected speedups are highly dependent on system configuration. GPU performance compared against multi-core x86 CPU socket. GPU performance benchmarked on GPU supported features and may be a kernel to kernel performance comparison. For details on configuration used, view application website. Speedups as per Nvidia in-house testing or ISV's documentation.

‡ Q=Quadro GPU, T=Tesla GPU. Nvidia recommended GPUs for this application. Check with developer or ISV to obtain certification information.

References

^Fung, et al., 'Mediated Reality Using Computer Graphics Hardware for Computer Vision', Proceedings of the International Symposium on Wearable Computing 2002 (ISWC2002), Seattle, Washington, USA, 7–10 October 2002, pp. 83–89. Archived 2 April 2012 at the Wayback Machine.
^An EyeTap video-based featureless projective motion estimation assisted by gyroscopic tracking for wearable computer mediated reality, ACM Personal and Ubiquitous Computing published by Springer Verlag, Vol.7, Iss. 3, 2003.
^'Computer Vision Signal Processing on Graphics Processing Units', Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004): Montreal, Quebec, Canada, 17–21 May 2004, pp. V-93 – V-96. Archived 19 August 2011 at the Wayback Machine.
^Chitty, D. M. (2007, July). A data parallel approach to genetic programming using programmable graphics hardware. In Proceedings of the 9th annual conference on Genetic and evolutionary computation (pp. 1566-1573). ACM. Archived 8 August 2017 at the Wayback Machine.
^'Using Multiple Graphics Cards as a General Purpose Parallel Computer: Applications to Computer Vision', Proceedings of the 17th International Conference on Pattern Recognition (ICPR2004), Cambridge, United Kingdom, 23–26 August 2004, volume 1, pages 805–808. Archived 18 July 2011 at the Wayback Machine.
^Mittal, S.; Vetter, J. (2015). 'A Survey of CPU-GPU Heterogeneous Computing Techniques'. ACM Computing Surveys. 47 (4): 1–35. doi:10.1145/2788396.
^Hull, Gerald (December 1987). 'LIFE'. Amazing Computing. 2 (12): 81–84.
^Du, Peng; Weber, Rick; Luszczek, Piotr; Tomov, Stanimire; Peterson, Gregory; Dongarra, Jack (2012). 'From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming'. Parallel Computing. 38 (8): 391–407. CiteSeerX 10.1.1.193.7712. doi:10.1016/j.parco.2011.10.002.
^Tarditi, David; Puri, Sidd; Oglesby, Jose (2006). 'Accelerator: using data parallelism to program GPUs for general-purpose uses' (PDF). ACM SIGARCH Computer Architecture News. 34 (5).
^Che, Shuai; Boyer, Michael; Meng, Jiayuan; Tarjan, D.; Sheaffer, Jeremy W.; Skadron, Kevin (2008). 'A performance study of general-purpose applications on graphics processors using CUDA'. J. Parallel and Distributed Computing. 68 (10): 1370–1380. CiteSeerX 10.1.1.143.4849. doi:10.1016/j.jpdc.2008.05.014.
^OpenCL at the Khronos Group. Archived 9 August 2011 at the Wayback Machine.
^'OpenCL Gains Ground on CUDA'. 28 February 2012. Archived from the original on 23 April 2012. Retrieved 10 April 2012. 'As the two major programming frameworks for GPU computing, OpenCL and CUDA have been competing for mindshare in the developer community for the past few years.'
^'Xcelerit SDK'. XceleritSDK. 26 October 2015. Archived from the original on 8 March 2018.
^'Home page'. Xcelerit. Archived from the original on 8 March 2018.
^James Fung, Steve Mann, Chris Aimone, 'OpenVIDIA: Parallel GPU Computer Vision', Proceedings of the ACM Multimedia 2005, Singapore, 6–11 November 2005, pages 849–852
^'Hybridizer'. Hybridizer. Archived from the original on 17 October 2017.
^'Home page'. Altimesh. Archived from the original on 17 October 2017.
^'Hybridizer generics and inheritance'. 27 July 2017. Archived from the original on 17 October 2017.
^'Debugging and Profiling with Hybridizer'. 5 June 2017. Archived from the original on 17 October 2017.
^'Introduction'. Alea GPU. Archived from the original on 25 December 2016. Retrieved 15 December 2016.
^'Home page'. Quant Alea. Archived from the original on 12 December 2016. Retrieved 15 December 2016.
^'Use F# for GPU Programming'. F# Software Foundation. Archived from the original on 18 December 2016. Retrieved 15 December 2016.
^'Alea GPU Features'. Quant Alea. Archived from the original on 21 December 2016. Retrieved 15 December 2016.
^'MATLAB Adds GPGPU Support'. 20 September 2010. Archived from the original on 27 September 2010.
^Joselli, Mark, et al. 'A new physics engine with automatic process distribution between CPU-GPU.' Proceedings of the 2008 ACM SIGGRAPH symposium on Video games. ACM, 2008.
^'Android 4.2 APIs - Android Developers'. developer.android.com. Archived from the original on 26 August 2013.
^Mapping computational concepts to GPUs: Mark Harris. Mapping computational concepts to GPUs. In ACM SIGGRAPH 2005 Courses (Los Angeles, California, 31 July – 4 August 2005). J. Fujii, Ed. SIGGRAPH '05. ACM Press, New York, NY, 50.
^Double precision on GPUs (Proceedings of ASIM 2005): Dominik Goddeke, Robert Strzodka, and Stefan Turek. Accelerating Double Precision (FEM) Simulations with (GPUs). Proceedings of ASIM 2005 – 18th Symposium on Simulation Technique, 2005. Archived 21 August 2014 at the Wayback Machine.
^'A Survey of Techniques for Managing and Leveraging Caches in GPUs', S. Mittal, JCSC, 23(8), 2014. Archived 16 February 2015 at the Wayback Machine.
^'Nvidia-Kepler-GK110-Architecture-Whitepaper' (PDF). Archived (PDF) from the original on 21 February 2015.
^'A Survey of Techniques for Architecting and Managing GPU Register File', IEEE TPDS, 2016. Archived 26 March 2016 at the Wayback Machine.
^'Inside Pascal: Nvidia’s Newest Computing Platform'. Archived 7 May 2017 at the Wayback Machine.
^'A Survey of Methods for Analyzing and Improving GPU Energy Efficiency', Mittal et al., ACM Computing Surveys, 2014. Archived 4 September 2015 at the Wayback Machine.
^'D. Göddeke, 2010. Fast and Accurate Finite-Element Multigrid Solvers for PDE Simulations on GPU Clusters. Ph.D. dissertation, Technischen Universität Dortmund'. Archived from the original on 16 December 2014.
^Asanovic, K.; Bodik, R.; Demmel, J.; Keaveny, T.; Keutzer, K.; Kubiatowicz, J.; Morgan, N.; Patterson, D.; Sen, K.; Wawrzynek, J.; Wessel, D.; Yelick, K. (2009). 'A view of the parallel computing landscape'. Commun. ACM. 52 (10): 56–67. doi:10.1145/1562764.1562783.
^'GPU Gems – Chapter 34, GPU Flow-Control Idioms'.
^Future Chips. 'Tutorial on removing branches', 2011
^GPGPU survey paper: John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, and Tim Purcell. 'A Survey of General-Purpose Computation on Graphics Hardware'. Computer Graphics Forum, volume 26, number 1, 2007, pp. 80–113. Archived 4 January 2007 at the Wayback Machine.
^'S. Sengupta, M. Harris, Y. Zhang, J. D. Owens, 2007. Scan primitives for GPU computing. In T. Aila and M. Segal (eds.): Graphics Hardware (2007)'. Archived from the original on 5 June 2015.
^Blelloch, G. E. (1989). 'Scans as primitive parallel operations' (PDF). IEEE Transactions on Computers. 38 (11): 1526–1538. doi:10.1109/12.42122. Archived (PDF) from the original on 23 September 2015.
^'M. Harris, S. Sengupta, J. D. Owens. Parallel Prefix Sum (Scan) with CUDA. In Nvidia: GPU Gems 3, Chapter 39'.
^Merrill, Duane. Allocation-oriented Algorithm Design with Application to GPU Computing. Ph.D. dissertation, Department of Computer Science, University of Virginia. Dec. 2011.
^Sean Baxter. Modern gpu, 2013. Archived 7 October 2016 at the Wayback Machine.
^Henriksen, Troels, Martin Elsman, and Cosmin E. Oancea. 'Size slicing: a hybrid approach to size inference in futhark.' Proceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing. ACM, 2014.
^Baskaran, Muthu Manikandan, et al. 'A compiler framework for optimization of affine loop nests for GPGPUs.' Proceedings of the 22nd annual international conference on Supercomputing. ACM, 2008.
^'K. Crane, I. Llamas, S. Tariq, 2008. Real-Time Simulation and Rendering of 3D Fluids. In Nvidia: GPU Gems 3, Chapter 30'.
^'M. Harris, 2004. Fast Fluid Dynamics Simulation on the GPU. In Nvidia: GPU Gems, Chapter 38'. Archived from the original on 7 October 2017.
^Block, Benjamin, Peter Virnau, and Tobias Preis. 'Multi-GPU accelerated multi-spin Monte Carlo simulations of the 2D Ising model.' Computer Physics Communications 181.9 (2010): 1549-1556.
^Sun, Shanhui, Christian Bauer, and Reinhard Beichel. 'Automated 3-D segmentation of lungs with lung cancer in CT data using a novel robust active shape model approach.' IEEE transactions on medical imaging 31.2 (2011): 449-460.
^Jimenez, Edward S., and Laurel J. Orr. 'Rethinking the union of computed tomography reconstruction and GPGPU computing.' Penetrating Radiation Systems and Applications XIV. Vol. 8854. International Society for Optics and Photonics, 2013.
^Sørensen, Thomas Sangild, et al. 'Accelerating the nonequispaced fast Fourier transform on commodity graphics hardware.' IEEE Transactions on Medical Imaging 27.4 (2008): 538-547.
^V. Garcia, E. Debreuve, and M. Barlaud. 'Fast k-nearest neighbor search using GPU'. In Proceedings of the CVPR Workshop on Computer Vision on GPU, Anchorage, Alaska, USA, June 2008.
^M. Cococcioni, R. Grasso, M. Rixen. 'Rapid prototyping of high performance fuzzy computing applications using high level GPU programming for maritime operations support'. In Proceedings of the 2011 IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), Paris, 11–15 April 2011.
^Whalen, Sean. 'Audio and the graphics processing unit.' Author report, University of California Davis 47 (2005): 51.
^Wilson, Ron (3 September 2009). 'DSP brings you a high-definition moon walk'. EDN. Archived from the original on 22 January 2013. Retrieved 3 September 2009. Lowry is reportedly using Nvidia Tesla GPUs (graphics-processing units) programmed in the company's CUDA (Compute Unified Device Architecture) to implement the algorithms. Nvidia claims that the GPUs are approximately two orders of magnitude faster than CPU computations, reducing the processing time to less than one minute per frame.
^Alerstam, E.; Svensson, T.; Andersson-Engels, S. (2008). 'Parallel computing with graphics processing units for high speed Monte Carlo simulation of photon migration' (PDF). Journal of Biomedical Optics. 13 (6): 060504. Bibcode:2008JBO..13f0504A. doi:10.1117/1.3041496. Archived (PDF) from the original on 9 August 2011.
^Hasan, Khondker S.; Chatterjee, Amlan; Radhakrishnan, Sridhar; Antonio, John K. 'Performance Prediction Model and Analysis for Compute-Intensive Tasks on GPUs'. The 11th IFIP International Conference on Network and Parallel Computing (NPC-2014), Ilan, Taiwan, Sept. 2014, Lecture Notes in Computer Science (LNCS), pp. 612–17, ISBN 978-3-662-44917-2.
^'Computational Physics with GPUs: Lund Observatory'. www.astro.lu.se. Archived from the original on 12 July 2010.
^Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh (2007). 'High-throughput sequence alignment using Graphics Processing Units'. BMC Bioinformatics. 8: 474. doi:10.1186/1471-2105-8-474. PMC 2222658. PMID 18070356.
^Svetlin A. Manavski; Giorgio Valle (2008). 'CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment'. BMC Bioinformatics. 9 (Suppl. 2): S10. doi:10.1186/1471-2105-9-s2-s10. PMC 2323659. PMID 18387198.
^Olejnik, M; Steuwer, M; Gorlatch, S; Heider, D (15 November 2014). 'gCUP: rapid GPU-based HIV-1 co-receptor usage prediction for next-generation sequencing'. Bioinformatics. 30 (22): 3272–3. doi:10.1093/bioinformatics/btu535. PMID 25123901.
^Wang, Guohui, et al. 'Accelerating computer vision algorithms using OpenCL framework on the mobile GPU-a case study.' 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013.
^Vincent Boyer, Didier El Baz. 'Recent Advances on GPU Computing in Operations Research'. Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International, pp. 1778–1787. Archived 13 January 2015 at the Wayback Machine.
^Bukata, Libor; Sucha, Premysl; Hanzalek, Zdenek (2014). 'Solving the Resource Constrained Project Scheduling Problem using the parallel Tabu Search designed for the CUDA platform'. Journal of Parallel and Distributed Computing. 77: 58–68. arXiv:1711.04556. doi:10.1016/j.jpdc.2014.11.005.
^Bäumelt, Zdeněk; Dvořák, Jan; Šůcha, Přemysl; Hanzálek, Zdeněk (2016). 'A Novel Approach for Nurse Rerostering based on a Parallel Algorithm'. European Journal of Operational Research. 251 (2): 624–639. doi:10.1016/j.ejor.2015.11.022.
^CTU-IIG. Czech Technical University in Prague, Industrial Informatics Group (2015). Archived 9 January 2016 at the Wayback Machine.
^NRRPGpu. Czech Technical University in Prague, Industrial Informatics Group (2015). Archived 9 January 2016 at the Wayback Machine.
^Naju Mancheril. 'GPU-based Sorting in PostgreSQL' (PDF). School of Computer Science – Carnegie Mellon University. Archived (PDF) from the original on 2 August 2011.
^SQream DB
^MapD
^Manavski, Svetlin A. 'CUDA compatible GPU as an efficient hardware accelerator for AES cryptography.' 2007 IEEE International Conference on Signal Processing and Communications. IEEE, 2007.
^Harrison, Owen; Waldron, John (2007). 'AES Encryption Implementation and Analysis on Commodity Graphics Processing Units'. Cryptographic Hardware and Embedded Systems - CHES 2007. Lecture Notes in Computer Science. 4727. p. 209. CiteSeerX 10.1.1.149.7643. doi:10.1007/978-3-540-74735-2_15. ISBN 978-3-540-74734-5.
^Owen Harrison, John Waldron. 'Practical Symmetric Key Cryptography on Modern Graphics Hardware' (AES and modes of operations on SM4.0 compliant GPUs). In Proceedings of USENIX Security 2008. Archived 21 August 2010 at the Wayback Machine.
^Harrison, Owen; Waldron, John (2009). 'Efficient Acceleration of Asymmetric Cryptography on Graphics Hardware'. Progress in Cryptology – AFRICACRYPT 2009. Lecture Notes in Computer Science. 5580. p. 350. CiteSeerX 10.1.1.155.5448. doi:10.1007/978-3-642-02384-2_22. ISBN 978-3-642-02383-5.
^'Teraflop Troubles: The Power of Graphics Processing Units May Threaten the World's Password Security System'. Georgia Tech Research Institute. Archived from the original on 30 December 2010. Retrieved 7 November 2010.
^'Want to deter hackers? Make your password longer'. MSNBC. 19 August 2010. Archived from the original on 29 October 2010. Retrieved 7 November 2010.
^Lerner, Larry (9 April 2009). 'Viewpoint: Mass GPUs, not CPUs for EDA simulations'. EE Times. Retrieved 3 May 2009.
^'W2500 ADS Transient Convolution GT'. accelerates signal integrity simulations on workstations that have Nvidia Compute Unified Device Architecture (CUDA)-based Graphics Processing Units (GPU)
^Giorgos Vasiliadis and Sotiris Ioannidis. 'GrAVity: A Massively Parallel Antivirus Engine'. In Proceedings of RAID 2010. Archived 27 July 2010 at the Wayback Machine.
^'Kaspersky Lab utilizes Nvidia technologies to enhance protection'. Kaspersky Lab. 14 December 2009. Archived from the original on 19 June 2010. During internal testing, the Tesla S1070 demonstrated a 360-fold increase in the speed of the similarity-defining algorithm when compared to the popular Intel Core 2 Duo central processor running at a clock speed of 2.6 GHz.
^Giorgos Vasiliadis et al. 'Gnort: High Performance Network Intrusion Detection Using Graphics Processors'. In Proceedings of RAID 2008. Archived 9 April 2011 at the Wayback Machine.
^Giorgos Vasiliadis et al. 'Regular Expression Matching on Graphics Hardware for Intrusion Detection'. In Proceedings of RAID 2009. Archived 27 July 2010 at the Wayback Machine.
^'Archived copy' (PDF). Archived (PDF) from the original on 25 March 2013. Retrieved 12 September 2013.
^Langdon, William B; Lam, Brian Yee Hong; Petke, Justyna; Harman, Mark (2015). 'Improving CUDA DNA Analysis Software with Genetic Programming'. Proceedings of the 2015 on Genetic and Evolutionary Computation Conference - GECCO '15. pp. 1063–1070. doi:10.1145/2739480.2754652. ISBN 9781450334723.

External links

OCLTools Open Source OpenCL Compiler and Linker
Tech Report article: 'ATI stakes claims on physics, GPGPU ground' by Scott Wasson
Preis, Tobias; Virnau, Peter; Paul, Wolfgang; Schneider, Johannes J (2009). 'GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model'. Journal of Computational Physics. 228 (12): 4468. Bibcode:2009JCoPh.228.4468P. doi:10.1016/j.jcp.2009.03.018.
GPGPU Programming in F# using the Microsoft Research Accelerator system

