Papers on hgpu.org
__host__ __device__ — Generic programming in Cuda
.NET High Performance Computing
“Local Rank Differences” Image Feature Implemented on GPU
[Serbian] The Methods and Procedures for Accelerating Operations and Queries in Large Database Systems and Data Warehouse (Big Data Systems)
10×10: A General-purpose Architectural Approach to Heterogeneity and Energy Efficiency
190 TFlops Astrophysical N-body Simulation on a Cluster of GPUs
2-D Impulse Noise Suppression by Recursive Gaussian Maximum Likelihood Estimation
24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs
2D and 3D level-set algorithms on GPU
2D Image Convolution using Three Parallel Programming Models on the Xeon Phi
2D Triangulation of Polygons on CUDA
2D/3D image registration on the GPU
2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation
2PARMA: Parallel Paradigms and Run-time Management Techniques for Many-Core Architectures
3-SAT on CUDA: Towards a massively parallel SAT solver
3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs
3D data denoising via Non-Local means filter by using parallel GPU strategies
3D Edge Bundling for Geographical Data Visualization
3D finite difference computation on GPUs using CUDA
3D finite element numerical integration on GPUs
3D GPU Architecture using Cache Stacking: Performance, Cost, Power and Thermal analysis
3D Haar-Like Elliptical Features for Object Classification in Microscopy
3D Hydrodynamic Simulation of Classical Nova Explosions
3D Information Extraction Based on GPU
3D Modeling, Distance and Gradient Computation for Motion Planning: A Direct GPGPU Approach
3D Non-Local Means denoising via multi-GPU
3D nonrigid registration via optimal mass transport on the GPU
3D Object Recognition using Convolutional Neural Networks with Transfer Learning between Input Channels
3D Object Recognition with Convolutional Neural Networks
3D Objects Tracking by GPGPU-Enhanced Particle Filter Algorithms
3D Recursive Gaussian IIR on GPU and FPGAs: A Case Study for Accelerating Bandwidth-Bounded Applications
3D Registration Based on Normalized Mutual Information: Performance of CPU vs. GPU Implementation
3D simulation of complex shading affecting PV systems taking benefit from the power of graphics cards developed for the video game industry
3D Skeleton Extraction Method using Potential Field on OpenCL
3D tumor localization through real-time volumetric x-ray imaging for lung cancer radiotherapy
3D vision of electromagnetic fields in antenna and microwave technique
3D visualization of astronomy data cubes using immersive displays
3DES ECB Optimized for Massively Parallel CUDA GPU Architecture
3I: A tool for visualizing and processing in parallel 2D & 3D images
42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence
4kUHD H264 wireless live video streaming using CUDA
5.6: GPU enhancement of FDTD-PIC plasma-wave simulations
8 Steps to 3.7 TFLOP/s on NVIDIA V100 GPU: Roofline Analysis and Other Tricks
86 PFLOPS Deep Potential Molecular Dynamics simulation of 100 million atoms with ab initio accuracy
94% on CIFAR-10 in 3.29 Seconds on a Single GPU
A (ir)regularity-aware task scheduler for heterogeneous platforms
A (Somewhat Dated) Comparative Study of Betweenness Centrality Algorithms on GPU
A 3D Convex Hull Algorithm for Graphics Hardware
A 3D radiative transfer framework: XIII. OpenCL implementation
A 3D radiative transfer framework. VIII. OpenCL implementation
A 57mW embedded mixed-mode neuro-fuzzy accelerator for intelligent multi-core processor
A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks
A balanced programming model for emerging heterogeneous multicore systems
A Batched GPU Algorithm for Set Intersection
A Benchmark Set of Highly-efficient CUDA and OpenCL Kernels and its Dynamic Autotuning with Kernel Tuning Toolkit
A Bi-objective Optimization Framework for Query Plans
A biomolecular electrostatics solver using Python, GPUs and boundary elements that can handle solvent-filled cavities and Stern layers
A block-asynchronous relaxation method for graphics processing units
A Braille Conversion Service Using GPU and Human Interaction by Computer Vision
A breadth-first course in multicore and manycore programming
A capabilities-aware framework for using computational accelerators in data-intensive computing
A Case Against Small Data Types on GPGPUs
A Case for Work-stealing on FPGAs with OpenCL Atomics
A Case Study for Petascale Applications in Astrophysics: Simulating Gamma-Ray Bursts
A Case Study in Using OpenCL on FPGAs: Creating an Open-Source Accelerator of the AutoDock Molecular Docking Software
A Case Study of OpenCL on an Android Mobile GPU
A Case Study of SWIM: Optimization of Memory Intensive Application on GPGPU
A case study on porting scientific applications to GPU/CUDA
A Case Study: Exploiting Neural Machine Translation to Translate CUDA to OpenCL
A CG-based Poisson solver on a GPU-cluster
A characterization and analysis of PTX kernels
A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads
A Chunking Method for Euclidean Distance Matrix Calculation on Large Dataset Using Multi-GPU
A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines
A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures
A Cloud Computing Service Architecture of a Parallel Algorithm Oriented to Scientific Computing with CUDA and Monte Carlo
A cluster for CS education in the manycore era
A Co-Design Framework with OpenCL Support for Low-Energy Wide SIMD Processor
A Co-Prime Blur Scheme for Data Security in Video Surveillance
A Coarse Grain Reconfigurable Architecture for sequence alignment problems in bio-informatics
A code motion technique for accelerating general-purpose computation on the GPU
A Code Optimization Framework for Performance Portability of GPU Kernels onto Custom Accelerators
A Code Transformation Framework for Scientific Applications on Structured Grids
A code-based analytical approach for using separate device coprocessors in computing systems
A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques
A collision detection algorithm using adaptive particle sensor
A combined MPI-CUDA parallel solution of linear and nonlinear Poisson-Boltzmann equation
A Common GPU n-Dimensional Array for Python and C
A Comparative Analysis of GPU Implementations of Spectral Unmixing Algorithms
A comparative analysis of the performance and deployment overhead of parallelized Finite Difference Time Domain (FDTD) algorithms on a selection of high performance multiprocessor computing systems
A comparative benchmarking of the FFT on Fermi and Evergreen GPUs
A Comparative Measurement Study of Deep Learning as a Service Framework
A Comparative Study of 2D Numerical Methods with GPU Computing
A Comparative Study of Asynchronous Many-Tasking Runtimes: Cilk, Charm++, ParalleX and AM++
A Comparative Study of Game Tree Searching Methods
A comparative study of GPU programming models and architectures using neural networks
Titles: 100
open PDFs: 88
packages: 9