CUDA Stream Compaction

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 2

Haorong Yang
LinkedIn
Tested on: Windows 10 Home, i7-10750H @ 2.60GHz 16GB, GTX 2070 Super Max-Q (Personal)

The goal of this project was to implement a stream compaction algorithm on the GPU in CUDA from scratch. The algorithm will remove 0s from an array of ints utilizing a scan function, which performs parallel reduction on the array to obtain an exclusive prefix sum.

Although the goal is to obtain an efficient parallel solution, for comparison, a few variations of the algorithm were also implemented. A list of algorithms that will be compared to each other:

CPU scan function
CPU stream compaction without scan
CPU sream compaction with scan
GPU naive scan
GPU work-efficient scan & compaction
thrust library's implementation

The test results for array size of 2^8 is:

****************
** SCAN TESTS **
****************
    [  17  20  19  34  19   3   6   3  27   2  14   5  21 ...  36   0 ]
==== cpu scan, power-of-two ====
   elapsed time: 0.0005ms    (std::chrono Measured)
    [   0  17  37  56  90 109 112 118 121 148 150 164 169 ... 6163 6199 ]
==== cpu scan, non-power-of-two ====
   elapsed time: 0.0004ms    (std::chrono Measured)
    [   0  17  37  56  90 109 112 118 121 148 150 164 169 ... 6092 6101 ]
    passed
==== naive scan, power-of-two ====
   elapsed time: 0.029056ms    (CUDA Measured)
    [   0  17  37  56  90 109 112 118 121 148 150 164 169 ... 6163 6199 ]
    passed
==== naive scan, non-power-of-two ====
   elapsed time: 0.026752ms    (CUDA Measured)
    [   0  17  37  56  90 109 112 118 121 148 150 164 169 ...   0   0 ]
    passed
==== work-efficient scan, power-of-two ====
   elapsed time: 0.012768ms    (CUDA Measured)
    [ 6199 6216 6236 6255 6289 6308 6311 6317 6320 6347 6349 6363 6368 ... 3050 3086 ]
    a[0] = 0, b[0] = 6199
    FAIL VALUE
==== work-efficient scan, non-power-of-two ====
   elapsed time: 0.012512ms    (CUDA Measured)
    [ 6138 6155 6175 6194 6228 6247 6250 6256 6259 6286 6288 6302 6307 ... 2979 2988 ]
    a[0] = 0, b[0] = 6138
    FAIL VALUE
==== thrust scan, power-of-two ====
   elapsed time: 0.055264ms    (CUDA Measured)
    [   0  17  37  56  90 109 112 118 121 148 150 164 169 ... 6163 6199 ]
    passed
==== thrust scan, non-power-of-two ====
   elapsed time: 0.054368ms    (CUDA Measured)
    [   0  17  37  56  90 109 112 118 121 148 150 164 169 ... 6092 6101 ]
    passed

*****************************
** STREAM COMPACTION TESTS **
*****************************
    [   3   0   1   2   1   1   0   1   1   0   2   1   3 ...   2   0 ]
==== cpu compact without scan, power-of-two ====
   elapsed time: 0.0008ms    (std::chrono Measured)
    [   3   1   2   1   1   1   1   2   1   3   1   2   2 ...   3   2 ]
    passed
==== cpu compact without scan, non-power-of-two ====
   elapsed time: 0.0008ms    (std::chrono Measured)
    [   3   1   2   1   1   1   1   2   1   3   1   2   2 ...   1   3 ]
    passed
==== cpu compact with scan ====
   elapsed time: 0.004ms    (std::chrono Measured)
    [   3   1   2   1   1   1   1   2   1   3   1   2   2 ...   3   2 ]
    passed
==== work-efficient compact, power-of-two ====
   elapsed time: 0.020992ms    (CUDA Measured)
    [   3   1   2   1   1   1   1   2   1   3   1   2   2 ...   0   0 ]
expected count is 185, count is 185
    passed
==== work-efficient compact, non-power-of-two ====
   elapsed time: 0.021888ms    (CUDA Measured)
    [   3   1   2   1   1   1   1   2   1   3   1   2   2 ...   1   3 ]
expected count is 185, count is 183
    passed

Name		Name	Last commit message	Last commit date
Latest commit History 59 Commits
cmake		cmake
img		img
src		src
stream_compaction		stream_compaction
.cproject		.cproject
.gitignore		.gitignore
.project		.project
CMakeLists.txt		CMakeLists.txt
GNUmakefile		GNUmakefile
INSTRUCTION.md		INSTRUCTION.md
README.md		README.md
cis565_stream_compaction_test.launch		cis565_stream_compaction_test.launch

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CUDA Stream Compaction

About

Uh oh!

Releases

Packages

Languages

YangH34/Project2-Stream-Compaction

Folders and files

Latest commit

History

Repository files navigation

CUDA Stream Compaction

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages