Programming Massively Parallel Processors (《大规模并行处理器程序设计》)

About the Book

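This book introduces the basic concepts of parallel programming and GPU architecture and explores in detail the techniques for building parallel programs, covering performance, floating-point formats, parallel patterns, and dynamic parallelism. It is suitable for both professionals and students. Case studies walk through the development process, starting from the details of computational thinking and ending with examples of efficient parallel programs. The new edition updates the discussion of CUDA, covers newer libraries such as cuDNN, and moves material that is no longer essential to the appendices. It also adds two new chapters on parallel patterns and updates the case studies to reflect current industry practice.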

About the Authors

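David B. Kirk is a member of the U.S. National Academy of Engineering, an NVIDIA Fellow, and a former chief scientist of NVIDIA. He led the development of NVIDIA's graphics technology and is one of the founders of CUDA technology. In 2002 he received the ACM SIGGRAPH Computer Graphics Achievement Award in recognition of his outstanding contributions to bringing high-performance computer graphics systems to the mass market. He holds a PhD in computer science from the California Institute of Technology.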

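Wen-mei W. Hwu holds the AMD Jerry Sanders Chair as a professor in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. He is chief scientist of the university's parallel computing research center and leads the research of the IMPACT group and the CUDA Center of Excellence. He has made distinguished contributions to compiler design, computer architecture, microarchitecture, and parallel computing. He is an IEEE Fellow and an ACM Fellow and has received numerous awards, including the ACM SIGARCH Maurice Wilkes Award. He is also a cofounder and the CTO of MulticoreWare. He holds a PhD in computer science from the University of California, Berkeley.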

Preface
Acknowledgements
CHAPTER 1 Introduction
1.1 Heterogeneous Parallel Computing
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism?
1.4 Speeding Up Real Applications
1.5 Challenges in Parallel Programming
1.6 Parallel Programming Languages and Models
1.7 Overarching Goals
1.8 Organization of the Book
References
CHAPTER 2 Data Parallel Computing
2.1 Data Parallelism
2.2 CUDA C Program Structure
2.3 A Vector Addition Kernel
2.4 Device Global Memory and Data Transfer
2.5 Kernel Functions and Threading
2.6 Kernel Launch
2.7 Summary
    Function Declarations
    Kernel Launch
    Built-in (Predefined) Variables
    Run-time API
2.8 Exercises
References
CHAPTER 3 Scalable Parallel Execution
3.1 CUDA Thread Organization
3.2 Mapping Threads to Multidimensional Data
3.3 Image Blur: A More Complex Kernel
3.4 Synchronization and Transparent Scalability
3.5 Resource Assignment
3.6 Querying Device Properties
3.7 Thread Scheduling and Latency Tolerance
3.8 Summary
3.9 Exercises
CHAPTER 4 Memory and Data Locality
4.1 Importance of Memory Access Efficiency
4.2 Matrix Multiplication
4.3 CUDA Memory Types
4.4 Tiling for Reduced Memory Traffic
4.5 A Tiled Matrix Multiplication Kernel
4.6 Boundary Checks
4.7 Memory as a Limiting Factor to Parallelism
4.8 Summary
4.9 Exercises
