Acceleration of Remote Sensing Image Filtering Based on Embedded CPU+GPU Heterogeneous Platform

TAN Pengyuan^{1, 2}, XUE Changbin^{1}, ZHOU Li^{1}

1. National Space Science Center, Chinese Academy of Sciences, Beijing 100190
2. University of Chinese Academy of Sciences, Beijing 100049

doi: 10.11728/cjss2024.01.2023-0033 cstr: 32142.14.cjss2024.01.2023-0033

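Foundation item: Supported by the Fund of the Key Laboratory of National Defense Science and Technology, Chinese Academy of Sciences (CXJJ-20S017)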

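About the authors:

TAN Pengyuan: male, born in September 1996 in Qinzhou, Guangxi Zhuang Autonomous Region. He is currently a master's student at the National Space Science Center, Chinese Academy of Sciences. His main research interest is parallel processing of remote sensing images. E-mail: tanpengyuan19@mails.ucas.ac.cn

Corresponding author:

Male, born in May 1972 in Jinzhou, Liaoning Province. He is currently a research professor and doctoral supervisor at the National Space Science Center, Chinese Academy of Sciences. His main research interests include precision on-orbit process control technology for space applications, onboard data management technology, and aerospace systems engineering. E-mail: xuechangbin@nssc.ac.cn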

CLC number: V19, TP391
Publication history
- Received: 2023-03-02
- Revised: 2023-04-26
- Published online: 2023-07-27

Abstract

Abstract: To support real-time on-orbit processing of remote sensing images, an acceleration design method for remote sensing image filtering is proposed for an embedded CPU + GPU heterogeneous platform. Taking Laplacian filtering as an example, the algorithm is first parallelized through data partitioning and data mapping, which exploits the parallel computing capability of the GPU. Hardware resources of the GPU, such as its vector units and cache, are then exploited through vectorization, vector permutation, and workgroup tuning to further increase the running speed. The feasibility and efficiency of the accelerated design are verified on an embedded development board. Experimental results show that, compared with the serial implementation on a single CPU, Laplacian filtering with GPU parallel processing achieves a speedup of 4.08 to 16.92 times. After further optimization using GPU hardware resources, the speedup reaches 15.38 to 56.41 times.

Keywords:
- Embedded GPU
- Remote sensing image filtering
- OpenCL
- Vectorization
- Vector permutation
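
The data partitioning and data mapping idea in the abstract can be illustrated with a small sketch. This is not the authors' OpenCL implementation: the function names, the choice of the 4-neighbour Laplacian template, the zero-valued border, and the `vec_width` parameter are all assumptions made for illustration. The sketch compares a serial sliding-template baseline with an emulated two-dimensional index space in which each work-item produces several horizontally adjacent output pixels, which is roughly what vectorizing a filtering kernel amounts to.

```python
# Illustrative sketch only; names and the template choice are assumptions,
# not taken from the paper.
LAPLACIAN_4 = ((0, 1, 0),
               (1, -4, 1),
               (0, 1, 0))


def laplacian_at(img, y, x, kernel=LAPLACIAN_4):
    """Response of the 3x3 template centred on interior pixel (y, x)."""
    acc = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += kernel[dy + 1][dx + 1] * img[y + dy][x + dx]
    return acc


def laplacian_serial(img):
    """Serial baseline: slide the template over every interior pixel.

    Border pixels are left at 0 (an assumed border policy).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = laplacian_at(img, y, x)
    return out


def laplacian_by_work_items(img, vec_width=4):
    """Emulate a 2D index space of independent work-items.

    Each work-item (gy, gx) computes up to vec_width adjacent output
    pixels of one row. No work-item reads another's output, so on a GPU
    the two outer loops could run in parallel.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    interior_w = w - 2
    items_per_row = (interior_w + vec_width - 1) // vec_width  # ceiling
    for gy in range(h - 2):              # global id along rows
        for gx in range(items_per_row):  # global id along columns
            y = gy + 1
            x0 = 1 + gx * vec_width
            # Clamp the last work-item of a row to the interior region.
            for x in range(x0, min(x0 + vec_width, w - 1)):
                out[y][x] = laplacian_at(img, y, x)
    return out


if __name__ == "__main__":
    image = [[(3 * r + 5 * c) % 7 for c in range(9)] for r in range(6)]
    # Both mappings must produce identical results.
    assert laplacian_serial(image) == laplacian_by_work_items(image)
```

Because every output pixel depends only on a fixed input neighbourhood, the partitioned version matches the serial one for any `vec_width`. Any speedup on real hardware would come from executing the work-items concurrently and from wide loads, neither of which this emulation measures.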

Figure 1. Mali GPU Midgard architecture

Figure 2. Two-dimensional index space (NDRange)

Figure 3. Two common types of Laplacian templates

Figure 4. Laplacian template sliding convolution

Figure 5. Example of vectorizing the Laplacian filtering kernel

Figure 6. Redundancy in vector loading

Figure 7. Calculation of target vectors in adjacent rows involves reused data

Figure 8. Kernel execution times corresponding to the worst (dark stripes) and optimal (light stripes) workgroup shapes for different workgroup sizes

Figure 9. Speedup of the Laplacian filtering GPU version relative to the CPU version

Figure 10. Performance obtained on the GPU with different optimization methods

Figure 11. Speedup of image filtering
