In OpenMP Tutorial Part 1: Basic Syntax (with complete code), we only executed the code inside the parallel region many times over: we spawned several threads, and each thread ran the function inside (printf) once, so no real division of work took place. In OpenMP, the computation can be divided among multiple threads with the worksharing constructs: the for construct, the sections construct, and the single construct. Below we use these worksharing constructs to achieve genuine work division and true parallelization, improving performance.
Complete code
- Full source code on GitHub: https://github.com/grandma-tutorial/OpenMP-tutorial
OpenMP syntax
- #pragma omp for
- #pragma omp sections
- #pragma omp single
OpenMP example: parallelizing a for loop
- #pragma omp for : parallelizes a for loop. Note that in the example below, each array element must not depend on any other element; every iteration has to be independent (the order of the computations must not matter), otherwise parallelizing the loop will give wrong results.
- #pragma omp for must be placed inside a parallel region (inside the braces of #pragma omp parallel); the same applies to the other worksharing constructs. OpenMP also offers a combined form of the two directives, sketched right after this list.
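As a side note (not part of the original example), the combined directive #pragma omp parallel for opens the parallel region and shares the loop in a single line. A minimal sketch of the same array update written with the combined form:
// Sketch: combined parallel-for form of the example below
#include <stdio.h>
#include <omp.h>

int main()
{
    const int N = 8;
    int A[N] = {1, 2, 3, 4, 5, 6, 7, 8};

    // one directive both creates the thread team and splits the loop iterations
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
    {
        A[i] = A[i] + 1;
        printf("A[%d] is computed by thread number : %d\n", i, omp_get_thread_num());
    }
    return 0;
}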
# === compile ===
$ g++ -fopenmp example_worksharing_1.cpp -o example_worksharing_1.out
// ** filename: example_worksharing_1.cpp **
// 都會阿嬤 OpenMP tutorial
// 都會阿嬤 https://weikaiwei.com
#include <stdio.h>
#include <omp.h>
int main()
{
const int N = 8;
int A[N] = {1, 2, 3, 4, 5, 6, 7, 8};
#pragma omp parallel
{
const int thread_id = omp_get_thread_num();
// parallel computing for A[i] = A[i] + 1
#pragma omp for
for (int i = 0; i < N; i++)
{
A[i] = A[i] + 1;
printf("A[%d] is computed by thread number : %d\n", i, thread_id);
}
}
// print the array after parallel computing
for (int i = 0; i < N; i++)
{
printf("%d, ", A[i]);
}
printf("\n");
return 0;
}
# === execute ===
$ ./example_worksharing_1.out
# === output ===
A[6] is computed by thread number : 3
A[7] is computed by thread number : 3
A[4] is computed by thread number : 2
A[5] is computed by thread number : 2
A[0] is computed by thread number : 0
A[1] is computed by thread number : 0
A[2] is computed by thread number : 1
A[3] is computed by thread number : 1
2, 3, 4, 5, 6, 7, 8, 9,
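In this run the eight iterations were handed out as contiguous chunks of two per thread; the exact assignment (and the print order) can differ between runs and compilers. If you want to control how iterations are distributed, OpenMP provides a schedule clause on the for construct. A minimal sketch, assuming the same array as above (the chunk size 2 is just an illustrative choice):
// Sketch: explicit schedule clause on the worksharing for construct
#include <stdio.h>
#include <omp.h>

int main()
{
    const int N = 8;
    int A[N] = {1, 2, 3, 4, 5, 6, 7, 8};

    #pragma omp parallel
    {
        // schedule(static, 2): chunks of 2 iterations, assigned round-robin;
        // schedule(dynamic) would let threads grab chunks as they become free
        #pragma omp for schedule(static, 2)
        for (int i = 0; i < N; i++)
        {
            A[i] = A[i] + 1;
        }
    }
    return 0;
}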
OpenMP example: parallelizing with sections
- #pragma omp sections : contains several section directives (note: section, without the "s"); you manually split the code into multiple sections.
- Each section is executed by only one thread.
# === compile ===
$ g++ -fopenmp example_worksharing_2.cpp -o example_worksharing_2.out
// ** filename: example_worksharing_2.cpp **
// 都會阿嬤 OpenMP tutorial
// 都會阿嬤 https://weikaiwei.com
#include <stdio.h>
#include <omp.h>
int main()
{
const int N = 8;
int A[N] = {1, 2, 3, 4, 5, 6, 7, 8};
#pragma omp parallel
{
const int thread_id = omp_get_thread_num();
// parallel computing for A[i] = A[i] + 1
#pragma omp sections
{
// section 1
#pragma omp section
{
for (int i = 0; i < N / 2; i++)
{
A[i] = A[i] + 1;
printf("A[%d] is computed by thread number : %d\n", i, thread_id);
}
}
// section 2
#pragma omp section
{
for (int i = N / 2; i < N; i++)
{
A[i] = A[i] + 1;
printf("A[%d] is computed by thread number : %d\n", i, thread_id);
}
}
}
}
// print the array after parallel computing
for (int i = 0; i < N; i++)
{
printf("%d, ", A[i]);
}
printf("\n");
return 0;
}
# === execute ===
$ ./example_worksharing_2.out
# === output ===
A[0] is computed by thread number : 3
A[1] is computed by thread number : 3
A[2] is computed by thread number : 3
A[3] is computed by thread number : 3
A[4] is computed by thread number : 0
A[5] is computed by thread number : 0
A[6] is computed by thread number : 0
A[7] is computed by thread number : 0
2, 3, 4, 5, 6, 7, 8, 9,
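Two remarks on sections. First, it shines when the pieces of work are genuinely different, e.g. two unrelated tasks, rather than two halves of the same loop. Second, since each section runs on one thread, if the team has more threads than there are section blocks, the extra threads simply wait at the implicit barrier at the end of the sections construct. A minimal sketch of two independent tasks, using the combined parallel sections form (the task functions are illustrative, not from the original post):
// Sketch: sections used for two unrelated tasks
#include <stdio.h>
#include <omp.h>

// illustrative placeholder tasks
void task_a() { printf("task_a on thread %d\n", omp_get_thread_num()); }
void task_b() { printf("task_b on thread %d\n", omp_get_thread_num()); }

int main()
{
    // combined form: creates the thread team and the sections construct at once
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            task_a();
        }
        #pragma omp section
        {
            task_b();
        }
    }
    return 0;
}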
OpenMP example: single execution
- #pragma omp single : inside a parallel region, code is normally executed by every thread, but code placed in a single construct is executed by only one thread.
# === compile ===
$ g++ -fopenmp example_worksharing_3.cpp -o example_worksharing_3.out
// ** filename: example_worksharing_3.cpp **
// 都會阿嬤 OpenMP tutorial
// 都會阿嬤 https://weikaiwei.com
#include <stdio.h>
#include <omp.h>
int main()
{
#pragma omp parallel
{
const int thread_id = omp_get_thread_num();
printf("**Outside** single section, I am thread number %d\n", thread_id);
#pragma omp single
{
printf("!!Inside!! single section, I am thread number %d\n", thread_id);
}
}
return 0;
}
# === execute ===
$ ./example_worksharing_3.out
# === output ===
**Outside** single section, I am thread number 0
!!Inside!! single section, I am thread number 0
**Outside** single section, I am thread number 1
**Outside** single section, I am thread number 3
**Outside** single section, I am thread number 2
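One detail worth knowing: a single construct has an implicit barrier at its end, so the other threads wait until the chosen thread finishes the block before continuing. If that synchronization is not needed, the nowait clause removes the barrier. A minimal sketch, reusing the structure of example_worksharing_3.cpp:
// Sketch: single with the nowait clause (no barrier at the end of the block)
#include <stdio.h>
#include <omp.h>

int main()
{
    #pragma omp parallel
    {
        const int thread_id = omp_get_thread_num();

        // exactly one thread prints this; thanks to nowait, the other threads
        // do not wait here and go straight to the printf below
        #pragma omp single nowait
        {
            printf("!!Inside!! single section, I am thread number %d\n", thread_id);
        }

        printf("**Outside** single section, I am thread number %d\n", thread_id);
    }
    return 0;
}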