[Parallel Computing] OpenMP Tutorial (3): Synchronization

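Synchronization is an important concept in multithreaded parallel programming. We may start many threads and let each one carry out its own task (say A1, A2, A3, ..., A10, which are similar to each other but independent), while a task B must not start until A1, A2, A3, ..., A10 have all finished.
As an example, 10 couriers each load one package onto a truck, independently of each other. It does not matter whether courier 1 or courier 5 loads first; only once all 10 couriers have loaded their packages can one of them drive off to make the delivery.
In terms of this example, 10 couriers (10 threads) load the packages (A1, A2, ..., A10) independently and in parallel, and the delivery (B), done by one courier (1 thread), can only start after all of A has completed.
Waiting therefore becomes very important: we must prevent a courier from driving off to deliver as soon as his own package is loaded! So we need a synchronization mechanism between A and B that waits for all of A to finish before B starts.
The good news is that the worksharing constructs for, sections, and single, introduced in OpenMP Tutorial Part 2: Worksharing (with full source code), come with a built-in barrier at their end: the code after them does not run until the work inside the for, sections, or single has finished.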
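To make the built-in barrier concrete, here is a minimal sketch. The function name all_threads_see_filled_array and the thread count of 4 are assumptions made for this illustration and do not come from the tutorial's repository. The loop plays the role of task A; the summation that follows plays the role of task B and reads elements written by other threads.

```cpp
#include <vector>

// Hypothetical helper (not part of the tutorial's repository).
// Returns true if every thread saw the fully filled array after the loop.
bool all_threads_see_filled_array(int n)
{
    std::vector<int> data(n, 0);
    bool ok = true;

    #pragma omp parallel num_threads(4)
    {
        // Task A: the iterations are split across the threads.
        #pragma omp for
        for (int i = 0; i < n; i++)
        {
            data[i] = i + 1;
        }
        // Implicit barrier: no thread gets past this point
        // until the whole loop has finished.

        // Task B: every thread sums the whole array, including
        // the elements that were written by other threads.
        long long sum = 0;
        for (int i = 0; i < n; i++)
        {
            sum += data[i];
        }

        if (sum != static_cast<long long>(n) * (n + 1) / 2)
        {
            #pragma omp critical
            ok = false;
        }
    }
    return ok;
}
```

Calling all_threads_see_filled_array(1000) from main should return true, because the barrier at the end of the for construct guarantees that task B only starts after task A is complete.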

Full source code

GitHub link to the full source code: https://github.com/grandma-tutorial/OpenMP-tutorial

OpenMP syntax

# pragma omp barrier
# pragma omp critical
# pragma omp ordered

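OpenMP example: barrier

# pragma omp barrier : every thread waits at the barrier until all threads of the team have finished the code above it, and only then continues. To make A finish before B starts, put a barrier between A and B as the synchronization point.
As noted above, for, sections, and single already come with a barrier at their end, so the code after them waits for the work inside them to finish.
In the example below, Case 1 has no barrier, so A and B can interleave (one thread may print B before another thread has printed A). In Case 2 there is a barrier between A and B, so all of A finishes before any B runs.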

# === compile ===
$ g++ -fopenmp example_synchronization_1.cpp -o example_synchronization_1.out

// ** File: example_synchronization_1.cpp **

#include <stdio.h>
#include <omp.h>

int main()
{
    printf("Case 1, without barrier:\n");
    #pragma omp parallel num_threads(2)
    {

        const int thread_id = omp_get_thread_num();
        // Case 1, without barrier
        printf("A1 I am thread %d\n", thread_id);
        printf("A2 I am thread %d\n", thread_id);
        printf("B I am thread %d\n", thread_id);
    }

    printf("\n=======================\n\n");
    printf("Case 2, with barrier:\n");
    #pragma omp parallel num_threads(2)
    {
        const int thread_id = omp_get_thread_num();
        // Case 2, with barrier
        printf("A1 I am thread %d\n", thread_id);
        printf("A2 I am thread %d\n", thread_id);
        #pragma omp barrier
        printf("B I am thread %d\n", thread_id);
    }

    return 0;
}

# === execute ===
$ ./example_synchronization_1.out

# === output (thread order may vary between runs) ===
Case 1, without barrier:
A1 I am thread 1
A2 I am thread 1
B I am thread 1
A1 I am thread 0
A2 I am thread 0
B I am thread 0

=======================

Case 2, with barrier:
A1 I am thread 1
A2 I am thread 1
A1 I am thread 0
A2 I am thread 0
B I am thread 1
B I am thread 0
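Because printed output can differ from run to run, a check that does not depend on print order may be easier to reason about. The following is only a sketch: the function name neighbours_are_ready, the loaded array, and the thread count of 4 are assumptions for illustration. Each thread loads its own slot (task A), waits at the barrier, and then inspects its neighbour's slot (task B). The #ifdef block is an assumed convenience so the sketch also builds serially when -fopenmp is not passed.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch still builds without -fopenmp.
static int omp_get_thread_num() { return 0; }
static int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper (not part of the tutorial's repository).
// Returns true if every thread found its neighbour's slot already loaded.
bool neighbours_are_ready()
{
    const int max_threads = 4;
    std::vector<int> loaded(max_threads, 0); // one slot per thread
    bool ok = true;

    #pragma omp parallel num_threads(max_threads)
    {
        const int id = omp_get_thread_num();
        const int n = omp_get_num_threads(); // may be fewer than requested

        // Task A: each thread loads its own "package".
        loaded[id] = id + 1;

        // Wait here until every thread has finished task A.
        #pragma omp barrier

        // Task B: read the slot written by the neighbouring thread.
        const int neighbour = (id + 1) % n;
        if (loaded[neighbour] != neighbour + 1)
        {
            #pragma omp critical
            ok = false;
        }
    }
    return ok;
}
```

With the barrier in place the function should always return true. Without it, reading loaded[neighbour] would race with the neighbour's write, and the result would no longer be reliable.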

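OpenMP example: critical

# pragma omp critical : the code inside a critical region is executed by at most 1 thread at any given time.
critical and single look similar. The difference is that the code inside single is executed only once, whereas the code inside critical is executed many times (once by each thread that reaches it), but by only 1 thread at a time; 2 threads never execute it simultaneously.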

# === compile ===
$ g++ -fopenmp example_synchronization_2.cpp -o example_synchronization_2.out

// ** File: example_synchronization_2.cpp **
// 都會阿嬤 OpenMP tutorial
// 都會阿嬤 https://weikaiwei.com
#include <stdio.h>
#include <omp.h>

int main()
{
    // Case 1, without critical
    int number = 0;
    #pragma omp parallel num_threads(2)
    {
        #pragma omp for
        for (int i = 0; i < 10000; i++)
        {
            number++;
        }

        #pragma omp single // just print once
        printf("Without critical, the number is :%d\n", number); // wrong because of data race
    }

    // Case 2, with critical
    number = 0;
    #pragma omp parallel num_threads(2)
    {
        #pragma omp for
        for (int i = 0; i < 10000; i++)
        {
            #pragma omp critical
            {
                number++;
            }
        }

        #pragma omp single // just print once
        printf("With critical,    the number is :%d\n", number); // correct because critical prevents the data race
    }

    return 0;
}

# === execute ===
$ ./example_synchronization_2.out

# === output (the first number varies between runs) ===
Without critical, the number is :5275
With critical,    the number is :10000
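The difference between single and critical described above can also be checked by counting how often each region runs. This is a minimal sketch: the Counts struct, the function name count_single_and_critical, and the thread count of 4 are assumptions for illustration. The #ifdef block is an assumed convenience so the sketch also builds serially without -fopenmp.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback so the sketch still builds without -fopenmp.
static int omp_get_num_threads() { return 1; }
#endif

// Hypothetical result type for this illustration.
struct Counts
{
    int single_runs;   // how many times the single region ran
    int critical_runs; // how many times the critical region ran
    int threads;       // size of the thread team
};

// Hypothetical helper (not part of the tutorial's repository).
Counts count_single_and_critical()
{
    Counts c = {0, 0, 1};

    #pragma omp parallel num_threads(4)
    {
        // Executed by exactly one thread of the team.
        #pragma omp single
        {
            c.threads = omp_get_num_threads();
            c.single_runs++;
        }
        // Implicit barrier at the end of single.

        // Executed by every thread, but only one thread at a time,
        // so the increment is not a data race.
        #pragma omp critical
        {
            c.critical_runs++;
        }
    }
    return c;
}
```

After the call, single_runs should be 1 and critical_runs should equal threads, whatever team size the runtime actually provides.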

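OpenMP example: ordered

# pragma omp ordered : used together with # pragma omp for ordered. Placed inside the for region, # pragma omp ordered marks the block within the for loop that must be executed in sequential order.
The example below adds 1 to every element of array A. In an ordinary parallel for, the order in which the elements are updated is not fixed: A[1] may be updated first, or A[5] may be. With ordered, the order is always A[0], A[1], ..., A[N-1]. (Since the entire loop body sits inside the ordered block here, there is no parallel speedup left; the example is only meant to show how ordered is used.)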

# === compile ===
$ g++ -fopenmp example_synchronization_3.cpp -o example_synchronization_3.out

// ** File: example_synchronization_3.cpp **
// 都會阿嬤 OpenMP tutorial
// 都會阿嬤 https://weikaiwei.com
#include <stdio.h>
#include <omp.h>

int main()
{
    const int N = 8;
    int A[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    #pragma omp parallel
    {
        const int thread_id = omp_get_thread_num();

        #pragma omp for ordered
        for (int i = 0; i < N; i++)
        {
            #pragma omp ordered
            {
                A[i] = A[i] + 1;
                printf("A[%d] is computed by thread : %d\n", i, thread_id);
            }
        }
    }
    return 0;
}

# === execute ===
$ ./example_synchronization_3.out

# === output (thread ids depend on the number of threads available) ===
A[0] is computed by thread : 0
A[1] is computed by thread : 0
A[2] is computed by thread : 1
A[3] is computed by thread : 1
A[4] is computed by thread : 2
A[5] is computed by thread : 2
A[6] is computed by thread : 3
A[7] is computed by thread : 3
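When only part of the loop body needs to run in order, the rest of the body can stay outside the ordered block and still run in parallel. The sketch below is one possible illustration, not the tutorial's own code: the function name squares_in_order and the thread count of 4 are assumptions. The squaring is done in any order, while the results are appended to the vector strictly in loop order.

```cpp
#include <vector>

// Hypothetical helper (not part of the tutorial's repository).
// Returns {0*0, 1*1, ..., (n-1)*(n-1)} collected in loop order.
std::vector<int> squares_in_order(int n)
{
    std::vector<int> out;

    #pragma omp parallel num_threads(4)
    {
        #pragma omp for ordered
        for (int i = 0; i < n; i++)
        {
            // Outside the ordered block: may run in parallel, in any order.
            const int sq = i * i;

            // Inside the ordered block: runs in the order i = 0, 1, 2, ...
            // and by one thread at a time, so push_back is safe here.
            #pragma omp ordered
            {
                out.push_back(sq);
            }
        }
    }
    return out;
}
```

For n = 8 the returned vector should hold 0, 1, 4, 9, 16, 25, 36, 49 in exactly that order, regardless of which thread computed which element.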
