
**1.** For each of the following code segments, use OpenMP pragmas to make the loop parallel, or explain why the code segment is not suitable for parallel execution.

**a**.

```c
for (i = 0; i < (int) sqrt(x); i++) {
    a[i] = i + 12;
    if (i < 10) b[i] = a[i];
}
```

**b**.

```c
flag = 0;
for (i = 0; (i < n) && (!flag); i++) {
    a[i] = 2.8 * i;
    if (a[i] < b[i]) flag = 1;
}
```

**c**.

```c
for (i = 0; i < n; i++) {
    a[i] = fun(i);
}
```

**d**.

```c
for (i = 0; i < n; i++) {
    a[i] = fun(i);
    if (a[i] < b[i]) b[i] = a[i];
}
```

**e**.

```c
for (i = 0; i < n; i++) {
    a[i] = fun(i);
    if (a[i] < b[i]) break;
}
```

**f**.

```c
product = 0;
for (i = 0; i < n; i++) {
    product += a[i] * b[i];
}
```

**g**.

```c
for (i = j; i < 3 * j; i++) {
    a[i] = a[i] + a[i-j];
}
```

**h**.

```c
for (i = j; i < n; i++) {
    a[i] = c * a[i-j];
}
```

**2**. Suppose a parallel program completes execution on 32 processors in 348 seconds. It has been found that this program spends 21 seconds in initialization and cleanup on one processor, and for the remaining time all 32 processors are active. What is the scaled speedup of this parallel program?
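One way to arrange the numbers, using the Gustafson-Barsis scaled-speedup formula $\psi = p + (1 - p)s$, where $s$ is the serial fraction of the parallel execution time:

```latex
s = \frac{21}{348} \approx 0.060, \qquad
\psi = p + (1 - p)s = 32 + (1 - 32)(0.060) \approx 30.1
```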

**3**. Suppose a parallel program executing on 20 processors spends 98% of its time inside parallel code. What is the scaled speedup of this parallel program?
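A sketch of the setup with the Gustafson-Barsis formula, taking $s = 0.02$ as the serial fraction:

```latex
\psi = p + (1 - p)s = 20 + (1 - 20)(0.02) = 20 - 0.38 = 19.62
```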

**4**. The table below shows the speedups observed for six different parallel programs A, B, C, D, E, F as the number of processors is increased from 1 through 8.

| Processors | A | B | C | D | E | F |
|-----------:|-----:|-----:|-----:|-----:|-----:|-----:|
| 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 2 | 1.60 | 1.92 | 1.92 | 1.96 | 1.74 | 1.94 |
| 3 | 2.00 | 2.73 | 2.78 | 2.88 | 2.30 | 2.82 |
| 4 | 2.29 | 3.39 | 3.57 | 3.67 | 2.74 | 3.65 |
| 5 | 2.50 | 3.91 | 4.31 | 4.46 | 3.09 | 4.42 |
| 6 | 2.67 | 4.29 | 5.00 | 5.22 | 3.38 | 5.15 |
| 7 | 2.80 | 4.55 | 5.65 | 5.93 | 3.62 | 5.84 |
| 8 | 2.91 | 4.71 | 6.25 | 6.25 | 3.81 | 6.50 |

Using the Karp-Flatt metric as the basis, choose the statement that best describes the expected speedup for each program with 16 processors.

I. The speedup achieved on 16 processors will probably be at least 40% higher than the speedup achieved on eight processors.

II. The speedup achieved on 16 processors will probably be less than 40% higher than the speedup achieved on eight processors, due to the increase in overhead as processors are added.

III. The speedup achieved on 16 processors will probably be less than 40% higher than the speedup achieved on eight processors, due to the large serial component of the computation.

**5**. Let n ≥ f(p) denote the isoefficiency relation of a parallel system and let M(n) denote the amount of memory required to store a problem of size n. Use the scalability function to rank the parallel systems shown below from the most scalable to the least scalable:

a. f(p) = Cp, M(n) = n²

b. f(p) = C√p, M(n) = n²

c. f(p) = C√p log p, M(n) = n²

d. f(p) = Cp log p, M(n) = n²

e. f(p) = Cp, M(n) = n

f. f(p) = Cp√p, M(n) = n

g. f(p) = Cp²√p, M(n) = n
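The ranking hinges on the scalability function M(f(p))/p, the per-processor memory needed to maintain efficiency as processors are added. As one worked instance, system (b):

```latex
\frac{M(f(p))}{p} = \frac{(C\sqrt{p})^{2}}{p} = \frac{C^{2}p}{p} = C^{2}
```

A constant per-processor memory requirement is the most scalable outcome; the faster the scalability function grows with p, the lower the system ranks.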

**6**. Suppose a problem of size 100,000 can be solved in 15 hours on a computer today. Assuming that the execution time is solely determined by the CPU speed, determine how large a problem can be solved in 15 hours by a computer that is 100 times as fast as today's computer, if the algorithm used to solve the problem has a time complexity given by (for a problem size of n):

a. Θ(n²)

b. Θ(n log₂ n)

c. Θ(n³)
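As one worked instance, case (a): the faster machine performs 100 times the operations in the same 15 hours, so the new problem size n′ satisfies

```latex
(n')^{2} = 100\,n^{2} \;\Rightarrow\; n' = 10n = 10 \times 100{,}000 = 1{,}000{,}000
```

Cases (b) and (c) follow the same pattern of equating operation counts, though (b) must be solved numerically.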
