Computer Science Study Materials: Cheat Sheet of Computers and Information Technologies

This material is intended for aspirants who need to acquire this knowledge; it is also helpful to students and academics who want to refer to concepts such as computing architecture and more.


Data Parallelism

- Data Parallel: performing a repeated operation (or chain of operations) over vectors of data.
- Conventionally expressed as a loop, but implementations can be constructed to perform the loop operations as a single operation.
- Operations can be conditional on elements (see the a2 assignment).
- Non-unit strides are often used (second example).

Examples of data parallelism:

∀i ∈ 0..n:
    a1(i) = b1(i) + c1(i)
    if (b2(i) ≠ 0) → a2(i) = b2(i) + 4
    a3(i) = b3(i) + c3(i+1)

∀i ∈ 0, 2, 4, ..., n:
    a4(i) = b4(i) + c4(i)
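
For concreteness, here is a minimal C-style sketch of the same examples written as conventional loops (the array names a1..c4 and the bound n simply follow the slide notation; they are not defined elsewhere):

    /* forall i in 0..n */
    for (int i = 0; i <= n; i++) {
        a1[i] = b1[i] + c1[i];
        if (b2[i] != 0)              /* operation conditional on the element */
            a2[i] = b2[i] + 4;
        a3[i] = b3[i] + c3[i + 1];   /* neighbouring-element access */
    }

    /* forall i in 0, 2, 4, ..., n: non-unit stride of 2 */
    for (int i = 0; i <= n; i += 2)
        a4[i] = b4[i] + c4[i];

Every iteration is independent of the others, which is what allows a vector, SIMD, or GPU implementation to perform many of them as a single operation.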

Supporting Data Parallelism

Three basic solutions:

- Vector processors
- SIMD processors
- GPU processors

Contrary to the textbook prose, vector processors do not predate SIMD processors by 30 years; the ILLIAC IV was designed in the mid-1960s, predating vector processors! Of course, the embedding of SIMD extensions in x86 occurred much later.

Interestingly, the Burroughs BSP was effectively the successor to the ILLIAC IV. It had 16 arithmetic units and 17 memory units to facilitate parallel access across a broader range of stride lengths.

Example of Vector Processing

[Figure: block diagram of a vector processor: main memory, a vector load/store unit, vector registers, scalar registers, and pipelined functional units (FP add/subtract, FP multiply, FP divide, integer, logical).]

Vector Processing

- Vector registers: high port count to allow multiple concurrent reads/writes each cycle.
- Vector functional units: usually heavily pipelined to permit a new operation to begin each cycle.
- Vector load/store unit: to feed the beast, these are also pipelined to move data into/out of the core.
- Scalar registers: integer and floating point.
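
As a rough sketch of the idea (not the textbook's code), a loop can be strip-mined into chunks no larger than an assumed maximum vector length MVL; each chunk then corresponds to vector loads into vector registers, one pipelined vector operation, and a vector store:

    #define MVL 64   /* assumed maximum vector length */

    void vec_add_stripmined(float *a, const float *b, const float *c, int n) {
        for (int start = 0; start < n; start += MVL) {
            int len = (n - start) < MVL ? (n - start) : MVL;
            /* Conceptually: vector-load b and c into vector registers,
               one pipelined vector add, then a vector store into a. */
            for (int i = 0; i < len; i++)
                a[start + i] = b[start + i] + c[start + i];
        }
    }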

Multiple Lanes

[Figure: (a) an element group; (b) element groups distributed across multiple vector lanes.]

SIMD

[Figure: SIMD organization. A control unit (CU) broadcasts a single instruction stream (IS) to processing units PU 1 .. PU n; each PU operates on its own data stream (DS) against memory modules MM 1 .. MM m.]

GPU/GPGPU

- GPUs provide multiple types of parallelism, originally developed for processing the vectors and vector operations commonly found in graphics workloads.
- Processing with GPUs is often considered a heterogeneous processing platform.
- CUDA/OpenCL: two models for programming heterogeneous systems (CUDA is specific to NVIDIA GPGPUs and highly successful; OpenCL is more general and intended to support heterogeneous parallelism, but its community is fractured and only Intel is really supporting it).
- ROCm: AMD's continuation of OpenCL (supporting version 2.0; working on version 3.0); intended to be general purpose and to work on a multitude of GPGPU hardware.
- Metal: Apple's proprietary GPU programming framework (2014), set up to support GPGPU programming (especially machine learning, image processing, and neural networks).
- The real challenge is planning the migration of data into/out of the GPGPU. The GPGPU often has limited memory space, and feeding the beast becomes an issue (see the sketch below).
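
To make the data-migration point concrete, here is a minimal CUDA sketch of the a1(i) = b1(i) + c1(i) example (the sizes, names, and launch parameters are illustrative assumptions, not from the slides). Note how much of the code exists only to move data into and out of the device rather than to compute:

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Each thread handles one element: the data-parallel loop body
       a1(i) = b1(i) + c1(i) from the first slide. */
    __global__ void vec_add(float *a, const float *b, const float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            a[i] = b[i] + c[i];
    }

    int main(void) {
        const int n = 1 << 20;              /* illustrative vector length */
        size_t bytes = n * sizeof(float);

        float *h_a = (float *)malloc(bytes);
        float *h_b = (float *)malloc(bytes);
        float *h_c = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) { h_b[i] = 1.0f; h_c[i] = 2.0f; }

        /* "Feeding the beast": allocate device memory and copy inputs over.
           Error checking omitted for brevity. */
        float *d_a, *d_b, *d_c;
        cudaMalloc((void **)&d_a, bytes);
        cudaMalloc((void **)&d_b, bytes);
        cudaMalloc((void **)&d_c, bytes);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_c, h_c, bytes, cudaMemcpyHostToDevice);

        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

        /* Copy the result back out of the device. */
        cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);
        printf("a[0] = %f\n", h_a[0]);

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        free(h_a); free(h_b); free(h_c);
        return 0;
    }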