
Computer Software and Information Technology Engineering (CSE & IT)

Multi-Core Architectures and Programming - CS6801

An Introduction to Parallel Programming by Peter S. Pacheco

Chapter 1 Why Parallel Computing?


-> Why Parallel Computing?
-> Why We Need Ever-Increasing Performance
-> Why We’re Building Parallel Systems
-> Why We Need to Write Parallel Programs
-> How Do We Write Parallel Programs?
-> Concurrent, Parallel, Distributed

Chapter 2 Parallel Hardware and Parallel Software


-> Parallel Hardware and Parallel Software
-> Some Background: The von Neumann Architecture; Processes, Multitasking, and Threads
-> Modifications to the von Neumann Model
-> Parallel Hardware
-> Parallel Software
-> Input and Output
-> Performance of Parallel Programs (see the definitions after this list)
-> Parallel Program Design with example
-> Writing and Running Parallel Programs
-> Assumptions - Parallel Programming
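
For the performance topic above, the two basic measures used throughout the book are speedup and efficiency. As a quick reference (standard definitions; T_serial and T_parallel are the serial and parallel run times, p the number of cores or processes):

    S = \frac{T_{\mathrm{serial}}}{T_{\mathrm{parallel}}}, \qquad
    E = \frac{S}{p} = \frac{T_{\mathrm{serial}}}{p \, T_{\mathrm{parallel}}}

Linear speedup means S = p and E = 1; in practice communication and synchronization overheads keep E below 1, and usually push it lower as p grows.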

Chapter 3 Distributed-Memory Programming with MPI


-> Distributed-Memory Programming with MPI
-> The Trapezoidal Rule in MPI (see the sketch after this list)
-> Dealing with I/O
-> Collective Communication
-> MPI Derived Datatypes
-> Performance Evaluation of MPI Programs
-> A Parallel Sorting Algorithm
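
As a companion to the trapezoidal-rule topic above, here is a minimal sketch in the spirit of the book's running example. The integrand f, the interval [0, 1], and n = 1024 are illustrative choices, and the sketch assumes the number of processes evenly divides n:

#include <stdio.h>
#include <mpi.h>

static double f(double x) { return x * x; }  /* illustrative integrand */

/* Serial trapezoidal rule over [left, right]: count trapezoids of width h */
static double trap(double left, double right, int count, double h) {
    double sum = (f(left) + f(right)) / 2.0;
    for (int i = 1; i < count; i++)
        sum += f(left + i * h);
    return sum * h;
}

int main(void) {
    int rank, size;
    double a = 0.0, b = 1.0;   /* whole interval (illustrative) */
    int n = 1024;              /* total number of trapezoids (illustrative) */

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double h = (b - a) / n;
    int local_n = n / size;    /* assumes size divides n evenly */
    double local_a = a + rank * local_n * h;
    double local_int = trap(local_a, local_a + local_n * h, local_n, h);

    /* Collective communication: sum the partial integrals onto rank 0 */
    double total = 0.0;
    MPI_Reduce(&local_int, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Estimate of the integral: %.12f\n", total);
    MPI_Finalize();
    return 0;
}

Built and run with, e.g., mpicc trap.c -o trap followed by mpiexec -n 4 ./trap.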

Chapter 4 Shared-Memory Programming with Pthreads


-> Shared-Memory Programming with Pthreads
-> Processes, Threads, and Pthreads
-> Pthreads "Hello, World" Program (see the sketch after this list)
-> Matrix-Vector Multiplication
-> Critical Sections
-> Busy-Waiting
-> Mutexes
-> Producer-Consumer Synchronization and Semaphores
-> Barriers and Condition Variables
-> Read-Write Locks
-> Caches, Cache Coherence, and False Sharing
-> Thread-Safety
-> Shared-Memory Programming with OpenMP
-> The Trapezoidal Rule
-> Scope of Variables
-> The Reduction Clause
-> The parallel for Directive
-> More About Loops in OpenMP: Sorting
-> Scheduling Loops
-> Producers and Consumers
-> Caches, Cache Coherence, and False Sharing
-> Thread-Safety
-> Parallel Program Development
-> Two n-Body Solvers
-> Parallelizing the basic solver using OpenMP
-> Parallelizing the reduced solver using OpenMP
-> Evaluating the OpenMP codes
-> Parallelizing the solvers using Pthreads
-> Parallelizing the basic solver using MPI
-> Parallelizing the reduced solver using MPI
-> Performance of the MPI solvers
-> Tree Search
-> Recursive depth-first search
-> Nonrecursive depth-first search
-> Data structures for the serial implementations
-> Performance of the serial implementations
-> Parallelizing tree search
-> A static parallelization of tree search using Pthreads
-> A dynamic parallelization of tree search using Pthreads
-> Evaluating the Pthreads tree-search programs
-> Parallelizing the tree-search programs using OpenMP
-> Performance of the OpenMP implementations
-> Implementation of tree search using MPI and static partitioning
-> Implementation of tree search using MPI and dynamic partitioning
-> Which API?
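
As a starting point for the Pthreads topics above, a minimal "Hello, world" sketch. THREAD_COUNT is a fixed illustrative value (the book's version reads the thread count from the command line):

#include <stdio.h>
#include <pthread.h>

#define THREAD_COUNT 4   /* illustrative fixed value */

void *hello(void *rank) {
    long my_rank = (long) rank;   /* rank passed by value inside the pointer */
    printf("Hello from thread %ld of %d\n", my_rank, THREAD_COUNT);
    return NULL;
}

int main(void) {
    pthread_t threads[THREAD_COUNT];

    /* Start the threads, handing each its rank */
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_create(&threads[t], NULL, hello, (void *) t);

    printf("Hello from the main thread\n");

    /* Wait for every thread to finish */
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_join(threads[t], NULL);
    return 0;
}

Compile with the Pthreads flag, e.g., gcc -pthread pth_hello.c.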

Multicore Application Programming: For Windows, Linux, and Oracle Solaris by Darryl Gove

Chapter 1 Hardware, Processes, and Threads


-> Hardware, Processes, and Threads
-> Examining the Insides of a Computer
-> The Motivation for Multicore Processors
-> Supporting Multiple Threads on a Single Chip
-> Increasing Instruction Issue Rate with Pipelined Processor Cores
-> Using Caches to Hold Recently Used Data
-> Using Virtual Memory to Store Data
-> Translating from Virtual Addresses to Physical Addresses
-> The Characteristics of Multiprocessor Systems
-> How Latency and Bandwidth Impact Performance
-> The Translation of Source Code to Assembly Language
-> The Performance of 32-Bit versus 64-Bit Code
-> Ensuring the Correct Order of Memory Operations
-> The Differences Between Processes and Threads

Chapter 2 Coding for Performance


-> Coding for Performance
-> Defining Performance
-> Understanding Algorithmic Complexity
-> Why Algorithmic Complexity Is Important
-> Using Algorithmic Complexity with Care
-> How Structure Impacts Performance
-> Performance and Convenience Trade-Offs in Source Code and Build Structures
-> Using Libraries to Structure Applications
-> The Impact of Data Structures on Performance
-> The Role of the Compiler
-> The Two Types of Compiler Optimization
-> Selecting Appropriate Compiler Options
-> How Cross-File Optimization Can Be Used to Improve Performance
-> Using Profile Feedback
-> How Potential Pointer Aliasing Can Inhibit Compiler Optimizations
-> Identifying Where Time Is Spent Using Profiling
-> Commonly Available Profiling Tools
-> How Not to Optimize
-> Performance by Design

Chapter 3 Identifying Opportunities for Parallelism


-> Identifying Opportunities for Parallelism
-> Using Multiple Processes to Improve System Productivity
-> Multiple Users Utilizing a Single System
-> Improving Machine Efficiency Through Consolidation
-> Using Containers to Isolate Applications Sharing a Single System
-> Hosting Multiple Operating Systems Using Hypervisors
-> Using Parallelism to Improve the Performance of a Single Task
-> One Approach to Visualizing Parallel Applications
-> How Parallelism Can Change the Choice of Algorithms
-> Amdahl's Law (see the formula after this list)
-> Determining the Maximum Practical Threads
-> How Synchronization Costs Reduce Scaling
-> Parallelization Patterns
-> Data Parallelism Using SIMD Instructions
-> Parallelization Using Processes or Threads
-> Multiple Independent Tasks
-> Multiple Loosely Coupled Tasks
-> Multiple Copies of the Same Task
-> Single Task Split Over Multiple Threads
-> Using a Pipeline of Tasks to Work on a Single Item
-> Division of Work into a Client and a Server
-> Splitting Responsibility into a Producer and a Consumer
-> Combining Parallelization Strategies
-> How Dependencies Influence the Ability to Run Code in Parallel
-> Antidependencies and Output Dependencies
-> Using Speculation to Break Dependencies
-> Critical Paths
-> Identifying Parallelization Opportunities
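
For the Amdahl's Law topic above, the law itself as a quick reference (standard statement; p is the parallelizable fraction of the run time, n the number of processors):

    S(n) = \frac{1}{(1 - p) + p/n} \;\le\; \frac{1}{1 - p}

For example, with p = 0.9 the speedup can never exceed 10 no matter how many cores are added, which is why the chapter pairs the law with determining the maximum practical number of threads.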

Chapter 4 Synchronization and Data Sharing


-> Synchronization and Data Sharing
-> Data Races
-> Using Tools to Detect Data Races
-> Avoiding Data Races (see the sketch after this list)
-> Synchronization Primitives
-> Mutexes and Critical Regions
-> Spin Locks
-> Semaphores
-> Readers-Writer Locks
-> Barriers
-> Atomic Operations and Lock-Free Code
-> Deadlocks and Livelocks
-> Communication Between Threads and Processes
-> Storing Thread-Private Data
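
To make the data-race topics above concrete, a minimal sketch of protecting a shared counter with a mutex. The thread count and iteration count are illustrative; removing the lock/unlock pair turns the increment into a classic data race:

#include <stdio.h>
#include <pthread.h>

#define ITERATIONS 1000000   /* illustrative */

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    (void) arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);    /* enter critical region */
        counter++;           /* without the mutex, this is a data race */
        pthread_mutex_unlock(&counter_lock);  /* leave critical region */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected %d)\n", counter, 2 * ITERATIONS);
    return 0;
}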

Chapter 5 Using POSIX Threads


-> Using POSIX Threads
-> Creating Threads
-> Compiling Multithreaded Code
-> Process Termination
-> Sharing Data Between Threads
-> Variables and Memory
-> Multiprocess Programming
-> Sockets
-> Reentrant Code and Compiler Flags
-> Windows Threading

Chapter 6 Windows Threading


-> Creating Native Windows Threads (see the sketch after this list)
-> Terminating Threads
-> Creating and Resuming Suspended Threads
-> Using Handles to Kernel Resources
-> Methods of Synchronization and Resource Sharing
-> An Example of Requiring Synchronization Between Threads
-> Protecting Access to Code with Critical Sections
-> Protecting Regions of Code with Mutexes
-> Slim Reader/Writer Locks
-> Signaling Event Completion to Other Threads or Processes
-> Wide String Handling in Windows
-> Creating Processes
-> Sharing Memory Between Processes
-> Inheriting Handles in Child Processes
-> Naming Mutexes and Sharing Them Between Processes
-> Communicating with Pipes
-> Communicating Using Sockets
-> Atomic Updates of Variables
-> Allocating Thread-Local Storage
-> Setting Thread Priority
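
For the native Windows threading topics above, a minimal sketch of creating a thread and waiting on its kernel handle; the thread routine and its output are illustrative:

#include <windows.h>
#include <stdio.h>

DWORD WINAPI thread_main(LPVOID param) {
    (void) param;   /* no argument needed in this sketch */
    printf("Hello from thread %lu\n", GetCurrentThreadId());
    return 0;
}

int main(void) {
    /* Default security, default stack size, no argument, run immediately */
    HANDLE h = CreateThread(NULL, 0, thread_main, NULL, 0, NULL);
    if (h == NULL) {
        fprintf(stderr, "CreateThread failed: %lu\n", GetLastError());
        return 1;
    }
    WaitForSingleObject(h, INFINITE);  /* wait for the thread to finish */
    CloseHandle(h);                    /* release the kernel handle */
    return 0;
}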

Chapter 7 Using Automatic Parallelization and OpenMP


-> Using Automatic Parallelization and OpenMP
-> Using Automatic Parallelization to Produce a Parallel Application
-> Identifying and Parallelizing Reductions
-> Automatic Parallelization of Codes Containing Calls
-> Assisting the Compiler in Automatically Parallelizing Code
-> Using OpenMP to Produce a Parallel Application
-> Using OpenMP to Parallelize Loops
-> Runtime Behavior of an OpenMP Application
-> Variable Scoping Inside OpenMP Parallel Regions
-> Parallelizing Reductions Using OpenMP (see the sketch after this list)
-> Accessing Private Data Outside the Parallel Region
-> Improving Work Distribution Using Scheduling
-> Using Parallel Sections to Perform Independent Work
-> Nested Parallelism
-> Using OpenMP for Dynamically Defined Parallel Tasks
-> Keeping Data Private to Threads
-> Controlling the OpenMP Runtime Environment
-> Waiting for Work to Complete
-> Restricting the Threads That Execute a Region of Code
-> Ensuring That Code in a Parallel Region Is Executed in Order
-> Collapsing Loops to Improve Workload Balance
-> Enforcing Memory Consistency
-> An Example of Parallelization
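
For the OpenMP reduction topic above, a minimal sketch of parallelizing a sum with the reduction clause; the array size and contents are illustrative:

#include <stdio.h>
#include <omp.h>

#define N 1000000   /* illustrative problem size */

int main(void) {
    static double a[N];
    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    double sum = 0.0;
    /* Each thread accumulates into a private copy of sum;
       OpenMP combines the copies with + when the loop ends */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f (threads available: %d)\n", sum, omp_get_max_threads());
    return 0;
}

Compile with OpenMP enabled, e.g., gcc -fopenmp reduce.c.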

Chapter 8 Hand-Coded Synchronization and Sharing


-> Hand-Coded Synchronization and Sharing
-> Atomic Operations
-> Using Compare and Swap Instructions to Form More Complex Atomic Operations (see the sketch after this list)
-> Enforcing Memory Ordering to Ensure Correct Operation
-> Compiler Support of Memory-Ordering Directives
-> Reordering of Operations by the Compiler
-> Volatile Variables
-> Operating System–Provided Atomics
-> Lockless Algorithms
-> Dekker’s Algorithm
-> Producer-Consumer with a Circular Buffer
-> Scaling to Multiple Consumers or Producers
-> Scaling the Producer-Consumer to Multiple Threads
-> Modifying the Producer-Consumer Code to Use Atomics
-> The ABA Problem
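
For the compare-and-swap topic above, a minimal sketch of building a more complex atomic operation (an atomic add) from a CAS retry loop. It uses C11 <stdatomic.h> rather than the raw machine instructions the chapter discusses; that is an implementation choice for portability, not the chapter's exact code:

#include <stdio.h>
#include <stdatomic.h>

static _Atomic long counter = 0;

/* Atomically add delta to *target using compare-and-swap */
void atomic_add(_Atomic long *target, long delta) {
    long old = atomic_load(target);
    /* On failure, old is refreshed with the current value and we retry */
    while (!atomic_compare_exchange_weak(target, &old, old + delta))
        ;
}

int main(void) {
    atomic_add(&counter, 5);
    atomic_add(&counter, -2);
    printf("counter = %ld\n", atomic_load(&counter));
    return 0;
}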

Chapter 9 Scaling with Multicore Processors


-> Scaling with Multicore Processors
-> Constraints to Application Scaling
-> Hardware Constraints to Scaling
-> Bandwidth Sharing Between Cores
-> False Sharing
-> Cache Conflict and Capacity
-> Pipeline Resource Starvation
-> Operating System Constraints to Scaling
-> Multicore Processors and Scaling

Chapter 10 Other Parallelization Technologies


-> Other Parallelization Technologies
-> GPU-Based Computing
-> Language Extensions
-> Alternative Languages
-> Clustering Technologies
-> Transactional Memory
-> Vectorization
