








PART-1 Systems Modeling, Clustering and Virtualization: Scalable Computing over the Internet, Technologies for Network-Based Systems, System Models for Distributed and Cloud Computing, Software Environments for Distributed Systems and Clouds, Performance, Security and Energy Efficiency.

1.1.1 The Age of Internet Computing
Billions of people use the Internet every day. As a result, supercomputer sites and large data centers must provide high-performance computing (HPC) services to huge numbers of Internet users concurrently. However, raw HPC speed is no longer the optimal measure of system performance; the emergence of computing clouds instead demands high-throughput computing (HTC) systems built with parallel and distributed computing technologies. Data centers are being upgraded with fast servers, large storage systems, and high-bandwidth networks to support these emerging technologies.

1.1.2 The Platform Evolution
Computer technology has gone through five generations of development, each generation lasting from 10 to 20 years, with successive generations overlapping by about 10 years. Earlier generations of systems were built to satisfy the demands of large businesses and government organizations. Since 1990, the use of both HPC and HTC systems, hidden in clusters, grids, or Internet clouds, has proliferated.

High-Performance Computing: For many years, HPC systems emphasized raw speed performance, which increased to petaflops (Pflops) by 2010. This improvement was driven mainly by demand from the scientific, engineering, and manufacturing communities. For example, the Top 500 most powerful computer systems in the world are ranked by floating-point speed on the Linpack benchmark. However, supercomputer users represent less than 10% of all computer users.
Today, the majority of computer users rely on desktop computers or large servers when they conduct Internet searches and market-driven computing tasks.

High-Throughput Computing: The development of market-oriented high-end computing systems is undergoing a strategic change from the HPC paradigm to the HTC paradigm. The HTC paradigm pays more attention to high-flux computing, whose main applications are Internet searches and web services accessed by millions or more users simultaneously. The performance goal thus shifts to high throughput, measured as the number of tasks completed per unit of time. HTC technology must not only improve batch processing speed, but also address the acute problems of cost, energy savings, security, and reliability at many data and enterprise computing centers.

COMPUTING PARADIGM DISTINCTIONS
Centralized computing: A computing paradigm in which all computer resources are centralized in one physical system. All resources (processors, memory, and storage) are fully shared and tightly coupled within one integrated OS. Many data centers and supercomputers are centralized systems.
Parallel computing: All processors are either tightly coupled with centralized shared memory or loosely coupled with distributed memory. Some authors refer to this discipline as parallel processing. Interprocessor communication is accomplished through shared memory or via message passing. A computer system capable of parallel computing is commonly known as a parallel computer.
Distributed computing: A distributed system consists of multiple autonomous computers, each with its own private memory, communicating through a computer network. Information exchange in a distributed system is accomplished through message passing. A computer program that runs in a distributed system is known as a distributed program.
Cloud computing: An Internet cloud of resources can be either a centralized or a distributed computing system. The cloud applies parallel or distributed computing, or both. Clouds can be built with physical or virtualized resources over large data centers that are centralized or distributed. Some authors consider cloud computing a form of utility computing or service computing.

1.1.3 Scalable Computing Trends and New Paradigms
It is important to understand how distributed systems emphasize both resource distribution and concurrency, i.e., a high degree of parallelism (DoP).

Degrees of Parallelism:
Bit-level parallelism (BLP): Converts bit-serial processing to word-level processing gradually. Over the years, users graduated from 4-bit microprocessors to 8-, 16-, 32-, and 64-bit CPUs.
Instruction-level parallelism (ILP): The processor executes multiple instructions simultaneously.
1.2.2 GPU Computing to Exascale and Beyond
Many-core GPUs: Graphics processing units (GPUs) are specialized many-core processors designed for a high degree of parallel processing; they contain a large number of simpler, independent processing cores. Many-core processors are used extensively in embedded computers and high-performance computing (mainframes, supercomputers). A GPU is a graphics co-processor mounted on a computer's graphics card to perform high-level graphics tasks, for example in video editing applications. A modern GPU chip can be built with hundreds of processing cores, which provides massive parallelism at the multicore and multithreading levels.

GPU Programming Model: The interaction between a CPU and a GPU in performing parallel execution of floating-point operations concurrently is shown in the figure below. The CPU instructs the GPU to perform massive data processing, and the bandwidth must be matched between main memory and GPU memory. The major benefits of the GPU over the CPU are power efficiency and massive parallelism. A minimal programming sketch of this interaction follows.
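To make the CPU/GPU interaction concrete, here is a small sketch in Python, assuming the CuPy library is available on a machine with an NVIDIA GPU; the array sizes and the arithmetic are illustrative choices, not details from the text.

```python
import numpy as np
import cupy as cp  # assumption: CuPy provides a NumPy-like API backed by the GPU

# Host (CPU) prepares the data in main memory.
host_a = np.random.rand(1_000_000).astype(np.float32)
host_b = np.random.rand(1_000_000).astype(np.float32)

# The CPU instructs the GPU: data is copied from main memory to GPU memory,
# so the bandwidth between the two memories matters for overall performance.
dev_a = cp.asarray(host_a)
dev_b = cp.asarray(host_b)

# The floating-point work executes in parallel across the GPU's many cores.
dev_c = dev_a * dev_b + 2.0

# Results are copied back from GPU memory to main memory for the CPU to use.
host_c = cp.asnumpy(dev_c)
print(host_c[:5])
```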
1.2.3 Memory, Storage and Wide-Area Networks
Memory Technology: The traditional RAM in computers is DRAM. Hard-drive capacity has increased from 260 MB to 3 TB and lately 5 TB (by Seagate). Static RAM (SRAM) is "static" because it does not have to be continuously refreshed like dynamic RAM; SRAM is faster but also more expensive and is used inside the CPU. Faster processor speeds and higher memory capacities result in a wider gap between processors and memory, a long-standing problem.

Disks and Storage Technology: The rapid growth of flash memory and solid-state drives (SSDs) also has an impact on the future of HPC and HTC systems. An SSD can handle 300,000 to 1 million write cycles per block, increasing speed and performance. Power consumption must also be considered before planning any increase in capacity.

System-Area Interconnects: The nodes in small clusters are interconnected by an Ethernet switch or a LAN. As shown in the figure below, a LAN is used to connect clients to servers, a storage area network (SAN) connects servers to network storage such as disk arrays, and network attached storage (NAS) connects clients directly to disk arrays. All these types of network appear in a large cluster built with commercial network components (Cisco, Juniper). If not much data is shared (overlapped), we can build a small cluster with an Ethernet switch plus copper cables to link the end machines (clients/servers).

Wide-Area Network (WAN): Ethernet bandwidth has grown rapidly from 10 Mbps to 1 Gbps and is still increasing. Different bandwidths are needed at the local, national, and international levels of networking. As more computers are used concurrently in the coming years, higher bandwidth will add speed and capacity to cloud and distributed computing. Note that most data centers use Gigabit Ethernet as the interconnect in their server clusters.

1.2.4 Virtual Machines and Middleware
A typical computer runs a single OS image at a time. This leads to a rigid architecture that tightly couples applications to a specific hardware platform; an application that works on one system might not work on another system with a different OS (it is non-portable). To build large clusters, grids, and clouds, we need to provision computing, storage, and networking resources in a virtualized manner, so that a cloud with limited physical resources can aggregate them dynamically to produce the expected results. As seen in the figure above, the host machine is equipped with the physical hardware. A virtual machine (VM) is built with virtual resources managed by a guest OS to run a specific application (e.g., VMware running Ubuntu for Hadoop). Between the VMs and the host platform we need a middleware layer called a virtual machine monitor (VMM). A hypervisor (VMM) is a program that allows different operating systems to share a single hardware host. This approach is called a bare-metal VM because the hypervisor handles CPU, memory, and I/O directly. A VM can also be implemented in dual mode, as shown above; part of the VMM runs at the user level and another part runs at the supervisor level.
Cluster Architecture: The figure below shows the architecture of a typical server cluster with a low-latency, high-bandwidth network. To build a large cluster, the interconnection network can use Gigabit Ethernet, Myrinet, or InfiniBand switches. Through hierarchical construction using a SAN, LAN, or WAN, scalable clusters can be built with an increasing number of nodes. The cluster is connected to the Internet through a VPN (virtual private network) gateway, which has an IP address used to locate the cluster. Generally, most clusters have loosely coupled nodes, which are autonomous computers with their own OS.

Major Cluster Design Issues: A cluster-wide OS, or a single OS that virtually controls the whole cluster, is not yet available. This makes designing and achieving a single-system image (SSI) difficult and expensive. All applications must therefore rely on middleware to provide the coupling between machines in a cluster or between clusters.

1.3.2 Grid Computing Infrastructures
Grid computing is designed to allow close interaction among applications running on distant computers simultaneously.

Computational Grids: A computational grid provides an infrastructure that couples computers, software, hardware, sensors, and other resources together. A grid can be constructed across LANs, WANs, and other networks on a regional, national, or global scale; grids are also termed virtual platforms. Computers, workstations, servers, and clusters are used in a grid, while PCs, laptops, and other devices can be viewed as access devices to a grid system. The figure below shows an example grid built by different organizations over multiple systems of different types, running different operating systems.
Grid Families: Grid technology demands new distributed computing models, software/middleware support, network protocols, and hardware infrastructures. National grid projects have been followed by industrial grid platforms from IBM, Microsoft, HP, Dell-EMC, Cisco, and Oracle.

1.3.3 Peer-to-Peer Network Families
P2P Systems: In a P2P system, every node acts as both a client and a server, providing part of the system resources. Peer machines are simply client computers connected to the Internet. All client machines act autonomously and can join or leave the system freely; no master-slave relationship exists among the peers, and no central coordination or central database is needed. In other words, no peer machine has a global view of the entire P2P system. The system is self-organizing, with distributed control.

Overlay Networks: An overlay network is a virtual network formed by mapping each physical machine to a logical ID through a virtual mapping. If a new peer joins the system, its peer ID is added as a node in the overlay network. The P2P overlay network thus captures the logical connectivity among the peers rather than their physical connections. A minimal sketch of such a mapping is shown below.
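As a concrete illustration of the ID mapping just described, the following Python sketch builds a toy structured overlay; the hash-based ID assignment, the ID space size, and the routing rule are illustrative assumptions (roughly Chord-like), not details given in the text.

```python
import hashlib

ID_SPACE = 2 ** 16  # size of the logical ID space (illustrative choice)

def peer_id(name):
    """Map a physical machine's address (or a key) to a logical overlay ID."""
    digest = hashlib.sha1(name.encode()).hexdigest()
    return int(digest, 16) % ID_SPACE

class Overlay:
    def __init__(self):
        self.nodes = {}  # peer ID -> physical address

    def join(self, address):
        """A new peer joins: its peer ID is added as a node in the overlay."""
        node_id = peer_id(address)
        self.nodes[node_id] = address
        return node_id

    def route(self, key):
        """Route a key to the peer whose logical ID is closest (logical, not physical, distance)."""
        return min(self.nodes, key=lambda n: abs(n - peer_id(key)))

overlay = Overlay()
for addr in ("10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000"):
    overlay.join(addr)
print(overlay.route("some-shared-file"))  # ID of the peer responsible for this key
```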
P2P Application Families: There are four types of P2P networks: distributed file sharing, collaborative platforms, distributed P2P computing, and other platforms. Examples include BitTorrent, Napster, Skype, Genome@home, JXTA, and .NET.

1.3.4 Cloud Computing Over the Internet
Cloud computing is defined by IBM as follows: "A cloud is a pool of virtualized computer resources. A cloud can host a variety of different workloads that include batch-style backend jobs and interactive and user-facing applications." A cloud allows workloads to be deployed and scaled out through rapid provisioning of physical or virtual systems. The cloud supports redundant, self-recovering, and highly scalable programming models that allow workloads to recover from software or hardware failures.

Platform as a Service (PaaS): In this model, the user can install his or her own applications onto a virtualized cloud platform. PaaS includes middleware, databases, development tools, and some programming languages; it includes both hardware and software. The provider supplies the APIs and software tools (e.g., Java, Python, .NET). The user need not manage the underlying cloud infrastructure, which is handled by the provider.

Software as a Service (SaaS): This refers to browser-initiated application software delivered to paying cloud customers. The model is used in business processes, industry applications, CRM, ERP, HR, and collaborative (joint) applications. Examples: Google Apps, Twitter, Facebook, Cloudera, Salesforce.

1.4 SOFTWARE ENVIRONMENTS FOR DISTRIBUTED SYSTEMS AND CLOUDS
1.4.1 Service-Oriented Architecture (SOA)
In grids/web services, Java, and CORBA, an entity is, respectively, a service, a Java object, or a CORBA distributed object, implemented in a variety of languages. These architectures build on the traditional seven Open Systems Interconnection (OSI) layers, which provide the base networking abstractions. On top of this sits a base software environment, which would be .NET or Apache Axis for web services, the Java Virtual Machine for Java, and a broker network for CORBA. On top of that base environment one builds a higher-level environment reflecting the special features of the distributed computing environment.

The Evolution of SOA: Service-oriented architecture has evolved over the years. SOA applies to building grids, clouds, grids of clouds, clouds of grids, clouds of clouds (also known as interclouds), and systems of systems in general. A large number of sensors provide data-collection services, denoted in the figure as SS (sensor service).
A sensor can be a ZigBee device, a Bluetooth device, a WiFi access point, a personal computer, a GPS device, or a wireless phone, among other things. Raw data is collected by sensor services. All the SS devices interact with large or small computers, many forms of grids, databases, the compute cloud, the storage cloud, the filter cloud, the discovery cloud, and so on. Filter services (fs in the figure) are used to eliminate unwanted raw data in order to respond to specific requests from the web, the grid, or web services.

1.4.2 Trends toward Distributed Operating Systems
A distributed system inherently has multiple system images, mainly because each node machine runs its own independent operating system. To promote resource sharing and fast communication among node machines, it is best to have a distributed OS that manages all resources coherently and efficiently. Such a system is most likely a closed system, and it will likely rely on message passing and RPCs for internode communication. A distributed OS is crucial for upgrading the performance, efficiency, and flexibility of distributed applications. (a) Amoeba versus DCE; (b) MOSIX2 for Linux clusters.

1.4.3 Parallel and Distributed Programming Models
This section explores programming models for distributed computing that aim for scalable performance and application flexibility.

Message-Passing Interface (MPI): MPI is a library of subprograms that can be called from C or FORTRAN to write parallel programs running on a distributed system. The goal is to serve clusters, grid systems, and P2P systems with upgraded web services and other utility applications. Distributed programming can also be supported by the Parallel Virtual Machine (PVM).

MapReduce: MapReduce is a web programming model for scalable data processing on large clusters. It is applied mainly in web-scale search and cloud computing applications. The user specifies a Map function to generate a set of intermediate key/value pairs, and then applies a Reduce function to merge all intermediate values that share the same (intermediate) key. MapReduce is highly scalable, exploits high degrees of parallelism at different job levels, and can handle terabytes of data on thousands of client machines. A minimal single-machine sketch of the model is shown below.
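The following is a minimal, single-process Python sketch of the Map and Reduce steps just described, using word counting as the example; a real framework such as Hadoop distributes these phases across thousands of machines, which this sketch does not attempt.

```python
from collections import defaultdict

def map_func(document):
    """Map: emit an intermediate (key, value) pair for every word."""
    for word in document.split():
        yield word.lower(), 1

def reduce_func(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return key, sum(values)

def map_reduce(documents):
    groups = defaultdict(list)
    for doc in documents:                  # map phase
        for key, value in map_func(doc):
            groups[key].append(value)      # shuffle: group values by intermediate key
    return dict(reduce_func(k, v) for k, v in groups.items())  # reduce phase

print(map_reduce(["the cloud scales", "the grid and the cloud"]))
# {'the': 3, 'cloud': 2, 'scales': 1, 'grid': 1, 'and': 1}
```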
Amdahl's Law: If α is the fraction of a program that must be executed sequentially, the speedup achievable on n processors is S = 1/(α + (1 - α)/n). The maximum speedup of n can be obtained only if α is reduced to zero, that is, if the code is fully parallelizable.

1.5.2 Fault Tolerance and System Availability
In addition to performance, system availability and application flexibility are two other important design goals in a distributed computing system.

System Availability: High availability (HA) is needed in all clusters, grids, P2P networks, and cloud systems. A system is highly available if it has a long mean time to failure (MTTF) and a short mean time to repair (MTTR):

System Availability = MTTF / (MTTF + MTTR)

System availability depends on many factors, including hardware, software, and network components. Any failure that leads to the failure of the total system is known as a single point of failure. The general goal of any manufacturer or user is to build a system with no single point of failure; the factors to consider in achieving this goal are adding hardware redundancy, increasing component reliability, and designing for testability. A small worked example of these formulas follows.
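Here is a short Python sketch that evaluates the availability formula and the speedup expression above; the MTTF, MTTR, α, and n values are hypothetical, chosen only to show the calculations.

```python
def availability(mttf_hours, mttr_hours):
    """System Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

def amdahl_speedup(alpha, n):
    """Amdahl's Law: speedup on n processors with sequential fraction alpha."""
    return 1.0 / (alpha + (1.0 - alpha) / n)

# Hypothetical numbers for illustration only.
print(f"Availability: {availability(1000, 2):.4f}")                    # ~0.9980
print(f"Speedup (alpha=0.05, n=256): {amdahl_speedup(0.05, 256):.1f}")  # ~18.6
print(f"Speedup (alpha=0,    n=256): {amdahl_speedup(0.0, 256):.0f}")   # 256
```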
1.5.3 Network Threats and Data Integrity
This section introduces system vulnerabilities, network threats, defense countermeasures, and copyright protection in distributed and cloud computing systems.

Threats to Networks and Systems: The figure below summarizes various attack types and the damage they cause to users. Information leaks lead to a loss of confidentiality. Loss of data integrity can be caused by user alteration, Trojan horses, and service spoofing attacks, while denial of service (DoS) leads to a loss of Internet connectivity and system operation. Users need to protect clusters, grids, clouds, and P2P systems from malicious intrusions that may destroy hosts, network, and storage resources.

Security Responsibilities: The main responsibilities are confidentiality, integrity, and availability for most Internet service providers and cloud users. In the order SaaS, PaaS, IaaS, the provider transfers increasing security control to the users. In brief, the SaaS model relies on the cloud provider for all security features. At the other extreme, IaaS expects users to take control of nearly all security functions, while availability is still decided by the provider. Finally, the PaaS model divides the security aspects in this way: data integrity and availability rest with the provider, while confidentiality and privacy control are the burden of the users.

Copyright Protection: Collusive (secret agreement) piracy is the main source of copyright violation within the boundary of a P2P network. Clients may illegally share software allotted only to them with others, thus triggering piracy. One can develop a proactive (acting before damage happens) content poisoning scheme to stop colluders (conspirators) and pirates, detect them, and prevent them from proceeding with their illegal activity.

System Defense Technologies: There are three generations of network defense. In the first generation, tools were designed to prevent intrusions; these established themselves as access control policies, cryptographic systems, and the like, but an intruder could always slip into the system because a weak link existed every time. The second generation detected intrusions in a timely manner to enforce remedies; examples include firewalls, intrusion detection systems (IDS), public key infrastructure (PKI) services (banking, e-commerce), and reputation systems. The third generation provides more intelligent responses to intrusions.

Data Protection Infrastructure: A security infrastructure is required to protect web and cloud services. At the user level, one needs to perform trust negotiation and reputation aggregation over all users. At the application end, we need security precautions and intrusion detection systems to restrain viruses, worms, malware, and DDoS attacks. Piracy and copyright violations should also be detected and contained. These topics can be studied in detail later, when the three types of clouds and the general services offered by the cloud are discussed.

1.5.4 Energy Efficiency in Distributed Computing
The primary performance goals in conventional parallel and distributed computing systems are high performance and high throughput, subject to some form of performance reliability (e.g., fault tolerance and security).

Energy Consumption of Unused Servers: To run a server farm (data center), a company has to spend a huge amount of money on hardware, software, operational support, and energy every year. Therefore, companies should thoroughly examine whether their installed server farm (more specifically, the volume of provisioned resources) is at an appropriate level, particularly in terms of utilization. It was estimated in the past that, on average, one-sixth (15 percent) of the full-time servers in a company are left powered on without being actively used (i.e., they are idling) on a daily basis. This indicates that, with 44 million servers in the world, around 4.7 million servers are not doing any useful work.

Reducing Energy in Active Servers: In addition to identifying unused or underutilized servers for energy savings, it is also necessary to apply appropriate techniques to decrease energy consumption in active distributed systems with negligible influence on their performance.