IEEE 2019-2020 Parallel and Distributed Computing Projects in Java

Abstract: Intercloud seeks to facilitate resource sharing among clouds. To support Intercloud, a trust evaluation framework among clouds and users is required. For trust evaluation, conventional protocols are typically based on a centralized architecture focusing on a one-way relationship. For Intercloud, the environment is highly dynamic and distributed, and relationships can be one-way or two-way (i.e., clouds provide services to each other). This paper presents a distributed trust evaluation protocol with privacy protection for Intercloud. The new contributions and innovative features are summarized below. First, feedback is protected by homomorphic encryption with verifiable secret sharing. Second, to cater to the dynamic nature of Intercloud, trust evaluation can be conducted in a distributed manner and is functional even when some of the parties are offline. Third, to facilitate customized trust evaluation, an innovative mechanism is used to store feedback, such that it can be processed flexibly while protecting feedback privacy. The protocol has been proved based on a formal security model. Simulations have been performed to demonstrate the effectiveness of the protocol. The results show that even when half of the clouds are malicious or offline, by choosing suitable operational parameters the protocol can still support effective trust evaluation with privacy protection.
Abstract: Functional dependencies (FDs) play a very important role in many data management tasks such as schema normalization, data cleaning, and query optimization. Meanwhile, there are ever-increasing application demands for efficient FD discovery on large-scale datasets. Unfortunately, due to huge runtime and memory overhead, the existing single-machine FD discovery algorithms are inefficient for large-scale datasets. Recently, distributed data-parallel computing has become the de facto standard for large-scale data processing. However, it is challenging to design an efficient distributed FD discovery algorithm. In this paper, we present SmartFD, which is an efficient and scalable algorithm for distributed FD discovery. First, we propose a novel attribute sorting-based algorithm framework. Next, to discover all the FDs grouped by a given attribute, we propose an efficient distributed algorithm AFDD (Attribute-centric Functional Dependency Discovery). In AFDD, we design an FSEA (Fast Sampling and Early Aggregation) mechanism to improve the efficiency of distributed sampling and propose a memory-efficient index-based method for distributed FD validation. Moreover, AFDD employs an attribute-parallel method to accelerate the pruning-and-generation of candidate FDs. Furthermore, we propose an adaptive switching strategy between distributed sampling and distributed validation based on the unified time-based efficiency metric. Also, we employ a distributed probing based method to make the switching strategy more accurate. Experimental results on Apache Spark reveal that SmartFD outperforms the state-of-the-art single-machine algorithm HyFD and the existing distributed algorithm HFDD with 3.2x-44.9x and 2.5x-455.7x speedup respectively. Moreover, SmartFD achieves good row scalability and column scalability. Additionally, SmartFD has sub-linear node scalability.
Abstract: Support Vector Machines (SVM) are widely used as supervised learning models to solve the classification problem in machine learning. Training SVMs for large datasets is an extremely challenging task due to excessive storage and computational requirements. To tackle so-called big data problems, one needs to design scalable distributed algorithms to parallelize the model training and to develop efficient implementations of these algorithms. In this paper, we propose a distributed algorithm for SVM training that is scalable and communication-efficient. The algorithm uses a compact representation of the kernel matrix, which is based on the QR decomposition of low-rank approximations, to reduce both computation and storage requirements for the training stage. This is accompanied by considerable reduction in communication required for a distributed implementation of the algorithm. Experiments on benchmark data sets with up to five million samples demonstrate negligible communication overhead and scalability on up to 64 cores. Execution times are vast improvements over other widely used packages. Furthermore, the proposed algorithm has linear time complexity with respect to the number of samples making it ideal for SVM training on decentralized environments such as smart embedded systems and edge-based internet of things, IoT.
The tremendous growth of cloud computing and large-scale data analytics highlight the importance of reducing datacenter power consumption and environmental impact of brown energy. While many Internet service operators have at least partially powered their datacenters by green energy, it is challenging to effectively utilize green energy due to the intermittency of renewable sources, such as solar or wind. We find that the geographical diversity of internet-scale services can be carefully scheduled to improve the efficiency of applying green energy in datacenters. In this paper, we propose a holistic heterogeneity-aware cloud workload management approach, sCloud, that aims to maximize the system goodput in distributed self-sustainable datacenters. sCloud adaptively places the transactional workload to distributed datacenters, allocates the available resource to heterogeneous workloads in each datacenter, and migrates batch jobs across datacenters, while taking into account the green power availability and QoS requirements. We formulate the transactional workload placement as a constrained optimization problem that can be solved by nonlinear programming. Then, we propose a batch job migration algorithm to further improve the system goodput when the green power supply varies widely at different locations. Finally, we extend sCloud by integrating a flexible batch job manager to dynamically control the job execution progress without violating the deadlines. We have implemented sCloud in a university cloud testbed with real-world weather conditions and workload traces. Experimental results demonstrate sCloud can achieve near-to-optimal system performance while being resilient to dynamic power availability. sCloud with the flexible batch job management approach outperforms a heterogeneity-oblivious approach by 37 percent in improving system goodput and 33 percent in reducing QoS violations.
Abstract: To accommodate massive digital data, distributed data stores have become the main solution for cloud services. Among others, key-value stores are widely adopted due to their superior performance. But with the rapid growth of cloud storage, there are growing concerns about data privacy. In this paper, we design and build EncKV, an encrypted and distributed key-value store with rich query support. First, EncKV partitions data records with secondary attributes into a set of encrypted key-value pairs to hide relations between data values. Second, EncKV uses the latest cryptographic techniques for searching on encrypted data, i.e., searchable symmetric encryption (SSE) and order-revealing encryption (ORE) to support secure exact-match and range-match queries, respectively. It further employs a framework for encrypted and distributed indexes supporting query processing in parallel. To address inference attacks on ORE, EncKV is equipped with an enhanced ORE scheme with reduced leakage. For practical considerations, EncKV also enables secure system scaling in a minimally intrusive way. We complete the prototype implementation and deploy it on Amazon Cloud. Experimental results confirm that EncKV preserves the efficiency and scalability of distributed key-value stores.
Cluster schedulers provide flexible resource sharing mechanism for best-effort cloud jobs, which occupy a majority in modern datacenters. Properly tuning a scheduler's configurations is the key to these jobs' performance because it decides how to allocate resources among them. Today's cloud scheduling systems usually rely on cluster operators to set the configuration and thus overlook the potential performance improvement through optimally configuring the scheduler according to the heterogeneous and dynamic cloud workloads. In this paper, we introduce AdaptiveConfig, a run-time configurator for cluster schedulers that automatically adapts to the changing workload and resource status in two steps. First, a comparison approach estimates jobs' performances under different configurations and diverse scheduling scenarios. The key idea here is to transform a scheduler's resource allocation mechanism and their variable influence factors (configurations, scheduling constraints, available resources, and workload status) into business rules and facts in a rule engine, thereby reasoning about these correlated factors in job performance comparison. Secondly, a workload-adaptive optimizer transforms the cluster-level searching of huge configuration space into an equivalent dynamic programming problem that can be efficiently solved at scale. We implement AdaptiveConfig on the popular YARN Capacity and Fair schedulers and demonstrate its effectiveness using real-world Facebook and Google workloads, i.e. successfully finding best configurations for most of scheduling scenarios and considerably reducing latencies by a factor of two with low optimization time.
Abstract: Application partitioning that splits the executions into local and remote parts, plays a critical role in high-performance mobile offloading systems. Optimal partitioning will allow mobile devices to obtain the highest benefit from Mobile Cloud Computing (MCC) or Mobile Edge Computing (MEC). Due to unstable resources in the wireless network (network disconnection, bandwidth fluctuation, network latency, etc.) and at the service nodes (different speeds of mobile devices and cloud/edge servers, memory, etc.), static partitioning solutions with fixed bandwidth and speed assumptions are unsuitable for offloading systems. In this paper, we study how to dynamically partition a given application effectively into local and remote parts while reducing the total cost to the degree possible. For general tasks (represented in arbitrary topological consumption graphs), we propose a Min-Cost Offloading Partitioning (MCOP) algorithm that aims at finding the optimal partitioning plan (i.e., to determine which portions of the application must run on the mobile device and which portions on cloud/edge servers) under different cost models and mobile environments. Simulation results show that the MCOP algorithm provides a stable method with low time complexity which significantly reduces execution time and energy consumption by optimally distributing tasks between mobile devices and servers, besides it adapts well to mobile environmental changes.
Abstract: Memory access is known to be the main bottleneck for shared-memory parallel graph applications especially for large and irregular graphs. Propagation blocking (PB) idea was proposed recently to improve the parallel performance of PageRank and sparse matrix and vector multiplication operations. The idea is based on separating parallel computation into two phases, binning and accumulation, such that random memory accesses are replaced with contiguous accesses. In this paper, we propose an algorithm that allows execution of these two phases concurrently. We propose several improvements to increase parallel throughput, reduce memory overhead, and improve work efficiency. Our experimental results show that our proposed algorithms improve shared-memory parallel throughput by a factor of up to 2x compared to the original PB algorithms. We also show that the memory overhead can be reduced significantly (from 170% down to less than 5%) without significant degradation of performance. Finally, we demonstrate that our concurrent execution model allows asynchronous parallel execution, leading to significant work efficiency in addition to throughput improvements.
Abstract: Benefitting from large-scale training datasets and the complex training network, Convolutional Neural Networks (CNNs) are widely applied in various fields with high accuracy. However, the training process of CNNs is very time-consuming, where large amounts of training samples and iterative operations are required to obtain high-quality weight parameters. In this paper, we focus on the time-consuming training process of large-scale CNNs and propose a Bi-layered Parallel Training (BPT-CNN) architecture in distributed computing environments. BPT-CNN consists of two main components: (a) an outer-layer parallel training for multiple CNN subnetworks on separate data subsets, and (b) an inner-layer parallel training for each subnetwork. In the outer-layer parallelism, we address critical issues of distributed and parallel computing, including data communication, synchronization, and workload balance. A heterogeneous-aware Incremental Data Partitioning and Allocation (IDPA) strategy is proposed, where large-scale training datasets are partitioned and allocated to the computing nodes in batches according to their computing power. To minimize the synchronization waiting during the global weight update process, an Asynchronous Global Weight Update (AGWU) strategy is proposed. In the inner-layer parallelism, we further accelerate the training process for each CNN subnetwork on each computer, where computation steps of convolutional layer and the local weight training are parallelized based on task-parallelism. We introduce task decomposition and scheduling strategies with the objectives of thread-level load balancing and minimum waiting time for critical paths. Extensive experimental results indicate that the proposed BPT-CNN effectively improves the training performance of CNNs while maintaining the accuracy.
We propose a reinforcement learning algorithm, Megh, for live migration of virtual machines that simultaneously reduces the cost of energy consumption and enhances the performance. Megh learns the uncertain dynamics of workloads as-it-goes. Megh uses a dimensionality reduction scheme to projectthe combinatorially explosive state-action space to a polynomial dimensional space. These schemes enable Megh to be scalable and to work in real-time. We experimentally validate that Megh is more cost-effective and time-efficient than the MadVM and MMT algorithms.