IEEE 2017-2018 Networking Projects in Java

Abstract:

Open communications over the Internet pose serious threats to countries with repressive regimes, leading them to develop and deploy censorship mechanisms within their networks. Unfortunately, existing censorship circumvention systems do not provide high availability guarantees to their users, as censors can easily identify, and hence disrupt, the traffic belonging to these systems using today's advanced censorship technologies. In this paper, we propose Serving the Web by Exploiting Email Tunnels (SWEET), a highly available censorship-resistant infrastructure. SWEET works by encapsulating a censored user's traffic inside email messages that are carried over public email services like Gmail and Yahoo Mail. As the operation of SWEET is not bound to any specific email provider, we argue that a censor would need to block email communications altogether in order to disrupt SWEET, which is unlikely as email constitutes an important part of today's Internet. Through experiments with a prototype of our system, we find that SWEET's performance is sufficient for Web browsing; in particular, regular Web sites are downloaded within a couple of seconds.
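
To make the tunneling idea concrete, the following is a minimal, hypothetical sketch of how a censored client's HTTP request could be framed as an email-safe text body, assuming a SWEET-style proxy on the other side parses an invented "X-SWEET-Session" header and a Base64 payload. It is not the authors' implementation and omits the actual email transport.

```java
// Hypothetical framing sketch (not the SWEET code): wrap raw HTTP request bytes in an
// email-safe body so they can travel inside an ordinary email message.
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SweetFraming {

    /** Encapsulates raw HTTP request bytes into a text body suitable for an email message. */
    static String encapsulate(String sessionId, byte[] httpRequest) {
        String payload = Base64.getMimeEncoder().encodeToString(httpRequest);
        return "X-SWEET-Session: " + sessionId + "\r\n"   // invented header names
             + "X-SWEET-Type: request\r\n\r\n"
             + payload;
    }

    /** Recovers the original HTTP bytes from a received email body. */
    static byte[] extract(String emailBody) {
        String payload = emailBody.substring(emailBody.indexOf("\r\n\r\n") + 4);
        return Base64.getMimeDecoder().decode(payload);
    }

    public static void main(String[] args) {
        byte[] request = "GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        String body = encapsulate("session-42", request);
        System.out.println(body);
        System.out.println(new String(extract(body), StandardCharsets.US_ASCII));
    }
}
```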

Abstract:

Legacy networks are often designed to operate with simple single-path routing, like shortest-path routing, which is known to be throughput suboptimal. On the other hand, previously proposed throughput-optimal policies (e.g., backpressure) require every device in the network to make dynamic routing decisions. In this work, we study an overlay architecture for dynamic routing in which only a subset of devices (overlay nodes) need to make dynamic routing decisions. We determine the essential collection of nodes that must bifurcate traffic to achieve the maximum multicommodity network throughput. We apply our optimal node placement algorithm to several graphs, and the results show that a small fraction of overlay nodes is sufficient for achieving maximum throughput. Finally, we propose a heuristic policy (OBP), which dynamically controls traffic bifurcations at overlay nodes. In all studied simulation scenarios, OBP not only achieves full throughput but also reduces delay in comparison to throughput-optimal backpressure routing.
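
As a rough illustration of the kind of decision an overlay node makes, the sketch below shows a backpressure-style rule at a single node: for one commodity, forward over the overlay tunnel with the largest positive backlog differential. It is an assumption-laden toy, not the paper's OBP policy; queue sizes and tunnel names are invented.

```java
// Illustrative backpressure-style bifurcation decision at one overlay node (not OBP itself).
import java.util.HashMap;
import java.util.Map;

public class OverlayBackpressure {

    /** Returns the overlay tunnel with the maximum positive backlog difference, or null. */
    static String chooseTunnel(long localBacklog, Map<String, Long> neighborBacklogs) {
        String best = null;
        long bestDiff = 0;            // only bifurcate when the differential is positive
        for (Map.Entry<String, Long> e : neighborBacklogs.entrySet()) {
            long diff = localBacklog - e.getValue();
            if (diff > bestDiff) {
                bestDiff = diff;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Long> neighbors = new HashMap<>();
        neighbors.put("tunnel-to-B", 12L);
        neighbors.put("tunnel-to-C", 30L);
        System.out.println(chooseTunnel(25L, neighbors)); // prints tunnel-to-B
    }
}
```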

Abstract:

We investigate the capability of localizing node failures in communication networks from binary states (normal/failed) of end-to-end paths. Given a set of nodes of interest, uniquely localizing failures within this set requires that different observable path states associate with different node failure events. However, this condition is difficult to test on large networks due to the need to enumerate all possible node failures. Our first contribution is a set of sufficient/necessary conditions for identifying a bounded number of failures within an arbitrary node set that can be tested in polynomial time. In addition to network topology and locations of monitors, our conditions also incorporate constraints imposed by the probing mechanism used. We consider three probing mechanisms that differ according to whether measurement paths are (i) arbitrarily controllable, (ii) controllable but cycle-free, or (iii) uncontrollable (determined by the default routing protocol). Our second contribution is to quantify the capability of failure localization through (1) the maximum number of failures (anywhere in the network) such that failures within a given node set can be uniquely localized, and (2) the largest node set within which failures can be uniquely localized under a given bound on the total number of failures. Both measures in (1–2) can be converted into functions of a per-node property, which can be computed efficiently based on the above sufficient/necessary conditions. We demonstrate how measures (1–2) proposed for quantifying failure localization capability can be used to evaluate the impact of various parameters, including topology, number of monitors, and probing mechanisms.
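
To ground the underlying observation model, here is a minimal, hypothetical consistency check: a candidate set of failed nodes explains the binary path observations only if every failed path traverses at least one candidate node and no normal path traverses any. This is just the basic Boolean relation the paper builds on, not its polynomial-time identifiability conditions; path and node names are invented.

```java
// Toy consistency check for node-failure localization from binary path states.
import java.util.*;

public class FailureConsistency {

    /** pathStates maps each measurement path (node list) to true if the path failed. */
    static boolean explains(Set<String> candidateFailures,
                            Map<List<String>, Boolean> pathStates) {
        for (Map.Entry<List<String>, Boolean> e : pathStates.entrySet()) {
            boolean hitsCandidate = e.getKey().stream().anyMatch(candidateFailures::contains);
            if (e.getValue() && !hitsCandidate) return false;  // failed path left unexplained
            if (!e.getValue() && hitsCandidate) return false;  // normal path contradicts candidate
        }
        return true;
    }

    public static void main(String[] args) {
        Map<List<String>, Boolean> obs = new HashMap<>();
        obs.put(Arrays.asList("m1", "a", "b", "m2"), true);   // failed path
        obs.put(Arrays.asList("m1", "c", "m2"), false);       // normal path
        System.out.println(explains(new HashSet<>(Arrays.asList("b")), obs)); // true
    }
}
```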

Abstract:

This paper investigates the problem of finding optimal paths in single-source single-destination accumulative multi-hop networks. We consider a single source that communicates with a single destination assisted by several relays through multiple hops. At each hop, only one node transmits, while all the other nodes receive the transmitted signal and store it after processing/decoding and mixing it with the signals received in previous hops. That is, we consider that terminals make use of advanced energy accumulation transmission/reception techniques, such as maximal ratio combining reception of repetition codes or information accumulation with rateless codes. Accumulative techniques increase communication reliability, reduce energy consumption, and decrease latency. We investigate the properties that a routing metric must satisfy in these accumulative networks to guarantee that optimal paths can be computed with Dijkstra's algorithm. We model the problem of routing in accumulative multi-hop networks as the problem of routing in a hypergraph. We show that the optimality properties of a traditional multi-hop network (monotonicity and isotonicity) are no longer useful and derive a new set of sufficient conditions for optimality. We illustrate these results by studying the minimum energy routing problem in static accumulative multi-hop networks for different forwarding strategies at relays.
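
For reference, the sketch below is plain Dijkstra with a pluggable additive link metric, i.e., the classical baseline whose monotonicity/isotonicity conditions the paper argues are no longer enough. In an accumulative network the cost of a hop would also depend on the energy already collected along the path, which this simple version deliberately does not capture; the graph and weights are invented.

```java
// Baseline sketch: Dijkstra under an additive per-link metric (the classical setting).
import java.util.*;

public class MetricDijkstra {

    /** Minimum path cost from src to every reachable node under an additive metric. */
    static Map<Integer, Double> shortestCosts(int src,
                                              Map<Integer, Map<Integer, Double>> adj) {
        Map<Integer, Double> dist = new HashMap<>();
        dist.put(src, 0.0);
        PriorityQueue<double[]> pq = new PriorityQueue<>(Comparator.comparingDouble(a -> a[1]));
        pq.add(new double[]{src, 0.0});
        while (!pq.isEmpty()) {
            double[] cur = pq.poll();
            int u = (int) cur[0];
            if (cur[1] > dist.getOrDefault(u, Double.MAX_VALUE)) continue;
            for (Map.Entry<Integer, Double> e : adj.getOrDefault(u, Map.of()).entrySet()) {
                double nd = cur[1] + e.getValue();   // additive metric: costs sum per hop
                if (nd < dist.getOrDefault(e.getKey(), Double.MAX_VALUE)) {
                    dist.put(e.getKey(), nd);
                    pq.add(new double[]{e.getKey(), nd});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> adj = new HashMap<>();
        adj.put(0, Map.of(1, 1.0, 2, 4.0));
        adj.put(1, Map.of(2, 2.0));
        System.out.println(shortestCosts(0, adj)); // {0=0.0, 1=1.0, 2=3.0}
    }
}
```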

Abstract:

Nowadays, many people rely on content available in social media, such as reviews and feedback on a topic or product, when making decisions. The fact that anybody can leave a review provides a golden opportunity for spammers to write spam reviews about products and services to serve various interests. Identifying these spammers and their spam content is a hot research topic, and although a considerable number of studies have been conducted recently toward this end, the methodologies put forth so far still barely detect spam reviews, and none of them show the importance of each extracted feature type. In this paper, we propose a novel framework, named NetSpam, which utilizes spam features to model review data sets as heterogeneous information networks and to map the spam detection procedure into a classification problem in such networks. Using the importance of spam features helps us obtain better results in terms of different metrics on real-world review data sets from the Yelp and Amazon Web sites. The results show that NetSpam outperforms the existing methods, and among four categories of features, including review-behavioral, user-behavioral, review-linguistic, and user-linguistic, the first type performs better than the other categories.
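
To illustrate how feature importance can drive a spam decision, here is a toy sketch (not the NetSpam algorithm): per-feature spamicity scores for a review are combined with learned importance weights into one spam probability. The feature names, weights, and scores are invented for the example.

```java
// Toy feature-weighted spam score for a single review (illustrative only).
import java.util.Map;

public class SpamScore {

    /** Weighted average of feature spamicity values; weights are assumed non-negative. */
    static double spamProbability(Map<String, Double> featureScores,
                                  Map<String, Double> featureWeights) {
        double weighted = 0.0, total = 0.0;
        for (Map.Entry<String, Double> e : featureScores.entrySet()) {
            double w = featureWeights.getOrDefault(e.getKey(), 0.0);
            weighted += w * e.getValue();
            total += w;
        }
        return total == 0.0 ? 0.0 : weighted / total;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = Map.of("review-behavioral", 0.9, "user-linguistic", 0.4);
        Map<String, Double> weights = Map.of("review-behavioral", 0.7, "user-linguistic", 0.3);
        System.out.printf("%.2f%n", spamProbability(scores, weights)); // 0.75
    }
}
```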

Abstract:

Uncertain data clustering has been recognized as an essential task in data mining research. Many centralized clustering algorithms have been extended with new distance or similarity measures to tackle this issue. With the fast development of network applications, these centralized methods show their limitations in clustering data in large, dynamic, distributed peer-to-peer networks, due to privacy and security concerns or the technical constraints brought by distributed environments. In this paper, we propose a novel distributed uncertain data clustering algorithm, in which the centralized global clustering solution is approximated by performing distributed clustering. To shorten the execution time, a reduction technique is then applied to transform the proposed method into its deterministic form by replacing each uncertain data object with its expected centroid. Finally, the attribute-weight-entropy regularization technique is applied to enhance the proposed distributed clustering method, so that it achieves better clustering results and extracts the essential features for cluster identification. Experiments on both synthetic and real-world data show the efficiency and superiority of the presented algorithm.
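
The reduction step mentioned above can be pictured with a minimal sketch: each uncertain object, given as sampled possible positions, is replaced by its expected centroid before a standard distance-based cluster assignment. This is an illustration under that simplifying assumption, not the authors' algorithm; the sample values and centers are made up.

```java
// Reduction-step illustration: expected centroid of an uncertain object + nearest-center assignment.
import java.util.Arrays;

public class ExpectedCentroid {

    /** Expected centroid of an uncertain object given equally likely position samples. */
    static double[] expected(double[][] samples) {
        int d = samples[0].length;
        double[] mean = new double[d];
        for (double[] s : samples)
            for (int i = 0; i < d; i++) mean[i] += s[i] / samples.length;
        return mean;
    }

    /** Index of the nearest cluster center under squared Euclidean distance. */
    static int assign(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int i = 0; i < point.length; i++) {
                double diff = point[i] - centers[c][i];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] e = expected(new double[][]{{1, 2}, {3, 4}});
        System.out.println(Arrays.toString(e));                        // [2.0, 3.0]
        System.out.println(assign(e, new double[][]{{0, 0}, {2, 3}})); // 1
    }
}
```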

Abstract:

Fraudulent behaviors in Google Play, the most popular Android app market, fuel search rank abuse and malware proliferation. To identify malware, previous work has focused on app executable and permission analysis. In this paper, we introduce FairPlay, a novel system that discovers and leverages traces left behind by fraudsters to detect both malware and apps subjected to search rank fraud. FairPlay correlates review activities and uniquely combines detected review relations with linguistic and behavioral signals gleaned from Google Play app data (87K apps, 2.9M reviews, and 2.4M reviewers, collected over half a year) in order to identify suspicious apps. FairPlay achieves over 95 percent accuracy in classifying gold standard datasets of malware, fraudulent, and legitimate apps. We show that 75 percent of the identified malware apps engage in search rank fraud. FairPlay discovers hundreds of fraudulent apps that currently evade Google Bouncer's detection technology. FairPlay also helped discover more than 1,000 reviews, reported for 193 apps, that reveal a new type of “coercive” review campaign: users are harassed into writing positive reviews and into installing and reviewing other apps.
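
As a very rough sketch of the signal-fusion idea (not FairPlay itself), the snippet below combines a co-review-graph signal with linguistic and behavioral scores into one suspicion score via a weighted sum and threshold. The weights, score definitions, and threshold are invented for illustration.

```java
// Toy fusion of review-graph, linguistic, and behavioral signals into a suspicion decision.
public class AppSuspicion {

    static boolean isSuspicious(double coReviewDensity,   // density of reviewer co-review cliques
                                double linguisticScore,   // e.g., fraction of template-like reviews
                                double behavioralScore) { // e.g., burstiness of review timestamps
        double score = 0.5 * coReviewDensity + 0.25 * linguisticScore + 0.25 * behavioralScore;
        return score > 0.6;   // illustrative decision threshold
    }

    public static void main(String[] args) {
        System.out.println(isSuspicious(0.8, 0.7, 0.5)); // true  (score = 0.70)
        System.out.println(isSuspicious(0.2, 0.3, 0.1)); // false (score = 0.20)
    }
}
```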

Abstract:

Fog computing, often described as “cloud close to the ground,” deploys lightweight compute facilities, called Fog servers, in the proximity of mobile users. By precaching contents in the Fog servers, an important application of Fog computing is to provide high-quality, low-cost data distribution to nearby mobile users, e.g., video/live streaming and ads dissemination, using single-hop low-latency wireless links. A Fog computing system has a three-tier Mobile-Fog-Cloud structure: mobile users get service from Fog servers over local wireless connections, and Fog servers update their contents from the Cloud over cellular or wired networks. This, however, may incur a high content update cost when the bandwidth between the Fog and Cloud servers is expensive, e.g., over the cellular network, and is therefore inefficient for non-urgent, high-volume contents. How to economically utilize the Fog-Cloud bandwidth while guaranteeing the download performance of users is thus a fundamental issue in Fog computing. In this paper, we address this issue by proposing a hybrid data dissemination framework that applies software-defined network (SDN) and delay-tolerant network (DTN) approaches in Fog computing. Specifically, we decompose the Fog computing network into two planes: the Cloud is a control plane that processes content update queries and organizes data flows, and the geographically distributed Fog servers form a data plane that disseminates data among Fog servers with a DTN technique. Using extensive simulations, we show that the proposed framework is efficient in terms of data-dissemination success ratio and content convergence time among Fog servers.
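
A simplified sketch of one control-plane decision in this style of architecture is shown below (an assumption, not the paper's algorithm): the Cloud injects a non-urgent content item into the Fog server with the cheapest Cloud-to-Fog link, and the remaining servers then receive it over Fog-to-Fog DTN contacts. Server names and link costs are invented.

```java
// Toy control-plane choice of a single injection point for non-urgent content.
import java.util.Map;

public class FogInjection {

    /** Picks the Fog server with the minimum Cloud-link cost as the injection point. */
    static String injectionPoint(Map<String, Double> cloudLinkCost) {
        String best = null;
        double bestCost = Double.MAX_VALUE;
        for (Map.Entry<String, Double> e : cloudLinkCost.entrySet()) {
            if (e.getValue() < bestCost) { bestCost = e.getValue(); best = e.getKey(); }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> costs = Map.of("fog-A", 4.0, "fog-B", 1.5, "fog-C", 3.0);
        System.out.println(injectionPoint(costs)); // fog-B; other servers get the content via DTN
    }
}
```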

Abstract:

This paper studies the cloud market for computing jobs with completion deadlines and designs efficient online auctions for cloud resource provisioning. A cloud user bids for future cloud resources to execute its job. Each bid includes: 1) a utility, reflecting the amount that the user is willing to pay for executing its job, and 2) a soft deadline, specifying the preferred finish time of the job, as well as a penalty function that characterizes the cost of violating the deadline. We target cloud job auctions that execute in an online fashion, run in polynomial time, provide truthfulness guarantees, and achieve optimal social welfare for the cloud ecosystem. Towards these goals, we leverage the following classic and new auction design techniques. First, we adapt the posted pricing auction framework for eliciting truthful online bids. Second, we address the challenge posed by soft deadline constraints through a new technique of compact exponential-size LPs coupled with dual separation oracles. Third, we develop efficient social welfare approximation algorithms using the classic primal-dual framework based on both LP duals and Fenchel duals. Empirical studies driven by real-world traces verify the efficacy of our online auction design.
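
To make the posted-pricing and soft-deadline ideas concrete, here is a toy acceptance rule (not the paper's auction): an arriving bid is accepted only if its utility, net of a linear penalty for finishing after the soft deadline, covers the current posted price of the resources it needs. The numbers and the linear penalty form are illustrative assumptions.

```java
// Toy posted-price acceptance rule for a deadline-sensitive cloud job bid.
public class PostedPriceAuction {

    /** Linear penalty for finishing `lateBy` time units after the soft deadline. */
    static double penalty(double perUnitPenalty, double lateBy) {
        return perUnitPenalty * Math.max(0.0, lateBy);
    }

    static boolean accept(double utility, double perUnitPenalty, double lateBy,
                          double resourceUnits, double postedUnitPrice) {
        double netValue = utility - penalty(perUnitPenalty, lateBy);
        return netValue >= resourceUnits * postedUnitPrice;
    }

    public static void main(String[] args) {
        // utility 100, penalty 5/unit, needs 10 resource units at posted price 8
        System.out.println(accept(100, 5, 2, 10, 8));  // true:  net 90 >= 80
        System.out.println(accept(100, 5, 6, 10, 8));  // false: net 70 <  80
    }
}
```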

Abstract:

Location information of Web pages plays an important role in location-sensitive tasks such as Web search ranking for location-sensitive queries. However, such information is usually ambiguous, incomplete, or even missing, which raises the problem of location prediction for Web pages. Meanwhile, Web pages are massive and often noisy, which poses challenges to the majority of existing algorithms for location prediction. In this paper, we propose a novel and scalable location prediction framework for Web pages based on the query-URL click graph. In particular, we introduce the concept of term location vectors to capture location distributions for all terms and develop an automatic approach to learn the importance of each term location vector for location prediction. Empirical results on a large URL set demonstrate that the proposed framework significantly improves location prediction accuracy compared with various representative baselines. We further provide a principled way to incorporate the proposed framework into the search ranking task, and experimental results on a commercial search engine show that the proposed method remarkably boosts ranking performance for location-sensitive queries.
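
The term-location-vector idea can be pictured with the following sketch (not the paper's learning procedure): each term carries a distribution over locations, and a page's predicted location is the argmax of the importance-weighted sum of its terms' vectors. The terms, locations, importance weights, and distributions are invented for the example.

```java
// Illustrative page-location prediction from weighted term location vectors.
import java.util.HashMap;
import java.util.Map;

public class PageLocation {

    static String predict(Map<String, Map<String, Double>> termLocationVectors,
                          Map<String, Double> termImportance,
                          Iterable<String> pageTerms) {
        Map<String, Double> pageVector = new HashMap<>();
        for (String term : pageTerms) {
            double w = termImportance.getOrDefault(term, 0.0);
            for (Map.Entry<String, Double> loc :
                    termLocationVectors.getOrDefault(term, Map.of()).entrySet()) {
                pageVector.merge(loc.getKey(), w * loc.getValue(), Double::sum);
            }
        }
        return pageVector.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey).orElse("unknown");
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> vectors = Map.of(
                "pizza", Map.of("NYC", 0.6, "Chicago", 0.4),
                "bean",  Map.of("Chicago", 0.9, "NYC", 0.1));
        Map<String, Double> importance = Map.of("pizza", 0.3, "bean", 0.7);
        System.out.println(predict(vectors, importance, java.util.List.of("pizza", "bean")));
        // Chicago (0.3*0.4 + 0.7*0.9 = 0.75 vs. NYC 0.3*0.6 + 0.7*0.1 = 0.25)
    }
}
```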