2013 42nd International Conference on Parallel Processing

Author: Bertha Jenkins
2013 42nd International Conference on Parallel Processing (ICPP 2013)

Lyon, France 1-4 October 2013

Pages 1-592

1/2

IEEE Catalog Number: CFP13127-POD
ISBN: 978-1-4799-1448-7


ICPP 2013 Table of Contents

Message from the ICPP 2013 General Chairs ... xvi
Message from the ICPP 2013 Program Co-Chairs ... xvii
Organizing Committee ... xviii
Tracks ... xx
Special Members ... xxv
Reviewers ... xxvi

Main Conference Papers

Algorithms 1

An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU Implementation ... 1
Akihiko Kasagi, Koji Nakano, and Yasuaki Ito

AdELL: An Adaptive Warp-Balancing ELL Format for Efficient Sparse Matrix-Vector Multiplication on GPUs ... 11
Marco Maggioni and Tanya Berger-Wolf

A Push-Relabel-Based Maximum Cardinality Bipartite Matching Algorithm on GPUs ... 21
Mehmet Deveci, Kamer Kaya, Bora Ucar, and Ümit V. Çatalyürek

Applications 1

Inspector-Executor Load Balancing Algorithms for Block-Sparse Tensor Contractions ... 30
David Ozog, Jeff R. Hammond, James Dinan, Pavan Balaji, Sameer Shende, and Allen Malony

Efficient Data Redistribution Methods for Coupled Parallel Particle Codes ... 40
Michael Hofmann and Gudula Rünger

A Diffusion-Based Processor Reallocation Strategy for Tracking Multiple Dynamically Varying Weather Phenomena ... 50
Preeti Malakar, Vijay Natarajan, Sathish S. Vadhiyar, and Ravi S. Nanjundiah

Architectures 1

HAccRG: Hardware-Accelerated Data Race Detection in GPUs ... 60
Anup Holey, Vineeth Mekkat, and Antonia Zhai

Adaptive Runtime Selection for GPU ... 70
Jean-François Dollinger and Vincent Loechner

Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs ... 80
Sreeram Potluri, Khaled Hamidouche, Akshay Venkatesh, Devendar Bureddy, and Dhabaleswar K. Panda

Algorithms 2

On Scientific Workflow Scheduling in Clouds under Budget Constraint ... 90
Xiangyu Lin and Chase Qishi Wu

On the Merits of Distributed Work-Stealing on Selective Locality-Aware Tasks ... 100
Jeeva Paudel, Olivier Tardieu, and José Nelson Amaral

A Dynamic Moldable Job Scheduling Based Parallel SAT Solver ... 110
Sajjad Asghar, Eric Aubanel, and David Bremner

Networking 1

BlindDate: A Neighbor Discovery Protocol ... 120
Keyu Wang, Xufei Mao, and Yunhao Liu

Freeweb: P2P-Assisted Collaborative Censorship-Resistant Web Browsing ... 130
Haiying Shen, Alex X. Liu, and Lianyu Zhao

Churn: A Key Effect on Real-World P2P Software ... 140
Cheng-Yun Ho, Ming-Chen Chung, Li-Hsing Yen, and Chien-Chao Tseng

Performance Models 1

Flow Migration on Multicore Network Processors: Load Balancing While Minimizing Packet Reordering ... 150
Muhammad Faisal Iqbal, Jim Holt, Jee Ho Ryoo, Lizy K. John, and Gustavo de Veciance

Prediction of Parallel Speed-Ups for Las Vegas Algorithms ... 160
Charlotte Truchet, Florian Richoux, and Philippe Codognet

Empirical Analysis of Space-Filling Curves for Scientific Computing Applications ... 170
Daryl Deford and Ananth Kalyanaraman

Algorithms 3

Engineering High-Performance Community Detection Heuristics for Massive Graphs ... 180
Christian L. Staudt and Henning Meyerhenke

Cont2: Social-Aware Content and Contact Based File Search in Delay Tolerant Networks ... 190
Kang Chen and Haiying Shen

Hypergraph Sparsification and Its Application to Partitioning ... 200
Mehmet Deveci, Kamer Kaya, and Ümit V. Çatalyürek

Applications 2

Fast Approximate Subgraph Counting and Enumeration ... 210
George M. Slota and Kamesh Madduri

Simultaneous Finite Automata: An Efficient Data-Parallel Model for Regular Expression Matching ... 220
Ryoma Sin’ya, Kiminori Matsuzaki, and Masataka Sassa

Expression Tree Evaluation by Dynamic Code Generation - Are Accelerators Up for the Task? ... 230
Thomas Müller, Josef Weidendorfer, and Andreas Blaszczyk

Predicting Execution Readiness of MPI Binaries with FEAM, a Framework for Efficient Application Migration ... 240
Karolina Sarnowska-Upton and Andrew Grimshaw

Software 1

A NUMA-Aware Runtime Environment for the Actor Model ... 250
Emilio Francesquini, Alfredo Goldman, and Jean-François Méhaut

Integrating Multi-GPU Execution in an OpenACC Compiler ... 260
Toshiya Komoda, Shinobu Miwa, Hiroshi Nakamura, and Naoya Maruyama

AOmpLib: An Aspect Library for Large-Scale Multi-core Parallel Programming ... 270
Bruno Medeiros and João L. Sobral

HyPHI - Task Based Hybrid Execution C++ Library for the Intel Xeon Phi Coprocessor ... 280
Jiri Dokulil, Enes Bajrovic, Siegfried Benkner, Martin Sandrieser, and Beverly Bachmayer

Algorithms 4

A Prioritized Distributed Mutual Exclusion Algorithm Balancing Priority Inversions and Response Time ... 290
Jonathan Lejeune, Luciana Arantes, Julien Sopena, and Pierre Sens

A Generalized Mutual Exclusion Problem and Its Algorithm ... 300
Aoxue Luo, Weigang Wu, Jiannong Cao, and Michel Raynal

Efficient Dissemination Algorithm for Scale-Free Topologies ... 310
Ruijing Hu, Julien Sopena, Luciana Arantes, Pierre Sens, and Isabelle Demeure

Applications 3

Reformulated Conjugate Gradient for the Energy-Aware Solution of Linear Systems on GPUs ... 320
José I. Aliaga, Joaquín Pérez, Enrique S. Quintana-Ortí, and Hartwig Anzt

Energy-Efficient Synthetic-Aperture Radar Processing on a Manycore Architecture ... 330
Zain-Ul-Abdin, Anders Åhlander, and Bertil Svensson

Parallel Radix Sort on the AMD Fusion Accelerated Processing Unit ... 339
Michael C. Delorme, Tarek S. Abdelrahman, and Chengyan Zhao

Performance Models 2

Sampling-Based Phase Classification and Prediction for Multi-threaded Program Execution on Multi-core Architectures ... 349
Chin-Hao Chang, Pangfeng Liu, and Jan-Jan Wu

iMeter: An Integrated VM Power Model Based on Performance Profiling ... 359
Hailong Yang, Qi Zhao, Zhongzhi Luan, Depei Qian, Ming Xie, Jason Mars, and Lingjia Tang

Characterization of Input/Output Bandwidth Performance Models in NUMA Architecture for Data Intensive Applications ... 369
Tan Li, Yufei Ren, Dantong Yu, Shudong Jin, and Thomas Robertazzi

Algorithms 5

Finite-State Robots in a Warehouse: Achieving Linear Parallel Speedup While Rearranging Objects ... 379
Arnold L. Rosenberg

Hysteresis Re-chunking Based Metadata Harnessing Deduplication of Disk Images ... 389
Bing Zhou and Jiangtao Wen

Energy-Efficient Leader Election Protocols for Single-Hop Radio Networks ... 399
Marcin Kardas, Marek Klonowski, and Dominik Pajak

Applications 4

Backing Up Your Data to the Cloud: Want to Pay Less? ... 409
Yingwu Zhu and Justin Masui

Handling Uncertainty: Pareto-Efficient BoT Scheduling on Hybrid Clouds ... 419
M. Reza Hoseinyfarahabady, Hamid R.D. Samani, Luke M. Leslie, Young Choon Lee, and Albert Y. Zomaya

Parallel Birth and Death Process for Cell Nuclei Extraction in Histopathology Images ... 429
Christophe Avenel, Pierre Fortin, and Dominique Béréziat

Networking 2

Use of a Mobile Sink for Maximizing Data Collection in Energy Harvesting Sensor Networks ... 439
Xiaojiang Ren, Weifa Liang, and Wenzheng Xu

Application-Aware Workload Consolidation to Minimize Both Energy Consumption and Network Load in Cloud Environments ... 449
Nikos Tziritas, Cheng-Zhong Xu, Thanasis Loukopoulos, Samee Ullah Khan, and Zhibin Yu

Risk Intelligence: Profiting from Uncertainty in Data Processing System ... 458
Si Zheng, Yunhuai Liu, Shanshan Li, Tian He, and Xiangke Liao

Short Papers

Characterizing Cloud Applications on a Google Data Center ... 468
Sheng Di, Derrick Kondo, and Franck Cappello

Protein Structure Prediction on GPU: A Declarative Approach in a Multi-agent Framework ... 474
Federico Campeotto, Agostino Dovier, and Enrico Pontelli

Multiple-SPMD Programming Environment Based on PGAS and Workflow toward Post-petascale Computing ... 480
Miwako Tsuji, Mitsuhisa Sato, Maxime Hugues, and Serge Petiton

An Efficient Deterministic Parallel Algorithm for Adaptive Multidimensional Numerical Integration on GPUs ... 486
Kamesh Arumugam, Alexander Godunov, Desh Ranjan, Balša Terzic, and Mohammad Zubair

Towards Hardware Realizations of Intelligent Systems: A Cortical Column Approach ... 492
Anita Tino, Gul N. Khan, and Fei Yuan

WormPlanar: Topological Planarization Based Wormhole Detection in Wireless Networks ... 498
Xiaopei Lu, Dezun Dong, and Xiangke Liao

Java with Auto-parallelization on Graphics Coprocessing Architecture ... 504
Guodong Han, Chenggang Zhang, King Tin Lam, and Cho-Li Wang

Symbolic Analysis of Concurrency Errors in OpenMP Programs ... 510
Hongyi Ma, Steve R. Diersen, Liqiang Wang, Chunhua Liao, Daniel Quinlan, and Zijiang Yang

Efficient Forwarding of Producer-Consumer Data in Task-Based Programs ... 517
Madhavan Manivannan, Anurag Negi, and Per Stenström

Parallelization of Particle-in-Cell Codes for Nonlinear Kinetic Models from Mathematical Physics ... 523
Matthias Korch, Tobias Ramming, and Gerhard Rein

On the Scalability of Constraint Programming on Hierarchical Multiprocessor Systems ... 530
Rui Machado, Vasco Pedro, and Salvador Abreu

Dynamic Server Provisioning for Carbon-Neutral Data Centers ... 536
A.S.M. Hasan Mahmud and Shaolei Ren

Architectures 2

A Flexible Framework to Enhance RAID-6 Scalability via Exploiting the Similarities among MDS Codes ... 542
Chentao Wu and Xubin He

Load-Balanced Recovery Schemes for Single-Disk Failure in Storage Systems with Any Erasure Code ... 552
Xianghong Luo and Jiwu Shu

Temporal-Aware Mechanism to Detect Private Data in Chip Multiprocessors ... 562
Alberto Ros, Blas Cuesta, María E. Gómez, Antonio Robles, and José Duato

Distributed Shortcut Networks: Layout-Aware Low-Degree Topologies Exploiting Small-World Effect ... 572
Van K. Nguyen, Nhat T.X. Le, Ikki Fujiwara, and Michihiro Koibuchi

Networking 3

Efficient Routing Mechanisms for Dragonfly Networks ... 582
Marina García, Enrique Vallejo, Ramón Beivide, Miguel Odriozola, and Mateo Valero

Protocols for Fully Offloaded Collective Operations on Accelerated Network Adapters ... 593
Timo Schneider, Torsten Hoefler, Ryan E. Grant, Brian W. Barrett, and Ron Brightwell

Efficient Information Dissemination in Dynamic Networks ... 603
Zhiwei Yang, Weigang Wu, Yishun Chen, and Jun Zhang

A Novel Functional Partitioning Approach to Design High-Performance MPI-3 Non-blocking Alltoallv Collective on Multi-core Systems ... 611
K. Kandalla, H. Subramoni, K. Tomko, D. Pekurovsky, and D.K. Panda

Software 2

HEUSPEC: A Software Speculation Parallel Model ... 621
Fan Xu, Li Shen, Zhiying Wang, Hui Guo, Bo Su, and Wei Chen

Enhancing Performance Portability of MPI Applications through Annotation-Based Transformations ... 631
Md. Ziaul Haque, Qing Yi, James Dinan, and Pavan Balaji

High-Performance Design of Hadoop RPC with RDMA over InfiniBand ... 641
Xiaoyi Lu, Nusrat S. Islam, Md. Wasi-Ur-Rahman, Jithin Jose, Hari Subramoni, Hao Wang, and Dhabaleswar K. (Dk) Panda

Mixed Model Universal Software Thread-Level Speculation ... 651
Zhen Cao and Clark Verbrugge

Workshop Papers

P2S2: 6th International Workshop on Parallel Programming Models and Systems Software for High-End Computing

A Flexible Approach to Staged Events ... 661
Tiago Salmito, Ana Lúcia de Moura, and Noemi Rodriguez

ConMR: Concurrent MapReduce Programming Model for Large Scale Shared-Data Applications ... 671
Fan Zhang, Qutaibah M. Malluhi, and Tamer M. Elsyed

Read-Write Lock Allocation in Software Transactional Memory ... 680
Amir Ghanbari Bavarsad and Ehsan Atoofian

A Heterogeneous Computing Framework for Computational Finance ... 688
Gordon Inggs, David Thomas, and Wayne Luk

A Framework for Performance-Aware Composition of Applications for GPU-Based Systems ... 698
Usman Dastgeer and Christoph Kessler

Exploiting Execution Order and Parallelism from Processing Flow Applying Pipeline-Based Programming Method on Manycore Accelerators ... 708
Shinichi Yamagiwa, Ryo Jozaki, Shixun Zhang, Ryo Zaizen, and Dewen Xu

Performance Tuning on Multicore Systems for Feature Matching within Image Collections ... 718
Xiaoxin Tang, Steven Mills, David Eyers, Zhiyi Huang, Kai-Cheung Leung, and Minyi Guo

X-kaapi: A Multi Paradigm Runtime for Multicore Architectures ... 728
Thierry Gautier, Fabien Lementec, Vincent Faucher, and Bruno Raffin

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ... 736
Arunmoezhi Ramachandran, Jerome Vienne, Rob Van Der Wijngaart, Lars Koesterke, and Ilya Sharapov

Tiled QR Decomposition and Its Optimization on CPU and GPU Computing System ... 744
Dongjin Kim and Kyu-Ho Park

Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms ... 754
Jean-Noël Quintin, Khalid Hasanov, and Alexey Lastovetsky

SRMPDS: 9th International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems

A Scalability Model for Distributed Resource Management in Real-Time Online Applications ... 763
Dominik Meiländer, Sebastian Köttinger, and Sergei Gorlatch

A Dynamic Resource Management System for Network-Attached Accelerator Clusters ... 773
Suraj Prabhakaran, Mohsin Iqbal, Sebastian Rinke, and Felix Wolf

Enhanced Resource Management Enabling Standard Parameter Sweep Jobs for Scientific Applications ... 783
Sonja Holl, Shahbaz Memon, Bernd Schuller, Morris Riedel, Yassene Mohammed, Magnus Palmblad, and Andrew Grimshaw

Pipelining/Overlapping Data Transfer for Distributed Data-Intensive Job Execution ... 791
Eun-Sung Jung, Ketan Maheshwari, and Rajkumar Kettimuthu

Scheduling Data Parallel Workloads - A Comparative Study of Two Common Algorithmic Approaches ... 798
Mahadevan Balasubramaniam, Ioana Banicescu, and Florina M. Ciorba

A Model Based Load-Balancing Method in IaaS Cloud ... 808
Zhenzhong Zhang, Limin Xiao, Yuan Tao, Ji Tian, Shouxin Wang, and Hua Liu

Extending Battery Life of a Multi-buffered, Single-Threaded Processor in a Mobile Computing Device ... 817
Rashid Khogali and Olivia Das

PASA: 2nd International Workshop on Power-Aware Algorithms, Systems, and Architectures

Effects of Dynamic Voltage and Frequency Scaling on a K20 GPU ... 826
Rong Ge, Ryan Vogt, Jahangir Majumder, Arif Alam, Martin Burtscher, and Ziliang Zong

Revisiting Server Energy Proportionality ... 834
Chung-Hsing Hsu and Stephen W. Poole

2-Covered Path Routing for Antennas with Variable Transmission Ranges ... 841
Da-Ren Chen, Chiun-Chieh Hsu, and Chiun-Fu Kuo

Relating Application Memory Activity to Processor Power ... 849
Saman Khoshbakht and Nikitas Dimopoulos

Power-Aware Multi-data Center Management Using Machine Learning ... 858
Josep Ll. Berral, Ricard Gavaldà, and Jordi Torres

Analytical Energy Models for MPI Communications on a Sandy-Bridge Architecture ... 868
Francisco Almeida, Vicente Blanco, Isidro González, Alberto Cabrera, and Domingo Giménez

HUCAA: International Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications

Efficient Offloading of Parallel Kernels Using MPI_Comm_Spawn ... 877
Sebastian Rinke, Suraj Prabhakaran, and Felix Wolf

The DEEP Project - Pursuing Cluster-Computing in the Many-Core Era ... 885
Norbert Eicker, Thomas Lippert, Thomas Moschny, and Estela Suarez

Integration of a Highly Scalable, Multi-FPGA-Based Hardware Accelerator in Common Cluster Infrastructures ... 893
Oliver Knodel, Andy Georgi, Patrick Lehmann, Wolfgang E. Nagel, and Rainer G. Spallek

GPU Powered ROSA Analyzer ... 901
Raúl Pardo, Fernando L. Pelayo, and Pedro Valero Lara

Achieving Speedup in Aggregate Risk Analysis Using Multiple GPUs ... 909
A.K. Bahl, O. Baltzer, A. Rau-Chaplin, B. Varghese, and A. Whiteway

AWASN: International Workshop on Applications of Wireless Ad-Hoc and Sensor Networks

iTraffic: A Smartphone-based Traffic Information System ... 917
Yi-Ta Chuang, Chih-Wei Yi, Yin-Chih Lu, and Pei-Chuan Tsai

An Indoor Collaborative Pedestrian Dead Reckoning System ... 923
Yi-Ting Li, Guaning Chen, and Min-Te Sun

Development of Emergency Rescue Evacuation Support System (ERESS) in Panic-Type Disasters: Disaster Detection by Positioning Area of Terminals ... 931
Takafumi Nakamura, Katsunori Kogo, Jun Fujimura, Kentaro Tsudaka, Tomotaka Wada, Kazuhiro Ohtsuki, and Hiromi Okada

Secure Homomorphic and Searchable Encryption in Ad Hoc Networks ... 937
Scott C.-H. Huang, Qiao-Wei Lin, and Chih-Kai Chang

Dynamic Content Adjustment in Mobile Ad Hoc Networks ... 943
Shih-Rong Yang, Guaning Chen, and Min-Te Sun

PSTI: 4th International Workshop on Parallel Software Tools and Tool Infrastructures

Automatic Extraction of Task-Level Parallelism for Heterogeneous MPSoCs ... 950
Daniel Cordes, Olaf Neugebauer, Michael Engel, and Peter Marwedel

Toward a Performance/Resilience Tool for Hardware/Software Co-design of High-Performance Computing Systems ... 960
Christian Engelmann and Thomas Naughton

Hierarchical Memory Buffering Techniques for an In-Memory Event Tracing Extension to the Open Trace Format 2 ... 970
Michael Wagner, Andreas Knüpfer, and Wolfgang E. Nagel

Is Source-Code Isolation Viable for Performance Characterization? ... 977
Chadi Akel, Yuriy Kashnikov, Pablo de Oliveira Castro, and William Jalby

Event Streaming for Online Performance Measurements Reduction ... 985
Jean-Baptiste Besnard, Marc Pérache, and William Jalby

Intralayer Communication for Tree-Based Overlay Networks ... 995
Tobias Hilbrich, Joachim Protze, Bronis R. de Supinski, Martin Schulz, Matthias S. Müller, and Wolfgang E. Nagel

Discovery of Potential Parallelism in Sequential Programs ... 1004
Zhen Li, Ali Jannesari, and Felix Wolf

StreamMine3G OneClick - Deploy and Monitor ESP Applications with a Single Click ... 1014
Andrey Brito, André Martin, Christof Fetzer, Isabelly Rocha, and Telles Nóbrega

EMS: International Workshop on Embedded Multi-core Systems

A Server Model for Reliable Communication on Cell/B.E. ... 1020
Rui Zhou, Huaming Chen, Qun Liu, Yong Sheng, Qingguo Zhou, Xuan Wang, and Kuan-Ching Li

A Power-Aware Study of Iris Matching Algorithms on Intel’s SCC ... 1028
Gildo Torres, Jed Kao-Tung Chang, Fang Hua, Chen Liu, and Stephanie Schuckers

Hardware-Specific Bare-Metal Microhypervisor Prototype ... 1038
Ivan Kolchin, Maxim Nikolaev, Stanislav Parfenov, Oleg Popkov, and Sergey Sobolev

Thermal-Aware Scheduling Collaborating with OS and Architecture ... 1044
Cheng-Yu Lee, Shuang-Jhu Yang, and Rong-Guey Chang

Compilers for Low Power with Design Patterns on Embedded Multicore Systems ... 1052
Cheng-Yen Lin, Chi-Bang Kuan, and Jenq Kuen Lee

WATCC: International Workshop on Advanced Technologies of Cloud Computing

Mechanism of Automatic Deployment for Virtual Network Environment ... 1061
Min-Xiou Chen and Kuo-Le Mei

Secure PHR Access Control Scheme for Healthcare Application Clouds ... 1067
Chia-Hui Liu, Fong-Qi Lin, Dai-Lun Chiang, Tzer-Long Chen, Chin-Sheng Chen, Han-Yu Lin, Yu-Fang Chung, and Tzer-Shyong Chen

Cycles Embedding of Twisted Cubes ... 1077
Pao-Lien Lai, Kao-Lin Hu, and Hong-Chun Hsu

A Secure Cloud-Based Payment Model for M-Commerce ... 1082
Tao-Ku Chang

Author Index ... 1087
