Performance Analysis of Transactional Applications in AMD Quad-Core and Intel i5 Processor Systems

Proc. of Int. Conf. on Advances in Communication, Network, and Computing, CNC

Vijaya Shetty S¹ and Dr. Sarojadevi H²

¹ NMIT/Department of Computer Science and Engineering, Bangalore, India
² NMIT/Department of Computer Science and Engineering, Bangalore, India

Abstract— A transaction processing system is basically an information-based system. Performance analysis of transaction processing applications helps identify where and how an application can benefit from the available hardware resources. The significant use of transactional applications in enterprises has increased the need for performance analysis of such applications. We designed and implemented a multi-threaded ATM transactional application in VB.Net, and measured and compared its performance metrics with those of its single-threaded counterpart using Intel VTune Amplifier XE 2011. For the multi-threaded application, elapsed time, overhead time and CPU time are reduced by 28.625%, 49.016% and 77.725% respectively on a quad-core AMD system with 4GB RAM and 2.5GHz frequency. Elapsed time, overhead time and CPU time measured on an Intel i5 processor system with 4GB RAM and 1.896GHz frequency are observed to be better than on the AMD quad-core processor system. The efficiency for the application is 0.0412 on the AMD quad-core processor and 0.049 on the Intel i5 processor. Thread concurrency is found to be slightly better on the Intel i5 processor. The speedup gain is 1.401 on the AMD quad-core processor and 1.659 on the Intel i5 processor; that is, the multi-threaded application executes 1.401 times faster on the AMD quad-core processor and 1.659 times faster on the Intel i5 processor than the single-threaded application. On both the AMD and Intel processors, the multi-threaded application performs better than the single-threaded application. Better performance can be achieved in transactional applications if the application is multi-threaded, provided the thread count and the tasks assigned to threads are aligned with the concurrency level of the system.
Index Terms— Performance, Hotspots, Locks and Waits, Multi-threading, Concurrency, Efficiency, Speedup

© Elsevier, 2014

I. INTRODUCTION
The goal of performance analysis is to provide the highest performance of the system at the lowest cost. Performance analysis is required at every stage in the life cycle of a computer system. It is needed when a system administrator wants to compare a number of systems and decide which system is best for a given set of applications. Even when there are no design alternatives, performance analysis of a system helps in determining how well it is performing and whether any improvements need to be made [8]. Here, a system could be any collection of hardware, software, and firmware components. Our research is focussed on performance analysis of transaction processing systems. A transaction processing system is basically an information-based system. A transaction, in general, is an exchange, usually a request and response, between a user (human or software) and the system. In a transaction processing system, CPU load is a common source of performance bottlenecks. A CPU can handle millions of calculations and instructions, but performance degrades when the number of these operations exceeds its capacity or when instruction scheduling is inefficient. A high percentage of processor time combined with a long processor queue may indicate a CPU bottleneck. A CPU bottleneck can be minimized using techniques such as:
• If the observed rate of context switching is high, reduce the thread count of the process before increasing the number of processors.
• Tune the application that is causing high CPU utilization.
• Analyze the log file generated by the application and isolate the subsystem that takes the maximum amount of execution time.
• For multi-threaded applications, add processing cores.

II. RELATED WORK
Multithreading is a feature that allows programs to run threads in parallel. Typically, applications use only one thread when they do not perform computation-intensive tasks. A single-threaded application represents a potential performance bottleneck in many complex software systems. The use of multiple threads allows an application to distribute time-consuming tasks to different logical cores, where they can be executed in parallel. Multi-threaded applications can be designed for parallel execution of threads using the multiple processing elements of the processor.
This gives the user the impression that the application is executing faster, because while one thread is waiting for I/O, the remaining threads can make use of the available CPU. Multithreading allows worker threads to execute in tandem so that they complete faster. In general, a multi-threaded application has performance benefits such as improved responsiveness, faster execution, and prioritization, which allows higher-priority tasks to take precedence over lower-priority tasks. A few research works in the literature have demonstrated performance analysis of transactional applications on multi-core systems [1][2][3]. However, to the best of our knowledge, a comprehensive evaluation and comparison of multi-threaded and single-threaded transactional applications on different processor systems in a realistic context does not exist. The application we designed for the performance study of transaction processing systems is an ATM application. It is implemented as a multi-tier application using the VB.Net framework. We implemented both single-threaded and multi-threaded versions of the same application. The ATM application has different functional units such as Balance Inquiry, Cash Withdrawal, Fast Cash, Money Transfer, Services and Change Pin. In the single-threaded application, the process is handled by a single thread. In the multi-threaded application, the single process is handled by multiple threads, and each of the major functionalities is invoked in its own thread.

III. APPROACH TO PERFORMANCE ANALYSIS OF APPLICATIONS
Performance analysis of applications is useful in identifying how our applications execute in the underlying system. The summary of the performance analysis can be used to find out where and how our applications can benefit from the available hardware resources.
We analyzed our multi-threaded and single-threaded applications as described in the VTune Amplifier XE 2011 documentation [4] to determine the following aspects of each application:
• Hotspots (functions) that consume most of the CPU time.
• Code segments that do not utilize the available processor time effectively.
• Code segments that can be optimized for improved performance of the application.
• Thread concurrency achieved by the system while running the threaded application.

A. Performance Parameters and Metrics
The following performance parameters are of significant interest for the performance of transactional applications. Some of them are measured using the VTune Amplifier tool, while others are derived empirically from the information obtained from the profiler.

a. CPU time: The amount of time a thread spends executing on a logical processor. For multiple threads, the CPU time of the threads is summed; the application CPU time is the sum of the CPU time of all the threads that run the application.
b. Elapsed time: The total time the target application ran, calculated as: clock time at end of application - clock time at start of application.
c. Wait time: The amount of time that a given thread waited for some event to occur, such as synchronization waits and I/O waits.
d. Efficiency: The speedup of an n-processor system over a one-processor system, divided by n [5][6].
e. Speedup: A measure of how much faster a task runs using the machine with the enhancement as opposed to the original machine [7][8][9].
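The difference between elapsed time (b), CPU time (a) and wait time (c) can be observed without a profiler. The following Python sketch is only an illustration under stated assumptions, not the authors' VB.Net code and not how VTune measures: it uses `time.perf_counter` for wall-clock elapsed time and `time.process_time` for the CPU time of the whole process (summed over all of its threads). The helper names `blocking_wait`, `busy_work`, `waits_in_threads` and `measure` are hypothetical, and `time.sleep` stands in for an I/O or synchronization wait.

```python
import threading
import time


def blocking_wait(seconds: float) -> None:
    """Stand-in for an I/O or synchronization wait (hypothetical helper).

    The thread is blocked, so it accumulates elapsed (wall-clock) time
    but almost no CPU time.
    """
    time.sleep(seconds)


def busy_work(iterations: int) -> int:
    """CPU-bound loop: accumulates CPU time for as long as it runs."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def waits_in_threads(n_threads: int, seconds: float) -> None:
    """Run n_threads blocking waits concurrently and join all of them."""
    workers = [threading.Thread(target=blocking_wait, args=(seconds,))
               for _ in range(n_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


def measure(fn, *args) -> tuple[float, float]:
    """Return (elapsed_s, cpu_s) for one call of fn(*args).

    elapsed_s: clock time at end minus clock time at start.
    cpu_s:     CPU time consumed by the whole process, i.e. summed over
               all of its threads (time.process_time is process-wide).
    """
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    fn(*args)
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


if __name__ == "__main__":
    for label, fn, args in [
        ("single wait", blocking_wait, (0.2,)),
        ("busy loop", busy_work, (500_000,)),
        ("4 waits, 4 threads", waits_in_threads, (4, 0.2)),
    ]:
        elapsed, cpu = measure(fn, *args)
        print(f"{label:>20}: elapsed={elapsed:.3f}s cpu={cpu:.3f}s")
```

A blocking wait adds to elapsed time but hardly to CPU time, and four concurrent 0.2 s waits finish in roughly 0.2 s rather than 0.8 s, similar to the overlap of I/O waits described in Section II.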

B. Hotspot Analysis
Hotspots analysis is useful for understanding the application flow and identifying sections of code that take a long time to execute (hotspots). Hotspots represent high processor utilization and potential performance bottlenecks. Hotspots can be removed if they are not fundamental to the functionality of the application, or can be optimized for improved performance.

C. Thread Concurrency
The number of active threads in the application corresponds to the concurrency level of the application. By comparing the concurrency level with the number of processors, we can classify how an application utilizes the processors in the system. Thread concurrency may be higher than CPU usage if threads are in the runnable state but not consuming CPU time. VTune Amplifier XE defines a target concurrency level for the application which, by default, is equal to the number of physical cores.

IV. PROFILING AND ANALYSIS OF RESULTS
Our multi-threaded and single-threaded ATM applications are profiled on two different multi-core systems: an AMD quad-core processor system with 1.896GHz frequency and 4GB RAM, and an Intel i5 system with 2.5GHz frequency and 4GB RAM. Measured performance metrics, empirically derived performance parameters, CPU usage histograms and call-stack graphs for the most time-consuming functions (hotspots) are presented in this section.

A. Profiling Multi-threaded and Single-threaded Transactional Applications (ATM) in AMD Quad-core Processor
The measured performance values and profiling observations of our multi-threaded and single-threaded ATM applications on the AMD quad-core system are shown in Table 1.

A.1 CPU Usage Histogram
The histogram in Figure 1 shows the breakdown of elapsed time for the multi-threaded and single-threaded applications.
For the multi-threaded application, the CPU is idle for 81 seconds, and at least one core is busy for 28.322 seconds. It is observed that for 15.022 s only one core (poor), for 9.21 s two cores (poor), for 3.07 s three cores (ok) and for 1.02 s four cores (ideal) were executing the threads of the application. For the single-threaded application, the CPU is idle for 17.166 seconds and in poor utilization for 136 seconds: for 134.33 seconds only one logical core (poor) and for 1.67 s two cores (poor) executed the application. The multi-threaded application reaches ideal CPU utilization, with all four cores busy, for 1.02 s, whereas the single-threaded application never uses more than two cores, so its utilization stays below the concurrency level of the system.
Elapsed Time: The total elapsed time of each application is the sum of its idle time and the time spent at each utilization level:
$$T_{elapsed} \approx T_{idle} + T_{busy}$$
$$T_{elapsed}^{multi} \approx 81 + 15.022 + 9.21 + 3.07 + 1.02 \approx 109.322\ \text{s}$$
$$T_{elapsed}^{single} \approx 17.166 + 134.33 + 1.67 \approx 153.166\ \text{s}$$
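The elapsed-time sums above can be reproduced from the histogram bins. In this Python sketch the bin values are the ones used in the elapsed-time sums for Figure 1; the `classify` thresholds are an assumption chosen only to reproduce the Poor/Ok/Ideal labels used here for a target concurrency of four, not VTune Amplifier's own classification rules, and `classify`/`summarize` are hypothetical helper names.

```python
from collections import defaultdict

# Target concurrency level: four cores on the quad-core system.
TARGET_CONCURRENCY = 4


def classify(busy_cores: int, target: int = TARGET_CONCURRENCY) -> str:
    """Map the number of simultaneously busy cores to a utilization label.

    Hypothetical thresholds that reproduce the labels in the text
    (1-2 cores Poor, 3 Ok, 4 Ideal); not VTune's documented rules.
    """
    if busy_cores == 0:
        return "Idle"
    if busy_cores > target:
        return "Over"
    if busy_cores == target:
        return "Ideal"
    if busy_cores == target - 1:
        return "Ok"
    return "Poor"


def summarize(histogram: dict[int, float]) -> tuple[float, dict[str, float]]:
    """Return (elapsed time, seconds per utilization label).

    histogram maps "number of busy cores" -> seconds spent at that level,
    so the elapsed time is the sum over all bins.
    """
    per_label: dict[str, float] = defaultdict(float)
    for cores, seconds in histogram.items():
        per_label[classify(cores)] += seconds
    return sum(histogram.values()), dict(per_label)


# Bins for the AMD quad-core runs (Figure 1).
amd_multi = {0: 81.0, 1: 15.022, 2: 9.21, 3: 3.07, 4: 1.02}
amd_single = {0: 17.166, 1: 134.33, 2: 1.67}

for name, hist in [("multi-threaded", amd_multi), ("single-threaded", amd_single)]:
    elapsed, per_label = summarize(hist)
    print(name, round(elapsed, 3), {k: round(v, 3) for k, v in per_label.items()})
```

Grouping the bins by label makes the contrast explicit: the multi-threaded run spends time in the Ok and Ideal categories, while the single-threaded run is either Idle or Poor.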


TABLE I: TABLE OF PERFORMANCE METRICS FOR MULTI-THREADED AND SINGLE-THREADED ATM APPLICATION IN AMD QUAD-CORE SYSTEM

| Performance Metric | Multi-threaded | Single-threaded |
|---|---|---|
| Elapsed Time | 109.322 s | 153.166 s |
| Total Thread Count | 34 | 28 |
| Overhead Time | 1.752 s | 3.966 s |
| CPU Time | 30.653 s | 137.506 s |
| Paused Time | 0 s | 0 s |

Observations:
• Elapsed Time: the application ran for 109.322 seconds in the multi-threaded and 153.166 seconds in the single-threaded environment.
• Total Thread Count: the multi-threaded application has 34 threads, comprising 6 spawned threads and 28 automatically generated event-dispatching threads. The single-threaded application has 28 threads, all of which are automatically generated event-dispatching threads.
• Overhead Time: the time spent waiting to acquire resources is 1.752 seconds in the multi-threaded application and 3.966 seconds in the single-threaded application.
• CPU Time: the total time spent by the application on the CPU is 30.653 seconds in the multi-threaded application, whereas it is 137.506 seconds in the single-threaded application. The CPU time is approximately 4.5 times higher for the single-threaded application, as the cores are under-utilized.
• Paused Time: 0 seconds indicates that the application was neither paused nor interrupted.

Figure 1: CPU usage Histogram for multi-threaded and Single-threaded applications in AMD quad-core processor

Speedup Factor: The performance gain that can be obtained by improving some portion of a computer system can be calculated using Amdahl’s law, which states that “the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used” [7]. Using Amdahl’s law, the speedup achieved by the AMD quad-core system while executing the multi-threaded application can be calculated as:

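$$\text{Speedup}_{overall} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}} = \frac{1}{(1 - 0.898) + \dfrac{0.898}{1.401}} = 1.346$$
where
$$\text{Fraction}_{enhanced} = 0.898, \qquad \text{Speedup}_{enhanced} = \frac{153.166}{109.322} = 1.401$$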
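A minimal Python sketch of the Amdahl's-law calculation above. `amdahl_speedup` is a hypothetical helper name; the inputs are the values of Fraction_enhanced (0.898) and Speedup_enhanced (153.166/109.322) used in the equation.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup by Amdahl's law.

    fraction_enhanced: share of the original execution time that can use
                       the faster mode (between 0 and 1).
    speedup_enhanced:  how much faster that share runs in the faster mode.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


speedup_enhanced = 153.166 / 109.322   # T(1) / T(n) for the AMD system
overall = amdahl_speedup(0.898, speedup_enhanced)
print(f"Speedup_overall = {overall:.3f}")

# Upper bound: even an infinitely fast enhanced portion cannot beat 1 / (1 - F).
print(f"Limit for F = 0.898: {1 / (1 - 0.898):.2f}")
```

The second print shows the limit stated in the quotation: with 89.8% of the time enhanced, no amount of speedup in that portion can push the overall gain past about 9.8.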

The speedup factor of the AMD quad-core system can also be calculated as:
$$S(n) = \frac{T(1)}{T(n)}$$
where T(n) is the execution time of the multi-threaded application, T(1) is the execution time of the single-threaded application and n is the number of threads.
$$S(n) = \frac{153.166}{109.322} = 1.401 \quad (1)$$
Efficiency: The efficiency of the AMD quad-core processor system while executing the multi-threaded application can be calculated as:
$$E(n) = \frac{T(1)}{n\,T(n)}$$
where T(n), T(1) and n are as defined above.
$$E(n) = \frac{153.166}{34 \times 109.322} = 0.04121 \quad (2)$$
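Equations (1) and (2) in Python, using the AMD measurements with n = 34 threads as in the text. This is a sketch for checking the arithmetic; the function names are hypothetical. It also checks the standard bounds 1 ≤ S(n) ≤ n and 1/n ≤ E(n) ≤ 1 [6], which hold when the multi-threaded run is not slower than the single-threaded one.

```python
def speedup(t1: float, tn: float) -> float:
    """S(n) = T(1) / T(n)."""
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """E(n) = T(1) / (n * T(n)), which equals S(n) / n."""
    return t1 / (n * tn)


T1 = 153.166   # single-threaded elapsed time on the AMD system, seconds
TN = 109.322   # multi-threaded elapsed time on the AMD system, seconds
N = 34         # total thread count of the multi-threaded application

s = speedup(T1, TN)
e = efficiency(T1, TN, N)

# Sanity bounds: 1 <= S(n) <= n implies 1/n <= E(n) <= 1.
assert 1.0 <= s <= N
assert 1.0 / N <= e <= 1.0
print(f"S(n) = {s:.3f}, E(n) = {e:.5f}")
```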

Efficiency indicates the actual degree of speedup achieved compared with its maximum value. Since 1 ≤ S(n) ≤ n, we have
$$\frac{1}{n} \le E(n) \le 1 \quad (3)$$
From (1) and (2), with n = 34, we have 1 ≤ 1.401 ≤ 34 and 0.029 ≤ 0.04121 ≤ 1, which is consistent with (3).

A.2 Top Hotspots
Tables 2 and 3 show the top hotspots of the multi-threaded and single-threaded ATM applications, respectively, in the AMD quad-core system. Both results show that the most time-consuming hotspot in the application is Microsoft::VisualBasic::ApplicationServices::WindowsFormsApplicationBase::OnRun, taking 21.027 seconds in the multi-threaded application and 44.521 seconds in the single-threaded application. These hotspot functions can be optimized for better performance.

TABLE II: TABLE OF HOTSPOTS IN MULTI-THREADED APPLICATION IN AMD QUAD-CORE SYSTEM

| Function | CPU Time |
|---|---|
| Microsoft::VisualBasic::ApplicationServices::WindowsFormsApplicationBase::OnRun | 21.027 s |
| Microsoft::VisualBasic::Interaction::CreateObject | 1.311 s |
| WaitForSingleObject | 1.014 s |
| NtOpenKey | 0.718 s |
| Microsoft::VisualBasic::ApplicationServices::WindowsFormsApplicationBase::DisplaySplash | 0.709 s |
| [Others] | 5.874 s |

TABLE III: TABLE OF HOTSPOTS IN SINGLE-THREADED APPLICATION IN AMD QUAD-CORE SYSTEM

| Function | CPU Time |
|---|---|
| Microsoft::VisualBasic::ApplicationServices::WindowsFormsApplicationBase::OnRun | 44.521 s |
| atmminorproject::Form4::Balanceinquiry | 19.480 s |
| atmminorproject::Form4::Changepin | 16.469 s |
| atmminorproject::Form4::Fastcash | 14.660 s |
| atmminorproject::Form4::Moneytransfer | 12.913 s |
| [Others] | 29.462 s |
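The hotspot tables can be cross-checked with a short Python sketch that ranks functions by CPU time and reports each one's share of the total. Function names are shortened (the `Microsoft::VisualBasic::...` and `atmminorproject::` prefixes are dropped) for readability; `rank_hotspots` is a hypothetical helper and is not part of VTune.

```python
# CPU time per function, copied from Tables 2 and 3 (seconds).
MULTI = {
    "WindowsFormsApplicationBase::OnRun": 21.027,
    "Interaction::CreateObject": 1.311,
    "WaitForSingleObject": 1.014,
    "NtOpenKey": 0.718,
    "WindowsFormsApplicationBase::DisplaySplash": 0.709,
    "[Others]": 5.874,
}
SINGLE = {
    "WindowsFormsApplicationBase::OnRun": 44.521,
    "Form4::Balanceinquiry": 19.480,
    "Form4::Changepin": 16.469,
    "Form4::Fastcash": 14.660,
    "Form4::Moneytransfer": 12.913,
    "[Others]": 29.462,
}


def rank_hotspots(
    cpu_by_function: dict[str, float],
) -> tuple[float, list[tuple[str, float, float]]]:
    """Return (total CPU time, [(function, seconds, percent of total), ...]).

    The "[Others]" bucket aggregates many small functions, so it counts
    toward the total but is left out of the ranking.
    """
    total = sum(cpu_by_function.values())
    named = [(f, t) for f, t in cpu_by_function.items() if f != "[Others]"]
    named.sort(key=lambda item: item[1], reverse=True)
    return total, [(f, t, 100.0 * t / total) for f, t in named]


for label, table in [("multi-threaded", MULTI), ("single-threaded", SINGLE)]:
    total, ranked = rank_hotspots(table)
    print(f"{label}: total CPU time {total:.3f} s")
    for func, secs, pct in ranked[:3]:
        print(f"  {func:<45} {secs:7.3f} s  {pct:5.1f}%")
```

The per-function times in Table 2 add up to the 30.653 s CPU time reported in Table 1, and those in Table 3 add up to within 0.001 s of the 137.506 s figure, so each table accounts for essentially the whole CPU time of its run.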


Figure 2: Hotspots by CPU usage in the multi-threaded application

Figure 3: Hotspots by CPU usage in the single-threaded application

Figure 2 shows the CPU time utilization of the threads in the multi-threaded ATM application. The six threads (thread (0x1ad4), thread (0xab0), thread (0x1b04), thread (0x1334), thread (0x1680) and thread (0xe40)) show poor CPU utilization because the application is more I/O-intensive than CPU-intensive. These six threads correspond to the six major functionalities of the application. Figure 3 shows the CPU utilization of the single-threaded application, in which the main thread (0xc60) itself handles all six major functionalities.

B. Profiling Multi-threaded and Single-threaded Transactional Application (ATM) in Intel i5 Processor
The profiling observations of our multi-threaded and single-threaded ATM applications on the Intel i5 processor system are shown in Table 4. Elapsed time, CPU time and overhead time are found to be lower for the multi-threaded application than for the single-threaded application.

TABLE IV: TABLE OF PERFORMANCE METRICS FOR MULTI-THREADED AND SINGLE-THREADED ATM APPLICATION IN INTEL I5 SYSTEM

| Performance Metric | Multi-threaded | Single-threaded |
|---|---|---|
| Elapsed Time | 60.878 s | 101.0 s |
| Total Thread Count | 34 | 28 |
| Overhead Time | 1.494 s | 1.677 s |
| CPU Time | 14.078 s | 54.315 s |
| Paused Time | 0 s | 0 s |
| Speedup (overall, multi-threaded) | 1.659 | |
| Efficiency (E(n)) | 0.049 | |

B.1 CPU Usage Histogram
The histogram in Figure 4 shows the breakdown of elapsed time for the multi-threaded and single-threaded applications in the Intel i5 system. For the multi-threaded application, the CPU is idle for 48 seconds, and at least one core is busy for the remaining 12.878 seconds. It is observed that for 4.5 s only one core (poor), for 4.35 s two cores (poor), for 2.988 s three cores (ok) and for 1.04 s four cores (ideal) were executing the threads of the application. For the single-threaded application, the CPU is idle for 17.166 seconds and in poor utilization for 136 seconds; for 134.33 s only one logical core (poor) and for 1.67 s two cores (poor) executed the application.

Figure 4: CPU usage Histogram for multi-threaded and single-threaded Application in Intel i5 processor

C. Performance Comparison of Multi-threaded and Single-threaded Applications in AMD Quad-core and Intel i5 Systems
Table 5 summarizes the measured and empirically derived performance parameters of the multi-threaded and single-threaded applications on the AMD quad-core and Intel i5 processors. The performance of the multi-threaded application is found to be better on both processors. The speedup gained by threading the application is higher on the Intel i5 processor (1.659 versus 1.401), and the efficiency of the Intel i5 processor is also better while executing the multi-threaded application. Figure 5 plots the elapsed time of the applications on both processors; the elapsed time of the single-threaded application is higher than that of the multi-threaded application in both cases. Figure 6 plots CPU time and overhead time for both applications on both processors. CPU time is lower for the multi-threaded application on both processors. Overhead time is higher for the single-threaded application than for the multi-threaded application, because the number of threads and the tasks assigned to threads in the multi-threaded application are within the limits of the concurrency level of the systems.

TABLE V: SUMMARY OF MEASURED AND DERIVED PERFORMANCE PARAMETERS OF MULTI-THREADED AND SINGLE-THREADED APPLICATIONS IN AMD QUAD-CORE AND INTEL I5 PROCESSORS

| Performance Metric | AMD quad-core: Multi-threaded | AMD quad-core: Single-threaded | Intel i5: Multi-threaded | Intel i5: Single-threaded |
|---|---|---|---|---|
| Elapsed Time | 109.322 s | 153.166 s | 60.878 s | 101.0 s |
| Total Thread Count | 34 | 28 | 34 | 28 |
| Overhead Time | 1.752 s | 3.966 s | 1.494 s | 1.677 s |
| CPU Time | 30.653 s | 137.506 s | 14.078 s | 54.315 s |
| Speedup (overall, multi-threaded) | 1.401 | | 1.659 | |
| Efficiency (E(n)) | 0.0412 | | 0.049 | |

Figure 5: Summary of measured and derived performance parameters of multi-threaded and single-threaded applications in AMD quad-core and Intel i5 processors
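The derived rows of Table 5 can be recomputed from its measured rows. This Python sketch is illustrative (the `Run` record and the `reduction_pct` and `compare` helpers are hypothetical names): it takes the elapsed, overhead and CPU times from Table 5, derives speedup as T(1)/T(n) and efficiency as S(n)/34, as in equations (1) and (2), and also reports the relative reduction of each time, (single - multi)/single, in percent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    elapsed_s: float
    overhead_s: float
    cpu_s: float


# (multi-threaded, single-threaded) measurements from Table 5.
TABLE_V = {
    "AMD quad-core": (Run(109.322, 1.752, 30.653), Run(153.166, 3.966, 137.506)),
    "Intel i5": (Run(60.878, 1.494, 14.078), Run(101.0, 1.677, 54.315)),
}
N_THREADS = 34


def reduction_pct(single: float, multi: float) -> float:
    """Reduction achieved by threading, in percent of the single-threaded value."""
    return 100.0 * (single - multi) / single


def compare(multi: Run, single: Run) -> dict[str, float]:
    """Derived metrics for one system, from its two measured runs."""
    s = single.elapsed_s / multi.elapsed_s
    return {
        "speedup": s,
        "efficiency": s / N_THREADS,
        "elapsed_reduction_pct": reduction_pct(single.elapsed_s, multi.elapsed_s),
        "overhead_reduction_pct": reduction_pct(single.overhead_s, multi.overhead_s),
        "cpu_reduction_pct": reduction_pct(single.cpu_s, multi.cpu_s),
    }


for system, (multi, single) in TABLE_V.items():
    metrics = compare(multi, single)
    print(system, {k: round(v, 3) for k, v in metrics.items()})
```

Keeping the measured values in one structure and deriving everything else from it makes it easy to re-run the comparison if either system is profiled again.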

Figure 6: CPU time and overhead time for multi-threaded and single-threaded applications in AMD quad-core and Intel i5 processors


V. CONCLUSIONS
We measured, compared and analyzed the performance of our multi-threaded and single-threaded ATM transactional applications using Intel VTune Amplifier XE 2011 on AMD quad-core and Intel i5 processor systems. The comparison results show that the elapsed time, CPU time and overhead time of the multi-threaded application are significantly lower on both processors. Therefore, performance is better if the application is multi-threaded, provided the thread count is in line with the concurrency level of the system and the underlying hardware has multiple logical CPUs.

ACKNOWLEDGMENT
Our special thanks to our Management, Principal and Head of the Department for their continuous research encouragement and motivating guidance.

REFERENCES
[1] Chi Cao Minh, JaeWoong Chung, Christos Kozyrakis, Kunle Olukotun, “STAMP: Stanford Transactional Applications for Multi-Processing,” IEEE International Symposium on Workload Characterization (IISWC 2008), pp. 35-46, E-ISBN: 978-1-4244-2778-9.
[2] JaeWoong Chung, Hassan Chafi, C. Cao Minh, Austen McDonald, Brian Carlstrom, Christos Kozyrakis, Kunle Olukotun, “The Common Case Transactional Behavior of Multithreaded Programs,” Computer Systems Laboratory, Stanford University, http://doi.ieeecomputersociety.org/10.1109/HPCA.2006.1598135.
[3] Márcio Castro, Kiril Georgiev, Vania Marangozova-Martin, Jean-François Méhaut, Luiz Gustavo Fernandes and Miguel Santana, “Analysis and Tracing of Applications Based on Software Transactional Memory on Multicore Architectures,” 19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2011), pp. 199-206, ISSN: 1066-6192.
[4] “Tutorial: Finding Hotspots, Intel® VTune™ Amplifier 2013 for Linux* OS,” http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe-2011-documentation.
[5] Vijaya Shetty S, Dr. Sarojadevi H., “e-Business Performance Issues, Quality Metrics and Development Frameworks,” International Journal of Computer Applications (0975-8887), Volume 55, No. 7, October 2012.
[6] Kai Hwang, “Advanced Computer Architecture: Parallelism, Scalability, Programmability,” Tata McGraw Hill, 2003, ISBN: 0-07-031622-8.
[7] John L. Hennessy and David A. Patterson, “Computer Architecture: A Quantitative Approach,” Fourth Edition, Morgan Kaufmann, 2006, ISBN: 978-0-12-370490-0.
[8] R. Jain, “The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling,” Wiley-Interscience, New York, NY, April 1991, ISBN: 0471503361.
[9] Dezso Sima, Terence Fountain, Peter Kacsuk, “Advanced Computer Architectures: A Design Space Approach,” Third Edition, Pearson Education, 1997, ISBN: 81-7808-542-9.
