Taxonomy of Static Code Analysis Tools

Author: Hilary Holmes
Jernej Novak, Andrej Krajnc, Rok Žontar
Faculty of Electrical Engineering and Computer Science, University of Maribor
Smetanova 17, 2000 Maribor
Phone: (+386) 2-220 7187, Fax: (+386) 2-220 7272
E-mail: {jernej.novak1, andrej.krajnc1, rok.zontar}@uni-mb.si

Abstract - Static code analysis tools are becoming increasingly important in the software development lifecycle. In this paper we present the static code analysis tools most commonly used today. Because there are many types of these tools, serving many purposes and many programming languages, we classify them according to several categories, such as technology, availability of rules, supported languages and extensibility, together with further categories and subcategories. The purpose of this article is not to show which static code analysis tool is superior, but to construct a taxonomy of static code analysis tools. After reading this article the reader will understand the features and strengths of static code analysis tools.

I. INTRODUCTION

In recent years many tools for the automatic identification of software defects in source code have been developed. Using techniques such as syntactic pattern matching, data flow analysis, model checking and theorem proving, they locate errors in software code with varying degrees of success [4]. Many of these tools look for the same kinds of programming errors but, to date, only a few of them have been compared with each other. In this paper we look at what static code analyzers can do and present a systematic tree of the features and strengths of static code analysis tools. Finally, we classify a few static code analysis tools in the newly developed systematic tree.

II. REVIEW OF THE SOURCE CODE

Code reviews have proved to be an effective weapon against maintainability problems. Structured design and review of the source code increase the reliability and security of applications [3, 9], and reviewing code can reduce the number of corrections needed later. Reviews of source code can be easily integrated into the software development life cycle, so that each code review is done by more experienced developers. However, we must recognize that a manual review of the complete source code can take a long time [7]. In addition, for a manual review to be effective, the reviewers must know which mistakes and deficiencies to look for before they can carefully check all the source code. For these reasons it is recommended to review the source code with the help of static analysis tools rather than by manual inspection alone: the tools are fast, the code can be inspected much more frequently, and the tools can encode the same level of knowledge as a human reviewer. However, we must be aware that these tools cannot completely replace the human reviewer, so manual review is still needed.

III. STATIC CODE ANALYSIS TOOLS

Static code analysis is the analysis of software that is performed without actually executing programs built from that code. When the analysis is performed by executing the software, it is called dynamic analysis [11]. Let us look at what static analysis tools offer.

Static analysis tools look for common mistakes that compilers do not check for, or did not check for in the past. Static analyzers can contribute significantly to more secure and reliable software. They can help us find common software errors such as memory overruns, cross-site scripting vulnerabilities, injection flaws and various boundary cases [9, 11].

There are many static code analyzers, and they work in different ways. Some operate on the source code, while others check the intermediate code and the compiled libraries. They also differ in the programming languages they support, such as C, C++, C# or Java. Since there are many analyzers with many different characteristics, we decided to create a taxonomy of static analyzers and classify selected tools into groups, to see what their capabilities are and what we can expect from them.

A. What kind of errors are found by the static code analyzers

Static code analysis tools look for a specific set of patterns or rules in the software code, much as antivirus programs search for viruses. Some of the more advanced tools make it possible to add new rules to the predefined set. We must also be aware that a tool will never find an error whose behavior has not been specified with rules or patterns [11].
These are some examples of the problems [8] that static code analyzers can recognize:
• syntactic problems,
• unreachable source code,
• undeclared variables,
• non-initialized variables,
• unused functions and procedures,
• variables used before initialization,
• function return values that are never used,
• wrong use of pointers.
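The rule-driven search described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `PatternScanner`, the rule names `EMPTY_CATCH` and `STRING_REF_COMPARE`, and the regular expressions are invented for this example and are not taken from any of the tools discussed in this paper. Real analyzers work on parsed code or bytecode rather than on raw text lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: rule names and patterns are made up for this sketch.
class PatternScanner {

    // A rule pairs an identifier with the textual pattern it searches for.
    static final class Rule {
        final String id;
        final Pattern pattern;

        Rule(String id, String regex) {
            this.id = id;
            this.pattern = Pattern.compile(regex);
        }
    }

    // A finding records which rule matched and on which 1-based line.
    static final class Finding {
        final String ruleId;
        final int line;

        Finding(String ruleId, int line) {
            this.ruleId = ruleId;
            this.line = line;
        }
    }

    // The predefined rule set; anything not described here is never reported.
    static final List<Rule> RULES = List.of(
        new Rule("EMPTY_CATCH", "catch\\s*\\([^)]*\\)\\s*\\{\\s*\\}"),
        new Rule("STRING_REF_COMPARE", "==\\s*\""));

    // Apply every rule to every line and collect the matches in line order.
    static List<Finding> scan(List<String> sourceLines) {
        List<Finding> findings = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            for (Rule rule : RULES) {
                if (rule.pattern.matcher(sourceLines.get(i)).find()) {
                    findings.add(new Finding(rule.id, i + 1));
                }
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        List<String> code = List.of(
            "if (name == \"admin\") { grant(); }",
            "try { load(); } catch (Exception e) { }",
            "// compare with == \"x\" only in tests",
            "int total = a / b;");
        for (Finding f : scan(code)) {
            System.out.println(f.ruleId + " at line " + f.line);
        }
    }
}
```

In the sample input the scanner reports lines 1, 2 and 3. The hit on line 3 is inside a comment, so it is a spurious warning, while the possible division by zero on line 4 goes unreported because no rule describes it.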

The greatest value of static code analysis tools is their ability to automatically recognize many common programming errors. Unfortunately, implementation errors are only part of the problem. Tools cannot verify design and architectural errors in programs. They cannot find poorly made cryptographic libraries or inappropriately selected algorithms, and they cannot highlight design problems that can cause great confusion. Nor can they find passwords or magic numbers embedded in the code.

A further weakness of static code analysis tools is that they are prone to producing "false warnings": reported errors that are not actually errors. These are often also called "false positives". In such a case the tool interprets the program code incorrectly and therefore assumes that it contains an error. This is particularly common in older, open-source static code analysis tools, which are generally not suitable for serious use. Even newer tools are not completely immune to false positives, but the situation is improving.

B. Used static code analysis tools

In this section we present some static code analyzers. We decided to classify four tools, so that the weaknesses of any one tool are offset by the others. Because the market contains a large number of static code analyzers, we classify only a selection of them in our tree, using the classification elements that apply to static code analysis tools.

C. Static code analysis tools

StyleCop [1] is a code style and consistency tool for C#. It issues warnings that indicate violations of style and consistency rules in C# code. StyleCop works only on C# source code.
It has about 150 rules, which are divided into seven groups:
• Documentation Rules
• Layout Rules
• Maintainability Rules
• Naming Rules
• Ordering Rules
• Readability Rules
• Spacing Rules

StyleCop also provides an extensible framework for plug-ins and custom rules, so we can write custom rules that correspond to our needs.

Gendarme [2] is the next static code analysis tool. It works on Microsoft .NET assemblies, using Microsoft Intermediate Language (MSIL) parsing and call graph analysis to inspect them. Gendarme is also open source. All the rules within Gendarme are easily extensible, configurable and customizable through an XML configuration file, so we can adapt the rules to suit our needs. The current predefined rules in Gendarme are divided into the following groups:
• BadPractice
• Concurrency
• Correctness
• Design
• Design.Generic
• Design.Linq
• Exceptions
• Interoperability
• Maintainability
• Naming
• Performance
• Portability
• Security
• Security.Cas
• Serialization
• Smells
• Ui

Checkstyle is a development tool that helps developers write Java code that adheres to a coding standard [6]. It automates the process of checking Java code, sparing humans this monotonous but important task. This makes it ideal for projects that want to enforce a coding standard. Checkstyle is highly configurable and can be made to support almost any coding standard. The tool can check many aspects of source code. Historically its main functionality has been to check code layout issues, but since the internal architecture was changed in version 3, more and more checks for other purposes have been added. Checkstyle now provides checks that find class design problems, duplicate code, and bug patterns such as double-checked locking. The standard checks are applicable to general Java coding style and require no external libraries. They cover Javadoc comments, naming conventions, headers, imports, size violations, coding style, class design, metrics and other aspects. Optional checks are available for J2EE platform requirements.

FindBugs is an open source static analysis tool that examines class files or JAR libraries for potential problems by matching the bytecode against a list of bug patterns [5]. The software is distributed as a stand-alone GUI application, and plug-ins are available for Eclipse, NetBeans and IntelliJ IDEA. The tool was written by David Hovemeyer and William Pugh. Unlike other static analysis tools, FindBugs does not focus on style or formatting; it specifically tries to find real bugs or potential performance problems. Because FindBugs analyzes Java bytecode (compiled class files), the program's source code is not even needed to use it. Because its analysis is sometimes imprecise, FindBugs can report false warnings, that is, warnings that do not indicate real errors. In practice, the rate of false warnings reported by FindBugs is less than 50%.
Current categories in which FindBugs [5] checks code are:
• Bad practice
• Correctness
• Experimental
• Internationalization
• Malicious code vulnerability
• Multithreaded correctness
• Performance
• Security
• Dodgy
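As an illustration of the kind of bug pattern these tools target, the sketch below shows double-checked locking, which the Checkstyle description names explicitly and which would naturally fall under a category such as "Multithreaded correctness". The class names `Config`, `BrokenHolder` and `SafeHolder` are hypothetical, and the example makes no claim about the exact diagnostics any tool prints. The repaired variant uses the initialization-on-demand holder idiom, which is one common fix among several.

```java
// Hypothetical example; it shows the pattern, not any tool's actual output.
class DoubleCheckedLockingDemo {

    static final class Config {
        String name; // deliberately not final, so safe publication matters

        Config() {
            name = "default";
        }
    }

    // Defective variant: the shared field is not volatile.
    static final class BrokenHolder {
        private static Config instance;

        static Config get() {
            if (instance == null) {                  // first check, without a lock
                synchronized (BrokenHolder.class) {
                    if (instance == null) {          // second check, under the lock
                        // Without volatile, another thread may observe the
                        // non-null reference before the constructor's writes.
                        instance = new Config();
                    }
                }
            }
            return instance;
        }
    }

    // Repaired variant: class initialization gives lazy, thread-safe,
    // one-time construction without explicit locking.
    static final class SafeHolder {
        private static final class Lazy {
            static final Config INSTANCE = new Config();
        }

        static Config get() {
            return Lazy.INSTANCE;
        }
    }

    public static void main(String[] args) {
        // Both calls return the same lazily created object.
        System.out.println(SafeHolder.get() == SafeHolder.get());
    }
}
```

The defect cannot be reproduced reliably by running the program, which is why a rule that recognizes the shape of the code is useful here.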


IV. TAXONOMY OF STATIC CODE ANALYSIS TOOLS

Because of the increasing number of static code analysis tools, we decided to classify them using a taxonomy tree. There are already over fifty static code analyzers on the market, so we examined a selection of them and tried to find common characteristics. These common features are presented in Figure 1. Not every static code analysis tool can be classified in every one of these categories, but most tools belong to at least some of them. Let us now present how the systematic tree was composed.

A. Categories for systematic tree

1. Rules – set of rules supported by the static code analysis tools
   a. Style – inspects the visual appearance of the source code
   b. Naming – checks whether variables are correctly named (spelling, naming standards, …)
   c. General – general rules of static code analysis
   d. Concurrency – errors in concurrently running code
   e. Exceptions – errors in throwing or not throwing exceptions
   f. Performance – errors affecting the performance of the application
   g. Interoperability – errors with common behavior
   h. Security – errors which could impact the security of the application
      i. SQL – searches for "SQL injections" and other SQL errors
      ii. Buffer overflow – security errors which take advantage of buffer overflows
   i. Maintainability – rules for better maintainability of the application

2. Configurability – ability to customize the tool
   a. Text document – configuration is set from a text document
   b. XML – configuration is set from an XML document
   c. GUI – configuration is set through a graphical user interface
   d. Ruleset – tool can turn sets of rules on and off

3. Extensibility – whether the tool can be extended with our own rules
   a. Possible – it is possible to extend
   b. Not possible – it is not possible to extend

4. Technology – which technologies are used for searching for errors in code
   a. Dataflow – search for errors with dataflow analysis
   b. Syntax – search for errors with syntax correctness checks
   c. Theorem proving – search for errors by proving theorems
   d. Model checking – search for errors with model checking

5. Supported languages – which programming languages the tool supports
   a. .NET – all programming languages which are compiled into libraries or programs of the .NET framework
      i. VB.NET – supports VB.NET
      ii. C# – supports C#
   b. Java – supports the Java programming language
   c. C, C++ – supports the C or C++ programming language

6. Availability – in what way the tool is available
   a. Open Source – tool is free and its source code is available
   b. Free – tool is free, but its source code is not available
   c. Commercial – tool is available for payment

7. User experience – in what way the tool can be used and what it offers to us
   a. Environment integration – how the tool is integrated with the working environment
   b. Automatic locating errors in code – when the tool finds an error, it can take us to the location of the error
   c. Extensive help on faults – whether the tool gives help on resolving errors
   d. User interface – availability of a user interface
      i. Command Line – tool can be run from the command line prompt
      ii. GUI – tool can be run from a graphical user interface

8. Releases – how many releases there are per year
   a. Frequently >= 3 times a year – a new release of the tool appears three or more times per year
   b. Occasionally < 3 times a year – a new release of the tool appears fewer than three times per year
   c. Obsolete 0 times a year – more than a year has passed since the last release

9. Input – what types of files can be loaded into the tool
   a. Source code – a textual source file can be loaded
   b. Byte code – a file with Java byte code or Microsoft Intermediate Language (MSIL) can be loaded

10. Output – presentation of the results from the tool
   a. Text file – tool can present results in a text file
   b. List – tool can present results in a custom user interface control in the GUI
   c. XML file – tool can present results as XML data
   d. HTML file – tool can present results as HTML data

These are the categories/criteria for presenting the systematic tree for the static code analysis tools. In Table 1 we present some static code analysis tools and their evaluation against our systematic tree.
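The categories above can be read as a small data model. The following Java sketch encodes a subset of them as enumerations and describes one tool with them. The type names (`TaxonomyModel`, `ToolProfile`, and so on) are our own assumptions for illustration, and the release count passed for StyleCop is a placeholder chosen only so that it falls in the "Occasionally" band; it is not a measured value.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical data model covering only part of the taxonomy.
class TaxonomyModel {

    enum Availability { OPEN_SOURCE, FREE, COMMERCIAL }

    enum Input { SOURCE_CODE, BYTE_CODE }

    enum Technology { DATAFLOW, SYNTAX, THEOREM_PROVING, MODEL_CHECKING }

    enum ReleaseFrequency {
        FREQUENTLY, OCCASIONALLY, OBSOLETE;

        // Thresholds follow the Releases category: three or more releases
        // per year, fewer than three, or none at all.
        static ReleaseFrequency fromReleasesPerYear(int releases) {
            if (releases >= 3) {
                return FREQUENTLY;
            }
            if (releases > 0) {
                return OCCASIONALLY;
            }
            return OBSOLETE;
        }
    }

    // One tool classified against the selected categories.
    static final class ToolProfile {
        final String name;
        final Availability availability;
        final Set<Input> inputs;
        final Set<Technology> technologies;
        final ReleaseFrequency releases;
        final boolean extensible;

        ToolProfile(String name, Availability availability, Set<Input> inputs,
                    Set<Technology> technologies, ReleaseFrequency releases,
                    boolean extensible) {
            this.name = name;
            this.availability = availability;
            this.inputs = inputs;
            this.technologies = technologies;
            this.releases = releases;
            this.extensible = extensible;
        }

        @Override
        public String toString() {
            return name + ": " + availability + ", input " + inputs
                + ", technology " + technologies + ", releases " + releases
                + ", extensible " + extensible;
        }
    }

    public static void main(String[] args) {
        ToolProfile styleCop = new ToolProfile(
            "StyleCop",
            Availability.FREE,
            EnumSet.of(Input.SOURCE_CODE),
            EnumSet.of(Technology.SYNTAX, Technology.DATAFLOW),
            ReleaseFrequency.fromReleasesPerYear(2), // placeholder count
            true);
        System.out.println(styleCop);
    }
}
```

Using sets for Input and Technology reflects that a tool may fall into several subcategories of the same category at once.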

Figure 1 Taxonomy of static code analysis tools (tree diagram; root: Static Code Analysis Tools; branches: Rules, Technology, Supported languages, Releases, Input, Configurability, Extensibility, Availability, User Experience, Output)

Table 1 Analysis of selected static code analysis tools

Tool | Rules | Configurability | Extensibility | Technology | Supported languages | Availability | User experience | Releases | Input | Output
CheckStyle | general, style, naming, performance, maintainability | XML file, rulesets | Possible | Syntax, dataflow | Java | Open source | Environment integration, CL | Frequently >= 3 times a year | Source code | List, XML, HTML
FindBugs | general, style | Rulesets | Possible | Syntax, dataflow | Java | Open source | Automatic locating errors in code | Frequently >= 3 times a year | Byte code | List
Gendarme | general, style, exceptions, interoperability, concurrency, security, naming | Rulesets, GUI | Possible | Syntax, dataflow | All .NET | Open source | All options in category included | Frequently >= 3 times a year | Byte code | List, XML
StyleCop | style, maintainability, naming, general | Rulesets, Text file | Possible | Syntax, dataflow | C# | Free | CL, Environment integration, Automatic locating errors in code | Occasionally < 3 times a year | Source code | List
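One practical use of such a classification is choosing a tool by criteria. The sketch below is a minimal illustration and not part of the original study: it stores three of the Table 1 columns (supported language, availability and input, as read from the table) and filters the four tools. The class name `ToolSelection` and the method `select` are assumptions introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that filters the classified tools by two criteria.
class ToolSelection {

    static final class Tool {
        final String name;
        final String language;
        final String availability;
        final String input;

        Tool(String name, String language, String availability, String input) {
            this.name = name;
            this.language = language;
            this.availability = availability;
            this.input = input;
        }
    }

    // Values as read from the Supported languages, Availability and Input
    // columns of Table 1.
    static final List<Tool> TOOLS = List.of(
        new Tool("CheckStyle", "Java", "Open source", "Source code"),
        new Tool("FindBugs", "Java", "Open source", "Byte code"),
        new Tool("Gendarme", "All .NET", "Open source", "Byte code"),
        new Tool("StyleCop", "C#", "Free", "Source code"));

    // Return the names of all tools matching both criteria, in table order.
    static List<String> select(String availability, String input) {
        List<String> names = new ArrayList<>();
        for (Tool tool : TOOLS) {
            if (tool.availability.equals(availability) && tool.input.equals(input)) {
                names.add(tool.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Open source tools that can analyze compiled code.
        System.out.println(select("Open source", "Byte code"));
    }
}
```

With the data above, asking for open source tools that accept byte code yields FindBugs and Gendarme.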

V. CONCLUSION

Static code analysis tools can be used to find hidden errors in the implementation of programs before the program is tested or sent into production. Correcting hidden defects early in the development cycle can reduce the testing effort, minimize the required number of operations and reduce the cost of maintaining the system. Static code analysis tools can be used in different ways, but they all lead to higher software quality. They can also help to resolve a number of security problems. However, we must be aware that these tools should be used in combination with manual code analysis and other review tools. In this way we can minimize the number of errors that static code analyzers fail to detect; these are usually errors that cannot be expressed with rules and patterns. Static code analysis tools play an important part in the process of developing software applications. However, we must realize that the final check is still done by a human, who examines the finished source code and finally gives the green light.

In this article we presented a taxonomy for the classification of static code analyzers. We set a number of categories/criteria for building a systematic tree for the evaluation of static code analysis tools. Finally, we compared the presented static code analysis tools on the basis of the established tree.

REFERENCES

[1] StyleCop Documentation, Microsoft Corporation, 2008.
[2] Gendarme project, http://mono-project.com/Gendarme
[3] Julia H. Allen, Sean Barnum, Software Security Engineering, Addison-Wesley, 2008.
[4] R. C. Seacord, D. Plakosh, and G. A. Lewis, Modernizing Legacy Systems: Software Technologies, Engineering Processes, and Business Practices, Addison-Wesley Professional, 2003.
[5] FindBugs project, http://findbugs.sourceforge.net/
[6] Checkstyle project, http://checkstyle.sourceforge.net/
[7] B. Boehm, Software Engineering Economics, New York: Prentice-Hall, 1981.
[8] Krzysztof Cwalina, Brad Abrams, Framework Design Guidelines, Addison-Wesley, 2008.
[9] Brian Chess, Jacob West, Secure Programming with Static Analysis, Addison-Wesley, 2007.
[10] Jack Ganssle, A Guide to Code Inspections, The Ganssle Group, 2009.
[11] Dorota Huizinga, Adam Kolawa, Automated Defect Prevention: Best Practices in Software Management, Wiley-IEEE Computer Society, 2007.
