An Introduction to the Clang API
Mark Wilson, Senior Software Engineer, Integrated Computer Solutions, Inc.
Agenda
● Quick introduction to Clang and LLVM
● Clang usage and error reporting
● Using Clang with Qt
● Basic parser and Clang terminology
● Using the Clang API to highlight text
● Questions/Feedback
About Clang
• One of many front-ends to the LLVM compiler infrastructure
• LLVM originally stood for Low Level Virtual Machine; the acronym is no longer meaningful, and LLVM is now the project's full name
• Designed to compile C, C++, Objective-C, and Objective-C++ to machine code
• Apple is the primary developer of Clang; Clang is the official compiler for the Apple SDK
• Clang is the default compiler on FreeBSD
• Clang is highly compatible with GCC
• Clang is a production compiler for C++98
• Rapidly advancing in terms of C++11
• The Clang API gives programmatic insight into any C++ code base, which makes tool development relatively easy
Clang Diagnostics are Simpler
GCC diagnostic for a simple error:
    file.cc:7:1: error: expected ';' before '}' token
The same error output by Clang:
    file.cc:6:11: error: expected ';' after expression
        i += 8
              ^
              ;
Even for Dreaded Template Errors
    #include <string>
    #include <map>

    int main(int argc, char** argv)
    {
        std::map<std::string, int> aMap;
        aMap[1] = "clang";
    }
Typical Template Error Diagnostic (GCC 4.7)
    try.cc: In function 'int main(int, char**)':
    try.cc:9:11: error: invalid user-defined conversion from 'int' to 'const key_type& {aka const std::basic_string<char>&}' [-fpermissive]
    In file included from /usr/include/c++/4.7/string:55:0,
                     from try.cc:1:
    /usr/include/c++/4.7/bits/basic_string.tcc:214:5: note: candidate is: std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, const _Alloc&) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]
    /usr/include/c++/4.7/bits/basic_string.tcc:214:5: note: no known conversion for argument 1 from 'int' to 'const char*'
    try.cc:9:11: error: invalid conversion from 'int' to 'const char*' [-fpermissive]
    In file included from /usr/include/c++/4.7/string:55:0,
                     from try.cc:1:
    /usr/include/c++/4.7/bits/basic_string.tcc:214:5: error: initializing argument 1 of 'std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, const _Alloc&) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]' [-fpermissive]
    try.cc:9:15: error: invalid conversion from 'const char*' to 'std::map<std::basic_string<char>, int>::mapped_type {aka int}' [-fpermissive]
Clang Template Error Diagnostic
    try.cc:9:9: error: no viable overloaded operator[] for type 'std::map<std::string, int>'
        aMap[1] = "clang";
        ~~~~^~
Using Clang with Qt
Since Clang and GCC are compatible, you can build and link against a Qt installation that was built with GCC.
Tell qmake to use the Clang compiler (clang for C sources, clang++ for C++ sources):
    qmake QMAKE_CC=clang QMAKE_CXX=clang++
Basic Compiler/Parser Terminology
• A compiler's function is to transform one form of code into another, e.g. C++ source => x86 assembler
• The compiler scans the stream of characters that make up the code and splits it into tokens:
  – Numeric/string literals
  – Punctuation
  – Language keywords
  – Identifiers
  – Comments
• Tokenization produces a stream of tokens, which are parsed to:
  – Ensure correct syntax
  – Discover the inherent structure of the program
  – Build an Abstract Syntax Tree (AST) representation of the source
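The token categories above can be illustrated with a toy tokenizer. This is only a sketch, not how Clang's lexer works: it assumes a tiny hypothetical keyword set, handles only `//` comments, unescaped double-quoted strings, and decimal integers, and treats every other character as single-character punctuation. The names `TokenKind`, `Token`, and `tokenize` are made up for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// The five token categories listed on the slide.
enum class TokenKind { Literal, Punctuation, Keyword, Identifier, Comment };

struct Token {
    TokenKind kind;
    std::string text;
};

// Toy tokenizer (assumption: a drastically simplified subset of C++).
std::vector<Token> tokenize(const std::string& src) {
    // Hypothetical keyword set, far smaller than the real language.
    static const std::set<std::string> keywords = {
        "int", "return", "if", "else", "while", "for"};

    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }

        const std::size_t start = i;
        TokenKind kind;
        if (src.compare(i, 2, "//") == 0) {
            // Line comment: runs to the end of the line.
            while (i < src.size() && src[i] != '\n') ++i;
            kind = TokenKind::Comment;
        } else if (c == '"') {
            // String literal (escape sequences are not handled).
            ++i;
            while (i < src.size() && src[i] != '"') ++i;
            if (i < src.size()) ++i;  // consume the closing quote
            kind = TokenKind::Literal;
        } else if (std::isdigit(c)) {
            while (i < src.size() &&
                   std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            kind = TokenKind::Literal;
        } else if (std::isalpha(c) || c == '_') {
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) ||
                    src[i] == '_')) ++i;
            kind = keywords.count(src.substr(start, i - start))
                       ? TokenKind::Keyword
                       : TokenKind::Identifier;
        } else {
            ++i;  // anything else is one punctuation character
            kind = TokenKind::Punctuation;
        }
        assert(i > start);  // every branch consumes at least one character
        tokens.push_back({kind, src.substr(start, i - start)});
    }
    return tokens;
}
```

Calling `tokenize("int x = 42; // answer")` yields a keyword, an identifier, two punctuation tokens, a literal, and a comment. A real parser would then consume this stream to check syntax and build the AST.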
Abstract Syntax Tree
Libclang provides a cursor that follows the AST in top-down order
[Figure: tree diagram of an AST with nodes numbered 1 to 8]
Clang Provides Tool Infrastructure APIs
• LibClang
  – C API, stable, allows bindings to other languages (e.g., Python)
  – Simpler, but less control over AST
• Clang Plugins
  – Dynamic libraries loaded by compiler at runtime
  – Complete control over AST
  – Good for generating artifacts during compile time
• LibTooling
  – C++ interface for writing stand-alone tools
  – Provides common way to parse Clang command line options
libclang
• Libclang is a stable C interface to the Clang compiler
• Entire API is in Index.h (included as clang-c/Index.h)
• Provides ability to iterate through program structure via cursors
• Prefer libclang over the C++ interface unless you need full control over program structure
  – More stable
  – Better backwards compatibility
  – Much simpler
• Libclang is great for tool writing:
  – Syntax checking (clang-check)
  – Automatic fixing of compile errors (clang-fixit)
  – Automatic code formatting (clang-format)
More Terminology
Translation Unit - The basic unit of compilation in C++: a single source file plus every header file it includes, directly or indirectly
Index - A set of translation units that may be linked into an executable or library. An index may contain many translation units
Cursor - A "pointer" to an element in the AST. Cursors are hierarchical, e.g., the parameters of a function are children of the function's cursor
Libclang Data Types
Primary libclang data types:
• CXTranslationUnit
• CXIndex
• CXCursor
• CXCursorKind
• CXToken
• CXType
• CXTypeKind
• CXSourceLocation
• CXSourceRange
Some Code – A Simple Syntax Aware Mini “IDE”
• Highlights keywords, literals, punctuation, and comments with color
• Read-only
Code
We need a CXIndex:
    index_ = clang_createIndex(0, 0);
When the user selects a file, we create a CXTranslationUnit:
    // Produce object code, parse file as C++
    const char* args[] = { "-c", "-x", "c++" };
    transUnit_ = clang_parseTranslationUnit(index_,
                                            path_.toStdString().c_str(),
                                            args, 3, 0, 0,
                                            CXTranslationUnit_None);
Visiting the Source Code
We obtain the first cursor in the source from the translation unit and start visiting the AST via a user-defined visitor function:
    CXCursor startCursor = clang_getTranslationUnitCursor(transUnit_);
    clang_visitChildren(startCursor, visitor, this);
The Visitor
A Clang visitor function has the signature
    CXChildVisitResult visitor(
        CXCursor cursor,         // the current source cursor
        CXCursor parent,         // the parent of the current cursor, if there is one
        CXClientData clientData  // pointer to arbitrary user data
    )
CXChildVisitResult is one of three enumerators:
1. CXChildVisit_Break - Terminates the cursor traversal
2. CXChildVisit_Continue - Continues the cursor traversal with the next sibling of the cursor just visited, without visiting its children
3. CXChildVisit_Recurse - Recursively traverses the children of this cursor, using the same visitor and client data
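The effect of the three return values can be modeled without libclang. The sketch below is a toy stand-in, not the libclang API: `Node`, `VisitResult`, `Visitor`, `visitChildren`, and `sampleTree` are hypothetical names, and the tree is hand-built. It mirrors the documented behavior that only the children of the starting node are visited and that the traversal reports whether it was cut short by Break.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy AST node (assumption: stands in for a CXCursor).
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Toy counterpart of CXChildVisitResult.
enum class VisitResult { Break, Continue, Recurse };

using Visitor = std::function<VisitResult(const Node& node, const Node& parent)>;

// Visits the children of `parent` top-down.
// Returns true if the visitor ended the traversal early with Break.
bool visitChildren(const Node& parent, const Visitor& visitor) {
    for (const Node& child : parent.children) {
        switch (visitor(child, parent)) {
        case VisitResult::Break:
            return true;  // stop everything, including outer levels
        case VisitResult::Continue:
            break;        // skip this child's subtree, go to next sibling
        case VisitResult::Recurse:
            if (visitChildren(child, visitor)) return true;
            break;
        }
    }
    return false;
}

// Hypothetical sample tree: one function with two parameters and a body,
// followed by a global variable.
Node sampleTree() {
    return Node{"TranslationUnit", {
        Node{"FunctionDecl main", {
            Node{"ParmDecl argc", {}},
            Node{"ParmDecl argv", {}},
            Node{"CompoundStmt", {Node{"DeclStmt", {}}}},
        }},
        Node{"VarDecl global", {}},
    }};
}
```

A visitor that always returns Recurse sees all six descendants of the root in top-down order; one that always returns Continue sees only the two top-level declarations; one that returns Break stops the whole traversal at that node.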
Token Information
To highlight tokens in text, we need to know:
• CXSourceLocation – file, line, column, and offset
• CXSourceRange – start and end locations of token
• CXTokenKind – keyword, literal, punctuation, identifier, or comment
Use QTextCursor to move through the source in the QTextEdit object and highlight the code text.
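A text cursor selects by character position, while a source location is often kept as a line/column pair. The sketch below shows one way to convert between the two; it is an illustration under stated assumptions, not code from the sample IDE. The names `offsetFromLineColumn`, `HighlightSpan`, and `spanFromRange` are hypothetical, lines and columns are assumed to be 1-based, the end of a range is assumed to be one past the last character, and the text is assumed to be plain ASCII so that byte offsets and character positions coincide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts a 1-based (line, column) pair into a 0-based offset into `text`.
// Locations past the end of the text are clamped to text.size().
std::size_t offsetFromLineColumn(const std::string& text,
                                 unsigned line, unsigned column) {
    assert(line >= 1 && column >= 1);
    std::size_t offset = 0;
    // Skip forward one newline for each line before the requested one.
    for (unsigned current = 1; current < line; ++current) {
        const std::size_t newline = text.find('\n', offset);
        if (newline == std::string::npos) return text.size();
        offset = newline + 1;
    }
    offset += column - 1;
    return offset < text.size() ? offset : text.size();
}

// Start position and length of the text to highlight.
struct HighlightSpan {
    std::size_t begin;
    std::size_t length;
};

// Builds a span from a start location and an exclusive end location.
HighlightSpan spanFromRange(const std::string& text,
                            unsigned startLine, unsigned startColumn,
                            unsigned endLine, unsigned endColumn) {
    const std::size_t begin = offsetFromLineColumn(text, startLine, startColumn);
    const std::size_t end = offsetFromLineColumn(text, endLine, endColumn);
    return HighlightSpan{begin, end > begin ? end - begin : 0};
}
```

In a Qt editor the resulting `begin` and `length` could drive a selection before a character format is applied. When the offset reported with the source location is available, it can be used directly and this conversion is unnecessary.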
Source Code for IDE
ftp://ftp.ics.com/pub/pickup/clangfollowup.zip
References
• LLVM - http://llvm.org/
• Clang Site - http://clang.llvm.org/
• Building Clang - http://clang.llvm.org/get_started.html (version 3.4 as of this presentation)
• Clang documentation - http://clang.llvm.org/docs/index.html
• Clang Doxygen - http://clang.llvm.org/doxygen/index.html
• Refactoring with clang - http://www.youtube.com/watch?v=yuIOGfcOH0k