Author: Belinda Benson
Unicode programming for Desktop 5.1 - Best Practices. Version 1

Nov 08, 2002

Author: Jerzy Gruszka Senior Software Engineer – Documentum Desktop.

Page 1 of 14

11/19/02

Copyright © 2002 by Documentum, Inc. 6801 Koll Center Parkway Pleasanton, CA 94566-3145 All Rights Reserved. Documentum ® , Documentum eContentServer™, Documentum Desktop Client™, Documentum Intranet Client™, Documentum WebPublisher™, Documentum Web Development Kit™, Documentum RightSite ® , Documentum Administrator™, Documentum Developer Studio™, Documentum WebCache™, Documentum e-Deploy™, AutoRender Pro™, Documentum Content Personalization Services™, Documentum Site Delivery Services™, Documentum Content Authentication Services™, Documentum DocControl Manager™, Documentum Corrective Action Manager™, DocInput™, Documentum DocViewer™, Documentum EContent Server®, Documentum WorkSpace®, Documentum SmartSpace®, Documentum ViewSpace®, and Documentum SiteSpace™ are trademarks of Documentum, Inc. in the United States and other countries. All other company and product names are used for identification purposes only and may be trademarks of their respective owners. The information in this document is subject to change without notice. Documentum, Inc. assumes no liability for any damages incurred, directly or indirectly, from any errors, omissions, or discrepancies in the information contained in this document. All information in this document is provided “AS IS”, NO WARRANTIES, WHETHER EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ARE MADE REGARDING THE INFORMATION CONTAINED IN THIS DOCUMENT.


TABLE OF CONTENTS
• Audience
• Objectives
• Character Encoding
• String Types
• String Conversion Macros
• Common Compiler Errors
• C++ File and Registry API
• Win32 API in VB
• Direct DMCL calls for UTF-8 sessions
• Fonts
• VB controls
• Summary


Audience

This paper is meant for Documentum Desktop custom application developers, to aid them in their design efforts. The reader is assumed to have some basic knowledge of the following:
• Coding in C/C++ and VB.
• The Win32 API.

Objectives

• Gain a deeper understanding of string data types on Microsoft Windows.
• Learn tips and techniques for writing VB and C++ code that handles Unicode data.
• Learn what is required to convert an existing code base from ANSI to Unicode.

Character Encoding 

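UTF-16 is the encoding of Unicode strings on Windows.
  o UTF-16 stores text in 16-bit (2-byte) code units. Most characters take one code unit, but characters outside the Basic Multilingual Plane take a surrogate pair of two code units (4 bytes), so UTF-16 is not strictly double-byte.
  o For example, the ASCII code of the character 'h' is 0x68, and its UTF-16 code unit is 0x0068 (stored in memory as the bytes 0x68 0x00, because Windows is little-endian).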
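The code-unit rule can be sketched in portable C++ (no Windows headers). The helper name EncodeUtf16 is hypothetical; it only illustrates how one Unicode code point becomes one or two 16-bit code units, the same units a WCHAR holds:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: encode one Unicode code point (U+0000..U+10FFFF,
// excluding the surrogate range) as UTF-16 code units.
std::vector<char16_t> EncodeUtf16(std::uint32_t codePoint)
{
    assert(codePoint <= 0x10FFFF);
    assert(codePoint < 0xD800 || codePoint > 0xDFFF); // surrogates are not characters

    std::vector<char16_t> units;
    if (codePoint < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit code unit (2 bytes).
        units.push_back(static_cast<char16_t>(codePoint));
    } else {
        // Supplementary planes: a surrogate pair (two code units, 4 bytes).
        std::uint32_t v = codePoint - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
        units.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
    return units;
}
```

With this sketch, EncodeUtf16(0x68) yields the single unit 0x0068, while EncodeUtf16(0x1F600) yields the surrogate pair 0xD83D 0xDE00. This is why a count of characters and a count of WCHARs can differ.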

String Types 

LPSTR is a pointer to a null-terminated ANSI character array (an array of chars).



LPWSTR is a pointer to a null-terminated Unicode character array (an array of WCHARs).
  o The W in LPWSTR stands for "wide", which is Microsoft's way of saying Unicode.
  o An LPWSTR cannot contain embedded null characters: the first null character ends the string.



BSTR is a pointer variable that points to a Unicode character array.
  o The array is preceded by a 4-byte length field and terminated by a single 2-byte null character.
  o There may be additional null characters within the character array.
  o The pointer points to the beginning of the character array, not to the 4-byte length field.
  o The length field contains the number of bytes (not the number of characters) in the character array, excluding the terminating null bytes.
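To visualize this layout, here is a portable imitation in standard C++. The Fake... names are hypothetical and exist only for illustration; real code should always use the OLE Automation functions SysAllocStringLen(), SysStringByteLen() and SysFreeString(), whose internal allocation details may differ:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Illustration only. Layout: [4-byte byte count][UTF-16 characters...][2-byte null]
//                                               ^ the "BSTR" pointer points here
char16_t* FakeSysAllocStringLen(const char16_t* chars, std::uint32_t charCount)
{
    assert(chars != nullptr || charCount == 0);
    std::uint32_t byteLen = charCount * sizeof(char16_t);
    unsigned char* block = static_cast<unsigned char*>(
        std::malloc(sizeof(std::uint32_t) + byteLen + sizeof(char16_t)));
    if (block == nullptr)
        return nullptr;
    std::memcpy(block, &byteLen, sizeof byteLen);       // length prefix, in bytes
    char16_t* str = reinterpret_cast<char16_t*>(block + sizeof byteLen);
    if (byteLen != 0)
        std::memcpy(str, chars, byteLen);               // may contain embedded nulls
    str[charCount] = u'\0';                             // terminating null character
    return str;
}

// Reads the prefix just before the pointer; excludes the terminating null.
std::uint32_t FakeSysStringByteLen(const char16_t* str)
{
    std::uint32_t byteLen;
    std::memcpy(&byteLen,
                reinterpret_cast<const unsigned char*>(str) - sizeof byteLen,
                sizeof byteLen);
    return byteLen;
}

// Frees the whole block, starting at the length prefix.
void FakeSysFreeString(char16_t* str)
{
    if (str != nullptr)
        std::free(reinterpret_cast<unsigned char*>(str) - sizeof(std::uint32_t));
}
```

A string built from 'a', null, 'b' reports 6 bytes through its prefix, while a null-terminated scan such as wcslen() would stop after one character. This is why a BSTR with embedded nulls cannot safely be used as an LPWSTR.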



LPCSTR and LPCWSTR point to constant strings: a function that takes one of these types cannot modify the string through it.
  o The C stands for constant.
  o LPTSTR and LPCTSTR are generic data types used in conditional compilation, just like the TCHAR data type, to cover both ANSI and Unicode in a single source code base. The Windows headers (winnt.h) test UNICODE, while the C runtime header tchar.h tests _UNICODE, so Unicode builds should define both:

    #ifdef UNICODE
    typedef LPWSTR  LPTSTR;
    typedef LPCWSTR LPCTSTR;
    #else
    typedef LPSTR   LPTSTR;
    typedef LPCSTR  LPCTSTR;
    #endif




OLECHAR is WCHAR:

    typedef WCHAR OLECHAR;

LPOLESTR is a pointer to a null-terminated Unicode character array:

    typedef OLECHAR* LPOLESTR;



Visual Basic String
  o It is internally implemented as a BSTR, so it can contain embedded null characters.
  o The terminating null is of little use within VB itself.
  o A VB String with no embedded nulls can also be used as an LPWSTR.



A generic string literal is declared with the _T() macro:

    LPCTSTR myString = _T("This is a test string");



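Standard C++ string from the STL library: the string type is not Unicode-enabled. It is a string of single-byte chars, so length() counts bytes, and it cannot hold UTF-16 text (std::wstring, which is basic_string<wchar_t>, holds wide strings):

    typedef basic_string<char> string;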



Use a generic MyString type that follows the TCHAR setting:

    typedef TCHAR MyChar;
    typedef basic_string< MyChar, char_traits<MyChar>, allocator<MyChar> > MyString;
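The same idea can be tried outside Windows with a small, portable sketch. MY_UNICODE and MY_T are hypothetical stand-ins for _UNICODE and _T() (on Windows, use TCHAR and _T() from tchar.h instead), and the hypothetical ByteSize helper shows the character-to-byte conversion this paper keeps stressing:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-ins for the TCHAR scheme.
#ifdef MY_UNICODE
typedef wchar_t MyChar;
#define MY_T(s) L##s
#else
typedef char MyChar;
#define MY_T(s) s
#endif

// The generic string type, with the template arguments spelled out.
typedef std::basic_string<MyChar, std::char_traits<MyChar>, std::allocator<MyChar> > MyString;

// Size in bytes of the string's characters (excluding the terminator),
// correct in both ANSI and Unicode configurations.
std::size_t ByteSize(const MyString& s)
{
    return s.length() * sizeof(MyChar);
}
```

Without MY_UNICODE, the byte size of MY_T("abc") equals its length (3); with MY_UNICODE defined, it is 3 * sizeof(wchar_t), which is 6 on Windows.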



MFC CString
  o Unicode-enabled and convenient to use in projects that support MFC.
  o Watch out for GetLength(). It is documented to return the number of bytes, but it actually returns the number of characters (TCHARs)! In a Unicode build, multiply it by sizeof(TCHAR) to get a byte count.

String Conversion Macros

Use the ATL macros to convert between the different types of strings. 'A' stands for LPSTR, 'OLE' for LPOLESTR, 'T' for LPTSTR, 'W' for LPWSTR, and 'C' for constant; the ...2BSTR macros return a newly allocated BSTR that must be released with SysFreeString().
• From LPSTR: A2BSTR(), A2COLE(), A2CT(), A2CW(), A2OLE(), A2T(), A2W()
• From LPOLESTR: OLE2A(), OLE2BSTR(), OLE2CA(), OLE2CT(), OLE2CW(), OLE2T(), OLE2W()
• From LPTSTR: T2A(), T2BSTR(), T2CA(), T2COLE(), T2CW(), T2OLE(), T2W()
• From LPWSTR: W2A(), W2BSTR(), W2CA(), W2COLE(), W2CT(), W2OLE(), W2T()

The conversion macros call WideCharToMultiByte() or MultiByteToWideChar(), and they require the USES_CONVERSION macro to appear earlier in the same function:

    USES_CONVERSION;

Both are defined in the AtlConv.h header file.




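Using the macros in a loop can allocate megabytes of memory on the stack, because each conversion allocates its buffer with _alloca() and that memory is released only when the function returns: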

    // pI is an interface pointer assumed to be declared elsewhere.
    void BadIteratingCode(LPCTSTR lpsz)
    {
        USES_CONVERSION;
        for (int i = 0; i < 10000; i++)
            pI->SomeMethod(i, T2COLE(lpsz));   // new stack allocation on every pass
    }

    void MuchBetterIteratingCode(LPCTSTR lpsz)
    {
        USES_CONVERSION;
        LPCOLESTR lpszT = T2COLE(lpsz);         // convert once, outside the loop
        for (int i = 0; i < 10000; i++)
            pI->SomeMethod(i, lpszT);
    }



Never return the result of any of the macros, unless the function's return type makes a copy of the data before the function returns (for example, CString):

    LPTSTR BadConvert(ISomeInterface* pI)
    {
        USES_CONVERSION;
        LPOLESTR lpsz = NULL;
        pI->GetFileName(&lpsz);
        LPTSTR lpszT = OLE2T(lpsz);
        return lpszT;   // bad! returning memory allocated on the stack
    }

    CString BetterConvert(ISomeInterface* pI)
    {
        USES_CONVERSION;
        LPOLESTR lpsz = NULL;
        pI->GetFileName(&lpsz);
        CString result = OLE2T(lpsz);   // CString makes its own copy
        // Assumes GetFileName() follows the COM convention of allocating
        // its [out] string with CoTaskMemAlloc().
        CoTaskMemFree(lpsz);
        return result;
    }


Common Compiler Errors 

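Never add _T() to the TRACE0, TRACE1, TRACE2, ... macros. Their internal implementation already wraps the format string in _T(), so adding another one causes a compiler error in Unicode builds. This does not apply to TRACE, which needs _T():

    TRACE(_T("trace message"));
    TRACE1("The value of myVar = %d", myVar);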



Always use the _T() macro for string literals, and the L prefix for LPOLESTR strings:

    MyFunction(_T("string literal"));
    LPCOLESTR myString = L"string literal";



Replace string declarations: char with _TCHAR, LPSTR with LPTSTR, and LPCSTR with LPCTSTR.



Always use the generic string manipulation routines:
• strcmp() -> _tcscmp()
• strcpy() -> _tcscpy()
• strcat() -> _tcscat()
• sprintf() -> _stprintf()
• strlen() -> _tcslen()
• strncpy() -> _tcsncpy()
• atoi() -> _ttoi()
• itoa() -> _itot()
and so on. For the complete list, look up the MSDN library: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/html/_crt_routine_mappings.asp



Check whether other C/C++ runtime routines that you use have generic versions, for example remove() -> _tremove() and setlocale() -> _tsetlocale().

C++ File and Registry API 

Use the CreateFile() API. The standard C++ wfstream and wofstream classes accept only a char* file name, so they cannot open files whose names contain characters outside the ANSI code page.



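WriteFile() writes the Unicode string's bytes as they are, so you need to write the standard byte order mark first: the character U+FEFF, which appears in a little-endian file as the bytes 0xFF 0xFE.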



ReadFile() reads raw bytes. Test for the presence of the byte order mark before parsing the content.
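The byte order mark test can be sketched in portable C++. The enum and function names are hypothetical; the function only inspects raw bytes, so the same check applies to a buffer filled by ReadFile():

```cpp
#include <cassert>
#include <cstddef>

enum class TextEncoding { Unknown, Utf8, Utf16LE, Utf16BE };

// Hypothetical helper: classify a byte buffer by its byte order mark and
// report the BOM length through bomLength, so the caller can skip it.
// (A UTF-32LE file also starts with FF FE; that case is ignored here.)
TextEncoding DetectBom(const unsigned char* bytes, std::size_t count, std::size_t* bomLength)
{
    assert(bomLength != nullptr);
    *bomLength = 0;
    if (count >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF) {
        *bomLength = 3;                 // U+FEFF encoded as UTF-8
        return TextEncoding::Utf8;
    }
    if (count >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        *bomLength = 2;                 // U+FEFF stored little-endian (what Windows writes)
        return TextEncoding::Utf16LE;
    }
    if (count >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF) {
        *bomLength = 2;                 // U+FEFF stored big-endian
        return TextEncoding::Utf16BE;
    }
    return TextEncoding::Unknown;       // no BOM found
}
```

After a Utf16LE result, skip bomLength bytes and interpret the rest of the buffer as WCHARs.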



In VB, call CreateTextFile(fileName, True, True) of the FileSystemObject. The last parameter turns on the Unicode option.




WriteLine(strDRL) then writes the byte order mark (0xFF 0xFE) for you.



Many API functions require the buffer size in bytes; do not confuse it with the string length in characters.

    CString strUnicodeDRL = _T("MyDRLstring");
    // The third argument is a byte count, not a character count.
    WriteFile(hFile, (LPCTSTR)strUnicodeDRL,
              strUnicodeDRL.GetLength() * sizeof(_TCHAR),
              lpNumberOfBytesWritten, &ovWrite);

    CString value(_T("True"));
    // For REG_SZ, the byte count must include the terminating null character.
    RegSetValueEx(hkey, _T("DocumentInfoPaneVisible"), 0, REG_SZ,
                  (const BYTE*)(LPCTSTR)value,
                  (value.GetLength() + 1) * sizeof(_TCHAR));

    BYTE dataBuffer[MAX_PATH * sizeof(_TCHAR)];
    DWORD dataBufferSize = sizeof(dataBuffer);   // buffer size in bytes
    DWORD type = 0;
    HKEY hKey = NULL;
    CString pathname;
    // Get the file path from the registry
    if (RegOpenKeyEx(HKEY_LOCAL_MACHINE, _T("SOFTWARE\\Documentum\\Common"),
                     0, KEY_READ, &hKey) == ERROR_SUCCESS)
    {
        if (RegQueryValueEx(hKey, _T("ExportDirectory"), 0, &type,
                            dataBuffer, &dataBufferSize) == ERROR_SUCCESS
            && type == REG_SZ)
            pathname = (_TCHAR*) dataBuffer;
        RegCloseKey(hKey);
    }
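The two byte/character rules used in these calls can be captured in two tiny helpers. This is a portable sketch assuming a Unicode build, where each TCHAR is a 2-byte UTF-16 code unit (modeled here with char16_t); the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes to pass as cbData when storing a REG_SZ value:
// the count must include the terminating null character.
std::size_t RegSzByteCount(const std::u16string& value)
{
    return (value.length() + 1) * sizeof(char16_t);
}

// Characters held in a buffer whose size is reported in bytes,
// e.g. the size returned by RegQueryValueEx() in lpcbData.
std::size_t CharsInByteCount(std::size_t byteCount)
{
    // UTF-16 data always occupies a whole number of 2-byte code units.
    assert(byteCount % sizeof(char16_t) == 0);
    return byteCount / sizeof(char16_t);
}
```

For the value "True", RegSzByteCount returns 10: four characters plus the terminating null, two bytes each.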

Win32 API in VB  

VB strings are Unicode, but VB assumes the world outside is all ANSI, so it converts the String parameters of declared API functions on the way in and on the way out. For example, if you call CharUpper(), the following happens:
1. VB translates the string from a BSTR to an "ABSTR" (the same string converted to the system ANSI code page) and passes it to the function CharUpperA(), which treats it as an LPSTR.
2. CharUpperA() translates the LPSTR to an LPWSTR and passes it to CharUpperW().
3. The Unicode function CharUpperW() processes the LPWSTR and produces an LPWSTR for output, returning it to CharUpperA().
4. CharUpperA() translates the LPWSTR back to an LPSTR and passes it to VB, which thinks of it as an "ABSTR".
5. VB translates the "ABSTR" back to a BSTR!
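Steps 1 and 4 are where data can be lost: any character that the ANSI code page cannot represent is replaced, typically with '?'. A minimal simulation, assuming Latin-1 (ISO-8859-1) as a stand-in for the system ANSI code page (the real code page depends on the system locale) and using hypothetical helper names:

```cpp
#include <cassert>
#include <string>

// Simulated Unicode -> ANSI step: characters outside the stand-in
// code page (Latin-1, U+0000..U+00FF) are replaced with '?'.
std::string ToFakeAnsi(const std::u16string& s)
{
    std::string out;
    for (char16_t c : s)
        out += (c <= 0xFF) ? static_cast<char>(c) : '?';
    return out;
}

// Simulated ANSI -> Unicode step: each Latin-1 byte maps to one character.
std::u16string FromFakeAnsi(const std::string& s)
{
    std::u16string out;
    for (char c : s)
        out += static_cast<char16_t>(static_cast<unsigned char>(c));
    return out;
}
```

Plain ASCII text survives the round trip, but a Cyrillic character such as U+0416 comes back as "?", which is what happens to Unicode data passed through an ANSI declaration.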




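Under Windows NT or later, you can declare and call the 'W' version of the function instead. VB still makes the BSTR-to-ABSTR translation, however, so you must counteract it with StrConv(): pass StrConv(s, vbUnicode) to the function, and convert results back with StrConv(s, vbFromUnicode).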

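    Private Declare Function RegQueryValueEx Lib "advapi32.dll" Alias "RegQueryValueExW" _
        (ByVal hkey As Long, ByVal lpValueName As String, ByVal lpReserved As Long, _
         lpType As Long, ByVal lpdata As String, lpcbData As Long) As Long
    Private Const STRING_BUFFER_SIZE As Integer = 261 * 2   '(_MAX_PATH+1)*2
    Private Const ERROR_SUCCESS = 0&

    Dim dataBuffer As String * STRING_BUFFER_SIZE
    Dim dataBufferSize As Long
    Dim dwType As Long
    Dim retReg As Long

    'resultKey is a key opened earlier with RegOpenKeyEx.
    'The buffer size, in bytes, must be set before the call.
    dataBufferSize = STRING_BUFFER_SIZE
    retReg = RegQueryValueEx(resultKey, _
                             StrConv("RootID", vbUnicode), _
                             0, _
                             dwType, _
                             dataBuffer, _
                             dataBufferSize)
    If retReg = ERROR_SUCCESS Then
        'dataBufferSize is now the data size in bytes, including the terminating null
        docId = Left$(StrConv(dataBuffer, vbFromUnicode), (dataBufferSize / 2) - 1)
    End If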
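Why StrConv(..., vbUnicode) works can be simulated in portable C++. This is an illustration under stated assumptions, not VB's actual implementation: Latin-1 again stands in for the ANSI code page, and the helper names are hypothetical. StrConv widens each byte of the UTF-16 string into its own character; VB's implicit conversion then narrows each character back to one byte, so the "W" function receives the original UTF-16 bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// StrConv(s, vbUnicode): the string's raw UTF-16LE bytes are treated as
// ANSI, and every byte becomes a separate character (length doubles).
std::u16string SimStrConvVbUnicode(const std::u16string& s)
{
    std::u16string out;
    for (char16_t c : s) {
        out += static_cast<char16_t>(c & 0xFF); // low byte first (little-endian)
        out += static_cast<char16_t>(c >> 8);   // then the high byte
    }
    return out;
}

// VB's implicit conversion of a ByVal String argument: each character is
// narrowed to one ANSI byte (all values are <= 0xFF after the step above).
std::string SimDeclareArgument(const std::u16string& s)
{
    std::string bytes;
    for (char16_t c : s)
        bytes += static_cast<char>(c);
    return bytes;
}

// What the "W" API sees: the received bytes read as UTF-16LE code units.
std::u16string AsUtf16LE(const std::string& bytes)
{
    std::u16string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
        out += static_cast<char16_t>(static_cast<unsigned char>(bytes[i]) |
                                     (static_cast<unsigned char>(bytes[i + 1]) << 8));
    return out;
}
```

The reverse happens on the way back, which is why the returned buffer is passed through StrConv(..., vbFromUnicode). The simulation relies on every byte mapping to one character and back, which holds for Latin-1 but should not be assumed for every code page.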

Do not assume that all native VB file functions are Unicode-aware!

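The following conversion will not work, because SetAttr is a VB runtime function rather than a declared API function. The runtime converts the path to ANSI internally, so wrapping the path in StrConv() only corrupts it:

    SetAttr(filePath, vbReadOnly) -> SetAttr(StrConv(filePath, vbUnicode), vbReadOnly)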



If possible, use Win32 API calls that do the same job. Replace SetAttr(filePath, vbReadOnly) with:

    Private Declare Function SetFileAttributes Lib "kernel32" Alias "SetFileAttributesW" _
        (ByVal lpFileName As String, ByVal dwFileAttributes As Long) As Long
    Private Const FILE_ATTRIBUTE_READONLY = &H1

    Dim result As Long
    'SetFileAttributes returns 0 on failure
    result = SetFileAttributes(StrConv(filePath, vbUnicode), FILE_ATTRIBUTE_READONLY)

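Replace FileLen(path) with:

    Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileW" _
        (ByVal lpFileName As String, ByVal dwDesiredAccess As Long, _
         ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Long, _
         ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, _
         ByVal hTemplateFile As Long) As Long
    Private Declare Function GetFileSize Lib "kernel32" _
        (ByVal hFile As Long, lpFileSizeHigh As Long) As Long
    Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long

    Private Const GENERIC_READ = &H80000000
    Private Const FILE_SHARE_READ = &H1
    Private Const OPEN_EXISTING = 3
    Private Const FILE_ATTRIBUTE_NORMAL = &H80
    Private Const INVALID_HANDLE_VALUE = -1

    Dim fileHandle As Long
    Dim fileSize As Long

    'Passing 0 for lpSecurityAttributes uses the default security
    fileHandle = CreateFile(StrConv(path, vbUnicode), _
                            GENERIC_READ, _
                            FILE_SHARE_READ, 0, _
                            OPEN_EXISTING, _
                            FILE_ATTRIBUTE_NORMAL, 0)
    If fileHandle <> INVALID_HANDLE_VALUE Then
        fileSize = GetFileSize(fileHandle, 0)
        CloseHandle fileHandle
    End If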

Direct DMCL calls for UTF-8 sessions

    // str is the UTF-16 (LPCWSTR) API command, assumed to be declared by the caller.
    LPSTR strUTF8 = NULL;
    LPSTR retValUTF8 = NULL;
    LPWSTR retValUTF16 = NULL;
    BOOL dmclSessionIsUtf8 = FALSE;

    // Find out which client code page is set for this docbase.
    char* retVal = dmAPIGet("get,a,apiconfig,client_codepage");
    if (retVal != NULL && strcmp(retVal, "UTF-8") == 0)
        dmclSessionIsUtf8 = TRUE;

    if (dmclSessionIsUtf8)
    {
        if (ConvertUTF16toUTF8(str, &strUTF8))
        {
            retValUTF8 = dmAPIGet(strUTF8);
            ConvertUTF8toUTF16(retValUTF8, &retValUTF16);
            // do something with retValUTF16
        }
        // Clean up the memory allocated by the conversion functions.
        // retValUTF8 was returned by dmAPIGet(), not allocated by them,
        // so it must not be passed to CoTaskMemFree().
        CoTaskMemFree(strUTF8);
        CoTaskMemFree(retValUTF16);
    }

    // This function allocates memory for the [out] parameter 'strConverted'.
    // The caller is responsible for releasing the allocated memory with
    // CoTaskMemFree() when done. Returns the number of bytes written, or 0 on failure.
    int ConvertUTF16toUTF8(LPCWSTR strToConvert, LPSTR* strConverted)
    {
        *strConverted = NULL;
        if (strToConvert == NULL)
            return 0; // failure

        // Calculate the number of bytes needed, including the terminating null.
        int bufferSize = WideCharToMultiByte(CP_UTF8, 0, strToConvert, -1,
                                             NULL, 0, NULL, NULL);
        if (bufferSize == 0)
            return 0; // failure

        // Allocate memory for the converted string.
        *strConverted = (char*)CoTaskMemAlloc(bufferSize);
        if (*strConverted == NULL)
            return 0; // out of memory

        // Do the actual conversion.
        return WideCharToMultiByte(CP_UTF8, 0, strToConvert, -1,
                                   *strConverted, bufferSize, NULL, NULL);
    }

    // This function allocates memory for the [out] parameter 'strConverted'.
    // The caller is responsible for releasing the allocated memory with
    // CoTaskMemFree() when done. Returns the number of WCHARs written, or 0 on failure.
    int ConvertUTF8toUTF16(LPCSTR strToConvert, LPWSTR* strConverted)
    {
        *strConverted = NULL;
        if (strToConvert == NULL)
            return 0; // failure

        // Calculate the number of WCHARs (not bytes) needed, including the
        // terminating null.
        int bufferSize = MultiByteToWideChar(CP_UTF8, 0, strToConvert, -1, NULL, 0);
        if (bufferSize == 0)
            return 0; // failure

        // Allocate memory: 'bufferSize' counts WCHARs, so convert it to bytes.
        *strConverted = (WCHAR*)CoTaskMemAlloc(bufferSize * sizeof(WCHAR));
        if (*strConverted == NULL)
            return 0; // out of memory

        // Do the actual conversion.
        return MultiByteToWideChar(CP_UTF8, 0, strToConvert, -1,
                                   *strConverted, bufferSize);
    }
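To see what CP_UTF8 does to the data, here is a portable sketch (standard C++11; the function name is hypothetical) of the UTF-16 to UTF-8 step that ConvertUTF16toUTF8() delegates to WideCharToMultiByte(). It also shows why UTF-8 buffers must be sized in bytes: one UTF-16 character becomes 1 to 3 bytes, and a surrogate pair becomes 4. Unpaired surrogates are replaced with U+FFFD here; that is a choice made for this sketch, not a statement about how every Windows version handles them:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // unpaired surrogate: use the replacement character
        }

        if (cp < 0x80) {                                   // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                           // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                         // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                           // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the single UTF-16 character U+00E9 (é) becomes the two bytes 0xC3 0xA9.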

Fonts 



Only some fonts are capable of displaying Unicode strings:
• Microsoft Sans Serif, Tahoma, and Arial Unicode MS on Windows 2000 and Windows XP.
• Only Arial Unicode MS on Windows NT 4.0.
• Arial Unicode MS normally is not installed, and we are not allowed to redistribute it!

DTC uses the font mapping mechanism for the MS Shell Dlg placeholder font. Windows creates the registry entry, and it is present on Windows NT 4.0 and later:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontSubstitutes: MS Shell Dlg = Microsoft Sans Serif

In C++ resource (.rc) files, replace the default MS Sans Serif font with MS Shell Dlg in all dialogs, for example FONT 8, "MS Shell Dlg".



In VB projects, read the value of the MS Shell Dlg registry string and set the Font property of the relevant controls in Form_Load():

    Private Declare Function RegCloseKey Lib "advapi32.dll" (ByVal hkey As Long) As Long
    Private Declare Function RegOpenKeyEx Lib "advapi32.dll" Alias "RegOpenKeyExW" _
        (ByVal hkey As Long, ByVal lpSubKey As String, ByVal ulOptions As Long, _
         ByVal samDesired As Long, phkResult As Long) As Long
    Private Declare Function RegQueryValueEx Lib "advapi32.dll" Alias "RegQueryValueExW" _
        (ByVal hkey As Long, ByVal lpValueName As String, ByVal lpReserved As Long, _
         lpType As Long, ByVal lpdata As String, lpcbData As Long) As Long

    Private Const STRING_BUFFER_SIZE As Integer = 261 * 2 '(_MAX_PATH+1) * 2 for UNICODE
    Private Const HKEY_LOCAL_MACHINE = &H80000002
    Private Const ERROR_SUCCESS = 0&
    Private Const KEY_QUERY_VALUE = &H1
    Private Const KEY_ENUMERATE_SUB_KEYS = &H8
    Private Const KEY_NOTIFY = &H10


    Private Const SYNCHRONIZE = &H100000
    Private Const READ_CONTROL = &H20000
    Private Const STANDARD_RIGHTS_READ = (READ_CONTROL)
    Private Const KEY_READ = ((STANDARD_RIGHTS_READ Or KEY_QUERY_VALUE Or _
        KEY_ENUMERATE_SUB_KEYS Or KEY_NOTIFY) And (Not SYNCHRONIZE))

    Public Sub SetRuntimeFont(destinationForm As Form)
        On Error GoTo HandleError
        Dim index As Integer
        Dim fontString As String
        Dim destFont As Font

        fontString = getFontFromRegistry()
        'If the font is MS Sans Serif, do not set it, because this is our
        'default font
        If fontString = "MS Sans Serif" Then
            Exit Sub
        End If

        'Go through all the controls in the form and set the font
        For index = 0 To destinationForm.Controls.Count - 1
            If TypeOf destinationForm.Controls(index) Is MSForms.Label Or _
               TypeOf destinationForm.Controls(index) Is MSForms.TextBox Or _
               TypeOf destinationForm.Controls(index) Is MSForms.ComboBox Or _
               TypeOf destinationForm.Controls(index) Is MSForms.ListBox Or _
               TypeOf destinationForm.Controls(index) Is MSForms.OptionButton Or _
               TypeOf destinationForm.Controls(index) Is VB.CommandButton Or _
               TypeOf destinationForm.Controls(index) Is VB.Label Or _
               TypeOf destinationForm.Controls(index) Is VB.CheckBox Or _
               TypeOf destinationForm.Controls(index) Is VB.OptionButton Or _
               TypeOf destinationForm.Controls(index) Is VB.Frame Or _
               TypeOf destinationForm.Controls(index) Is VB.ListBox Or _
               TypeOf destinationForm.Controls(index) Is VB.ComboBox Or _
               TypeOf destinationForm.Controls(index) Is VB.TextBox Then

                Set destFont = destinationForm.Controls(index).Font
                destFont.Name = fontString
                'For some reason W2K does not display the font correctly
                'with the size set at design time for Forms 2.0 controls,
                'so the font size has to be set at runtime in
                'SetRuntimeFont(). The size can still be customized for any
                'particular control by modifying it after SetRuntimeFont()
                'is called in Form_Load().
                destFont.Size = 8
            End If
        Next index
        Exit Sub
    HandleError:
        Debug.Print "Failed to set Font from the registry"
        Exit Sub
    End Sub


    Function getFontFromRegistry() As String
        Dim dataBuffer As String * STRING_BUFFER_SIZE
        Dim dataBufferSize As Long
        Dim dwType As Long
        Dim resultKey As Long
        Dim retReg As Long

        dataBufferSize = STRING_BUFFER_SIZE
        If RegOpenKeyEx(HKEY_LOCAL_MACHINE, _
                StrConv("SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontSubstitutes", vbUnicode), _
                0, _
                KEY_READ, _
                resultKey) = ERROR_SUCCESS Then
            retReg = RegQueryValueEx(resultKey, StrConv("MS Shell Dlg", vbUnicode), 0, dwType, _
                                     dataBuffer, dataBufferSize)
            If retReg = ERROR_SUCCESS Then
                getFontFromRegistry = Left$(StrConv(dataBuffer, vbFromUnicode), (dataBufferSize / 2) - 1)
            End If
            RegCloseKey resultKey
        End If
    End Function

VB controls    

• Standard VB controls are not capable of displaying Unicode strings. Strings are converted to and from ANSI before they are displayed or read!
• We are forced to use the Forms 2.0 set of controls, which do not map 1:1 to the standard ones:
  o TreeView and ListView are not available; you have to use third-party controls.
  o There is no automated way to replace the standard VB controls currently in use.
• To reference the Forms 2.0 controls, right-click your Toolbox window and select Components.... In the Controls tab of the Components dialog box, check the Microsoft Forms 2.0 Object Library check box. The new set of controls then appears in your Toolbox window.

Summary

• There are many different types of strings in the Windows world.
• Never mix bytes and characters: in Unicode their sizes are not the same.
• Use generic function names that compile in both ANSI and Unicode configurations.
• Convert VB strings when calling the Win32 API.
• Avoid direct DMCL calls; use DFC, which does the necessary conversions for you. If you have to call the DMCL, use conversion routines.
• Never hard-code font names. Use the MS Shell Dlg mapping in both VB and C++.
• When adding new forms or controls to VB projects, use the Forms 2.0 library.
