Chapter 3. Introduction to SUN's disk technology

This chapter introduces you to aspects of the Sun Solaris operating system and, in particular, to the file system and disk administration under Solaris as they relate to DB2 UDB. More specifically, we look at the following aspects:
• File systems on Solaris
• Administrative tasks for file systems
• NFS on Solaris
• Disk management on Solaris
This chapter provides enough background to perform tasks such as creating a UFS file system or creating a volume under Sun's Volume Manager; refer to the appropriate Sun documentation for details. There are many tasks to perform when setting up DB2 UDB on your system, and you need a sufficient background on the aspects of the Solaris operating system that relate to DB2 UDB so that you know what your options are and which method is best suited for your environment. The correct setup of your environment is crucial. The main goal of this chapter is to provide you with that background on the Solaris operating system as it relates to file system and disk management.

3.1 File systems on Solaris
The following sections look at the file system on Solaris, cover some concepts you need to understand to manage file systems, and discuss the various file system types.

3.1.1 Supported file system types on Solaris
File systems provide another way in which the capabilities of the Solaris operating environment can be dynamically configured and extended to meet business needs, such as managing disk volumes and files. The Solaris operating environment uses an architecture that provides a standard interface for different file system types. Any third party who writes software that uses this interface can provide file systems that integrate with the Solaris operating environment. Examples of the supported extensions include the high-performance UNIX file system (UFS), the PC file system (PCFS) with long file name support, the ISO 9660 CD-ROM file system known as the High Sierra file system (HSFS), and the Veritas file system (VxFS).


The Solaris operating environment supports three types of file systems:
• Disk-based
• Network-based
• Virtual

3.1.1.1 Disk-based file systems
Disk-based file systems are stored on physical media, such as hard disks, CD-ROMs, and diskettes. Disk-based file systems can be written in different formats. The available formats are:
• UFS - UNIX file system (based on the BSD Fast file system that was provided in the 4.3 Tahoe release). UFS is the default disk-based file system for the Solaris operating environment. Before you can create a UFS file system on a disk, the disk must be formatted and divided into slices. A disk slice is a physical subset of a disk that is composed of a single range of contiguous blocks. A slice can be used either as a raw device that provides, for example, swap space, or to hold a disk-based file system. Sun disks come formatted with a standard VTOC (volume table of contents) on them; therefore, you usually do not need to format disks. Disk slices are discussed in detail in 3.2, "Disk slice" on page 72.
• HSFS - High Sierra and ISO 9660 file systems. High Sierra is the first CD-ROM file system; ISO 9660 is the official standard version of the High Sierra file system. The HSFS file system is used on CD-ROMs and is a read-only file system. Solaris HSFS supports Rock Ridge extensions to ISO 9660, which, when present on a CD-ROM, provide all UFS file system semantics and file types except for writability and hard links.
• PCFS - PC file system, which allows read/write access to data and programs on DOS-formatted disks written for DOS-based personal computers.
Each type of disk-based file system is customarily associated with a particular type of media:
• UFS with hard disk
• HSFS with CD-ROM
• PCFS with diskette
These associations are not, however, restrictive. For example, CD-ROMs and diskettes can have UFS file systems created on them.


Besides the file systems listed above, the Veritas file system (VxFS) is commonly used as a disk-based file system; it is explained later in this chapter.

3.1.1.2 Network-based file systems
Network-based file systems can be accessed over the network. They reside on one system, typically a server, and are accessed by other systems across the network. The Network File System (NFS) is the only available network-based file system on Solaris. NFS is the distributed file system service for Solaris. With NFS, you can administer distributed resources (files or directories) by sharing them with individual clients. See 3.4, "NFS under Solaris" on page 90 for more information.

3.1.1.3 Virtual file systems
Virtual file systems are memory-based file systems that provide access to special kernel information and facilities. Most virtual file systems do not use file system disk space. However, the Cache file system (CacheFS) uses a file system on the disk to contain the cache, and some virtual file systems, such as the Temporary file system (TMPFS), use the swap space on a disk. The UNIX kernel provides a standard interface to each of these file systems. As far as the user is concerned, each physical file system is accessed using the same set of UNIX system calls. The aim is to provide as consistent an interface as possible. It is this consistency that allows the set of physical file systems to be represented as a single directory hierarchy. Because of their transient nature, virtual file systems are not recommended for use with DB2 UDB.
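If you are ever unsure which type of file system a given disk slice holds, the fstyp utility reports it. The following is only a brief sketch; the device name is an example and will differ on your system.

# fstyp /dev/rdsk/c0t3d0s7
ufs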

3.1.2 File system capacity
With Solaris 2.6, support was added to the operating system to allow logical file sizes up to 2^63 bytes; this means that a file on UFS may be as large as the file system itself (1 TB). The Veritas journaling file system (VxFS) provides support beyond 1 TB. VxFS is supported by many UNIX-based operating systems. It is a transaction-based journaling file system designed to meet the enterprise computing needs of users of systems ranging from the desktop to supercomputers, and it is designed for server systems where high performance and online file system management are required.


Table 2 shows different file system capacities:

Table 2. File system capacities

  File system                      Max. Capacity    Max. File Size
  SunOS 4.x UFS                    2 GB             2 GB
  Solaris 2.5.1 and earlier UFS    1 TB             2 GB
  Solaris 2.6 and later UFS        1 TB             1 TB
  VxFS                             8,000 TB         8,000 TB

3.1.3 The UNIX file system (UFS)
Sun Microsystems introduced the vnode/vfs framework to enable multiple file system support; the concept was subsequently adopted by SVR4. Under the vnode/vfs framework:
• The system must support multiple file system types, such as UNIX (UFS, S5FS) and DOS (PCFS), simultaneously.
• Different disk slices may contain different types of file systems, but once mounted, they must present a common file system interface.
• File systems should support sharing of files over the network.
• Vendors should be free to create their own file systems and add them to the kernel as modules.
Some UNIX flavors use a straight UFS file system, while others use a vendor- or OS-specific derivative of UFS. Usually, the man page for mkfs will tell you what the default file system for your OS is. The default file system for Solaris is UFS.
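You can confirm the default local file system type on a Solaris machine by looking at /etc/default/fs (this file is listed again in Table 6). The output below is typical, but it is shown here only as an illustration:

# cat /etc/default/fs
LOCAL=ufs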

3.2 Disk slice
The disk slice is an important concept in managing disks and file systems on Solaris. The following sections discuss what a disk slice is.

3.2.1 File system and disk slice
Before any operating system can use a disk, it must be initialized. This low-level format writes the head, track, and sector numbers in a sector preamble and a checksum to every sector on the disk. At the same time, any sectors that are unusable due to flaws in the disk surface are marked as such and, depending on the disk format, an alternate sector might be mapped in place to replace the flawed sector. Most disks today come pre-formatted with a low-level format.

Files stored on a disk are contained in file systems. Each file system on a disk is assigned to a disk slice, which is a group of cylinders set aside for use by that file system. Each disk slice appears to the operating system (and to the system administrator) as though it were a separate device. A disk drive in Solaris is usually divided into disk slices (sometimes referred to as partitions) using the high-level format command. Once this is done, a file system can be laid on each of these disk slices using a command such as mkfs or newfs.

When you divide a disk drive into disk slices, you must be very cautious, because disk slices on a single disk can overlap, so a given range of the disk may be contained in more than one disk slice. If two (or more) overlapping slices are used simultaneously, no warning is issued by the operating system, but data corruption will occur. If one or both of the slices are used as file systems, the machine will likely panic and crash. See 3.3.6.4, "Format utility" on page 84 for detailed information about formatting disks. Certain interfaces, such as the format utility, refer to slices as partitions.

When setting up slices, remember these rules:
• Each disk slice holds only one file system.
• No file systems can span multiple slices without a volume manager.

3.2.2 Logical disk device names in Solaris
Logical disk device names are used to access disk devices (or disk slices) when you:
• Add a new disk to the system.
• Access (or mount) a file system residing on a local disk.
• Back up a local file system.
Many administration commands take arguments that refer to a disk slice or file system. When you refer to a disk device, specify the subdirectory to which it is symbolically linked (either /dev/dsk or /dev/rdsk), followed by a string identifying the particular controller, disk, and slice.


/dev/[r]dsk/cWtXdYsZ, where /dev/[r]dsk is the disk subdirectory and cW tX dY sZ identify the controller, target, drive, and slice number.

Figure 29. String identifying the particular controller, disk, and slice.

• cW stands for controller number W. This refers to the logical controller number of the device interface. For instance, a system with one SCSI interface would use c0.
• tX stands for the target number. This is the SCSI target ID (or SCSI address) of a disk connected to the controller.
• dY stands for the drive or unit number of the device connected to target tX, which is, in turn, connected to bus controller cW.
• sZ stands for the slice or partition number of the device you are addressing.

3.2.2.1 Specifying the disk subdirectory
Disk and file administration commands require the use of either a raw (or character) device interface or a block device interface. The distinction is made by how data is read from the device. Block device interfaces include a buffer from which large blocks of data are read at once. The raw device interface does not use buffers to transfer data. Different commands require different interfaces:
• When a command requires the raw device interface, specify the /dev/rdsk subdirectory. (The r in rdsk stands for raw.)
• When a command requires the block device interface, specify the /dev/dsk subdirectory.
• When you're not sure whether a command requires use of /dev/dsk or /dev/rdsk, check the man page for that command.
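The logical names under /dev/dsk and /dev/rdsk are symbolic links into the physical device tree under /devices. You can inspect the mapping with ls -l; the physical path shown below is purely illustrative and will differ from machine to machine.

# ls -l /dev/dsk/c0t3d0s0
lrwxrwxrwx   1 root  root   46 Jun 23  1999 /dev/dsk/c0t3d0s0 -> ../../devices/sbus@1f,0/esp@e,8800000/sd@3,0:a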


Table 3 shows which interface is required by some commonly used disk and file system commands.

Table 3. Device interface type required by some frequently used commands

  Command   Interface Type   Example of Use
  df        Block            df /dev/dsk/c0t3d0s6
  fsck      Raw              fsck -p /dev/rdsk/c0t0d0s0
  mount     Block            mount /dev/dsk/c1t0d0s7 /export/home/db2
  newfs     Raw              newfs /dev/rdsk/c0t0d1s1
  prtvtoc   Raw              prtvtoc /dev/rdsk/c0t0d0s2

3.2.2.2 Specifying the slice
The string you use to identify a specific slice on a specific disk depends on the controller type, either direct or bus-oriented. Figure 30 shows how disk slices on disks with direct controllers are seen.

cXdYsZ, where cX is the logical controller number, dY the drive number, and sZ the slice number (0 to 7).

Figure 30. Naming convention for disks with direct controllers

Figure 31 shows how disk slices on disks with bus-oriented controllers are seen.


cWtXdYsZ, where cW is the logical controller number, tX the physical bus target number, dY the drive number, and sZ the slice number (0 to 7).

Figure 31. Naming convention for disks with bus-oriented controllers

The naming conventions shown above are the ones on Solaris for SPARC systems. Though the naming conventions are slightly different on Solaris for x86 systems, we are only discussing SPARC systems in this book because DB2 UDB for Solaris works only on SPARC systems.

3.3 Administrative tasks
This section covers, in detail, the file system management tasks under Solaris. We first discuss how to create a file system and how to make it available by mounting it. We then describe how to add a new disk to your system. Finally, we discuss how to back up and restore file systems under Solaris.

3.3.1 File system commands
There are a number of commands designed to operate on file systems, regardless of type.

Table 4. File system commands

  Command   Function
  newfs     Front-end to the mkfs program for making UFS file systems on disk partitions. You must be the super-user to use this command except when creating a UFS file system on a diskette.
  mkfs      Constructs a file system on a disk partition.
  mkfs -m   Displays the characteristics of a file system.
  mount     Makes a file system available for use.

3.3.2 File system management tasks
A file system is a complete directory structure, including a root directory and any subdirectories and files beneath it. File systems are confined to a single logical volume. Some of the important system management tasks have to do with file systems, specifically:
• Allocating space for file systems on logical volumes
• Creating file systems
• Making file system space available to system users
• Monitoring file system space usage
• Backing up file systems to guard against data loss in the event of system failures
• Maintaining file systems in a consistent state

Table 5 shows a list of system management commands that are used regularly for working with file systems.

Table 5. File system management tasks

  Command      Function
  ufsdump      Performs a full or incremental backup of a file system.
  ufsrestore   Restores files from backup media created with the ufsdump command.
  dd           Copies data directly from one device to another for making file system backups.
  df           Reports the amount of space used and free on a file system.
  fsck         Checks file systems and repairs inconsistencies.
  mkfs         Makes a file system of a specified size on a specified disk partition.
  newfs        Front-end to the mkfs program for making UFS file systems on disk partitions.
  mount        Attaches a file system to the system-wide naming structure so that files and directories in that file system can be accessed.
  umount       Removes a file system from the system-wide naming structure, making the files and directories in the file system inaccessible.

3.3.3 Creating a new file system
Make sure you have met the following prerequisites:
• The disk must be formatted and divided into slices before you can create UFS file systems on it. See 3.3.6.4, "Format utility" on page 84, for complete information on formatting disks and dividing disks into slices.
• You need to know the device name of the slice that will contain the file system. See 3.2.2, "Logical disk device names in Solaris" on page 73 for information on finding disks and disk slice numbers.
• If you are re-creating an existing UFS file system, un-mount it.
• You must be super-user.

Create the UFS file system by issuing the newfs command as follows:

newfs [-N] [-b size] [-i bytes] /dev/rdsk/device-name

Each command option means:
• -N displays the parameters newfs would pass to mkfs without actually creating the file system. This is a good way to test the newfs command.
• -b size specifies the file system block size. The default is 8192 bytes.
• -i bytes specifies the number of bytes per inode. The default is 2048 bytes.
• device-name specifies the disk device name on which to create the new file system.

NOTE: Be sure you have specified the correct device name for the slice before performing the next step. If you specify the wrong slice, you will erase its contents when the new file system is created.

To verify the creation of the UFS file system, check the new file system with the fsck command as follows:

fsck /dev/rdsk/device-name

device-name specifies the name of the disk device containing the new file system. The fsck command checks the consistency of the new file system, reports problems it finds, and prompts you before repairing the problems.
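For example, a dry run with the -N option prints the parameters that would be used without writing anything to the slice; the device name and the reported geometry below are illustrative only, and the output is truncated.

# newfs -N /dev/rdsk/c0t3d0s7
/dev/rdsk/c0t3d0s7:  163944 sectors in 506 cylinders of 9 tracks, 36 sectors
        83.9MB in 32 cyl groups (16 c/g, 2.65MB/g, 1216 i/g)
super-block backups (for fsck -b #) at:
 32, 5264, 10496, 15728, 20960, 26192, 31424, ...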

3.3.4 Example: Creating a UFS file system
The following example creates a UFS file system on /dev/rdsk/c0t3d0s7.


# newfs /dev/rdsk/c0t3d0s7
newfs: construct a new file system /dev/rdsk/c0t3d0s7: (y/n)? y
/dev/rdsk/c0t3d0s7:  163944 sectors in 506 cylinders of 9 tracks, 36 sectors
        83.9MB in 32 cyl groups (16 c/g, 2.65MB/g, 1216 i/g)
super-block backups (for fsck -b #) at:
 32, 5264, 10496, 15728, 20960, 26192, 31424, 36656, 41888, 47120,
 52352, 57584, 62816, 68048, 73280, 78512, 82976, 88208, 93440, 98672,
 103904, 109136, 114368, 119600, 124832, 130064, 135296, 140528, 145760,
 150992, 156224, 161456

3.3.5 Mounting and un-mounting a file system
Before you can access the files on a file system, you need to mount the file system. Mounting a file system attaches that file system to a directory (mount point) and makes it available to the system. The root (/) file system is always mounted. Any other file system can be connected to or disconnected from the root (/) file system.

When you mount a file system, any files or directories in the mount point directory are unavailable as long as the file system is mounted. These files are not permanently affected by the mounting process, and they become available again when the file system is unmounted. You can find out which file systems are currently mounted by issuing the following command:

# mount
/ on /dev/dsk/c4t0d0s0 read/write/setuid/largefiles on Wed Jun 23 19:28:39 1999
/proc on /proc read/write/setuid on Wed Jun 23 19:28:39 1999
/dev/fd on fd read/write/setuid on Wed Jun 23 19:28:39 1999
/db2data on /dev/dsk/c2t5d0s2 setuid/read/write/largefiles on Wed Jun 23 19:28:40 1999
/tmp on swap read/write on Wed Jun 23 19:28:40 1999
/export/home on /dev/dsk/c4t11d0s6 largefiles/setuid/read/write on Thu Aug 5 10:35:07 1999


Whenever you mount or un-mount a file system, the /etc/mnttab (mount table) file is modified with the list of currently mounted file systems. You can display the contents of the mount table using the cat or more commands, but you cannot edit it. Here is an example of a /etc/mnttab file:

# more /etc/mnttab
/dev/dsk/c4t0d0s0      /              ufs     rw,suid,dev=8002d0,largefiles                 930184119
/proc                  /proc          proc    rw,suid,dev=2f80000                           930184119
fd                     /dev/fd        fd      rw,suid,dev=3040000                           930184119
/dev/dsk/c2t5d0s2      /db2data       ufs     suid,rw,largefiles,dev=27c0402                930184120
swap                   /tmp           tmpfs   dev=1                                         930184120
-hosts                 /net           autofs  ignore,indirect,nosuid,nobrowse,dev=31c0001   930184140
auto_home              /home          autofs  ignore,indirect,nobrowse,dev=31c0002          930184140
-xfn                   /xfn           autofs  ignore,indirect,dev=31c0003                   930184140
ranger01:vold(pid442)  /vol           nfs     ignore,noquota,dev=3180001                    930184159
/dev/dsk/c4t11d0s6     /export/home   ufs     largefiles,suid,rw,dev=800326                 933867307

3.3.5.1 The /etc/vfstab file
The virtual file system table (the /etc/vfstab file) maintains a list of file systems and how to mount them. The /etc/vfstab file provides two important features: You can specify file systems to mount automatically when the system boots, and you can mount file systems by using only the mount point name, because the /etc/vfstab file contains the mapping between the mount point and the actual device slice name. The fields in this table are:
• device-to-mount. The block special device for a local file system, or the server:/dir designation for a remote one.
• device-to-fsck. The raw special device to be used by fsck.
• mount-point. The mount point for the file system.
• FS-type. The file system type, for example, ufs, nfs, pcfs, s5fs.
• fsck-pass. Specifies whether the file systems are checked sequentially or in parallel.
• mount-at-boot. Specifies whether the file system should be automatically mounted at boot.
• mount-options. The list of comma-separated options used by mount (no spaces).

A /etc/vfstab file might look something like the following. Each field must contain an entry; so, where no option is called for, a hyphen (-) is used.

#device                device               mount         FS     fsck   mount    mount
#to mount              to fsck              point         type   pass   at boot  options
/proc                  -                    /proc         proc   -      no       -
fd                     -                    /dev/fd       fd     -      no       -
swap                   -                    /tmp          tmpfs  -      yes      -
/dev/dsk/c0t3d0s0      /dev/rdsk/c0t3d0s0   /             ufs    1      no       -
/dev/dsk/c0t3d0s6      /dev/rdsk/c0t3d0s6   /usr          ufs    2      no       -
/dev/dsk/c0t3d0s5      /dev/rdsk/c0t3d0s5   /opt          ufs    3      yes      -
/dev/dsk/c0t3d0s1      -                    -             swap   -      no       -
/acs/nyssa/0/swapfile  -                    -             swap   -      no       -
/dev/dsk/c0t6d0s0      -                    /cdrom        ufs    -      no       ro
tardis:/home/tardis    -                    /home/tardis  nfs    -      yes      hard,intr,bg

The list of mounted file systems is kept in the /etc/mnttab file.

3.3.5.2 Mount a file system listed in the /etc/vfstab file
To mount a file system, you need to do the following:
1. Become super-user.
2. Make sure there is a mount point on the local system for the file system. A mount point is a directory to which the mounted file system is attached.
3. Mount the file system listed in the /etc/vfstab file:

mount mount-point

mount-point specifies an entry in the mount point or device-to-mount field in the /etc/vfstab file. It is usually easier to specify the mount point.
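For instance, with the sample /etc/vfstab shown earlier, the /opt file system could be mounted by naming only its mount point (a sketch; the corresponding entry must exist in your /etc/vfstab):

# mount /opt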

3.3.5.3 Mount a UFS file system
To mount a UFS file system that is not in the /etc/vfstab file, you need to do the following:
1. Become super-user.
2. Make sure there is a mount point on the local system for the file system. A mount point is a directory to which the mounted file system is attached.
3. Mount the UFS file system by using the mount command.


mount [-o mount-options] /dev/dsk/device-name mount-point

Each command option means:
• -o mount-options specifies mount options that you can use to mount a UFS file system.
• /dev/dsk/device-name specifies the disk device name for the slice holding the file system (for example, /dev/dsk/c0t3d0s7).
• mount-point specifies the directory on which to mount the file system.

3.3.5.4 UFS direct input/output
In order to provide near-device performance, many file systems have the option to bypass the file system cache using a mechanism known as direct I/O. This reduces the overhead of managing cache allocation and completely removes all interaction between the file system and the memory system. Often the resulting performance can be many times worse, because there is no cache to buffer reads and writes; but when caching is done in the application, direct I/O can be a benefit. Another important use of direct I/O involves backups, when we don't want to read a file into the cache during a backup.

Applications such as DB2 UDB do their own caching, and direct I/O offers a mechanism to avoid the double caching that would occur if they were to use a regular file system. Without direct I/O, an application reads a file block into the Solaris file system cache and then reads it into the buffer pool of the database, so the block exists in two places. With direct I/O, the block is read directly into the database cache without the need to pass through the regular file system cache.

To enable forced direct I/O on a UFS file system:
1. Become super-user.
2. Mount the file system with the forcedirectio mount option:

# mount -F ufs -o forcedirectio /dev/dsk/c0t3d0s7 /export/home

3. Verify that the mounted file system has forced direct I/O enabled:

# mount
/export/home on /dev/dsk/c0t3d0s7 forcedirectio/setuid/read/write/largefiles on Mon May 12 13:47:55 1997
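To make forced direct I/O persistent across reboots, the forcedirectio option can also be placed in the mount-options field of the /etc/vfstab entry for that file system. The line below is a sketch using the same example slice:

/dev/dsk/c0t3d0s7   /dev/rdsk/c0t3d0s7   /export/home   ufs   2   yes   forcedirectio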

3.3.5.5 Un-mount a file system
Use the umount command to un-mount a file system (except / or /usr):


umount mount-point

mount-point is the name of the file system that you want to un-mount. It can be the directory name where the file system is mounted, the device name path of the file system, or the resource for an NFS file system.

The following example un-mounts a local home file system: umount /export/home

The following example un-mounts the file system on slice 7: umount /dev/dsk/c0t0d0s7

3.3.6 Adding disks under Solaris
Adding a new disk to your system under Solaris involves a number of steps:
• Connecting the disk
• Creating the device files required to access the disk
• Low-level formatting of the disk
• Partitioning the disk
• Making a new file system on each partition
• Checking the integrity of the new file systems
Once the disk has been physically installed, the system should recognize a new device on the SCSI bus.

3.3.6.1 Creating device files
In order to access a disk device, the proper device nodes must exist in the /dev directory. These were already created during the installation of the Solaris operating system.

3.3.6.2 Low-level formatting
Low-level formatting prepares a disk for use and maps out bad blocks (defects) on the drive. The actual process varies from drive to drive. Most SCSI disks come pre-formatted, so doing a low-level format yourself is not usually necessary.

3.3.6.3 Partitioning
While it is possible to use a disk drive as one large file system, you can split it into slices. Partitioning is the process of splitting a disk up into several smaller sections, or partitions (disk slices). Each partition is treated as an independent file system. This increases disk efficiency and organization by localizing data, makes it easier to back up sections of the file system, and helps to keep damage on one partition from affecting the entire drive.

3.3.6.4 Format utility
The format utility is used to format, partition, and label disks. It is menu driven. The raw disk device is given as an argument; if no argument is given, format prints a list of available disks and asks the user to pick one. Because Sun disks come formatted with a standard volume table of contents (VTOC) on them, it is not usually necessary to modify or place a VTOC on the disk. If you do need to modify the VTOC on the disk, you can perform the disk partitioning with the format utility.

The format utility should be used very cautiously. Take all precautions to ensure that disk slices do not overlap. Overlapping disk slices can result in loss of data and file systems, and can even require that you re-install your operating system.

Before preparing your disk space for DB2 usage, you will want to map out what is already being used by your file systems, swap space, DB2, and other applications. You can look in your /etc/vfstab file to get the location of some of these items. You will also want to verify that you are aware of all of the DB2 raw devices currently in use on your system. Once you are aware of what is stored, and where, on your disks, you are ready to begin.
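A quick way to take that inventory is to list the mounted file systems, the configured swap areas, and the boot-time mount table before touching any partition table. These are standard Solaris commands; the notes in parentheses indicate what each one tells you.

# df -k             (mounted file systems and the slices they occupy)
# swap -l           (slices and files currently used as swap space)
# cat /etc/vfstab   (file systems configured to mount at boot)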

# format /dev/rdsk/c2t5d0s2
selecting /dev/rdsk/c2t5d0s2
[disk formatted]
Warning: Current Disk has mounted partitions.

FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format>

Typing format at the prompt will perform a low-level format on the disk. This is usually not necessary with a new disk, since disks generally come pre-formatted, but it may help to map out any additional defects the drive may have developed. The next step is to partition the drive. Type partition at the prompt to switch to the partition menu:

format> partition

PARTITION MENU:
        0      - change `0' partition
        1      - change `1' partition
        2      - change `2' partition
        3      - change `3' partition
        4      - change `4' partition
        5      - change `5' partition
        6      - change `6' partition
        7      - change `7' partition
        select - select a predefined table
        modify - modify a predefined partition table
        name   - name the current table
        print  - display the current table
        label  - write partition map and label to the disk
        !<cmd> - execute <cmd>, then return
        quit

Type in print to get a listing of the current partition table. Note that the second partition represents the entire disk:

partition> print
Current partition table (original):
Total disk cylinders available: 60319 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       0 -    63      128.00MB    (64/0/0)        262144
  1       swap    wu      64 -   127      128.00MB    (64/0/0)        262144
  2     backup    wu       0 - 60318      117.81GB    (60319/0/0)  247066624
  3 unassigned    wm       0                0         (0/0/0)              0
  4 unassigned    wm       0                0         (0/0/0)              0
  5 unassigned    wm       0                0         (0/0/0)              0
  6        usr    wm     128 - 60318      117.56GB    (60191/0/0)  246542336
  7 unassigned    wm       0                0         (0/0/0)              0

In our example, we will be splitting the disk up into two equal partitions, numbers 3 and 4. The first partition will span cylinders 0 through 1680; the second will span cylinders 1681 through 3360. The partition size can be specified in blocks, cylinders, or megabytes by using the b, c, and mb suffixes when entering the size.

partition> 3
Part      Tag    Flag     Cylinders        Size            Blocks
  3 unassigned    wm       0               0         (0/0/0)       0

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 0
Enter partition size[0b, 0c, 0.00mb]: 1680c
partition> 4
Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 1681
Enter partition size[0b, 0c, 0.00mb]: 1680c

Once the disk has been partitioned, the label should be written to the disk:

partition> label
Ready to label disk, continue? y

The new partition table can be printed from the format utility or may be viewed using the prtvtoc command:


# prtvtoc /dev/rdsk/c2t5d0s2
* /dev/rdsk/c2t5d0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*     140 sectors/track
*       5 tracks/cylinder
*     700 sectors/cylinder
*    3363 cylinders
*    3361 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*      1176000       700   1176699
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       2      5    01          0    2352700   2352699
       3      0    00          0    1176000   1175999
       4      0    00    1176700    1176000   2352699

3.3.6.5 Creating new file systems
Finally, new file systems can be created on the disk using the newfs command:

# newfs /dev/rdsk/c0t5d0s3
newfs: construct a new file system /dev/rdsk/c0t5d0s3: (y/n)? y
/dev/rdsk/c0t5d0s3:  1176000 sectors in 1680 cylinders of 5 tracks, 140 sectors
        574.2MB in 105 cyl groups (16 c/g, 5.47MB/g, 2624 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 11376, 22720, 34064, 45408, 56752, 68096, 79440, 89632, 100976,
 112320, 123664, 135008, 146352, 157696, 169040, 179232, 190576, 201920,
 213264, 224608, 235952, 247296, 258640, 268832, 280176, 291520, 302864,
 314208, 325552, 336896, 348240, 358432, 369776, 381120, 392464, 403808,
 415152, 426496, 437840, 448032, 459376, 470720, 482064, 493408, 504752,
 516096, 527440, 537632, 548976, 560320, 571664, 583008, 594352, 605696,
 617040, 627232, 638576, 649920, 661264, 672608, 683952, 695296, 706640,
 716832, 728176, 739520, 750864, 762208, 773552, 784896, 796240, 806432,
 817776, 829120, 840464, 851808, 863152, 874496, 885840, 896032, 907376,
 918720, 930064, 941408, 952752, 964096, 975440, 985632, 996976, 1008320,
 1019664, 1031008, 1042352, 1053696, 1065040, 1075232, 1086576, 1097920,
 1109264, 1120608, 1131952, 1143296, 1154640, 1164832,

One very useful capability in the Solaris operating environment is managing large numbers of disks as a single volume. Using volume management software, such as Sun's Solstice DiskSuite or Veritas VxVM, you can create software RAID arrays as well as disk mirrors and stripes, and even add disk volumes to existing logical file systems. These products are discussed later in this chapter.

3.3.7 Back up and restore a file system
This section provides instructions for backing up and restoring data in the Solaris environment. We look, in particular, at the ufsdump and ufsrestore commands.

3.3.7.1 Back up a file system
The ufsdump program can be used to back up a complete file system. There are 10 dump levels, 0 through 9. Level 0 is a full dump, while levels 1-9 are incremental dumps; the lower the number, the more complete the dump. A level 1 dump includes everything changed since the last level 0 dump. A level 9 dump includes only those files changed since the last lower-numbered dump.

The manuals sometimes recommend elaborate dump sequences involving every possible level through different days of the week, with a monthly cycle, to minimize tape usage. However, this makes it nearly impossible to figure out what you need to do to restore a particular file. Pick a simple schedule that's easy to follow and stick to it.

For example, if you want to use the ufsdump program to make a full dump of the root file system on c0t3d0, on a 150 MB cartridge tape unit 0, you can use a command similar to the following:

/usr/sbin/ufsdump 0ufsdb /dev/rmt/0 /dev/rdsk/c0t3d0s0

where /dev/rmt/0 is the default tape drive, and the characters in 0ufsdb call for:
• 0 - Full dump; dump level (0-9).
• u - Update the record for dumps, /etc/dumpdates.
• f - Dump file, for example, /dev/nrst8, where nrst indicates no rewind.
• s - Size of the tape volume you're dumping to, for example, 6000 ft.
• d - Tape density, for example, 54000 bpi for 8mm tape.
• b - Tape block size, for example, 126.
• v - Verify.

The ufsdump command keeps records in /etc/dumpdates as shown below:


#file system           level  date
/dev/rdsk/c0t0d0s3       0    Tue Apr 27 17:48:23 1999
/dev/rdsk/c0t0d0s0       0    Tue Apr 27 17:51:21 1999

To make and verify an incremental dump at level 5 of the usr partition of c0t3d0, on a 1/2 inch reel tape unit 1, use:

ufsdump 5fuv /dev/rmt/1 /dev/rdsk/c0t3d0s6
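As an example of a simple schedule, the following root crontab entries would take a full (level 0) dump early on Sunday morning and level 5 incrementals on weekday mornings. The tape device and file system are placeholders that you would adapt to your own site.

# crontab entries for root (illustrative only)
0 2 * * 0    /usr/sbin/ufsdump 0uf /dev/rmt/0 /dev/rdsk/c0t3d0s0
0 2 * * 1-5  /usr/sbin/ufsdump 5uf /dev/rmt/0 /dev/rdsk/c0t3d0s0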

3.3.7.2 Restore a file system
You can restore entire file systems, or you can interactively restore individual files, with the restore command, ufsrestore. The ufsrestore utility restores files from backup media created with the ufsdump command. This program restores files relative to your current directory. On a full restore, it places a file, restoresymtable, in the current directory that is used to pass information to a further instance of restore when restoring incremental dumps. This file can be safely removed only after all of the incremental dumps have been restored.

To do a complete restore of a damaged file system, for example, /dev/dsk/c0t0d1s2, you might use the following steps:
1. Clear and re-create the file system: newfs /dev/rdsk/c0t0d1s2

2. Mount the file system temporarily: mount /dev/dsk/c0t0d1s2 /mnt

3. Move to the new file system: cd /mnt

4. Restore a level 0 dump of the file system: /usr/sbin/ufsrestore -r

Later, incremental dumps can then be restored. 5. Un-mount the file system: umount /mnt

6. Check the file system for consistency: fsck /dev/rdsk/c0t0d1s2

7. Mount the file system: mount /dev/dsk/c0t0d1s2 /export/home


Restore can also be run interactively, and you can specify the device. The following is an example: /usr/sbin/ufsrestore -if /dev/rst9

It uses /dev/rst9 instead of /dev/rmt/0 as the file to restore from. The ufsrestore utility first re-creates the file system in memory so that you can use some UNIX commands, for example, ls, cd, and pwd, to move around the file system. You can then add entries to a table of files to extract from the tape.

A special case is restoration of the root file system. For this, you need to boot from tape, from CD-ROM, or over the network from another machine. After restoring the file system, you also need to re-install the boot block program, bootblk. The boot(1M) program, ufsboot, is loaded from disk by the boot block program that resides in the boot area of a disk partition. The ufs boot objects are platform-dependent and reside in the /usr/platform/platform-name/lib/fs/ufs directory. The platform name can be found using the -i option of the uname command.

To install a ufs boot block on slice 0 of target 0 on controller 1 of the platform where the command is being run, use:

/usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0

As you can see, the syntax is dependent on both the hardware platform and the software version; so, read the man page before using installboot.

If dump_file is specified as '-', ufsrestore reads from the standard input. This allows ufsdump and ufsrestore to be used in a pipeline to copy a file system, for example:

ufsdump 0f - /dev/rdsk/c0t0d0s7 | (cd /home; ufsrestore xf -)

3.4 NFS under Solaris
When using DB2 UDB in a cluster environment, it is important to have some knowledge of NFS as it relates to the Solaris operating system. This section provides information on how to perform NFS administration tasks, such as setting up NFS services, adding new file systems to share, and mounting file systems.


3.4.1 Sharing file systems
Servers provide access to their file systems by sharing them over the NFS environment. The objects that can be shared include any directory tree, but each file system hierarchy is limited by the disk slice or partition that the file system is located on. For instance, sharing the root (/) file system would not also share /usr unless they are on the same disk partition or slice. A normal installation places root on slice 0 and /usr on slice 6. Also, sharing /usr would not share any other local disk partitions that are mounted on subdirectories of /usr.

You specify which file systems are to be shared with the share command and/or the /etc/dfs/dfstab file. Entries in the /etc/dfs/dfstab file are shared automatically whenever you start NFS server operation. You should set up automatic sharing if you need to share the same set of file systems on a regular basis. Most file system sharing should be done automatically; the only time manual sharing should occur is during testing or troubleshooting.

3.4.1.1 Share command
With this command, you can make a local file system on an NFS server available for mounting. You can also use the share command to display a list of the file systems on your system that are currently shared. The NFS server must be running for the share command to work. The NFS server software is started automatically during boot if there is an entry in the /etc/dfs/dfstab file. The command does not report an error if the NFS server software is not running, so you must check this yourself.

For example, if you want the /export/home pathname to be shared read/write only with the listed clients solaris3 and fusion-en, so that no other systems can access the /export/home pathname, issue the following command:

share -F nfs -o rw=solaris3:fusion-en -d "homes" /export/home

To display a list of the file systems on your system that are currently shared, execute the share command with no arguments.

3.4.1.2 Set up automatic sharing
The /etc/dfs/dfstab file lists all the file systems that your server shares with its clients and controls which clients can mount a file system. If you want to modify dfstab to add or delete a file system, or to modify the way sharing is done, edit the file with any supported text editor (such as vi). The next time the computer enters run level 3 (multi-user with NFS resources shared), the system reads the updated dfstab to determine which file systems should be shared automatically.


Each line in the dfstab file consists of a share command, which is the command you type at the command-line prompt to share the file system. The share command is located in /usr/sbin.

Perform the following steps to set up automatic sharing:
1. Edit the /etc/dfs/dfstab file. Add one entry to the /etc/dfs/dfstab file for each file system that you want to be automatically shared. Each entry must be on a line by itself in the file and use this syntax:

share [-F nfs] [-o specific-options] [-d description] pathname

2. Check that the NFS service is running on the server. You can use the following command: ps -ef | grep nfs

If this is the first share command or set of share commands that you have initiated, it is likely that the NFS daemons are not running. The following commands stop the daemons and restart them:

/etc/init.d/nfs.server stop
/etc/init.d/nfs.server start

This ensures that the NFS service is now running on the server and will restart automatically when the server is at run level 3 during boot.
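Putting it together, a dfstab entry for the earlier /export/home example and a quick check that it is actually being shared might look like the following; the host names and options are the same illustrative values used above, and the exact share output format can vary slightly between releases.

# cat /etc/dfs/dfstab
share -F nfs -o rw=solaris3:fusion-en -d "homes" /export/home

# share
-               /export/home   rw=solaris3:fusion-en   "homes"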

3.4.2 Mounting file systems
You can mount file systems in several ways. They can be mounted automatically when the system is booted, on demand from the command line, or through the automounter. The automounter provides many advantages over mounting at boot time or mounting from the command line, but many situations require a combination of all three. Figure 32 gives some guidelines on which method to use for mounting file systems, depending on how the file systems will be used.


If you need to mount local or remote file systems infrequently, enter the mount command manually from the command line.

If you need to mount local file systems frequently, use the /etc/vfstab file, which will mount the file system automatically when the system is booted.

If you need to mount remote file systems frequently, such as home directories, use the /etc/vfstab file, which will automatically mount the file system when the system is booted in multi-user state, or use AutoFS, which will automatically mount or unmount the file system when you change into (mount) or out of (unmount) the directory.

Figure 32. Guidelines on how to mount a file system

3.4.2.1 Mount file systems at boot time
If you want to mount file systems at boot time instead of using the automounter, edit the /etc/vfstab file. Entries in the /etc/vfstab file have the following syntax:

special fsckdev mountp fstype fsckpass mount-at-boot mntopts

See the following example:

sunfire-en:/db2    -    /db2    nfs    -    yes    hard,rw,intr,bg

This is an example of an entry in the /etc/vfstab file. In this example, you want a client computer to mount the /db2 directory on the server sunfire-en at the mount point /db2 with read-write access.

3.4.2.2 Mount file systems using the mount command
To mount a file system manually during normal operation, run the mount command as super-user:

mount -F nfs -o rw sunfire-en:/db2 /db2


In this case, /db2 from the server sunfire-en is mounted on /db2 of the local system, and it can be read and written to. Mounting from the command line allows for temporary viewing of the file system. You can un-mount the file system with the umount command or by rebooting the local host.

3.4.2.3 Mount file systems using the automounter
File systems shared through the NFS service can be mounted using automatic mounting. Autofs, a client-side service, is a file system structure that provides automatic mounting. The autofs file system is initialized by the automount daemon (automountd), which is run automatically when a system is booted. The automountd daemon runs continuously, mounting and un-mounting remote directories on an as-needed basis.

When a client attempts to access a file system that is not presently mounted, the autofs file system intercepts the request and calls automountd to mount the requested directory. The automountd daemon locates the directory, mounts it within autofs, and replies. On receiving the reply, autofs allows the waiting request to proceed. Subsequent references to the mount are redirected by autofs until the file system is automatically unmounted by autofs after a period of inactivity.

The components that work together to accomplish automatic mounting are:
• The automount command
• The autofs file system
• The automountd daemon

The automount command, called at system startup time, reads the master map file auto_master to create the initial set of autofs mounts. These autofs mounts are not automatically mounted at startup time. They are points under which file systems are mounted in the future. These points are also known as trigger nodes.

After the autofs mounts are set up, they can trigger file systems to be mounted under them. For example, when autofs receives a request to access a file system that is not currently mounted, autofs calls automountd, which actually mounts the requested file system. Since Solaris 2.5, the automountd daemon is completely independent from the automount command. Because of this separation, it is possible to add, delete, or change map information without first having to stop and start the automountd daemon process.

After initially mounting autofs mounts, the automount command is used to update autofs mounts as necessary by comparing the list of mounts in the auto_master map with the list of mounted file systems in the mount table file /etc/mnttab (formerly /etc/mtab) and making the appropriate changes. This allows system administrators to change mount information within auto_master and have those changes used by the autofs processes without having to stop and restart the autofs daemon. After the file system is mounted, further access does not require any action from automountd until the file system is automatically unmounted.

Unlike mount, automount does not read the /etc/vfstab file (which is specific to each computer) for a list of file systems to mount. The automount command is controlled within a domain and on computers through the name space or local files.
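As a brief illustration of the maps involved, an auto_master entry and a matching auto_home entry might look like the following. The map names are the Solaris defaults, but the server and path are placeholders.

# /etc/auto_master (excerpt)
/home      auto_home     -nobrowse

# /etc/auto_home (excerpt)
*          sunfire-en:/export/home/&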

3.4.3 Un-mounting file systems
You can use the umount command to remove a remote file system that is currently mounted. The umount command supports the -V option to allow for testing. You might also use the -a option to un-mount several file systems at one time. If mount points are included with the -a option, those file systems are un-mounted. If no mount points are included, an attempt is made to un-mount all file systems listed in /etc/mnttab except for the required file systems, such as /, /usr, /var, /proc, /dev/fd, and /tmp. Because the file system is already mounted and should have an entry in /etc/mnttab, you do not need to include a flag for the file system type.

The command cannot succeed if the file system is in use. For instance, if a user has used the cd command to get access to a file system, the file system is busy until the working directory is changed. The umount command can hang temporarily if the NFS server is unreachable.

To un-mount a file system mounted on /usr/man, execute the following command:

umount /usr/man

3.4.4 NFS files
There are several ASCII files that support NFS activities. Table 6 lists these files and their functions. Used without arguments, mount displays a list of the file systems that are currently mounted.


Table 6. NFS ASCII files

  File Name            Function
  /etc/mnttab          Lists file systems that are currently mounted, including automounted directories (see the mnttab man page); do not edit this file.
  /etc/netconfig       Lists the transport protocols; do not edit this file.
  /etc/nfssec.conf     Lists NFS security services; do not edit this file.
  /etc/rmtab           Lists file systems remotely mounted by NFS clients (see the rmtab man page); do not edit this file.
  /etc/vfstab          Defines file systems to be mounted locally (see the vfstab man page).
  /etc/default/fs      Lists the default file system type for local file systems.
  /etc/dfs/dfstab      Lists the local resources to be shared.
  /etc/dfs/fstypes     Lists the default file system types for remote file systems.
  /etc/dfs/sharetab    Lists the resources (local and remote) that are shared (see the sharetab man page); do not edit this file.

The first entry in /etc/dfs/fstypes is often used as the default file system type for remote file systems; this entry defines the NFS file system type as the default. Only one entry is in /etc/default/fs: the default file system type for local disks. You can determine the file system types that are supported on a client or server by checking the files in /kernel/fs.
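For example, listing /kernel/fs shows the file system modules installed on a particular machine. The exact set varies by Solaris release and installed packages, so the output below is only indicative.

# ls /kernel/fs
autofs   cachefs  fifofs   hsfs    lofs
namefs   nfs      procfs   sockfs  specfs
tmpfs    ufs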

Note

Be aware that using NFS file systems for DB2 data or logs is not supported. The only use of NFS with DB2 UDB is for the instance home directory in a DB2 UDB EEE configuration, which is explained in Chapter 8, "DB2 UDB Enterprise-Extended Edition" on page 279.


3.5 Disk management
Disk management is a critical factor for any DBMS. It is very important to take full advantage of the disks' capabilities (striping, mirroring, and so on) when storing your data, depending on the goal you are pursuing, such as performance or reliability.

3.5.1 Logical volumes
Many UNIX operating systems (such as AIX, Digital UNIX, HP-UX, IRIX, and Solaris) provide optional tools (usually called logical volume managers) that enable you to combine a number of partitions in various ways to create a single, large, logical partition or logical volume. Using logical volumes, it is possible to:
• Extend an existing file system on the fly without the need to repartition the disk.
• Relieve severe disk I/O bottlenecks by distributing reads and writes across multiple disk controllers.
• Create file systems and individual files that are larger than a single physical disk.
• Mirror disks for redundancy.
For Solaris, you can use the following volume manager products:
• Sun StorEdge (Veritas) Volume Manager
• Solstice DiskSuite

3.5.2 Sun StorEdge (Veritas) Volume Manager
Sun StorEdge Volume Manager is a software-based RAID tool for managing disk volumes. It is bundled as a free product with selected Sun RAID hardware products. Sun StorEdge Volume Manager provides easy-to-use, online disk storage management for commercial computing environments.

Traditional disk storage management is a labor-intensive process, often requiring that machines be taken off-line, which is a major inconvenience to users. Once the system is off-line, the system administrator is faced with the tedious process of backing up existing data, manually changing system parameters, and reloading the data. In today's distributed client/server environments, users demand that databases and other resources be available 24 hours a day, be easy to access, and be safe from damage caused by hardware malfunction. Sun StorEdge Volume Manager provides the tools to improve performance and ensure data availability and integrity through disk usage analysis, RAID 0, 1, 0+1, and 5 support, dynamic multi-pathing, hot relocation, and online resizing and backup support. Sun StorEdge Enterprise Volume Manager gives users the flexibility to configure and tune storage systems to meet their performance, availability, and reliability needs. This section gives just a very general overview of Sun StorEdge Enterprise Volume Manager version 2.6; for detailed information, refer to the Sun StorEdge Volume Manager documentation. Sun StorEdge Volume Manager is also available as a Veritas-branded product under the name Veritas Volume Manager.

Sun StorEdge Volume Manager builds virtual devices called volumes on top of physical disks. Volumes are accessed by a UNIX file system, a database, or other applications in the same way that physical disk partitions (disk slices) would be accessed. Volumes are composed of other virtual objects that can be manipulated to change the volume's configuration. Volumes and their virtual components are referred to as Volume Manager objects. Volume Manager objects can be manipulated in a variety of ways to optimize performance, provide redundancy of data, and perform backups or other administrative tasks on one or more physical disks without interrupting applications. As a result, data availability and disk subsystem throughput are improved.

3.5.2.1 Basic concepts
There are several Volume Manager objects that must be understood before you can use the Volume Manager to perform disk management tasks:
• Physical objects
• VM disks
• Disk groups
• Subdisks
• Plexes
• Volumes

Physical objects
Physical objects are physical disks and partitions. A physical disk is the storage device (media) installed in the machine; it does not have to be under the control of the Volume Manager. We will use the abbreviation PD for physical disk. The naming convention is usually c#t#d#, where # can be any integer 0-9. The first letter (c) refers to the controller, the second (t) to the target on the corresponding SCSI bus, and the third (d) defines the disk number. A fourth letter (s) defines the slice (or partition) number (0-7); the s2 partition represents the whole disk. Thus, a partition is a part of a physical disk and is identified to the Volume Manager as c#t#d#s#. You can set up either a whole disk or individual disk partitions to be managed by the Volume Manager. In most cases, it is best, and recommended, that the complete disk be managed by the Volume Manager.
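As a sketch of how a disk is brought under Volume Manager control, the Volume Manager command set includes vxdisk and vxdiskadd. The device names below are examples, and the exact output columns depend on the product version.

# vxdisk list
DEVICE       TYPE      DISK         GROUP        STATUS
c0t0d0s2     sliced    disk01       rootdg       online
c1t0d0s2     sliced    -            -            online

# vxdiskadd c1t0d0     (interactively adds the disk to a disk group)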

VM disk
A VM disk is a contiguous area of disk space from which the Volume Manager allocates storage. When you place a partition from a physical disk under Volume Manager control, a VM disk is assigned to the partition. Each VM disk corresponds to at least one partition. A VM disk is typically composed of a public region (from which storage is allocated) and a private region (where configuration information is stored).

Disk group
A disk group is a collection of VM disks that share a common configuration. A configuration consists of a set of records containing detailed information about existing Volume Manager objects, their attributes, and their relationships. The default disk group is rootdg (the root disk group). Additional disk groups can be created as necessary. Volumes are created within a disk group; a given volume must be configured from disks belonging to the same disk group. Disk groups allow the administrator to group disks into logical collections for administrative convenience. A disk group and its components can be moved as a unit from one host machine to another.

Subdisk
A subdisk is a set of contiguous disk blocks; subdisks are the basic units in which the Volume Manager allocates disk space. A VM disk can be divided into one or more subdisks. Each subdisk represents a specific portion of a VM disk, which is mapped to a specific region of a physical disk. Since the default name for a VM disk is disk## (such as disk01), the default name for a subdisk is disk##-##. So, for example, disk01-01 would be the name of the first subdisk on the VM disk named disk01. Be aware that a VM disk can be given any name; it is the administrator's choice.

Plexes
The Volume Manager uses subdisks to build virtual entities called plexes. A plex consists of one or more subdisks located on one or more disks. The following are examples of how data can be organized on the subdisks that constitute a plex:
• Concatenation
• Striping (RAID-0)
• RAID-5

Concatenation
Concatenation maps data in a linear manner onto one or more subdisks in a plex. If you were to access all the data in a concatenated plex sequentially, you would first access the data in the first subdisk from beginning to end, then access the data in the second subdisk from beginning to end, and so forth, until the end of the last subdisk. The subdisks in a concatenated plex do not have to be physically contiguous and can belong to more than one VM disk. Concatenation using subdisks that reside on more than one VM disk is also called spanning. Concatenation with multiple subdisks is useful when there is insufficient contiguous space for the plex on any one disk. Such concatenation can also be useful for load balancing between disks, and for head movement optimization on a particular disk. Figure 33 shows how data would be spread over two subdisks in a spanned plex.

Figure 33. Example of concatenation

Spanning a plex across multiple disks increases the chance that a disk failure will result in failure of its volume. Use mirroring or RAID-5 (both described later) to substantially reduce the chance that a single disk failure will result in volume failure.
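To create such a concatenated volume spanning two VM disks, vxassist can be used directly. The following is a minimal sketch only; the disk group datadg, the volume name spanvol, and the VM disk names disk01 and disk02 are hypothetical. By default, vxassist concatenates the space it allocates from the listed disks:

vxassist -g datadg make spanvol 4g disk01 disk02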

Volumes
A volume is a virtual disk device that appears to applications, databases, and file systems like a physical disk partition, but does not have the physical limitations of a physical disk partition. A volume consists of one or more plexes, each holding a copy of the data in the volume. Due to its virtual nature, a volume is not restricted to a particular disk or a specific area thereof. The configuration of a volume can be changed, using the Volume Manager interfaces, without causing disruption to applications or file systems that are using the volume. For example, a volume can be mirrored on separate disks or moved to use different disk storage.

A volume can consist of up to 32 plexes, each of which contains one or more subdisks. In order for a volume to be usable, it must have at least one associated plex with at least one associated subdisk. Note that all subdisks within a volume must belong to the same disk group. The Volume Manager uses the default naming conventions of vol## for volumes and vol##-## for plexes in a volume. Administrators are encouraged to select more meaningful names for their volumes.

A volume with one plex is shown in Figure 34. Note that volume vol01 in Figure 34 has the following characteristics:
• It contains one plex named vol01-01.
• The plex contains one subdisk named disk01-01.
• The subdisk, disk01-01, is allocated from VM disk disk01.

A volume with two or more plexes is considered mirrored and contains mirror images of the data. Refer to “RAID 1 (mirroring)” on page 104 for more information on mirrored volumes.


Figure 34. Sun’s Volume Manager storage concepts

3.5.2.2 Accessing volumes for I/O
Once you have created volumes using one of the Volume Manager interfaces, users and applications can access them in the same way that they access any other disk device:
• Block-special files for VM volumes are located in: /dev/vx/dsk/diskgroupname

• Character-special files are located in: /dev/vx/rdsk/diskgroupname

The variable diskgroupname refers to the name of the disk group that contains the volume. Note that volumes in the rootdg disk group are also located in the /dev/vol and /dev/rvol directories. To create a new UNIX file system (UFS) on a VM volume, use the newfs command with a disk type argument that specifies any known disk type.
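As a minimal sketch, creating and mounting a UFS file system on a volume might look like the following. The disk group datadg, the volume vol01, and the mount point /db2data are hypothetical names, and depending on your Solaris and Volume Manager versions, newfs may need additional arguments to describe the device:

newfs /dev/vx/rdsk/datadg/vol01
mount /dev/vx/dsk/datadg/vol01 /db2data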


3.5.2.3 RAID levels
A Redundant Array of Inexpensive Disks (RAID) is a disk array (a group of disks that appear to the system as virtual disks or volumes) that uses part of its combined storage capacity to store duplicate information about the data stored in the array. This duplicate information makes it possible to regenerate the data in the event of a disk failure. This section focuses on the Volume Manager's implementations of RAID. The Volume Manager supports the following levels of RAID:
• RAID-0 (Striping)
• RAID-1 (Mirroring)
• RAID-0 plus RAID-1 (Striping and Mirroring)
• RAID-5

RAID 0 (Striping)
Striping is a technique of mapping data so that the data is interleaved among two or more physical disks. More specifically, a striped plex contains two or more subdisks spread out over two or more physical disks. Data is allocated alternately and evenly to the subdisks of a striped plex. The subdisks are grouped into columns, with each physical disk limited to one column. Each column contains one or more subdisks and can be derived from one or more physical disks. The number and sizes of subdisks per column can vary. Data is allocated in equal-sized units (called stripe units) that are interleaved between the columns. Each stripe unit is a set of contiguous blocks on a disk. The default stripe unit size is 64 kilobytes.

Note

Generally, you should not use the Volume Manager to provide data striping. Striping is useful when you need large amounts of data to be written to or read from the physical disks quickly, using parallel data transfer to multiple disks. However, you can let DB2 UDB perform data striping by defining containers on multiple disks (see Chapter 4, “Creating database objects” on page 113), and striping managed by the Volume Manager can collide with the striping performed by DB2 UDB. That may result in unexpected performance problems. Therefore, use of simple mirroring, which is discussed next, is adequate.


RAID 1 (mirroring)
Mirroring is a technique of using multiple mirrors (plexes) to duplicate the information contained in a volume. In the event of a physical disk failure, the mirror on the failed disk becomes unavailable, but the system continues to operate using the unaffected mirrors. Although a volume can have a single plex, at least two plexes are required to provide redundancy of data. Each of these plexes should contain disk space from different disks in order for the redundancy to be effective. When spanning across a large number of disks, failure of any one of those disks will generally make the entire plex unusable. The chance of one out of several disks failing is sufficient to make it worthwhile to consider mirroring in order to improve the reliability (and availability) of a spanned volume.
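As a hedged sketch, a second plex (mirror) can typically be added to an existing volume with vxassist; the disk group datadg and the volume vol01 below are hypothetical names:

vxassist -g datadg mirror vol01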

RAID 0+1 (striping plus mirroring)
The Volume Manager supports the combination of striping with mirroring. When used together on the same volume, striping plus mirroring offers the benefits of spreading data across multiple disks while providing redundancy of data. For striping and mirroring to be effective together, the striped plex and its mirror must be allocated from separate disks. The layout type of the mirror can be striped.

RAID 5
Although both mirroring (RAID-1) and RAID-5 provide redundancy of data, their approaches differ. Mirroring provides data redundancy by maintaining multiple complete copies of a volume's data. Data being written to a mirrored volume is reflected in all copies, and if a portion of a mirrored volume fails, the system continues to use the other copies of the data. RAID-5 provides data redundancy through the use of parity (a calculated value that can be used to reconstruct data after a failure). While data is being written to a RAID-5 volume, parity is also calculated by performing an exclusive OR (XOR) operation on the data; the resulting parity is then written to the volume. If a portion of a RAID-5 volume fails, the data that was on that portion of the failed volume can be re-created from the remaining data and the parity.

3.5.2.4 Software RAID vs. hardware RAID
Any storage can be used as a RAID device using software RAID. In this case, the operating system and a software RAID package (usually Sun StorEdge Volume Manager or Solstice DiskSuite) perform extra I/O operations to the disks to implement a RAID level, and a synchronous write is only complete once the data has been written to the disks. With hardware RAID, such as the Sun StorEdge A3500, a controller between the system and its disks receives the standard I/O commands from the operating system and issues the appropriate I/O commands to the disks. Hardware RAID controllers also include cache memory to improve performance. The cache is usually battery-backed, making it nonvolatile and allowing the controller to acknowledge a synchronous write once the data is in the cache rather than after it reaches the disks.

3.5.2.5 Basic administration
The Volume Manager provides three interfaces that can be used to manage disks:
• The Visual Administrator graphical user interface
• A set of command-line utilities
• The vxdiskadm menu-based interface

We will now discuss some of the utilities that Sun StorEdge Volume Manager provides.

vxdiskadm -- This provides the Volume Manager Support Operations menu interface. Each entry in the main menu leads you through a particular operation by asking questions and providing you with information. Default answers are provided for many questions so that common answers can be selected easily. This script is intended primarily for users who understand only a limited set of concepts, and for users who want a simple method for performing common operations. The following is the primary menu of the vxdiskadm utility:


# /usr/sbin/vxdiskadm
Volume Manager Support Operations
Menu: VolumeManager/Disk

 1      Add or initialize one or more disks
 2      Encapsulate one or more disks
 3      Remove a disk
 4      Remove a disk for replacement
 5      Replace a failed or removed disk
 6      Mirror volumes on a disk
 7      Move volumes from a disk
 8      Enable access to (import) a disk group
 9      Remove access to (deport) a disk group
 10     Enable (online) a disk device
 11     Disable (offline) a disk device
 12     Mark a disk as a spare for a disk group
 13     Turn off the spare flag on a disk
 list   List disk information

 ?      Display help about menu
 ??     Display help about the menuing system
 q      Exit from menus

Select an operation to perform:

vxdiskadd -- This utility is used to add standard disks to the Volume Manager. vxdiskadd leads you through the process of initializing a new disk by asking questions and displaying information.

vxdisk -- This is the command-line utility for administering disk devices. vxdisk is used to define special disk devices, to initialize information stored on disks that the Volume Manager uses to identify and manage disks, and to perform additional special operations. See the vxdisk manual page for complete information on how to use vxdisk.

vxdg -- This is the command-line utility for operating on disk groups. It can be used to create new disk groups, to add and remove disks from disk groups, and to enable (import) or disable (deport) access to disk groups. See the vxdg manual page for complete information on how to use vxdg.

vxassist -- The vxassist utility is an interface to the Sun StorEdge Volume Manager that finds space for and creates volumes, adds mirrors and logs to existing volumes, extends and shrinks existing volumes, provides for the migration of data from a specified set of disks, and provides facilities for the online backup of existing volumes. A keyword on the vxassist command line selects the action to perform. See the following command:


vxassist make mirvol 500m layout=mirror,log mirror=ctlr !ctlr:c2

This requests that vxassist create a new mirrored volume on any disks that are not on controller 2. The selection of disks is constrained by the mirror=ctlr attribute such that no disks within a mirror can be on the same controller as any disks on the other mirror.

vxprint -- The vxprint utility displays complete or partial information from records in Volume Manager disk group configurations. Records can be selected by name or with special search expressions. Additionally, record association hierarchies can be displayed in an orderly fashion so that the structure of records is more apparent. To display all subdisks and all disk groups, in sorted order by disk, use:

vxprint -AGts
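To tie these utilities together, the following is a minimal, hedged sketch of creating a disk group and a mirrored volume from the command line. The disks c1t0d0 and c1t1d0, the disk group datadg, and the volume vol01 are hypothetical names; it is assumed that the disks have already been initialized for Volume Manager use (for example, with vxdiskadd), and you should check the vxdg and vxassist manual pages for the exact syntax supported by your Volume Manager version:

vxdg init datadg disk01=c1t0d0
vxdg -g datadg adddisk disk02=c1t1d0
vxassist -g datadg make vol01 2g layout=mirror
vxprint -g datadg -ht

The first two commands create the disk group and add a second disk, the third creates a 2 GB mirrored volume within it, and the last displays the resulting volume, plex, and subdisk records.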

3.5.2.6 The Visual Administrator
The Visual Administrator's primary function is to provide a graphical user interface to the Volume Manager. In addition, the Visual Administrator acts as an interface to several common file system operations; some of these file system operations are supported for the Veritas file system (VxFS) only. The Visual Administrator represents the various Volume Manager objects through icons. When a change is made to existing Volume Manager objects using the Visual Administrator (or another Volume Manager interface), every open Visual Administrator session on the same system automatically adjusts its icons to reflect the change. Volumes are usually composed of plexes (mirrors), which are composed of subdisks. Volume icons, therefore, typically contain associated plex icons, which, in turn, contain associated subdisk icons. While both volume and plex icons can appear alone with no associated components, they are not useful in this form. You can invoke the Visual Administrator by issuing the following command:

/opt/SUNWvxva/bin/vxva

The Visual Administrator can also be invoked in tutorial mode by issuing the following command: /opt/SUNWvxva/bin/vxva -t

In tutorial mode (or demo mode), you can initialize virtual disks into rootdg and create volumes. In this mode, there is no effect on the real disks or real data. The following figures illustrate the use of the Visual Administrator in tutorial mode:


Figure 35. View of disks (1)

Figure 36. View of the disks of the rootdg volume (2)


Figure 37. Pull-down menus to create a volume (3)

3.5.3 Solstice DiskSuite
Solstice DiskSuite 4.2 is a software product that enables you to manage large numbers of disks and the data on those disks. Although there are many ways to use Solstice DiskSuite, its major functions are intended to increase:
• Storage capacity
• Data availability
• I/O performance

Solstice DiskSuite uses virtual disks to manage physical disks and their associated data. In Solstice DiskSuite, a virtual disk is called a metadevice. A metadevice is functionally identical to a physical disk from an application's point of view. Solstice DiskSuite converts I/O requests directed at a metadevice into I/O requests to the underlying member disks. Solstice DiskSuite's metadevices are built from slices (disk partitions). An easy way to build metadevices is to use the graphical user interface, Solstice DiskSuite Tool, that comes with Solstice DiskSuite. To start this GUI tool, execute the following command:

/usr/opt/SUNWmd/sbin/metatool

Solstice DiskSuite Tool presents you with a view of all the slices available to you. By dragging slices onto metadevice objects, you can quickly assign slices to metadevices.
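Metadevices can also be built from the command line with the metainit utility that ships with Solstice DiskSuite. The following is a minimal sketch only; the slices c1t0d0s0 and c1t1d0s0, the metadevice name d10, and the mount point /db2data are hypothetical, the DiskSuite state database replicas are assumed to already exist (created with metadb), and the exact options may vary between DiskSuite releases:

metainit d10 2 1 c1t0d0s0 1 c1t1d0s0
newfs /dev/md/rdsk/d10
mount /dev/md/dsk/d10 /db2data

The metainit line builds d10 as a concatenation of the two slices; the metadevice is then used like any other disk device under /dev/md/dsk and /dev/md/rdsk.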


If, for example, you want to create more storage capacity, you could use Solstice DiskSuite to fool the system into thinking that a collection of many small slices is one physical disk. After you have created a metadevice from these slices, you can immediately begin using it just as you would any real disk.

Solstice DiskSuite can also increase the availability of data by using mirrors and RAID-5 metadevices. Mirrors and RAID-5 metadevices replicate data so that it is not destroyed if the disk on which it is stored fails. Failed mirror or RAID-5 components can be replaced automatically by the hot-spare facility that Solstice DiskSuite supplies; this facility brings spare partitions on-line to replace failing ones, and users can continue to access the surviving copy of the data without interruption.

Solstice DiskSuite can improve both I/O and system performance by using disk striping, which spreads the I/O load over several disks to increase throughput. Solstice DiskSuite also provides a performance monitor to help you monitor and manage disk subsystems more effectively; it helps to minimize performance bottlenecks by identifying potential I/O bottlenecks before they occur.
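As a hedged illustration of the mirroring just described, a two-way mirror might be built as follows. The slices and the metadevice numbers (d20, d21, d22) are hypothetical, and the commands assume the state database replicas already exist:

metainit d21 1 1 c1t0d0s0
metainit d22 1 1 c1t1d0s0
metainit d20 -m d21
metattach d20 d22

The first two commands create simple one-slice submirrors, the third creates the mirror d20 with d21 as its initial submirror, and metattach adds d22 as the second submirror, after which Solstice DiskSuite synchronizes the data between the two.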
