HP OpenVMS Systems - Ask the Wizard
The Question is:

I read with interest your reply in article "(2618) RMS indexed file tuning and disk cluster factors" and noted especially the comments that a disk cluster factor of 50 was "large". At our installation we commonly work with cluster factors of 1024 up to 8192 blocks. File sizes for some of the most important indexed files range from 0.5 million blocks up to one file of 15 million blocks. The disks are mostly RAID 5 sets served by HSZ50 controllers, with a chunk size of normally 256 blocks.

The file of 15 million blocks is obviously of particular interest so far as tuning goes. It resides on a RAID 5 set with a chunk size of 256 blocks and a cluster size of 768. We were handed an FDL file similar to the following:

  FILE
        CONTIGUOUS              no
        GLOBAL_BUFFER_COUNT     10
        ORGANIZATION            indexed
  RECORD
        BLOCK_SPAN              yes
        CARRIAGE_CONTROL        carriage_return
        FORMAT                  fixed
        SIZE                    520
  AREA 0
        ALLOCATION              10
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             20
        EXTENSION               10
  AREA 1
        ALLOCATION              10
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             5
        EXTENSION               10
  KEY 0
        CHANGES                 no
        DATA_KEY_COMPRESSION    yes
        DATA_RECORD_COMPRESSION yes
        DATA_AREA               0
        DATA_FILL               50
        DUPLICATES              no
        INDEX_AREA              1
        INDEX_COMPRESSION       no
        INDEX_FILL              80
        LEVEL1_INDEX_AREA       1
        NAME                    ""
        NULL_KEY                no
        PROLOG                  3
        SEG0_LENGTH             22
        SEG0_POSITION           1
        TYPE                    string

After performing an ANALYSE/RMS/FDL on this file, we obtain an FDL similar to the one listed below:

  FILE
        ALLOCATION              14970624
        BEST_TRY_CONTIGUOUS     no
        BUCKET_SIZE             20
        CLUSTER_SIZE            768
        CONTIGUOUS              no
        EXTENSION               65535
        FILE_MONITORING         no
        GLOBAL_BUFFER_COUNT     10
        NAME                    "DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT;119"
        ORGANIZATION            indexed
        OWNER                   [CABSPROD,DBA_CABSPROD]
        PROTECTION              (system:RWED, owner:RWED, group:RE, world:)
  RECORD
        BLOCK_SPAN              yes
        CARRIAGE_CONTROL        carriage_return
        FORMAT                  fixed
        SIZE                    520
  AREA 0
        ALLOCATION              14921088
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             20
        EXTENSION               65535
  AREA 1
        ALLOCATION              47872
        BEST_TRY_CONTIGUOUS     yes
        BUCKET_SIZE             5
        EXTENSION               1248
  KEY 0
        CHANGES                 no
        DATA_KEY_COMPRESSION    yes
        DATA_RECORD_COMPRESSION yes
        DATA_AREA               0
        DATA_FILL               80
        DUPLICATES              no
        INDEX_AREA              1
        INDEX_COMPRESSION       no
        INDEX_FILL              80
        LEVEL1_INDEX_AREA       1
        NAME                    ""
        NULL_KEY                no
        PROLOG                  3
        SEG0_LENGTH             22
        SEG0_POSITION           1
        TYPE                    string
  ANALYSIS_OF_AREA 0
        RECLAIMED_SPACE         0
  ANALYSIS_OF_AREA 1
        RECLAIMED_SPACE         0
  ANALYSIS_OF_KEY 0
        DATA_FILL               77
        DATA_KEY_COMPRESSION    75
        DATA_RECORD_COMPRESSION 62
        DATA_RECORD_COUNT       29268606
        DATA_SPACE_OCCUPIED     14915500
        DEPTH                   4
        INDEX_COMPRESSION       0
        INDEX_FILL              78
        INDEX_SPACE_OCCUPIED    47305
        LEVEL1_RECORD_COUNT     745775
        MEAN_DATA_LENGTH        520
        MEAN_INDEX_LENGTH       25

When using both of these files as input to an EDIT/FDL/NOINTERACTIVE/ANALYSE=, the resultant FDL specifies a bucket size of 63 no matter what I stipulate the cluster factor to be in the input FDL. Do you think I should use this size bucket or the 20 blocks? Access to this file is mostly by single processes, either producing copies or processing records by index. Are there any other factors which I should be considering?

We also have a whole series of files between 1 and 3 million blocks which are indexed, with a supplied FDL which stipulates just one AREA for the file. The result of the ANAL/RMS/FDL suggests we split it into 2 areas, with bucket sizes of 63 blocks again.

This is a production system where time (and hence performance) is critical, but where little experimentation is possible, so I am reluctant to suck it and see. Do you have any advice for us? What areas should we be looking at?

The Answer is:

Cluster factors of 1024 are reasonable, particularly when you are dealing with a small number of rather large files.
The cluster size in the ANALYZE input will be used by EDIT/FDL, so you will likely have to manually edit the FDL file to ensure you have the necessary control over the bucket size. Your choice of bucket size appears appropriate for this situation.

If the application primarily retrieves and reads a record by the index key, then updates it and moves on to another unrelated record, you will typically want a smaller bucket size; otherwise bandwidth is wasted transferring unnecessarily large blocks of data. For instance, a bucket size of 63 will cause RMS to transfer roughly 32 kilobytes (63 blocks of 512 bytes) in, and then back out again, to update a single (say) 500-byte record. These transfers are obviously questionable extra I/O activity, at best.

If adjacent records are processed sequentially, then a larger bucket size can be called for, but with a 20-block bucket size you are already reducing the number of I/O operations to one per 20 records. Going to one per 63 may not particularly help, and may hinder performance when accessing smaller groups of records.

When considering other performance factors, also consider the index depth. On the original file, the depth is 4. That value is inappropriate. (The proposed FDL bucket size will fix that.) Index depth and bucket size can also be related here -- if a bucket size of 63 is what it takes to get you to an index depth of 3, then there is some incentive to go to larger bucket sizes. As for bucket size, consider a compromise -- consider a bucket size of 32 for this case. This particular value also happens to be a factor of the specified disk cluster factor (768).

You will be unlikely to be able to measure the effect of two areas. The use of these areas permits RMS to work with multiple bucket sizes within the file, but if both are equal (63) then this capability is obviously not particularly applicable. Multiple areas also allow you to place each area on an independent disk in (for instance) a bound volume set, but very few people go to this trouble. Multiple areas can also allow you to place all the 'hot' areas of multiple files close to one another, reducing average seek times, and to put the bulk of the data "out of the way" for occasional access. Again, this capability is infrequently used -- it requires a non-trivial effort to establish the initial layout, as well as detailed knowledge of both the application I/O patterns and the disk behaviour.

Global buffers can greatly assist the performance of shared files. The current value of 10 is inordinately small for a shared file -- the OpenVMS Wizard would encourage 200 or more buffers as a test. Global buffers trade off memory use for disk I/O, and this is almost always a performance win. To learn more about the access patterns and the global buffer activity, you can enable file statistics collection and use MONITOR to track the effects of the changes.
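As a rough sketch of the tuning cycle described above -- not an exact recipe; the FDL file names are placeholders, and the qualifiers and default output names should be confirmed against the DCL HELP text on your system -- the usual sequence is to analyze the current file, let EDIT/FDL produce a design from that analysis, hand-edit the bucket size, and then rebuild the file with CONVERT:

  $ ANALYZE/RMS_FILE/FDL CIARH.DAT
  $     ! writes an FDL file, including the ANALYSIS_OF_ sections shown above
  $ EDIT/FDL /ANALYSIS=CIARH.FDL /NOINTERACTIVE /OUTPUT=CIARH_NEW.FDL CIARH.FDL
  $     ! hand-edit CIARH_NEW.FDL to force the intended BUCKET_SIZE (for example, 32)
  $ CONVERT /FDL=CIARH_NEW.FDL /STATISTICS CIARH.DAT CIARH_REBUILT.DAT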
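If the 32-block compromise and the larger global buffer count are adopted, the hand-edited attributes would look roughly like the fragment below; the remaining attributes (allocations, extensions, fill factors) can be left as EDIT/FDL produced them:

  FILE
        GLOBAL_BUFFER_COUNT     200
  AREA 0
        BUCKET_SIZE             32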
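On the statistics suggestion: the analyzed FDL above shows FILE_MONITORING no, so RMS statistics are not currently being collected for this file. A minimal sketch of enabling them and watching the file, using the file name from the analysis:

  $ SET FILE /STATISTICS DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT
  $     ! statistics collection typically takes effect when the file is next opened
  $ MONITOR RMS /FILE=DISK14:[CABSPROD.DAT.BILLING]CIARH.DAT
  $     ! run while the application has the file open, before and after the changes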