The TurboIMAGE/XL FAQ
Hosted by Beechglen Development Inc
www.beechglen.com
Last Updated: Wednesday, November 25, 1998

This is the TurboIMAGE/XL FAQ. It is meant to be a guide for beginning programmers and data base administrators.

NOTE: THIS IS A WORK IN PROGRESS. Any input, updates, or corrections are most welcome; please send them to: imagefaq@beechglen.com

Another terrific on-line resource for the HP3000 is the HP3000 FAQ. It can be located at http://www.3k.com/index_publications.html Any questions or comments should be sent to rcb@3k.com.

An indispensable "On-line Encyclopedia for HP Users" can be found at http://www.robelle.com/library/smugbook/

The TurboIMAGE/XL Database Management System Reference Manual can be found at http://docs.hp.com:80/dynaweb/smpe/b1019/b840/@Generic__BookView

TABLE OF CONTENTS

1.0 How do I get started? (Introduction)
2.0 What are some of the fundamentals of data management?
    2.1 Placement of data on computers
    2.2 Data management laws
    2.3 History of data management
    2.4 How files are accessed
    2.5 Data base technologies
3.0 What is the Image Data Base Nomenclature?
    3.1 Image terms and vocabulary
    3.2 Types and relation of files
    3.3 Record formats
    3.4 Major subsystems
4.0 How do I define and process a Schema?
    4.1 The data base program
    4.2 Capacities
    4.3 DBschema
    4.4 DButil
5.0 What are the Data Base Access Intrinsics?
    5.1 Parameters and data types
    5.2 DBopen & DBclose
    5.3 DBlock & DBunlock
    5.4 DBerror & DBexplain
    5.5 DBput & DBdelete
    5.6 DBget & DBfind
    5.7 DBupdate & DBcontrol
    5.8 (DBbegin, DBxbegin) & (DBend, DBxend)
    5.9 DBxundo
    5.10 DBinfo
6.0 What are the Utilities and Tools?
    6.1 DBSTORE & DBRESTORE
    6.2 DBLOAD & DBUNLOAD
    6.3 Capacity, structure change, and diagnostic utilities
    6.4 Logging and recovery
    6.5 DBDRIVER
    6.6 MPEX
    6.7 DBLOADNG/Howmessy
7.0 How do I begin the design process?
    7.1 Defining the data elements
    7.2 Building hierarchical structures
    7.3 Folding hierarchies
    7.4 Prototyping
    7.5 Walk Through
8.0 How do I improve on an existing design?
    8.1 Common design situations
    8.2 Moving data about: the shell game
    8.3 Critical design assumptions

-------------------------------------------------------------------

1.0 How do I get started? (Introduction)

Most people initially learn Image by a slow process of osmosis from their current application environment. Thus the success of the initial learning process usually depends on the skill and conscientiousness of the people who created and maintain that environment.

The development, maintenance, and access of Image data bases can be viewed from many perspectives: the programming aspects, the system management aspects, the operating system aspects, and the user access aspects. This could also be stated as the push-pull relationship between those who create data bases and those who must live with them and keep them working on a day-to-day basis. There are obvious trade-offs and gray areas between these various types of activities. On one hand, the push is to get the application developed and turned over into production as soon as possible; on the other hand, the priority is to get as many users on the system as possible while providing reasonable response times. It is therefore extremely important that each side understands the limitations that Image places on the other. This is the only way to achieve a successful, cost effective balance between development and production.
One way I've heard this simply stated is: "Image is a do-it-yourself brain surgery kit; you can make yourself as smart or as dumb as you want. It all depends on what you know." -- Fred White, ADAGER.

Specific to Image, there are many features that can be abused by developers and that can prove to be a nightmare for those who have to perform maintenance. On the other hand, there are many maintenance utilities and subsystems that can be very useful to developers and thus speed up implementation times.

Actually, to most people, be they applications or system management oriented, Image is a myth, and data bases past and present are legends. The first myth is reflected in the three names that Image goes by: Image, TurboIMAGE, and TurboIMAGE/XL. Image was the original name. TurboIMAGE was a major rewrite of Image that was prompted by the larger number of users supported on the HP3000 Series 6x and 70 machines. And TurboIMAGE/XL is a third generation rewrite of Image prompted by the migration to the MPE/XL versions of the HP3000. From a development and system management perspective, these different versions of Image are very compatible. So, throughout this FAQ only the differences will be pointed out, and the versions will be collectively referred to simply as Image.

Another aspect of the Image myth is that it is all things to all people. Like any data base management application, Image has its strong points and its weak points. The latter sections of this FAQ describe some of the ways that application and data base designers can get into trouble attempting to overcome Image's weaker points.

A final comment on the Image myth is that many system managers and programmers are very superstitious about their data bases. They feel that Image is built of many black boxes, mirrors, magic dust, and secret incantations, the details of which are known only to wizards and high priests (HP). This FAQ methodically breaks down these black boxes explaining the components, sweeps out the magic dust, and teaches you the appropriate times and places to invoke the secret incantations. And this will, of course, make you an Image wizard in the process.

-------------------------------------------------------------------

2.0 What are some of the fundamentals of data management?

2.1 Placement of data on computers

The key reason that data is placed on computers is the ability of the computer to retrieve, sort and summarize it. One of the most important aspects of any scheme for sorting, retrieving and reporting is that data should only be stored on a computer if it has some need to be recalled at a later date, and all data stored should be time stamped so that at some future date its usefulness can be determined. Data that is stored but is not likely to be retrieved, and is not removed as it becomes obsolete over time, will only add to the overhead and expense of the more heavily accessed data.

2.1.1 removed

2.1.2 The value of information as a corporate asset

The value of information as an asset to the company is dependent on three factors: capacity, reliability and security. Capacity is the capability of the computer to meet the information demand. Information is only valuable if it can provide cost savings or business improvements on a timely basis. Reliability is the quality measurement for a computer's ability to meet information demands on a consistent basis. And security is the attribute which measures the ability to grant access to information on a need-to-know basis.
2.2 Data Management Laws

There are certain rules of thumb, or laws, that can be described in general terms and apply to many situations, including data management. The following are my top four "laws" of data management. I hope you find them amusing and/or insightful.

2.2.1 To err is human, to really foul things up requires a data base

All methods for data storage and retrieval are prone to error situations. We simply haven't got the technology to the point where it is infallible, and certainly evolution hasn't done much to alleviate the problems of human error. The point is that all data base systems must be designed with technical and human failures in mind. To this end the KIS (keep it simple) principle is the only weapon, because the failure rate of any system increases directly with its level of complexity.

2.2.2 Separating keys and pointers from data for faster retrieval

The cornerstone of information storage and our information society is the index file. We use them every day, in hundreds of different ways, woven into everyday life. My favorite example is the card catalog at the library. This index allows us to search on three possible values -- author, title, and subject -- to quickly determine if a book or books exist and their relative location in the library. The alternative method of scanning each book on each shelf would be many times more time consuming than using the card catalog. The card catalog is not free and has its drawbacks. Cards must be added and removed as books are added and removed from the library, and the books must be arranged in a specific order on the shelves. If you ever want to get on the bad side of a librarian, just put a book back on the wrong shelf. In this case the work expended in maintaining the card catalog and the location of the books is justified by the need to speedily find and retrieve books of interest. Computerized data management schemes use the same principle of separating the index information from the data to allow faster access to the specific records of interest. While computerization allows multiple indices into the data, it should be noted that the marginal value of additional indices drops significantly after two or three, while the overhead to maintain each index is usually linear.

2.2.3 It's when you do the work that counts

The decision of which items to index on and how to maintain the data is highly application dependent. If an application's primary function is to produce batch reports, then building in indices that are maintained as each record is added or deleted is probably a waste. On the other hand, if an application is required to retrieve transaction information based on a variety of possible key values, then it probably makes sense to maintain indices for each of the possible key values.

2.2.4 There is no such thing as a free lunch

Throughout the following discussions of general data management and specific HP3000 and Image issues we will find many trade-offs of functionality for a given expense. Generally, the more functionality built into an application, the greater the expense in terms of computer resources required. The only magic trick is to match the functionality of the application with the capabilities of the system and data base software.

2.3 History of Data Management

To completely understand the state of the art in data management, we must first discuss the evolutionary history from the past to the present. This will, hopefully, clear up some of the seemingly confusing terms and vocabulary.
2.3.1 Punched card files

The first computers used punched cards to store and retrieve information. There were two ways to retrieve information, depending upon whether the file of cards was sorted or not. If the card file wasn't sorted, then to find a specific card the entire file would have to be read to find each card of interest. If, however, the card file was sorted on the item of interest, then we could scan the file until we passed the point where the cards of interest would have been if present. The sorting of the cards really paid off in the ability to summarize and report for different occurrences of the key value.

2.3.2 Tape files

The next innovation in data management was the use of magnetic tape to store greater volumes of information and allow much faster scanning of individual records. The tapes still did not allow indexing, as you still had to read from the beginning of the file going forward until you selected the records of interest.

2.3.3 Disc files

After magnetic tape, the next innovation was to spread the magnetic material on discs and allow the read/write head to move over the data. This finally allowed the ability to read records in any sequence. A good analogy would be a tape player versus a compact disc. On your tape player you must listen to each song in sequence on the tape (unless you manually rewind), but on your compact disc you can choose which songs, and in which sequence, you would like them to play. This capability is called random access. The next song or record played does not have to be physically adjacent to the previous one.

2.3.4 Random access

Random access capability finally allowed computers to take advantage of indices. As in the library example, key files were built that had pointers to the location on the disc where the data record of interest could be found. These pointers are then used to directly read the desired record. The only aspect which is random is the fact that the record could be anywhere on the disc. The time to read a given record is made up of three components: the time to locate the read/write head over the data, the time to wait for the disc to spin so the data record of interest is under the head, and the time to transfer the data from the magnetic media to the computer. The major portion of the total time is the time to locate the head over the data.

2.3.5 RAM and ROM discs

As computers have become increasingly more powerful, the demand for disc I/O speed has also grown. This is because all computers, from minis to mainframes, wait for disc I/Os at the same speed. One way to increase disc I/O throughput is to have more disc drives. Another way is to store the most active data in the RAM (random access memory) of the computer and only write it out to disc when the record is changed. The write to disc is required because the in-memory version of the record would be lost in the event of a power failure, while the disc version of the record will be maintained until it is overwritten. ROM (read only memory) discs are becoming available that store huge quantities of information on a read only basis. These discs have many indices built in for fast retrieval of data.

2.4 How files are Accessed

Now that we've discussed a little of the history, we can get down to the specific methods used by Image to access data. It is very important to understand these access methods as they form the very foundation for any further discussion.

2.4.1 Serial access

Serial access is the reading of a file from front to back, or back to front. The key connotation is that the file is not sorted in any fashion, so that all records must be read to determine if the specific ones of interest are found.

2.4.2 Sequential access

Sequential access is the reading of a file from front to back or from back to front, but the file is specifically sorted so that the search can be discontinued once the point at which the records should have been found has been passed.

2.4.3 Directed access

Directed access is the ability to read a specific record number in the file, without having to read any other preceding records. In and of itself, this isn't very useful unless the key value is also the relative record number, or directed access is used in conjunction with calculated or chained access. The case of the key value being the relative record number can be exemplified by having the customer number be the record number in the file; thus to retrieve a customer record we could directly read the record at the customer number location in the file with one read.

2.4.4 Chained access

Chained access is simply the storing of relative record numbers with the data as pointers to other logically linked records. For example, in our customer record which we accessed by customer number, we could store the relative record number of the first order record in the orders file, and in each order record store the relative record number of the next order record for that customer, thus forming a linked list or chain of entries for each customer.

2.4.5 Calculated access

The simple case for the index file is that the key is also the relative record number. This, while simple to implement, leaves something to be desired when selecting possible key elements. What if we wanted to key on something other than a number, e.g. a customer name? What if the number we wanted to key on covered too large a range for the disc space available, e.g. a social security number? What is required is a calculation that generates a relative record number from these keys such that the result is between 1 and the capacity of the file. This process is called hashing, and its purpose is to allow direct reading of the index file record which is of interest.

          +-----------------------------------------------------+
 Data     | A00 | Z33 | B01 | U23 | M91 | P00 | F55 | T07 | A00 |  EOF
 Records  +-----------------------------------------------------+  Mark
 Relative    0     1     2     3     4     5     6    .....   n
 Record                                          ^
 Number                                          |
                          Direct Access: Read Record #6

 Serial   >--^-----^-----^-----^-----^-----^-----^-----^-----^--
 Access      Find Record With ID # = A00: Must Read Each Record

2.5 Data Base Technologies

Different data base technologies are no more than the packaging of a standardized method of accessing data. The package will only be successful if it meets the tests for capacity, reliability and security. There are many different data base management packages on the market today. They can be classified into three general categories, called models: hierarchical, relational and network. These models refer to the fashion in which indexing is used to link different logically related files via pointers and chains.

2.5.1 Hierarchical models

In the hierarchical models of data bases, the files are linked from top to bottom, left to right, to form a single master logical record. This forms a tree structure, and there is a single master file, called the root, to which all of the remaining files are related either directly or indirectly.
In this model all the indices or relationships of the files in the tree structure are fixed and cannot be changed without unloading all of the data, modifying the data base definition and reloading the data. Hierarchical models lend themselves to very large transaction processing and batch reporting applications.

2.5.2 Relational models

In relational data base models the indexing or relation of one file to another is more flexible. Indexes can be built and deleted on an as-required basis. This is done at the user level by linking two or more files by a common field. This can be a very powerful tool for application environments where the amount of data is small and the information requirements are very fluid.

2.5.3 Network models

The network model is similar to the hierarchical model except there can be more than one root file. Image is a two level network model. The first level of files are called master data sets, and the second level of files are called detail data sets. A data set is the Image term for a file. A master data set serves as an index into up to sixteen detail data sets, and a detail data set can have indices from one to sixteen master sets.

-------------------------------------------------------------------

3.0 What is the Image Data Base Nomenclature?

3.1 Image terms and vocabulary

In order to appear to have some unique corner on data base technologies, the marketers of various data base software will invent a new set of acronyms and vocabulary to make the old and commonplace seem new and worth paying for. So, each data base package seems to have its own jargon, and it may help to equate these Image terms with some other ones you may be more familiar with.

Capacity - The number of records or entries a data set may contain.

Chain - A set of logically related records in a detail data set which are linked together by pointers to the previous and next entry.

Chain Head - The pointers in the master data set that point to the first and last entries in the related detail data set that contain the logically related entries.

Data Entry - A record in the data set made up of data items or fields.

Data Item - The named field of a given data type.

Data Base - The set of related files or data sets.

Hashing - The calculation that accepts a key value as input and generates a relative record number between 1 and the capacity of the index file.

Path - The physical relationship between the master data set and the linked records of a detail data set.

Relative Record Number - The number which represents the distance from the first record in the file to the current record in count sequence.

Search Item - The data item whose content is used to logically form a path from the master to the linked list of detail records.

3.2 Types and relation of files

3.2.1 Manual masters

Image master data sets are index files which can optionally hold data items. The master file can have only one key item, and each key value must be unique. This is because key access to this file is via calculated access based on a hashing algorithm. There are two kinds of master data sets: manual and automatic (kind of like cars). Manual masters can have additional data items besides the search item. The name "manual" comes from the fact that these data entries must be written and deleted by an application program. Manual masters work well if there are a few small fields and there will never be a need to access the records on one of the non-search item fields. Non-search item access would require a serial search of the entire master data set.

3.2.2 Automatic masters

The automatic master cannot have any non-search item fields, as its data entries are added and deleted as required by Image to serve as index records for detail data sets. Automatic masters are easily added to and deleted from the network design using any one of the Image utility packages. Both automatic and manual masters can contain chain heads, pointers to logically related detail data set records. This is the real power of Image, as it provides the ability to make subsets of entries in the associated detail data sets. The chain head holds the relative record numbers of the first and last entries of the linked list of detail records, along with a count of the entries on the chain.

3.2.3 Detail data sets

Detail data sets are used to hold records which are accessed via the linked lists forming chains, or in front to back or back to front serial fashion.

3.2.4 Master to detail relationships

The value of the path from a master to a detail data set depends entirely on the average number of detail records tied to a master entry. This ratio can range from 0 to the number of entries in the detail data set. If the average is greater than ten percent of the number of detail entries, then the key value chosen is a waste because it doesn't make a small enough subset to warrant chasing the chain. It would be easier to serially search the detail. A frequent example is plant code in a company with two or three plants. In an inventory detail with 999,991 entries, the average plant chain length would be 500,000 to 333,333 entries, not a very useful subset. On the other hand, if the average is less than one, it is probably a design with a small detail tied to a large master. The large master is probably linked to a separate large detail. This is a waste because the master in question must maintain pointers to the small detail that it can never use.

3.3 Record Formats

The actual file record for an Image data set is made up of a bit map, pointer information, and at least one data item. The records are blocked together based on a parameter that specifies the maximum block size, and Image takes care of fitting as many records as it can in that block and wasting any leftover space.
Logical Record Format of an Image Master Data Set

 |Chain Head for Path 1|   |Chain Head for Path n|   | Data Part of Entry  |
 +---------------------+   +---------------------+   +---------------------+
 |CHAIN | FIRST | LAST |   |CHAIN | FIRST | LAST |   |KEY  | DATA ITEMS FOR|
 |COUNT | ENTRY | ENTRY|   |COUNT | ENTRY | ENTRY|   |ITEM | MANUAL MASTER |
 +---------------------+   +---------------------+   +---------------------+
 |  3   |   3   |   1  |   |  50  |  22   |  21  |   |TEST | TEST DATA     |
 +---------------------+   +---------------------+   +---------------------+
           |       |
           |       |      +-------------------------------------------+
           |       |      |              Detail Data Set              |
           |       |      +-------------------------------------------+
           |       |      | Forward | Backward | Key  | Data Items    |
           |       |      | Pointer | Pointer  | Item |               |
           |       |      +-------------------------------------------+
           |       +--> 1 |    0    |    2     | TEST |               |
           |              +-------------------------------------------+
           |            2 |    1    |    3     | TEST |               |
           |              +-------------------------------------------+
           +----------> 3 |    2    |    0     | TEST |               |
                          +-------------------------------------------+

Physical Block Format of an Image Data Set

 +----------------------+   +---------------------+   +---------------------+
 |MPE/V File Label Block|   | 1st Physical Block  |   | Nth Physical Block  |
 +----------------------+   +---------------------+   +---------------------+
 |MPE   | USER  | EMPTY |   |BIT | MEDIA  | MEDIA |   |BIT | MEDIA  | MEDIA |
 |FILE  | LABEL | TO    |   |MAP | RECORD | RECORD|   |MAP | RECORD | RECORD|
 |LABEL |       | END   |   |    |   1    |   N   |   |    |        |       |
 +----------------------+   +---------------------+   +---------------------+

 +------+   +-----------------------------+   +---------------------+
 |MPE/XL|   |     1st Physical Block      |   | Nth Physical Block  |
 +------+   +-----------------------------+   +---------------------+
 |FILE  |   |USER  | BIT | MEDIA  | MEDIA |   |BIT | MEDIA  | MEDIA |
 |LABEL |   |LABEL | MAP | RECORD | RECORD|   |MAP | RECORD | RECORD|
 |TABLE |   |      |     |   1    |   N   |   |    |        |       |
 +------+   +-----------------------------+   +---------------------+

3.3.1 Bit maps

At the front of each block, Image builds a bit map to represent the availability of the records in the block. This saves the time of accessing each individual entry of the block to see if it is occupied. These maps play an important role when a master or detail is serially searched, in that, especially in the case of masters, we may read many empty blocks before one is found that contains a record.

Bit Map Format of an Image Data Set

 +----------------------------------------------------------------+
 | Bit Map  | ENTRY 1| ENTRY 2| ENTRY 3| ENTRY 4| ENTRY 5| ENTRY 6|
 +----------------------------------------------------------------+
 | 11001000 | MEDIA  | MEDIA  | EMPTY  | EMPTY  | MEDIA  | EMPTY  |
 | 00000000 | RECORD | RECORD | RECORD | RECORD | RECORD | RECORD |
 +----------------------------------------------------------------+

3.3.2 Pointer information

Remember that pointers are simply stored relative record numbers that form the physical structure of the data base. The pointers in master records form a chain head consisting of a first entry pointer, a last entry pointer and a total entry counter. Each chain head occupies six words (12 bytes) per path, for a total of up to 16 paths. In each master record there is also a chain for entries whose search items hash to the same relative record number. When this "collision" happens, the entries are chained together on a secondary chain. In the detail, each path has a set of forward and backward pointers which allow the ability to read, add to, and delete entries in the chain. The chain pointers in the detail take up four words (8 bytes) for each path, for a total of up to 16 paths.

3.3.3 Data content

The data content of each entry is made up of the data items specified for the data set. The maximum data entry size is 2048 words (4096 bytes).

3.3.4 Blocking factors

The blocking of records is done by specifying a maximum block size in the data base definition, and Image works backwards to fit the optimum number of records per block so as not to waste disc space. The actual record is made up of the bit map plus the pointers plus the actual data items, and is called a media record. The default block max is 512 words (1024 bytes or four sectors) and should be changed to 2560 words (5120 bytes or 20 sectors). This will allow Image to pick the optimum value.

3.4 Major Subsystems

Image is a complete data base package. It comes with the utilities for building and installing the data base, subprograms called intrinsics for accessing the data base programmatically, utilities and tools for maintenance and recovery, and a report writer called QUERY/3000.

3.4.1 Implementation overview

In the beginning, the programmer designs reports and interactive screens that will make up the particular application or system desired. The data items from the report and screen layouts are defined in terms of type and relationship to each other. The data base is designed and coded into the form of a data base definition called a schema. This schema is processed by a compiler into a set of files which form the data base. The application programmer can now build programs that access the data base via subprograms called intrinsics. During the debugging process, QUERY can be used to load test data and to interactively check the results of programs under test. The various data base utilities and tools are used to configure and maintain the data base while in production.

3.4.2 Schema definition and processing

The schema is developed either by manually programming it using EDIT/3000, QUAD or some other text editor, or it is generated automatically from a data dictionary. Either way, the data base program is compiled and the resulting object module is then used to build the final data base.

3.4.3 Data base access intrinsics

The data base access intrinsics are the only supported way to access the data base. All utilities and applications must use this set of subprograms or resort to privileged mode. Accessing an Image data base in privileged mode is not supported by Hewlett-Packard and is best left up to the experts. These intrinsics provide the common access point for controlling the security and reliability of the data contained within.

3.4.4 Utilities and tools

There are many data base utility tools to facilitate maintenance of Image data bases. Later chapters will go into the HP supplied utilities and the more popular contributed library and purchasable utilities.

3.4.5 QUERY/3000

Query is a non-programming tool for entering test data, modifying limited amounts of data online, and producing simple reports. Query is also useful for displaying the current number of entries and the structure and content of data bases.

-------------------------------------------------------------------

4.0 How do I define and process a Schema?

4.1 The data base program

The data base is specified using a simple program called a schema. This program is a simple free-format MPE file which is input to the compiler DBschema.PUB.SYS. The schema and the output from the compiler provide important documentation regarding the installation of the data base.
While the data base programming language is simple and free format, the tricky part is the use of periods, commas, and semi-colons, which are used as syntax dividers.

4.2 Begin

A throwback to the SPL programming language, the BEGIN statement is used to start the schema and name the data base.

Example:

BEGIN DATA BASE TEST;

The data base name can be up to six alphanumeric characters, and the first character must be alphabetic. The individual data sets will be identified by appending a two digit numeric counter, i.e. 01, 02, 03..., so it usually is not a good idea to name your data base DB01, because the name of the first set will be DB0101, the second set DB0102, and so on; this can be very confusing. The data base name is really DBNAME.group.account, so before the data base is built, consider what group and account it should reside in. This can save a lot of trouble and effort down the road.

4.3 Passwords

Passwords are used to screen read and write access at the data set and data item levels. It is usually sufficient to code passwords at the set level only. Passwords at the item level add processing overhead, and this type of security is better dealt with in the application program. An Image password can be up to eight ASCII characters, but the first character must be alphabetic. The use of lower case or non-printable characters can raise security but makes it very difficult to input passwords interactively. Hard coding passwords into programs makes them very difficult to change; reading the current passwords in from a control parameter file allows them to be altered on a periodic basis without recompiling programs.

Example:

PASSWORDS:
    10 READER;
    20 WRITER;
     ^    ^
     |    |
     |    +----- Password to be supplied at data base open
     +---------- Arbitrary class number from 1 to 63

4.3.1 User class table

Each password maps to an index number from 1 to 63; this allows for up to 63 different passwords, and the numbering convention used is up to the data base designer. The number is then used as shorthand notation to specify which passwords allow access to data items or data sets. A class number of 0 is assigned to an open request that specifies a password not on the password list. This class can be assigned to data items and sets to allow access even with a bad or missing password. A class number of 64 is assigned to an open request that specifies a ";" for the password while the user is signed on as the creator of the data base. The creator is simply the user who ran DBschema and DButil to build the data base. This class turns off all security and allows full access to all items and sets.

4.4 Data item definition

The items section of the schema program defines each field that will exist in the data base. These fields are used later in the definition of the individual files or data sets that make up the data base. An example of the items section of the schema:

ITEMS:
    CUSTOMER-NAME,   X40  (10/20);   << CHANGE TO 40 ON 1/2/88 >>
    ADDRESS-LINE-1,  X40  (10/20);
    ADDRESS-LINE-2,  X40  (10/20);
    ADDRESS-LINE-3,  X40  (10/20);
    CITY,            X40  (10/20);
    STATE,           X2   (10/20);
    ZIP,             X10  (10/20);
       ^              ^      ^
       |              |      +- Read/Write Class List
       |              +-------- Type and Sub-Item Length
       +----------------------- Data Item Name

4.4.1 Item name constructs

The item name can be up to 16 alphanumeric characters, but the first character must be alphabetic. The item names are used in a request list at data access time and must match exactly as defined in the schema. Thus, as a practical concern, items should be named in upper case and be long enough to describe the field, but extra prefixes and suffixes should be avoided. The maximum number of data items per data base is 1,023.

4.4.2 Item definition parameters

The actual storage of the item is defined using three parameters: the sub-item count, the type designator and the sub-item length. The sub-item count is an array specification; it defaults to 1. Arrays are a simple method of storing related items, i.e. monthly expenses etc.; however, most report writers do not handle sub-item counts other than the default of 1. Some of the older data bases were forced to make extensive use of sub-item arrays due to an old limitation of 255 data items per data base. The type designator is used to tell Image how to store the field within the record. Note that each field must be defined to be aligned on a 16 bit boundary.

16 bit item types:
    I - Signed binary integer in 2's complement
    J - Same as I but allows Query to edit input for COBOL COMP format
    K - Absolute binary value
    R - An HP formatted floating point number

8 bit item types (must be used in multiples of two):
    U - ASCII data but no lower case alphabetic characters allowed
    X - Any ASCII data
    Z - Zoned decimal format

4 bit item types (must be specified in multiples of four):
    P - Packed decimal number

Item   Length
Type   (Bytes)   COBOL                   FORTRAN          SPL
I        2       S9     - S9(4)  COMP    INTEGER          INTEGER
I2       4       S9(5)  - S9(9)  COMP    INTEGER*4        DOUBLE INTEGER
I4       8       S9(10) - S9(18) COMP
J        2       S9     - S9(4)  COMP
J2       4       S9(5)  - S9(9)  COMP
J4       8       S9(10) - S9(18) COMP
K1       2                               LOGICAL          LOGICAL
R2       4                               REAL             REAL
R4       8                               DOUBLE           LONG
Un       n       Display PIC A           CHARACTER        BYTE
Xn       n       Display PIC X           CHARACTER        BYTE
Zn       n       Display PIC 9           CHARACTER
Pn      n/2      COMP-3                  NUMERIC

4.4.3 Read/write class list

As mentioned above, the user class numbers are used to screen access at the data item and data set levels. This is accomplished by specification of a read/write class list at the data item and set level. The security logic works in the following fashion:

At data base open, the class number is mapped from the password. If the creator password is used at DBopen, then no read/write screening is processed. If the password used at DBopen is not in the password list, then a class of 0 is assigned. Otherwise, the class number is simply mapped from the password provided.

At data set access, the set read/write class list is checked in the following fashion. If the class is on the write side of the list specification, the user has full access to all items in the set. If the class is on the read side of the list specification, this simply means that we will be allowed to process the item level security screen to see if we have read or write access to each individual item. This adds additional overhead for each data set access. If the class is not on the set read/write class list, then no access is allowed to that set.

At data item access, the read/write class list is processed to allow read or write access to each individual data item, and is only processed if the class number was on the read side of the class list at the set level. If the class is on the read class list for the item, then read access is granted. If the class is on the write class list, then read and write access is granted. And if the class is not on the class list, then no access to the item is allowed.

4.4.4 Comments

Documentation in the form of comments can appear anywhere in the data base schema, simply by using << to start the comment and >> to end it. These comments can span multiple lines, and any text in between is ignored by the compiler. Comments at the item level can be useful in documenting the use of the data item.
Comments can also be used to document changes made to the data base over time.

4.4.5 Sorting of items table

In data bases with a large number of data items it can be very helpful if the item names appear in sorted sequence. This can save a great deal of time when referring to the schema for the item type and size in order to build data definitions for programs. The items can be sorted by using the editor to enter the data items, then using SORT to arrange them in alphabetic sequence, and then adding the rest of the schema definition.

4.5 Sets

A data set is made up of up to 255 individual data items. There are two kinds of data sets: master data sets and detail data sets. The data base can have from one to 199 data sets. The data set name can be from one to 16 alphanumeric characters, but the first character must be alphabetic.

4.5.1 Master Data Set Definition

Master data sets are accessed by hashing a key value to generate a relative record number. An Image master can have only one key item. There are two kinds of Image master data sets: manual and automatic. The manual master is used if data must be stored with the key record, and the data entry must be manually written to the data base. The automatic master cannot contain any data items other than the key data item, and its entries are automatically added or deleted as required. This will become clearer when detail data sets are discussed.

An example of a manual master data set schema definition:

SETS:  << data set name, type, class list, device class >>
NAME:     CUSTOMER-MASTER, MANUAL (10/20), DISC1;
ENTRY:    CUSTOMER-NAME(1),    <<-- key item and path count >>
          ADDRESS-LINE-1,
          ADDRESS-LINE-2,
          ADDRESS-LINE-3,
          CITY,
          STATE;               <<-- note the ; to end the list of items >>
CAPACITY: 100001,101,101;      << maximum capacity [,initial capacity [,increment]] >>

An example of an automatic master data set schema definition:

NAME:     ORDER-NO-MASTER, AUTOMATIC (10/20), DISC1;
ENTRY:    ORDER-NO(1);         <<-- key item and path count >>
CAPACITY: 101;

The read/write class list at the data set level is specified similarly to the item level. As previously discussed, write access at the set level gives full access to all data items in the data set. Read access at the set level means that you must examine the item level access security to see if your class has read, write, or no access to each individual data item.

The device class parameter on the data set allows the creator to specify which disc device class the data set is to reside on. There are some important performance notes to be made at this point. The first is that if each disc drive only has a device class of DISC on it, then the only class you'll be able to specify is DISC. Therefore, it is a good idea to configure each disc with a unique device class, i.e. DISC1 for LDEV 1, DISC2 for LDEV 2, etc. In order to spread out the disc I/Os and reduce access contention, it is a good practice to put masters and details on different disc drives. Due to their access method, masters tend to be very random and should be placed on non-cached discs. The access to details can range from being very sequential to very random depending on the volatility of the data entries and the timeliness of proper maintenance procedures.

The item names that make up the data set can be specified in any sequence. However, they must have been previously defined in the items section. There is no problem with having an item name that is identical to the set name.

The path counter is used to identify both the data item that will serve as the key and the number of associated detail data sets linked to the master by the value of the key. A master can have from 0 to 16 paths to various detail data sets. A value of 0 is simply used to identify a key data item for a master with no detail data sets. The capacity of a data set can range from 1 to 2,147,483,647 data entries. When the data base is built, all disc space required to hold the specified number of entries is allocated and initialized to binary zeros.

4.5.2 Detail data set definition

The detail data set is used to hold data records that occur multiple times for a given key value, i.e. all order summaries for a given customer. The detail is accessed via a pointer from the associated master to a linked list of entries in the detail with the same key value. A detail, unlike a master, can have up to 16 keys, which are in fact pointers from associated masters. The keys in automatic masters are added and deleted as records are added to or deleted from associated detail data sets. The manual master entry for a given key value must be added by the user application prior to adding any detail records with the same key item value.

An example detail data set specification would be:

NAME:     ORDER-SUMMARY, DETAIL (10/20), DISC2;
ENTRY:    ORDER-NO(ORDER-NO-MASTER),    <<-- path from the master based on the value of order-no >>
          CUSTOMER-NAME(!CUSTOMER-MASTER(TOTAL-DOLLARS)),
                   <<-- path from customer-master based on the value of customer-name;
                        additionally, the list of all orders for the same customer-name
                        will be maintained in sorted sequence by total-dollars >>
          TOTAL-DOLLARS;
CAPACITY: 300000,1000,1000;    << maximum capacity [,initial capacity [,increment]] >>

The path names are simply the names of the associated masters. A detail can have from 0 to 16 paths to the same or different master data sets. These paths serve to make subsets of records in the detail for given key values. Thus it is wise to choose paths that will yield a subset that is worth the effort of maintaining the pointers and chains. A good rule of thumb is that a path is only valuable if it can make a subset that is 15% or less of the detail records. Any greater than 15% and it becomes increasingly more advantageous to simply read all of the detail records, selecting for the ones you want. The sort item is used to specify maintenance of the list of data entries with the same key value in some sequence other than that in which they are added over time. THIS CAN BE EXTREMELY DANGEROUS if the records are not added in sorted sequence and the number of entries with the same key value is greater than 40 or 50.

4.6 End.

The END statement is a throwback to the SPL programming language; its only purpose is to tell the schema compiler that this is the end of the program. However, without the END statement the data base will not compile. Another note: as will be discussed in a moment, the schema compiler requires file equations and run parameters to function properly. From time to time you will forget these complexities and run the DBschema program without them. One solution is to break and abort the program; another is to type END. and this will terminate the program.
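To tie sections 4.2 through 4.6 together, here is a minimal but complete schema assembled from the fragments used above. Treat it as a sketch: the types chosen for ORDER-NO and TOTAL-DOLLARS, the capacities, and the device classes are illustrative assumptions, not requirements.

BEGIN DATA BASE TEST;

PASSWORDS:
    10 READER;
    20 WRITER;

ITEMS:
    CUSTOMER-NAME,  X40 (10/20);
    CITY,           X40 (10/20);
    STATE,          X2  (10/20);
    ORDER-NO,       X10 (10/20);    << illustrative type >>
    TOTAL-DOLLARS,  J2  (10/20);    << illustrative type >>

SETS:
NAME:     CUSTOMER-MASTER, MANUAL (10/20), DISC1;
ENTRY:    CUSTOMER-NAME(1),
          CITY,
          STATE;
CAPACITY: 101;

NAME:     ORDER-NO-MASTER, AUTOMATIC (10/20), DISC1;
ENTRY:    ORDER-NO(1);
CAPACITY: 101;

NAME:     ORDER-SUMMARY, DETAIL (10/20), DISC2;
ENTRY:    ORDER-NO(ORDER-NO-MASTER),
          CUSTOMER-NAME(CUSTOMER-MASTER),
          TOTAL-DOLLARS;
CAPACITY: 1001;

END.

Each master carries exactly one path (the "(1)" after its key item), matching the single detail set that references it. Compiling a schema like this with DBschema, as described in the next sections, should produce a small three-set data base suitable for experimenting with the utilities that follow.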
4.7 Capacities & maximums

4.7.1 HP3000 system limitations

A somewhat official operating system limits white paper can be found at: http://jazz.external.hp.com/papers/limits/os_limits.html

    8190 maximum number of concurrent processes
    2004 databases can be opened concurrently on a system
    1140 processes can open one database in modes 1-4 (write) access when
         IMAGE logging is enabled. User logging is still all compatibility mode.
     127 DBopens per process without calling DBclose
      63 DBopens per process to open the same database

4.7.3 Data base level limitations:

    1023 data items can be defined in a database
     199 data sets can be defined in a database
      63 passwords can be defined in a database (65 if 0 & ; count)

4.7.4 Data set limitations:

              255 data items can be defined in a data set
               16 paths can be defined in a master or detail data set
    2,147,483,647 entries maximum in a data set
    2,147,483,647 entries maximum in a chain
       16,777,215 blocks per data set (recently this was changed from 8,388,607)

4.7.5 Data item limitations:

     255 subitems can be defined in a data item
     510 bytes for maximum subitem length
    4096 bytes for maximum item length

4.7.6 Length limitations:

    5120 bytes for maximum block length
     252 bytes for maximum b-tree index key size

4.7.7 Transaction limitations:

      15 databases can have simultaneous transactions (either DBBEGIN/DBEND
         or DBXBEGIN/DBXEND transaction pairs)
     4MB of open transactions for a process
    64MB of open transactions for a volume set

4.8 DBschema

The schema compiler is a program which reads the text file and compiles it into an object module called a root file. The compiler prints a heading, a listing of the schema, and a summary table.

4.8.1 Parameters and file equations

The schema compiler is invoked by using the command:

RUN DBschema.PUB.SYS;PARM=n

The parameter value n is used to tell the compiler program what to do as far as input and output files are concerned.

    n = 1   the schema text file has been file equated to DBSTEXT
    n = 2   the list file has been file equated to DBSLIST
    n = 3   both files have been file equated as in 1 & 2

If no parameter is supplied, then the DBschema program will expect you to enter the schema at your terminal (a pretty good feat), and you will have to break and abort or enter END. as described above.

4.8.2 Compile options

The schema compiler allows for setting of compile options via a record in the text file that starts with $CONTROL. The form of the $CONTROL directive is:

$CONTROL [ LIST | NOLIST ]
         [ ,ERRORS=nnn ]
         [ ,LINES=nnnnn ]
         [ ,ROOT | ,NOROOT ]
         [ ,BLOCKMAX=nnnn ]
         [ ,TABLE | ,NOTABLE ]
         [ ,JUMBO | ,NOJUMBO ]

Where:

LIST causes the source text to be printed to the list file; this can be a problem if your schema is long and you ran DBschema;PARM=1. The default is to list all of the text file as the schema is compiled.

NOLIST specifies that only text file records with errors are listed. This can be handy for processing long text files and for keeping the content of a data base secret.

ERRORS=n sets the maximum number of errors to n. The default value is 100. When the schema processor reaches this number of errors, it gives up and terminates. This is very useful for saving paper and your pride.

LINES=n sets the number of lines per page for the list file. The default is 60 lines per page, but this number can range up to 32767.

ROOT tells the schema compiler to create the root file if no errors are detected. This is the default mode.

NOROOT tells the compiler to compile the schema but not to create the root file even if no errors are detected. This option is very useful for testing out modifications to the schema before purging the existing data base, and it saves the trouble of having to go into DButil to purge the root.

BLOCKMAX sets the size of the largest physical block of records for each data set. The default is 512 words; the maximum is 2560 words (20 sectors). DBschema determines the number of media records that fit in a block. Note that DBschema chooses a block size (less than or equal to the maximum block size) that makes the best use of disc space, which often is substantially less than the BLOCKMAX. If the record size is greater than 512 words, BLOCKMAX must be set greater than or equal to the record size. In general, a setting of 2560 will allow DBschema to choose the best blocking factor.

JUMBO allows a data set whose capacity generates a file greater than 4GB to automatically become a jumbo data set. Be aware that many utilities and diagnostic tools are not yet upgraded to handle jumbo data sets. IT IS ALSO IMPORTANT TO NOTE THAT THE JUMBO OPTION WORKS BY SPLITTING THE DATA SET INTO MULTIPLE FILES THAT RESIDE IN THE POSIX HFS. THESE FILES MAY NOT GET BACKED UP AND RESTORED WITH EXISTING JOB JCL SPECIFICATIONS!

NOJUMBO, the default, disallows data sets greater than 4GB.

There are two other compiler directives: the $TITLE and $PAGE commands are used to format the schema listing for documentation purposes. The form of the $PAGE command is:

$PAGE [["some character string"], ]

The $PAGE command causes the compiler to page eject and print the content of the character string, if any, at the top of the next page. The character string will override the $TITLE command. The $TITLE command is used to specify a string of characters to be printed as a heading for each page printed. It does not cause a page to be ejected as is the case with the $PAGE command. The form of the $TITLE command is:

$TITLE [["some character string"],]

The $TITLE command stays in effect until a subsequent $TITLE or $PAGE command is processed. If no character string is included, then no heading is printed.

4.9 Interpreting the compiler listing

Errors are listed as the schema file is being scanned. Most errors will be due to a lack of attention paid to punctuation in the form of ;'s, ,'s, and .'s. If there are no errors, the message UNREFERENCED ITEMS. may appear. While items defined in the items section which are not specified in any of the data sets are not considered errors, they should be removed to decrease the overhead of processing the item names list.

The summary table at the end of the schema listing is very important to understand and retain, as it represents the physical realities of the data base, and some of the information cannot be found elsewhere.

DATA SET NAME - The name given to the set in the schema SETS: section. The MPE file name will be the data base name with a counter that starts at 01 and counts up as each data set appears sequentially in the schema.

TYPE - A one letter denotation: "A" for automatic masters, "M" for manual masters, and "D" for detail data sets.

FLD CNT - A count of the number of data items in each data set.

PT CT - A path counter. For master data sets this is the number of paths that appears in ()'s after the search item. For details this is a count of the number of associated masters, specified by putting the master set name in ()'s after the linking data item.

ENTR LGTH - The length in words of the data portion of the data entry. This is the sum of all the individual data items specified for the data set.
MED REC - The total length of the Image media record. This includes data, pointers and other internal information. The length of the media record is an important design consideration and will be covered in detail later.

CAPACITY - For masters this is the exact number of entries specified in the schema in the SETS: section. For details, this number may be rounded up to fill out the last full block.

BLK FAC - The best fit of the number of media records that could fit under the BLOCKMAX specified, or the hard coded number specified in the capacity.

BLK LEN - The actual length of the block. Any difference between this value and the next multiple of 512 is wasted disc space. The total number of wasted sectors can be calculated by multiplying the number of blocks by this difference. The total number of blocks is simply the capacity divided by the blocking factor, rounded up by one.

DISC SPACE - The total number of sectors, including the root file, that this data base will require when the data base is created. Note that for very large data bases it is usually a good practice to check if enough disc space is available before proceeding.

ROOT LENGTH - The length in words of the compiled root file. The size of this varies with the number of passwords, data items, sets and security specifications.

BUFFER LENGTH - The length in words of the largest block. It is unused by TurboIMAGE/XL but provided for compatibility with TurboIMAGE on previous versions.

TRAILER LENGTH - The length in words of an area that is used by Image to move data base access parameters to and from the calling program's stack.

Schema processing errors are documented in Appendix A of the Image manual. Errors tend to build on each other as the compiler trips over itself, so don't let your pride get hurt when many errors are reported. Two helpful hints: one, specify data items in uppercase characters and keep a handy sorted list of them as you build the data sets. Second, pay attention to the punctuation. After all, the schema is a program and therefore highly cantankerous when it comes to syntax.

4.10 DButil

After the schema has been successfully compiled, the DBschema program will leave behind a root file. While interesting, this file is of little useful value until the data sets are created. The data base is built using the utility program DButil.PUB.SYS. As we will see in the following discussion, DButil is a general utility that allows the data base administrator to configure and maintain the data base. There are many commands available in DButil, and they can be categorized into four groups: the create and purge commands, the show command, the enable command and the set command.

4.10.1 Create & Purge commands

The CREATE command initializes each data set and saves them as privileged MPE files in the same group as the root file. To create the data base, you must be logged on with the same user, account and group that were used when the DBschema program was executed. The format of the CREATE command is:

CRE[ATE] data base name [/maintenance word]

The data base name is the same as the root file name, which is the same as the base name specified in the BEGIN statement of the schema. The maintenance word is a password that can be used to allow users other than the creator of the data base to use a limited set of DButil commands. Such a user must be logged on into the group in which the data base resides.
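For example, assuming the TEST root file from the schema above is in the logon group, creating the data base and assigning a maintenance word might look roughly like this (a sketch; the ">>" represents the DButil prompt, SECRET is a purely hypothetical maintenance word, and the exact prompts can vary by version):

:RUN DBUTIL.PUB.SYS
>>CREATE TEST/SECRET
>>EXIT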
For large data bases, it is wise to check that enough disc space is available in sufficient chunks before executing the CREATE command. If the create operation fails due to lack of disc space, two problems must be dealt with. First, the data sets already built must be purged using the PURGE command, and second, free space must be found or added in sufficient quantities to reattempt the build operation.

The PURGE command removes the data base files from the system directory, and once it is entered, the data in the data base is forever lost. So extreme caution is recommended before using this command; always make sure that you are logged on where you think you should be, because many a production data base has been purged because someone thought they were logged on in the test account! The format of the PURGE command is:

PUR[GE] data base name[/maintenance word]

4.10.2 Show command

Once the data base is built, the default configuration can be displayed with the SHOW command. The SHOW command can provide a list of users accessing the data base, any users waiting for or holding locks on the data base, logging status, configuration flags, and the buffer allocations. The format of the SHOW command is:

SH[OW] database name[/maint word]   ALL        [OFFLINE]
                                    BUFFSPECS
                                    CAPACITY
                                    CIUPDATE
                                    DEVICE
                                    FLAGS
                                    INDEX
                                    LANGUAGE
                                    LOCKS
                                    LOGID
                                    LOGINFO
                                    MAINT
                                    PASSWORDS
                                    SUBSYSTEMS
                                    USERS

MAINT - Displays the maintenance word, if any. This is very useful if you have forgotten what you originally set it to.

ALL - Displays the combination of MAINT, BUFFSPECS, LOCKS, USERS, LOGID, AND FLAGS. This is the parameter that is used most frequently, and it supplies a neat little report on the configuration of the data base.

BUFFSPECS - Displays the current buffer specification. The buffer specifications will be discussed in detail in the SET command.

CAPACITY - Displays a capacity report for the data base. For each data set it shows the current number of entries, the percent of maximum, and for a detail data set the dynamic expansion parameters.

CIUPDATE - Displays the status of critical item update. DISALLOWED means that DBupdate cannot modify a search or sort item. ALLOWED means that, after a call to DBcontrol, a program can use DBupdate to modify search or sort items. And ON means that DBupdate can modify search and sort items.

DEVICE - Displays the device assignment for each data set.

FLAGS - Displays the various states of the data base configurable parameters. Each parameter is discussed in detail in the SET and ENABLE commands.

LOCKS - Displays the users who are currently holding or requesting locks for the data base, data sets or data items. This can be very insightful when deadlock situations occur. This happy event will be discussed in detail when we get to the DBlock intrinsic.

LOGID - Displays the current data base log identifier, if any.

USERS - Displays the users and application programs, if any, which are currently accessing the data base.

PASSWORDS - Displays the user class table and current password settings. This is handy if you forget what the passwords to the data base were and don't happen to have the schema.

SUBSYSTEMS - Displays the flag that allows or disallows the data base for access by Query and other applications that check this flag.

OFFLINE - Sends the information to the line printer with the formal file designator of DBUTLIST.

4.10.3 Enable and Disable of configuration flags

The enable and disable commands turn on and off internal flags for access, dumping, logging, ILR and recovery.
These flags are very important to understand from the standpoint that they control the way that Image operates on the data base as a whole. The form for the enable and disable commands is: EN[ABLE] database name[/maint word] FOR parm[[,parm]...] Valid parameters are: ACCESS DUMPING ILR LOGGING RECOVERY ROLLBACK AUTODEFER MUSTRECOVER PREFETCH INDEXING HWMPUT SHADOWING DIS[ABLE] database name[/maint word] FOR parm[[,parm]...] ACCESS - Allows or disallows user, or application access to a data base. This is very useful for securing the data base for periods of time i.e. weekends and holidays. It protects the data from being read, written or deleted. It does not protect from an accidental restore of a different version of the data base. DUMPING - This flag, when enabled, tells Image, upon abort, to copy the users stack, image buffers and other binary information to an "I" file created in the group where the session or job is logged on. It is very rare to see an abort while in an Image subroutine. However, an Image abort usually means that the user or application program was attempting to access the database and ran into some type of corrupted data. In this case the "I" file would be of assistance in debugging the type and extent of the damage. This flag defaults to the disabled state and remains that way until a problem situation is encountered. HWMPUT - See http://www.adager.com/TechnicalPapersHTML/HwmPut.html ILR - Flag for Intrinsic Level Recovery. This facility of Image allows the backing out of the last record written or deleted to the data base if the system failed prior to the completion of the operation. In other words, the intermediate state of the data base when a record is being added or deleted is inconsistent, and if the operating system or hardware should fail, halt or abort at this critical stage, we could be left with a damaged data base, unless ILR is enabled. Fortunately, HP3000's tend to be very reliable systems, and ILR is overkill for most data bases. However, ILR is recommended for data bases that contain data which the financial wellbeing of your company rests and cannot be recovered manually, i.e. online order entry, billing etc. The drawback to ILR is that it adds additional processing and disc I/o's to each record added or deleted, to the tune of 10% to 15%. If you are experiencing a rash of system problems, it is wise to enable ILR until things settle down. LOGGING - Setting this flag is part of a series of steps to turn the logging of transactions to disc or tape media on and off. This procedure will be discussed in detail later. PREFETCH - IMAGE is currently 'single threaded' when it comes to calling intrinsics that change pointer information. This means that only a single process is executing a call to DBput, DBdelete, or DBupdate (of a critical item) at a time. This methodology helps to insure against situations that create conflicts with pointers that would result in broken chains. In order to accomplish this, IMAGE uses a flag known as the PUT/DELETE semaphore to freeze access. This freeze is obtained at the beginning of the call and released at the end, and is held while entries and pointers are updated. The Prefetch flag tells IMAGE when to obtain the semaphore freeze. With Prefetch disabled, (the default), IMAGE obtains the semaphore before reading, updating and writing every data block which is to be modified by the call. After this is successfully completed the semaphore is released. 
With Prefetch enabled, IMAGE reads all the data blocks that are required for the call before the semaphore is obtained. It is the holding of the semaphore for a shorter period of time which could allow more concurrent calls but it is not guaranteed. On systems with memory contention there is the possibility that the necessary blocks may be flushed out of memory by another process before IMAGE is able to update them. This could cause the blocks to be thrashed in and out and is why the default is not to Prefetch. Thus to gain throughput from Prefetching the system must have memory and CPU available to handle the increased processing, and the application mix would have to consist of multiple processes consisting of mostly calls to DBput, DBdelete or DBupdate. RECOVERY- Enables or disables the running of the program that is used to apply transactions from the log file. Setting this flag is part of the recovery procedure, which will be discussed later. This flag defaults to the disabled state and protects from corruption for the data base by an inadvertent attempt to reapply a transaction log file. 4.10.4 Set parameters The Set command is used to configure the run time variables of the data base. These differ from the parameters of the enable/disable command, in that they are more complex than a simple on/off status. However, as in the enable/disable commands these variables can significantly affect the behavior of the data base. The format of the Set command is: SET database name[/maint word] BUFFSPECS= num bufs(from/to) LOGID=logid-value MAINT=[maintenance word] SUBSYSTEMS = NONE | READ | RW PASSWORD j=[password] LANGUAGE=langid CIUPDATE = [DISALLOWED ] [ALLOWED ] [ON ] BTREEMODE1 = [, [WILDCARD=]c] MAINT - This parameter is used exclusively by the creator of the data base to set a new maintenance word for the data base. The default is no maintenance word, so if it is desirable to allow other users who can log on into the data base's group, access to the DButil commands then this is the command to set a maintenance word. Note, other utilities, such as ADAGER, also make use of the maintenance word in this fashion. If no maintenance word is provided, then the existing maintenance word if one exists is removed. BUFFSPECS - is backward compatibility only. LOGID - Is the MPE log identifier used when the GETLOG command was issued. Use of this parameter is part of the setting up of the logging procedure and is discussed in that full context later. PASSWORD - Allows the changing of the data base passwords. The user class number must be used in the read/write security specification at the item or set level for the new password to have any effect. Usually, data base passwords are hard coded into applications and therefore changing them in this fashion can have drastic results on the ability of the user or application to open the data base. This, however is a good way to test the error logic build into the application, to see what happens when the data base cannot be successfully opened. SUBSYSTEMS - A parameter that can be used to allow or disallow read and/or write access on an application by application basis. This parameter can be checked programmatically via the DBinfo intrinsic and is built into QUERY. So in short, this is a way to control QUERY access for each data base. 4.10.5 ERASE The erase command is executed to reinitialize the entire data base to its created state. This is the time saving equivalent of purging and rebuilding the entire data base. 
A prime design consideration for splitting one data base into multiple data bases is that a data base can be erased many times faster than the individual records can be deleted. However, you can only take advantage of this if the records to be deleted reside entirely in their own Image data base. The format of the erase command is:
ERA[SE] data base name[/maintenance word]
NOTE: DBUTIL DOES NOT ASK "ARE YOU SURE?" SO BE AWARE THAT ALL DATA WILL BE WIPED OUT WHEN YOU ENTER THIS COMMAND. IT IS A VERY WISE PRACTICE TO CHECK YOUR LOG ON WITH A SHOWME BEFORE ERASING THE WRONG DATA BASE.
4.10.6 EXIT
The exit command gets you back to MPE. Almost all utilities allow you to enter E to exit, however to be consistent with the three letter command notation standard, you must enter EXI to exit.
4.10.7 DBschema walk through
:DBschema
PAGE 1 HEWLETT-PACKARD 30391C.07.04 TurboIMAGE/3000: DBSCHEMA FRI, NOV 6, 1998, 9:11 AM
(C) HEWLETT-PACKARD CO. 1987
>BEGIN DATA BASE TEST;
>PASSWORDS:
>10 READER;
>20 WRITER;
>
>ITEMS:
>ADDRESS-LINE-1, X40 (10/);
>ADDRESS-LINE-2, X40 (10/);
>ADDRESS-LINE-3, X40 (10/);
>CITY, X40 (10/);
>CUSTOMER-NAME, X40 (10/);
>ORDER-NO, X02 (10/);
>STATE, X02 (10/);
>TOTAL-DOLLARS, X10 (10/);
>ZIP, X10 (10/);
>
>SETS:
>NAME: CUSTOMER-MASTER, M (10/20);
>
>ENTRY: CUSTOMER-NAME(1),
> ADDRESS-LINE-1,
> ADDRESS-LINE-2,
> ADDRESS-LINE-3,
> CITY,
> STATE,
> ZIP;
>CAPACITY: 5;
>
>NAME: ORDER-NO-MASTER, A (10/20);
>
>ENTRY: ORDER-NO(1);
>CAPACITY: 5;
>
>NAME: ORDER-SUMMARY, D (10/20);
>
>ENTRY: ORDER-NO(ORDER-NO-MASTER),
> CUSTOMER-NAME(CUSTOMER-MASTER),
> TOTAL-DOLLARS;
>CAPACITY: 300000,1000,1000;
>END.
DATA SET        TYPE FLD PT ENTR MED  MAXIMUM  BLK BLK  DISC
NAME                 CNT CT LGTH REC  CAPACITY FAC LGTH SPACE
CUSTOMER-MASTER  M    7   1  106 117         5   3  352    16
ORDER-NO-MASTER  A    1   1    1  12         5   5   61    16
ORDER-SUMMARY    D    3   2   26  34    300000  15  511   272
INITIAL CAPACITY = 1005 INCREMENT ENTRIES = 1005
TOTAL DISC SECTORS INCLUDING ROOT: 320
NUMBER OF ERROR MESSAGES: 0
ITEM NAME COUNT: 9 DATA SET COUNT: 3
ROOT LENGTH: 652 BUFFER LENGTH: 511 TRAILER LENGTH: 256
ROOT FILE TEST CREATED.
:dbutil
HP30391C.07.04 TurboIMAGE/XL: DBUTIL (C) COPYRIGHT HEWLETT-PACKARD COMPANY 1987
>>create test
Database TEST has been CREATED.
>>show test all
For database TEST
Maintenance word is not present.
Access is enabled.
Autodefer is disabled.
Dumping is disabled.
Rollback recovery is disabled.
Recovery is disabled.
ILR is disabled.
Mustrecover is disabled.
Logging is disabled.
Prefetch is disabled.
Indexing is disabled.
HWMPUT is disabled.
Restart is disabled.
Database has never been stored.
Database has been modified since last store date.
Shadowing is disabled.
Subsystem access is READ/WRITE.
CIUPDATE is disallowed.
Dynamic capacity expansion is used for 1 data sets.
Database has no B-trees.
Logid is not present.
XM log set : this database is not attached to an XM log set.
The language is 0 - NATIVE-3000.
Buffer specifications: 8(1/2),9(3/4),10(5/6),11(7/8),12(9/10),13(11/12),14(13/14), 15(15/16),16(17/18),17(19/120)
No other users are accessing the database.
-------------------------------------------------------------------
5.0 What are the Data Base Access Intrinsics?
The data base is programmatically accessed via calls to intrinsics that contain parameters that are passed back and forth. The parameters along with the intrinsic called act as commands to Image to perform the specified task. It will be much easier if you think of the individual intrinsics as commands that come in pairs.
Just like the seven dwarfs, happy-grumpy,sleepy-Sneezy, dopey-doc, and bashful. The Image intrinsic commands can also be paired, DBopen-DBclose, DBlock-DBunlock, DBerror-DBexplain, DBput-DBdelete, DBget-DBfind, and DBupdate. Another good discussion/overview of the intrinsics can be found at: http://www.robelle.com/smugbook/imageapi.html 5.1 Parameters and data types Before getting into the discussion of the actual commands, we can cover some of the general housekeeping issues. How is the intrinsic referenced in the different programming languages. What are some of the common parameters passed back and forth. And, most importantly, how can the program tell if its call was successful or not The syntax for the intrinsic procedure call is different for various languages. The general form is presented here. See the program examples included for more detail. COBOL CALL "DBxxxx" USING parm1, parm2, parm3 ... FORTRAN CALL DBxxxx (parm1, parm2, parm3 ... ) SPL DBxxxx (parm1, parm2, parm3 ... ) BASIC 100 CALL XDBxxxx (parm1, parm2, parm3 ... ) PASCAL DBxxxx (parm1, parm2, parm3 ... ) The xxxx indicates the various commands available. The parameters vary slightly with each command and are discussed in the next section. 5.1.1 Data base parameters The three most common parameters are the base name, the mode, and the status, which are required for most intrinsic calls. The base name is the name given in the BEGIN statement of the data base schema. The mode parameter has different meanings for different Image calls, and for some calls the value is always the same. This is done for consistency and compatibility with older versions of Image. +----------------------------------------------------------------------+ | INTRINSIC| Parameters passed to and received from Image | | COMMAND | UPPER case are input parameters, lower case are returned | +----------------------------------------------------------------------+ | DBopen | BASE | PASSWORD | MODES | status | | | | | ------ | NAME | | (1-8) | array | | | | |----------------------------------------------------------------------| | DBclose | base | DATA SET | MODES | status | | | | | ------- | name | NAME OR # | (1-3) | array | | | | |----------------------------------------------------------------------| | DBlock | base | QUALIFIER | MODES | status | | | | | ------ | name | | (1-6) | array | | | | |----------------------------------------------------------------------| | DBunlock | base | DUMMY | MODE | status | | | | | -------- | name | PARAMETER | = 1 | array | | | | |----------------------------------------------------------------------| | DBerror | | | | STATUS | | buffer | | | ------- | | | | ARRAY | | | | |----------------------------------------------------------------------| | DBexplain| | | | STATUS | | | | | ---------| | | | ARRAY | | | | |----------------------------------------------------------------------| | DBput | base | DATA SET | MODE | status | LIST | BUFFER | | | ----- | name | NAME OR # | = 1 | array | | | | |----------------------------------------------------------------------| | DBdelete | base | DATA SET | MODE | status | | | | | -------- | name | NAME OR # | = 1 | array | | | | |----------------------------------------------------------------------| | DBget | base | DATA SET | MODE | status | LIST | buffer | ARG | | ----- | name | NAME OR # | (1-8) | array | | | VALUE | |----------------------------------------------------------------------| | DBfind | base | DATA SET | MODE | status | ITEM | | ARG | | ------ | name | NAME OR # | = 
1 | array | | | VALUE | |----------------------------------------------------------------------| | DBupdate | base | DATA SET | MODE | status | LIST | BUFFER | | | -------- | name | NAME OR # | = 1 | array | | | | |----------------------------------------------------------------------| | DBbegin | base | LOGFILE | MODE | status | TEXT | | | | ------- | name | TEXT | = 1 | array | LEN | | | |----------------------------------------------------------------------| | DBend | base | LOGFILE | MODE | status | TEXT | | | | ----- | name | TEXT | = 1 | array | LEN | | | |----------------------------------------------------------------------| | DBinfo | base | QUALIFIER | MODE | status |BUFFER| | | | -------- | name | | | array | | | | |----------------------------------------------------------------------| | DBxbegin | base | QUALIFIER | MODE | status |BUFFER| | | | -------- | name | | | array | | | | |----------------------------------------------------------------------| | DBxend | base | QUALIFIER | MODE | status |BUFFER| | | | -------- | name | | | array | | | | +----------------------------------------------------------------------|
5.1.2 The status parameters
The status parameter is the only parameter required for each and every call to Image. It is used to communicate the degree of success or failure and provide other useful data back to the calling program. The status parameter is a ten word array with the following format:
Word 1, Zero if the call was good or a number if something went wrong; negative numbers indicate bad parameters or conditions, positive error numbers indicate that the procedure encountered some exception condition. See appendix A of the Image manual for more details. The negative error numbers generally indicate some type of problem with the parameters being passed to the Image procedure. Problems with parameter number one range from -11 to -19, problems with parameter number two range from -21 to -29, etc.
For DBput, DBdelete, DBget, DBfind, and DBupdate:
Word 2, The current entry length in words.
Word 3/4, The current record number.
Word 5/6, The chain length, returned by DBfind.
Word 7/8, Backward pointer to the previous entry in the chain.
Word 9/10, Forward pointer to the next entry in the chain.
For all the other intrinsics see individual descriptions.
5.2 DBopen & DBclose
DBopen and DBclose are the first and last intrinsic commands passed to Image. They are used to begin and end the application's access to the data base. Just like any other file, you must have at least read access to the data base you are attempting to open.
5.2.1 Description
Upon calling DBopen Image: Opens the root file, but does not open any of the sets until referenced by the data set parameter in any of the other intrinsics. Uses the password provided to determine the security restrictions that will apply for subsequent calls under this DBopen. Uses the mode provided to determine allowable concurrent accessors.
Upon calling DBclose Image: Terminates or rewinds access to a single data set or to the entire data base, depending on the mode provided (see the DBclose parameters below).
5.2.2 Parameters
The form of the DBopen intrinsic call is:
DBopen (BASE,PASSWORD,MODE,status)
BASE - Is a character string that contains two blanks, then the data base name, followed by a semicolon or a blank. If the data base is successfully opened, Image puts the data base ID word in the space you allocated in the first two bytes of the BASE array. This base ID identifies this DBopen call to Image for subsequent accesses.
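For example, a minimal COBOL sketch of the typical declaration (the TEST data base and READER password come from the walk through schema; every other name here is illustrative only). The two leading blanks in the base string leave room for the base ID that Image returns:

     IDENTIFICATION DIVISION.
     PROGRAM-ID. OPENDEMO.
     DATA DIVISION.
     WORKING-STORAGE SECTION.
    *Two leading blanks: DBopen overwrites them with the base ID.
     01  DB-BASE          PIC X(12)      VALUE "  TEST;".
     01  DB-PASSWORD      PIC X(10)      VALUE "READER".
     01  OPEN-MODE        PIC S9(4) COMP VALUE 5.
     01  CLOSE-MODE       PIC S9(4) COMP VALUE 1.
     01  DUMMY-SET        PIC X(2)       VALUE ";".
     01  DB-STATUS-AREA.
         05  COND-WORD    PIC S9(4) COMP.
         05  REST-OF-STAT PIC S9(4) COMP OCCURS 9 TIMES.
     PROCEDURE DIVISION.
     MAIN-PARA.
         CALL "DBOPEN" USING DB-BASE, DB-PASSWORD, OPEN-MODE,
                             DB-STATUS-AREA.
         IF COND-WORD NOT = 0
             CALL "DBEXPLAIN" USING DB-STATUS-AREA
             STOP RUN
         END-IF.
    *    DB-BASE (now holding the base ID) is passed, unmodified,
    *    to every subsequent intrinsic call under this DBopen.
         CALL "DBCLOSE" USING DB-BASE, DUMMY-SET, CLOSE-MODE,
                              DB-STATUS-AREA.
         STOP RUN.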
Therefore, if you overlay the base name variable in your program you will not be able to execute any further data base intrinsics under the current DBopen. To hard code the location of the data base or access a data base in another account, simply append the .GROUP and/or .ACCOUNT to the string of characters you pass to Image as the BASE parameter.
PASSWORD - Is a string of characters corresponding to the passwords built into the schema followed by an optional user identifier. The password is used to establish the user's class number which is then mapped to the access capabilities at the set and item levels. The user identifier is used by the recovery program to assist in identifying users who log on under the same group and account. The construct is the left justified password followed by the optional user identifier followed by a space. If a semicolon is used the class number will be 64 if the user is logged on exactly as the user who used DBschema and DButil to create the data base. If the password provided matches one of the passwords in the password table, then the class number is mapped from the password table. If the password does not match any of the defined passwords, then a class number of zero is assigned and Image will check to see if this class number has been specified on any of the sets or items within those sets.
MODE - Tells Image what type of access is requested, the valid modes are:
1 - Modify access with enforced locking, the other users may open only with modes 1 and 5. This is the most useful mode in that it allows all of the intrinsic calls. However, your password must allow read or write access to the specific sets or items requested.
2 - Update access, same as mode 1 but disallows calls to DBput and DBdelete. The other users must open in either mode 2 or mode 6. I have never seen this mode used, probably included for completeness.
3 - Modify exclusive, same as mode 1 but since no one else is allowed to access the data base, no locking is required and no other concurrent modes of open are allowed. This mode has performance considerations for batch jobs that add or delete large numbers of entries. Mode 3 turns off the code required to share the data base and allows more efficient use of the buffers.
4 - Modify, allow concurrent read access. This is the same as mode 1 but no locking is required because the only other accessors allowed will be mode 6.
5 - Read, allow concurrent modify access. This is a read only mode that will allow any number of other mode 1 and 5 accessors. This mode is the second most popular mode, it is used by report programs that only need to read and process the entries.
6 - Read, allow concurrent modify access. This is a read only mode that allows concurrent modes of 6, 2, 4, or 8.
7 - Read, exclusive. This mode would only be used to improve the performance of a large batch job that required the data to remain the same for the length of the procedure.
8 - Read, allow concurrent read access in mode 6.
STATUS - Is a ten word array that returns information about the results of the intrinsic call. If the DBopen worked, the status array will contain:
Word Contents
1 The condition word of 0.
2 The user class number assigned from 0 to 64.
3 The current size of the global buffer area in words.
4 The current size of the user buffer area in words.
5/10 Not documented unless an error occurs Typical non-zero condition words: -11 Bad base parameter -21 Bad password parameter -31 Bad mode parameter -92 Data base not created The form of the DBclose intrinsic call is: CALL DBclose(BASE,DATA SET,MODE,status) BASE - Is the unmodified parameter used in the DBopen call. DATA SET - Is the name of the individual data set to be closed. This parameter's usage is dependent on the mode provided. MODE - Directs Image to close a set or the entire data base. A mode of 1 terminates access to the entire data base. A mode of 2 closes the data set referenced in the DATA SET parameter. And a mode of 3 reinitializes the data set referenced in the DATA SET parameter but the set will remain opened. Modes 2 and 3 are used to reposition the current record pointer so that a DBget of the next record will return the first record in the data set. STATUS - The condition word will contain a zero if the close was performed as requested. Common non-zero status words are similar to the DBopen. 5.2.3 Walk through NOTE. THAT DBDRIVER IS DOCUMENTED IN SECTION 6.5. In these walk through pay special attention to the CPU and elapsed times, these show the relative performance profiles for each type of Image data base call. :DBDRIVER DBDRIVER C.05.00 FRI, NOV 6, 1998, 9:23 AM (C) HEWLETT-PACKARD 1978 Command: !B=TEST Command: !Q=; Command: !M=1 Command: O Elapsed time (MS) = 287 CPU = 117 Baseid=%000001 Class=64 Current DBG size=32767 DBU size=10240 Command: ?S 0000 0040 7fff 2800 0000 0191 4163 88d4 0001 0000 Command: C Elapsed time (MS) = 21 CPU = 21 Command: ?S 0000 0040 7fff 2800 0000 1193 4163 88d4 0001 0000 5.3 DBlock & DBunlock Image leaves the locking of individual data entries or groups for entries totally up to the application designer and programmer. Locking allows shared access to the data base and strategies of locking are highly application dependent. One general rule does apply however, the larger the number of shared assessors, the smaller the group of entries locked. We have seen that the mode of DBopen can serve as a form of data base lock, however these exclusive access modes would not be appropriate for an application with more than one concurrent user. So, the DBlock and DBunlock commands allow programmatic locking of the entire data base, a specific data set, or a specific set of entries within a data set with a particular data item value. IT IS EXTREMELY IMPORTANT TO UNDERSTAND THAT THESE "LOCKS" ARE LOGICALLY HELD AND THAT THERE IS ABSOLUTELY NO CHECKING DONE TO SEE IF THE LOCKED ENTRIES ARE INDEED THE ONES BEING UPDATED. IT IS ALSO EXTREMELY IMPORTANT TO UNDERSTAND THAT ALL ACCESSORS MUST LOCK AT A GIVEN LEVEL IN ORDER TO RECEIVE THE BENEFITS OF GREATER CONCURRENT ACCESS. In other words, one application locking at the data base level will negate the effects of all of the other applications locking at the data set or data item level. 5.3.1 Description Upon calling DBlock Image: Will check the current lock list to see if a conflict exists. If no conflict exists, Image will add the lock request to the lock list. If a conflict exists, then one of two things will occur depending on the mode, which tells Image to wait or return. If the mode is set for unconditional locking, the program suspends and waits until all of the conflicting entries in the lock list have been unlocked. If a conditional lock mode was requested and a conflict exists, the status is returned to the calling program which can then choose to repeat the call or go on to other transactions. 
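To make the conditional case concrete, here is a minimal COBOL sketch that tries a conditional data set lock (mode 4) a few times before giving up. The data base, password and set names are taken from the walk through schema; the variable names and retry count are illustrative only, and a real application would do something useful between the lock and unlock calls:

     IDENTIFICATION DIVISION.
     PROGRAM-ID. LOCKDEMO.
     DATA DIVISION.
     WORKING-STORAGE SECTION.
     01  DB-BASE          PIC X(12)      VALUE "  TEST;".
     01  DB-PASSWORD      PIC X(10)      VALUE "WRITER".
     01  OPEN-MODE        PIC S9(4) COMP VALUE 1.
     01  LOCK-MODE        PIC S9(4) COMP VALUE 4.
     01  UNLOCK-MODE      PIC S9(4) COMP VALUE 1.
     01  CLOSE-MODE       PIC S9(4) COMP VALUE 1.
     01  SET-NAME         PIC X(16)      VALUE "ORDER-SUMMARY".
     01  DUMMY-PARM       PIC X(2)       VALUE ";".
     01  TRY-COUNT        PIC S9(4) COMP VALUE 0.
     01  DB-STATUS-AREA.
         05  COND-WORD    PIC S9(4) COMP.
         05  LOCK-FLAG    PIC S9(4) COMP.
         05  REST-OF-STAT PIC S9(4) COMP OCCURS 8 TIMES.
     PROCEDURE DIVISION.
     MAIN-PARA.
         CALL "DBOPEN" USING DB-BASE, DB-PASSWORD, OPEN-MODE,
                             DB-STATUS-AREA.
         IF COND-WORD NOT = 0
             CALL "DBEXPLAIN" USING DB-STATUS-AREA
             STOP RUN
         END-IF.
    *    Conditional data set lock (mode 4): word two of the status
    *    area shows whether the lock was actually granted.
         MOVE 0 TO LOCK-FLAG.
         PERFORM UNTIL LOCK-FLAG = 1 OR TRY-COUNT = 5
             ADD 1 TO TRY-COUNT
             CALL "DBLOCK" USING DB-BASE, SET-NAME, LOCK-MODE,
                                 DB-STATUS-AREA
         END-PERFORM.
         IF LOCK-FLAG = 1
    *        DBGET / DBUPDATE / DBPUT calls against ORDER-SUMMARY
    *        would go here, inside the lock.
             CALL "DBUNLOCK" USING DB-BASE, DUMMY-PARM, UNLOCK-MODE,
                                   DB-STATUS-AREA
         ELSE
             DISPLAY "ORDER-SUMMARY IS BUSY, TRY AGAIN LATER"
         END-IF.
         CALL "DBCLOSE" USING DB-BASE, DUMMY-PARM, CLOSE-MODE,
                              DB-STATUS-AREA.
         STOP RUN.

Changing LOCK-MODE to 3 would make the same request unconditional, in which case the program would simply wait in line until the conflicting entries in the lock list are released.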
NOTE THAT THE ONLY OTHER WAY TO CLEAR A PROGRAM SUSPENDED UNCONDITIONALLY ON A DBlock IS TO RESTART THE SYSTEM, BUT AN UNCONDITIONAL LOCK IS THE ONLY WAY TO GET IN LINE FOR A LOCK REQUEST.
Locks should never be held in a situation where the user may have to enter some data; instead, your application should lock around the logical transaction. More on this topic as we cover additional intrinsics.
Upon calling DBunlock, Image will: Verify that the mode is set to 1, remove the calling program's lock from the lock list, and reactivate any programs which had been suspended because of this entry in the lock list.
5.3.2 Parameters
The form of the DBlock intrinsic call is:
Call DBlock(BASE,QUALIFIER,MODE,status)
QUALIFIER - Is ignored in modes 1 and 2, is the data set name for modes 3 and 4, and is an array of lock descriptors for modes 5 and 6. The format of the lock descriptor is a table within a table:
General format:
Word Contents:
1 The number of lock descriptors
2 The first lock descriptor
n The second lock descriptor
m The third lock descriptor
Lock descriptor format:
Word Contents
1 The length of the descriptor
2 An array containing the data set, data item and a relational operator and value of the item or items to be locked.
MODE - Tells Image what level and whether to lock conditionally or unconditionally. The odd modes are unconditional and the even modes are conditional. Modes 1 and 2 lock at the data base level. Modes 3 and 4 lock at the data set level and modes 5 and 6 lock on the content of a data item or items.
STATUS - The condition word for DBlock is always zero. Word two of the status area must be checked to see if the lock was actually granted as requested. For modes 1 through 4 this value will be one if successful and zero if not. For modes 5 and 6 this value will depend on the number of lock descriptors coded in the request.
Typical non-zero condition words:
20 Data base lock conflict (modes 2,4,6)
22 Data set lock conflict (modes 2)
23 Data entry lock conflict (mode 4)
24 Data item lock conflict (mode 6)
25 Entry or entries already locked (mode 6)
The form of the DBunlock intrinsic call is:
Call DBunlock(BASE,DATA SET,MODE,status)
DATA SET - Is not used but is probably included for some planned functionality that was never added, or to be compatible with the number and types of parameters for the DBlock call.
MODE - Must be set to 1.
STATUS - The condition word for DBunlock is always zero. Word two of the status area must be checked to see if the unlock was actually granted as requested. For modes 1 through 4 this value will be one if successful and zero if not. For modes 5 and 6 this value will depend on the number of lock descriptors coded in the original lock request. Data set locks released are counted as one descriptor.
Typical non-zero condition words: -11 Bad BASE parameter -31 Bad MODE parameter 5.3.3 Walk through !M=1 Command: LOCK Elapsed time (MS) = 0 CPU = 0 Command: ?S 0000 0001 0000 2800 0000 1199 4163 88d4 0001 0000 Command: RELEASE Elapsed time (MS) = 0 CPU = 0 Command: ?S 0000 0001 0000 2800 0000 119a 4163 88d4 0001 0000 Mixed Mode Unconditional Locking Example Time| DBopen I | DBopen II | DBopen III | DBopen IV ----|---------------|---------------|---------------|-------------- 1 |DBlock MODE= 5 | | | |ORDER-SUMMARY | | | |ORDER-NUMBER= 1| | | |<> | | | ----|---------------|---------------|---------------|-------------- 2 | |DBlock MODE=3 | | | |ORDER-SUMMARY | | | |<> | | ----|---------------|---------------|---------------|-------------- 3 | | |DBlock MODE= 5 | | | |ORDER-SUMMARY | | | |ORDER-NUMBER= 7| | |<> |<> | ----|---------------|---------------|---------------|-------------- 4 |DBunlock | | | | |<> |<> | ----|---------------|---------------|---------------|-------------- 5 |DBlock MODE= 5 | | | |ORDER-SUMMARY | | | |ORDER-NUMBER= 2| | | |<> | |<> | ----|---------------|---------------|---------------|-------------- 6 | | | |DBlock MODE= 1 |<> | |<> |<> ----|---------------|---------------|---------------|-------------- 7 | |DBunlock | | |<> | |<> |<> ----|---------------|---------------|---------------|-------------- 8 |DBunlock | | | | | | |<> ----|---------------|---------------|---------------|-------------- 9 | | |DBunlock | | | | |<> 5.4 DBerror & DBexplain These intrinsics are used to format the status array into a buffer or the STDLIST of the application program. These error messages are meant for debugging purposes and should not be presented to the end user in any form. The better application programs will print an error listing to the system line printer so that as much information about the error condition can be accurately captured as possible. 5.4.1 Description Upon calling DBerror Image will: Use the status array to determine the error from a prior call and look up that error in a message catalog. Then load the text into the buffer provided and set the length parameter to the length of the text. This buffer then can be used to print a complete error incident report which should include the application name, the users current sign on, logical device and time of day. Upon calling DBexplain Image will: Use the status array to format an almost incomprehensible error message on the applications STDLIST device. This call should be used for debugging purposes only because a user would never be able to correctly copy down or relay the formatted message to anyone who could act on it. 5.4.2 Parameters The form of the DBerror Intrinsic is: DBerror(STATUS,buffer,length) STATUS - Is the status array from a previous Image call. BUFFER - Is the text error message returned by DBerror. LENGTH - Is the number of characters in the buffer to a maximum of 72. There is no status returned from the DBerror call so we can really never know if there were any errors in the DBerror intrinsic call. The form of the DBexplain call is: DBexplain(STATUS) STATUS - Is the status array from a previous Image call. There is no status returned from the DBexplain call, so again we really never know if this intrinsic worked. 
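As an illustration of the DBerror calling sequence just described, here is a minimal COBOL sketch (the TEST data base and READER password come from the walk through schema; all other names are illustrative only) that reports a failed DBopen through DBerror, so the text could just as easily be written to a printer file along with the user, logical device and time of day information suggested above:

     IDENTIFICATION DIVISION.
     PROGRAM-ID. ERRDEMO.
     DATA DIVISION.
     WORKING-STORAGE SECTION.
     01  DB-BASE          PIC X(12)      VALUE "  TEST;".
     01  DB-PASSWORD      PIC X(10)      VALUE "READER".
     01  OPEN-MODE        PIC S9(4) COMP VALUE 5.
     01  CLOSE-MODE       PIC S9(4) COMP VALUE 1.
     01  DUMMY-PARM       PIC X(2)       VALUE ";".
     01  ERR-BUFFER       PIC X(72)      VALUE SPACES.
     01  ERR-LENGTH       PIC S9(4) COMP VALUE 0.
     01  DB-STATUS-AREA.
         05  COND-WORD    PIC S9(4) COMP.
         05  REST-OF-STAT PIC S9(4) COMP OCCURS 9 TIMES.
     PROCEDURE DIVISION.
     MAIN-PARA.
         CALL "DBOPEN" USING DB-BASE, DB-PASSWORD, OPEN-MODE,
                             DB-STATUS-AREA.
         IF COND-WORD NOT = 0
    *        Format the status array into printable text (72 chars max).
             CALL "DBERROR" USING DB-STATUS-AREA, ERR-BUFFER,
                                  ERR-LENGTH
             DISPLAY "DBOPEN FAILED: " ERR-BUFFER
             STOP RUN
         END-IF.
         CALL "DBCLOSE" USING DB-BASE, DUMMY-PARM, CLOSE-MODE,
                              DB-STATUS-AREA.
         STOP RUN.

The 72 character buffer matches the maximum message length noted above.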
5.4.3 Walk through Command: !b=BADDB Command: !Q=; Command: !M=1 Command: OPEN Elapsed time (MS) = 58 CPU = 43 NO SUCH DATABASE Command: ?S ffff 0000 0034 0d50 0000 0191 4163 88d4 0001 0000 Command: XPLAIN TURBOIMAGE ERROR AT ???????; RETURN STATUS = -1 DBOPEN, MODE 1, ON BADDB NO SUCH DATABASE HEX DUMP OF STATUS ARRAY FOLLOWS: ffff 0000 0034 0d50 0000 0191 4163 88d4 0001 0000 5.5 DBput & DBdelete Well, now that we can open and close the data base and explain the possible errors, we can get down to the business of getting data into and deleting data out of a data base. The DBput and DBdelete intrinsic calls can only be made to data bases opened in modes 1,3 or 4, otherwise known as modify access. The user class granted from the password must be on the write side of the read/write class list for the data set whose entries are being added or deleted. Data entries are individually added by calls to the DBput intrinsic. Entries are manually added to manual masters and details by calling DBput. Image automatically adds entries to automatic masters on an as required basis to serve as index records as detail entries are added to associated masters. For details indexed by manual masters, your application must add the manual entry prior to attempting to add the detail entry. This feature can be used to make sure that, for example, the vendor number exists prior to the adding of P.O.'s to a purchase order detail. When adding to a manual master, the search item must be unique, if an entry is already present with the same search item, an error is returned. To delete an entry, you must first read the entry with the DBget intrinsic. If the entry is a detail, and it is the only entry on a chain from an automatic master, then the master entry will also be deleted. If the entry to be deleted is in a manual master, then all associated detail chains must be empty for the delete to be successful. If the data base was opened in mode 1, then the DBdelete sequence must be accompanied by a preceding call to DBlock and followed by a call to DBunlock, to insure that another user does not delete the entry between when the record was read and when it was deleted. 5.5.1 Description Upon calling DBput, Image will: Check to see if the data set is an automatic master, a manual master or a detail data set. If it is an automatic master, Image will return an error message. If the DBput is to a manual master, Image will: Check to see if the search item is in the list specified, and see if the search item is unique by hashing the key. Check to see if a lock covers the data base, data set or some value of the search item if the open mode was 1. If the primary location is occupied, Image must determine if the entry at that location is a primary entry or a secondary entry on a secondary chain to another primary. If the entry at the hashed location was a primary, then the new entry is added to its secondary chain at the nearest open record. If the entry hashed to a location that was occupied by a secondary, then the secondary is moved to another location and the new entry is made a primary at its hashed location. If the DBput is to a detail, Image will: Check the free space chain of deleted entries. If any entries have been deleted, then the most recently deleted location is used. If no records have ever been deleted or all delete entries have been reused, then new entries are added at the EOF. 
Add the detail to all chains from associated masters by finding each associated master chain head and modifying the backward pointer and chain count of each master. Then Image finds the old last entry in the chain and sets its forward pointer to point to the new record in the chain. And finally, Image sets the forward and backward pointers of the new entry. If the detail data set specified that its chains were to be kept in sorted sequence, then instead of adding at the end of the detail chain, Image will read up the chain to find the logical point to add the record in the chain. This can produce applications which are spectacularly slow, and should only be specified if the average chain length is small or the records are added in the same sequence as the sort item.
Upon a call to DBdelete, Image will: Check to see if there is a current data entry to be deleted, and whether it is a manual master or detail entry. If no current record is present Image will return an error.
If the current entry to be deleted is a manual master entry, Image will: First, check to see if all of the chain heads are empty. If not, Image will return an error for the DBdelete call. Then Image will identify the type of master entry to be deleted. The entry can be either a primary with no secondaries, a primary with secondaries or a secondary entry. To delete a primary entry with no secondaries, Image sets the entry to zeros, flips the bit in the bit map to 0, and updates the free space count in the label of the data set. To delete a primary entry that has secondary entries, Image finds the next entry in the secondary chain and migrates it to the primary location, overlaying the primary to be deleted, zeros out the secondary's old location, flips the bit in the bit map and updates the free space count on the label of the data set. To delete a secondary entry, Image hashes the key value of the secondary to find the primary entry, locates the preceding and following entries in the secondary chain, if any, and updates their forward and backward pointers, updates the secondary chain counter in the primary entry, flips the bit in the bit map and updates the free space count on the label of the data set.
5.5.2 Parameters
The form of the DBput Intrinsic is:
DBput(BASE,DATA SET,MODE,status,LIST,BUFFER)
BASE - Is the unmodified parameter used in the DBopen call.
DATA SET - Is the name of the individual data set to which the entry is to be added.
MODE - Must be set to 1.
STATUS - The condition word will contain a zero if the entry was added as requested. When successful, the status will also hold other interesting information:
Word Contents
1 The condition word
2 Word length of the buffer array
3/4 Two word relative record number of the new entry
5/6, 7/8 Count of the number of entries in the chain; if this was a master, it is the number of secondaries, if this was a detail, it is the number of entries in the detail chain.
9/10 If this was a detail, this is a pointer to the next entry on the chain; if a master, it is zeros.
LIST - This is the list of data items to be added in the entry. This list can be the data item names, numbers, or a special character. The data item numbers are simply the numbers given in the sequence defined in the item table of the schema. Two special characters may be used: the * in the list tells Image to use the identical list as was specified in the previous call, an @ tells Image to use all of the data items for the data set.
Both of these special lists save processing time because if data item names are specified, Image must look them up in the item table and cut and paste the contents of the buffer to match the actual entry data record. All of the search or sort items must be in the list provided or else Image will not be able to complete the DBput request. BUFFER - Is an array containing the values of the data items to be added. The individual fields of the buffer must correspond to the sequence and type of the items in the LIST. Typical non-zero condition words: -12 No lock covering entry to be added. -14 Cannot do a DBput in current DBopen mode. -21 Bad data set in DATA SET parameter. -23 Do not have write access to data set. -24 Cannot write to an Automatic master. -52 Bad LIST or item in list. -53 Missing search or sort item. 16 Data set is full (delete entries or resize) 43 Attempt to add duplicate master entry. 1xx Missing manual master entry on path number xx The form of the DBdelete Intrinsic is: DBdelete(BASE,DATA SET,MODE,status) BASE - Is the unmodified parameter used in the DBopen call. DATA SET - Is the name of the individual data set to which the entry is to be removed. MODE - Must be set to 1. STATUS - The condition word will contain a zero if the entry was removed as requested. When successful, the status will remain as it was after the preceding DBget: Word Contents 1 The condition word. 2 Zero. 3/4 Two word relative record number of the deleted entry. 5/6 Count of the number of entries in the chain, if this was 7/8 a master it is the new number of secondaries, if this was a detail it is not modified. 9/10 Not modified from the preceding call. Typical non-zero condition words: -12 No lock covering entry to be deleted. -14 Cannot do a DBdelete in current DBopen mode. -21 Bad data set in DATA SET parameter. -23 Do not have write access to data set. 17 No entry has been previously found. 44 Attempt to delete a master with detail chains. 5.5.3 Walk through Command: !Q=CUSTOMER-MASTER Command: !M=1 Command: !L=CUSTOMER-NAME Command: !E=TEST CUSTOMER1 Command: PUT Elapsed time (MS) = 41 CPU = 19 Command: ?S 0000 0014 0000 0005 0000 0001 0000 0000 0000 0000 Command: !E=TEST-CUSTOMER2 Command: PUT Elapsed time (MS) = 3 CPU = 2 Command: ?S 0000 0014 0000 0004 0000 0002 0000 0000 0000 0000 Command: DELETE Elapsed time (MS) = 2 CPU = 2 Command: ?S 0000 0000 0000 0004 0000 0002 0000 0000 0000 0000 5.5.4 Image master secondary chain processing NOTE: THIS SECTION NEEDS UPDATING FOR MASTER DYNAMIC EXPANSION! Secondary master entries happen when unlike search item values hash to the same relative record address. The frequency of this event is dependent upon the capacity of the master and the content of the search item. When this happens Image finds the nearest open entry using the bit map, and places this entry at that relative record number and identifies it as a secondary by chaining it to the entry that got put in the primary location first. A secondary chain consists of the first entry to hash to a given relative record address and all of its secondaries linked together with forward and backward pointers. It is important to understand the performance and practical implications of secondary chains because they can cause severe performance degradation and erroneous processing. The secondary chain head is made up of three fields, a count of the number of entries in the chain, a forward pointer to the next entry on the chain, and a backward pointer to the previous entry on the chain. 
If the count is 1 then the entry is a primary with no secondaries. If the count is 0 then the entry is a secondary to some other primary entry. And if the count is greater than 1 then the entry is a primary and the count is the number of secondary entries on the chain. An example of the ORDER-NO-MASTER after four entries have been added:
ORDER-NO-MASTER
Relative| Secondary Chain Head | Detail Chain Head |
 Record |Count|Backward|Forward|Count|Backward|Forward|ORDER-NO
--------|-----|--------|-------|-----|--------|-------|----------
   180  |     |        |       |     |        |       |
--------|-----|--------|-------|-----|--------|-------|----------
   181  |  2  |   187  |  187  |  1  |   133  |  133  |    1
--------|-----|--------|-------|-----|--------|-------|----------
   182  |  1  |    0   |   0   |  1  |    2   |   2   |    9
--------|-----|--------|-------|-----|--------|-------|----------
   183  |     |        |       |     |        |       |
--------|-----|--------|-------|-----|--------|-------|----------
   184  |     |        |       |     |        |       |
--------|-----|--------|-------|-----|--------|-------|----------
   185  |  1  |    0   |   0   |  1  |   511  |  511  |    5
--------|-----|--------|-------|-----|--------|-------|----------
   186  |     |        |       |     |        |       |
--------|-----|--------|-------|-----|--------|-------|----------
   187  |  0  |    0   |   0   |  1  |   358  |  358  |    7
--------|-----|--------|-------|-----|--------|-------|----------
5.5.5 Migrating secondaries and DBput
Suppose that an order number of 10 is to be added, and it hashes to relative record number 182. Relative record number 182 has a primary entry so Image will attempt to locate the new secondary entry in the same block as its primary and link them together via forward and backward pointers.
ORDER-NO-MASTER
Relative| Secondary Chain Head | Detail Chain Head |
 Record |Count|Backward|Forward|Count|Backward|Forward|ORDER-NO
--------|-----|--------|-------|-----|--------|-------|----------
   180  |  0  |    0   |   0   |  1  |    5   |   5   |   10
--------|-----|--------|-------|-----|--------|-------|----------
   181  |  2  |   187  |  187  |  1  |   133  |  133  |    1
--------|-----|--------|-------|-----|--------|-------|----------
   182  |  2  |   180  |  180  |  1  |    2   |   2   |    9
--------|-----|--------|-------|-----|--------|-------|----------
   183  |     |        |       |     |        |       |
Now suppose that order number 22 is to be added, and it also hashes to relative record number 182. This new entry is also linked to the secondary chain in the following fashion.
ORDER-NO-MASTER
Relative| Secondary Chain Head | Detail Chain Head |
 Record |Count|Backward|Forward|Count|Backward|Forward|ORDER-NO
--------|-----|--------|-------|-----|--------|-------|----------
   180  |  0  |    0   |  183  |  1  |    5   |   5   |   10
--------|-----|--------|-------|-----|--------|-------|----------
   181  |  2  |   187  |  187  |  1  |   133  |  133  |    1
--------|-----|--------|-------|-----|--------|-------|----------
   182  |  3  |   183  |  180  |  1  |    2   |   2   |    9
--------|-----|--------|-------|-----|--------|-------|----------
   183  |  0  |   180  |   0   |  1  |   101  |  101  |   22
Migrating secondaries happen when a new entry's search item value hashes to a relative record occupied by a secondary entry, or when a primary entry with secondary entries is deleted. In the first example order number 2 is added and hashes to relative record number 183, which is already occupied by a secondary entry. In this case the secondary entry is moved to relative record number 184 so that the new entry can occupy its rightful primary location.
ORDER-NO-MASTER
Relative| Secondary Chain Head | Detail Chain Head |
 Record |Count|Backward|Forward|Count|Backward|Forward|ORDER-NO
--------|-----|--------|-------|-----|--------|-------|----------
   180  |  0  |    0   |  184  |  1  |    5   |   5   |   10
--------|-----|--------|-------|-----|--------|-------|----------
   181  |  2  |   187  |  187  |  1  |   133  |  133  |    1
--------|-----|--------|-------|-----|--------|-------|----------
   182  |  3  |   184  |  180  |  1  |    2   |   2   |    9
--------|-----|--------|-------|-----|--------|-------|----------
   183  |  1  |    0   |   0   |  1  |    1   |   1   |    2
--------|-----|--------|-------|-----|--------|-------|----------
   184  |  0  |   180  |   0   |  1  |   101  |  101  |   22
--------|-----|--------|-------|-----|--------|-------|----------
In this second example order number 9 is to be deleted, and this will cause the next entry in the secondary chain to move into the primary location.
ORDER-NO-MASTER
Relative| Secondary Chain Head | Detail Chain Head |
 Record |Count|Backward|Forward|Count|Backward|Forward|ORDER-NO
--------|-----|--------|-------|-----|--------|-------|----------
   180  |     |        |       |     |        |       |
--------|-----|--------|-------|-----|--------|-------|----------
   181  |  2  |   187  |  187  |  1  |   133  |  133  |    1
--------|-----|--------|-------|-----|--------|-------|----------
   182  |  2  |   184  |  184  |  1  |    5   |   5   |   10
--------|-----|--------|-------|-----|--------|-------|----------
   183  |  1  |    0   |   0   |  1  |    1   |   1   |    2
--------|-----|--------|-------|-----|--------|-------|----------
   184  |  0  |    0   |   0   |  1  |   101  |  101  |   22
--------|-----|--------|-------|-----|--------|-------|----------
5.6 DBget & DBfind
This is where all of the discussion about access methods pays off. How information is retrieved from the data base is critical to the success of the application. Hopefully, you have a standard application that reads the data 70% of the time and writes it 30% of the time. In other words, data should only be written out to disc if you are going to read it back more than once. The DBfind and DBget Intrinsics work together to allow rapid retrieval of data entries. As we'll see, Image uses directed, calculated, serial and chained access via these two intrinsics for all entry retrieval. DBfind is actually short for DBfindchainhead; remembering the long name will remind you that the DBfind intrinsic does not actually find any data. This intrinsic is solely used to set up the current path for subsequent chained access into a specific detail data set.
5.6.1 Description
Upon calling DBget, Image will: Check to see if the data set is opened and if not, open it. Then verify that the user class number has read access to all of the items in the list. The next step depends on the mode of the DBget:
MODE = 1, Reread the current record: (masters & details) First, Image will determine if there is a current record from a previous DBget, if not, then an error will be returned. Then Image will check the bit map of the current record's block to see if a record at that location still exists. If so, then the record is read and the requested items are returned in the buffer.
MODE = 2, Read the next valid record: (masters & details) Image will scan the bit maps in the current and subsequent blocks until an entry is found. Then the record is read and the requested items are returned in the buffer.
MODE = 3, Read the previous valid record: (masters & details) The same as mode 2, except that Image scans the bit maps backward from the current record.
MODE = 4, Read relative record number N: (masters & details) Image will read in the block containing the requested record, check the bit map to see if a record at that location exists and if so, read the record and return the requested items in the buffer.
MODE = 5, Read the next record in the chain: (details only) Image will get the block of the next record on the chain which is the current path, check the bit map to see if a record exists at the forward pointer location and if so, read the record and return the requested items in the buffer. MODE = 6, Read the previous record in the chain: (details only) Image will get the block of the prior record on the chain which is the current path, check the bit map to see if a record exists at the backward pointer location and if so read the record and return the requested items in the buffer. MODE = 7, Read by calculating a relative record number (masters) Image will hash the argument provided generating a relative record number. The block containing that record is read in and the bit map is checked to see if a record exists at that location. If no record exists, then Image will return an error. If a record exists, then Image must read the record and compare the argument to the search item value of that record. If they match, then the requested items are loaded into the buffer. If they don't match, then Image must read each record on the secondary chain, matching on the argument and search item values. If no secondary entry matches, then Image returns an error. If a secondary does match, then Image loads the items requested into the buffer. MODE = 8, Read the primary entry (masters) This is identical to mode 7 except that the primary record will be returned even if a secondary argument is provided. This call was probably provided because this logic is required for a DBdelete of a secondary master entry. This saves chasing the secondary chain back to the primary entry to update the secondary chain count. Upon calling DBfind, Image will: First determine the appropriate master by matching the data item name specified to a unique master. The argument provided is hashed in a form of DBget mode=7 to locate the master entry which contains the chain head required. If no master entry exists, then an error is returned. If a master entry does exist, then the number of entries on the chain, and the forward and backward pointers are returned. 5.6.2 Parameters The form of the DBget Intrinsic is: DBget(BASE,DATA SET,MODE,status,LIST,buffer) BASE - Is the unmodified parameter used in the DBopen call. DATA SET - Is the name of the individual data set from which the entry is to be retrieved. MODE - Determines the access method to be used: 1 - Reread the current record, very useful for making sure that someone has not modified or deleted it before we attempt to do the same to someone else. 2 - Serially read the next entry in the data set. For masters this could cause a great number of IO's as the next valid entry can only be found by looking in the bit maps of the next blocks until one is found that has an entry. 3 - Reverse read the previous entry in the data set. For masters this could have the same adverse effects as the forward serial read. 4 - Use the argument parameter to read the entry at a specific relative record location. This is useful if you have stored the relative record number of the entries in another file. Query uses this trick when sorting a large file. Instead of building a sort file as large as the entire detail, it builds a sort record consisting of the sort items and the relative record number of the entry, then after the sort, goes back to get the rest of the data using directed DBgets. 5 - Chained read of the next entry in the detail chain. 
Must be preceded by a DBfind at some point to establish the current path.
6 - Reverse chain read of the previous entry in the detail chain. Must be preceded by a DBfind call to establish the current path.
7 - Calculated read to locate a master entry that matches the value in the argument parameter.
8 - Read of the entry that occupies the primary address of the value in the argument parameter. This was provided because the DBdelete intrinsic needed to be able to quickly get to the primary entry when deleting a master entry that happened to be a secondary.
STATUS - The condition word will contain a zero if the entry was retrieved as requested. When successful, the status will also hold other interesting information:
Word Contents
1 The condition word
2 Word length of the buffer array
3/4 Two word relative record number of the entry retrieved.
5/6 If this was a master, it is the number of secondaries. If this was a detail, it is set to zero.
7/8 If this was a detail, this is a pointer to the prior entry on the current chain; if a master, it is zeros.
9/10 If this was a detail, this is a pointer to the next entry on the current chain; if a master, it is zeros.
LIST - This is the list of data items to be retrieved into the buffer. This list can be the data item names, numbers, or a special character. The data item numbers are simply the numbers given in the sequence defined in the item table of the schema. Two special characters may be used: the * in the list tells Image to use the identical list as was specified in the previous call, an @ tells Image to use all of the data items for the data set. Both of these special lists save processing time because if data item names are specified, Image must look them up in the item table and cut and paste the contents of the buffer to match the actual entry data record.
BUFFER - Is an array to receive the values of the data items. The individual fields of the buffer will correspond to the sequence and type of the items in the LIST.
Typical non-zero condition words:
-11 Bad BASE parameter.
-21 Bad DATA SET parameter.
-31 Bad MODE parameter.
-52 Bad LIST or item in list.
10 Beginning of data set from a call in mode 3.
11 End of data set from a call in mode 2.
12 Tried to read before the first record from a call in mode 4.
13 Tried to read past the end of file from a call in mode 4.
14 Reached the top of the chain from a call in mode 6.
15 Reached the end of the chain from a call in mode 5.
17 No entry at/for the argument value specified.
18 Broken chain. Image detected that the entry reached via a call in mode 5 or 6 does not belong on the current chain. This can happen quite easily if your application doesn't issue a lock covering the current chain, and another user deletes the entry your application "thinks" is the next entry. It can also happen for real and can be very difficult to fix. This is why we will discuss data logging and recovery in upcoming chapters.
The form of the DBfind Intrinsic is:
DBfind(BASE,DATA SET,MODE,status,ITEM,ARGUMENT)
BASE - Is the unmodified parameter used in the DBopen call.
DATA SET - Is the name of the individual data set from which the entry is to be retrieved.
MODE - Must be 1 for a DBfind call.
STATUS - The condition word will contain a zero if the chain head was retrieved as requested.
When successful, the status will also hold other interesting information: Word Contents 1 The condition word 2/3 Zeros 5/6 The number of entries in the chain. This can be very useful in that the application can very quickly determine how many records will have to be processed to satisfy a given request. 7/8 Pointer to the relative record number in the detail data set of the last entry in the chain. 9/10 Pointer to the relative record number in the detail data set of the first entry in the chain. ITEM - This is the search item whose chain head we are looking for. The combination of this ITEM and the detail set name in DATA SET are all Image needs to determine the appropriate master to access and retrieve the chain head. ARGUMENT - Is the value of the data item named in the ITEM parameter. Typical non-zero condition words: -11 Bad BASE parameter. -21 Bad DATA SET parameter. -31 Bad MODE parameter. -52 Bad LIST or item in list. 17 No entry at/for the argument value specified. 5.6.3 Walk through 5.7 DBupdate & DBcontrol It used to be that DBupdate could only change the value of non-search and non-sort items of the current record. If a search or sort item must had to be changed then a call to DBdelete and DBput was required. This was one of the major design flaws of Image. To change a search item of one path into a detail, you were forced to delete the entire entry and re-add the entry to all paths. The difference between systems resources for the two types of operations is substantial. Now database access modes 1, 3, or 4, allow DBupdate to modify the values of detail data set search and sort items if permitted by the critical item update (CIUPDATE) option setting in combination with a call to DBcontrol. Master data set key item values cannot be modified with DBupdate. 5.7.1 Description Upon calling DBupdate, Image will: Check to see if the user class has write access to the data items in the list provided. Then Image will check to see if there is a current record. If not, Image will return an error. Image will then compare the value each item in the buffer to the value of each item in the record. If an item is different, then Image checks to make sure that if it is a search or sort item that CIUPDATE is allowed. 5.7.2 Parameters The form of the DBupdate Intrinsic is: DBupdate(BASE,DATA SET,MODE,status,LIST,BUFFER) BASE - Is the unmodified parameter used in the DBopen call. DATA SET - Is the name of the individual data set in which the entry is to be updated. MODE - Must be set to 1. STATUS - The condition word will contain a zero if the entry was updated as requested. Word Contents 1 The condition word 2 Word length of the buffer array 3/4 Same two word relative record number of the entry retrieved with the preceding DBget. 5/6 If the preceding DBget was a master, it is the number of secondaries. If this was a detail, it is set to zero. 7/8 If the preceding DBget was a detail, this is a pointer to the prior entry on the current chain, if a master, it is zeros. 9/10 If the preceding DBget was a detail, this is a pointer to the next entry on the current chain. LIST - This is the list of data items to be updated in the entry, this list can be the data item names, numbers or a special character. The data item numbers are simply the numbers given in the sequence defined in the item table of the schema. Two special characters may be used, the * in the list tells image to use the identical list as was specified in the previous call, an @ tells Image to use all of the data items for the data set. 
Both of these special lists save processing time, because if data item names are specified, Image must look them up in the item table and cut and paste the contents of the buffer to match the actual entry data record.
BUFFER - Is an array containing the values of the data items to be updated. The individual fields of the buffer must correspond to the sequence and type of the items in the LIST.
Typical non-zero condition words:
-12 No lock covering the entry to be updated in DBopen mode 1.
-14 Cannot do a DBupdate in current DBopen mode.
-21 Bad data set in DATA SET parameter.
-52 Bad LIST or item in list.
17 No current entry, try a DBget first.
41 Attempt to update a search or sort item.
42 Attempt to update a read only item, check your user class number against the data set and item read/write lists.
5.7.3 Walk through
DBDRIVER C.05.00 TUE, NOV 10, 1998, 3:27 PM (C) HEWLETT-PACKARD 1978
Command: !B=TEST
Command: !M=3
Command: !Q=;
Command: O
Elapsed time (MS) = 285 CPU = 120
Baseid=%000001 Class=64 Current DBG size=32767 DBU size=10240
Command: !Q=ORDER-SUMMARY
Command: G
Elapsed time (MS) = 17 CPU = 16
030061 052105 051524 026503 052523 052117 046505 051040
 0 1 T E S T - C U S T O M E R
020040 020040 020040 020040 020040 020040 020040 020040
020040 020040 020040 020040 020040 031062 030060 020040
 2 2 0 0
020040 020040
Command: !E=02
Command: !M=1
Command: !L=ORDER-NO
Command: UPDATE
Elapsed time (MS) = 1 CPU = 1
DBUPDATE attempted to modify value of critical item--key, search or sort
Command: !M=5
Command: KONTROL
Elapsed time (MS) = 1 CPU = 0
Command: ?S
0000 0000 0000 0001 0000 319b 4163 88d4 0005 0000
Command: !M=1
Command: UPDATE
Elapsed time (MS) = 14 CPU = 8
Command: EXIT
5.8 DBbegin & DBend
These intrinsics are used to bracket a logical transaction in the Image transaction log file. A logical transaction may consist of many intrinsic calls to perform a particular task. Transaction logging recovery uses the DBbegin and DBend to determine if the entire logical transaction was completed. If DBRECOVER does not find a corresponding DBend to a DBbegin, it will not apply the partially completed logical transaction. Both intrinsics allow posting of a text message to the log file. This can be useful if you intend to build custom programs to process the log file to print reports.
5.8.1 Description
Upon calling DBbegin, Image will: First check to see if DBbegin has been called twice without a call to DBend; you can only have one logical transaction going at a time. Then write a transaction begin record out to the log file. This record may contain text passed via the DBbegin call.
5.8.2 Parameters
The form of the DBbegin Intrinsic is:
DBbegin(BASE,LOGTEXT,MODE,status,TEXTLEN)
BASE - Is the unmodified parameter used in the DBopen call.
LOGTEXT - Is an array containing up to 512 characters or 256 words of binary data. This data is only used if you have a custom program for processing the Image transaction log file.
MODE - Must be set to 1.
STATUS - The condition word will contain a zero if the record was written as requested.
Word Contents
1 The condition word
2 Whatever it was prior to this call
TEXTLEN - The number of words in the LOGTEXT.
Typical non-zero condition words:
-151 TEXTLEN too large
-152 Called DBbegin twice in a row without a DBend
71 Logging not enabled
The form of the DBend Intrinsic is:
DBend(BASE,LOGTEXT,MODE,status,TEXTLEN)
BASE - Is the unmodified parameter used in the DBopen call.
LOGTEXT - Is an array containing up to 512 characters or 256 words of binary data.
This data is only used if you have a custom program for processing the Image transaction log file.
MODE - Tells Image how to end the logical transaction:
1 - Write the end of logical transaction record to the logfile.
2 - Write the end of logical transaction record to the logfile and flush the transaction logging buffers in memory to disc or tape. This ensures that the logical transaction just completed is captured in the log file.
STATUS - The condition word will contain a zero if the record was written as requested.
Word Contents
1 The condition word
2 Whatever it was prior to this call
TEXTLEN - The number of words in the LOGTEXT.
Typical non-zero condition words:
-151 TEXTLEN too large
-153 Called DBend without a prior call to DBbegin.
71 Logging not enabled
5.9 DBxbegin, DBxend, & DBxundo
These intrinsics are called to dynamically commit or roll back a sequence of DBput, DBdelete, and DBupdate calls that completed successfully inside a dynamic transaction. The start of the sequence is designated by a previous call to DBxbegin. DBxundo cannot be called to roll back a transaction if AUTODEFER is enabled, because transaction management and AUTODEFER are mutually exclusive.
5.9.1 Description
Upon calling DBxundo, Image will undo all previous calls to DBput, DBdelete, and DBupdate since DBxbegin was called. A DBxundo is implicitly performed if the program ends without a call to DBxend! This feature is very handy for protecting logical transactions in the event of a program abort.
5.9.2 Parameters
The form of the DBxbegin Intrinsic is:
DBxbegin(BASE,LOGTEXT,MODE,status,TEXTLEN)
BASE - Is the unmodified parameter used in the DBopen call.
LOGTEXT - Is an array containing up to 512 characters or 256 words of binary data. This data is only used if you have a custom program for processing the Image transaction log file.
MODE - Must be set to 1.
STATUS - The condition word will contain a zero if the record was written as requested.
Word Contents
1 The condition word
2 Whatever it was prior to this call
TEXTLEN - The number of words in the LOGTEXT.
Typical non-zero condition words:
-151 TEXTLEN too large
-152 Called DBxbegin twice in a row without a DBxend
The form of the DBxend Intrinsic is:
DBxend(BASE,LOGTEXT,MODE,status,TEXTLEN)
BASE - Is the unmodified parameter used in the DBopen call.
LOGTEXT - Is an array containing up to 512 characters or 256 words of binary data. This data is only used if you have a custom program for processing the Image transaction log file.
MODE - Tells Image how to end the logical transaction:
1 - End of dynamic transaction.
2 - End of dynamic transaction and, if logging is enabled, write the end of logical transaction record to the logfile and flush the transaction logging buffers in memory to disc or tape. This ensures that the logical transaction just completed is captured in the log file.
STATUS - The condition word will contain a zero if the record was written as requested.
Word Contents
1 The condition word
2 Whatever it was prior to this call
TEXTLEN - The number of words in the LOGTEXT.
Typical non-zero condition words:
-151 TEXTLEN too large
-153 Called DBxend without a prior call to DBxbegin.
71 Logging not enabled
The form of the DBxundo Intrinsic is:
DBxundo(BASE,LOGTEXT,MODE,status,TEXTLEN)
BASE - Is the unmodified parameter used in the DBopen call.
LOGTEXT - Is an array containing up to 512 characters or 256 words of binary data. This data is only used if you have a custom program for processing the Image transaction log file.
MODE - Must be an integer equal to 1.
STATUS - The condition word will contain a zero if the record was written as requested.
Word Contents
1 The condition word
2/10 Whatever it was prior to this call
TEXTLEN - The number of words in the LOGTEXT.
Typical non-zero condition words:
-215 XM error encountered when rolling out dynamic transactions.
-11 Bad database reference.
-151 Text length greater than 512 bytes.
-218 Output deferred (AUTODEFER) is incompatible with dynamic transactions.
5.9.3 Walk through
Command: !B=TEST
Command: !M=3
Command: !Q=;
Command: O
Elapsed time (MS) = 369 CPU = 120
Baseid=%000001 Class=64 Current DBG size=32767 DBU size=10240
Command: !M=1
Command: S
Elapsed time (MS) = 0 CPU = 0
Command: ?S
0000 0040 7fff 2800 0000 31a4 4163 88d4 0001 0000
Command: !Q=CUSTOMER-MASTER
Command: !E=TEST CUSTOMER
Command: !L=@
Command: P
Elapsed time (MS) = 19 CPU = 17
Command: ?S
0000 006a 0000 0003 0000 0001 0000 0000 0000 0000
Command: W
Elapsed time (MS) = 104 CPU = 32
Command: ?S
0000 006a 0000 0003 0000 31a6 4163 88d4 0001 0000
Command: G
Elapsed time (MS) = 0 CPU = 0
NO CURRENT RECORD OR THE CURRENT RECORD IS EMPTY (CONTAINS NO ENTRY)
5.10 DBinfo
This intrinsic is used to programmatically access the root file and return information about the data base. The type of information that can be returned is mostly limited to the item, set and path definitions. DBinfo can be used to construct applications that are smart enough to pre-map data item names to numbers, or check to see if there is enough room in the data set for the number of entries about to be added.
5.10.1 Description
Upon calling DBinfo, Image will: Perform the appropriate lookup requested in the MODE parameter and return the data into the buffer.
5.10.2 Parameters
The form of the DBinfo Intrinsic is:
DBinfo(BASE,QUALIFIER,MODE,status,buffer)
BASE - Is the unmodified parameter used in the DBopen call.
QUALIFIER - Is the data item or data set name or number, depending on the MODE selected.
MODE - Tells DBinfo what type of information to return. Modes 101 to 104 return data item information. Modes 201 to 204 return data set information. Modes 301 and 302 return path identifying information. Modes 401 and 402 return logging information, and mode 501 returns the subsystem access flag from the root file.
101 - Looks up the data item number and type of access for the data item provided in the QUALIFIER parameter. The data item number is returned in the buffer. If the data item number is negative then the item can be updated or the entry containing the item can be added using DBput or removed using DBdelete. This mode is useful for finding the data item numbers for specific data items. This can speed up access to the data items in a data set if a call is repeated many times.
102 - Looks up the definition of a data item provided in the QUALIFIER. The data item name, data type, sub-item length, and sub-item count are returned in the buffer.
103 - Looks up all of the data items in the data base. The qualifier is not used. The format of the data items returned is the same as mode 102.
104 - Looks up the definition of all the data items for the data set name or number provided in the QUALIFIER. The definition of each data item, as in mode 102, is returned in the buffer.
201 - Looks up the data set number and type of access for the data set provided in the QUALIFIER parameter. The data set number is returned in the buffer. If the data set number is negative, then an entry in the data set can be added using DBput or removed using DBdelete. This mode is useful for finding the data set numbers for specific data sets.
This can speed up access to the data set if a call is repeated many times.
202 - Looks up the definition of a data set provided in the QUALIFIER. The data set name, set type, entry length, blocking factor, number of entries, and the capacity of the set are returned in the buffer. This mode is useful for writing a custom program to check the capacity and percent full factor of each data set.
203 - Looks up all of the data sets in the data base and describes the type of access available. The QUALIFIER is ignored, and a word array is returned that contains a count of the number of data sets in word one and the data set numbers available in the rest of the array. The data set number is negative if data entries can be added using DBput and removed using DBdelete.
204 - Looks up all of the data sets in which the data item passed in the QUALIFIER occurs. The format of the buffer is the same as in mode 203.
301 - Looks up the paths into a detail or out of a master data set identified in the QUALIFIER. The path information is returned as a word array in the buffer. The first word is the path count, then for each path the data set number, search item number, and sort item number are returned. The path information is arranged in the same sequence as was specified in the schema.
302 - Looks up the search item for a master or the primary path search item for a detail. The search item number is returned in the buffer.
401 - Looks up current transaction logging information. The QUALIFIER is ignored. The first four words of the buffer contain the current log identifier name. The fifth through seventh words contain flags for data base logging, user logging, and a transaction flag. Words eight and nine contain the current user transaction number.
402 - Looks up Intrinsic Level Recovery information. The QUALIFIER is ignored. Word one of the buffer is a flag indicating whether ILR is enabled. Words two through four are the date and time that ILR was enabled, and words five through fourteen indicate the last intrinsic and data set that was processed via ILR.
501 - Looks up the current subsystem access flag. The qualifier is ignored. The buffer contains one word, which is zero if no access should be allowed, one if only read access is allowed, and three if both read and write access are allowed.
STATUS - The condition word will be zero if the information request was successfully processed.
Word Contents
1 The condition word
2 The length of the buffer returned
BUFFER - Is returned to the calling program; the contents vary depending on the MODE selected.
Typical non-zero condition words:
-31 Bad mode parameter
50 The buffer array was too small to hold the requested data. Remember that certain modes can cause a very large buffer to be returned.
5.10.3 Walk through
Command: !B=TEST
Command: !Q=;
Command: !M=1
Command: O
Elapsed time (MS) = 463 CPU = 111
Baseid=%000001 Class=64 Current DBG size=32767 DBU size=10240
Command: !M=203
Command: I
Elapsed time (MS) = 10 CPU = 10
3 -1 -2 -3
Command: !Q=1
Command: !M=202
Command: I
Elapsed time (MS) = 23 CPU = 6
CUSTOMER-MASTER M 106 3 0 0 5 5
-------------------------------------------------------------------
6.0 What are the Utilities and Tools?
6.1 DB2disk & Disk2DB
These are very useful contributed system library (CSL) utilities that can unload all or part of a data base to a flat file and put the flat file back into all or part of the data base. If you don't have a copy of the CSL, you can contact WWW.INTEREX.ORG.
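Incidentally, DBinfo mode 202 (section 5.10) makes it easy to grow a small capacity-report tool of your own when the contributed utilities are not at hand. The following is only a minimal sketch, not a finished program: it assumes MPE/iX C with the intrinsic declarations made available via #pragma intrinsic, a data base that is already open, and it leaves decoding of the mode 202 buffer beyond the set name to the manual; the function name report_sets is illustrative.

/* Hedged sketch of a home-grown capacity checker built on DBINFO.
   Assumes the base array has already been through a successful
   DBOPEN; check buffer word offsets against the TurboIMAGE manual. */
#include <stdio.h>
#pragma intrinsic DBINFO
#pragma intrinsic DBEXPLAIN

static short status[10];                  /* ten word status array    */

void report_sets(char *base)
{
    short sets[128];                      /* word 1 = set count, then
                                             the data set numbers     */
    short info[256];                      /* mode 202 buffer          */
    short mode202 = 202, mode203 = 203;
    short i, nsets, setnum;

    DBINFO(base, " ;", mode203, status, sets);   /* qualifier ignored */
    if (status[0] != 0) { DBEXPLAIN(status); return; }

    nsets = sets[0];
    for (i = 1; i <= nsets; i++) {
        /* mode 203 returns a negative set number when DBPUT/DBDELETE
           access is allowed, so take the absolute value              */
        setnum = (short)(sets[i] < 0 ? -sets[i] : sets[i]);

        DBINFO(base, &setnum, mode202, status, info);
        if (status[0] != 0) { DBEXPLAIN(status); continue; }

        /* the buffer begins with the 16 character set name; the words
           after it hold set type, entry length, blocking factor,
           number of entries and capacity, in the order listed above  */
        printf("set %3d  %.16s\n", setnum, (char *)info);
    }
}

A program along these lines, run from a nightly job, gives early warning before a data set fills up, which is exactly the situation the utilities in the next sections are meant to repair.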
6.2 DBUNLOAD & DBLOAD
DBUNLOAD is a utility that copies the data entries out of the individual data sets to tape. This utility is useful for small data bases to change the data base design, capacities, or to pack the detail chains. For larger Image data bases these tasks will be many times easier and faster using one of the many data base restructuring tools available today. The following discussion about DBUNLOAD and DBLOAD serves as a generic example of the unload and load process, even though in their case the operation is to tape. The DBUNLOAD program first checks to see if it can gain exclusive access to the data base. If not, it prints the error message DATA BASE IN USE and terminates. The DBUNLOAD program has two entry points, CHAINED and SERIAL. The default mode of operation is CHAINED. In serial mode the entries are read using DBget mode 2; in chained mode the detail data sets are unloaded via DBget mode 5. The path used in the chained unload is determined in the schema by specifying the "!" as a primary path indicator. If no primary path was specified, then the first path will be used. If a broken chain is encountered, the DBUNLOAD will attempt to process the chain backwards (mode 6) and see if all of the entries can be read. Data on missing entries is printed before the data set summary. The data sets are processed in the same sequence as they were defined in the schema. The DBget intrinsic is called and the data set number is appended to the front of the record as it is written out to tape.
To DBUNLOAD an Image data base:
[FILE DBUNLOAD[=dbname] [;DEV=tapeldev]
RUN DBUNLOAD.PUB.SYS [,CHAINED] [,SERIAL]
WHICH DATA BASE base name[/maintenance word]
DATA SET set number: n ENTRIES
END OF VOLUME v, e ERRORS RECOVERED
DATA BASE UNLOADED
END OF PROGRAM
Where:
dbname - Is a file ID that will appear on the console to help identify the reply being requested, and the tape(s) being used.
tapeldev - Is the logical device number or class of the tape drive to be used in the unload operation.
base name - Is the name of the data base to be unloaded. You can't specify a group or an account, the ones you are currently logged on are assumed.
maintenance word - good luck trying this without one if you aren't logged on as the creator of the data base.
The DBLOAD utility program reads the data entries unloaded to tape and DBputs them back to the data base. The data base name must be the same as the data base unloaded and must exist in the same group and account as you are currently logged on. The changes that you can make to the actual structure of the data base with the DBUNLOAD, DButil, and DBLOAD utilities are very limited. You can change the capacities of individual data sets, add data items only to the end of the data entry, and add data sets at the end of the schema. For these reasons many other utilities have been developed to perform the more often required types of data base changes on just the data sets that require modification. For some applications whose design requires heavy access to detail chains that average a number greater than the number of entries in the detail block, the effects of a chained DBUNLOAD and DBLOAD can be dramatic. The benefits of packing the detail chains must be balanced by the amount of time and effort expended in keeping them packed.
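A rough way to weigh that balance is to compare the number of blocks a typical chain touches today with the number it would need if perfectly packed. The following is a minimal, self-contained sketch; the chain length, blocking factor and elongation values are made-up examples (the blocking factor comes from DBSCHEMA output and the elongation from a DBLOADNG/HowMessy report, section 6.7):

/* Back-of-the-envelope estimate of what packing a detail chain buys.
   All input numbers are illustrative examples.                      */
#include <stdio.h>

int main(void)
{
    long   avg_chain   = 40;    /* average entries per chain          */
    long   blockfactor = 3;     /* entries per block (from DBSCHEMA)  */
    double elongation  = 4.0;   /* actual blocks read per chain
                                   divided by the perfect case        */

    /* perfect case: one read per block the chain would occupy        */
    long   perfect = (avg_chain + blockfactor - 1) / blockfactor;
    double actual  = perfect * elongation;

    printf("packed chain : about %ld block reads\n", perfect);
    printf("today's chain: about %.0f block reads\n", actual);
    printf("savings      : about %.0f reads per chain processed\n",
           actual - perfect);
    return 0;
}

If the savings per chain, multiplied by the number of chains read per day, is small, the reload is probably not worth the downtime described next.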
For small data bases, the maintenance sequence for packing the detail entries would be to store the data base off to tape with DBSTORE or the MPE STORE command, DBUNLOAD with the CHAINED entry point, ERAse the data base with DButil, and to DBLOAD. THIS CAN BE A VERY LONG PROCEDURE! PLEASE TAKE THE TIME TO ESTIMATE THE NUMBER OF HOURS, DAYS OR WEEKS THIS MAY TAKE BEFORE JUMPING IN WITH BOTH FEET. An example of a typical reload happened on a Thanksgiving day weekend. A DBUNLOAD/DBRELOAD was started on the previous Wednesday afternoon. The DBUNLOAD took a little over three hours, and produced almost two reels of tape at 1600BPI. By Monday morning the DBRELOAD was two inches into the front of the first reel, and moving mighty slow. Needless to say, there were some very anxious users and systems people. Their first question was, "How long will it take to complete the DBRELOAD" and their second question was, "What do we do now" Well, after looking at the schema for the data base, and doing some simple arithmetic based on the number of IO's required and the present IO rate of their system, The conclusion was that the DBRELOAD would be completed about mid-Friday morning! This forced the decision to abort the reload and restore the data base. It turned out that they simply wanted to expand the capacity of one of the smaller detail data sets. They were using the DBUNLOAD and DBLOAD utilities because they did not have ADAGER or one of the other many data base capacity changing utilities. The end solution was to restore the data base to one of the company's systems that had one of these tools and perform the capacity change. The entire process took 45 minutes. To DBLOAD an Image data base: [FILE DBLOAD[=dbname] [;DEV=tapeldev] RUN DBLOAD.PUB.SYS,defer WHICH DATA BASE base name[/maintenance word] DATA SET set number: n ENTRIES END OF VOLUME v, e READ ERRORS RECOVERED DATA BASE LOADED END OF PROGRAM Where: dbname - Is a file ID that will appear on the console to help identify the reply being requested, and the tape(s) being read. tapeldev - Is the logical device number or class of the tape drive to be used in the store operation. defer - Tells Image to enable autodefer for the run of the DBLOAD program. This can speed up the DBLOAD since this tells Image to write the buffer out when full, not entry by entry. If the system fails you would have to start the reload over again anyway so getting it over with as soon as possible is desirable. base name - Is the name of the data base to be stored. You can't specify a group or an account, the ones you are currently logged on are assumed. maintenance word - good luck trying this without one if you aren't logged on as the creator of the data base. 6.3 Capacity, structure change, and diagnostic utilities 6.3.1 ADAGER (The Adapter/Manager for Image/3000 Data bases) http://www.adager.com/AdagerGuideTOC.html 6.3.2 Flexibase for Image database monitoring, maintenance and management http://www.triolet.com/Flexibase.html 6.3.3 DBGENERAL (Bradmark) Demo request, no online documentation http://www.bradmark.com/dgimage2.htm 6.4 Logging and Recovery Transaction logging and recovery represents a recognition of the fact that computer systems are subject to software and hardware failures. In order to protect the reliability of the data stored in the data base in between DBSTORE or MPE STORE operations, the data base software must provide a secure method for capturing and reapplying transactions to the data base. 
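Applications participate in this protection by bracketing each logical transaction with DBbegin and DBend, as described in section 5.8. The following is a minimal, hedged sketch rather than a drop-in routine: it assumes MPE/iX C with the intrinsics declared via #pragma intrinsic, a base array that has already been through a successful DBOPEN with a write-capable password, and an illustrative ORDERS data set and add_order procedure.

/* Hedged sketch: bracket one logical transaction with DBBEGIN/DBEND. */
#pragma intrinsic DBBEGIN
#pragma intrinsic DBEND
#pragma intrinsic DBPUT
#pragma intrinsic DBEXPLAIN

static short status[10];                /* ten word status array       */

void add_order(char *base, char *order_entry)
{
    char  *text    = "ADD ORDER ";      /* optional text for the log   */
    short  textlen = 5;                 /* length of the text in words */
    short  mode1   = 1, mode2 = 2;

    DBBEGIN(base, text, mode1, status, textlen);
    if (status[0] != 0) { DBEXPLAIN(status); return; }

    /* the DBPUTs, DBDELETEs and DBUPDATEs that make up the logical
       transaction go here; "@;" means all items (see the LIST
       description in section 5.6), and in DBopen mode 1 a covering
       DBLOCK would also be required                                   */
    DBPUT(base, "ORDERS;", mode1, status, "@;", order_entry);
    if (status[0] != 0) { DBEXPLAIN(status); return; }

    /* mode 2 also flushes the log buffers, so the completed
       transaction is safely captured in the log file                  */
    DBEND(base, text, mode2, status, textlen);
    if (status[0] != 0) DBEXPLAIN(status);
}

With the transactions bracketed this way, DBRECOVER can tell complete logical transactions from partial ones when the log is reapplied.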
The most frequent cause requiring transaction recovery is the inadvertent restoring of data bases over top of the current production version of the data base. Second is a hardware failure of a disc that contained part of the data base. And third is the accidental use of QUERY to delete records online.
6.4.1 Procedures for logging and recovery
There are three separate procedures involved with Image logging: installation, startup and shutdown. The initialization procedure is a one time event and seems more complicated than it really is. The startup and shutdown procedures are usually incorporated into the full backup jobs and the system start up and shutdown processes. The TurboIMAGE manual describes the necessary steps in great detail in the logging and recovery section.
6.4.2 Overhead of logging
Construct your own test with DBDRIVER; your results should be similar.
KEY VALUE   W/O LOGGING   WITH LOGGING
!E=01       CPU = 23      CPU = 24
!E=02       CPU = 3       CPU = 4
!E=03       CPU = 3       CPU = 4
!E=04       CPU = 2       CPU = 4
!E=05       CPU = 3       CPU = 4
!E=06       CPU = 2       CPU = 3
!E=07       CPU = 3       CPU = 4
!E=08       CPU = 5       CPU = 4
!E=09       CPU = 2       CPU = 4
!E=10       CPU = 3       CPU = 4
TOTAL(2/10)  =26           =35
LOGGING OVERHEAD = 35/26 = 1.35, or about 35%
6.5 DBDRIVER
DBDRIVER - Is an HP supplied utility program that allows calling the Image intrinsics without writing a program. DBDRIVER is useful for demonstrating and testing how the Image intrinsics work, without the structure of a programming language. DBDRIVER is executed simply by entering:
:RUN DBDRIVER.PUB.SYS[,entry point]
This allows all Image intrinsics to be executed by entering a one letter command. The appropriate parameters must be set prior to executing the command. The command abbreviations are as follows:
B[egin] Calls DBbegin
C[lose] Calls DBclose
D[elete] Calls DBdelete
E[rror] Calls DBerror
F[ind] Calls DBfind
G[et] Calls DBget
I[nfo] Calls DBinfo
K[ontrol] Calls DBcontrol (C=Close)
L[ock] Calls DBlock
M[emo] Calls DBmemo
N[d] Calls DBend (E=Error)
O[pen] Calls DBopen
P[ut] Calls DBput
Q[uit] Calls DBxend
R[elease] Calls DBunlock (U=Update)
S[tart] Calls DBxbegin
T[pi] Calls DBtpiinfo (TPI products installed)
U[pdate] Calls DBupdate
W[aive] Calls DBxundo
X[plain] Calls DBexplain (E=Error)
Y Calls DBcall (description of the previous call)
Z Calls DBmaint (PM only)
DBDRIVER also allows parameters to be set before each call and displayed after the call is completed. To set parameter values you use an ! and a letter to signify which parameter you are setting. To display a parameter value you use a ? and a letter to signify which parameter to display.
Base - !B=Data Base Name
Password - !Q=Password
Mode - !M=Mode
Status - !S=Status Array
List - !L=List of Item Names
Buffer - !E=Buffer Contents
Argument - !A=Argument value
DSET - !Q=Data set Name
Item - !I=Item Name
DBDRIVER has special commands and alternate entry points that are useful for documentation and debugging purposes.
Other DBDRIVER Commands:
/V - Displays version, update, and fix
/D - Debug
/S - System Debug (Privileged Mode)
/E - Exit
/M - Monitor - Sends all terminal activity to LP
/W - Turns off Monitor (Dumps to LP)
//text - Comments
/C - Displays Code Segment # Tables of Image Procedures.
/R n1,n2,n3 - Turn on Repeat - Execute Command n1 * n2 Times - Displays Timings after each n2 Times - Add n3 to Each Word in Buffer Each Time - Turn OFF Repeat
DBDRIVER Entry points:
VUF - This entry point displays the version/update/fix levels of the IMAGE program files in PUB.SYS and the version/update/fix of the IMAGE intrinsics to which DBDRIVER was linked by the loader.
PURGE - This entry point prompts for the name of the I-file to be purged. After the purge, the program terminates. I-files are created by IMAGE when dumping is enabled.
OOPS - This entry point prompts for purging the DBS and DBG.
CLONE - Prints the specified I-file to the printer and terminates. If the user doesn't have privileged mode (PM) then the program displays "No dice!" and terminates.
PRIV - If no PM then responds "No dice!" and terminates. Otherwise, DBDRIVER runs in privileged mode. This allows calls to uncallable intrinsics, and access to privileged functions of callable intrinsics.
6.6 MPEX LISTF DB
If you have MPEX from VESOFT, it has a very useful capability: the LISTF ,DB command, which will list the capacities, blocking factors, device locations and data set types for all or selected data bases on your system.
:RUN MPEX.PUB.VESOFT
%LISTF fileset,DB [;listfile]
FILENAME  SET NAME  TY  ENT-  CAPA-  FULL%  BLK  SEC-   %BLOCK  SECTR
                    PE  RIES  CITY          FCT  TORS   WASTED  WASTE
The LISTF ,DB can show and select on the following criteria:
- data set's file name and Image set name
- data set type
- number of entries in the data set
- number of sectors the data set takes up
- the wasted space due to blocking of entries
6.7 DBLOADNG/HOWMESSY (Robelle)
Robelle's How Messy tutorial as a self-extracting archive can be found at http://www.robelle.com/library/smugbook/howmessy.html
DBLOADNG is a contributed library program that can read every entry and every chain in the data base. This function in and of itself is very valuable in that it verifies the structural integrity of the data base. In addition, the program returns statistics about the capacity of the data sets and the locality of the entries on secondary or detail chains.
Running DBLOADNG
:RUN DBLOADNG.PUB.TELESUP
Data Base Loading Analysis -- DBLOADNG V2.7f
Data Base name: TEST
Password (;) :
Mode (5) : 5
Data Set (@) :
Data Base name:
END OF PROGRAM
The formal file designator for the output file is LOADREPT, which is equated to the device class LP. The report can be sent to another printer via a file equation, e.g. FILE LOADREPT;DEV=LP2. The program can take a long time and consume a high percentage of the CPU, so it is highly suggested that it be run from a batch job at off hours. The report is usually only a few pages, so if run from a batch job set up a file equation to defer the printing so that the report doesn't get lost. The report contains the following headings and statistics. Don't let the volume of numbers intimidate you. With a few rules of thumb you'll be able to rip through this report with the best of them.
Data Base - Is the name of the data base requested
User Access Number - Is actually the user class number returned from the DBopen. It comes from looking up the password provided in the passwords table of the root file.
Global DBCA Size - Is the number of words that the global control block currently occupies. The amount of space required can be analyzed from the table report generated by DBschema.
Local DBCA Size - Is the size in words of the local buffers for the current DBopen.
Data Set # - Is the sequence number of this data set as specified in the schema. In large data bases with non-descript data set names, it may be easier to refer to the data sets by number. Data Set Name - Is the name given to this data set in the schema. Type - M for manual master, A for automatic master, and D for detail. Capacity - Maximum number of entries that the data set can contain. Entries - The current number of entries actually in the data set as a result of a call to DBput. % Load Factor - This is calculated as the number of entries divided by the capacity times 100. The load factor is a very important statistic for Image data sets. If the data set becomes full then you have two unpleasant options, delete some entries or increase the size of the data set. % Secondaries - Is calculated for master entries as the number of entries that have a zero in their chain count divided by the current number of entries times 100. The occurrence of secondaries is governed by two things, the load factor and the content of the key being hashed. As the load factor increases, so does the incidence of secondary entries, and the more similar the key values being hashed, the more secondary entries there are. Max Blocks - This is the largest occurrence of contiguous blocks without a free entry. On a DBput, if the entry being added is a secondary entry or is a primary that has its entry already occupied by a secondary, then Image will have to find a free entry to place the new secondary or migrate the existing secondary to. This means that adjacent blocks will have to be read in and their bit maps searched until a free entry is located. This can be quite time consuming. In general, the same factors governing the occurrence of secondaries will drive the size of this value. However, for large masters the percents can hide the effects of a local cluster of entries. Blockfactor - This is the number of entries per block that Image assigned based on the media record size and the BLOCKMAX specified in the schema. A low block factor results from the data content of the entry and the number of paths specified. A small number of entries per block aggravates the effects of secondary chains and can explain large Max Block values. "*", "S", "!" - The * means that this is a stand alone master or detail with no paths. The S shows that this path is sorted, and the ! indicates that this is the primary path as defined in the schema. Search Item - Is the data item name of the key item as specified in the schema. Masters may only have one search item, and details can have up to 16 search items. Max Chain - This is the length of the longest secondary chain for masters and the longest detail chain for the path into a detail. Avg Chain - This is the average length of the secondary chain for a master or the average chain length for a path into a detail. As the average chain length approaches ten percent of the number of entries, the benefits of maintaining the chain are outweighed by the overhead to maintain it. It becomes faster and simpler to read the detail serially selecting the entries of interest. Std Dev - Is the standard deviation of the average chain length. This value indicates the normalcy of the spread of the length of the chains. In other words a comparison of the average and maximum chain lengths. A small standard deviation indicates a uniformity of chain lengths. 
A large standard deviation shows that the average chain length is based on a wide spread of chain lengths and that further investigation into the data base design is required.
Expected Blocks - Is a calculation of the average number of blocks that each chain should require based on the current blocking factor. This value is on average the perfect case number of IO's required to read an entire chain.
Average Blocks - Is the actual average number of IO's required to read an entire chain. Note that this number is only valid if the standard deviation of the average chain length is small.
Elongation - This is the ratio of actual average blocking to perfect blocking, or more simply stated, a multiplier of the resources (CPU and disc IO's) required beyond the perfect case. If this ratio is one or close to one, then the benefits of performing a chained unload and load or DETPACK of these paths will be negligible. However, as this ratio grows the benefits of repacking can be very great, if this path is used frequently.
% Inefficient Pointers - This is calculated as the total number of pointers to other blocks divided by the total number of pointers times 100. This percentage also is a good indicator of the expected benefits of reorganizing.
Number of Primaries - Actual number of master entries that have a secondary chain counter greater than zero.
Number of Secondaries - Actual number of master entries that have a zero in the chain counter.
Number of Chains - For masters, this is the number of entries that have a secondary chain counter greater than one. For details, this is the number of nonzero master chain heads.
External Pointers - This is a count of the number of entries that have forward pointers to other blocks. If the block is adjacent or close by, it may not require an IO to get the next block, depending on the use of MPE/V disc caching or MPE/XL mapped files.
Contiguous Pointers - Is a count of the number of pointers that point to the next adjacent block. If the average chain is longer than the blocking factor, then this can't be avoided. Generally the larger the blocking factor and the shorter the average chain, the smaller the occurrence of contiguous pointers. This is where buffering and disc caching really pay off.
Total Blocks - Calculated as the number of entries divided by the blocking factor rounded up.
Full Blocks - The number of blocks that have no free entries.
Empty Holes - This is a count of secondary pointers that skip over free entries in masters. These holes are created because Image does not migrate secondaries when other primary entries are created. This is because of the overhead involved and the chance that the primary may be re-added, as with the case of a DBdelete and a DBput to change the value of a search item for some other path in a detail. This count is a good indicator of volatility in a master. When this number becomes large, a reload or change of capacity of the master will repack the secondary chains.
Messages:
Capacity is not prime - The master data set indicated has a capacity which is not a prime number. Generally, capacities that are even numbers or that have many repeating digits yield a greater number of secondary entries. The message also provides the next higher and lower prime numbers.
Load factor greater than 80% - When a data set becomes full you have two options, delete some records or increase the capacity.
Both options can take a great deal of time, so it's a good idea to know well in advance and make appropriate plans to increase capacity well in advance. When master data sets get past the 50% full mark, secondary entries become an issue. When the data set gets greater than 80% full nearly every entry will be a secondary. This causes extra processing overhead that can be simply avoided by maintaining capacities at a reasonable level. Secondary Percentage greater than 30% - This means that greater than three out of ten entries is a secondary, and that all operations to this data set incur overhead processing these secondary chains. The solution is to increase the capacity and investigate the key values being used. Character keys should vary the most on the right, and numeric keys should vary the most on the left. Average chain length greater than 1.5 - When the average secondary chain length grows past 1.5 entries, each operation to the data set is requiring on average 1.5 times as much resources as normal. This is another method for assessing the existence of poor capacity or hashing performance. The solution is the same as when the secondary percentage is greater than 30%. Number of entries read not equal to the Image count - This is a common message for Image data bases that have been in existence for a long time and have experienced a number of system failures. The number of entries that Image thinks are in use differs from the actual number of records found by actually reading the entire data set. This message could also indicate that the data base was modified during the DBLOADNG run. But it also could indicate more serious problems. For detail data sets, it is very important for the records used counter to be in synchronization with the number of actual entries, because if the counter drops to zero, Image will start adding records at the first entry, as if the data set had been erased. If however the counter is off, then this will not happen and each new entry will be added on the delete space chain in last deleted next used sequence. The performance implications are dramatic. ------------------------------------------------------------------- 7.0 How do I begin the design process? Probably the most mystical part of any data base management system is. how does the design specification actually come into being. This is more of an art than a science, but like any artistic endeavor, three things are certain. Practice makes permanent, perfection is fleeting, and you'll always wish you would have done something else when you have run out of time. The initial design process can be broken up into four smaller projects which should get simpler as you proceed, depending upon how well the previous step was accomplished. The first step is to define the data elements that will become the palette of colors from which the data base will be painted. The second step is to hierarchically relate each data item to each other (sounds more complicated than it really is). The third step is to fold the hierarchies previously created into a two-level network data base. And the last design step is to prototype and test the initial design for structural conflicts. Another good basic discussion on data base design can be found at: http://www.aics-research.com/pdf/chap2.pdf 7.1 Defining the data elements Data elements are defined by collecting all of the application documentation and coordinating content of each screen and report. 
It is very time consuming to redefine the size of a field once you get to the prototyping phase, so take your time and get the type and size of each data entry down pat. The naming convention should be kept simple. Picture an end user typing the data item name into some report generating program. Don't make the mistake of mixing upper and lower case, it looks neat but it will make programming and the use of ad hoc report writers much more difficult. Keep in mind that storing numbers as type X and type U means that the compiled application programs will spend many CPU cycles converting these display fields to binary values in order to do any arithmetic functions. 7.2 Building hierarchical structures The next step, which is almost never done, is simply to make a two dimensional matrix by listing each data element horizontally and vertically. In each cross-match of data items, list the ratio of each data item to the other. The relationship will be zero for none, or one for 1:1, two for 2:1 and so forth. This, in effect, builds a table that represents a hierarchy that will be invaluable in the next step. 7.3 Folding hierarchies into two level networks The final step in defining the schema is to collapse the hierarchy of data items into Image's two possible levels of structure, namely master and detail data sets. This is done by originally defining each group of related data items in as many detail data sets as are required. The process can be more easily pictured as raking the data items into piles of data items that are then defined as detail data sets. The next step is to place the capacity values on each data set. The capacity of each data set will determine the need for key values. Remember the example of the library with only one or two hundred books. The card catalog is really a waste because all of the titles can be searched in short order. So the first rule is. if the detail is less than two hundred entries forget about any key paths and either scan the table for each access or build an array to hold them in your application program. The next step is to identify the need for online access to the detail data set. If a data item in the data set makes a natural search item, then define an automatic master, and place a capacity on this master. If the ratio of master entries is 1:1 then you've got a good index, as the ratio of detail entries increases i.e. 1:100 then the value of the index starts to drop. In the process of selecting index data items remember to select no more than three data items to serve as paths. The marginal value of each additional path drops off dramatically after the third path while the overhead of maintaining the path is a linear function. If no data item in the proposed detail data set is a likely index value, you have two options. One is to concatenate two or more data items together hoping that the ratio of the new concatenated key to detail entries will be small enough to be useful. The other choice is to invent a new unique index data item. Both choices have their good and bad points, the concatenated key is discussed shortly. Finally, if any of the details has only one index, this may be a candidate for combining the two data sets into a single manual master. This will reduce the amount of IO to process data entries, but there are two major drawbacks to manual masters with data. 
One is that the data content reduces the number of key records per block, and the second is that the only way to retrieve data on any of the non-key data items is to serially search the entire master, reading empty blocks and all.
7.4 Prototyping to resolve structural conflicts
There is no substitute for filling the data base up with test data and running prototype applications and reports against the data base. If you wait until the data base goes into production to perform this type of testing live, it will be very difficult to make any changes in the design. Without testing under a production-sized load, you will not find the types of problems that will lead to performance degradation later on. This is especially true of the higher level programming languages and report writers that, unless specifically directed, will perform all data retrieval operations via a serial read of the data sets required. This method works very well on small test data bases, but when the data base starts to fill up, the run time of the reports and response times at the terminals also go up.
7.5 Walk through
Suppose that we needed to design an Image data base for order entry. The first step would be to collect existing documentation, invoice forms, order books, price sheets, etc. The second step would be to design the basic screens and reports required to eliminate the paper system. These interactive screens and report layouts allow definition of the data item names, types and sizes. The net result should be a table of data items and a list of access requirements. The list of access requirements is derived from the functionality of the interactive screens. For example, the order entry screen requires the ability to look up any order based on the order number, customer number, sales account, etc. These requirements should be listed in a separate table along with acceptable transaction response times.
Order Data Base Data Items Table
Item Names         Type Len
BUYER-NAME,         X   30
BUYER-PHONE,        Z   10
CUSTOMER-ADDR-1,    X   30
CUSTOMER-ADDR-2,    X   30
CUSTOMER-CITY,      X   20
CUSTOMER-NAME,      X   30
CUSTOMER-NUMBER,    X    6
CUSTOMER-PO,        X   16
CUSTOMER-STATE,     X    2
CUSTOMER-ZIP,       X   10
ITEM-NUMBER,        X   16
LEAD-TIME,          I    1
NET-WEIGHT,         I    2
ORDER-NUMBER,       X    6
ORDER-DATE,         Z    6
ORDER-STATUS,       X    2
ORDER-TYPE,         X    2
PART-NUMBER,        X   16
PART-ADDED,         Z    6
PART-DESCRIPTION,   X   30
PART-ORDER-QTY,     I    2
PART-PRICE,         I    2
PART-SHIP-QTY,      I    2
PROMISE-DATE,       Z    6
SALES-ACCT,         X   12
SCRAP-PCT,          I    1
SETUP-QTY,          I    2
SETUP-UNITS,        I    2
SHIP-ADDR-1,        X   30
SHIP-ADDR-2,        X   30
SHIP-CITY,          X   20
SHIP-DATE,          Z    6
SHIP-NAME,          X   30
SHIP-STATE,         X    2
SHIP-ZIP,           X   10
STD-UNITS,          I    2
UNIT-MEASURE,       X    2
WANT-DATE,          Z    6
Order Data Base Access Requirements Table
1. Retrieval of customer information by CUSTOMER-NUMBER and SALES-ACCT.
2. Retrieval of order information by ORDER-NUMBER, PART-NUMBER, SALES-ACCT, and ORDER-STATUS.
3. Retrieval of part information by PART-NUMBER.
The next step is to cross-relate the access requirements with the data items table. This allows the bundling of data items into data set candidates. The question to ask at each cross point is, "Are these data items related, and if so, is it a one to one relationship, or a one to many relationship?" For example, for each CUSTOMER-NUMBER, how many BUYER-NAME's will there be? The answer is 1. For each SALES-ACCT, how many CUSTOMER-NUMBER's? The answer is many (shown as N).
Order Data Base Data Items Relational Table Item Names Type Len CUSTOMER ORDER PART SALES ORDER NUMBER NUMBER NUMBER ACCT STATUS -------- ------ ------ ----- ------ BUYER-NAME, X 30 1 BUYER-PHONE, Z 10 1 CUSTOMER-ADDR-1, X 30 1 CUSTOMER-ADDR-2, X 30 1 CUSTOMER-CITY, X 20 1 CUSTOMER-NAME, X 30 1 CUSTOMER-NUMBER, X 6 1 N CUSTOMER-PO, X 16 1 N CUSTOMER-STATE, X 2 1 CUSTOMER-ZIP, X 10 1 ITEM-NUMBER, X 16 1 1 N LEAD-TIME, I 1 1 NET-WEIGHT, I 2 1 1 N ORDER-NUMBER, X 6 1 N N ORDER-DATE, Z 6 1 N ORDER-STATUS, X 2 1 1 ORDER-TYPE, X 2 1 N PART-NUMBER, X 16 1 1 N PART-ADDED, Z 6 1 PART-DESCRIPTION, X 30 1 1 N PART-ORDER-QTY, I 2 1 N PART-PRICE, I 2 1 1 N PART-SHIP-QTY, I 2 1 N PROMISE-DATE, Z 6 1 N SALES-ACCT, X 12 1 1 1 N SCRAP-PCT, I 1 1 SETUP-QTY, I 2 1 SETUP-UNITS, I 2 1 SHIP-ADDR-1, X 30 1 1 N SHIP-ADDR-2, X 30 1 1 N SHIP-CITY, X 20 1 1 N SHIP-DATE, Z 6 1 1 N SHIP-NAME, X 30 1 1 N SHIP-STATE, X 2 1 1 N SHIP-ZIP, X 10 1 1 N STD-UNITS, I 2 1 UNIT-MEASURE, X 2 1 1 N WANT-DATE, Z 6 1 1 N After the data items have been related the next step is to break out the one-to-one relationships into natural data sets. Notice that at this stage in the process all data sets should not be prejudged as master or detail data sets. The one-to-many relationships mean that by definition they will have to be studied further to decide if they will make useful key data elements. Order Data Base Natural Data Sets Table Sets: Name: CUSTOMERS; Entry: CUSTOMER-NUMBER, CUSTOMER-NAME, CUSTOMER-ADDR-1, CUSTOMER-ADDR-2, CUSTOMER-CITY, CUSTOMER-STATE, CUSTOMER-ZIP, SHIP-NAME, SHIP-ADDR-1, SHIP-ADDR-2, SHIP-CITY, SHIP-STATE, SHIP-ZIP, BUYER-NAME, BUYER-PHONE; Name: ORDERS; Entry: ORDER-NUMBER, ORDER-TYPE, ORDER-STATUS, ORDER-DATE, CUSTOMER-NUMBER, SHIP-NAME, SHIP-ADDR-1, SHIP-ADDR-2, SHIP-CITY, SHIP-STATE, SHIP-ZIP, SHIP-DATE, PROMISE-DATE, WANT-DATE, CUSTOMER-PO, PART-NUMBER, PART-DESCRIPTION, NET-WEIGHT, UNIT-MEASURE, PART-ORDER-QTY, PART-SHIP-QTY, PART-PRICE, ITEM-NUMBER, SALES-ACCT, LEAD-TIME; Name: PARTS; Entry: PART-NUMBER, PART-DESCRIPTION, SALES-ACCT, PART-ADDED, SCRAP-PCT, NET-WEIGHT, UNIT-MEASURE, ITEM-NUMBER, PART-PRICE, LEAD-TIME, SETUP-QTY, SETUP-UNITS, STD-UNITS; These data sets were made up by simply reading down the columns that contained all 1's. These should default to detail data sets with an automatic master linked via the cross-referenced data item. This gives a first stage design that looks like this: +-----------+ +-----------+ +-----------+ | CUSTOMER | | ORDER | | PART | | AUTOMATIC | | AUTOMATIC | | AUTOMATIC | | MASTER | | MASTER | | MASTER | +-----------+ +-----------+ +-----------+ | | | | | | | | | \|/ \|/ \|/ V V V +-----------+ +-----------+ +-----------+ | CUSTOMER | | ORDER | | PART | | DETAIL | | DETAIL | | DETAIL | +-----------+ +-----------+ +-----------+ What about those one-to-many relationships. To analyze the value of these data elements as possible key values we need to build a final table that takes into account the number of expected entries in the detail data sets and the number of possible key values. The number of possible key value occurrences is divided into the number of expected entries in the associated detail data sets. This number is the expected average chain length, and thus can be used to determine the usefulness of the potential key value. 
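The arithmetic is simple enough to put in a throwaway program. The sketch below uses the illustrative volumes from this walk through (700 customers, 5,000 orders, 50 sales accounts, 2 order statuses) and flags a candidate as questionable when its expected chain approaches ten percent of the detail, the rule of thumb from section 6.7; the table that follows lays the same figures out by hand.

/* Expected average chain length for each candidate key
   = detail entries / number of distinct key values.
   All volumes below are the illustrative figures from the walk through. */
#include <stdio.h>

struct candidate { const char *detail; long entries;
                   const char *key;    long key_values; };

int main(void)
{
    struct candidate c[] = {
        { "CUSTOMER-DETAIL",  700, "CUSTOMER-NUMBER",  700 },
        { "CUSTOMER-DETAIL",  700, "SALES-ACCT",        50 },
        { "ORDER-DETAIL",    5000, "ORDER-NUMBER",    5000 },
        { "ORDER-DETAIL",    5000, "SALES-ACCT",        50 },
        { "ORDER-DETAIL",    5000, "ORDER-STATUS",       2 },
    };
    int i, n = sizeof c / sizeof c[0];

    for (i = 0; i < n; i++) {
        double chain  = (double)c[i].entries / c[i].key_values;
        int    useful = chain < 0.10 * c[i].entries;   /* 10% rule of thumb */
        printf("%-16s via %-15s avg chain %8.1f  %s\n",
               c[i].detail, c[i].key, chain,
               useful ? "worth a path" : "read the set serially instead");
    }
    return 0;
}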
Order Data Base One-to-Many Occurrences Table Data Set Names #ENTRIES CUSTOMER ORDER PART SALES ORDER NUMBER NUMBER NUMBER ACCT STATUS -------- ------ ------ ----- ------ # Of Occurrences 700 5,000 1,200 50 2 -------- ------ ------ ----- ------ CUSTOMER-DETAIL 700 1 14 ORDER-DETAIL 5,000 1 100 2,500 PART-DETAIL 1,200 1 Notice that the SALES-ACCT item is a good key candidate, while the ORDER-STATUS item with only two possible occurrences does not create a small enough subset of the ORDER-DETAIL to make it faster to read the chain than the entire data set. +-----------+ +-----------+ +-----------+ +-----------+ | CUSTOMER | | SALES-ACCT| | ORDER | | PART | | AUTOMATIC | | AUTOMATIC | | AUTOMATIC | | AUTOMATIC | | MASTER | | MASTER | | MASTER | | MASTER | +-----------+ +-----------+ +-----------+ +-----------+ | / \ | | | / \ | | | / \ | | \|/ / \ \|/ \|/ V V V V V +-----------+ +-----------+ +-----------+ | CUSTOMER | | ORDER | | PART | | DETAIL | | DETAIL | | DETAIL | +-----------+ +-----------+ +-----------+ This design allows for all of our access requirements except the capability to access the order detail data set based on ORDER-STATUS. After further investigation it was discovered that this capability is required to access the order entries that had an ORDER-STATUS of O for opened so that once per day they can be printed on a pick ticket report. This requirement is easily accommodated by adding a separate detail to hold the order numbers as orders are added. Also at this time we can examine the design for single pathed detail data sets that are candidates for being turned into a manual master. This should only be done if you can answer yes to the following questions: Would you bet $10,000 that no one would ever want to search this detail based on one of the other data items Will the data set contain more than 100 entries In regard to the first question, it is very simple using ADAGER or some other data base change utility to add an automatic master to an existing detail. But, it can be very difficult to modify applications that formerly accessed a master to perform a DBfind to access the new detail. The second question really applies to having a key into any data set, if the number of records is small (fewer than 200(c) why have a key in the first place. With these final changes in place, our data base design now looks like this: +-----------+ +-----------+ +-----------+ +-----------+ | CUSTOMER | | SALES-ACCT| | ORDER | | PART | | AUTOMATIC | | AUTOMATIC | | AUTOMATIC | | MANUAL | | MASTER | | MASTER | | MASTER | | MASTER | +-----------+ +-----------+ +-----------+ +-----------+ | / \ | | / \ | | / \ | \|/ / \ \|/ V V V V +-----------+ +-----------+ +-----------+ | CUSTOMER | | ORDER | |OPEN-ORDER#| | DETAIL | | DETAIL | | DETAIL | +-----------+ +-----------+ +-----------+ The formal ORDERS schema: PAGE 1 HEWLETT-PACKARD 32215C.00.42 TurboIMAGE/3000. DBschema (C) HEWLETT-PACKARD CO. 
1978 Begin Data Base ORDERS; Passwords: 1 ANY ; Items: BUYER-NAME, X 30 ; BUYER-PHONE, Z 10 ; CUSTOMER-ADDR-1, X 30 ; CUSTOMER-ADDR-2, X 30 ; CUSTOMER-CITY, X 20 ; CUSTOMER-NAME, X 30 ; CUSTOMER-PO, X 16 ; CUSTOMER-STATE, X 2 ; CUSTOMER-ZIP, X 10 ; CUSTOMER-NUMBER, X 6 ; ITEM-NUMBER, X 16 ; LEAD-TIME, I 1 ; NET-WEIGHT, I 2 ; ORDER-NUMBER, X 6 ; ORDER-DATE, Z 6 ; ORDER-STATUS, X 2 ; ORDER-TYPE, X 2 ; PART-NUMBER, X 16 ; PART-ADDED, Z 6 ; PART-DESCRIPTION, X 30 ; PART-ORDER-QTY, I 2 ; PART-PRICE, I 2 ; PART-SHIP-QTY, I 2 ; PROMISE-DATE, Z 6 ; SALES-ACCT, X 12 ; SCRAP-PCT, I 1 ; SETUP-QTY, I 2 ; SETUP-UNITS, I 2 ; SHIP-ADDR-1, X 30 ; SHIP-ADDR-2, X 30 ; SHIP-CITY, X 20 ; SHIP-DATE, Z 6 ; SHIP-NAME, X 30 ; SHIP-STATE, X 2 ; SHIP-ZIP, X 10 ; STD-UNITS, I 2 ; UNIT-MEASURE, X 2 ; WANT-DATE, Z 6 ; Sets: Name: CUSTOMER-MST,Automatic(/0,1); Entry: CUSTOMER-NUMBER(1); Capacity: 1009; Name: SALES-ACCT-MST,Automatic(/0,1); Entry: SALES-ACCT(2); Capacity: 101; Name: CUSTOMER,Detail(/0,1); Entry: CUSTOMER-NUMBER(CUSTOMER-MST), CUSTOMER-NAME, CUSTOMER-ADDR-1, CUSTOMER-ADDR-2, CUSTOMER-CITY, CUSTOMER-STATE, CUSTOMER-ZIP, SHIP-NAME, SHIP-ADDR-1, SHIP-ADDR-2, SHIP-CITY, SHIP-STATE, SHIP-ZIP, SALES-ACCT(!SALES-ACCT-MST), BUYER-NAME, BUYER-PHONE; Capacity: 1000; Name: ORDER-MST,Automatic(/0,1); Entry: ORDER-NUMBER(1); Capacity: 10007; Name: ORDERS,Detail(/0,1); Entry: ORDER-NUMBER(ORDER-MST), ORDER-TYPE, ORDER-STATUS, ORDER-DATE, CUSTOMER-NUMBER, SHIP-NAME, SHIP-ADDR-1, SHIP-ADDR-2, SHIP-CITY, SHIP-STATE, SHIP-ZIP, SHIP-DATE, PROMISE-DATE, WANT-DATE, CUSTOMER-PO, PART-NUMBER, PART-DESCRIPTION, NET-WEIGHT, UNIT-MEASURE, PART-ORDER-QTY, PART-SHIP-QTY, PART-PRICE, ITEM-NUMBER, SALES-ACCT(!SALES-ACCT-MST), LEAD-TIME; Capacity: 10000; Name: PARTS,Manual(/0,1); Entry: PART-NUMBER(0), PART-DESCRIPTION, SALES-ACCT, PART-ADDED, SCRAP-PCT, NET-WEIGHT, UNIT-MEASURE, ITEM-NUMBER, PART-PRICE, LEAD-TIME, SETUP-QTY, SETUP-UNITS, STD-UNITS; Capacity: 5003; Name: OPEN-ORDERS,Detail(/0,1); Entry: ORDER-NUMBER; Capacity: 1000; End. DATA SET TYPE FLD PT ENTR MED CAPACITY BLK BLK DISC NAME CNT CT LGTH REC FAC LGTH SPACE CUSTOMER-MST A 1 1 3 14 1009 36 507 120 SALES-ACCT-MST A 1 2 6 23 101 22 508 24 CUSTOMER D 16 2 151 159 1002 3 478 1340 ORDER-MST A 1 1 3 14 10007 36 507 1116 ORDERS D 25 2 136 144 10002 3 433 13340 PARTS M 13 0 53 58 5003 8 465 2508 OPEN-ORDERS D 1 0 3 3 1002 167 512 28 TOTAL DISC SECTORS INCLUDING ROOT: 18495 NUMBER OF ERROR MESSAGES: 0 ITEM NAME COUNT: 38 DATA SET COUNT: 7 ROOT LENGTH: 1308 BUFFER LENGTH: 512 TRAILER LENGTH: 256 ------------------------------------------------------------------- 8.0 How do I improve on an existing design? 8.1 Common design situations The majority of system managers and programmers inherit someone else's data base design. Does this mean that you are stuck and there is not much that can be done. The answer is certainly no! On most HP3000 systems, the Image data bases are at the heart of all the applications and are the place where the real work gets done. If you've read up to this point, you should be aware of the many possible configuration issues that are important whether you originally designed the data base or not. The following discussion is an attempt to identify some of the common design problems built into Image data bases and methods by which you can minimize their impact. 8.2 The concatenated key Very often in the design of a data base you'll have a group of related data items that form a nice detail data set, but no one item is a natural key. 
One choice would have been to assign an artificial key, as is the case with customer numbers, vendor numbers etc. These numbers serve as unique identifiers and make good key values because there is usually only one customer record per customer number or vendor record per vendor number. However, it is not always possible to build such indices, and the only other choice is to pick two or more data items in the data set that when put together make a small enough chain in the detail as to deserve an index. An example would be concatenating the customer name and zip codes together creating a new data item called NAME-ZIP. This is purely a bet that we don't get too many customer names that are the same in any given zip code. As a side note. it is a good idea when concatenating data items together, to place the data items that vary the most to the right, and the more static data items to the left. This is because the hashing algorithm that generates the relative record number from the data item content starts at the right and uses these bits to "randomize> the rest of the bits from right to left. The net result will be a more even spread of entries in the master data set. 8.2.1 Using DBfind on multiple paths to find the shortest chain One method that sounds complicated, but in reality is very simple to implement, is to make use of the feature of DBfind that brings back the number of entries on the chain. When a screen is designed, it is very helpful to highlight the fields that can be used to search the data, and if the request includes multiple search items into the detail data set, it would be advantageous to perform multiple DBfinds and then do the chained reads on the shortest chain. Or if the chains are greater than a predetermined length, request the user to enter any additional data. 8.2.2 Implications for data dictionaries In order to use a data dictionary with concatenated keys, the data dictionary must be able to break out the data items into sub-fields. This is very common today but when data dictionaries first became available it was a big problem. 8.2.3 Implications for report generators The real problems with concatenated keys come out when reports are generated. Unless the report writer is capable of breaking out the sub-fields, you may have great difficulty entering the selection criteria as a mess of concatenated characters. Query does not have the ability to prompt for parts of a data item, and many a person has experienced great frustration attempting to perform a Query find on a concatenated key. 8.3 The need for generic keys Many applications have the need to be able to retrieve entries on a generic or close match basis. There is no method built into Image to provide this functionality, although some have tried. There are add-on software packages that build data structures within Image data sets that can provide some of these capabilities. The following discussion covers some ideas that may work for you. 8.3.1 The two bytes, four bytes, 6 bytes approach One method is to create a new data item that is a shortened version of the complete data item. An example would be the first two characters of the customer name. This key would allow chained retrieval of all customers whose names started with the same two characters. This could be a very long chain, so we could add another new data item that is the first four characters of the customer name, and so forth. 
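To make the idea concrete, here is a minimal, self-contained sketch that derives the two, four and six character keys from a customer name; the sample name and the item names in the comments are illustrative, and each derived item would become the search item of its own automatic master.

/* Derive progressively longer prefix keys from a customer name for
   generic retrieval.  Keys are blank padded the way an X-type Image
   item would be stored.                                              */
#include <stdio.h>
#include <string.h>

static void prefix_key(const char *name, int len, char *key)
{
    int i;
    for (i = 0; i < len; i++)                 /* copy, then blank pad  */
        key[i] = (i < (int)strlen(name)) ? name[i] : ' ';
    key[len] = '\0';
}

int main(void)
{
    char k2[3], k4[5], k6[7];
    const char *name = "HORNSBY MANUFACTURING";

    prefix_key(name, 2, k2);    /* a NAME-KEY-2 item, type X2          */
    prefix_key(name, 4, k4);    /* a NAME-KEY-4 item, type X4          */
    prefix_key(name, 6, k6);    /* a NAME-KEY-6 item, type X6          */

    printf("two byte key  : '%s'\n", k2);
    printf("four byte key : '%s'\n", k4);
    printf("six byte key  : '%s'\n", k6);
    return 0;
}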
8.2.2 Implications for data dictionaries

In order to use a data dictionary with concatenated keys, the data dictionary must be able to break out the data items into sub-fields. This is very common today, but when data dictionaries first became available it was a big problem.

8.2.3 Implications for report generators

The real problems with concatenated keys come out when reports are generated. Unless the report writer is capable of breaking out the sub-fields, you may have great difficulty entering the selection criteria as a mess of concatenated characters. Query does not have the ability to prompt for parts of a data item, and many a person has experienced great frustration attempting to perform a Query find on a concatenated key.

8.3 The need for generic keys

Many applications need to be able to retrieve entries on a generic or close-match basis. There is no method built into Image to provide this functionality, although some have tried. There are add-on software packages that build data structures within Image data sets that can provide some of these capabilities. The following discussion covers some ideas that may work for you.

8.3.1 The two bytes, four bytes, six bytes approach

One method is to create a new data item that is a shortened version of the complete data item. An example would be the first two characters of the customer name. This key would allow chained retrieval of all customers whose names start with the same two characters. This could be a very long chain, so we could add another new data item that is the first four characters of the customer name, and so forth. Remember, the amount of overhead to add and delete a detail entry is directly proportional to the number of associated masters. Thus this method, while providing a limited solution, is very expensive in terms of the CPU and disc resources required to make it work.

8.3.2 Using KSAM versus BTREES

NOTE: NEED TO ADD BTREE DISCUSSION IN THIS SECTION

Hewlett-Packard does provide a file system and access method to facilitate generic key retrieval, in the form of KSAM, an acronym for keyed sequential access method. KSAM files are not widely used for primary data storage because, while providing generic access functionality, they have three design flaws that make them unsuitable for production data storage. The first is that locking can only occur at the file level, which causes very long response times when more than ten users are running an application against a common KSAM file. Second, KSAM files do not share buffers for all users on the system, and therefore the more users who have the file opened, the more memory is required for redundant buffers. And lastly, KSAM files can be corrupted easily when a system failure, abort, or halt is encountered, and it can take a long time to recover these files afterwards.

However, one solution to the need for generic access is to unload the master and load the data into a KSAM file on a nightly basis. Then, if a generic access is required, the KSAM file is searched to bring back the exact key value of interest, which is then used to perform a DBfind to bring back the detail data entries.

8.3.3 Third Party Indexing (TPI)

NOTE: NEED TO ADD TPI DISCUSSION

8.3.4 Sound-indexing, the removal of vowels in character keys

Many generic accesses are an attempt to find close matches on name data items: CUSTOMER-NAME, VENDOR-NAME, EMPLOYEE-NAME, etc. One method that works well is to create a squashed name by removing all of the vowels and repeating consonants and truncating after four characters have been found; thus HORNSBY becomes HRNS. This allows for many different spellings that sound alike, and will allow a chained read of names that are close-sounding approximations.
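A sketch of such a squash routine is shown below. The handling of non-letters and of Y is an assumption on my part; the text above only specifies dropping vowels and repeated consonants and truncating to four characters.

   VOWELS = set("AEIOU")

   def squash_name(name, width=4):
       """Drop vowels and adjacent repeated letters, keep the first few consonants."""
       squashed = []
       for ch in name.upper():
           if not ch.isalpha() or ch in VOWELS:
               continue
           if squashed and squashed[-1] == ch:   # collapse repeating consonants
               continue
           squashed.append(ch)
           if len(squashed) == width:
               break
       return "".join(squashed)

   print(squash_name("HORNSBY"))    # HRNS
   print(squash_name("HORNSBEE"))   # HRNS -- a close-sounding spelling lands on the same chain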
8.4 Open vs. closed order data sets

A very common design problem is moving data from one data set to another, usually as the status or timing of the data changes: for example, moving orders from open orders to closed orders, or inventory transactions to inventory history. The jobs or processes that move these entries around tend to be very CPU and disc IO intensive, not to mention tricky to program and test. The following describes these common problems and provides some alternatives.

8.4.1 Moving data from one path to another

In many applications it is necessary to move transactions from one state to another. To illustrate, let's use an example of a simple order entry data base with an open order detail and a closed order detail, with two associated masters both based on order number. Now, every day at shipping time the open orders are picked and moved to the closed orders. This requires a DBfind/DBget/DBdelete on the open orders and a DBput on the closed orders. This can be a very time consuming process that takes longer as the number of orders increases. A better method would have been to keep an open orders detail that contains only the order numbers of unpicked orders, and an orders detail that contains all orders with a new field called ORDER-STATUS. Thus the processing at shipping time would be a DBget/DBdelete on open orders and a DBget/DBupdate on the orders data set. Note that the DBdelete is not as intensive because there is no path into open orders; it is a simple stand-alone detail.

8.4.2 Sequential processing and DBdelete (moving secondaries)

One of the things to be very careful about with Image data bases is serially processing an Image master while performing DBdeletes. You must manually check the status after the DBget, or re-read the current record, to verify that a secondary that was in a higher relative record number hasn't moved into the record you've just deleted. If you don't check this condition manually, it can lead to a situation in which your program appears to have processed all of the entries but really has not. This problem gets worse as the number of entries grows and the master becomes saturated.

8.4.3 Using a DBupdate versus a DBput and DBdelete

One way to improve response time is to include a deleted flag in the data entry and logically delete records by updating this flag instead of actually performing the DBdelete. This is very handy for data entries that tend to be extremely volatile. It is also handy when someone inadvertently deletes an entry or group of entries. The entries can be physically deleted when the space is required and the processing time is available. Another benefit of this method is that it can be used to always be adding to the end of a detail set, which can provide substantial performance gains if the detail entries for a given chain are added at the same time, i.e. order line items in an order detail. The drawback is that all applications using the data set have to recognize and ignore the logically deleted entries.

8.5 The need for archiving of data from the data base

In the great hurry to get data into the data base, the more important questions of how long it will stay and how to get rid of it are often overlooked. This oversight will lead to a startling discovery about Image, which is that IT TAKES JUST AS LONG TO DELETE AN ENTRY AS TO PUT IN THE ORIGINAL ENTRY. So, if you have a staff of order entry clerks entering orders two shifts per day, five days a week, for two years, how long will it take to archive out the shipped orders? For many applications the problem is not knowing which entries to keep and which to delete, and for others the problem is when to actually delete the entries.

8.5.1 Government and corporate regulations

One of the main reasons that data is allowed to accumulate in a data base is a lack of corporate policy on record keeping; on the other hand, some applications are forced by various government regulations to keep records online. The decisions on policy for record keeping should be made, if at all possible, before the design of the data base or the purchase of a package. How the various applications deal with a month's worth of entries will be dramatically different from processing two or three years of entries.

8.5.2 Using expiration dates on data entries

The problem of knowing which entries to keep and which to delete can be very difficult if two types of data items are not included in the data entry. One data item is an expiration date, which is simply a timer that tells when it is okay to delete the entry. The other is a last-accessed date, which is updated when the data entry is retrieved and/or updated. Both of these data items can work together to provide a method for deleting entries.
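As an illustration of how the two items can work together, the purge rule can be as simple as the sketch below. The one-year idle period is an arbitrary example, not anything Image enforces.

   from datetime import date, timedelta

   def eligible_for_archive(expiration_date, last_accessed,
                            today=None, idle_period=timedelta(days=365)):
       """True once the entry has expired and has sat untouched for the idle period."""
       today = today or date.today()
       return expiration_date <= today and (today - last_accessed) >= idle_period

   print(eligible_for_archive(date(1998, 6, 30), date(1997, 10, 1),
                              today=date(1998, 11, 25)))    # True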
8.5.3 The problems of month and year-end archiving

By including expiration and last-accessed data items, it is possible to perform archival runs on daily and weekly cycles instead of combining this type of processing with other month and year-end activities. These periods are usually heavy enough without adding the additional overhead of archival.

8.6 Validation of logical relationships

Logical relationships are the interdependencies of data that the application builds into the data base design: for example, a single customer entry per customer number, or entries in an inventory data base that are referenced by the production planning data base. These relationships should be well documented, and auditing applications put in place to verify or balance one data base against another. The point is that it is much better to find and correct problems by cycle-checking the data bases than in a big month or year-end run.

8.6.1 Locking strategies

Locking across multiple data bases requires multiple RIN (MR) capability, which can be very dangerous if implemented improperly. Poor locking strategies are one of the primary ways in which data bases, or data sets within a data base, can get out of balance. A key strategy is to perform locks in the same sequence in all applications, and to lock and reread entries prior to an update transaction. If the same locking sequence is used throughout all applications, one of the major causes of deadlocks will be avoided. Besides requiring a restart of the system, deadlocks also contribute to partially applied transactions that can logically corrupt data sets or data bases. Other logical corruption occurs by coding applications so that they blindly DBupdate or DBdelete/DBput the current record. To ensure logical integrity, the application should reread the current record to see if it still exists, and if so, whether it is the same as it was when it was initially read. The situation where multiple users go after the same record simultaneously is application dependent, but it happens more frequently than you would think. The result is that an update that was thought to be processed gets "unprocessed", or you might even get a false indication of a broken chain because the next or previous record was deleted before you got to it.

8.6.2 Matching multiple data bases

Unfortunately, the only way to match and verify multiple data sets or data bases is to build an application program that audits the content of the entries. It is sometimes useful to create control data sets that contain record counters and "hash" totals. A "hash" total is simply the total accumulation of one or more of the data items in a data set. In its simplest form, it could be the total of all of the dollar-sales data items for all data entries. These record counters and totals provide a quick method of balancing even very large files.
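A sketch of the balancing pass is below. The DOLLAR-SALES item name and the dictionary-per-entry representation are illustrative assumptions; in a real audit program the entries would come from a serial read of the data set and the stored counters from the control data set.

   def control_totals(entries, amount_item="DOLLAR-SALES"):
       """Record count plus a simple hash total over one data item."""
       count, hash_total = 0, 0
       for entry in entries:
           count += 1
           hash_total += entry[amount_item]
       return count, hash_total

   stored = (3, 1250)     # counters previously written to the control data set
   derived = control_totals([{"DOLLAR-SALES": 500},
                             {"DOLLAR-SALES": 250},
                             {"DOLLAR-SALES": 500}])
   print("in balance" if derived == stored else "OUT OF BALANCE")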
8.6.3 Matching data bases across multiple systems

One of the data management problems unique to distributed computing is the management of multiple data bases on different computer systems. There are some third-party products that attempt to apply transactions from one data base to the other and vice versa, but the amount of time spent keeping the data bases in synchronization is always underestimated, and it always turns out that specialized applications are required to assist in auditing and adjusting the content of the data bases.

8.6.4 Audit trails for data base changes

The most valuable aid in the battle to keep data bases balanced is the transaction audit trail. This report is a history of the detail and total net changes to the data base data sets. Once the data base gets out of balance, the process of putting it back is predicated in large part on someone's ability to figure out what happened. Without an audit trail, it is almost impossible to recreate the transactions exactly. These reports are some of the easiest to produce, as the application program updates, adds, or deletes entries in the data base; but all too often the only information these programs generate is the end-of-program message.

8.7 Validation of physical relationships

Physical relationships are the internal pointers that Image maintains for you. The path from a master to a detail, and the chains maintained in the detail, are made up of pointers and counters that physically make up the data base structure. These pointers are protected as much as possible by Image and MPE, but computers, as reliable as they are, are still prone to failures. And, as has been witnessed repeatedly, human error can be even more devastating.

8.7.1 Finding broken chains

There are two ways to find a broken chain in an Image data base. One way is just to wait, and sooner or later, at two or three in the morning during a crucial month-end or year-end job, you'll find one. The other way is to go looking for them, and the only way to validate an Image data base is to read down each chain. The contributed library utility DBCHECK, and ADAGER's diagnostic functions, should be run on a periodic basis.

8.7.2 Setting up a logging strategy

When the data base becomes physically corrupted via hardware, software, or human error, you are faced with the decision of whether to attempt to patch up the data base, if possible, or to restore the data base from a backup copy. If the physical media has been destroyed by a head crash or partially overwritten, the chances of patching up the data bases are slim. If, however, the extent of the damage is a broken chain or two, these can sometimes be mended by doing a chained unload and load of the affected data sets. The only other option depends on whether you were logging the transactions to a log file on a separate disc or tape drive. In this case you can restore from the backup and reapply the logged transactions with little or no loss of production data. The length of time between backups of the data base should be determined by the number of transactions in the interval. Generally, the greater the volume of transactions, the shorter the interval between backups.

8.7.3 Using data base audit utilities

The best way to use these audit utilities is to set up production job streams for each data base:

   !JOB DBCHEKUP,CREATOR.DATABASE
   :RUN DButil.PUB.SYS
   SHOW DBname ALL
   EXIT
   :RUN QUERY.PUB.SYS
   B=dbname
   ;
   5
   FO SETS
   EXIT
   :RUN DBLOADNG.PUB.TELESUP
   dbname
   ;
   !EOJ

If you are unsure about which data bases are located in which accounts on your system, a neat trick is to perform a LISTF @01.@.@,2. This will list all files that end in 01 in each group and account. An Image data base must have at least one data set, and its first data set will have a file name of the data base name with 01 appended. The files listed with a PRIV file code are Image data sets.
8.8 Changing a key value

A major design problem is updating a search item value. The only way that Image allows this to happen is via a DBdelete and DBput combination. This problem is usually worse for details, as they can have up to sixteen search items.

8.8.1 Calculating the number of IO's needed

The main issue with deleting and re-adding an entry to change a data item is the CPU and disc IO required to perform the transaction. A system with this kind of application will usually be able to serve about half as many users as the same system performing simple DBputs. The number of IO's required for each DBdelete or DBput against a detail is on average 2 + 2*(number of paths); thus a detail with three paths requires eight IO's for each call, or sixteen IO's to complete the delete/put transaction. But we must also multiply by the number of users, out of the total signed on, who will perform this transaction in a given time period. Thus, when deciding on key values, we must take into consideration the likelihood that the data item may be changed. Usually these types of volatile values can be stored in separate data sets (remember the open versus closed orders example), eliminating the need to make the data item a search item, and with it the need to perform a DBdelete/DBput instead of a DBupdate.
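The arithmetic above is easy to put into a small model, which is also handy for the path-removal discussion in section 8.11. Treat the numbers as averages, not guarantees; sorted paths and secondary chains push them up.

   def ios_per_write(paths):
       """Average disc IO's for one DBPUT or DBDELETE against a detail."""
       return 2 + 2 * paths

   def ios_to_change_key(paths):
       """Changing a search item value costs a DBDELETE plus a DBPUT."""
       return 2 * ios_per_write(paths)

   print(ios_per_write(3))        # 8  -- one DBPUT (or DBDELETE) on a three-path detail
   print(ios_to_change_key(3))    # 16 -- the delete/put pair described above

Against drives that handle roughly twenty to twenty-five IO's per second (see 8.17.1), even a handful of users changing key values every second will keep several spindles busy.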
8.8.2 Assessing effects of volatility

The performance impact of changing a key value for each transaction should be thoroughly tested and understood before going into production. These are the kind of gotchas that can create the need for a total rewrite before the application can go into production.

8.8.3 Implications for locking strategies

When deleting an entry from one path and adding it to another path, the only choice is to lock at the data set level, and as this is a relatively long transaction, the wait time for other transactions can become excessive. This can be verified by using DButil and its SHOW command to see how often, and for how long, users are waiting in the locking queue.

8.9 Data content of masters

Manual masters can carry data items in addition to the key, and this can have dramatic design implications. Problems can occur with serial retrieval and with a decreased number of records per block.

8.9.1 Data in a master must be retrieved serially

The fact that an entry can be retrieved only by one unique key value means that all other data items in the data entry can only be retrieved via a serial search. For example, consider an employee manual master that uses the social security number as the unique key. What do you do when someone doesn't know their social security number? Serial search until you match on last name. This could take a while if you have a large number of employees.

8.9.2 Wide masters allow fewer keys per block

The other issue is that the number of key entries per block decreases as the record gets wider. This can aggravate the processing of secondaries by increasing the likelihood that a secondary chain spans multiple blocks.

8.9.3 Creating a new detail data set

The solution is to drop the master down to a detail data set and add the paths required for online interactive access. This cannot be done with a simple DBUNLOAD/DBLOAD; you must have a data dictionary unload/load utility, a data base restructuring utility, or break down and write a custom program.

8.10 Removed

8.11 Eliminating keys that are not used

Probably the most powerful thing you can do to an Image data base to improve performance is to remove paths to details that aren't being used. The secret is picking out these paths and finding the computer time to actually make the changes. Unused paths can often be identified by their average and maximum chain lengths: a path with very long chains cannot be used interactively, and probably takes longer to process than reading the detail serially. The main benefits of removing a path are reducing the amount of CPU time and disc IO's for each transaction, and reducing the amount of disc space required for the master and the detail.

8.11.1 Wasted CPU time and disc IO's

The number of disc IO's to DBput or DBdelete is 2 + 2*(number of paths). By deleting a path we reduce the multiplier by one. So, going from four paths to three paths means going from ten IO's to eight IO's, for a savings of 20%. In general, for DBputs and DBdeletes the amount of CPU required is directly related to the number of disc IO's, so we would expect a corresponding savings in CPU time. The number of IO's actually required can be greatly increased by the presence of secondary chains in the case of associated masters, and by the use of sort and extended sort items in the detail.

8.11.2 Wasted disc space

Each entry in the master must have six words for each associated detail, and each entry in the detail must have four words for each set of forward and backward chain pointers. This doesn't sound like much, but it adds up fast.

8.12 Logging and recovery

The simple reason for logging is that it is the only way to recover transactions that have been applied to the data base since the last backup. For some applications, losing data is not a big problem, but for most it's a major catastrophe. Image logging is not very difficult to install, and transactions are not that difficult to recover. The hard part is making sure that logging is always enabled. System restarts and various restructuring utilities turn off logging, and usually manual intervention is required to re-enable the data base for logging.

8.12.1 Predicting recovery time

The amount of time between backups is determined by the number of transactions in the log file. Since each transaction must be DBput, DBdeleted, or DBupdated back into the data base, the amount of time to recover can be as long as the amount of time between backups. It can be very difficult to estimate the actual number of transactions and their types. This is why DBRECOV can be run to produce a transaction statistics report that identifies the transaction types and counts. With this data and the formula for the number of IO's per transaction, you can estimate the number of hours, days, or weeks it would take to recover the transactions if you had to.

8.12.2 Intrinsic level recovery (ILR)

8.13 Error handling

When designing an Image application, consideration should be given to the processing of errors and to the facility for debugging and tracing the conditions which led to the error. When consulting on data base and application design, my favorite trick is to use DButil to change the data base passwords to some bogus values to see how the application will handle the error on the DBopen call. Very often I'll find that the program doesn't even check the status after the DBopen and aborts on the next intrinsic call. Other applications will display the standard Image error message by calling DBexplain or DBerror. But the best applications will print an error log report stating the date, time, program, user logon, and the Image error, along with other useful information generic to the particular application. This type of error handling is basic to being able to successfully support the application once it goes into production. You simply can't count on the user being able to document and relate enough information to quickly diagnose and solve the problem.
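A minimal sketch of the status check and error log described above follows. The dbopen wrapper shown in the comment is hypothetical, and the log layout is just one reasonable choice; the only Image fact relied on is that the first word of the status area is zero when the call succeeds.

   import getpass, sys, time

   def log_image_error(program, call_name, status):
       """Append date, time, program, user, failing call and status words to a log."""
       stamp = time.strftime("%Y-%m-%d %H:%M:%S")
       with open("imagerr.log", "a") as log:
           log.write(f"{stamp} {program} {getpass.getuser()} "
                     f"{call_name} status={list(status)}\n")

   def check_status(program, call_name, status):
       if status[0] != 0:                      # word 1 of the status area: 0 = ok
           log_image_error(program, call_name, status)
           sys.exit(f"{call_name} failed -- see imagerr.log")

   # status = dbopen(base, password, mode=1)   # hypothetical DBOPEN wrapper
   # check_status("ORDENTRY", "DBOPEN", status)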
8.13.1 Logging and tracking of Image errors

By printing out error occurrence reports you can track errors over time to pick out trends or possible application flaws. This is also useful to spot users who may be in need of training and assistance. There are usually two types of problems that arise in Image applications: reproducible hard errors, and erratic intermittent errors. I like to classify these as type A, "twisty little passages all alike", and type B, "twisty little passages all different". The trick to solving the reproducible problems is to be able to set up the exact conditions under which the problem or error occurred. The trick to solving the intermittent problems or errors is to gather as much hard documentation as possible and to try to find common patterns.

8.13.2 User interface to Image errors

Image errors are simply not intended for end user consumption. When designing your application system, build your own message catalog of errors and warning messages. Your HP3000 comes with the facility to build, access, and maintain custom message catalogs. The HP3000 also comes with an extensive help facility that includes the capability for building custom help catalogs, but that is another topic altogether.

8.14 Critical design assumptions

In any design, you are forced to make certain assumptions based on the knowledge of the applications at that time. This next section is dedicated to a list of assumptions that frequently cause major design problems in the long run. The fun and challenging part of debugging Image data base performance problems is figuring out what the data base designer must have been thinking (or not thinking of) when the data base was under construction.

8.14.1 The ratio of reads to writes

On any HP3000 the operating system and the Image data base software have been optimized for a mix of 75% read-orientated applications and 25% write-orientated applications. Systems whose design deviates from this average will not get the same number of interactive users, with the same response time profiles, as the three-to-one read-based systems.

8.14.2 The 75% reads and 25% writes rule of thumb

The assumptions made about the performance of Image data base applications, in regard to the average number of users that a specific model of HP3000 can support, are based largely on applications that DBget three entries for every one DBput. Application designs that do not match this standard profile, e.g. order entry, will not get the same number of users with the same response times as the standard applications. This is because it takes a lot more work to perform DBputs in terms of CPU and disc IO's, and DBputs to a shared data base require at least set level locking. So what has to be done to the design of applications that perform three DBputs for each DBget? The answer is to keep the structure of the data base as simple as possible, minimizing the number of paths into details, and thus the amount of work for the DBput to perform. Another solution is to edit and write the transactions to a stand-alone detail, and then have a batch background process running at a lower priority that reads the stand-alone transaction detail and posts it to the appropriate data set. This disassociates the data entry user from the DBput with attached masters, thus allowing maximum throughput from the data entry application.
But whatever the workaround, keep an eye out for these DBput-orientated applications; it is much easier to design transaction files into the application in advance than to attempt to retrofit them once the application is in production.

8.14.3 Unload and reload times

When designing a data base, one of the considerations as to when to split up data bases and data sets is the amount of time it will take to unload and reload the data base and each data set. When this estimate exceeds two days, make sure that you have funds in your project budget for one of the data base tools that can unload and load the data base or individual data sets. Even if you don't plan to pack detail data sets on a regular basis, when you come across a broken chain you'll be faced with a decision: restore back to an old version of the data base prior to the corruption (if you have one), or unload and reload the data base or the affected data sets.

8.14.4 Using sorted chains

The assumptions made about the overhead of designing in sorted chains have historically been one of the biggest Image data base design problems. This is for two reasons: the first is that the data base designer is usually also the application programmer, looking for ways to cut corners in writing application code, and the second is that the performance problems do not show up until you perform volume testing or go into production.

8.14.4.1 Location of the sort item in the record

When placing a sort data item in a data entry, care should be taken to place the item at the end of the data entry. This is because, when determining the proper location in the chain for the data entry being added, Image will first compare the sort data item values; if they are equal, Image will then use any data items specified after the sort data item as sub-sort items. Needless to say, the overhead for sub-sort items can be very costly, especially if they were specified unknowingly.

8.14.4.2 Adding entries to the chain in other than sort sequence

When data entries are added to a detail that has a sorted chain, Image will start at the end of the chain, examining each entry to find the correct location in the chain for the new entry. If the detail entries are added in ascending sorted sequence, so that no entries already on the chain have a sort item value greater than the entry being added, the overhead of maintaining the sorted chain is minimal. If, however, the entries are added in random sequence, the amount of work in terms of CPU and disc IO's increases with each entry added to the chain. This can lead to batch jobs that run for days, and interactive response times measured in hours!

8.14.4.3 Making use of chronological sequencing

If no sort item is specified, then Image will add entries to the chain in chronological order. So if you want the entries on a chain kept in a given order, simply add them to the chain in that sequence. Beware that some data base restructuring tools do not preserve the chronological sequencing of detail chains when they unload and reload.

8.14.5 Using sort intrinsics

Most sorted chains can be replaced by simply sorting the entries at the time they need to be presented in sorted order. More often than not this is in a batch job anyway. MPE comes with SORT and MERGE utility programs, which also allow programmatic calls to sort and merge data entries. These are much more flexible and powerful than Image sorted chains.
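For example, rather than defining ORDER-DATE as a sort item on the ORDERS chain, the chain can be read in whatever order Image returns it and sorted in the program. A small sketch with made-up entries follows; the dates are shown as YYMMDD so that a plain string sort works.

   chain = [
       {"ORDER-NUMBER": "A10043", "ORDER-DATE": "980702"},
       {"ORDER-NUMBER": "A10012", "ORDER-DATE": "980115"},
       {"ORDER-NUMBER": "A10027", "ORDER-DATE": "980330"},
   ]
   for entry in sorted(chain, key=lambda e: e["ORDER-DATE"]):
       print(entry["ORDER-NUMBER"], entry["ORDER-DATE"])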
8.14.6 Master hashing performance profiles

All key types do not hash the same. The design decision on the data type and content of a master search item is usually based on the assumption that hashing is uniformly efficient no matter what the key type or content. The truth is that hashing performance varies greatly for different data types, and even for the same data type with different data contents.

8.14.6.1 Short character keys

Short character keys are probably the most widely used key data type. Short character keys are those that range from X10/U10 down to X2/U2. This type of key exhibits especially poor hashing performance when the values of the key occur in sequence. For example, date keys are typically X6, and the value of the date is MMDDYY. Storing the content in this manner is going to cause problems for two reasons. One is that to sort on date you would have to specify two sort fields, YY and MMDD. The second is that when hashing, Image starts with the rightmost bytes and uses these to flip and twiddle the bits going to the left. In this case the YY part of the key will be used every time, and we'll wind up with a large percentage of entries on secondary chains. So if your data base has these short character keys, you can do two things to improve the hashing performance: one is to make the rightmost bytes as variable as possible, and the other is to size the short character masters at 45% to 50% full.

8.14.6.2 Sequential numeric keys

Image uses a different method to place numeric key values of data types I, J, K, and R. The rightmost 31 bits of the key item value are loaded into a double integer and reduced by 1; this is divided by the capacity of the master data set, the remainder of that division is taken, and 1 is added to arrive at the primary relative record number. So, if the capacity were 100 and the key value were 50, the relative record number would be calculated as follows:

   1. Subtract 1 from the rightmost 31 bits:   50 - 1 = 49
   2. Divide by the capacity of the master:    49 / 100 = 0, remainder 49
   3. Add 1 to the remainder:                  49 + 1 = 50

In other words, for numeric key types, if the rightmost 31 bits do not exceed the capacity of the master, the primary relative record address is simply the value of those 31 bits. Now let's do an example with a key value of 231 and a capacity of 100:

   1. Subtract 1 from the rightmost 31 bits:   231 - 1 = 230
   2. Divide by the capacity of the master:    230 / 100 = 2, remainder 30
   3. Add 1 to the remainder:                  30 + 1 = 31

When using numeric data types, you are in direct control of record placement via the content of the key value. This can be a method of guaranteeing that no secondary entries ever exist, if and only if the key values lie between 1 and the capacity of the data set. However, if the key values are greater than the capacity of the data set, then you could find yourself in a situation where every entry being added is a secondary entry.
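The calculation reduces to a remainder operation, which makes it easy to model and to see where collisions will fall. The sketch below follows the description above; it is a model of the arithmetic, not Image source code.

   def primary_address(key_value, capacity):
       """Primary relative record number for a numeric (I/J/K/R) key."""
       value = key_value & 0x7FFFFFFF        # rightmost 31 bits
       return (value - 1) % capacity + 1

   print(primary_address(50, 100))     # 50 -- values up to the capacity map directly
   print(primary_address(231, 100))    # 31 -- larger values wrap, colliding with 131, 331, ...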
8.14.6.3 Modeling hashing performance

In either case, whether you go with character or integer keys, it is always a good idea to load your proposed data base up with good test data to see what the occurrence of secondaries will be. Then you can alter the type of the key, the content of the key value, or the capacity to see if you can improve on the hashing profiles. Secondaries are only a consideration if the chains point to other blocks. If a chain is wholly contained in the current block, the effort to process it is minimal. However, if the chain crosses many blocks, then multiple IO and deblocking operations are required. This leads back to the discussion of the data content of masters. Remember that the more data items contained in the master record, the fewer data entries per block; thus, masters that carry data items generally have more secondary chain pointers to other blocks.

8.15 Capacity planning

NOTE: THIS SECTION NEEDS TO BE REDONE IN LIGHT OF MDX & DDX

8.15.1 The MDX/DDX trap

The use of MDX, DDX, and utilities for altering data set capacities makes it very easy to forget that some day a data set will get corrupted, and if logging isn't being used, the only choice will be an unload and a reload of at least the data sets involved. THERE IS A BIG DIFFERENCE BETWEEN CHANGING THE CAPACITY OF A DETAIL DATA SET AND UNLOADING AND RELOADING IT. So whenever capacities are changed, the amount of time to DBunload and DBload should be calculated. If the time required is not acceptable, then your only alternative is transaction logging.

8.15.2 Hashed and serial read performance issues

Increasing the capacity of a master data set will generally improve the overall hashing performance, unless you pick a bad value for the new capacity. There is no method for accurately predicting the hashing performance of a large master data set, but even numbers and multiples of ten are surefire losers. When increasing the capacity of data sets, be aware that applications that search serially will take longer. This will affect master serial searches immediately, and detail serial searches as new entries are added.
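Notice that the master capacities in the ORDERS schema earlier in this FAQ (101, 1009, 5003, 10007) are all primes. Picking an odd number, often a prime, near the target is a common rule of thumb, since it automatically avoids even numbers and multiples of ten, though it is no guarantee of a good spread. A sketch of that convention:

   def is_prime(n):
       if n < 2:
           return False
       factor = 2
       while factor * factor <= n:
           if n % factor == 0:
               return False
           factor += 1
       return True

   def pick_capacity(target):
       """First prime at or above the target capacity."""
       n = target
       while not is_prime(n):
           n += 1
       return n

   print(pick_capacity(1000), pick_capacity(5000), pick_capacity(10000))
   # 1009 5003 10007 -- the same values used in the ORDERS schema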
8.16.2 The marginal benefits of repacking

Reloading or packing detail data sets is a way of reducing the number of IO's required to retrieve a detail chain. The performance improvement can be measured by comparing the elongation factor before and after the reload or pack; the difference represents the average number of IO's saved per chained access.

8.16 Removed

8.17 Placement of data sets on disc drives

With MPE/XL mapped file access, the emphasis on file placement has waned. However, for IO-intensive systems, or systems with a large number of disc drives, disc management is still a big problem. The following is a brief discussion of how to balance IO across disc drives, which data sets to separate, and the effects of using a separate volume for Image logging.

8.17.1 Balancing disc IO

HP disc drives can perform a maximum of twenty to twenty-five IO's per second, and as we've seen, the DBput and DBdelete intrinsics can generate a large number of IO's. The best method for balancing IO is to audit the IO requirements for the entire system, building a list of the busiest files. This data can be gathered by using the contributed library program FILERPT. The list can then be used to separate the busiest files onto different disc drives. On MPE/XL systems, the primary concerns are the larger, randomly accessed files (Image masters), as the smaller files will be wholly mapped into memory.

8.17.2 Separating masters and details

In general, related master and detail data sets should be placed on separate disc drives. When they share a drive they create head contention and thus slow down the DBput, DBfind/DBget, or DBdelete. This can be a very complicated task for a system with many data bases and many data sets; again, we are only concerned with the most active data sets.

8.17.3 Separate volumes for disc logging

Using up an entire disc drive for logging Image transactions to disc is usually a big waste and very much overkill. Simply taking care to locate the log file on a separate disc from the data base data sets, and documenting the label location of the log file, is going to be sufficient for most sites. The disc drive, when it is not tied up as a dedicated log volume, can then contribute significantly to the overall throughput of the system.