                P A R S E L I B
                ---------------

PLB stands for parselib. It processes OMF object and library files
and produces a pattern file.

Command line:

        parselib [-sw or @file] input-file pattern-file

The command line switches may be placed in an indirect file, one switch
per line.

The input file is an object file or a library file. If the extension is
omitted, the "LIB" extension is assumed.

The output file is a pattern file. Its default extension is "PAT".
A pattern file is a simple text file. Each function is represented by
one line (warning: the lines may be very long, tens of kilobytes, so
don't edit pattern files with a text editor). The format of this file
is described in the PAT.TXT file.

Usually plb is launched without switches:

        plb cl1 borland

will take "cl1.lib" as input and produce the "borland.pat" file.
You may use the -a switch to append to the output file:

        plb -a cl2 borland

will append patterns of functions from "cl2.lib" to "borland.pat".
The output file must exist if the -a switch is used.


Description of switches
-----------------------

  -a    Append to the output file. The output file must exist and
        its last line must be '---'.

  -c... If the input file contains the "ctype" array, you may use this
        switch to let parselib detect the "ctype" array and produce
        a special record for it in the pattern file. The "ctype" array
        requires special handling because it resides in the data
        segment and would normally be skipped by parselib.
        You should specify the ctype array name:
                -cctype_name
        Use this switch only if you are processing a non-standard
        C library.

  -d    Turn on debugging. Displays lots of debugging information.

  -e    Skip unnamed functions. Experimental switch.
        I don't recommend using it: it is better to recognize even
        unnamed functions than to skip them silently.

  -i    The input file is an IBM OMF file. By default parselib assumes
        the input file to be an MS OMF file.

  -l... This switch is required only for startup object modules.
        It should not be specified for regular libraries.
        This switch describes how to proceed if the startup module is
        found in the executable file. It allows you to specify the
        names of signature files to be applied automatically.
        Signature file names are separated by ':'. Optional signature
        files are specified as
                l=signame
        You may also specify the OS type and the application type.
        The switch consists of signature names and directives
        separated by colons ':', for example:

          o=type:a=type:l=lib1/lib2/lib3:m=hints:s=off/signame

        o=type  Specifies the OS type if the startup module is found.
                Valid values (sigmake -ho displays them):
                        1 MS DOS
                        2 MS Windows
                        4 OS/2
                        8 Netware

        a=type  Specifies the application type if the startup module
                is found in the executable file. Valid values are
                combinations of the following bits (sigmake -ha
                displays them):
                        0001 console
                        0002 graphics
                        0004 program (EXE)
                        0008 library (DLL)
                        0010 driver (VxD)
                        0020 Single-threaded
                        0040 Multi-threaded
                        0080 16bit
                        0100 32bit
                When in doubt, don't specify a bit.

        l=lib1/lib2/lib3...
                Optional signatures. This directive may be omitted.
                An optional signature file is not applied
                automatically, but it will be marked with an asterisk
                in the list of signature files.

        m=hints
                A simple program to find the main() function.
                The format of hints is described below.
                This directive may be omitted.

        s=off/signame
                Reference to a secondary startup signature.
                The presence of this directive means that IDA can't
                make a decision based on the recognition of one
                startup module. IDA needs to make additional checks to
                select the proper signature file: these additional
                checks are in the secondary signature file.
                The secondary signature file will be applied to an
                address referenced by the instruction at start+off
                (off is hexadecimal).
                This directive must be the last item in the -l switch.
                This directive may be omitted.

        S=off/signame
                Almost the same thing as lowercase 's'.
                The difference between these directives is that the
                uppercase 'S' uses the start+off address as it is,
                while the lowercase 's' tries to get the address
                referenced by the instruction.

                The start address mentioned in these directives is
                either the address where the signature was applied
                (usually the entry point of the program) or the
                address after applying the main() hints (if they were
                specified before).

        i=idcfile
                An IDC file to invoke. The IDC file is searched for
                in the IDC subdirectory of IDA.

  -m... The name of the library module. If this switch is specified,
        parselib will process only the specified module, not the whole
        library. This switch is mainly used for startup modules.

  -n... The name of the startup function. If this switch is specified,
        parselib will start the pattern at the specified function, not
        at the module start. Signatures are applied to the entry point
        of an executable file, and therefore the patterns should start
        at the entry point too.

  -o... The offset of the startup entry point (hex). The pattern will
        start at it. This is an alternative way to specify the start
        of a startup pattern. Sometimes the entry point has no name,
        and in this case we are forced to use offsets instead of
        names.

  -p##  Pattern length (default: 32).
        Never use this switch, it is for debugging only.

  -v    Verbose output.

  -w... This switch has the same meaning as the -c switch. The only
        difference is that the ctype array has 2-byte elements.

  -z    Loosen input file format checks. Some library modules have an
        erroneous structure. This switch allows parselib to handle
        them.


Format of hints used to find main() function
--------------------------------------------

Hints are arranged as a simple program encoded in a text string.
The string is processed from left to right. For ease of explanation,
let's imagine a virtual machine with the following registers:

  PTR     - contains a pointer to the hints string.
            Initialized with the start of the hints string.
  ADR     - contains the current linear address.
            Initialized with the executable program entry point
            address.
  MAIN    - contains a possible main() address.
            Initialized with a bad address (i.e. the main() address
            is not known).
  MAINNAME- contains a possible main() function name.
  SAFE    - contains a 'safe' address. Not initialized.
  FLAG    - contains 1/0. Initialized with 0.

The virtual machine takes the symbol at PTR, interprets it accordingly
and moves PTR to the next symbol. Execution stops when one of the
following conditions is reached:

  - the end of the string is reached. The address of the main()
    function is in MAIN (unless it still contains the bad address).
  - PTR points to a '/' symbol. It means that the main() function
    is found at ADR.
  - an illegal symbol is encountered at PTR.

Elements of the hints string (spaces are inserted for readability
only; they should not be present in the program string):

  +<off>  ADR <- ADR + off. off is a hexadecimal number.

  -<off>  ADR <- ADR - off. off is a hexadecimal number.

  !       Make an instruction at ADR.
          Stop execution if it is not possible to create the
          instruction (or roll back safe execution).

  #2      Make a 2-byte data item at ADR.
          Stop execution if it is not possible to create the data
          item (or roll back safe execution).

  #4      Make a 4-byte data item at ADR.
          Stop execution if it is not possible to create the data
          item (or roll back safe execution).

  &       Follow data reference (ADR <- dref(ADR)).
          For example, if the instruction at ADR is
                ADR: push offset somedata
          then ADR <- address of somedata.
          If the current instruction at ADR doesn't refer to data,
          then stop execution or roll back safe execution.

  ^       Follow code reference (ADR <- cref(ADR)).
          For example, if the instruction at ADR is
                ADR: call somefunc
          then ADR <- address of somefunc.
          If the current instruction at ADR doesn't refer to code,
          then stop execution or roll back safe execution.

  *0c
  *0d
  *1c
  *1d     Make an offset at ADR. The general format is
                *<opnum><type>
          where opnum (operand number) is '0' or '1',
          type is 'c' for cs or 'd' for ds.

  /       Stop execution: we have found the main() function.
          It is at ADR. Its name follows the '/' sign.
          If the name is not specified, it is taken as '_main'.

  ?<byte> ... ;
          Conditional. Test the byte at ADR. If it is equal to <byte>
          (hexadecimal), then continue execution. Otherwise skip the
          ... part and jump to the position after ';'.
          The ellipsis ... represents a sequence of any other symbols
          here. Conditionals can't be nested.

  ~<sigfile>/<funcname><+off>~ ... ;
          Apply a signature file at ADR-<off>. If the specified
          <funcname> is found at ADR, then continue execution.
          Otherwise jump to the position after ';'.
            sigfile  - name of the signature file to apply.
                       default: the first signature file specified
                       in the -l switch.
                       If sigfile == "-" then no signature file is
                       applied, only the <funcname> is tested.
            off      - offset from ADR. Must be a hexadecimal 4-digit
                       number preceded by a + sign.
                       default: 0
            funcname - name of the function to compare.
                       default: WINMAIN
          For example, the shortest form is:
                ~/~ ... ;
          This applies the first signature file at ADR and tests the
          name that appears at ADR: it should be equal to WINMAIN.

  [mainname]
          MAIN <- ADR
          MAINNAME <- mainname
          Remember a possible main() function address and name.
          The default main() name is WINMAIN.

  ( ... ) Switch to the safe mode of execution. In this mode execution
          is not stopped if something goes wrong (can't convert to
          an instruction, for example). In this case we jump to the
          symbol after ')' and set FLAG to 0.
          Otherwise (if everything went ok), FLAG is set to 1 when PTR
          is at ')'.

  ?? ... ;
          Test FLAG. If it is set (equal to 1), then continue
          execution. Otherwise jump to the symbol after ';'.
          Conditionals can't be nested.

  @sigfile@
          Plan to apply a signature file.

Conditional semicolons (';') may be omitted.


Examples
--------

Please note that these examples show the most sophisticated usage of
the -l switch. Usually you don't need it.

-------------------------

  plb -a -lo=1:a=84:l=bc31tvd/bc31cls:bc31rtd:m=+EF^/ bcc\1.01\C0C.OBJ exe_bc31

  input file:   bcc\1.01\C0C.OBJ
  output file:  exe_bc31.pat
                the output file should exist. We will append to it.
  -l switch:
        OS type is MS DOS (o=1)
        Application: 16 bit program (a=84)
        Optional signatures: bc31tvd.sig
                             bc31cls.sig
        Automatically apply: bc31rtd.sig
        main() hints:
                add 0xEF to the entry point of the executable
                follow the code reference (there is a 'call'
                instruction there)
                the main() function is here, its name is _main

-------------------------

  echo -lo=2:a=84:bh16rwin:l=bh16cls/bh16owl/bh16ocf/bh16dbe>bh.tmp
  plb -a @bh.tmp -lm=+AF^[]~/~+16^/ C0WC.OBJ ne_bh.pat

  input file:   C0WC.OBJ
  output file:  ne_bh.pat
                the output file should exist. We will append to it.
  -l switch:
        OS type is MS Windows (o=2)
        Application: 16 bit program (a=84)
        Automatically apply: bh16rwin
        Optional signatures: bh16cls
                             bh16owl
                             bh16ocf
                             bh16dbe
        main() hints:
          +AF   add 0xAF to the entry point of the executable
          ^     follow the code reference (there is a 'call'
                instruction there)
          []    remember the current address as a possible WINMAIN
                address
          ~/~   apply bh16rwin.sig to the current address.
                Test for the WINMAIN name.
                If it doesn't match, then stop: WINMAIN is here
                (because we saved it with the [] operator).
                If the name matches, then continue
                (it is likely that this is an EasyWin program).
          +16   add 0x16 to the current address (ADR)
          ^     follow the code reference (there is a 'call'
                instruction there)
          /     the main() function is here, its name is _main

-------------------------
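
The -l directive layout described above (colon-separated items, with
o=, a=, l=, m=, s=/S= and i= directives and bare names for signatures
applied automatically) can be checked mechanically before running plb.
The following Python sketch is only an illustration and is not part of
parselib: the function names and the layout of the result are
hypothetical, the o=type value is assumed to be hexadecimal like
a=type, and the hints string is assumed never to contain ':' or '='.
The value tables are copied from the switch description.

```python
# Illustrative sketch only; not part of parselib or sigmake.
# Value tables are copied from the -l switch description.
OS_TYPES = {0x1: "MS DOS", 0x2: "MS Windows", 0x4: "OS/2", 0x8: "Netware"}

APP_TYPE_BITS = {
    0x0001: "console",
    0x0002: "graphics",
    0x0004: "program (EXE)",
    0x0008: "library (DLL)",
    0x0010: "driver (VxD)",
    0x0020: "single-threaded",
    0x0040: "multi-threaded",
    0x0080: "16bit",
    0x0100: "32bit",
}


def decode_app_type(hex_text):
    """Return the names of the bits set in a hexadecimal a=type value."""
    value = int(hex_text, 16)
    return [name for bit, name in APP_TYPE_BITS.items() if value & bit]


def parse_l_switch(text):
    """Split the text that follows '-l' into its colon-separated items.

    Hypothetical helper: assumes hints never contain ':' or '='.
    """
    parsed = {"os": None, "app": [], "auto": [], "optional": [],
              "hints": None, "secondary": None, "idc": None}
    for item in text.split(":"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            # A bare name is a signature file applied automatically.
            parsed["auto"].append(item)
        elif key == "o":
            parsed["os"] = OS_TYPES.get(int(value, 16), "unknown")
        elif key == "a":
            parsed["app"] = decode_app_type(value)
        elif key == "l":
            parsed["optional"] = value.split("/")
        elif key == "m":
            parsed["hints"] = value
        elif key in ("s", "S"):
            # off is hexadecimal; the signature name follows the '/'.
            off, _, signame = value.partition("/")
            parsed["secondary"] = (key, int(off, 16), signame)
        elif key == "i":
            parsed["idc"] = value
        else:
            raise ValueError("unknown directive: " + item)
    return parsed


if __name__ == "__main__":
    # The -l text from the first example above.
    print(parse_l_switch("o=1:a=84:l=bc31tvd/bc31cls:bc31rtd:m=+EF^/"))
```

Run on the -l text of the first example, the sketch reports MS DOS as
the OS type and decodes a=84 into the "program (EXE)" and "16bit" bits,
matching the explanation given for that example.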