#
# Gaspar Sinai Tokyo 1999-12-03
#
Binary Map Formats

Sn = n long 8 bit unsigned char string
U1 = 8 bit unsigned integer
U2 = 16 bit unsigned integer network order
U4 = 32 bit unsigned integer network order
I4 = 32 bit signed integer network order
ARRAY16 = 16 bit unsigned integer network order compressed array
  The size of the array:
    (LOW_MAX - LOW_MIN + 1) * (HIGH_MAX - HIGH_MIN + 1)
  Where:
    LOW_MIN = min (code[n]%256)
    LOW_MAX = max (code[n]%256)
    HIGH_MIN = min (code[n]/256)
    HIGH_MAX = max (code[n]/256)
  Accessing:
    LOW = INPUT%256
    HIGH = INPUT/256
    Returning 0 means no mapping. It also means that 0 can not be
    mapped to or from. The result is also 0 if
      (LOW < LOW_MIN || LOW > LOW_MAX || HIGH < HIGH_MIN || HIGH > HIGH_MAX)
    If there is a mapping, it can be found at
      ARRAY[(HIGH - HIGH_MIN) * (LOW_MAX - LOW_MIN + 1) + (LOW - LOW_MIN)]

1 to 1 bmap
===========
This umap allows 1 to 1 mapping and reverse mapping of unsigned shorts.

S16: "YUDIT-UMAP 1.0"
S32: "alias name"
U2: SOFFSET start of data offset from beginning of file
#
# FREE AREA till SOFFSET
#

# Bounds of Local Code Array that maps to Unicode
# Start of data SOFFSET
U2: decode HIGH_MIN
U2: decode HIGH_MAX
U2: decode LOW_MIN
U2: decode LOW_MAX

# Bounds of Unicode Array that maps to Local Code
U2: encode HIGH_MIN
U2: encode HIGH_MAX
U2: encode LOW_MIN
U2: encode LOW_MAX

ARRAY16: decode
ARRAY16: encode
<---end--->

n to n bmap
===========
This maps max 255 byte long string to a max 256 byte string.

S16: "YUDIT-NtoN 1.0"
S32: "alias name"
U4 COMMENT_SIZE
U1 COMMENT[COMMENT_SIZE]
U4: MAP_TYPE - 0. undefined 1. kmap 2. fontmap 3. clustered kmap
U4: MAP_SIZE - the number of maps in this coder
U4: OFFSET[0] - the offsets pointing to CODE_AREAs
... note that we have one more...
U4: OFFSET[MAP_SIZE] - the offsets pointing to END
#
# Start of CODE AREA.
# array references start here.
#
S32: "alias name"
U4 COMMENT_SIZE
U1 COMMENT[COMMENT_SIZE]
U1 DECODE - bit 0 is 0 if decode, 1 if encode (reverse = from unicode) map
U1 INPUT_BYTE_SIZE - This many bytes are supposed to form an input word (hint)
U1 OUTPUT_BYTE_SIZE - This many bytes are supposed to form an output word (hint)
U1 INPUT_BYTE_LENGTH - The size of the length indicator in data. 0,1,2 or 3
U1 OUTPUT_BYTE_LENGTH - The size of the length indicator in data. 0,1,2 or 3
     0=8bit, 1=16bit, 2=32bit, 3=64bit
U4 STATE_MACHINE - Index to state machine. If zero there is no state machine.
     State machine should come after code area.
U4 SPARE - UNUSED. 0.
U4 CODE_SIZE - The size of the struct map.
U4 CODE_MAP[0] - points to the first element starting from DATA_AREA
.. note that we have 1 more element in this array!!!
U4 CODE_MAP[CODE_SIZE] - points to the end of last element
#
# DATA_AREA array references start here.
#
unpadded struct {
  Ui KEY_SIZE
  Ui SUB_SIZE # The size of elements matched.
  U1 [KEY_SIZE] KEY
  Uo RESULT_SIZE
  U1 [RESULT_SIZE] RESULT
  U1 COMMENT_SIZE # A max 255 byte comment.
  U1 [COMMENT_SIZE] COMMENT
} [CODE_SIZE]

#
# State Machine (optional)
# The integers point to next state inside this state machine.
# If -1, reject.
# Currently there is no implementation for this yet.
# STATE_MACHINE this can be added here or collectively at the end of
# this file.
U4[32] size - state machine size in 64 byte words. (states)
U4[16] state0
U4[16] state1
[..]
* Each state contains a nibble. (FB -> F one state, B another state.)
* Each state has an index of 30 bit integers pointing to the next state
  or to the matched value.
* They can point to the matched value by having the upper 2 bits set to:
    REJECT: 0 - points to nowhere
    MORE:   1 - points to STATE_MACHINE
    MATCH:  3 - points to CODE_MAP
* The match more is not used.
#
# FREE AREA till OFFSET[1]
#
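
Two rules in this document are easy to misread in prose: the ARRAY16 index
computation and the 2 bit tag on state machine entries. The following is a
minimal Python sketch of both, not the Yudit implementation. The names
Bounds, array16_size, unpack_array16, array16_lookup and decode_state_entry
are hypothetical. It assumes that "network order" means big-endian, that an
ARRAY16 is stored as a flat run of U2 values, and that a state entry is one
U4 whose upper 2 bits are the tag and whose lower 30 bits are the index.
The document itself says the state machine has no implementation yet, so
that part is only one possible reading.

```python
import struct
from typing import NamedTuple, Sequence, Tuple


class Bounds(NamedTuple):
    """The four U2 bounds that describe one ARRAY16 (hypothetical helper type)."""
    high_min: int
    high_max: int
    low_min: int
    low_max: int


def array16_size(b: Bounds) -> int:
    """Number of U2 elements: (LOW_MAX - LOW_MIN + 1) * (HIGH_MAX - HIGH_MIN + 1)."""
    return (b.low_max - b.low_min + 1) * (b.high_max - b.high_min + 1)


def unpack_array16(data: bytes, b: Bounds) -> Tuple[int, ...]:
    """Read the array as big-endian 16 bit unsigned integers (assumed layout)."""
    n = array16_size(b)
    return struct.unpack(">%dH" % n, data[: 2 * n])


def array16_lookup(array: Sequence[int], b: Bounds, code: int) -> int:
    """Return the mapped value for code, or 0 when there is no mapping."""
    low = code % 256
    high = code // 256
    # Out of bounds means "no mapping", which the format expresses as 0.
    if low < b.low_min or low > b.low_max or high < b.high_min or high > b.high_max:
        return 0
    width = b.low_max - b.low_min + 1
    return array[(high - b.high_min) * width + (low - b.low_min)]


# Tag values taken from the state machine notes; tag 2 is not listed there.
STATE_TAGS = {0: "REJECT", 1: "MORE", 3: "MATCH"}


def decode_state_entry(entry: int) -> Tuple[str, int]:
    """Split a U4 entry into (tag name, 30 bit index). Assumed bit layout."""
    tag = (entry >> 30) & 0x3
    index = entry & 0x3FFFFFFF
    return STATE_TAGS.get(tag, "UNUSED"), index
```

With bounds HIGH 0x30..0x31 and LOW 0x41..0x43 the array holds 6 elements,
and the code 0x3142 lands on element (0x31 - 0x30) * 3 + (0x42 - 0x41) = 4.
Because 0 doubles as the "no mapping" marker, a caller can not tell a stored
0 from an out of range input, which matches the note that 0 can not be
mapped to or from.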