第2个回答 2008-07-31
Microsoft Excel File Format
Microsoft Office Technical Documentation Group
Created: November, 1995
Click to open or copy the BIFFVIEW project files
Click to open or copy the DUMPBIFF project files
The binary interchange file format (BIFF) is the file format in which Microsoft Excel workbooks are saved on disk. Microsoft Excel versions 5.0 and later use compound files; this is the OLE 2 implementation of the Structured Storage Model standard. For more information on this technology, see the OLE 2 Programmer's Reference, Volume One, and Inside OLE 2, both published by Microsoft Press® and available from your local bookstore.
File Format Versions: BIFF5 and BIFF7
The BIFF7 file format of Microsoft Excel version 7.0 is nearly identical to the BIFF5 file format of Microsoft Excel version 5.0. There are some new records in BIFF7 and some records that have changed from BIFF5. This article documents both the BIFF5 and BIFF7 file formats.
The following tables describe the new records and changed records in BIFF7. For more information on the new and changed records, see the appropriate record description, later in this article.
New Records in BIFF7
Number
Record
1A5h
FILESHARING2
13Dh
TABID
EAh
TABIDCONF
Changed Records in BIFF7
Number
Record
Changes
9Eh
AUTOFILTER
The BIFF7 record includes several new fields to support the Top 10 AutoFilter feature.
5Bh
FILESHARING
The BIFF7 and BIFF5 records are identical, but the behavior of the record changes in BIFF7 when the workbook contains a shared list.
There are also several new streams in the BIFF7 file. The User Names and Revision Log streams support the new shared list feature in Microsoft Excel 7.0. The binary format of these streams is not documented.
The DocumentSummaryInformation (new to BIFF7) and SummaryInformation streams support the document properties available in Microsoft Excel 7.0, which are standardized across the Office 95 family of applications.
The Workbook Compound File
An OLE 2 compound file is essentially "a file system within a file." The compound file contains a hierarchical system of storages and streams. A storage is analogous to a directory, and a stream is analogous to a file in a directory. Each Microsoft Excel workbook is stored in a compound file, an example of which is shown in the following illustration. This file is a workbook that contains three sheets: a worksheet with a PivotTable, a Visual Basic module, and a chart.
If a workbook contains embedded objects, then the file will also contain storages written by the applications that created the objects. The PivotTable data cache storage and VBA PROJECT storage are not documented. The CompObj stream contains OLE 2 component object data, and the Summary Info stream contains the standardized file summary information such as title, subject, author, and so on.
The Book stream begins with a BOF record, and then contains workbook global records up to the first EOF. The workbook global section contains one BOUNDSHEET record for each sheet in the workbook. You can use the dt field (document type), the lbPlyPos field (stream position of the BOF record for the sheet), and the cch/rgch fields (sheet name as a byte-counted string) to quickly read selected sheets in the workbook.
Each sheet in the workbook is stored after the workbook global section, beginning with BOF and ending with EOF. If you read the file in a continuous stream (instead of using the BOUNDSHEET records), you can test the dt field of each BOF record to determine the sheet type.
Simple Save (New for BIFF7)
Microsoft Excel 7.0, uses a new simple save method from OLE 2, which was developed to increase performance. If a workbook contains no Visual Basic modules, no PivotTables, and no embedded objects on worksheets, then Microsoft Excel uses the simple save method.
When Microsoft Excel 7.0 saves a workbook using simple save, the streams in the file must be at least 4kbytes long. The OLE 2 code adds padding bytes to the streams to ensure that they are at least 4kbytes long. If you use a low-level binary viewer to examine the resulting file, you will see the padding bytes appearing as "garbage" at the end of the streams.
To find the actual end of the Book stream, you can increment a counter every time you read a BOF record and then decrement it every time you read an EOF record. When the counter reaches zero, then you have read the last EOF in the Book stream and you can ignore the rest of the bytes in that stream.
Other Microsoft Excel File Formats
Although chart records are written as part of the Book stream, they are documented in another article. Microsoft Excel creates several other files, some of which are documented. The workspace file is documented in a separate article. The toolbar file (.XLB extension in Microsoft Windows) is not documented.
This article contains BIFF documentation for Microsoft Excel versions 5.0 and 7.0 only. Earlier versions of BIFF documentation are available on the Development Library (published by the Microsoft Developer Network). The Library is a CD-based reference source for Windows-based developers. For more information about this service, contact Microsoft Developer Network via email (devnetwk@microsoft.com), via Compuserve (>INTERNET:devnetwk@microsoft.com), or call (800) 759-5474.
BIFF Record Information
Although different BIFF record types contain different information, every record has the same basic format. All BIFF records consist of the following three sections:
Record Number
This 16-bit word identifies the record. The hexadecimal value of the record number is included in parentheses in the heading of the record description. For example, the EOF record's heading appears in this article as "EOF: End of File (0Ah)."
Record Data Length
This 16-bit word equals the length of the following record data, in bytes. The record length depends on the type of data in the record. For example, the EOF record is always the same length, while a FORMULA record varies in length depending on the length of the formula itself.
Record Data
This is the portion of the record containing the actual data that describes the formula, window, object, and so on.
The format for all BIFF records is described in the following table.
Offset
Length (bytes)
Contents
0
2
Record number
2
2
Record data length
4
Variable
Record data
A BIFF record has a length limit of 2084 bytes, including the record type and record length fields. Therefore, the record data field must be no longer than 2080 bytes. A large data object has a parent record and then one or more CONTINUE records to store the data. For example, embedded bitmap graphic objects often use a parent IMDATA record and several CONTINUE records.
If a field (or a bit in a field) is marked "Reserved," then your application should treat the field or bit as a "don't-care" when you read or write the BIFF file. If a field (or bit in a field) is marked "Reserved; must be zero," then you must write zeros to the field or bit when you write a BIFF file.
Byte Swapping
Microsoft Excel BIFF files are transportable across the MS-DOS/Windows (Intel® 80x86), and Apple Macintosh (Motorola® 680x0) operating systems, among others. To support transportability, Microsoft Excel writes BIFF files in the 80x86 format, where the low-order byte of the word appears first in the file, followed by the high-order byte. For example, the BOF record consists of six 16-bit words:
0809 0008 0500 0005 096C 07C9
and it appears in a BIFF file as:
09 08 08 00 00 05 05 00 6C 09 C9 07
Whenever Microsoft Excel for the Macintosh reads or writes a BIFF file, it calls a function that swaps the high- and low-order bytes of every 16-bit word in every record in the file. For 32-bit longs, the bytes in each 16-bit word are swapped first, and then the two 16-bit words are swapped. Be sure to include a byte-swap function in any custom BIFF utility you write for the Macintosh.
BIFF Versions
You can determine the BIFF version (and infer the Microsoft Excel version that wrote the file) by testing the high-order byte of the record number of the BOF record. For example, the first BOF record in a BIFF5/BIFF7 workbook file is:
09 08 08 00 00 05 05 00 6C 09 C9 07 (08h in high-order byte)
while the BOF record in a BIFF4 chart file is:
09 04 06 00 00 00 20 00 00 00 (04h in high-order byte)
In BIFF4 and earlier, some records (other than the BOF record) contained version information in the high-order byte of their record numbers. This proved to be redundant, so for BIFF5 and later, Microsoft Excel determines the BIFF version by reading the high-order byte of the record number of the BOF record.
Beginning with this version of the Microsoft Excel Developer's Kit, record numbers are documented as full 16-bit numbers instead of 8-bit numbers with a high-order version number byte. For example, the DIMENSIONS record is now documented as record number 200h instead of 00h. This causes the records to appear in a different order in the following table, "BIFF Records: Record Number Order."
Indexing in BIFF Records
In BIFF files, rows and columns are always stored zero-based, rather than with an offset of one as they appear in a sheet. For example, cell A1 is stored as row 0 (rw = 00h), column 0 (col = 00h); cell B3 is row 2 (rw = 02h), column 1 (col = 01h), and so on.
In most cases, you can use the variable-naming conventions in this article to determine if a variable is zero-based. Variable names that begin with the letter i are usually indexes, which are zero-based. For example, the variable ixfe occurs in every cell record; it is a zero-based index into the table of XF records. Variable names that begin with the letter c are usually counts, which are one-based. For example, many records contain a cch, which is a count of characters in the following string.
Undefined Cells in the Sheet
To reduce file size, cells that don't contain values or formulas and aren't referenced by formulas in any other cell are considered to be undefined cells. Such undefined cells don't appear in the BIFF file.
For example, if a worksheet has a value in cell A3, and the formula =A3+A4 in cell B10, then the only defined cells on the worksheet are A3, A4, and B10. No other cells need to exist.
Using this technique, entire rows can be undefined if they have no defined cells in them. In the preceding example, only rows 3, 4, and 10 are defined, so the file contains only three ROW records.
Cell Records
The term "cell records" refers to the BIFF record types that actually contain cell data. Cell records that appear in BIFF5/BIFF7 files are shown in the following table.
Record
Contents
ARRAY
An array-entered formula
BLANK
An empty cell
BOOLERR
A Boolean or error value
FORMULA
A cell formula, stored as parse tokens
LABEL
A string constant
NUMBER
An IEEE floating-point number
MULBLANK
Multiple empty cells (new to BIFF5)
MULRK
Multiple RK numbers (new to BIFF5)
RK
An RK number
RSTRING
Cell with character formatting
SHRFMLA
A shared formula (new to BIFF5)
STRING
A string that represents the result of a formula
Microsoft Excel stores cell records in blocks that have at most 32 rows. Each row that contains cell records has a corresponding ROW record in the block, and each block contains a DBCELL record at the end of the block. For more information about row blocks and about optimizing your code when searching for cell records, see Finding Cell Records in BIFF Files.
BIFF Record Order
BIFF record order has changed as the file format has evolved. The simplest way to determine BIFF record order is to create a workbook in Microsoft Excel and then use the BiffView utility to examine the record order.
BIFF Utilities
There are two BIFF utilities in the Microsoft Solutions Development Kit. Use the BiffView utility (BIFFVIEW.EXE, a Windows program) to examine BIFF5/BIFF7 workbook files. Use DUMPBIFF.EXE (an MS-DOS program) to examine the BIFF5/BIFF7 workspace file.
BIFF Records: Alphabetical Order
Number
Record
22
1904: 1904 Date System
87
ADDIN: Workbook Is an Add-in Macro
C2
ADDMENU: Menu Addition
221
ARRAY: Array-Entered Formula
9E
AUTOFILTER: AutoFilter Data
9D
AUTOFILTERINFO: Drop-Down Arrow Count
40
BACKUP: Save Backup Version of the File
201
BLANK: Cell Value, Blank Cell
809
BOF: Beginning of File
DA
BOOKBOOL: Workbook Option Flag
205
BOOLERR: Cell Value, Boolean or Error
29
BOTTOMMARGIN: Bottom Margin Measurement
85
BOUNDSHEET: Sheet Information
0C
CALCCOUNT: Iteration Count
0D
CALCMODE: Calculation Mode
42
CODEPAGE: Default Code Page
7D
COLINFO: Column Formatting Information
3C
CONTINUE: Continues Long Records
A9
COORDLIST: Polygon Object Vertex Coordinates
8C
COUNTRY: Default Country/Region and WIN.INI Country/Region
5A
CRN: Nonresident Operands
D7
DBCELL: Stream Offsets
50
DCON: Data Consolidation Information
52
DCONNAME: Data Consolidation Named References
51
DCONREF: Data Consolidation References
225
DEFAULTROWHEIGHT: Default Row Height
55
DEFCOLWIDTH: Default Width for Columns
C3
DELMENU: Menu Deletion
10
DELTA: Iteration Increment
200
DIMENSIONS: Cell Table Size
B8
DOCROUTE: Routing Slip Information
88
EDG: Edition Globals
0A
EOF: End of File
16
EXTERNCOUNT: Number of External References
223
EXTERNNAME: Externally Referenced Name
17
EXTERNSHEET: External Reference
2F
FILEPASS: File Is Password-Protected
5B
FILESHARING: File-Sharing Information
1A5h
FILESHARING2: File-Sharing Info for Shared Lists (BIFF7)
9B
FILTERMODE: Sheet Contains Filtered List
9C
FNGROUPCOUNT: Built-in Function Group Count
9A
FNGROUPNAME: Function Group Name
231
FONT: Font Description
15
FOOTER: Print Footer on Each Page
41E
FORMAT: Number Format
406
FORMULA: Cell Formula
AB
GCW: Global Column Width Flags
82
GRIDSET: State Change of Gridlines Option
80
GUTS: Size of Row and Column Gutters
83
HCENTER: Center Between Horizontal Margins
14
HEADER: Print Header on Each Page
8D
HIDEOBJ: Object Display Options
1B
HORIZONTALPAGEBREAKS: Explicit Row Page Breaks
7F
IMDATA: Image Data
20B
INDEX: Index Record
E2
INTERFACEEND: End of User Interface Records
E1
INTERFACEHDR: Beginning of User Interface Records
11
ITERATION: Iteration Mode
204
LABEL: Cell Value, String Constant
26
LEFTMARGIN: Left Margin Measurement
95
LHNGRAPH: Named Graph Information
94
LHRECORD: .WK? File Conversion Information
98
LPR: Sheet Was Printed Using LINE.PRINT( )
C1
MMS: ADDMENU/DELMENU Record Group Count
BE
MULBLANK: Multiple Blank Cells
BD
MULRK: Multiple RK Cells
218
NAME: Defined Name
1C
NOTE: Note Associated with a Cell
203
NUMBER: Cell Value, Floating-Point Number
5D
OBJ: Describes a Graphic Object
63
OBJPROTECT: Objects Are Protected
D3
OBPROJ: Visual Basic Project
DE
OLESIZE: Size of OLE Object
92
PALETTE: Color Palette Definition
41
PANE: Number of Panes and Their Position
13
PASSWORD: Protection Password
4D
PLS: Environment-Specific Print Record
0E
PRECISION: Precision
2B
PRINTGRIDLINES: Print Gridlines Flag
2A
PRINTHEADERS: Print Row/Column Labels
12
PROTECT: Protection Flag
89
PUB: Publisher
B9
RECIPNAME: Recipient Name
0F
REFMODE: Reference Mode
27
RIGHTMARGIN: Right Margin Measurement
27E
RK: Cell Value, RK Number
208
ROW: Describes a Row
D6
RSTRING: Cell with Character Formatting
5F
SAVERECALC: Recalculate Before Save
AF
SCENARIO: Scenario Data
AE
SCENMAN: Scenario Output Data
DD
SCENPROTECT: Scenario Protection
A0
SCL: Window Zoom Magnification
1D
SELECTION: Current Selection
A1
SETUP: Page Setup
BC
SHRFMLA: Shared Formula
90
SORT: Sorting Options
96
SOUND: Sound Note
99
STANDARDWIDTH: Standard Column Width
207
STRING: String Value of a Formula
293
STYLE: Style Information
91
SUB: Subscriber
C5
SXDI: Data Item
DC
SXEXT: External Source Information
D5
SXIDSTM: Stream ID
B4
SXIVD: Row/Column Field IDs
B5
SXLI: Line Item Array
B6
SXPI: Page Item
CD
SXSTRING: String
D0
SXTBL: Multiple Consolidation Source Info
D2
SXTBPG: Page Item Indexes
D1
SXTBRGIITM: Page Item Name Count
B1
SXVD: View Fields
B2
SXVI: View Item
B0
SXVIEW: View Definition
E3
SXVS: View Source
13Dh
TABID: Sheet Tab Index Array (BIFF7)
EAh
TABIDCONF: Sheet Tab ID of Conflict History (BIFF7)
236
TABLE: Data Table
60
TEMPLATE: Workbook Is a Template
28
TOPMARGIN: Top Margin Measurement
DF
UDDESC: Description String for Chart Autoformat
5E
UNCALCED: Recalculation Status
84
VCENTER: Center Between Vertical Margins
1A
VERTICALPAGEBREAKS: Explicit Column Page Breaks
3D
WINDOW1: Window Information
23E
WINDOW2: Sheet Window Information
19
WINDOWPROTECT: Windows Are Protected
5C
WRITEACCESS: Write Access User Name
86
WRITEPROT: Workbook Is Write-Protected
81
WSBOOL: Additional Workspace Information
59
XCT: CRN Record Count
E0
XF: Extended Format