exFAT Cluster Heap
Understanding of underlying mechanisms of data storage, organization and data recovery.
The cluster heap is a set of clusters which hold data in exFAT. It contains:
- Root Directory
- Files
- Directories
- Allocation Bitmap
- Up-case Table
The allocation status of clusters in the cluster heap is tracked by the Allocation Bitmap, which is itself located inside the cluster heap.
Allocation Bitmap
The Allocation Bitmap keeps track of the allocation status of clusters. Unlike in the FAT16/FAT32 file systems, the FAT does not serve this purpose. The Allocation Bitmap consists of 8-bit bytes that are treated as a sequence of bits, where each bit corresponds to one data cluster: a value of 1 means the cluster is occupied, and 0 means it is free. The least significant bit of the first byte refers to the first cluster, i.e. cluster 2.
Offset | Size | Description | Comments |
---|---|---|---|
0x00 | 1 | 1st byte | Clusters 2-9 |
0x01 | 1 | 2nd byte | Clusters 10-17 |
0x02 | 1 | 3rd byte | Clusters 18-25 |
… |
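The bit layout in the table above can be sketched in a few lines of Python. This is a minimal illustration, assuming the bitmap has already been read into a `bytes` object. The function names `is_cluster_allocated` and `free_clusters` are hypothetical helpers introduced here, not part of any exFAT library.

```python
from typing import List


def is_cluster_allocated(bitmap: bytes, cluster: int) -> bool:
    """Return True if `cluster` is marked as occupied in the bitmap."""
    if cluster < 2:
        raise ValueError("cluster numbering in the cluster heap starts at 2")
    # Bit 0 (least significant) of byte 0 describes cluster 2.
    byte_index, bit_in_byte = divmod(cluster - 2, 8)
    if byte_index >= len(bitmap):
        raise IndexError("cluster is beyond the end of the bitmap")
    return bool((bitmap[byte_index] >> bit_in_byte) & 1)


def free_clusters(bitmap: bytes, cluster_count: int) -> List[int]:
    """List the free clusters among the first `cluster_count` clusters."""
    return [
        cluster
        for cluster in range(2, cluster_count + 2)
        if not is_cluster_allocated(bitmap, cluster)
    ]


# Made-up sample data: clusters 2, 3, 4 and 10 are occupied.
sample_bitmap = bytes([0b00000111, 0b00000001])
print(is_cluster_allocated(sample_bitmap, 4))   # True
print(free_clusters(sample_bitmap, 10))         # [5, 6, 7, 8, 9, 11]
```

Subtracting 2 before splitting into a byte index and a bit index reflects the fact that cluster numbers 0 and 1 do not exist in the cluster heap.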
The Allocation Bitmap resides in the cluster heap and is referenced by an Allocation Bitmap directory entry in the root directory.
A volume may contain two Allocation Bitmaps (the case for TexFAT volumes, which keep two FATs); otherwise there is only one. The NumberOfFats field in the Boot Sector determines the number of valid Allocation Bitmap directory entries in the root directory and, with it, the number of Allocation Bitmaps.
Up-case Table
The Up-case Table contains the data used to convert lower-case characters to upper-case. The File Name directory entry stores names in Unicode and preserves their case. exFAT itself is case-insensitive, so during search operations it compares file names after converting them to upper case.
Normally the Up-case Table is located right after the Allocation Bitmap, but it can be placed anywhere in the cluster heap. It has a corresponding primary critical directory entry in the root directory.
The Up-case Table is an array of Unicode characters: the index is the character to be up-cased and the value is the resulting up-cased character. The Up-case Table shall contain at least the 128 mandatory Unicode mappings. An implementation that supports only the mandatory 128 characters may ignore the rest of the Up-case Table. When up-casing file names, such an implementation shall up-case only characters from the mandatory 128-character set and leave other characters intact. When comparing file names that differ only in characters from the non-mandatory set, it shall treat those file names as equal.
Index | Value | Comments |
---|---|---|
0x0000 | 0x0000 | |
0x0001 | 0x0001 | |
0x0002 | 0x0002 | |
… | … | .. |
0x0041 | 0x0041 | ‘A’ is mapped into itself (identity mapping) |
0x0042 | 0x0042 | ‘B’ is mapped into itself |
.. | .. | .. |
0x0061 | 0x0041 | ‘a’ is mapped into ‘A’ (non-identity mapping) |
0x0062 | 0x0042 | ‘b’ is mapped into ‘B’ |
.. | .. | .. |
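The index/value lookup shown in the table can be sketched as follows. This is only an illustration, assuming an implementation that supports just the mandatory 128 entries. The helper names (`build_mandatory_upcase_table`, `upcase_name`, `names_match`) are hypothetical, and Python strings stand in for the UTF-16 code units that exFAT stores on disk. The sketch leaves characters outside the table intact and does not implement the rule that treats names differing only in non-mandatory characters as equal.

```python
from typing import List


def build_mandatory_upcase_table() -> List[int]:
    """Build the 128 mandatory mappings: identity except 'a'-'z' -> 'A'-'Z'."""
    table = list(range(128))
    for code in range(0x61, 0x7B):        # 'a' (0x61) through 'z' (0x7A)
        table[code] = code - 0x20         # corresponding upper-case letter
    return table


def upcase_name(name: str, table: List[int]) -> str:
    """Up-case characters covered by the table; leave all others intact."""
    result = []
    for ch in name:
        code = ord(ch)
        result.append(chr(table[code]) if code < len(table) else ch)
    return "".join(result)


def names_match(a: str, b: str, table: List[int]) -> bool:
    """Case-insensitive comparison of two names via the up-case table."""
    return upcase_name(a, table) == upcase_name(b, table)


mandatory_table = build_mandatory_upcase_table()
print(upcase_name("Report.txt", mandatory_table))                  # REPORT.TXT
print(names_match("readme.TXT", "README.txt", mandatory_table))    # True
```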
The Up-case Table can be stored in a compressed format, in which a run of identity mappings is represented by 0xFFFF followed by the number of identity mappings in the run.
Mandatory First 128 Up-case Table Entries
Index | Table Entries |
---|---|
0000 | 0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F |
0010 | 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 001A 001B 001C 001D 001E 001F |
0020 | 0020 0021 0022 0023 0024 0025 0026 0027 0028 0029 002A 002B 002C 002D 002E 002F |
0030 | 0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 003A 003B 003C 003D 003E 003F |
0040 | 0040 0041 0042 0043 0044 0045 0046 0047 0048 0049 004A 004B 004C 004D 004E 004F |
0050 | 0050 0051 0052 0053 0054 0055 0056 0057 0058 0059 005A 005B 005C 005D 005E 005F |
0060 | 0060 **0041 0042 0043 0044 0045 0046 0047 0048 0049 004A 004B 004C 004D 004E 004F** |
0070 | **0050 0051 0052 0053 0054 0055 0056 0057 0058 0059 005A** 007B 007C 007D 007E 007F |
Non-identity mappings are highlighted in bold.
Mandatory First 128 Up-case Table Entries in compressed format
Index | Table Entries |
---|---|
0000 | **FFFF 0061** 0041 0042 0043 0044 0045 0046 0047 0048 0049 004A 004B 004C 004D 004E |
0010 | 004F 0050 0051 0052 0053 0054 0055 0056 0057 0058 0059 005A **FFFF 0005** |
The first highlighted group (FFFF 0061) indicates that the first 0x0061 characters (0x0000-0x0060) have identity mappings. The entries that follow map the next characters one by one, starting with 0x0061, which maps to 0x0041, until the next compressed group is encountered. The last highlighted group (FFFF 0005) covers the five remaining characters (0x007B-0x007F), which also have identity mappings.
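The expansion of the compressed format described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the on-disk entries are read as little-endian 16-bit values, and every 0xFFFF entry is treated as a run marker followed by a run length.

```python
import struct
from typing import List


def entries_from_bytes(raw: bytes) -> List[int]:
    """Split raw table data into 16-bit little-endian entries."""
    count = len(raw) // 2
    return list(struct.unpack("<%dH" % count, raw[:count * 2]))


def decompress_upcase_table(entries: List[int]) -> List[int]:
    """Expand 0xFFFF identity runs into a flat table indexed by character code."""
    table: List[int] = []
    i = 0
    while i < len(entries):
        value = entries[i]
        if value == 0xFFFF:
            if i + 1 >= len(entries):
                raise ValueError("0xFFFF marker without a run length")
            run_length = entries[i + 1]
            start = len(table)            # next character code to be described
            table.extend(range(start, start + run_length))
            i += 2
        else:
            table.append(value)           # explicit mapping for the next character
            i += 1
    return table


# The compressed mandatory table: FFFF 0061, 0041..005A, FFFF 0005.
compressed = [0xFFFF, 0x0061] + list(range(0x41, 0x5B)) + [0xFFFF, 0x0005]
expanded = decompress_upcase_table(compressed)
print(len(expanded))              # 128
print(hex(expanded[0x61]))        # 0x41
```

Because each identity run starts at the current length of the output table, the index of every explicit entry stays aligned with the character code it describes.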