Package bw :: Package poseidon :: Module diskindex
[frames] | no frames]

Module diskindex

This file contains code related to on-disk indexes which are used for the .i files created in the db/[db_name]/data folder.

The DiskIndex class is an abstract class which has most methods raise NotImplementedError. The OceanIndex, BSDDiskIndex and SeaIndex classes extend the DiskIndex class and provide custom implementations of the following methods:

The following table explains which persistence engines use which index.

Indexes are periodically cleaned. At this point, they should also be committed.

Basic disk format is described as follows:

2 byte signature:

       {'O', 'I'}

4 byte version number (network byte order - big endian)

currently this is 2

256 long long numbers (8 bytes each) in network byte order

N-th entry of this table records the number of items in the corresponding pack, the first byte of whose uuid is less than or equal to N. This is called the 'first-level fan-out' table. Note: This implies that the maximum number of items which can be stored in this diskindex is 2**64 (which should be enough for some time)

Sorted list of uuids.

Each id is 16 bytes. They are saved without any additional information or delimiters so the cache footprint can be minimized. Currently the search algorithm is linear (but avoids iterating through every byte). We can improve this to binary search later without affecting the structure.

Corresponding list of items The first item corresponds to the first uuid in the uuid table and so on. The structure of each item is different for OceanIndex and SeaIndex

OceanIndex:

       "!dQL"   # (time = 8 bytes,
                               offset = 8 bytes,
                               length = 4 bytes
                               ) = 20 bytes

OceanIndex items needs to store the transaction time, offset of the latest version of the data and length of the data. Given this information, the header of the data entry doesn't need to be read at all (it is there for redundancy and index recovery.)

SeaIndex:

       "!dQQQLH" # (time = 8 bytes,
                               offset = 8 bytes, 
                               previous_tablespace_offset = 8 bytes,
                               total_rowcount = 8 bytes,
                               last_table_rowcount = 4 bytes,
                               clustersize = 2 bytes
                               ) = 38 bytes

SeaIndex items need to store the transaction_time, offset of the last tablespace, offset of the second-to-last tablespace - which is total number of rows which is required during reads to audit the data read, last_table_rowcount which is needed to be able to ignore null bytes in a table which are unused, and the clustersize of the last tablespace which is calculated by raising 2 to this power. e.g. 2**clustersize.


Date: 15 Feb 2008

Author: Prateek Sureka

Copyright: Copyright 2005-2008 Brainwave Corp.

License: Proprietary

Classes
  DiskIndex
This is the top-level abstract class which defines the interface for all DiskIndexes in use.
  BSDDiskIndex
The BSDDiskIndex uses bsddb as the underlying disk-index implementation.
  OceanIndex
This is the main implementation of OceanIndex new in version 2.
  SeaIndex
This class is exactly the same as OceanIndex.
  TestDiskIndex
Unittest to audit bytes written by OceanIndex and SeaIndex

Imports: bsddb, deque, defaultdict, os, sys, dropwhile, izip, islice, chain, marshal, operator, struct, shutil, Struct, time, types, threading, uuid, Lock, IOCorruptionError, VersionMismatchError, unittest, random