Interface ( BeeDict : BeeStorage : BeeIndex ) Examples : Structure : Support : Download : Copyright & License : History : Home | Version 2.0.2 |
XXX This documentation is unfinished... the software itself has been in use for quite a while, but the documentation still needs to be completed. For now, all I can offer you is to read the code which is well documented.
mxBeeBase is a high performance construction kit for disk
based indexed databases. It offers components which you can
plug together to easily build your own custom mid-sized
databases (the current size limit is
sizeof(long)
which gives you an address range
of around 2GB on 32-bit platforms).
The two basic building blocks in mxBeeBase are storage and index. Storage is implemented as variable record length data storage with integrated data protection features, automatic data recovery and locking for multi process access. Indexes use a high performance optimized B+Tree implementation built on top of Thomas Niemann's Cookbook B+Tree implementation.
Note: Due to the file locking mechanism currently used in mxBeeBase, the package will only work on systems which support symbolic links, e.g. Windows does not support symbolic links and therefore parts of mxBeeBase don't work on that platform.
The BeeDict module provides two high level on-disk dictionary implementations. The first (BeeDict) can work with arbitrary hashable key objects, while the second (BeeStringDict) uses limited sized strings as basis providing slightly better performance. Both variants need pickleable Python objects as keys and values.
Data transfer to and from the dictionaries is done in the
same way as for in-memory dictionaries, e.g. d['key']
= 1; print d['key']; del d['key]
, so usage should be
transparent to Python programs using either in-memory or
on-disk dictionaries. Not all dictionary methods are
understood though.
BeeDict objects are on-disk dictionaries which use a hash-to-address index. Both Keys and values must be pickleable and can have arbitrary size (keys shouldn't be too long though); keys have to be hashable.
Hash collisions are treated by sequential reads of all records with the same hash value and testing for equality of keys. This can be expensive !
BeeDicts use a BeeStorage.BeeKeyValueStorage instance as storage object and a BeeIndex.BeeIntegerIndex instance as index object.
BeeDict objects are constructed using:
BeeDict(name,min_recordsize=0,readonly=0,recover=0,
autocommit=0,validate=0)
name
as basename for
the data and index files. Two files will be created:
name.dat and name.idx.
min_recordsize
is passed to the BeeStorage
as indicator of the minimum size for data
records. readonly
can be set to true to
open the files in read-only mode, preventing any disk
modifications.
To open the dictionary in recovery mode, pass a keyword
recover=1
. Then run .recover()
and reopen using the normal settings. The
AutoRecover()
wrapper can take care of this
action for you automatically.
If autocommit
is true the cache control
will do an automatic .commit()
whenever the
transaction log overflows.
If validate
is true, the dictionary will
run a validation check after having successfully opened
storage and index. RecreateIndexError
or
RecoverError
exceptions could be raised in
case inconsistencies are found.
BeeDict objects have the following methods:
flush()
close()
This does not issue a .commit()
, so the
current transaction is rolled back.
commit()
rollback()
changed()
has_key()
Successfully found items are put in the cache for fast subsequent access.
get(key,default=None)
This first tries to read the item from cache and reverts to the disk storage if it is not found.
cursor(key,default=None)
In case the key is not found, default is returned instead.
Note that cursors operate with the data on disk meaning that any uncommitted changes will not be seen by the cursor.
garbage()
This amount would be freed if .collect() were run.
collect()
Storage collection can only be done for writeable dictionaries and then only if the current transaction does not contain any pending changes.
This can take a while depending on the size of the dictionary.
recover()
keys()
The method does not load any data into the cache, but does take notice of uncommitted changes.
values()
The method does not load any data into the cache, but does take notice of uncommitted changes.
items()
The method does not load any data into the cache, but does take notice of uncommitted changes.
BeeDict objects have the following read-only attributes:
name
closed
readonly
autocommit
FirstKey
LastKey
BeeDickCursor objects are intended to iterate over the database one item at a time without the need to read all keys. You can then read/write to the current cursor position and thus modify the dictionary in place.
Note that modifying the targetted dictionary while using a cursor can cause the cursor to skip new entries or fail due to deleted items. Especially deleting the key to which the cursor currently points can cause errors to be raised. In all other cases, the cursor will be repositioned.
BeeDictCursor objects are constructed using the BeeDict
.cursor()
method.
BeeDictCursor objects have the following methods:
position(key,value=None)
key may also be FirstKey or LastKey (in which case value has to be None).
next()
Returns true on success, false if the end-of-data has been reached.
prev()
Returns true on success, false if the end-of-data has been reached.
read()
read_key()
write()
The new data is not written to disk until the dictionary's current transaction is committed.
BeeDictCursor don't have any useful attributes. Use the instance methods to query the key and value objects.
BeeStringDict objects are on-disk dictionaries which use a limited size string to address index. Values must be pickleable and can have arbitrary size.
Since hash collisions cannot occur this dictionary type may have some performance advantages over the standard BeeDict dictionary.
BeeStringDict objects are constructed using:
BeeStringDict(name,keysize=10,min_recordsize=0,readonly=0,
recover=0,autocommit=0,validate=0)
name
as basename for
the data and index files. Two files will be created:
name.dat and name.idx.
keysize
gives the maximal size of the
strings used as index keys. min_recordsize
is passed to the BeeStorage as indicator of the minimum
size for data records. readonly
can be set
to true to open the files in read-only mode, preventing
any disk modifications.
To open the dictionary in recovery mode, pass a keyword
recover=1
. Then run .recover()
and reopen using the normal settings. The
AutoRecover()
wrapper can take care of this
action for you automatically.
If autocommit
is true the cache control
will do an automatic .commit()
whenever the
transaction log overflows.
If validate
is true, the dictionary will
run a validation check after having successfully opened
storage and index. RecreateIndexError
or
RecoverError
exceptions could be raised in
case inconsistencies are found.
Note that the keysize is currently not stored in the dictionary itself -- you'll have to store this information in some other form. This may change in future versions.
BeeStringDict objects have the following methods:
commit()
rollback()
changed()
has_key()
Successfully found items are put in the cache for fast subsequent access.
get(key,default=None)
This first tries to read the item from cache and reverts to the disk storage if it is not found.
cursor(key,default=None)
In case the key is not found, default is returned instead.
Note that cursors operate with the data on disk meaning that any uncommitted changes will not be seen by the cursor.
garbage()
This amount would be freed if .collect() were run.
collect()
Storage collection can only be done for writeable dictionaries and then only if the current transaction does not contain any pending changes.
This can take a while depending on the size of the dictionary.
recover()
keys()
The method will raise an error if there are uncommitted changes pending. Output is sorted ascending according to keys.
values()
The method will raise an error if there are uncommitted changes pending. Output is sorted ascending according to keys.
items()
The method will raise an error if there are uncommitted changes pending. Output is sorted ascending according to keys.
BeeDict objects have the following read-only attributes:
name
closed
readonly
autocommit
FirstKey
LastKey
BeeDickCursor objects are intended to iterate over the database one item at a time without the need to read all keys. You can then read/write to the current cursor position and thus modify the dictionary in place.
Note that modifying the targetted dictionary while using a cursor can cause the cursor to skip new entries or fail due to deleted items. Especially deleting the key to which the cursor currently points can cause errors to be raised. In all other cases, the cursor will be repositioned.
BeeStringDictCursor objects are constructed using the BeeStringDict
.cursor()
method.
BeeStringDictCursor objects have the following methods:
position(key,value=None)
key may also be FirstKey or LastKey (in which case value has to be None).
next()
Returns true on success, false if the end-of-data has been reached.
prev()
Returns true on success, false if the end-of-data has been reached.
read()
read_key()
write()
The new data is not written to disk until the dictionary's current transaction is committed.
BeeStringDictCursor objects have the following read-only attributes:
key
These functions are available:
AutoRecover(Class,*args,**kws)
Example: diskdict = AutoRecover(BeeDict.BeeDict,
'test')
These constants are available:
Error
RecreateIndexError
RecoverError
FirstKey
LastKey
XXX Not yet documented. Please refer to the source code.
XXX Not yet documented. Please refer to the source code.
Here is a very simple one:
from mx import BeeBase #...XXX not written yet...
More examples will eventually appear in the Examples
subdirectory of the package.
Entries enclosed in brackets are packages (i.e. they are
directories that include a __init__.py file). Ones
without brackets are just simple subdirectories that are not
accessible via
The package imports all symbols from the BeeBase sub module
which in turn imports the extension module, so you only need
to '
eGenix.com is providing commercial support for this
package. If you are interested in receiving information
about this service please see the eGenix.com
Support Conditions.
© 1999-2000, Copyright by Marc-André Lemburg;
All Rights Reserved. mailto: mal@lemburg.com
© 2000-2001, Copyright by eGenix.com Software GmbH,
Langenfeld, Germany; All Rights Reserved. mailto: info@egenix.com
This software is covered by the eGenix.com Public
License Agreement. The text of the license is also
included as file "LICENSE" in the package's main directory.
By downloading, copying, installing or otherwise using
the software, you agree to be bound by the terms and
conditions of the eGenix.com Public License
Agreement.
Parts of this package are based on an ANSI C implementation
of a B+Tree implementation written by Thomas Nieman,
Portland, Oregon. The files in question are btr.c and btr.h
which were heavily modified for the purpose of inclusion in
this package by the above author.
The original files were extracted from btr.c -- an ANSI C
implementation included in the source code distribution of
From the cookbook:
Things that still need to be done:
There were no significant changes between 2.0.0 and 2.0.3.
Version 2.0.0 was the intial release.
© 1998-2000, Copyright by Marc-André Lemburg;
All Rights Reserved. mailto: mal@lemburg.com
© 2000-2001, Copyright by eGenix.com Software GmbH;
All Rights Reserved. mailto: info@egenix.com
Package Structure
[BeeBase]
Doc/
[mxBeeBase]
test.py
BeeBase.py
BeeDict.py
BeeIndex.py
BeeStorage.py
showBeeDict.py
import
.
import BeeBase
' to start working.
Support
Copyright & License
SORTING AND SEARCHING ALGORITHMS: A COOKBOOK
by THOMAS NIEMANN Portland, Oregon
home: http://epaperpress.com/
Permission to reproduce this document, in whole or in part,
is given provided the original web site listed below is
referenced, and no additional restrictions apply. Source
code, when part of a software project, may be used freely
without reference to the author.
History & Future