Python - Our key to Efficiency

mxTools - A Collection of New Builtins for Python


Version 2.0.3

Introduction

    Over the years I have often thought, "Hey, why not have this as a builtin?" In most cases the functions were easily coded in Python. But I started to use them quite heavily, and since performance is always an issue (at least for me: hits/second pay my bills), I decided to code them in C. Well, that's how it started, and here we are now with an ever-growing number of goodies...

    The functions defined by the C extensions are installed by the package at import time in different places of the Python interpreter. They work as fast add-ons to the existing set of functions and objects.

New Builtin Functions

    The following functions are installed as Python builtin functions at package import time. They are then available as normal builtin functions in every module without an explicit import in each module using them (though it is still good practice to put an 'import mx.Tools.NewBuiltins' at the top of each module relying on these add-ons).

    indices(object)
    Returns the same as tuple(range(len(object))) -- a tad faster and a lot easier to type.

    trange([start=0,]stop[,step=1])
    This works like the builtin function range() but returns a tuple instead of a list. Since range() is most often used in for-loops there really is no need for a mutable data type and construction of tuples is somewhat (20%) faster than that of lists. So changing the usage of range() in for-loops to trange() pays off in the long run.

    range_len(object)
    Returns the same as range(len(object)).

    tuples(sequence)
    Returns much the same as apply(map,(None,)+tuple(sequence)) does, except that the resulting list will always have the length of the first sub-sequence in sequence. The function returns a list of tuples (a[0], b[0], c[0],...), (a[1], b[1], c[1],...), ... with missing elements being filled in with None.

    Note that the function is of the single argument type meaning that calling tuples(a,b,c) is the same as calling tuples((a,b,c)). tuples() can be used as inverse to lists().

    lists(sequence)
    Same as tuples(sequence), except that a tuple of lists is returned. Can be used as inverse to tuples().
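
    The behaviour described for tuples() and lists() can be approximated in plain Python. The following is only a sketch written for a current Python interpreter, not the C implementation; the names py_tuples, py_lists and _as_sequences are hypothetical stand-ins introduced here for illustration.

```python
def _as_sequences(args):
    # Single-argument convention: f(a, b, c) is treated like f((a, b, c)).
    return args[0] if len(args) == 1 else args


def py_tuples(*args):
    """Hypothetical pure-Python stand-in for tuples()."""
    seqs = [list(s) for s in _as_sequences(args)]
    if not seqs:
        return []
    length = len(seqs[0])  # the first sub-sequence fixes the result length
    # Missing elements in shorter sub-sequences are filled in with None.
    return [tuple(s[i] if i < len(s) else None for s in seqs)
            for i in range(length)]


def py_lists(*args):
    """Hypothetical stand-in for lists(): same data as a tuple of lists."""
    return tuple(list(row) for row in py_tuples(*args))
```

    Under these assumptions, py_lists(py_tuples(a, b)) gives back the original columns as lists, which is the sense in which the two functions act as inverses of each other.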

    reverse(sequence)
    Returns a tuple or list with the elements from sequence in reverse order. A tuple is returned, if the sequence itself is a tuple. In all other cases a list is returned.

    dict(items)
    Constructs a dictionary from the given items sequence. The sequence items must contain sequence entries with at least two values. The first one is interpreted as key, the second one as associated object. Remaining values are ignored.

    setdict(sequence,value=None)
    Constructs a dictionary from the given sequence. The sequence must contain hashable objects which are used as keys. The values are all set to value. Duplicate keys are silently ignored. The function comes in handy whenever you need to work with a sequence in a set-based context (e.g. to determine the set of used values).

    invdict(dictionary)
    Constructs a new dictionary from the given one with inverted mappings. Keys become values and vice versa. Note that no exception is raised if the values are not unique. The result is undefined in this case (there is a value:key entry, but it is not defined which key gets used).
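
    As a rough illustration of the three dictionary helpers above, here is a pure-Python sketch. The names py_dict, py_setdict and py_invdict are hypothetical; the real functions are implemented in C and may differ in details such as error handling.

```python
def py_dict(items):
    """Hypothetical stand-in for dict(items): key, value, rest ignored."""
    return {entry[0]: entry[1] for entry in items}


def py_setdict(sequence, value=None):
    """Hypothetical stand-in for setdict(): every key maps to value."""
    result = {}
    for key in sequence:
        result[key] = value  # duplicate keys simply overwrite each other
    return result


def py_invdict(dictionary):
    """Hypothetical stand-in for invdict(): values become keys."""
    # With non-unique values, which key survives is an accident of
    # iteration order, matching the "undefined" result described above.
    return {value: key for key, value in dictionary.items()}
```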

    irange(object[,indices])
    Builds a tuple of tuples (index,object[index]). If a sequence indices is given, the indices are read from it. If not, then the index sequence defaults to trange(len(object)).

    Note that object can be any object that can handle object[index], e.g. lists, tuples, strings, dictionaries, even your own objects, if they provide a __getitem__ method. This makes very nifty constructions possible, and extracting items from another sequence becomes a piece of cake. Give it a try! You'll soon love this little function.

    ifilter(condition,object[,indices])
    Builds a list of tuples (index,object[index]) such that condition(object[index]) is true and index is found in the sequence indices (defaulting to trange(len(object))). Order is preserved. condition must be a callable object.
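
    The indexed helpers irange() and ifilter() can be pictured with the following pure-Python sketch. It is an approximation under the semantics stated above; py_irange and py_ifilter are hypothetical names, and range() stands in for trange().

```python
def py_irange(obj, indices=None):
    """Hypothetical stand-in for irange(): tuple of (index, obj[index])."""
    if indices is None:
        indices = range(len(obj))
    return tuple((index, obj[index]) for index in indices)


def py_ifilter(condition, obj, indices=None):
    """Hypothetical stand-in for ifilter(): keep pairs passing condition."""
    if indices is None:
        indices = range(len(obj))
    # Order of the index sequence is preserved in the result.
    return [(index, obj[index]) for index in indices
            if condition(obj[index])]
```

    Because only obj[index] is used, the same sketch works for dictionaries when an explicit sequence of keys is passed as indices.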

    get(object,index[,default])
    Returns object[index], or, if that fails, default. If default is not given or is the singleton NotGiven, an error is raised (the error produced by the object).

    extract(object,indices[,defaults])
    Builds a list with entries object[index] for each index in the sequence indices.

    If a lookup fails and the sequence defaults is given, then defaults[nth_index] is used, where nth_index is the position of index within indices (so the first failed lookup in position n falls back to the n-th default). defaults should have the same length as indices.

    If you need the indices as well, try the irange function. The function raises an IndexError in case it can't find an entry in indices or defaults.
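
    A minimal pure-Python sketch of get() and extract() follows. The names py_get, py_extract and the private marker _MISSING are hypothetical; the sketch assumes that a failed lookup shows up as a LookupError (IndexError or KeyError), which may be narrower than what the C code catches.

```python
_MISSING = object()  # hypothetical marker playing the role of NotGiven


def py_get(obj, index, default=_MISSING):
    """Hypothetical stand-in for get(): obj[index] or default."""
    try:
        return obj[index]
    except LookupError:
        if default is _MISSING:
            raise  # re-raise the error produced by the object
        return default


def py_extract(obj, indices, defaults=_MISSING):
    """Hypothetical stand-in for extract(): list of obj[index] values."""
    result = []
    for position, index in enumerate(indices):
        try:
            result.append(obj[index])
        except LookupError:
            if defaults is _MISSING:
                raise
            # Fall back to the default at the same position as the index.
            result.append(defaults[position])
    return result
```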

    iremove(object,indices)
    Removes the items indexed by indices from object.

    This changes the object in place and thus is only possible for mutable types.

    For sequences the index list must be sorted ascending; an IndexError will be raised otherwise (and the object left in an undefined state).
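
    The reason for the ascending-order requirement becomes clearer in a sketch: deleting from the back keeps the remaining indices valid. The function py_iremove below is a hypothetical pure-Python approximation. Unlike the behaviour described above, it checks the order before changing anything, and it treats repeated indices as an error, which is an assumption of this sketch.

```python
def py_iremove(obj, indices):
    """Hypothetical stand-in for iremove(); mutates obj in place."""
    indices = list(indices)
    if hasattr(obj, 'keys'):
        # Mapping: the "indices" are keys and their order does not matter.
        for key in indices:
            del obj[key]
        return
    # Sequence: insist on strictly ascending indices before touching obj
    # (an assumption; the text only says "sorted ascending").
    if any(a >= b for a, b in zip(indices, indices[1:])):
        raise IndexError('indices must be sorted ascending')
    # Delete from the back so earlier deletions do not shift later indices.
    for index in reversed(indices):
        del obj[index]
```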

    findattr(object_list,attrname)
    Returns the first attribute with name attrname found among the objects in the list. Raises an AttributeError if the attribute is not found.

    attrlist(object_list,attrname)
    Returns a list of all attributes with name attrname found among the objects in the list.
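
    In pure Python the two attribute helpers could look roughly like this. The names py_findattr and py_attrlist are hypothetical, and the sketch assumes that objects lacking the attribute are simply skipped by attrlist().

```python
def py_findattr(object_list, attrname):
    """Hypothetical stand-in for findattr(): first matching attribute."""
    for obj in object_list:
        try:
            return getattr(obj, attrname)
        except AttributeError:
            continue
    raise AttributeError(attrname)


def py_attrlist(object_list, attrname):
    """Hypothetical stand-in for attrlist(): all matching attributes."""
    result = []
    for obj in object_list:
        try:
            result.append(getattr(obj, attrname))
        except AttributeError:
            pass  # assumption: objects without the attribute are skipped
    return result
```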

    napply(number_of_calls,function[,args=(),kw={}])
    Calls the given function number_of_calls times with the same arguments and returns a tuple with the return values. This is roughly equivalent to a for-loop that repeatedly calls apply(function,args,kw) and stores the return values in a tuple. Example: create a tuple of 10 random integers... l = napply(10,whrandom.randint,(0,10)).

    mapply(callable_objects[,args=(),kw={}])
    Creates a tuple of values by applying the given arguments to each object in the sequence callable_objects.

    This function has a functionality dual to that of map(). While map() applies many different arguments to one callable object, this function applies one set of arguments to many different callable objects.

    method_mapply(objects,methodname[,args=(),kw={}])
    Creates a tuple of values by applying the given arguments to each object's <methodname> method. The objects are processed as given in the sequence objects.

    A simple application is e.g. method_mapply([a,b,c],'method', (x,y)) resulting in a tuple (a.method(x,y), b.method(x,y), c.method(x,y)). Thanks to Aaron Waters for suggesting this function.
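
    The three apply-style helpers differ only in what varies between calls. The pure-Python sketch below spells this out; py_napply, py_mapply and py_method_mapply are hypothetical names, and the sketch uses argument unpacking instead of apply() so that it runs on a current Python interpreter.

```python
def py_napply(number_of_calls, function, args=(), kw=None):
    """Same function, same arguments, called number_of_calls times."""
    kw = kw or {}
    return tuple(function(*args, **kw) for _ in range(number_of_calls))


def py_mapply(callable_objects, args=(), kw=None):
    """Many callables, one shared set of arguments."""
    kw = kw or {}
    return tuple(func(*args, **kw) for func in callable_objects)


def py_method_mapply(objects, methodname, args=(), kw=None):
    """Many objects, one method name, one shared set of arguments."""
    kw = kw or {}
    return tuple(getattr(obj, methodname)(*args, **kw) for obj in objects)
```

    For instance, py_method_mapply(['a b', 'c d'], 'split') calls the split method of each string in turn and collects the results in a tuple.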

    count(condition,sequence)
    Counts the number of objects in sequence for which condition returns true and returns the result as integer. condition must be a callable object.

    exists(condition,sequence)
    Return 1 if and only if condition is true for at least one of the items in sequence and 0 otherwise. condition must be a callable object.

    forall(condition,sequence)
    Return 1 if and only if condition is true for all of the items in sequence and 0 otherwise. condition must be a callable object.

    index(condition,sequence)
    Return the index of the first item for which condition is true. A ValueError is raised in case no item is found. condition must be a callable object.
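
    The four condition-based helpers can be sketched in a few lines of pure Python. The names py_count, py_exists, py_forall and py_index are hypothetical, and the sketch follows the return conventions stated above (integers 1 and 0 rather than a separate boolean type).

```python
def py_count(condition, sequence):
    """Number of items for which condition(item) is true."""
    return sum(1 for item in sequence if condition(item))


def py_exists(condition, sequence):
    """1 if condition holds for at least one item, else 0."""
    return 1 if any(condition(item) for item in sequence) else 0


def py_forall(condition, sequence):
    """1 if condition holds for every item, else 0."""
    return 1 if all(condition(item) for item in sequence) else 0


def py_index(condition, sequence):
    """Index of the first item satisfying condition, else ValueError."""
    for position, item in enumerate(sequence):
        if condition(item):
            return position
    raise ValueError('no item satisfies the condition')
```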

    sizeof(object)
    Returns the number of bytes allocated for the given Python object. Additional space allocated by the object and stored in pointers is not taken into account (though the pointer itself is). If the object defines tp_itemsize in its type object then it is assumed to be a variable size object and the size is adjusted accordingly.

    acquire(object,name)
    Looks up the attribute name in object.baseobj and returns the result. If object does not have an attribute 'baseobj' or that attribute is None or the attribute name starts with an underscore, an AttributeError is raised.

    This function can be used as a __getattr__ hook in Python classes to enable implicit acquisition along a predefined lookup chain (object.baseobj provides a way to set up this chain). See Examples/Acquisition.py for some sample code.
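
    To make the lookup chain concrete, here is a small pure-Python sketch of the idea. It is not the code from the Examples directory: py_acquire and the Node class are hypothetical, and the extra guard for the name 'baseobj' is an assumption added to keep the hook from recursing when that attribute is missing.

```python
def py_acquire(obj, name):
    """Hypothetical stand-in for acquire(): look name up in obj.baseobj."""
    # Names starting with an underscore are never acquired. The 'baseobj'
    # guard is an addition of this sketch to avoid endless recursion.
    if name.startswith('_') or name == 'baseobj':
        raise AttributeError(name)
    baseobj = getattr(obj, 'baseobj', None)
    if baseobj is None:
        raise AttributeError(name)
    return getattr(baseobj, name)


class Node(object):
    """Hypothetical class that acquires missing attributes from baseobj."""

    def __init__(self, baseobj=None, **attributes):
        self.baseobj = baseobj
        self.__dict__.update(attributes)

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        return py_acquire(self, name)


root = Node(color='red')
child = Node(baseobj=root, size=3)
leaf = Node(baseobj=child)
```

    With this setup, leaf.size is found on child and leaf.color is found on root, while a name that no object in the chain defines ends in an AttributeError at the root.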

    defined(name)
    Returns true iff a symbol name is defined in the current namespace.

    The function has intimate knowledge about how symbol resolution works in Python: it first looks in locals(), then in globals() and if that fails in __builtins__.

    reval(codestring[,locals={}])
    Evaluates the given codestring in a restricted environment that only allows access to operators and basic type constructors like (), [] and {}.

    No builtins are available for the evaluation. locals can be given as local namespace to use when evaluating the codestring.

    After a suggestion by Tim Peters on comp.lang.python.
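
    The general idea of evaluating an expression without builtins can be sketched with the standard eval() function. This is only an approximation of what is described above, not the C implementation: py_reval is a hypothetical name, and hiding the builtins in this way is not a security boundary and should not be used on untrusted input.

```python
def py_reval(codestring, local_names=None):
    """Hypothetical approximation of reval() using plain eval()."""
    # An empty __builtins__ mapping hides len(), open(), __import__ and
    # other builtin names; operators and literal constructors still work.
    # Assumption of this sketch: this is NOT a safe sandbox.
    restricted_globals = {'__builtins__': {}}
    return eval(codestring, restricted_globals, dict(local_names or {}))
```

    Expressions such as '(1, 2) + (x,)' evaluate normally when x is supplied through the local namespace, while a reference to a builtin such as len raises a NameError.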

    truth(object)
    Returns the truth value of object as truth singleton (True or False). Note that the singletons are ordinary Python integers 1 and 0, so you can also use them in calculations.

    This function differs from the one in the operator module: operator.truth() returns plain integers rather than the truth singletons.

    sign(object)
    Returns the signum of object interpreted as number, i.e. -1 for negative numbers, +1 for positive ones and 0 in case it is equal to 0. The method used is equivalent to cmp(object,-object).
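
    Since cmp() is not available in newer Python versions, the comparison of an object with its negation can be written out with two comparisons. The function py_sign below is a hypothetical pure-Python sketch of that idea, not the C code.

```python
def py_sign(obj):
    """Hypothetical stand-in for sign(): compare obj with -obj."""
    negated = -obj
    # (obj > -obj) - (obj < -obj) yields 1, 0 or -1, like cmp(obj, -obj).
    return (obj > negated) - (obj < negated)
```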

    A note on the naming scheme used:

    • i stands for indexed, meaning that you have access to indices
    • m stands for multi, meaning that processing involves multiple objects
    • n stands for n-times, e.g. a function is executed a certain number of times
    • t stands for tuple
    • x stands for lazy evaluation

    Since this is (and will always be) work-in-progress, more functions will eventually turn up in this module, so stopping by every now and then is not a bad idea :-).

New Builtin Objects

    These objects are available after importing the package:

    xmap(func, seq, [seq, seq, ...])
    Constructs a new xmap object emulating map(func, seq, [seq, seq, ...]).

    The object behaves like a list, but evaluation of the function is postponed until a specific element from the list is requested. Unlike map, xmap can handle sequences not having a __len__ method defined (due to the evaluation-on-demand feature).

    The xmap objects define one method:

    tolist()
    Return the whole list giving the same result as the emulated map()-construct.

    This object is a contribution by Christopher Tavares (see xmap.c for his email address). I am providing this extension AS-IS, since I haven't had time to adapt it to my coding style.
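
    The evaluation-on-demand idea behind xmap can be illustrated with a small class. LazyMap is a hypothetical pure-Python sketch, not the xmap.c implementation; in particular it stops at the shortest input sequence instead of padding with None, which is a simplification of this sketch.

```python
class LazyMap(object):
    """Hypothetical sketch of a map() whose calls happen on item access."""

    def __init__(self, func, *sequences):
        self.func = func
        self.sequences = sequences

    def __getitem__(self, index):
        # The function is only called when an element is requested.
        return self.func(*[seq[index] for seq in self.sequences])

    def tolist(self):
        # No len() needed: walk the indices until one of the inputs
        # raises IndexError.
        result = []
        index = 0
        while True:
            try:
                result.append(self[index])
            except IndexError:
                return result
            index += 1


calls = []  # records which arguments the function has actually seen


def square(x):
    calls.append(x)
    return x * x


lazy = LazyMap(square, [1, 2, 3, 4])
```

    Constructing lazy does not call square at all; only indexing (for example lazy[2]) or tolist() triggers the calls, which is visible in the calls list.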

    NotGiven
    This is a singleton similar to None. Its main purpose is providing a way to indicate that a keyword was not given in a call to a keyword capable function, e.g.
    import mx.Tools.NewBuiltins

    def f(a,b=4,c=NotGiven,d=''):
        if c is NotGiven:
            return a / b, d
        else:
            return a*b + c, d

    It is also considered false in if-statements, e.g.

    import mx.Tools.NewBuiltins

    a = NotGiven
    # ...init a conditionally...
    if not a:
        print 'a was not given as value'

    True, False
    These two singletons are used by Python internally to express the boolean values true and false. They represent Python integer objects for 1 and 0 resp. All explicit comparisons return these singletons, e.g. (1==1) is True and (1==0) is False.

New sys-Module Functions

mx.Tools Functions

    The following functions are not installed in any builtin module. Instead, you have to reference them via the mx.Tools module.

    mx.Tools.verscmp(a,b)
    Compares two version strings and returns a cmp() function compatible value (<,==,> 0). The function is useful for sorting lists containing version strings.

    The logic used is as follows: the strings are compared at each level, with empty levels defaulting to '0'; numbers with attached strings (e.g. '1a1') compare less than numbers without an attachment (e.g. '1a1' < '1').

    mx.Tools.dictscan(dictobj[,prevposition=0])
    Dictionary scanner.

    Returns a tuple (key,value,position) containing the key,value pair and slot position of the next item found in the dictionary's hash table after slot prevposition.

    Raises an IndexError when the end of the table is reached or the prevposition index is out of range.

    Note that the dictionary scanner does not produce an items list. It provides a very memory efficient way of iterating over large dictionaries.

    mx.Tools.srange(string)
    Converts a textual representation of integer numbers and ranges to a Python list.

    Supported formats: "2,3,4,2-10,-1 - -3, 5 - -2"

    Values are appended to the created list in the order specified in the string.

    mx.Tools.fqhostname(hostname=None, ip=None)
    Tries to return the fully qualified (hostname, ip) for the given hostname.

    If hostname is None, the default name of the local host is chosen. ip then defaults to '127.0.0.1' if not given.

    The function modifies the input data according to what it finds using the socket module. If that doesn't work the input data is returned unchanged.

    mx.Tools.username(default='')
    Return the user name of the user running the current process.

    If no user name can be determined, default is returned.

    mx.Tools.scanfiles(files, dir=None, levels=0, filefilter=None)
    Build a list of filenames starting with the filenames and directories given in files.

    The filenames in files are made absolute relative to dir. dir defaults to the current working directory if not given.

    If levels is greater than 0, directories in the files list are recursed into up to the given number of levels.

    If filefilter is given (as an re match object), then all filenames (the absolute names) are matched against it. Filenames which do not match the criteria are removed from the list.

    Note that directories are not included in the resulting list. All filenames are non-directories.
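
    The scanning rules can be pictured with the following pure-Python sketch based on the os module. py_scanfiles is a hypothetical name, and two points are assumptions of this sketch rather than documented behaviour: filefilter is taken to be a compiled re pattern whose match() method is used, and levels=1 is taken to mean "list the contents of the given directories, but do not descend further".

```python
import os


def py_scanfiles(files, dir=None, levels=0, filefilter=None):
    """Hypothetical sketch of scanfiles() on top of os and os.path."""
    if dir is None:
        dir = os.getcwd()
    result = []
    for name in files:
        # Make the name absolute relative to dir.
        path = os.path.abspath(os.path.join(dir, name))
        if os.path.isdir(path):
            # Directories never appear in the result; they are only
            # recursed into while levels remain.
            if levels > 0:
                entries = sorted(os.listdir(path))
                result.extend(
                    py_scanfiles(entries, path, levels - 1, filefilter))
        elif filefilter is None or filefilter.match(path):
            result.append(path)
    return result
```

    A call such as py_scanfiles(['.'], levels=2, filefilter=re.compile(r'.*\.py$')) would then collect the .py files in the current directory and its immediate subdirectories.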


mx.Tools Objects

    The following objects are not installed in any builtin module. Instead, you have to reference them via the mx.Tools module.

    mx.Tools.DictScan(dictionary)
    Creates a forward iterator for the given dictionary. It is based on mx.Tools.dictscan().

    The dictionary scanner does not produce an items list. It provides a very memory efficient way of iterating over large dictionaries.

    Note that no precaution is taken to ensure that the dictionary is not modified in between calls to the __getitem__ method. It is the user's responsibility to ensure that the dictionary is neither modified nor changed in size, since this would result in skipped entries or double occurrence of items in the scan.

    For convenience, the returned object inherits all methods from the underlying dictionary and additionally provides the following method:

    reset()
    Resets the iterator to its initial position.

    mx.Tools.DictItems(dictionary)
    Is an alias for mx.Tools.DictScan.

Examples of Use

    A few simple examples:

    import mx.Tools.NewBuiltins
    
    sequence = range(100)
    
    # In place calculations:
    for i,item in irange(sequence):
        sequence[i] = 2*item
    
    # Get all odd-indexed items from a sequence:
    odds = extract(sequence,trange(1,len(sequence),2))
    
    # Turn a tuple of lists into a list of tuples:
    chars = 'abcdefghji'
    ords = map(ord,chars)
    table = tuples(chars,ords)
    
    # The same as dictionary:
    chr2ord = dict(table)
    
    # Inverse mapping:
    ord2chr = invdict(chr2ord)
    
    # Range checking:
    if exists( lambda x: x > 10, sequence ):
        print 'Warning: Big sequence elements!'
    
    # Handle special cases:
    if forall( lambda x: x > 0, sequence ):
        print 'Positive sequence'
    else:
        print 'Index %i loses' % (index( lambda x: x <= 0, sequence ),)
    
    # dict.get functionality for e.g. lists:
    print get(sequence,101,"Don't have an element with index 101")
    
    # Filtering away false entries of a list:
    print filter(truth,[1,2,3,0,'',None,NotGiven,4,5,6])
    
    

    More elaborate examples can be found in the Examples/ subdirectory of the package.

Package Structure

    [Tools]
           Doc/
           [Examples]
                  Acquisition.py
           [mxTools]
                  vc5/
                  bench1.py
                  bench2.py
                  hack.py
                  test.py
           NewBuiltins.py
           Tools.py
    	

    Entries enclosed in brackets are packages (i.e. they are directories that include an __init__.py file) or submodules. Ones with slashes are just ordinary subdirectories that are not accessible via import.

    Note: Importing mx.Tools will automatically install the functions and objects defined in this package as builtins. They are then available in all other modules without having to import them again every time. If you don't want this feature, you can turn it off in mx/Tools/__init__.py.

Support

Copyright & License

History & Future

    Things that still need to be done:

    • Implement a generic join() builtin:
      join((a,b,c),sep) := (((a + sep) + b) + sep) + c
      with optimizations for sequences of strings, unicode objects, lists and tuples (e.g. join(((1,2),(3,4)),(0,)) gives (1,2,0,3,4)).

    • Provide some more examples.
    • Add Neil S.'s repeat module (see private dir).
    • Make dict(dict) return a copy of the dictionary just like list(list) returns a copy of the list.

    Changes from 2.0.0 to 2.0.3:

    • Removed config.h include from xmap.c -- this was never really needed and causes problems with Python 2.2. Thanks to Gerhard Häring for finding this one.

    Changes from 1.0.0 to 2.0.0:

    • Added truth() ;-).

    • Added VC5 project files donated by Darrell Gallion.

    • Added sys.debugging() and sign().

    • Moved the package under a new top-level package 'mx' and renamed it to mx.Tools. It is part of the eGenix.com mx BASE distribution.

    • Added mx.Tools.verscmp().

    • Added mx.Tools.dictscan() and mx.Tools.DictScan class.

    • Added mx.Tools.scanfiles().

    Changes from 0.9.1 to 1.0.0:

    • Added defined().

    • Moved the two functions verbosity() and optimization() to the sys module and added enhanced customization code to NewBuiltins/NewBuiltins.py. You can now define where the functions and objects are installed by editing that file.

    • Important change: Cleaned up the package's offerings in that all old function names are now disabled by default. You can edit NewBuiltins/NewBuiltins.py to reenable them without the need to recompile.

    • Added sys.cur_frame() and iremove().

    • Added Macintosh precompiled binaries for PowerMacs donated by Joseph Strout. They are included in a StuffIt file in the mxTools subdir.

    • Added True and False singletons.

    Changes from 0.9.0 to 0.9.1:

    • Added optimization() (XXX should probably go into sys rather than __builtins__ just as verbosity()).

    • Added new PYD files provided by David Ascher. Thanks Dave :-)

    • Added NotGiven singleton.

    Changes from 0.8.1 to 0.9.0:

    • Added acquire() and some sample code for its usage in Examples/Acquisition.py.

    • Renamed mget() to extract(). The old function is aliased to mget, so this should not break any existing code.

    • tuples() now is a single argument function, just as lists(): passing tuples() a single tuple will cause it to interpret the tuple as argument tuple, e.g. tuples((a,b)) gets interpreted as tuples(a,b). Note that this causes the semantics of tuples(a) (called with only one sequence) to change !!!

    • Added setdict() and attrlist().

    • Added verbosity().

    Changes from 0.7 to 0.8:

    • Version 0.8.1: Fixed a bug that caused sizeof(), reverse() and invdict() to dump core when called without argument.

    • Fixed a bug in forall() that caused it to fail. Found by Henk Jansen.

    • Added index(). Contributed by Henk Jansen (TU Delft).

    • Added tests for forall(), exists(), count() and index() to the test script.

    Changes from 0.6 to 0.7:

    • Renamed mgetattr() to findattr(). The old function name still works, but it will either be removed in the near future or replaced with another functionality.

    • Changed the way list creation works. This should speed up all functions from the package which create and return lists.

    • Fixed a serious memory leakage in dict().

    • Fixed a bug in get().

    • Added lists().

    • Made many functions taking only one argument use a simpler calling mechanism. Note that passing more than one argument to these functions results in the "multiple" arguments being seen as a tuple, e.g. sizeof(1,2) is the same as calling sizeof((1,2)). Note that there are some other methods in core Python that work in the same way, e.g. list.append(1,2) really does a list.append((1,2)). I find this convenient at times.

    • Renamed trange_len() to indices() (easier to write and intuitive enough to use, e.g. in 'for i in indices(obj):...'). The old function name still works, but it will be removed in the near future.

    Changes from 0.5 to 0.6:

    • Added dict().

    • Added invdict().

    • Added tuples().

    • Added reverse().

    Changes from 0.4 to 0.5:

    • Added get() and mget().

    • Added sizeof().

    • Added mgetattr().

    Changes from 0.3 to 0.4:

    • Converted the module into a package called 'NewBuiltins'. Importing it installs all of the functions defined in the C extension modules as builtins.

    • Fixed a few memory leaks.

    • Added method_mapply().

    • Added Christopher Tavares' xmap module.


© 1997-2000, Copyright by Marc-André Lemburg; All Rights Reserved. mailto: mal@lemburg.com

© 2000-2001, Copyright by eGenix.com Software GmbH; All Rights Reserved. mailto: info@egenix.com