Sophy API

class SophiaError

General exception class used to indicate error returned by Sophia database.

Environment

class Sophia(path)
Parameters:path (str) – Directory path to store environment and databases.

Environment object providing access to databases and for controlling transactions.

Example of creating environment, attaching a database and reading/writing data:

from sophy import *


# Environment for managing one or more databases.
env = Sophia('/tmp/sophia-test')

# Schema describes the indexes that comprise the key and value portions
# of a database.
kv_schema = Schema([StringIndex('key')], [StringIndex('value')])
db = env.add_data('kv', kv_schema)

# We need to open the env after configuring the database(s), in order
# to read/write data.
assert env.open(), 'Failed to open environment!'

# We can use dict-style APIs to read/write key/value pairs.
db['k1'] = 'v1'
assert db['k1'] == 'v1'

# Close the env when finished.
assert env.close(), 'Failed to close environment!'
open()
Returns:Boolean indicating success.

Open the environment. The environment must be opened in order to read and write data to the configured databases.

close()
Returns:Boolean indicating success.

Close the environment.

add_database(name, schema)
Parameters:
  • name (str) – database name
  • schema (Schema) – schema for keys and values.
Returns:

a database instance

Return type:

Database

Add or declare a database. Environment must be closed to add databases. The Schema will declare the data-types and structure of the key- and value-portion of the database.

env = Sophia('/path/to/db-env')

# Declare an events database with a multi-part key (ts, type) and
# a msgpack-serialized data field.
events_schema = Schema(
    key_parts=[U64Index('timestamp'), StringIndex('type')],
    value_parts=[MsgPackIndex('data')])
db = env.add_database('events', events_schema)

# Open the environment for read/write access to the database.
env.open()

# We can now write to the database.
db[current_time(), 'init'] = {'msg': 'event logging initialized'}
remove_database(name)
Parameters:name (str) – database name

Remove a database from the environment. Environment must be closed to remove databases. This method does really not have any practical value but is provided for consistency.

get_database(name)
Returns:the database corresponding to the provided name
Return type:Database

Obtain a reference to the given database, provided the database has been added to the environment by a previous call to add_database().

__getitem__(name)

Short-hand for get_database().

transaction()
Returns:a transaction handle.
Return type:Transaction

Create a transaction handle which can be used to execute a transaction on the databases in the environment. The returned transaction can be used as a context-manager.

Example:

env = Sophia('/tmp/sophia-test')
db = env.add_database('test', Schema.key_value())
env.open()

with env.transaction() as txn:
    t_db = txn[db]
    t_db['k1'] = 'v1'
    t_db.update(k2='v2', k3='v3')

# Transaction has been committed.
print(db['k1'], db['k3'])  # prints "v1", "v3"

See Transaction for more information.

Database

class Database

Database interface. This object is not created directly, but references can be obtained via Sophia.add_database() or Sophia.get_database().

For example:

env = Sophia('/path/to/data')

kv_schema = Schema(StringIndex('key'), MsgPackIndex('value'))
kv_db = env.add_database('kv', kv_schema)

# Another reference to "kv_db":
kv_db = env.get_database('kv')

# Same as above:
kv_db = env['kv']
set(key, value)
Parameters:
  • key – key corresponding to schema (e.g. scalar or tuple).
  • value – value corresponding to schema (e.g. scalar or tuple).
Returns:

No return value.

Store the value at the given key. For single-index keys or values, a scalar value may be provided as the key or value. If a composite or multi-index key or value is used, then a tuple must be provided.

Examples:

simple = Schema(StringIndex('key'), StringIndex('value'))
simple_db = env.add_database('simple', simple)

composite = Schema(
    [U64Index('timestamp'), StringIndex('type')],
    [MsgPackIndex('data')])
composite_db = env.add_database('composite', composite)

env.open()  # Open env to access databases.

# Set k1=v1 in the simple key/value database.
simple_db.set('k1', 'v1')

# Set new value in composite db. Note the key is a tuple and, since
# the value is serialized using msgpack, we can transparently store
# data-types like dicts.
composite_db.set((current_time, 'evt_type'), {'msg': 'foo'})
get(key[, default=None])
Parameters:
  • key – key corresponding to schema (e.g. scalar or tuple).
  • default – default value if key does not exist.
Returns:

value of given key or default value.

Get the value at the given key. If the key does not exist, the default value is returned.

If a multi-part key is defined for the given database, the key must be a tuple.

Example:

simple_db.set('k1', 'v1')
simple_db.get('k1')  # Returns "v1".

simple_db.get('not-here')  # Returns None.
delete(key)
Parameters:key – key corresponding to schema (e.g. scalar or tuple).
Returns:No return value

Delete the given key, if it exists. If a multi-part key is defined for the given database, the key must be a tuple.

Example:

simple_db.set('k1', 'v1')
simple_db.delete('k1')  # Deletes "k1" from database.

simple_db.exists('k1')  # False.
exists(key)
Parameters:key – key corresponding to schema (e.g. scalar or tuple).
Returns:Boolean indicating if key exists.
Return type:bool

Return whether the given key exists. If a multi-part key is defined for the given database, the key must be a tuple.

multi_set([__data=None[, **kwargs]])
Parameters:
  • __data (dict) – Dictionary of key/value pairs to set.
  • kwargs – Specify key/value pairs as keyword-arguments.
Returns:

No return value

Set multiple key/value pairs efficiently.

multi_get(*keys)
Parameters:keys – key(s) to retrieve
Returns:a list of values associated with the given keys. If a key does not exist a None will be indicated for the value.
Return type:list

Get multiple values efficiently. Returned as a list of values corresponding to the keys argument, with missing values as None.

Example:

db.update(k1='v1', k2='v2', k3='v3')
db.multi_get('k1', 'k3', 'k-nothere')
# ['v1', 'v3', None]
multi_get_dict(keys)
Parameters:keys (list) – list of keys to get
Returns:a list of values associated with the given keys. If a key does not exist a None will be indicated for the value.
Return type:list

Get multiple values efficiently. Returned as a dict of key/value pairs. Missing values are not represented in the returned dict.

Example:

db.update(k1='v1', k2='v2', k3='v3')
db.multi_get_dict(['k1', 'k3', 'k-nothere'])
# {'k1': 'v1', 'k3': 'v3'}
multi_delete(*keys)
Parameters:keys – key(s) to delete
Returns:No return value

Efficiently delete multiple keys.

get_range(start=None, stop=None, reverse=False)
Parameters:
  • start – start key (omit to start at first record).
  • stop – stop key (omit to stop at the last record).
  • reverse (bool) – return range in reverse.
Returns:

a generator that yields the requested key/value pairs.

Fetch a range of key/value pairs from the given start-key, up-to and including the stop-key (if given).

keys()

Return a cursor for iterating over the keys in the database.

values()

Return a cursor for iterating over the values in the database.

items()

Return a cursor for iterating over the key/value pairs in the database.

__getitem__(key_or_slice)
Parameters:key_or_slice – key or range of keys to retrieve.
Returns:value of given key, or an iterator over the range of keys.
Raises:KeyError if single key requested and does not exist.

Retrieve a single value or a range of values, depending on whether the key represents a single row or a slice of rows.

Additionally, if a slice is given, the start and stop values can be omitted to indicate you wish to start from the first or last key, respectively.

__setitem__(key, value)

Equivalent to set().

__delitem__(key)

Equivalent to delete().

__contains__(key)

Equivalent to exists().

__iter__()

Equivalent to items().

__len__()

Equivalent to iterating over all keys and returning count. This is the most accurate way to get the total number of keys, but is not very efficient. An alternative is to use the Database.index_count property, which returns an approximation of the number of keys in the database.

cursor(order='>=', key=None, prefix=None, keys=True, values=True)
Parameters:
  • order (str) – ordering semantics (default is “>=”)
  • key – key to seek to before iterating.
  • prefix – string prefix to match.
  • keys (bool) – return keys when iterating.
  • values (bool) – return values when iterating.

Create a cursor with the given semantics. Typically you will want both keys=True and values=True (the defaults), which will cause the cursor to yield a 2-tuple consisting of (key, value) during iteration.

Transaction

class Transaction

Transaction handle, used for executing one or more operations atomically. This class is not created directly - use Sophia.transaction().

The transaction can be used as a context-manager. To read or write during a transaction, you should obtain a transaction-specific handle to the database you are operating on.

Example:

env = Sophia('/tmp/my-env')
db = env.add_database('kv', Schema.key_value())
env.open()

with env.transaction() as txn:
    tdb = txn[db]  # Obtain reference to "db" in the transaction.
    tdb['k1'] = 'v1'
    tdb.update(k2='v2', k3='v3')

# At the end of the wrapped block, the transaction is committed.
# The writes have been recorded:
print(db['k1'], db['k3'])
# ('v1', 'v3')
begin()

Begin a transaction.

commit()
Raises:SophiaError

Commit all changes. An exception can occur if:

  1. The transaction was rolled back, either explicitly or implicitly due to conflicting changes having been committed by a different transaction. Not recoverable.
  2. A concurrent transaction is open and must be committed before this transaction can commit. Possibly recoverable.
rollback()

Roll-back any changes made in the transaction.

__getitem__(db)
Parameters:db (Database) – database to reference during transaction
Returns:special database-handle for use in transaction
Return type:DatabaseTransaction

Obtain a reference to the database for use within the transaction. This object supports the same APIs as Database, but any reads or writes will be made within the context of the transaction.

Schema Definition

class Schema(key_parts, value_parts)
Parameters:
  • key_parts (list) – a list of Index objects (or a single index object) to use as the key of the database.
  • value_parts (list) – a list of Index objects (or a single index object) to use for the values stored in the database.

The schema defines the structure of the keys and values for a given Database. They can be comprised of a single index-type or multiple indexes for composite keys or values.

Example:

# Simple schema defining text keys and values.
simple = Schema(StringIndex('key'), StringIndex('value'))

# Schema with composite key for storing timestamps and event-types,
# along with msgpack-serialized data as the value.
event_schema = Schema(
    [U64Index('timestamp'), StringIndex('type')],
    [MsgPackIndex('value')])

Schemas are used when adding databases using the Sophia.add_database() method.

add_key(index)
Parameters:index (BaseIndex) – an index object to add to the key parts.

Add an index to the key. Allows Schema to be built-up programmatically.

add_value(index)
Parameters:index (BaseIndex) – an index object to add to the value parts.

Add an index to the value. Allows Schema to be built-up programmatically.

classmethod key_value()

Short-hand for creating a simple text schema consisting of a single StringIndex for both the key and the value.

class BaseIndex(name)
Parameters:name (str) – Name for the key- or value-part the index represents.

Indexes are used to define the key and value portions of a Schema. Traditional key/value databases typically only supported a single-value, single-datatype key and value (usually bytes). Sophia is different in that keys or values can be comprised of multiple parts with differing data-types.

For example, to emulate a typical key/value store:

schema = Schema([BytesIndex('key')], [BytesIndex('value')])
db = env.add_database('old_school', schema)

Suppose we are storing time-series event logs. We could use a 64-bit integer for the timestamp (in micro-seconds) as well as a key to denote the event-type. The value could be arbitrary msgpack-encoded data:

key = [U64Index('timestamp'), StringIndex('type')]
value = [MsgPackIndex('value')]
events = env.add_database('events', Schema(key, value))
class SerializedIndex(name, serialize, deserialize)
Parameters:
  • name (str) – Name for the key- or value-part the index represents.
  • serialize – a callable that accepts data and returns bytes.
  • deserialize – a callable that accepts bytes and deserializes the data.

The SerializedIndex can be used to transparently store data as bytestrings. For example, you could use a library like msgpack or pickle to transparently store and retrieve Python objects in the database:

key = StringIndex('key')
value = SerializedIndex('value', pickle.dumps, pickle.loads)
pickled_db = env.add_database('data', Schema([key], [value]))

Note: sophy already provides indexes for JsonIndex, MsgPackIndex and PickleIndex.

class BytesIndex(name)

Store arbitrary binary data in the database.

class StringIndex(name)

Store text data in the database as UTF8-encoded bytestrings. When reading from a StringIndex, data is decoded and returned as unicode.

class JsonIndex(name)

Store data as UTF8-encoded JSON. Python objects will be transparently serialized and deserialized when writing and reading, respectively.

class MsgPackIndex(name)

Store data using the msgpack serialization format. Python objects will be transparently serialized and deserialized when writing and reading.

Note: Requires the msgpack-python library.

class PickleIndex(name)

Store data using Python’s pickle serialization format. Python objects will be transparently serialized and deserialized when writing and reading.

class UUIDIndex(name)

Store UUIDs. Python uuid.UUID() objects will be stored as raw bytes and decoded to uuid.UUID() instances upon retrieval.

class U64Index(name)
class U32Index(name)
class U16Index(name)
class U8Index(name)

Store unsigned integers of the given sizes.

class U64RevIndex(name)
class U32RevIndex(name)
class U16RevIndex(name)
class U8RevIndex(name)

Store unsigned integers of the given sizes in reverse order.

Cursor

class Cursor

Cursor handle for a Database. This object is not created directly but through the Database.cursor() method or one of the database methods that returns a row iterator (e.g. Database.items()).

Cursors are iterable and, depending how they were configured, can return keys, values or key/value pairs.

Settings

Sophia supports a wide range of settings and configuration options. These settings are also documented in the Sophia documentation.

Environment settings

The following settings are available as properties on Sophia:

Setting Type Description
version string, ro Get current Sophia version
version_storage string, ro Get current Sophia storage version
build string, ro Get git commit hash of build
status string, ro Get environment status (eg online)
errors int, ro Get number of errors
error string, ro Get last error description
path string, ro Get current Sophia environment directory
Backups    
backup_path string Set backup path
backup_run method Start backup in background (non-blocking)
backup_active int, ro Show if backup is running
backup_last int, ro Show ID of last-completed backup
backup_last_complete int, ro Show if last backup succeeded
Scheduler    
scheduler_threads int Get or set number of worker threads
scheduler_trace(thread_id) method Get a worker trace for given thread
Transaction Manager    
transaction_online_rw int, ro Number of active read/write transactions
transaction_online_ro int, ro Number of active read-only transactions
transaction_commit int, ro Total number of completed transactions
transaction_rollback int, ro Total number of transaction rollbacks
transaction_conflict int, ro Total number of transaction conflicts
transaction_lock int, ro Total number of transaction locks
transaction_latency string, ro Average transaction latency from start to end
transaction_log string, ro Average transaction log length
transaction_vlsn int, ro Current VLSN
transaction_gc int, ro SSI GC queue size
Metrics    
metric_lsn int, ro Current log sequential number
metric_tsn int, ro Current transaction sequential number
metric_nsn int, ro Current node sequential number
metric_dsn int, ro Current database sequential number
metric_bsn int, ro Current backup sequential number
metric_lfsn int, ro Current log file sequential number
Write-ahead Log    
log_enable int Enable or disable transaction log
log_path string Get or set folder for log directory
log_sync int Sync transaction log on every commit
log_rotate_wm int Create a new log after “rotate_wm” updates
log_rotate_sync int Sync log file on every rotation
log_rotate method Force Sophia to rotate log file
log_gc method Force Sophia to garbage-collect log file pool
log_files int, ro Number of log files in the pool

Database settings

The following settings are available as properties on Database. By default, Sophia uses pread(2) to read from disk. When mmap-mode is on (by default), Sophia handles all requests by directly accessing memory-mapped node files.

Setting Type Description
database_name string, ro Get database name
database_id int, ro Database sequential ID
database_path string, ro Directory for storing data
mmap int Enable or disable mmap-mode
direct_io int Enable or disable O_DIRECT mode.
sync int Sync node file on compaction completion
expire int Enable or disable key expiration
compression string Specify compression type: lz4, zstd, none (default)
limit_key int, ro Scheme key size limit
limit_field int Scheme field size limit
Index    
index_memory_used int, ro Memory used by database for in-memory key indexes
index_size int, ro Sum of nodes size in bytes (e.g. database size)
index_size_uncompressed int, ro Full database size before compression
index_count int, ro Total number of keys in db, includes unmerged dupes
index_count_dup int, ro Total number of transactional duplicates
index_read_disk int, ro Number of disk reads since start
index_read_cache int, ro Number of cache reads since start
index_node_count int, ro Number of active nodes
index_page_count int, ro Total number of pages
Compaction    
compaction_cache int Total write cache size used for compaction
compaction_checkpoint int  
compaction_node_size int Set a node file size in bytes.
compaction_page_size int Set size of page
compaction_page_checksum int Validate checksum during compaction
compaction_expire_period int Run expire check process every N seconds
compaction_gc_wm int GC starts when watermark value reaches N dupes
compaction_gc_period int Check for a gc every N seconds
Performance    
stat_documents_used int, ro Memory used by allocated document
stat_documents int, ro Number of currently allocated documents
stat_field string, ro Average field size
stat_set int, ro Total number of Set operations
stat_set_latency string, ro Average Set latency
stat_delete int, ro Total number of Delete operations
stat_delete_latency string, ro Average Delete latency
stat_get int, ro Total number of Get operations
stat_get_latency string, ro Average Get latency
stat_get_read_disk string, ro Average disk reads by Get operation
stat_get_read_cache string, ro Average cache reads by Get operation
stat_pread int, ro Total number of pread operations
stat_pread_latency string, ro Average pread latency
stat_cursor int, ro Total number of cursor operations
stat_cursor_latency string, ro Average cursor latency
stat_cursor_read_disk string, ro Average disk reads by Cursor operation
stat_cursor_read_cache string, ro Average cache reads by Cursor operation
stat_cursor_ops string, io Average number of keys read by Cursor operation
Scheduler    
scheduler_gc int, ro Show if GC operation is in progress
scheduler_expire int, ro Show if expire operation is in progress
scheduler_backup int, ro Show if backup operation is in progress
scheduler_checkpoint int, ro