Sophy API

class SophiaError

General exception class used to indicate an error returned by the Sophia database.

Environment

class Sophia(path)
Parameters:

path (str) – Directory path to store environment and databases.

Environment object that provides access to databases and controls transactions.

Example of creating an environment, attaching a database and reading/writing data:

from sophy import *


# Environment for managing one or more databases.
env = Sophia('/tmp/sophia-test')

# Schema describes the indexes that comprise the key and value portions
# of a database.
kv_schema = Schema([StringIndex('key')], [StringIndex('value')])
db = env.add_database('kv', kv_schema)

# We need to open the env after configuring the database(s), in order
# to read/write data.
assert env.open(), 'Failed to open environment!'

# We can use dict-style APIs to read/write key/value pairs.
db['k1'] = 'v1'
assert db['k1'] == 'v1'

# Close the env when finished.
assert env.close(), 'Failed to close environment!'

open()
Returns:

Boolean indicating success.

Open the environment. The environment must be opened in order to read and write data to the configured databases.

close()
Returns:

Boolean indicating success.

Close the environment.
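Because open() and close() both report success as a boolean, it can be convenient to wrap the pair so the environment is always closed, even when an error occurs in between. The helper below is a minimal sketch and not part of sophy: the name opened is hypothetical, and it assumes only that the object passed in has the open() and close() methods described above. A stand-in class (FakeEnv, also hypothetical) is used so the snippet runs without a database.

```python
from contextlib import contextmanager


@contextmanager
def opened(env):
    """Open env on entry and close it on exit (hypothetical helper)."""
    # open() is documented to return a boolean indicating success.
    if not env.open():
        raise RuntimeError('Failed to open environment!')
    try:
        yield env
    finally:
        # Always close, even if the wrapped block raised.
        env.close()


class FakeEnv(object):
    """Stand-in with the same open()/close() contract, for illustration."""

    def __init__(self):
        self.is_open = False

    def open(self):
        self.is_open = True
        return True

    def close(self):
        self.is_open = False
        return True


env = FakeEnv()
with opened(env) as e:
    state_inside = e.is_open
state_after = env.is_open
```

With a real environment, the same helper would presumably be used as `with opened(Sophia(path)) as env:` after all databases have been added.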

add_database(name, schema)
Parameters:
  • name (str) – database name

  • schema (Schema) – schema for keys and values.

Returns:

a database instance

Return type:

Database

Add or declare a database. The environment must be closed to add databases. The Schema declares the data types and structure of the key and value portions of the database.

env = Sophia('/path/to/db-env')

# Declare an events database with a multi-part key (ts, type) and
# a msgpack-serialized data field.
events_schema = Schema(
    key_parts=[U64Index('timestamp'), StringIndex('type')],
    value_parts=[MsgPackIndex('data')])
db = env.add_database('events', events_schema)

# Open the environment for read/write access to the database.
env.open()

# We can now write to the database.
db[current_time(), 'init'] = {'msg': 'event logging initialized'}

remove_database(name)
Parameters:

name (str) – database name

Remove a database from the environment. The environment must be closed to remove databases. This method does not really have any practical value but is provided for consistency.

get_database(name)
Returns:

the database corresponding to the provided name

Return type:

Database

Obtain a reference to the given database, provided the database has been added to the environment by a previous call to add_database().

__getitem__(name)

Short-hand for get_database().

transaction()
Returns:

a transaction handle.

Return type:

Transaction

Create a transaction handle which can be used to execute a transaction on the databases in the environment. The returned transaction can be used as a context-manager.

Example:

env = Sophia('/tmp/sophia-test')
db = env.add_database('test', Schema.key_value())
env.open()

with env.transaction() as txn:
    t_db = txn[db]
    t_db['k1'] = 'v1'
    t_db.update(k2='v2', k3='v3')

# Transaction has been committed.
print(db['k1'], db['k3'])  # prints "v1", "v3"

See Transaction for more information.

Database

class Database

Database interface. This object is not created directly, but references can be obtained via Sophia.add_database() or Sophia.get_database().

For example:

env = Sophia('/path/to/data')

kv_schema = Schema(StringIndex('key'), MsgPackIndex('value'))
kv_db = env.add_database('kv', kv_schema)

# Another reference to "kv_db":
kv_db = env.get_database('kv')

# Same as above:
kv_db = env['kv']

set(key, value)
Parameters:
  • key – key corresponding to schema (e.g. scalar or tuple).

  • value – value corresponding to schema (e.g. scalar or tuple).

Returns:

No return value.

Store the value at the given key. For single-index keys or values, a scalar value may be provided as the key or value. If a composite or multi-index key or value is used, then a tuple must be provided.

Examples:

simple = Schema(StringIndex('key'), StringIndex('value'))
simple_db = env.add_database('simple', simple)

composite = Schema(
    [U64Index('timestamp'), StringIndex('type')],
    [MsgPackIndex('data')])
composite_db = env.add_database('composite', composite)

env.open()  # Open env to access databases.

# Set k1=v1 in the simple key/value database.
simple_db.set('k1', 'v1')

# Set new value in composite db. Note the key is a tuple and, since
# the value is serialized using msgpack, we can transparently store
# data-types like dicts.
composite_db.set((current_time(), 'evt_type'), {'msg': 'foo'})

get(key[, default=None])
Parameters:
  • key – key corresponding to schema (e.g. scalar or tuple).

  • default – default value if key does not exist.

Returns:

value of given key or default value.

Get the value at the given key. If the key does not exist, the default value is returned.

If a multi-part key is defined for the given database, the key must be a tuple.

Example:

simple_db.set('k1', 'v1')
simple_db.get('k1')  # Returns "v1".

simple_db.get('not-here')  # Returns None.

delete(key)
Parameters:

key – key corresponding to schema (e.g. scalar or tuple).

Returns:

No return value

Delete the given key, if it exists. If a multi-part key is defined for the given database, the key must be a tuple.

Example:

simple_db.set('k1', 'v1')
simple_db.delete('k1')  # Deletes "k1" from database.

simple_db.exists('k1')  # False.

exists(key)
Parameters:

key – key corresponding to schema (e.g. scalar or tuple).

Returns:

Boolean indicating if key exists.

Return type:

bool

Return whether the given key exists. If a multi-part key is defined for the given database, the key must be a tuple.

multi_set([__data=None[, **kwargs]])
Parameters:
  • __data (dict) – Dictionary of key/value pairs to set.

  • kwargs – Specify key/value pairs as keyword-arguments.

Returns:

No return value

Set multiple key/value pairs efficiently.

multi_get(*keys)
Parameters:

keys – key(s) to retrieve

Returns:

a list of values associated with the given keys. If a key does not exist, None is returned in its place.

Return type:

list

Get multiple values efficiently. Returned as a list of values corresponding to the keys argument, with missing values as None.

Example:

db.update(k1='v1', k2='v2', k3='v3')
db.multi_get('k1', 'k3', 'k-nothere')
# ['v1', 'v3', None]

multi_get_dict(keys)
Parameters:

keys (list) – list of keys to get

Returns:

a dict of key/value pairs for the given keys. Keys that do not exist are omitted.

Return type:

dict

Get multiple values efficiently. Returned as a dict of key/value pairs. Missing values are not represented in the returned dict.

Example:

db.update(k1='v1', k2='v2', k3='v3')
db.multi_get_dict(['k1', 'k3', 'k-nothere'])
# {'k1': 'v1', 'k3': 'v3'}
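The two multi-get variants differ only in how missing keys are reported. The following is a pure-Python model of that behaviour, using a plain dict as a stand-in for the database. The function names mirror the methods above, but this is an illustration and not sophy's implementation.

```python
def multi_get(store, *keys):
    # One result per requested key, in order; None marks a missing key.
    return [store.get(key) for key in keys]


def multi_get_dict(store, keys):
    # Only keys that exist appear in the result.
    return {key: store[key] for key in keys if key in store}


store = {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
as_list = multi_get(store, 'k1', 'k3', 'k-nothere')
as_dict = multi_get_dict(store, ['k1', 'k3', 'k-nothere'])
```

The list form keeps positions aligned with the requested keys, which is useful when the caller needs to know which lookups missed. The dict form is more compact when only the hits matter.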
multi_delete(*keys)
Parameters:

keys – key(s) to delete

Returns:

No return value

Efficiently delete multiple keys.

get_range(start=None, stop=None, reverse=False)
Parameters:
  • start – start key (omit to start at first record).

  • stop – stop key (omit to stop at the last record).

  • reverse (bool) – return range in reverse.

Returns:

a generator that yields the requested key/value pairs.

Fetch a range of key/value pairs from the given start-key, up-to and including the stop-key (if given).
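To make the inclusive stop-key concrete, here is a small pure-Python model of the range semantics over a plain dict. It is a sketch only: the real method iterates the database, and the way reverse is handled here (the same pairs, yielded in the opposite order) is an assumption.

```python
from bisect import bisect_left, bisect_right


def get_range(store, start=None, stop=None, reverse=False):
    """Yield (key, value) pairs from start up to and including stop."""
    keys = sorted(store)
    # bisect_left keeps the start key; bisect_right keeps the stop key.
    lo = 0 if start is None else bisect_left(keys, start)
    hi = len(keys) if stop is None else bisect_right(keys, stop)
    selected = keys[lo:hi]
    if reverse:
        selected.reverse()
    for key in selected:
        yield key, store[key]


store = {'k1': 'v1', 'k2': 'v2', 'k3': 'v3', 'k4': 'v4'}
forward = list(get_range(store, 'k2', 'k3'))
open_ended = list(get_range(store, start='k3'))
backward = list(get_range(store, 'k2', 'k3', reverse=True))
```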

keys()

Return a cursor for iterating over the keys in the database.

values()

Return a cursor for iterating over the values in the database.

items()

Return a cursor for iterating over the key/value pairs in the database.

__getitem__(key_or_slice)
Parameters:

key_or_slice – key or range of keys to retrieve.

Returns:

value of given key, or an iterator over the range of keys.

Raises:

KeyError if single key requested and does not exist.

Retrieve a single value or a range of values, depending on whether the key represents a single row or a slice of rows.

Additionally, if a slice is given, the start and stop values can be omitted to indicate you wish to start from the first or last key, respectively.
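The dispatch between single keys and slices can be modelled in a few lines of plain Python. ModelDB below is a hypothetical dict-backed stand-in, not the sophy Database class. It assumes that a slice yields (key, value) pairs and that the stop key is inclusive, matching get_range(); neither detail is stated for slices in this section.

```python
class ModelDB(object):
    """Dict-backed stand-in that mimics the documented indexing rules."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key_or_slice):
        if isinstance(key_or_slice, slice):
            return self._iter_range(key_or_slice.start, key_or_slice.stop)
        # A single missing key raises KeyError, as documented.
        return self._data[key_or_slice]

    def _iter_range(self, start, stop):
        for key in sorted(self._data):
            if start is not None and key < start:
                continue
            if stop is not None and key > stop:
                break
            yield key, self._data[key]


db = ModelDB({'k1': 'v1', 'k2': 'v2', 'k3': 'v3'})
single = db['k2']
middle = list(db['k2':'k3'])
from_first = list(db[:'k2'])   # start omitted: begin at the first key
to_last = list(db['k2':])      # stop omitted: run to the last key
try:
    db['missing']
    missing_raised = False
except KeyError:
    missing_raised = True
```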

__setitem__(key, value)

Equivalent to set().

__delitem__(key)

Equivalent to delete().

__contains__(key)

Equivalent to exists().

__iter__()

Equivalent to items().

__len__()

Equivalent to iterating over all keys and returning the count. This is the most accurate way to get the total number of keys, but it is not very efficient. An alternative is to use the Database.index_count property, which returns an approximation of the number of keys in the database.

cursor(order='>=', key=None, prefix=None, keys=True, values=True)
Parameters:
  • order (str) – ordering semantics (default is “>=”)

  • key – key to seek to before iterating.

  • prefix – string prefix to match.

  • keys (bool) – return keys when iterating.

  • values (bool) – return values when iterating.

Create a cursor with the given semantics. Typically you will want both keys=True and values=True (the defaults), which will cause the cursor to yield a 2-tuple consisting of (key, value) during iteration.
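The effect of the keys, values and prefix arguments can be illustrated with a generator over a plain dict. This is a simplified model, not the sophy cursor: the order and key arguments are left out, and treating prefix as a plain string-prefix match on the key is an assumption.

```python
def cursor(store, prefix=None, keys=True, values=True):
    """Yield keys, values or (key, value) pairs from a dict in key order."""
    for key in sorted(store):
        if prefix is not None and not key.startswith(prefix):
            continue
        if keys and values:
            yield key, store[key]
        elif keys:
            yield key
        elif values:
            yield store[key]


store = {'a1': 1, 'a2': 2, 'b1': 3}
pairs = list(cursor(store))
only_keys = list(cursor(store, values=False))
only_values = list(cursor(store, keys=False))
a_pairs = list(cursor(store, prefix='a'))
```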

Transaction

class Transaction

Transaction handle, used for executing one or more operations atomically. This class is not created directly - use Sophia.transaction().

The transaction can be used as a context-manager. To read or write during a transaction, you should obtain a transaction-specific handle to the database you are operating on.

Example:

env = Sophia('/tmp/my-env')
db = env.add_database('kv', Schema.key_value())
env.open()

with env.transaction() as txn:
    tdb = txn[db]  # Obtain reference to "db" in the transaction.
    tdb['k1'] = 'v1'
    tdb.update(k2='v2', k3='v3')

# At the end of the wrapped block, the transaction is committed.
# The writes have been recorded:
print(db['k1'], db['k3'])
# ('v1', 'v3')

begin()

Begin a transaction.

commit()
Raises:

SophiaError

Commit all changes. An exception can occur if:

  1. The transaction was rolled back, either explicitly or implicitly due to conflicting changes having been committed by a different transaction. Not recoverable.

  2. A concurrent transaction is open and must be committed before this transaction can commit. Possibly recoverable.
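Since the second case may be recoverable, callers sometimes wrap a whole unit of work in a retry loop. The sketch below shows one way to do that under stated assumptions: run_with_retry and flaky_work are hypothetical names, a local stand-in exception is defined in place of sophy's SophiaError so the snippet runs on its own, and re-running the entire unit of work (instead of calling commit() again on the same handle) is a design choice made here, not something this reference prescribes.

```python
class SophiaError(Exception):
    """Local stand-in for sophy's SophiaError, for illustration only."""


def run_with_retry(work, attempts=3):
    """Call work() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return work()
        except SophiaError:
            if attempt == attempts:
                raise  # Give up and surface the last error.


calls = []


def flaky_work():
    # Fails on the first call to simulate a commit error, then succeeds.
    calls.append(1)
    if len(calls) < 2:
        raise SophiaError('simulated commit failure')
    return 'committed'


result = run_with_retry(flaky_work)
```

In real use, work would be a function that opens its own `with env.transaction() as txn:` block and performs the writes, so that each attempt starts from a fresh transaction.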

rollback()

Roll-back any changes made in the transaction.

__getitem__(db)
Parameters:

db (Database) – database to reference during transaction

Returns:

special database-handle for use in transaction

Return type:

DatabaseTransaction

Obtain a reference to the database for use within the transaction. This object supports the same APIs as Database, but any reads or writes will be made within the context of the transaction.

Schema Definition

class Schema(key_parts, value_parts)
Parameters:
  • key_parts (list) – a list of Index objects (or a single index object) to use as the key of the database.

  • value_parts (list) – a list of Index objects (or a single index object) to use for the values stored in the database.

The schema defines the structure of the keys and values for a given Database. Keys and values can each consist of a single index type, or of multiple indexes for composite keys or values.

Example:

# Simple schema defining text keys and values.
simple = Schema(StringIndex('key'), StringIndex('value'))

# Schema with composite key for storing timestamps and event-types,
# along with msgpack-serialized data as the value.
event_schema = Schema(
    [U64Index('timestamp'), StringIndex('type')],
    [MsgPackIndex('value')])

Schemas are used when adding databases using the Sophia.add_database() method.

add_key(index)
Parameters:

index (BaseIndex) – an index object to add to the key parts.

Add an index to the key. Allows Schema to be built-up programmatically.

add_value(index)
Parameters:

index (BaseIndex) – an index object to add to the value parts.

Add an index to the value. Allows Schema to be built-up programmatically.

classmethod key_value()

Short-hand for creating a simple text schema consisting of a single StringIndex for both the key and the value.

class BaseIndex(name)
Parameters:

name (str) – Name for the key- or value-part the index represents.

Indexes are used to define the key and value portions of a Schema. Traditional key/value databases typically support only a single-value, single-datatype key and value (usually bytes). Sophia is different in that keys or values can be composed of multiple parts with differing data types.

For example, to emulate a typical key/value store:

schema = Schema([BytesIndex('key')], [BytesIndex('value')])
db = env.add_database('old_school', schema)

Suppose we are storing time-series event logs. We could use a 64-bit integer for the timestamp (in microseconds) as well as a string to denote the event type. The value could be arbitrary msgpack-encoded data:

key = [U64Index('timestamp'), StringIndex('type')]
value = [MsgPackIndex('value')]
events = env.add_database('events', Schema(key, value))

class SerializedIndex(name, serialize, deserialize)
Parameters:
  • name (str) – Name for the key- or value-part the index represents.

  • serialize – a callable that accepts data and returns bytes.

  • deserialize – a callable that accepts bytes and deserializes the data.

The SerializedIndex can be used to transparently store data as bytestrings. For example, you could use a library like msgpack or pickle to transparently store and retrieve Python objects in the database:

key = StringIndex('key')
value = SerializedIndex('value', pickle.dumps, pickle.loads)
pickled_db = env.add_database('data', Schema([key], [value]))

Note: sophy already provides indexes for JsonIndex, MsgPackIndex and PickleIndex.
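For a format that is not built in, the two callables only need to satisfy the contract above: serialize takes data and returns bytes, and deserialize takes bytes and returns the data. As a sketch, the pair below stores zlib-compressed JSON. The function names are illustrative, and the snippet exercises the callables directly without touching a database.

```python
import json
import zlib


def serialize(obj):
    # Accepts data and returns bytes: JSON text, UTF-8 encoded, then compressed.
    return zlib.compress(json.dumps(obj).encode('utf-8'))


def deserialize(raw):
    # Accepts bytes and reverses the steps above.
    return json.loads(zlib.decompress(raw).decode('utf-8'))


record = {'msg': 'foo', 'count': 3}
raw = serialize(record)
restored = deserialize(raw)
```

These could then be passed as `SerializedIndex('value', serialize, deserialize)` in the same way as the pickle example.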

class BytesIndex(name)

Store arbitrary binary data in the database.

class StringIndex(name)

Store text data in the database as UTF8-encoded bytestrings. When reading from a StringIndex, data is decoded and returned as unicode.

class JsonIndex(name)

Store data as UTF8-encoded JSON. Python objects will be transparently serialized and deserialized when writing and reading, respectively.

class MsgPackIndex(name)

Store data using the msgpack serialization format. Python objects will be transparently serialized and deserialized when writing and reading.

Note: Requires the msgpack-python library.

class PickleIndex(name)

Store data using Python’s pickle serialization format. Python objects will be transparently serialized and deserialized when writing and reading.

class UUIDIndex(name)

Store UUIDs. Python uuid.UUID() objects will be stored as raw bytes and decoded to uuid.UUID() instances upon retrieval.
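The round trip between a uuid.UUID and its raw bytes can be reproduced with the standard library alone. This sketch assumes the stored form is the 16-byte value of UUID.bytes; the exact byte layout sophy uses is not specified here.

```python
import uuid

original = uuid.UUID('12345678-1234-5678-1234-567812345678')
raw = original.bytes             # 16 raw bytes
restored = uuid.UUID(bytes=raw)  # decoded back to a uuid.UUID instance
```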

class U64Index(name)
class U32Index(name)
class U16Index(name)
class U8Index(name)

Store unsigned integers of the given sizes.

class U64RevIndex(name)
class U32RevIndex(name)
class U16RevIndex(name)
class U8RevIndex(name)

Store unsigned integers of the given sizes in reverse order.
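As a quick reference for choosing a width, the snippet below computes the largest value each unsigned size can hold and contrasts ascending with descending key order. It is plain Python and does not use sophy. Reading "reverse order" as "keys iterate from largest to smallest" is an assumption.

```python
# Largest value each unsigned width can hold.
limits = {bits: 2 ** bits - 1 for bits in (8, 16, 32, 64)}

timestamps = [30, 10, 20]
forward_order = sorted(timestamps)                # U64Index-style ordering
reverse_order = sorted(timestamps, reverse=True)  # U64RevIndex-style ordering
```

A reverse index would be a natural fit for something like a timestamp key when the most recent records should come first during iteration.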

Cursor

class Cursor

Cursor handle for a Database. This object is not created directly but through the Database.cursor() method or one of the database methods that returns a row iterator (e.g. Database.items()).

Cursors are iterable and, depending on how they were configured, can return keys, values or key/value pairs.

Settings

Sophia supports a wide range of settings and configuration options. These settings are also documented in the Sophia documentation.

Environment settings

The following settings are available as properties on Sophia:

Setting                     Type        Description
version                     string, ro  Get current Sophia version
version_storage             string, ro  Get current Sophia storage version
build                       string, ro  Get git commit hash of build
status                      string, ro  Get environment status (e.g. online)
errors                      int, ro     Get number of errors
error                       string, ro  Get last error description
path                        string, ro  Get current Sophia environment directory

Backups

backup_path                 string      Set backup path
backup_run                  method      Start backup in background (non-blocking)
backup_active               int, ro     Show if backup is running
backup_last                 int, ro     Show ID of last-completed backup
backup_last_complete        int, ro     Show if last backup succeeded

Scheduler

scheduler_threads           int         Get or set number of worker threads
scheduler_trace(thread_id)  method      Get a worker trace for given thread

Transaction Manager

transaction_online_rw       int, ro     Number of active read/write transactions
transaction_online_ro       int, ro     Number of active read-only transactions
transaction_commit          int, ro     Total number of completed transactions
transaction_rollback        int, ro     Total number of transaction rollbacks
transaction_conflict        int, ro     Total number of transaction conflicts
transaction_lock            int, ro     Total number of transaction locks
transaction_latency         string, ro  Average transaction latency from start to end
transaction_log             string, ro  Average transaction log length
transaction_vlsn            int, ro     Current VLSN
transaction_gc              int, ro     SSI GC queue size

Metrics

metric_lsn                  int, ro     Current log sequential number
metric_tsn                  int, ro     Current transaction sequential number
metric_nsn                  int, ro     Current node sequential number
metric_dsn                  int, ro     Current database sequential number
metric_bsn                  int, ro     Current backup sequential number
metric_lfsn                 int, ro     Current log file sequential number

Write-ahead Log

log_enable                  int         Enable or disable transaction log
log_path                    string      Get or set folder for log directory
log_sync                    int         Sync transaction log on every commit
log_rotate_wm               int         Create a new log after “rotate_wm” updates
log_rotate_sync             int         Sync log file on every rotation
log_rotate                  method      Force Sophia to rotate log file
log_gc                      method      Force Sophia to garbage-collect log file pool
log_files                   int, ro     Number of log files in the pool

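Because backup_run only starts a backup in the background, a caller that needs to know when it has finished has to poll. The helper below is a hypothetical sketch built only on the backup_run, backup_active and backup_last_complete settings listed above. The name wait_for_backup, the polling interval and the timeout are all assumptions, and a stand-in environment is used so the snippet runs without Sophia. With a real environment, backup_path would presumably need to be set first.

```python
import time


def wait_for_backup(env, poll_interval=0.1, timeout=60.0):
    """Start a backup and block until it finishes (hypothetical helper)."""
    env.backup_run()  # Documented as non-blocking.
    deadline = time.time() + timeout
    while env.backup_active:
        if time.time() > deadline:
            raise RuntimeError('backup did not finish in time')
        time.sleep(poll_interval)
    # A non-zero value is taken to mean the last backup succeeded.
    return bool(env.backup_last_complete)


class FakeEnv(object):
    """Stand-in exposing the three backup settings used above."""

    def __init__(self):
        self._polls_left = 0
        self.backup_last_complete = 0

    def backup_run(self):
        # Pretend the backup stays active for the next two polls.
        self._polls_left = 2

    @property
    def backup_active(self):
        if self._polls_left > 0:
            self._polls_left -= 1
            return 1
        self.backup_last_complete = 1
        return 0


ok = wait_for_backup(FakeEnv(), poll_interval=0.01)
```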
Database settings

The following settings are available as properties on Database. By default, Sophia uses pread(2) to read from disk. When mmap-mode is on, Sophia instead handles all requests by directly accessing memory-mapped node files.

Setting                     Type        Description
database_name               string, ro  Get database name
database_id                 int, ro     Database sequential ID
database_path               string, ro  Directory for storing data
mmap                        int         Enable or disable mmap-mode
direct_io                   int         Enable or disable O_DIRECT mode
sync                        int         Sync node file on compaction completion
expire                      int         Enable or disable key expiration
compression                 string      Specify compression type: lz4, zstd, none (default)
limit_key                   int, ro     Scheme key size limit
limit_field                 int         Scheme field size limit

Index

index_memory_used           int, ro     Memory used by database for in-memory key indexes
index_size                  int, ro     Sum of node sizes in bytes (i.e. database size)
index_size_uncompressed     int, ro     Full database size before compression
index_count                 int, ro     Total number of keys in db, includes unmerged dupes
index_count_dup             int, ro     Total number of transactional duplicates
index_read_disk             int, ro     Number of disk reads since start
index_read_cache            int, ro     Number of cache reads since start
index_node_count            int, ro     Number of active nodes
index_page_count            int, ro     Total number of pages

Compaction

compaction_cache            int         Total write cache size used for compaction
compaction_checkpoint       int
compaction_node_size        int         Set a node file size in bytes
compaction_page_size        int         Set size of page
compaction_page_checksum    int         Validate checksum during compaction
compaction_expire_period    int         Run expire check process every N seconds
compaction_gc_wm            int         GC starts when watermark value reaches N dupes
compaction_gc_period        int         Check for a gc every N seconds

Performance

stat_documents_used         int, ro     Memory used by allocated documents
stat_documents              int, ro     Number of currently allocated documents
stat_field                  string, ro  Average field size
stat_set                    int, ro     Total number of Set operations
stat_set_latency            string, ro  Average Set latency
stat_delete                 int, ro     Total number of Delete operations
stat_delete_latency         string, ro  Average Delete latency
stat_get                    int, ro     Total number of Get operations
stat_get_latency            string, ro  Average Get latency
stat_get_read_disk          string, ro  Average disk reads by Get operation
stat_get_read_cache         string, ro  Average cache reads by Get operation
stat_pread                  int, ro     Total number of pread operations
stat_pread_latency          string, ro  Average pread latency
stat_cursor                 int, ro     Total number of cursor operations
stat_cursor_latency         string, ro  Average cursor latency
stat_cursor_read_disk       string, ro  Average disk reads by Cursor operation
stat_cursor_read_cache      string, ro  Average cache reads by Cursor operation
stat_cursor_ops             string, ro  Average number of keys read by Cursor operation

Scheduler

scheduler_gc                int, ro     Show if GC operation is in progress
scheduler_expire            int, ro     Show if expire operation is in progress
scheduler_backup            int, ro     Show if backup operation is in progress

scheduler_checkpoint

int, ro