Sophy API¶
-
class
SophiaError
¶ General exception class used to indicate an error returned by the Sophia database.
Environment¶
-
class
Sophia
(path)¶ Parameters: path (str) – Directory path to store environment and databases. Environment object providing access to databases and for controlling transactions.
Example of creating an environment, attaching a database and reading/writing data:
    from sophy import *

    # Environment for managing one or more databases.
    env = Sophia('/tmp/sophia-test')

    # Schema describes the indexes that comprise the key and value portions
    # of a database.
    kv_schema = Schema([StringIndex('key')], [StringIndex('value')])
    db = env.add_database('kv', kv_schema)

    # We need to open the env after configuring the database(s), in order
    # to read/write data.
    assert env.open(), 'Failed to open environment!'

    # We can use dict-style APIs to read/write key/value pairs.
    db['k1'] = 'v1'
    assert db['k1'] == 'v1'

    # Close the env when finished.
    assert env.close(), 'Failed to close environment!'
-
open
()¶ Returns: Boolean indicating success. Open the environment. The environment must be opened in order to read and write data to the configured databases.
-
close
()¶ Returns: Boolean indicating success. Close the environment.
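Because open() and close() report failure through a boolean return value rather than an exception, it is easy to ignore a failed call. The following is a minimal sketch of a wrapper that turns a False result into an exception. The names `opened` and `FakeEnv` are hypothetical and are not part of sophy; `FakeEnv` only mimics the open()/close() contract described above so that the sketch runs without the library installed.

```python
from contextlib import contextmanager


@contextmanager
def opened(env):
    """Open env on entry and close it on exit; raise if either call returns False."""
    if not env.open():
        raise RuntimeError('failed to open environment')
    try:
        yield env
    finally:
        # Always attempt to close, even if the body raised.
        closed = env.close()
    if not closed:
        raise RuntimeError('failed to close environment')


class FakeEnv:
    """Stand-in (not sophy) with the same boolean open()/close() contract."""

    def __init__(self):
        self.is_open = False

    def open(self):
        self.is_open = True
        return True

    def close(self):
        self.is_open = False
        return True


env = FakeEnv()
with opened(env) as e:
    assert e.is_open      # environment is open inside the block
assert not env.is_open    # and closed again afterwards
```

With a real environment, databases would still need to be added before entering the block, since the environment must be closed when databases are declared.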
-
add_database
(name, schema)¶ Parameters: - name (str) – database name
- schema (Schema) – schema for keys and values.
Returns: a database instance
Return type: Database
Add or declare a database. The environment must be closed to add databases. The
Schema
will declare the data types and structure of the key and value portions of the database.

    env = Sophia('/path/to/db-env')

    # Declare an events database with a multi-part key (ts, type) and
    # a msgpack-serialized data field.
    events_schema = Schema(
        key_parts=[U64Index('timestamp'), StringIndex('type')],
        value_parts=[MsgPackIndex('data')])
    db = env.add_database('events', events_schema)

    # Open the environment for read/write access to the database.
    env.open()

    # We can now write to the database. current_time() is a placeholder
    # for any function that returns an integer timestamp.
    db[current_time(), 'init'] = {'msg': 'event logging initialized'}
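The example above calls current_time(), which this document does not define. One possible definition is sketched below. It is a hypothetical helper, not part of sophy, and it assumes microsecond resolution (the unit suggested for timestamps in the BaseIndex section).

```python
import time


def current_time():
    """Return the current Unix time in whole microseconds (hypothetical helper)."""
    return int(time.time() * 1000000)


ts = current_time()
# A U64Index key part needs a non-negative integer that fits in 64 bits.
assert isinstance(ts, int)
assert 0 <= ts < 2 ** 64
```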
-
remove_database
(name)¶ Parameters: name (str) – database name Remove a database from the environment. The environment must be closed to remove databases. This method does not really have any practical value but is provided for consistency.
-
get_database
(name)¶ Returns: the database corresponding to the provided name Return type: Database
Obtain a reference to the given database, provided the database has been added to the environment by a previous call to
add_database()
.
-
__getitem__
(name)¶ Short-hand for
get_database()
.
-
transaction
()¶ Returns: a transaction handle. Return type: Transaction
Create a transaction handle which can be used to execute a transaction on the databases in the environment. The returned transaction can be used as a context-manager.
Example:
    env = Sophia('/tmp/sophia-test')
    db = env.add_database('test', Schema.key_value())
    env.open()

    with env.transaction() as txn:
        t_db = txn[db]
        t_db['k1'] = 'v1'
        t_db.update(k2='v2', k3='v3')

    # Transaction has been committed.
    print(db['k1'], db['k3'])  # prints "v1", "v3"
See
Transaction
for more information.
-
Database¶
-
class
Database
¶ Database interface. This object is not created directly, but references can be obtained via
Sophia.add_database()
or Sophia.get_database()
. For example:
    env = Sophia('/path/to/data')
    kv_schema = Schema(StringIndex('key'), MsgPackIndex('value'))
    kv_db = env.add_database('kv', kv_schema)

    # Another reference to "kv_db":
    kv_db = env.get_database('kv')

    # Same as above:
    kv_db = env['kv']
-
set
(key, value)¶ Parameters: - key – key corresponding to schema (e.g. scalar or tuple).
- value – value corresponding to schema (e.g. scalar or tuple).
Returns: No return value.
Store the value at the given key. For single-index keys or values, a scalar value may be provided as the key or value. If a composite or multi-index key or value is used, then a
tuple
must be provided. Examples:
    simple = Schema(StringIndex('key'), StringIndex('value'))
    simple_db = env.add_database('simple', simple)

    composite = Schema(
        [U64Index('timestamp'), StringIndex('type')],
        [MsgPackIndex('data')])
    composite_db = env.add_database('composite', composite)

    env.open()  # Open env to access databases.

    # Set k1=v1 in the simple key/value database.
    simple_db.set('k1', 'v1')

    # Set new value in composite db. Note the key is a tuple and, since
    # the value is serialized using msgpack, we can transparently store
    # data-types like dicts. current_time() is a placeholder for any
    # function that returns an integer timestamp.
    composite_db.set((current_time(), 'evt_type'), {'msg': 'foo'})
-
get
(key[, default=None])¶ Parameters: - key – key corresponding to schema (e.g. scalar or tuple).
- default – default value if key does not exist.
Returns: value of given key or default value.
Get the value at the given key. If the key does not exist, the default value is returned.
If a multi-part key is defined for the given database, the key must be a tuple.
Example:
    simple_db.set('k1', 'v1')
    simple_db.get('k1')  # Returns "v1".
    simple_db.get('not-here')  # Returns None.
-
delete
(key)¶ Parameters: key – key corresponding to schema (e.g. scalar or tuple). Returns: No return value Delete the given key, if it exists. If a multi-part key is defined for the given database, the key must be a tuple.
Example:
    simple_db.set('k1', 'v1')
    simple_db.delete('k1')  # Deletes "k1" from database.
    simple_db.exists('k1')  # False.
-
exists
(key)¶ Parameters: key – key corresponding to schema (e.g. scalar or tuple). Returns: Boolean indicating if key exists. Return type: bool Return whether the given key exists. If a multi-part key is defined for the given database, the key must be a tuple.
-
multi_set
([__data=None[, **kwargs]])¶ Parameters: - __data (dict) – Dictionary of key/value pairs to set.
- kwargs – Specify key/value pairs as keyword-arguments.
Returns: No return value
Set multiple key/value pairs efficiently.
-
multi_get
(*keys)¶ Parameters: keys – key(s) to retrieve Returns: a list of values associated with the given keys. If a key does not exist, None is returned in its place. Return type: list
Get multiple values efficiently. Returned as a list of values corresponding to the
keys
argument, with missing values as None. Example:
    db.update(k1='v1', k2='v2', k3='v3')
    db.multi_get('k1', 'k3', 'k-nothere')
    # ['v1', 'v3', None]
-
multi_get_dict
(keys)¶ Parameters: keys (list) – list of keys to get Returns: a dict mapping each key that exists to its value. Return type: dict
Get multiple values efficiently. Returned as a dict of key/value pairs. Missing values are not represented in the returned dict.
Example:
    db.update(k1='v1', k2='v2', k3='v3')
    db.multi_get_dict(['k1', 'k3', 'k-nothere'])
    # {'k1': 'v1', 'k3': 'v3'}
-
multi_delete
(*keys)¶ Parameters: keys – key(s) to delete Returns: No return value Efficiently delete multiple keys.
-
get_range
(start=None, stop=None, reverse=False)¶ Parameters: - start – start key (omit to start at first record).
- stop – stop key (omit to stop at the last record).
- reverse (bool) – return range in reverse.
Returns: a generator that yields the requested key/value pairs.
Fetch a range of key/value pairs from the given start-key, up-to and including the stop-key (if given).
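Note that the stop key is inclusive, unlike the end of a Python list slice. The sketch below models that behaviour on a plain dict so it can be run without sophy. `inclusive_range` is a hypothetical stand-in, not the library's implementation, and it does not model the reverse parameter.

```python
from bisect import bisect_left, bisect_right


def inclusive_range(records, start=None, stop=None):
    """Yield (key, value) pairs with start <= key <= stop from a dict, in key order."""
    keys = sorted(records)
    lo = 0 if start is None else bisect_left(keys, start)
    # bisect_right places the cut after the stop key, so the stop key is included.
    hi = len(keys) if stop is None else bisect_right(keys, stop)
    for key in keys[lo:hi]:
        yield key, records[key]


data = {'k1': 'v1', 'k2': 'v2', 'k3': 'v3', 'k4': 'v4'}
print(list(inclusive_range(data, 'k2', 'k3')))  # [('k2', 'v2'), ('k3', 'v3')]
print(list(inclusive_range(data, stop='k2')))   # [('k1', 'v1'), ('k2', 'v2')]
```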
-
keys
()¶ Return a cursor for iterating over the keys in the database.
-
values
()¶ Return a cursor for iterating over the values in the database.
-
items
()¶ Return a cursor for iterating over the key/value pairs in the database.
-
__getitem__
(key_or_slice)¶ Parameters: key_or_slice – key or range of keys to retrieve. Returns: value of given key, or an iterator over the range of keys. Raises: KeyError if single key requested and does not exist. Retrieve a single value or a range of values, depending on whether the key represents a single row or a slice of rows.
Additionally, if a slice is given, the start and stop values can be omitted to indicate you wish to start from the first or last key, respectively.
-
__len__
()¶ Equivalent to iterating over all keys and returning count. This is the most accurate way to get the total number of keys, but is not very efficient. An alternative is to use the
Database.index_count
property, which returns an approximation of the number of keys in the database.
-
cursor
(order='>=', key=None, prefix=None, keys=True, values=True)¶ Parameters: - order (str) – ordering semantics (default is “>=”)
- key – key to seek to before iterating.
- prefix – string prefix to match.
- keys (bool) – return keys when iterating.
- values (bool) – return values when iterating.
Create a cursor with the given semantics. Typically you will want both
keys=True
and values=True
(the defaults), which will cause the cursor to yield a 2-tuple consisting of (key, value)
during iteration.
-
Transaction¶
-
class
Transaction
¶ Transaction handle, used for executing one or more operations atomically. This class is not created directly - use
Sophia.transaction()
. The transaction can be used as a context-manager. To read or write during a transaction, you should obtain a transaction-specific handle to the database you are operating on.
Example:
    env = Sophia('/tmp/my-env')
    db = env.add_database('kv', Schema.key_value())
    env.open()

    with env.transaction() as txn:
        tdb = txn[db]  # Obtain reference to "db" in the transaction.
        tdb['k1'] = 'v1'
        tdb.update(k2='v2', k3='v3')

    # At the end of the wrapped block, the transaction is committed.
    # The writes have been recorded:
    print(db['k1'], db['k3'])  # ('v1', 'v3')
-
begin
()¶ Begin a transaction.
-
commit
()¶ Raises: SophiaError Commit all changes. An exception can occur if:
- The transaction was rolled back, either explicitly or implicitly due to conflicting changes having been committed by a different transaction. Not recoverable.
- A concurrent transaction is open and must be committed before this transaction can commit. Possibly recoverable.
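For the second, possibly recoverable case, one option is to retry the commit a bounded number of times. The following is a sketch only: `commit_with_retry` and `FlakyTxn` are hypothetical names, and the local `SophiaError` class is a stand-in defined here so the code runs without sophy. The sketch cannot tell the two failure cases apart, so a transaction that was rolled back will simply fail on every attempt and the last exception is re-raised.

```python
import time


class SophiaError(Exception):
    """Local stand-in for sophy's SophiaError so the sketch runs without sophy."""


def commit_with_retry(txn, attempts=3, delay=0.01, error_cls=SophiaError):
    """Call txn.commit(), retrying when error_cls is raised.

    Returns the number of the attempt that succeeded; re-raises the last
    error once `attempts` tries have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            txn.commit()
            return attempt
        except error_cls:
            if attempt == attempts:
                raise
            time.sleep(delay)  # give the concurrent transaction time to finish


class FlakyTxn:
    """Fake transaction whose commit fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures

    def commit(self):
        if self.failures > 0:
            self.failures -= 1
            raise SophiaError('waiting on concurrent transaction')


print(commit_with_retry(FlakyTxn(failures=2), delay=0))  # 3
```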
-
rollback
()¶ Roll-back any changes made in the transaction.
-
__getitem__
(db)¶ Parameters: db (Database) – database to reference during transaction Returns: special database-handle for use in transaction Return type: DatabaseTransaction
Obtain a reference to the database for use within the transaction. This object supports the same APIs as
Database
, but any reads or writes will be made within the context of the transaction.
-
Schema Definition¶
-
class
Schema
(key_parts, value_parts)¶ Parameters: - key_parts (list) – a list of
Index
objects (or a single index object) to use as the key of the database. - value_parts (list) – a list of
Index
objects (or a single index object) to use for the values stored in the database.
The schema defines the structure of the keys and values for a given
Database
. Keys and values can each consist of a single index or of multiple indexes (for composite keys or values). Example:
    # Simple schema defining text keys and values.
    simple = Schema(StringIndex('key'), StringIndex('value'))

    # Schema with composite key for storing timestamps and event-types,
    # along with msgpack-serialized data as the value.
    event_schema = Schema(
        [U64Index('timestamp'), StringIndex('type')],
        [MsgPackIndex('value')])
Schemas are used when adding databases using the
Sophia.add_database()
method.
add_key
(index)¶ Parameters: index (BaseIndex) – an index object to add to the key parts. Add an index to the key. Allows
Schema
to be built-up programmatically.
-
add_value
(index)¶ Parameters: index (BaseIndex) – an index object to add to the value parts. Add an index to the value. Allows
Schema
to be built-up programmatically.
-
classmethod
key_value
()¶ Short-hand for creating a simple text schema consisting of a single
StringIndex
for both the key and the value.
-
class
BaseIndex
(name)¶ Parameters: name (str) – Name for the key- or value-part the index represents. Indexes are used to define the key and value portions of a
Schema
. Traditional key/value databases typically only supported a single-value, single-datatype key and value (usually bytes). Sophia is different in that keys or values can be comprised of multiple parts with differing data-types. For example, to emulate a typical key/value store:
    schema = Schema([BytesIndex('key')], [BytesIndex('value')])
    db = env.add_database('old_school', schema)
Suppose we are storing time-series event logs. We could use a 64-bit integer for the timestamp (in micro-seconds) as well as a key to denote the event-type. The value could be arbitrary msgpack-encoded data:
    key = [U64Index('timestamp'), StringIndex('type')]
    value = [MsgPackIndex('value')]
    events = env.add_database('events', Schema(key, value))
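To get a feel for how such (timestamp, type) keys would be ordered, plain Python tuples are a useful mental model. This sketch assumes that composite keys are compared part by part, left to right, the way Python compares tuples. That is an assumption about the storage engine, not something this document states, and the timestamps are made-up values.

```python
# Hypothetical (timestamp_in_microseconds, event_type) keys.
events = [
    (1700000000000002, 'logout'),
    (1700000000000001, 'login'),
    (1700000000000001, 'click'),
]

# Tuples sort by the first part, then by the second part on ties.
ordered = sorted(events)
for key in ordered:
    print(key)

# All keys sharing one timestamp are adjacent in the sorted order.
same_ts = [key for key in ordered if key[0] == 1700000000000001]
assert same_ts == [(1700000000000001, 'click'), (1700000000000001, 'login')]
```

Putting the timestamp first keeps events in chronological order, which suits range scans over a time window.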
-
class
SerializedIndex
(name, serialize, deserialize)¶ Parameters: - name (str) – Name for the key- or value-part the index represents.
- serialize – a callable that accepts data and returns bytes.
- deserialize – a callable that accepts bytes and deserializes the data.
The
SerializedIndex
can be used to transparently store data as bytestrings. For example, you could use a library like msgpack
or pickle
to transparently store and retrieve Python objects in the database:

    import pickle

    key = StringIndex('key')
    value = SerializedIndex('value', pickle.dumps, pickle.loads)
    pickled_db = env.add_database('data', Schema([key], [value]))

Note:
sophy
already provides the JsonIndex, MsgPackIndex and PickleIndex index types.
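Any pair of callables that round-trips between Python objects and bytes should fit the serialize and deserialize parameters. As a sketch, the pair below compresses pickled data with zlib. The function names are illustrative, and the pair is checked here on its own, without sophy.

```python
import pickle
import zlib


def serialize(obj):
    """Pickle obj and compress the result, returning bytes."""
    return zlib.compress(pickle.dumps(obj))


def deserialize(data):
    """Reverse serialize(): decompress, then unpickle."""
    return pickle.loads(zlib.decompress(data))


record = {'msg': 'foo', 'tags': ['a', 'b']}
blob = serialize(record)
assert isinstance(blob, bytes)      # serialize must return bytes
assert deserialize(blob) == record  # and the pair must round-trip
```

Following the signature given above, such a pair would be passed as SerializedIndex('value', serialize, deserialize).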
-
class
BytesIndex
(name)¶ Store arbitrary binary data in the database.
-
class
StringIndex
(name)¶ Store text data in the database as UTF8-encoded bytestrings. When reading from a
StringIndex
, data is decoded and returned as unicode.
-
class
JsonIndex
(name)¶ Store data as UTF8-encoded JSON. Python objects will be transparently serialized and deserialized when writing and reading, respectively.
-
class
MsgPackIndex
(name)¶ Store data using the msgpack serialization format. Python objects will be transparently serialized and deserialized when writing and reading.
Note: Requires the
msgpack-python
library.
-
class
PickleIndex
(name)¶ Store data using Python’s pickle serialization format. Python objects will be transparently serialized and deserialized when writing and reading.
-
class
UUIDIndex
(name)¶ Store UUIDs. Python
uuid.UUID()
objects will be stored as raw bytes and decoded to uuid.UUID()
instances upon retrieval.
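The round trip between a UUID and its raw bytes can be reproduced with the standard library alone. This sketch assumes that "raw bytes" refers to the 16-byte form exposed by the UUID.bytes attribute. It does not show sophy's internal code.

```python
import uuid

original = uuid.uuid4()

# 16-byte big-endian representation of the UUID.
raw = original.bytes
assert len(raw) == 16

# Rebuilding from the raw bytes yields an equal UUID instance.
restored = uuid.UUID(bytes=raw)
assert restored == original
```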
-
class
U64Index
(name)¶
-
class
U32Index
(name)¶
-
class
U16Index
(name)¶
-
class
U8Index
(name)¶ Store unsigned integers of the given sizes.
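When choosing among these classes, the deciding factor is the largest value the key or value part must hold. The helper below is a hypothetical sketch, not part of sophy. It assumes the number in each class name is the integer width in bits and returns the class name as a string, so it runs without the library.

```python
# (class name, width in bits), smallest first.
UNSIGNED_INDEXES = [
    ('U8Index', 8),
    ('U16Index', 16),
    ('U32Index', 32),
    ('U64Index', 64),
]


def smallest_unsigned_index(max_value):
    """Return the name of the narrowest unsigned index able to hold max_value."""
    if max_value < 0:
        raise ValueError('unsigned indexes cannot hold negative values')
    for name, bits in UNSIGNED_INDEXES:
        if max_value < 2 ** bits:
            return name
    raise ValueError('value does not fit in 64 bits')


print(smallest_unsigned_index(255))               # U8Index
print(smallest_unsigned_index(256))               # U16Index
print(smallest_unsigned_index(1700000000000000))  # U64Index
```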
-
class
U64RevIndex
(name)¶
-
class
U32RevIndex
(name)¶
-
class
U16RevIndex
(name)¶
-
class
U8RevIndex
(name)¶ Store unsigned integers of the given sizes in reverse order.
Cursor¶
-
class
Cursor
¶ Cursor handle for a
Database
. This object is not created directly but through the Database.cursor()
method or one of the database methods that returns a row iterator (e.g. Database.items()
). Cursors are iterable and, depending on how they were configured, can return keys, values or key/value pairs.
Settings¶
Sophia supports a wide range of settings and configuration options. These settings are also documented in the Sophia documentation.
Environment settings¶
The following settings are available as properties on Sophia:
Setting | Type | Description |
---|---|---|
version | string, ro | Get current Sophia version |
version_storage | string, ro | Get current Sophia storage version |
build | string, ro | Get git commit hash of build |
status | string, ro | Get environment status (e.g. online) |
errors | int, ro | Get number of errors |
error | string, ro | Get last error description |
path | string, ro | Get current Sophia environment directory |
Backups | ||
backup_path | string | Set backup path |
backup_run | method | Start backup in background (non-blocking) |
backup_active | int, ro | Show if backup is running |
backup_last | int, ro | Show ID of last-completed backup |
backup_last_complete | int, ro | Show if last backup succeeded |
Scheduler | ||
scheduler_threads | int | Get or set number of worker threads |
scheduler_trace(thread_id) | method | Get a worker trace for given thread |
Transaction Manager | ||
transaction_online_rw | int, ro | Number of active read/write transactions |
transaction_online_ro | int, ro | Number of active read-only transactions |
transaction_commit | int, ro | Total number of completed transactions |
transaction_rollback | int, ro | Total number of transaction rollbacks |
transaction_conflict | int, ro | Total number of transaction conflicts |
transaction_lock | int, ro | Total number of transaction locks |
transaction_latency | string, ro | Average transaction latency from start to end |
transaction_log | string, ro | Average transaction log length |
transaction_vlsn | int, ro | Current VLSN |
transaction_gc | int, ro | SSI GC queue size |
Metrics | ||
metric_lsn | int, ro | Current log sequential number |
metric_tsn | int, ro | Current transaction sequential number |
metric_nsn | int, ro | Current node sequential number |
metric_dsn | int, ro | Current database sequential number |
metric_bsn | int, ro | Current backup sequential number |
metric_lfsn | int, ro | Current log file sequential number |
Write-ahead Log | ||
log_enable | int | Enable or disable transaction log |
log_path | string | Get or set folder for log directory |
log_sync | int | Sync transaction log on every commit |
log_rotate_wm | int | Create a new log after “rotate_wm” updates |
log_rotate_sync | int | Sync log file on every rotation |
log_rotate | method | Force Sophia to rotate log file |
log_gc | method | Force Sophia to garbage-collect log file pool |
log_files | int, ro | Number of log files in the pool |
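Since these settings are exposed as properties, several of them can be collected with getattr() for logging or diagnostics. The sketch below is illustrative only: `snapshot` is a hypothetical helper, and `fake_env` is a stand-in object with made-up values, used so the code runs without a real environment.

```python
from types import SimpleNamespace

# A few of the read-only settings from the table above.
ENV_STATUS_SETTINGS = ('version', 'status', 'errors', 'error', 'path')


def snapshot(obj, names):
    """Read the named properties from obj and return them as a dict."""
    return {name: getattr(obj, name) for name in names}


# Stand-in for an opened environment; all values here are made up.
fake_env = SimpleNamespace(
    version='x.y.z', status='online', errors=0, error=None,
    path='/tmp/sophia-test')

info = snapshot(fake_env, ENV_STATUS_SETTINGS)
print(info['status'], info['errors'])  # online 0
```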
Database settings¶
The following settings are available as properties on Database. By default, Sophia uses pread(2) to read from disk. When mmap-mode is on (by default), Sophia handles all requests by directly accessing memory-mapped node files.
Setting | Type | Description |
---|---|---|
database_name | string, ro | Get database name |
database_id | int, ro | Database sequential ID |
database_path | string, ro | Directory for storing data |
mmap | int | Enable or disable mmap-mode |
direct_io | int | Enable or disable O_DIRECT mode. |
sync | int | Sync node file on compaction completion |
expire | int | Enable or disable key expiration |
compression | string | Specify compression type: lz4, zstd, none (default) |
limit_key | int, ro | Scheme key size limit |
limit_field | int | Scheme field size limit |
Index | ||
index_memory_used | int, ro | Memory used by database for in-memory key indexes |
index_size | int, ro | Sum of nodes size in bytes (e.g. database size) |
index_size_uncompressed | int, ro | Full database size before compression |
index_count | int, ro | Total number of keys in db, includes unmerged dupes |
index_count_dup | int, ro | Total number of transactional duplicates |
index_read_disk | int, ro | Number of disk reads since start |
index_read_cache | int, ro | Number of cache reads since start |
index_node_count | int, ro | Number of active nodes |
index_page_count | int, ro | Total number of pages |
Compaction | ||
compaction_cache | int | Total write cache size used for compaction |
compaction_checkpoint | int | |
compaction_node_size | int | Set a node file size in bytes. |
compaction_page_size | int | Set size of page |
compaction_page_checksum | int | Validate checksum during compaction |
compaction_expire_period | int | Run expire check process every N seconds |
compaction_gc_wm | int | GC starts when watermark value reaches N dupes |
compaction_gc_period | int | Check for a gc every N seconds |
Performance | ||
stat_documents_used | int, ro | Memory used by allocated document |
stat_documents | int, ro | Number of currently allocated documents |
stat_field | string, ro | Average field size |
stat_set | int, ro | Total number of Set operations |
stat_set_latency | string, ro | Average Set latency |
stat_delete | int, ro | Total number of Delete operations |
stat_delete_latency | string, ro | Average Delete latency |
stat_get | int, ro | Total number of Get operations |
stat_get_latency | string, ro | Average Get latency |
stat_get_read_disk | string, ro | Average disk reads by Get operation |
stat_get_read_cache | string, ro | Average cache reads by Get operation |
stat_pread | int, ro | Total number of pread operations |
stat_pread_latency | string, ro | Average pread latency |
stat_cursor | int, ro | Total number of cursor operations |
stat_cursor_latency | string, ro | Average cursor latency |
stat_cursor_read_disk | string, ro | Average disk reads by Cursor operation |
stat_cursor_read_cache | string, ro | Average cache reads by Cursor operation |
stat_cursor_ops | string, ro | Average number of keys read by Cursor operation |
Scheduler | ||
scheduler_gc | int, ro | Show if GC operation is in progress |
scheduler_expire | int, ro | Show if expire operation is in progress |
scheduler_backup | int, ro | Show if backup operation is in progress |
scheduler_checkpoint | int, ro |
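The read counters in the Index section can be combined into a rough cache hit ratio. This sketch assumes index_read_disk and index_read_cache count comparable read events, which the table does not state. `cache_hit_ratio` is a hypothetical helper, and `fake_db` is a stand-in object with made-up numbers.

```python
from types import SimpleNamespace


def cache_hit_ratio(db):
    """Return cache reads as a fraction of all reads, or None if no reads occurred."""
    disk = db.index_read_disk
    cache = db.index_read_cache
    total = disk + cache
    if total == 0:
        return None
    return cache / total


# Stand-in for a Database instance; the counts are made up.
fake_db = SimpleNamespace(index_read_disk=25, index_read_cache=75)
print(cache_hit_ratio(fake_db))  # 0.75
```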