Module Plasma_inodecache

module Plasma_inodecache: sig .. end

The inodecache stores the metadata found in inodes, namely the inodeinfo struct, and subsets of the block list.

Caching and transaction views

There is only one inodecache for all transactions: caching makes most sense if one transaction can directly profit from the knowledge gained by previous or parallel transactions. As PlasmaFS partly isolates transactions from each other, one transaction may see a different version of the metadata than another transaction. The question is how to deal with this.

First, we ensure that only metadata known to be the current committed version (or known to have been the current committed version at some point in the past) can be put into the cache. Non-committed data should not go into the cache, as this would break fundamental assumptions about transaction isolation.

Second, the version stored in the cache can only be replaced by a newer version. When a transaction keeps an old view and cannot yet see this newer version, the cache is useless for this transaction, and the server must be contacted directly to get information about the historical view. This does not otherwise disadvantage the transaction; it only makes it slightly slower.

Effectively, a transaction can only profit from the cache if the transaction needs the newest version of the metadata.

For clean semantics, data retrieved from the inodecache must only be used in transactions that only read data and never write. Otherwise the cache may return the newest committed data, which might not be the view the transaction has. This affects the functions get_ii_via_cache_e and get_blocks_via_cache_e.

The functions that put data into the cache can be called from any kind of transaction. They implement criteria that reliably recognize cacheable data and discard views that are private to a transaction. This affects the functions get_ii_e and get_blocks_e.

How to use cached data

By nature, the queried data may already be outdated by the time it is returned to the caller. This is true for this cache, but it is also true for many server calls if the transaction is not kept open.

This does not make the data unusable, although how much it matters depends very much on the field. Generally, the inodeinfo struct is refreshed at most once per second. The blocklists are only refreshed when there is a new version of the inodeinfo struct, so they are also refreshed at most once per second. If the application can live with this delay, there is no problem at all.

Many applications need exact metadata only when an inode is (re-)opened. In this case, a reload of the metadata can be forced with get_ii_e.

There is a special problem with the blocklists. If an old version of the blocklist is used, there is the danger that the wrong blocks are accessed. This is, of course, far more serious than an mtime that is one second off. There are two ways to cope with the problem:

Do not use the inodecache for this purpose. If the transaction for which the blocklist is requested is still open, the namenode ensures that the blocklist remains valid for the duration of the transaction.
Use the data from the inodecache, but validate it again after (!) reading a block. If it turns out that the wrong block was accessed, the cache needs to be refreshed, and the read is repeated. If necessary, this refresh cycle is repeated until the block has been read and the subsequent (!) validation confirms that it was the right one. Validation is done with check_up_to_date_e.
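The second strategy can be sketched as a retry loop. This is a synchronous, single-threaded model, not the Plasma implementation: the types server and cache and the helpers read_physical and check_up_to_date are hypothetical stand-ins for the real engine-based calls such as check_up_to_date_e. The retry bound max_tries is an added assumption; the text above does not bound the cycle.

```ocaml
(* Hypothetical, single-threaded model of "read first, validate afterwards". *)
type server = {
  seqno : int64;        (* current committed sequence number of the inode *)
  blocks : int array;   (* file block index -> physical block *)
}

type cache = {
  mutable c_seqno : int64;      (* seqno the cached blocklist belongs to *)
  mutable c_blocks : int array; (* possibly outdated copy of the blocklist *)
}

(* Refresh the cached blocklist from the server. *)
let refresh_cache srv c =
  c.c_seqno <- srv.seqno;
  c.c_blocks <- Array.copy srv.blocks

(* Stand-in for reading the data stored in physical block [b]. *)
let read_physical b = Printf.sprintf "data@%d" b

(* Stand-in for check_up_to_date_e: is [seqno] still the newest version? *)
let check_up_to_date srv seqno = Int64.equal srv.seqno seqno

(* Read file block [idx] using the cached blocklist. Validation happens only
   after the read; on failure the cache is refreshed and the read repeated. *)
let rec read_validated ?(max_tries = 5) srv c idx =
  if max_tries <= 0 then failwith "read_validated: too many retries";
  let data = read_physical c.c_blocks.(idx) in
  if check_up_to_date srv c.c_seqno then data
  else begin
    refresh_cache srv c;
    read_validated ~max_tries:(max_tries - 1) srv c idx
  end
```

The order matters: validating before the read would leave a window in which the blocklist could change between the check and the access.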

type ic_manager

val create_manager : Unixqueue.event_system -> ic_manager

create_manager esys: Creates a new manager. Initially the cache is disabled (it behaves as if nothing were cached).

val enable : ic_manager -> Rpc_proxy.ManagedClient.t -> unit

enable icm mc: Enables the cache. The managed client mc is used to access the server-side inodecache functionality (is_up_to_date).

val disable : ic_manager -> unit

Disables the cache. The cache client is not shut down.

class type lazy_transaction = object .. end

A lazy transaction is started the first time get_tid_e is called.
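The lazy start can be modelled with a mutable option inside an object. The class lazy_tx and its synchronous get_tid method are hypothetical simplifications of the engine-based get_tid_e; start stands for whatever actually opens a transaction on the namenode.

```ocaml
(* Hypothetical synchronous model of a lazy transaction: the transaction is
   only started on the first get_tid call, and the same ID is reused later. *)
class lazy_tx (start : unit -> int64) =
  object
    val mutable tid : int64 option = None

    method get_tid : int64 =
      match tid with
      | Some t -> t
      | None ->
          let t = start () in  (* first use: actually open the transaction *)
          tid <- Some t;
          t
  end
```

If every lookup is answered from the cache, start is never called and no transaction is opened at all.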

val get_ii_e : ic_manager ->
       Rpc_proxy.ManagedClient.mclient ->
       int64 -> int64 -> Plasma_rpcapi_aux.rinodeinfo Uq_engines.engine

get_ii_e icm mc tid inode: Gets the inodeinfo for inode using transaction tid and the managed client mc. The data is requested directly from the namenode server (uncached data). The result is returned, and is also put into the cache if possible.

Caching is possible only if all of the following hold:

The inodeinfo lookup was successful (no caching of negative results)
The cache is enabled
The retrieved version is a committed version
The retrieved version is newer than the cached version (if any)
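The four conditions above can be expressed as a single guard. This sketch uses a hypothetical, stripped-down inodeinfo record in which seqno serves as the version and committed is an explicit flag; how the real get_ii_e recognizes a committed version is not described here, so that flag is an assumption.

```ocaml
(* Hypothetical stripped-down inodeinfo. *)
type ii = { seqno : int64; committed : bool }

type icm = {
  mutable enabled : bool;
  table : (int64, ii) Hashtbl.t;  (* inode -> cached inodeinfo *)
}

(* [reply] stands for the server answer: Ok ii, or Error for a failed lookup. *)
let maybe_cache icm inode (reply : (ii, string) result) =
  match reply with
  | Error _ -> ()                        (* no caching of negative results *)
  | Ok _ when not icm.enabled -> ()      (* cache disabled *)
  | Ok ii when not ii.committed -> ()    (* private, uncommitted view *)
  | Ok ii ->
      (match Hashtbl.find_opt icm.table inode with
       | Some old when Int64.compare old.seqno ii.seqno >= 0 ->
           ()                            (* not newer than the cached version *)
       | _ -> Hashtbl.replace icm.table inode ii)
```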

val get_ii_from_cache_e : ic_manager ->
       int64 -> Plasma_rpcapi_aux.inodeinfo option Uq_engines.engine

get_ii_from_cache_e icm inode: Looks up the inodeinfo for inode. If nothing is found in the cache, None is returned. If an entry is found, it is returned as Some ii. Aged entries are additionally validated, and if the validation fails, None is returned.

This function never requests data from the server. All data comes from the cache, which means it was retrieved using a possibly historic transaction. Although the data is validated in some cases, it is generally not safe to assume it is the most recent version.

Validation means:

very old cache data is dropped
very recent cache data is trusted without further checks
medium-aged data is validated by asking the server whether it is still the most recent version
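The three-way validation can be sketched as a classification by age. The thresholds fresh_limit and max_age are assumptions (the text gives no values), and still_current is a hypothetical stand-in for the server-side is_up_to_date query.

```ocaml
(* Hypothetical three-way validation of a cache entry by its age. *)
type verdict = Drop | Trust | Ask_server

let fresh_limit = 1.0  (* seconds; assumed value *)
let max_age = 60.0     (* seconds; assumed value *)

let classify ~now ~fetched_at =
  let age = now -. fetched_at in
  if age > max_age then Drop             (* very old: discard *)
  else if age <= fresh_limit then Trust  (* very recent: believe as is *)
  else Ask_server                        (* medium-aged: ask the server *)

(* [still_current] stands in for the server-side is_up_to_date check. *)
let validate ~now ~fetched_at ~still_current entry =
  match classify ~now ~fetched_at with
  | Drop -> None
  | Trust -> Some entry
  | Ask_server -> if still_current () then Some entry else None
```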

val get_ii_via_cache_e : ic_manager ->
       Rpc_proxy.ManagedClient.mclient ->
       lazy_transaction ->
       int64 -> Plasma_rpcapi_aux.rinodeinfo Uq_engines.engine

get_ii_via_cache_e icm mc lt inode: First tries to get the inodeinfo for inode from the cache (as in get_ii_from_cache_e), and if this fails, requests the inodeinfo from the server (as in get_ii_e). In the latter case, lt may be used to obtain the transaction ID tid. It can also happen, though, that the transaction of a concurrently running get_ii_via_cache_e call is used instead for requesting the value from the server.

Note that this function has somewhat odd semantics. If the result comes from the cache, the latest known committed version is returned, even if the transaction tid has already modified this inode. If the result comes from the namenode, however, the version as seen by the transaction is returned. Because of this asymmetry, the function is only safe to use if it is known that the transaction has not modified the inode before.
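The lookup order can be sketched synchronously. from_cache, from_server and get_tid are hypothetical stand-ins, and sharing the transaction of a concurrently running request is not modelled; the sketch only shows that a transaction ID is requested on a cache miss and never on a hit.

```ocaml
(* Hypothetical model of the "via cache" lookup order. *)
let via_cache ~from_cache ~from_server ~(get_tid : unit -> int64) key =
  match from_cache key with
  | Some v -> v                           (* hit: latest committed version *)
  | None -> from_server (get_tid ()) key  (* miss: the transaction's own view *)
```

The two branches are exactly where the asymmetry described above comes from: the hit returns cached committed data, while the miss returns whatever the transaction sees.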

val invalidate_ii : ic_manager -> int64 -> unit

invalidate_ii icm inode: Removes information about this inode from the cache.

val check_up_to_date_e : ic_manager ->
       Rpc_proxy.ManagedClient.mclient ->
       lazy_transaction ->
       int64 -> int64 -> bool Uq_engines.engine

check_up_to_date_e icm mc lt inode seqno: Checks whether seqno is the most recent committed sequence number of inode. A result of true means that this condition held when the function started execution; it need not hold anymore when the function returns. A result of false means that seqno was already outdated at call time, or that an error occurred.

The lt object must return a transaction that was never used for any kind of data modification.

This function also works if the cache is disabled, but is slower in this case.

Blocklists

type bl_cache

A blocklist cache

val create_bl_cache : Rpc_proxy.ManagedClient.mclient ->
       lazy_transaction -> int64 -> bl_cache

create_bl_cache mc lt inode: Creates a blocklist cache for inode that uses the lazy transaction lt. This transaction should only be used for reads (otherwise the cache will have strange semantics).

val get_blocks_e : bl_cache ->
       int64 -> int64 -> int64 -> Plasma_rpcapi_aux.rblocklist Uq_engines.engine

get_blocks_e blc index number req_seqno: Gets the blocklist of the inode of blc for the index range index to index+number-1. The transaction of blc (obtained via its lazy transaction) is used for the server request.

The blocks are always retrieved from the server, never from the cache. The result, however, is put into the cache.

It is reasonable to check the presence of the inodeinfo prior to calling this function (via get_ii_e or get_ii_via_cache_e).

Besides filling the cache, this function also removes old entries from it.

val get_blocks_from_cache_e : bl_cache ->
       int64 -> Plasma_rpcapi_aux.blocklist option Uq_engines.engine

get_blocks_from_cache_e blc index: Tries to get the blocks for index of the inode from the cache. If nothing is found in the cache, None is returned. If an entry is found, it is returned as Some bl. Aged entries are additionally validated, and if the validation fails, None is returned.

This function never requests data from the server. All data comes from the cache, and this means it was retrieved using a possibly historic transaction. Although the data is validated in some cases, it is generally not safe to assume it is the most recent version.

See also get_ii_from_cache_e for how the data is validated.

val get_blocks_via_cache_e : bl_cache ->
       int64 -> Plasma_rpcapi_aux.rblocklist Uq_engines.engine

get_blocks_via_cache_e blc index: First tries to get the blocklist for index and inode from the cache (as in get_blocks_from_cache_e), and if this fails, requests the blocklist from the server (as in get_blocks_e).

val snapshot_blocks_e : ?append:bool ->
       bl_cache ->
       Plasma_rpcapi_aux.inodeinfo -> Plasma_rpcapi_aux.rvoid Uq_engines.engine

Loads the complete blocklist, which effectively means taking a snapshot of the whole file. Returns `ok on success. If `econflict is returned, the snapshot must be repeated because the file was modified in the meantime.

After enabling the snapshot feature, the functions get_blocks_from_cache_e and get_blocks_via_cache_e always respond from their cached data.

If append is set, only the last block of the file is included in the snapshot (see Plasma_client.snapshot for explanations).
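Because `econflict only signals that the file changed while the snapshot was being taken, a caller would typically retry. This synchronous sketch uses a hypothetical thunk snapshot and a simplified result type; the real rvoid type has more cases, and the retry limit tries is an assumption.

```ocaml
(* Simplified, hypothetical result type of one snapshot attempt. *)
type snap_result = [ `ok | `econflict | `other of string ]

(* Repeat the snapshot while it reports a conflict, at most [tries] times. *)
let rec snapshot_with_retry ?(tries = 3) (snapshot : unit -> snap_result)
    : snap_result =
  match snapshot () with
  | `econflict when tries > 1 ->
      snapshot_with_retry ~tries:(tries - 1) snapshot
  | r -> r  (* `ok, another error, or the last `econflict *)
```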

val override_blocks : bl_cache ->
       int64 -> int64 -> Plasma_rpcapi_aux.blockinfo list -> unit

override_blocks blc block n blocklist: Puts these blocks into the cache, overriding whatever is there. Useful for snapshots.

val forget_blocks : bl_cache -> int64 -> int64 -> unit

forget_blocks blc block n: Removes these blocks from the cache.
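The range semantics of override_blocks and forget_blocks can be illustrated with a hashtable keyed by block index. This is a hypothetical model: override_range and forget_range are illustrative names, entries are (index, physical block) pairs instead of Plasma_rpcapi_aux.blockinfo, and it assumes that overriding first clears the whole range block to block+n-1.

```ocaml
(* Hypothetical blocklist cache: file block index -> physical block. *)
type blc_model = (int64, int) Hashtbl.t

(* Remove every cached entry with an index in [block, block + n). *)
let forget_range (c : blc_model) block n =
  let i = ref block in
  while Int64.compare !i (Int64.add block n) < 0 do
    Hashtbl.remove c !i;
    i := Int64.succ !i
  done

(* Clear [block, block + n), then store the given (index, phys) pairs. *)
let override_range (c : blc_model) block n (bl : (int64 * int) list) =
  forget_range c block n;
  List.iter (fun (idx, phys) -> Hashtbl.replace c idx phys) bl
```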

val notify_got_eio : bl_cache -> unit

Signals that an EIO error occurred and the blocklist has to be updated. Currently the cached blocklist is simply dropped, so it needs to be reloaded from the server.

val expand_blocklist : Plasma_rpcapi_aux.blockinfo list -> Plasma_rpcapi_aux.blockinfo list

Expands the blockinfo structs so that the returned list contains only structs with length=1.

(The other functions already return expanded lists.)
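Expansion can be sketched on a simplified record. The type below is hypothetical (the real Plasma_rpcapi_aux.blockinfo has more fields), and the sketch assumes that an entry with length k covers k consecutive file indexes stored in k consecutive physical blocks.

```ocaml
(* Hypothetical simplified blockinfo: [length] consecutive blocks starting
   at file index [index] and physical block [block]. *)
type blockinfo = { index : int64; block : int64; length : int64 }

(* Split every entry into [length] entries of length 1. *)
let expand_blocklist (bl : blockinfo list) : blockinfo list =
  List.concat_map
    (fun bi ->
      List.init (Int64.to_int bi.length) (fun k ->
          let k = Int64.of_int k in
          { index = Int64.add bi.index k;
            block = Int64.add bi.block k;
            length = 1L }))
    bl
```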

This web site is published by Informatikbüro Gerd Stolpmann
