python_pachyderm.pfs_client module

Module variables

var BUFFER_SIZE

Classes

class ExtractValueIterator

Methods

def __init__(

self, r)

class PfsClient

Methods

def __init__(

self, host=None, port=None, auth_token=None)

Creates a client to connect to PFS.

Params: * host: The pachd host. Default is 'localhost', which is used with pachctl port-forward. * port: The port to connect to. Default is 30650. * auth_token: The authentication token; used if authentication is enabled on the cluster. Defaults to None.

def commit(

*args, **kwds)

A context manager for running operations inside a commit.
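
This reference does not say what the context manager does on entry and exit. A minimal sketch of the usual pattern, assuming (this is an assumption, not something confirmed here) that it wraps start_commit and finish_commit. RecordingClient and open_commit are hypothetical names used only so the example runs without a cluster:

```python
from contextlib import contextmanager


class RecordingClient:
    """Hypothetical stand-in for PfsClient; records calls instead of contacting pachd."""

    def __init__(self):
        self.calls = []

    def start_commit(self, repo_name, branch=None, parent=None):
        self.calls.append(("start_commit", repo_name, branch))
        return (repo_name, "commit-1")  # placeholder commit tuple

    def finish_commit(self, commit):
        self.calls.append(("finish_commit", commit))


@contextmanager
def open_commit(client, repo_name, branch=None, parent=None):
    """Assumed pattern: start a commit, yield it, and always finish it on exit."""
    commit = client.start_commit(repo_name, branch, parent)
    try:
        yield commit
    finally:
        client.finish_commit(commit)


client = RecordingClient()
with open_commit(client, "logs", branch="master") as c:
    # Writes to the open commit would go here.
    client.calls.append(("inside", c))
```

The try/finally means the commit is finished even if the body raises, which matters because data is only persisted once the commit is finished.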

def create_repo(

self, repo_name, description=None)

Creates a new Repo object in PFS with the given name. Repos are the top-level data object in PFS and should be used to store data of a similar type. For example, rather than having a single Repo for an entire project, you might have separate Repos for logs, metrics, database dumps, etc.

Params: * repo_name: Name of the repo. * description: Repo description.

def delete_all(

self)

def delete_branch(

self, repo_name, branch_name)

Deletes a branch, but leaves the commits themselves intact. In other words, those commits can still be accessed via commit IDs and other branches they happen to be on.

Params: * repo_name: The name of the repo. * branch_name: The name of the branch to delete.

def delete_commit(

self, commit)

Deletes a commit.

Params: * commit: A tuple, string, or Commit object representing the commit.
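
Many methods here accept a commit as "a tuple, string, or Commit object", but the exact shapes are not spelled out in this reference. The sketch below shows one plausible normalization, assuming a (repo_name, commit_id) tuple and a "repo_name/commit_id" string. Both shapes and the helper name normalize_commit are assumptions made for illustration, not the library's documented behaviour:

```python
def normalize_commit(commit):
    """Return (repo_name, commit_id) from a 2-tuple or a 'repo/commit_id' string.

    The accepted shapes are assumptions; check the library source before relying on them.
    """
    if isinstance(commit, (tuple, list)) and len(commit) == 2:
        return (commit[0], commit[1])
    if isinstance(commit, str):
        # Split on the first slash only, so the second part is kept whole.
        repo_name, sep, commit_id = commit.partition("/")
        if sep and repo_name and commit_id:
            return (repo_name, commit_id)
    raise ValueError("unsupported commit reference: %r" % (commit,))
```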

def delete_file(

self, commit, path)

Deletes a file from a Commit. DeleteFile leaves a tombstone in the Commit; assuming the file isn't written to later, attempting to get the file from the finished commit will result in a not-found error. The file will remain intact in the Commit's parent.

Params: * commit: A tuple, string, or Commit object representing the commit. * path: The path to the file.

def delete_repo(

self, repo_name=None, force=False, all=False)

Deletes a repo and reclaims the storage space it was using.

Params: * repo_name: The name of the repo. * force: If set to true, the repo will be removed regardless of errors. This argument should be used with care. * all: Delete all repos.

def finish_commit(

self, commit)

Ends the process of committing data to a Repo and persists the Commit. Once a Commit is finished the data becomes immutable and future attempts to write to it with PutFile will error.

Params: * commit: A tuple, string, or Commit object representing the commit.

def flush_commit(

self, commits, repos=())

Blocks until all commits that have the given set of commits as provenance have finished. For commits to be considered, they must have all of the specified commits as provenance. This in effect waits for all of the jobs that are triggered by a set of commits to complete. It returns an error if any of the commits it's waiting on are cancelled due to one of the jobs encountering an error during runtime. Note that it's never necessary to call FlushCommit to run jobs; they'll run no matter what. FlushCommit just allows you to wait for them to complete and see their output once they do. This returns an iterator of CommitInfo objects.

Params: * commits: A commit or a list of commits to wait on. * repos: Optional. Only the commits up to and including those repos will be considered; otherwise all repos are considered.
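
The selection rule described above (a commit is considered only if its provenance contains all of the specified commits) can be illustrated with a small set comparison. This is a toy model over a hypothetical in-memory provenance map, not how pachd tracks provenance:

```python
def downstream_of_all(commit_provenance, wanted):
    """Select commits whose provenance includes every commit in `wanted`."""
    wanted = set(wanted)
    return sorted(
        commit
        for commit, provenance in commit_provenance.items()
        if wanted <= set(provenance)  # subset test: all wanted commits are present
    )


# Hypothetical provenance map: output commit -> input commits it was derived from.
provenance = {
    "out/1": ["images/a"],
    "out/2": ["images/a", "labels/b"],
    "out/3": ["labels/b"],
}

only_images = downstream_of_all(provenance, ["images/a"])
both_inputs = downstream_of_all(provenance, ["images/a", "labels/b"])
```

Passing more commits narrows the result, because each candidate must list every one of them as provenance.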

def get_file(

self, commit, path, offset_bytes=0, size_bytes=0, extract_value=True)

Returns an iterator of the contents of a file at a specific Commit.

Params: * commit: A tuple, string, or Commit object representing the commit. * path: The path of the file. * offset_bytes: Optional. Specifies the number of bytes to skip at the beginning of the file. * size_bytes: Optional. Limits the total amount of data returned. You will get fewer bytes than size_bytes if you pass a value larger than the size of the file. If size_bytes is 0, all of the data is returned. * extract_value: If True, an ExtractValueIterator is returned, which iterates over the bytes of the file. If False, the protobuf response iterator is returned.
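
The parameters above can be illustrated without a cluster. The sketch below assumes each response message exposes its chunk as a value attribute (an assumption suggested by the name ExtractValueIterator, not stated in this reference). FakeResponse, extract_values, and byte_range are hypothetical names:

```python
class FakeResponse:
    """Hypothetical stand-in for one protobuf response message carrying a chunk."""

    def __init__(self, value):
        self.value = value


def extract_values(responses):
    """Yield the raw bytes of each response, as extract_value=True is described to do."""
    for response in responses:
        yield response.value


def byte_range(data, offset_bytes=0, size_bytes=0):
    """Skip offset_bytes, then return at most size_bytes (0 means everything)."""
    if size_bytes == 0:
        return data[offset_bytes:]
    return data[offset_bytes:offset_bytes + size_bytes]


# Join the chunks of a two-message response stream into the full file content.
content = b"".join(extract_values([FakeResponse(b"hello "), FakeResponse(b"world")]))
```

Asking for more bytes than remain simply returns what is left, which matches the note that an oversized size_bytes yields fewer bytes than requested.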

def get_files(

self, commit, paths, recursive=False)

Returns the contents of a list of files at a specific Commit as a dictionary of file paths to data.

Params: * commit: A tuple, string, or Commit object representing the commit. * paths: A list of paths to retrieve. * recursive: If True, will go into each directory in the list recursively.
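
One way to picture the returned dictionary is as the joined output of get_file for each path. The sketch below is an illustration under that assumption, not necessarily how get_files is implemented, and it ignores the recursive option. collect_files and fake_get_file are hypothetical names:

```python
def collect_files(get_file, commit, paths):
    """Build {path: bytes} by joining the chunks get_file yields for each path."""
    return {path: b"".join(get_file(commit, path)) for path in paths}


def fake_get_file(commit, path):
    """Hypothetical stand-in for PfsClient.get_file; yields a file in two chunks."""
    yield path.encode("utf-8")
    yield b":data"


files = collect_files(fake_get_file, ("logs", "master"), ["/a.txt", "/b.txt"])
```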

def glob_file(

self, commit, pattern)

def inspect_commit(

self, commit)

Returns info about a specific Commit.

Params: * commit: A tuple, string, or Commit object representing the commit.

def inspect_file(

self, commit, path)

Returns info about a specific file.

Params: * commit: A tuple, string, or Commit object representing the commit. * path: Path to file.

def inspect_repo(

self, repo_name)

Returns info about a specific Repo.

Params: * repo_name: Name of the repo.

def list_branch(

self, repo_name)

Lists the active Branch objects on a Repo.

Params: * repo_name: The name of the repo.

def list_commit(

self, repo_name, to_commit=None, from_commit=None, number=0)

Gets a list of CommitInfo objects.

Params: * repo_name: If only repo_name is given, all commits in the repo are returned. * to_commit: Optional. Only the ancestors of to_commit, including to_commit itself, are considered. * from_commit: Optional. Only the descendants of from_commit, including from_commit itself, are considered. * number: Optional. Determines how many commits are returned. If number is 0, all commits that match the aforementioned criteria are returned.
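
How the three filters combine can be sketched on a simple linear history. This is a toy model: real histories can branch, and this reference does not say which commits survive the number cut, so returning the newest first is an assumption. select_commits is a hypothetical helper:

```python
def select_commits(history, to_commit=None, from_commit=None, number=0):
    """Filter a linear history (ordered oldest to newest) by the documented bounds."""
    # Descendants of from_commit, including from_commit itself.
    start = history.index(from_commit) if from_commit is not None else 0
    # Ancestors of to_commit, including to_commit itself.
    end = history.index(to_commit) + 1 if to_commit is not None else len(history)
    selected = history[start:end]
    selected.reverse()  # assumption: newest commits come first
    return selected if number == 0 else selected[:number]


history = ["c1", "c2", "c3", "c4", "c5"]
bounded = select_commits(history, to_commit="c4", from_commit="c2")
latest_two = select_commits(history, number=2)
```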

def list_file(

self, commit, path, recursive=False)

Lists the files in a directory.

Params: * commit: A tuple, string, or Commit object representing the commit. * path: The path to the directory. * recursive: If True, continue listing the files for sub-directories.

def list_repo(

self)

Returns info about all Repos.

def provenances_for_repo(

self, repo_name)

def put_file_bytes(

self, commit, path, value, delimiter=0, target_file_datums=0, target_file_bytes=0)

Uploads a binary bytes array as file(s) in a certain path.

Params: * commit: A tuple, string, or Commit object representing the commit. * path: Path in the repo the file(s) will be written to. * value: The data bytes array, or an iterator returning chunked byte arrays. * delimiter: Optional. Causes data to be broken up into separate files with path as a prefix. * target_file_datums: Optional. Specifies the target number of datums in each written file. It may be lower if data does not split evenly, but will never be higher, unless the value is 0. * target_file_bytes: Optional. Specifies the target number of bytes in each written file. Files may have more or fewer bytes than the target.
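
Since value may be an iterator returning chunked byte arrays, large payloads can be fed in pieces. A minimal sketch of such an iterator follows. chunk_bytes and CHUNK_SIZE are hypothetical; the value of the module's BUFFER_SIZE is not documented here, so a tiny chunk size is used purely for demonstration:

```python
def chunk_bytes(data, chunk_size):
    """Yield successive slices of `data`, each at most `chunk_size` bytes long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


CHUNK_SIZE = 4  # hypothetical, deliberately small so the split is visible
chunks = list(chunk_bytes(b"abcdefghij", CHUNK_SIZE))
```

A generator like this could be passed as value in place of a single bytes object, so the whole payload never has to be sliced up front.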

def put_file_url(

self, commit, path, url, recursive=False)

Puts a file using the content found at a URL. The URL is sent to the server which performs the request.

Params: * commit: A tuple, string, or Commit object representing the commit. * path: The path to the file. * url: The URL of the file to put. * recursive: Allows recursive scraping of some types of URLs, for example s3:// URLs.

def set_branch(

self, commit, branch_name)

Sets a commit and its ancestors as a branch.

Params: * commit: A tuple, string, or Commit object representing the commit. * branch_name: The name for the branch to set.

def start_commit(

self, repo_name, branch=None, parent=None)

Begins the process of committing data to a Repo. Once started, you can write to the Commit with PutFile, and when all the data has been written you must finish the Commit with FinishCommit. Note that data is not persisted until FinishCommit is called. A Commit object is returned.

Params: * repo_name: The name of the repo. * branch: A more convenient way to build linear chains of commits. When a commit is started with a non-empty branch, the value of branch becomes an alias for the created Commit. This enables a more intuitive access pattern. When the commit is started on a branch, the previous head of the branch is used as the parent of the commit. * parent: Specifies the parent Commit. Upon creation, the new Commit will appear identical to the parent Commit; data can safely be added to the new commit without affecting the contents of the parent Commit. You may pass "" as parent, in which case the new Commit will have no parent and will initially appear empty.
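
The branch behaviour described above (the branch name becomes an alias for the new commit, and the previous head becomes its parent) can be modelled in a few lines. TinyRepo is a hypothetical in-memory toy, not part of python_pachyderm, and it ignores file contents entirely:

```python
class TinyRepo:
    """Toy model of how a branch name tracks the head of a linear commit chain."""

    def __init__(self):
        self.parents = {}   # commit id -> parent commit id, or None
        self.branches = {}  # branch name -> current head commit id
        self._count = 0

    def start_commit(self, branch=None, parent=None):
        # With no explicit parent, the previous head of the branch is used.
        if parent is None and branch is not None:
            parent = self.branches.get(branch)
        self._count += 1
        commit_id = "c%d" % self._count
        self.parents[commit_id] = parent
        if branch is not None:
            self.branches[branch] = commit_id  # branch now aliases the new commit
        return commit_id


repo = TinyRepo()
first = repo.start_commit(branch="master")
second = repo.start_commit(branch="master")
```

After the two calls, "master" points at the second commit and the second commit's parent is the first, which is the linear chain the branch parameter is meant to make convenient.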

def subscribe_commit(

self, repo_name, branch, from_commit_id=None)

SubscribeCommit is like ListCommit, but it keeps listening for commits as they come in. This returns an iterator of Commit objects.

Params: * repo_name: Name of the repo. * branch: Branch to subscribe to. * from_commit_id: Optional. Only commits created since this commit are returned.
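
Because the returned iterator keeps listening, a plain for loop over it may never terminate, so a consumer usually needs its own stop condition. A minimal sketch using itertools.islice follows. take_commits and fake_subscription are hypothetical names, and the fake generator only stands in for the real stream:

```python
from itertools import islice


def take_commits(stream, limit):
    """Consume at most `limit` items from a commit stream that may never end."""
    return list(islice(stream, limit))


def fake_subscription():
    """Hypothetical endless stream standing in for subscribe_commit's iterator."""
    n = 0
    while True:
        n += 1
        yield "commit-%d" % n


first_three = take_commits(fake_subscription(), 3)
```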