workflow
-
workflow.Workflow = <class 'anadama2.workflow.Workflow'>
-
class anadama2.workflow.Workflow(storage_backend=None, grid=None, strict=False, vars=None, version=None, description=None, remove_options=None, document=None, cli=True)
Create a Workflow.
Parameters:
- storage_backend (instance of any anadama2.backends.BaseBackend subclass or None) – Look up and save dependency information using this object. If None is passed (the default), the default backend from anadama2.backends.default() is used.
- grid (object implementing the interface of anadama2.grid.Dummy) – Use this object to configure the run context to submit tasks to a compute grid.
- strict (bool) – Enable strict mode. If strict, whenever a task is added that depends on something that is not the target of another task (or isn’t marked with anadama2.workflow.Workflow.already_exists()), raise a KeyError. If not strict and the Tracked object’s .exists() returns True, automatically do what’s necessary to track the object; if .exists() is False, raise a KeyError.
- vars (instance of any anadama2.cli.Configuration class or None) – Provide a custom Configuration class for command line options.
- version (str) – The version of the workflow. This version is used for the command line option --version.
- description (str) – A description of the workflow. This description is used in the command line --help message.
- remove_options (list) – A list of options to remove.
- document (instance of any anadama2.Document class or None) – Provide a custom Document class.
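The difference between strict and non-strict handling can be sketched in plain Python. This illustrates the rule described for strict and is not AnADAMA2's implementation: resolve_dependency, produced, and preexisting are hypothetical names, and a filesystem check stands in for the Tracked object's .exists().

```python
import os
import tempfile


def resolve_dependency(path, produced, preexisting, strict=False):
    """Decide whether a dependency is acceptable when a task is added.

    produced    -- set of paths that are targets of earlier tasks
    preexisting -- set of paths declared via an already_exists()-style call
    Both sets are hypothetical stand-ins for the workflow's bookkeeping.
    """
    if path in produced or path in preexisting:
        return path
    if strict:
        # Strict mode: an undeclared dependency is always an error.
        raise KeyError(path)
    if os.path.exists(path):
        # Non-strict mode: start tracking a file that already exists.
        preexisting.add(path)
        return path
    raise KeyError(path)


with tempfile.TemporaryDirectory() as folder:
    existing = os.path.join(folder, "reads.fastq")
    open(existing, "w").close()

    tracked = set()
    resolve_dependency(existing, set(), tracked, strict=False)

    try:
        resolve_dependency(existing, set(), set(), strict=True)
        strict_error = False
    except KeyError:
        strict_error = True
```

In this sketch the same undeclared file is accepted and tracked when strict is False, and rejected with a KeyError when strict is True.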
-
add_archive(depends, targets, archive_software=None, remove_log=None)
Create an archive that includes the dependencies and name it after the target. This adds an anadama2.Task to the workflow to create the archive.
-
add_argument(name, **kwargs)
Add an argument to the configuration object provided to the workflow. Arguments can alternatively be added to the configuration object before it is provided to the workflow. See the add function documentation for your Configuration class for more information. The default configuration class is anadama2.cli.Configuration.
-
add_document(templates, depends=None, targets=None, vars=None, table_of_contents=None)
Create and add a group of anadama2.Task objects to the workflow. The group creates a document, which will be the target(s) provided. The variables are passed on to the template and are available when the document is generated from it. The document class provided to the workflow is used to create the document.
-
add_task(actions=None, depends=None, targets=None, name=None, visible=True, interpret_deps_and_targs=True, **kwargs)
Create and add an anadama2.Task to the workflow. This function can be used as a decorator to set a function as the sole action. Extra keyword arguments can be used as formatting values, similar to [depends[0]]. See anadama2.helpers.parse_sh().
Parameters:
- actions (str or callable or list of str or list of callable) – The actions to be performed to complete the task. Strings or lists of strings are interpreted as shell commands according to anadama2.helpers.parse_sh(). If given just a string or just a callable, this method treats it as a one-item list of the string or callable.
- depends (str or anadama2.tracked.Base or list of str or list of anadama2.tracked.Base) – The dependencies of the task. The task must have these dependencies before executing the actions. Strings or lists of strings are interpreted as filenames and turned into objects of type anadama2.tracked.HugeTrackedFile. If given just a string or just an anadama2.tracked.Base, this method treats it as a one-item list of the argument provided.
- targets (str or anadama2.tracked.Base or list of str or list of anadama2.tracked.Base) – The targets of the task. The task must produce these targets after executing the actions to be considered a success. Strings or lists of strings are interpreted as filenames and turned into objects of type anadama2.tracked.HugeTrackedFile. If given just a string or just an anadama2.tracked.Base, this method treats it as a one-item list of the argument provided.
- name (str) – A name for the task. Task names must be unique within a run context.
- visible (bool) – Whether to show this task on the console. Set to False if it should only appear in the debug log.
- interpret_deps_and_targs (bool) – Whether to use anadama2.helpers.parse_sh() to change [depends[0]] and [targets[0]] into the first item in depends and the first item in targets. Default is True.
Returns: The anadama2.Task just created.
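Two conventions described for add_task, treating a lone value as a one-item list and substituting [depends[0]]-style placeholders, can be illustrated without AnADAMA2. This is a minimal sketch: as_list and fill_placeholders are hypothetical helpers, and the real substitution is performed by anadama2.helpers.parse_sh(), whose exact behavior may differ.

```python
import re


def as_list(value):
    """Wrap a lone string or callable in a list; pass lists through."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def fill_placeholders(command, depends, targets):
    """Replace [depends[i]] and [targets[i]] with the i-th list item."""
    values = {"depends": as_list(depends), "targets": as_list(targets)}

    def substitute(match):
        name, index = match.group(1), int(match.group(2))
        return str(values[name][index])

    return re.sub(r"\[(depends|targets)\[(\d+)\]\]", substitute, command)


command = fill_placeholders(
    "sort [depends[0]] > [targets[0]]",
    depends="unsorted.txt",      # a lone string behaves like ["unsorted.txt"]
    targets=["sorted.txt"],
)
print(command)  # sort unsorted.txt > sorted.txt
```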
-
add_task_gridable(actions=None, depends=None, targets=None, name=None, interpret_deps_and_targs=True, **gridopts)
Add a task to be launched on a grid computing system as specified in the grid option of anadama2.workflow.Workflow. By default, this method is a synonym for anadama2.workflow.Workflow.add_task(). Please see the add_task documentation for your powerup of choice, e.g. anadama2.grid.slurm.Slurm.add_task(), for information on options to provide to this method.
-
add_task_group(actions=None, depends=None, targets=None, name=[None], interpret_deps_and_targs=True, **kwargs)
Create and add a group of anadama2.Task objects to the workflow. This function creates one task for each set of depends and targets provided. The number of targets and dependencies should be the same. This function calls add_task for each task in the group. Please see the add_task documentation for more information.
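The pairing rule for task groups, one task per matching dependency and target, can be sketched as follows. expand_group is a hypothetical helper that only builds the per-task arguments; raising ValueError on mismatched lengths is an assumption, since the text only says the counts should be the same.

```python
def expand_group(actions, depends, targets):
    """Build one add_task-style keyword dict per (depends, targets) pair."""
    if len(depends) != len(targets):
        # Assumed behavior: refuse to pair lists of different lengths.
        raise ValueError("depends and targets must have the same length")
    return [
        {"actions": actions, "depends": dep, "targets": targ}
        for dep, targ in zip(depends, targets)
    ]


group = expand_group(
    "gzip -c [depends[0]] > [targets[0]]",
    depends=["a.txt", "b.txt"],
    targets=["a.txt.gz", "b.txt.gz"],
)
for task_kwargs in group:
    print(task_kwargs["depends"], "->", task_kwargs["targets"])
```

Each dict in group corresponds to one call that add_task_group would make to add_task under these assumptions.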
-
add_task_group_gridable(actions=None, depends=None, targets=None, name=[None], interpret_deps_and_targs=True, **kwargs)
Create gridable tasks as a group.
-
already_exists(*depends)
Declare a dependency as pre-existing. That means that no task creates these dependencies; they’re already there before any tasks run.
Note
If you have a list or other iterable containing the dependencies that already exist, you can declare them all like so: ctx.already_exists(*my_bunch_of_deps).
Parameters: *depends (any argument recognized by anadama2.tracked.auto()) – One or many dependencies to mark as pre-existing.
-
do(cmd, track_cmd=True, track_binaries=True)
Create and add an anadama2.Task to the workflow using a convenient, shell-like syntax.
To explicitly mark task targets, wrap filenames within cmd with [t:]. Similarly, wrap dependencies with [d:]. The literal [t:] and [d:] characters will be stripped out prior to execution by the shell.
Below are some examples of using do:
    from anadama2 import Workflow
    ctx = Workflow()
    ctx.do("wget -qO- checkip.dyndns.com > [t:my_ip.txt]")
    ctx.do(r"sed 's|.*Address: \(.*[0-9]\)<.*|\1|' [d:my_ip.txt] > [t:ip.txt]")
    ctx.do("whois $(cat [d:ip.txt]) > [t:whois.txt]")
    ctx.go()
Variables from the workflow configuration can also be used inside cmd. These are wrapped with [v:]:
    from anadama2 import Workflow
    ctx = Workflow()
    ctx.do("wget -qO- checkip.dyndns.com > [v:output]/[t:my_ip.txt]")
    ctx.go()
Modifiers inside the square brackets can be mixed and matched:
    from anadama2 import Workflow
    from anadama2.cli import Configuration
    ctx = Workflow(vars=Configuration().add("input", type="dir"))
    ctx.do("tar c [vd:input] | gzip -c > [t:output.tgz]")
    ctx.go()
By default, changes made to cmd are tracked; any change to cmd will cause this task to be rerun. Set track_cmd to False to disable this behavior.
Also by default, AnADAMA tries to discover pre-existing, small files in cmd and treat them as dependencies. This feature is intended to automatically track the scripts and binaries used in cmd. Thus, this task will be rerun if any of the binaries or scripts change. Set track_binaries to False to disable this behavior.
Parameters:
- cmd (str) – The shell command to add to the workflow. Wrap a target filename in [t:] and wrap a dependency filename in [d:]. Variables from the workflow configuration can be substituted into the command by wrapping the variable name in [v:].
- track_cmd (bool) – Set to False to not track changes to cmd.
- track_binaries (bool) – Set to False to not discover files within cmd and treat them as dependencies.
Returns: The anadama2.Task just created.
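How the [t:], [d:], and [v:] markers might be interpreted can be sketched with a regular expression. This is not AnADAMA2's parser: parse_markers is a hypothetical function, and variables are taken from a plain dict rather than from the workflow configuration.

```python
import re

# One or more of the flags v, d, t, then a colon, then the wrapped text.
MARKER = re.compile(r"\[([vdt]+):([^\]]+)\]")


def parse_markers(cmd, variables=None):
    """Return (shell_command, depends, targets) for a marked-up command."""
    variables = variables or {}
    depends, targets = [], []

    def expand(match):
        flags, text = match.group(1), match.group(2)
        if "v" in flags:
            # The wrapped text names a variable; use its value instead.
            text = str(variables[text])
        if "d" in flags:
            depends.append(text)
        if "t" in flags:
            targets.append(text)
        return text

    return MARKER.sub(expand, cmd), depends, targets


shell, deps, targs = parse_markers(
    "tar c [vd:input] | gzip -c > [t:output.tgz]",
    variables={"input": "raw_data"},
)
print(shell)  # tar c raw_data | gzip -c > output.tgz
print(deps)   # ['raw_data']
print(targs)  # ['output.tgz']
```

Bracket expressions without a flag prefix, such as [0-9] in a sed pattern, are left untouched by this sketch because they do not match the marker pattern.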
-
do_gridable(cmd, track_cmd=True, track_binaries=True, **gridopts)
Add a task to be launched on a grid computing system as specified in the grid option of anadama2.workflow.Workflow. By default, this method is a synonym for anadama2.workflow.Workflow.do(). Please see the add_task documentation for your powerup of choice, e.g. anadama2.slurm.Slurm.do(), for information on options to provide to this method.
-
get_input_files(extension=None, name=None)
Return the files in the input folder, filtered by the extension or name if provided. The default input folder can be set in the workflow or provided on the command line by the user.
Returns: A list of files.
-
go(skip_nothing=False, quit_early=False, runner=None, reporter=None, jobs=None, grid_jobs=None, until_task=None, exclude_task=None, target=None, exclude_target=None, dry_run=False)
Kick off execution of all previously configured tasks.
Parameters:
- skip_nothing (bool) – Skip no tasks, even if you could.
- quit_early – If any tasks fail, stop all execution immediately. If set to False (the default), children of failed tasks are not executed but children of successful or skipped tasks are executed: basically, keep going until you run out of tasks to execute.
- runner (instance of any anadama2.runners.BaseRunner subclass or None) – The tasks to execute are passed to this object for execution. For a list of runners that come bundled with anadama, see anadama2.runners. Passing None (the default) uses the default runner from anadama2.runners.default().
- reporter (instance of any anadama2.reporters.BaseReporter subclass or None) – As task execution proceeds, events are dispatched to this object for reporting purposes. For more information on the reporters bundled with anadama, see anadama2.reporters. Passing None (the default) uses the default reporter from anadama2.reporters.default().
- jobs (int) – The number of tasks to execute in parallel. This option is ignored when a custom runner is used with the runner keyword.
- grid_jobs (int) – The number of tasks to submit to the grid in parallel. This option is ignored when a custom runner is used with the runner keyword. This option is also a synonym for jobs if the context has no grid powerup.
- until_task (int or str) – Stop after running the named task. Can refer to the end task by task number or task name.
- exclude_task (int or str) – Don’t execute this task or any of its children. Can refer to the task by task number or task name.
- target (str) – Execute the necessary tasks to produce this target. If target contains [, *, or ?, it is treated as a pattern and used to match multiple targets.
- exclude_target (str) – Don’t execute any tasks that will produce this target. If exclude_target contains [, *, or ?, it is treated as a pattern and used to match multiple targets.
- dry_run (bool) – Don’t execute any actions, just say that you did.
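The pattern rule for target and exclude_target resembles shell-style globbing. Below is a minimal sketch using Python's fnmatch module; whether AnADAMA2 uses fnmatch internally is an assumption, and select_targets is a hypothetical helper.

```python
import fnmatch


def select_targets(target, all_targets):
    """Return the targets named by `target`.

    `target` is treated as a glob pattern when it contains `[`, `*`, or `?`;
    otherwise it must match a target name exactly.
    """
    if any(char in target for char in "[*?"):
        return [t for t in all_targets if fnmatch.fnmatchcase(t, target)]
    return [t for t in all_targets if t == target]


known = ["out/a.txt", "out/b.txt", "out/summary.tsv"]
print(select_targets("out/*.txt", known))        # ['out/a.txt', 'out/b.txt']
print(select_targets("out/summary.tsv", known))  # ['out/summary.tsv']
```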
-
name_output_files(name, tag=None, extension=None, subfolder=None)
Return names of files in the output folder, built from the name(s), tag, extension, and subfolder provided. The default output folder can be set in the workflow or provided on the command line by the user.
Returns: A list of file names.
-
parse_args()
Return the arguments parsed from the command line. Arguments are returned in the same format as when calling argparse's parse_args(): an object with an attribute for each argument. This custom object includes error reporting to help the user debug attempts to get an argument that is not in the list of command line arguments.
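For comparison, the sketch below parses arguments with the standard argparse module and wraps the result in an object that reports unknown argument names more helpfully. HelpfulArgs is hypothetical and only illustrates the idea; the object AnADAMA2 returns may be implemented differently.

```python
import argparse


class HelpfulArgs:
    """Expose parsed arguments as attributes with a clearer error."""

    def __init__(self, namespace):
        self._values = vars(namespace)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._values[name]
        except KeyError:
            known = ", ".join(sorted(self._values))
            raise AttributeError(
                "Unknown argument '%s'. Available arguments: %s" % (name, known)
            ) from None


parser = argparse.ArgumentParser()
parser.add_argument("--input", default="data")
parser.add_argument("--threads", type=int, default=1)

# An explicit argument list is used here instead of sys.argv.
args = HelpfulArgs(parser.parse_args(["--threads", "4"]))
print(args.input, args.threads)  # data 4
```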
-
task_results = None
task_results is a list of objects of type anadama2.runners.TaskResult. This list is populated only after tasks have been run with anadama2.workflow.Workflow.go().
-
tasks = None
tasks is an anadama2.taskcontainer.TaskContainer filled with objects of type anadama2.Task. This list is populated as new tasks are added via anadama2.workflow.Workflow.add_task() and anadama2.workflow.Workflow.do().
-
anadama2.workflow.discover_binaries(s)
Search through string s and find all existing files smaller than 10MB. Return those files as a list of objects of type anadama2.tracked.TrackedExecutable.
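A rough sketch of the described search follows. It assumes the string is split into shell-like tokens and that 10MB means 10 * 1024 * 1024 bytes; both are assumptions. find_small_files is hypothetical and returns plain paths rather than TrackedExecutable objects.

```python
import os
import shlex
import tempfile

SIZE_LIMIT = 10 * 1024 * 1024  # assumed meaning of "10MB"


def find_small_files(s, limit=SIZE_LIMIT):
    """Return tokens of `s` that name existing files smaller than `limit`."""
    found = []
    for token in shlex.split(s):
        if os.path.isfile(token) and os.path.getsize(token) < limit:
            found.append(token)
    return found


with tempfile.TemporaryDirectory() as folder:
    script = os.path.join(folder, "run.sh")
    with open(script, "w") as handle:
        handle.write("#!/bin/sh\necho hello\n")

    # Only the script exists on disk, so only it should be reported.
    cmd = "%s --flag no_such_file_for_this_sketch.txt" % shlex.quote(script)
    found = find_small_files(cmd)
    # A limit of zero bytes excludes every file.
    none_found = find_small_files(cmd, limit=0)
```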