Flow#
Overview#
A Flow in dxflow represents one or more software tools or containers that are configured to work together to accomplish a specific scientific computing task. Flows can range from simple single-container scripts to complex multi-container applications with multiple services and dependencies. We use docker-compose style definitions for flows, enabling easy management and deployment of containerized scientific workflows.
Key Concepts#
Flow Definition: A template that describes containers, their configurations, and relationships
Flow Registry: A collection of flows available in your namespace
Variables: Configurable parameters that customize flow behavior
Profiles: Different execution configurations for the same flow
Projects: Instances of flows running on compute units
FlowRegisteryManager Class#
The FlowRegisteryManager manages all flows in your namespace, providing methods to publish, retrieve, update, and manage flow packages.
Usage#
from dxflow import Session
session = Session(email="your@email.com", password="your_password")
flow_register = session.get_flow_registery_manager()
flow_register.list(filters={"tags": "Molecular Dynamics"})
Name | Pointer | Tags | Status | Verified
--------+--------------------------------+--------------------------------+-----------+---------
Boltz-1 | 7ea2b574-e962-4f18-bb54-7e233c | Molecular Dynamics, Biomolecul | PUBLISHED | True
Methods:#
publish(name, description=None, properties=None, tags=None, arch="AMD64", color=None, icon=None, logo=None, image=None, logo_url=None, image_url=None): Publishes a new flow package.
update_flow_package(flow_pointer, update_data): Updates an existing flow package.
update_template(flow_pointer, template_data): Updates the template of a flow package.
get(flow_pointer): Retrieves a flow package.
get_by_flow_name(name): Retrieves a flow package by name.
get_by_name(name, update_list=True): Retrieves a flow package by name within the current instance.
get_list(filters=None): Retrieves a list of flow packages.
list(filters={"tags": "AMD64", "verified": True}, table=True, return_info=False): Lists flow packages with optional filters and table display.
available_flows(name): Lists available flows in a namespace.
activate(flow_pointer): Activates a flow package.
deactivate(flow_pointer): Deactivates a flow package.
get_flow_pointer(name=None, update_list=True): Retrieves the pointer of a flow package by name.
Examples#
Publishing a New Flow#
# Define a flow template (docker-compose style)
template = """
version: '3.8'
services:
  analyzer:
    image: python:3.9
    command: python analyze.py
    environment:
      - THREADS=${THREADS:-4}
      - INPUT_FILE=${INPUT_FILE}
    volumes:
      - ./data:/data
"""
# Publish the flow
flow_register.publish(
    name="DataAnalyzer",
    description="Analyzes scientific data using Python",
    properties={"template": template},
    tags=["Data Analysis", "Python"],
    arch="AMD64"
)
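The template above uses Compose-style placeholders: `${INPUT_FILE}` has no default, while `${THREADS:-4}` falls back to `4` when the variable is unset or empty. The sketch below illustrates those substitution semantics in plain Python. It is not dxflow's implementation; the `interpolate` helper is a hypothetical name introduced for illustration, and it handles only these two placeholder forms.

```python
import re

# Matches ${NAME} and ${NAME:-default}. Other Compose forms are not handled here.
_VAR = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}")


def interpolate(template: str, variables: dict) -> str:
    """Substitute ${NAME} and ${NAME:-default} placeholders in a template string."""

    def replace(match):
        value = variables.get(match.group("name"))
        if value is not None and str(value) != "":
            return str(value)
        # ':-' uses the default when the variable is unset or empty;
        # a bare ${NAME} with no value becomes an empty string.
        default = match.group("default")
        return default if default is not None else ""

    return _VAR.sub(replace, template)


template = "THREADS=${THREADS:-4}\nINPUT_FILE=${INPUT_FILE}"
print(interpolate(template, {"INPUT_FILE": "sample.csv"}))
```

With only `INPUT_FILE` supplied, `THREADS` resolves to its default of `4`; passing `{"THREADS": 8, ...}` would override it.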
Filtering and Finding Flows#
# Find flows by tags
molecular_flows = flow_register.list(
    filters={"tags": "Molecular Dynamics", "verified": True}
)
# Get a specific flow by name
flow = flow_register.get_by_name("Boltz-1")
# Get flow pointer for creating projects
flow_pointer = flow_register.get_flow_pointer(name="GROMACS")
Flow Class#
The Flow class represents an individual flow package with its template, variables, and metadata.
Methods:#
__init__(name, template, pointer=None, user={}, description=None, properties={}, tags=[], environments=[], visibility="PRIVATE", status=None, created_at=None, updated_at=None, state=None, verified=None): Initializes a Flow object with the provided attributes. Raises ValueError if required fields are missing or invalid values are provided for visibility or status.
generate_compose(variables: dict=None, project_name: str=None) -> yaml: Generates a Docker Compose YAML file based on the flow’s template and provided variables. Filters services based on profiles and validates variables. Returns the updated YAML configuration.
create_project(project_name=None, variables={}, compute_unit=None, **kwargs): Creates a project using the flow’s template and the provided compute unit. Raises ValueError if no template is available.
get_variables(): Retrieves the variables defined in the flow’s template.
display_variables(table=True): Displays the variables in the flow’s template along with their descriptions, hints, default values, and options. Supports table or JSON format.
__str__(): Returns a JSON string representation of the Flow object, including all its attributes.
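To make the profile filtering mentioned for generate_compose more concrete, here is a minimal sketch of how Compose-style profiles usually select services: a service without a `profiles` key is always included, and a service with profiles is included only when at least one of them is active. This is an illustration under that assumption, not dxflow's actual logic; `filter_services_by_profile` is a hypothetical helper that operates on an already-parsed compose mapping.

```python
def filter_services_by_profile(compose: dict, active_profiles: set) -> dict:
    """Return a copy of the compose mapping with inactive-profile services removed."""
    services = compose.get("services", {})
    kept = {
        name: spec
        for name, spec in services.items()
        # No profiles: always enabled. Otherwise at least one profile must be active.
        if not spec.get("profiles") or active_profiles & set(spec["profiles"])
    }
    return {**compose, "services": kept}


compose = {
    "services": {
        "main": {"image": "scientific/tool:latest"},
        "viewer": {"image": "visualize:latest", "profiles": ["viz"]},
    }
}
print(list(filter_services_by_profile(compose, set())["services"]))
print(list(filter_services_by_profile(compose, {"viz"})["services"]))
```

With no active profile only `main` remains; activating the hypothetical `viz` profile brings `viewer` back in.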
Flow Examples#
Creating a Project from a Flow#
# Get a flow
flow = flow_register.get_by_name("GROMACS")
# Display available variables
flow.display_variables()
# Create a project with custom variables
# (compute_manager is assumed to have been obtained beforehand; not shown here)
compute_unit = compute_manager.get_unit(name="GPU-Unit")
project = flow.create_project(
    project_name="protein-simulation",
    variables={
        "PROTEIN_FILE": "1AKI.pdb",
        "SIMULATION_TIME": "100",
        "TEMPERATURE": "300"
    },
    compute_unit=compute_unit
)
Working with Flow Variables#
# Get flow variables
variables = flow.get_variables()
# Display in table format
flow.display_variables(table=True)
# Display as JSON
flow.display_variables(table=False)
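For intuition about where a flow's variables come from, the sketch below scans a docker-compose style template for `${NAME}` and `${NAME:-default}` placeholders and reports which variables have no default and were not supplied. It is only an approximation of what get_variables and variable validation might involve; `find_variables` and `missing_variables` are hypothetical helpers, not part of dxflow.

```python
import re

# Matches ${NAME} and ${NAME:-default} placeholders.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def find_variables(template: str) -> dict:
    """Map each variable name to its default, or None if it has no default."""
    found = {}
    for match in _VAR.finditer(template):
        name, default = match.group(1), match.group(2)
        # Keep a default if any occurrence of the variable declares one.
        if name not in found or found[name] is None:
            found[name] = default
    return found


def missing_variables(template: str, supplied: dict) -> list:
    """List variables that have no default and were not supplied."""
    declared = find_variables(template)
    return sorted(n for n, d in declared.items() if d is None and n not in supplied)


template = "- THREADS=${THREADS:-4}\n- INPUT_FILE=${INPUT_FILE}"
print(find_variables(template))
print(missing_variables(template, {}))
```

Running a check like this before creating a project surfaces required variables early, rather than at container start.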
Generating Docker Compose Configuration#
import yaml

# Generate compose file with variables
compose_yaml = flow.generate_compose(
    variables={
        "CPU_CORES": "8",
        "MEMORY": "16G",
        "INPUT_PATH": "/data/input"
    },
    project_name="my-analysis"
)
# Save to file (assumes generate_compose returns a parsed YAML mapping)
with open("docker-compose.yml", "w") as f:
    yaml.dump(compose_yaml, f)
Common Flow Patterns#
Single Container Flow#
version: '3.8'
services:
  main:
    image: scientific/tool:latest
    command: ${COMMAND}
    environment:
      - PARAM1=${PARAM1}
    volumes:
      - ./input:/input
      - ./output:/output
Multi-Container Pipeline#
version: '3.8'
services:
  preprocessor:
    image: preprocess:latest
    command: preprocess --input /data/raw --output /data/processed
    volumes:
      - ./data:/data
  analyzer:
    image: analyze:latest
    depends_on:
      - preprocessor
    command: analyze --input /data/processed --output /data/results
    volumes:
      - ./data:/data
  visualizer:
    image: visualize:latest
    depends_on:
      - analyzer
    command: visualize --input /data/results
    volumes:
      - ./data:/data
    ports:
      - "8080:8080"
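The `depends_on` entries in the pipeline above form a dependency graph. As a rough sketch, the start order they imply can be derived with a topological sort. The example uses Python's standard `graphlib` module (Python 3.9+) on a hand-written mapping that mirrors the pipeline; `start_order` is a hypothetical helper for illustration.

```python
from graphlib import TopologicalSorter


def start_order(services: dict) -> list:
    """Return service names ordered so that dependencies come first."""
    # Each service maps to the set of services it depends on.
    graph = {name: set(spec.get("depends_on", [])) for name, spec in services.items()}
    return list(TopologicalSorter(graph).static_order())


services = {
    "preprocessor": {},
    "analyzer": {"depends_on": ["preprocessor"]},
    "visualizer": {"depends_on": ["analyzer"]},
}
print(start_order(services))
```

Note that in Docker Compose the short form of `depends_on` only controls the order in which containers are started; it does not wait for a dependency to finish its work. For a batch pipeline where each stage must complete first, the long form with `condition: service_completed_successfully` is worth considering.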
GPU-Enabled Flow#
version: '3.8'
services:
  gpu_compute:
    image: nvidia/cuda:11.0-runtime
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    command: python gpu_simulation.py
    volumes:
      - ./models:/models
Best Practices#
Use Variables: Make flows configurable with variables for reusability
Tag Appropriately: Use descriptive tags for easy discovery
Document Variables: Provide clear descriptions and defaults
Version Control: Track flow templates in version control
Test Locally: Test flows with docker-compose before publishing
Resource Limits: Define resource constraints for predictable behavior
Health Checks: Include health checks for service reliability
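Some of these practices can be checked mechanically before publishing. The sketch below is a small, hypothetical lint (not a dxflow feature) that inspects a parsed compose mapping and warns about services lacking a `healthcheck` entry or `deploy.resources.limits`, the keys the Compose specification uses for those settings.

```python
def lint_services(compose: dict) -> list:
    """Return warnings for services missing a health check or resource limits."""
    warnings = []
    for name, spec in compose.get("services", {}).items():
        if "healthcheck" not in spec:
            warnings.append(f"{name}: no healthcheck defined")
        limits = spec.get("deploy", {}).get("resources", {}).get("limits")
        if not limits:
            warnings.append(f"{name}: no resource limits defined")
    return warnings


compose = {
    "services": {
        "main": {
            "image": "scientific/tool:latest",
            "healthcheck": {"test": ["CMD", "true"]},
            "deploy": {"resources": {"limits": {"cpus": "4", "memory": "16G"}}},
        },
        "helper": {"image": "scientific/tool:latest"},
    }
}
for warning in lint_services(compose):
    print(warning)
```

Here `main` passes both checks while `helper` produces two warnings.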