Yotse Package

Subpackages

Submodules

yotse.execution module

Defines classes and functions for the execution of your experiment.

class yotse.execution.CustomExecutor(experiment: Experiment)[source]

Bases: Executor

Custom Executor class for users to tailor to their specific experimental setups.

The CustomExecutor class is a user-defined extension of the base Executor class. Users can customize this class to adapt the optimization and execution process to their specific experimental requirements.

Parameters:

experiment (Experiment) – The experiment object associated with the custom executor.

__init__(experiment: Experiment)[source]

Initialize CustomExecutor object.
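A minimal sketch of tailoring the executor, assuming a fully configured Experiment object named my_experiment is available (see yotse.pre.Experiment below):

from yotse.execution import CustomExecutor

class MyExecutor(CustomExecutor):
    def run(self, step_number=0, evolutionary_point_generation=None):
        # Custom pre-processing for the experimental setup could go here.
        super().run(step_number, evolutionary_point_generation)
        # Custom post-processing could go here.

executor = MyExecutor(experiment=my_experiment)  # my_experiment: placeholder
executor.run(step_number=0)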

collect_data() → DataFrame

Collects data from output.csv (or the output of the scripts) and combines it into a dataframe whose first column is the associated cost and whose remaining columns are the input parameters (ordered the same way as they were input to the experiment). The rows of the dataframe follow the same ordering as the jobs.

Returns:

data – Pandas dataframe containing the combined outputs of the individual jobs in the form above.

Return type:

pandas.DataFrame
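For orientation, a dataframe in this format for three hypothetical jobs over two input parameters (all column names and values invented for illustration) could look like:

import pandas as pd

# First column: cost of each job; remaining columns: the input parameters,
# ordered as they were passed to the experiment. Rows follow the job order.
data = pd.DataFrame(
    [
        [0.42, 1.0, 10.0],  # job 0
        [0.17, 2.0, 20.0],  # job 1
        [0.93, 3.0, 30.0],  # job 2
    ],
    columns=["cost", "param_1", "param_2"],
)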

create_points_based_on_optimization(data: DataFrame, evolutionary: bool | None = None) → None

Applies an optimization algorithm to the collected data and creates new data points from it, which are then written directly into the experiment’s attributes.

Parameters:
  • data (pandas.DataFrame) – A pandas dataframe containing the collected data in the format cost_value init_param_1 … init_param_n.

  • evolutionary (bool, optional) – Overwrite the type of construction to be used for the new points. If evolutionary=None the optimization algorithm determines whether the point creation is evolutionary or based on the best solution. Defaults to None.

generate_optimizer() → Optimizer

Sets the optimization algorithm for the run by translating information in the currently ‘active’ optimization_info.

Returns:

optimization_alg – Object of a subclass of GenericOptimization; the optimization algorithm to be used by this runner.

Return type:

GenericOptimization

get_active_optimization() → OptimizationInfo

Get the active optimization step.

Returns:

The active optimization step.

Return type:

OptimizationInfo

Raises:
  • RuntimeError – If there are multiple active optimization steps.

  • RuntimeError – If no active optimization steps are found.

load_executor_state(aux_directory: str) → None

Load the state of the Executor to be able to resume.

next_optimization() → None

Switch to the next active optimization in the list.

Deactivates the current active optimization and activates the next one in the list.

Raises:

RuntimeError – If there are multiple active optimizations or none at all.

pre_submission_analysis() → List[str | Any]

Executes any necessary steps before the analysis script and returns the QCG-Pilot command line list for it.

Returns:

analysis_commandline (list) – The list of command line arguments for the QCG-Pilot job submission for the analysis script.

Note: Overwrite this function if you need a different directory structure or pre-submission functionality for your analysis script.

pre_submission_setup_per_job(datapoint_item: List[float], step_number: int, job_number: int) → List[str | Any]

Sets up the basic directory structure for a job and returns the QCG-Pilot command line list for it.

Parameters:
  • datapoint_item (list) – Single item of data points for the job as a list.

  • step_number (int) – The number of the step in the experiment.

  • job_number (int) – The number of the job within the step.

Returns:

program_commandline (list) – The list of command line arguments for the QCG-Pilot job submission for the program.

Note: Overwrite this function if you need a different directory structure or pre-submission functionality.

run(step_number: int = 0, evolutionary_point_generation: bool | None = None) → None[source]

Run the custom execution process.

This method overrides the run method in the base Executor class to provide custom logic for the execution process tailored to the user’s specific needs.

Parameters:
  • step_number (int, optional) – Step number to submit to QCGPilot. Should be used e.g. for running different optimization steps. Defaults to 0.

  • evolutionary_point_generation (bool, optional) – Overwrite the type of construction to be used for the new points. If None, the optimization algorithm determines whether the point creation is evolutionary or based on the best solution. Defaults to None.

save_executor_state() → None

Save state of the Executor to be able to resume later.

submit(step_number: int = 0) → List[str]

Submits jobs to the LocalManager.

Parameters:

step_number (int, optional) – Step number to submit to QCGPilot. Should be used e.g. for running different optimization steps. Defaults to 0.

Returns:

job_ids – A list of job IDs submitted to the LocalManager.

Return type:

list

whitebox_submit() → None

Run the white-box optimization process.

Currently, this does not use QCGPilotJob but runs locally.

class yotse.execution.Executor(experiment: Experiment)[source]

Bases: object

A facilitator for running experiments and optimization algorithms.

The Executor class coordinates the execution of experiments, manages the optimization process, and interfaces with the LocalManager for job submission. It supports both black-box and white-box optimization strategies, allowing for the exploration of various optimization algorithms.

experiment

The experiment object associated with the executor.

Type:

Experiment

blackbox_optimization

Flag indicating whether the optimization process is black-box or white-box.

Type:

bool

optimizer

The optimizer object responsible for managing the optimization algorithm.

Type:

Optimizer

aux_dir

The auxiliary directory used during the optimization process.

Type:

str

Parameters:

experiment (Experiment) – The experiment object associated with the executor.
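A minimal driver sketch, assuming my_experiment is a fully configured Experiment and three optimization steps are wanted:

from yotse.execution import Executor

executor = Executor(experiment=my_experiment)  # my_experiment: placeholder
for step in range(3):  # number of steps chosen purely for illustration
    executor.run(step_number=step)
# Save the state so a later session can resume via load_executor_state().
executor.save_executor_state()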

__init__(experiment: Experiment)[source]

Initialize Executor object.

collect_data() → DataFrame[source]

Collects data from output.csv (or the output of the scripts) and combines it into a dataframe whose first column is the associated cost and whose remaining columns are the input parameters (ordered the same way as they were input to the experiment). The rows of the dataframe follow the same ordering as the jobs.

Returns:

data – Pandas dataframe containing the combined outputs of the individual jobs in the form above.

Return type:

pandas.DataFrame

create_points_based_on_optimization(data: DataFrame, evolutionary: bool | None = None) → None[source]

Applies an optimization algorithm to the collected data and creates new data points from it, which are then written directly into the experiment’s attributes.

Parameters:
  • data (pandas.DataFrame) – A pandas dataframe containing the collected data in the format cost_value init_param_1 … init_param_n.

  • evolutionary (bool, optional) – Overwrite the type of construction to be used for the new points. If evolutionary=None the optimization algorithm determines whether the point creation is evolutionary or based on the best solution. Defaults to None.

generate_optimizer() → Optimizer[source]

Sets the optimization algorithm for the run by translating information in the currently ‘active’ optimization_info.

Returns:

optimization_alg – Object of a subclass of GenericOptimization; the optimization algorithm to be used by this runner.

Return type:

GenericOptimization

get_active_optimization() → OptimizationInfo[source]

Get the active optimization step.

Returns:

The active optimization step.

Return type:

OptimizationInfo

Raises:
  • RuntimeError – If there are multiple active optimization steps.

  • RuntimeError – If no active optimization steps are found.

load_executor_state(aux_directory: str) → None[source]

Load the state of the Executor to be able to resume.

next_optimization() → None[source]

Switch to the next active optimization in the list.

Deactivates the current active optimization and activates the next one in the list.

Raises:

RuntimeError – If there are multiple active optimizations or none at all.

pre_submission_analysis() → List[str | Any][source]

Executes any necessary steps before the analysis script and returns the QCG-Pilot command line list for it.

Returns:

analysis_commandline (list) – The list of command line arguments for the QCG-Pilot job submission for the analysis script.

Note: Overwrite this function if you need a different directory structure or pre-submission functionality for your analysis script.

pre_submission_setup_per_job(datapoint_item: List[float], step_number: int, job_number: int) → List[str | Any][source]

Sets up the basic directory structure for a job and returns the QCG-Pilot command line list for it.

Parameters:
  • datapoint_item (list) – Single item of data points for the job as a list.

  • step_number (int) – The number of the step in the experiment.

  • job_number (int) – The number of the job within the step.

Returns:

program_commandline (list) – The list of command line arguments for the QCG-Pilot job submission for the program.

Note: Overwrite this function if you need a different directory structure or pre-submission functionality.

run(step_number: int = 0, evolutionary_point_generation: bool | None = None) → None[source]

Submits jobs to the LocalManager, collects the output, creates new data points, and finishes the run.

Parameters:
  • step_number (int, optional) – Step number to submit to QCGPilot. Should be used e.g. for running different optimization steps. Defaults to 0.

  • evolutionary_point_generation (bool, optional) – Overwrite the type of construction to be used for the new points. If None, the optimization algorithm determines whether the point creation is evolutionary or based on the best solution. Defaults to None.

save_executor_state() → None[source]

Save state of the Executor to be able to resume later.

submit(step_number: int = 0) → List[str][source]

Submits jobs to the LocalManager.

Parameters:

step_number (int, optional) – Step number to submit to QCGPilot. Should be used e.g. for running different optimization steps. Defaults to 0.

Returns:

job_ids – A list of job IDs submitted to the LocalManager.

Return type:

list

whitebox_submit() → None[source]

Run the white-box optimization process.

Currently, this does not use QCGPilotJob but runs locally.

yotse.post module

Defines classes and functions for the post processing of your experiment.

yotse.post.plot_cost_function(x: ndarray, y: ndarray, z: ndarray) → None[source]

Plot the cost function.

yotse.post.plot_opt_steps(experiment: Experiment, x: ndarray, y: ndarray, z: ndarray) → None[source]

Plot the optimization steps.

yotse.pre module

Defines classes and functions for the setup of your experiment.

class yotse.pre.ConstraintDict[source]

Bases: TypedDict

Data structure to define constraints on parameter values.

Parameters:
  • low (float, optional) – The lower bound for the parameter.

  • high (float, optional) – The upper bound for the parameter.

  • step (float, optional) – The step size for the parameter.
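For example, a constraint restricting a parameter to the grid 0.0, 0.1, …, 1.0 could be written as:

from yotse.pre import ConstraintDict

constraint: ConstraintDict = {"low": 0.0, "high": 1.0, "step": 0.1}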

__init__(*args, **kwargs)
clear() → None.  Remove all items from D.
copy() → a shallow copy of D
fromkeys(value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

high: float
items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
low: float
pop(k[, d]) → v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

step: float | None
update([E, ]**F) → None.  Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values

class yotse.pre.Experiment(experiment_name: str, system_setup: SystemSetup, parameters: List[Parameter] | None = None, opt_info_list: List[OptimizationInfo] | None = None)[source]

Bases: object

Class that contains the whole experiment, including the SystemSetup, all Parameters and a list of optimization steps.

Parameters:
  • experiment_name (str) – Descriptive name for the experiment.

  • system_setup (SystemSetup) – Instance of the SystemSetup class that contains the setup of the experimental system.

  • parameters (list of Parameter, optional) – List of Parameter instances that define the parameters to be varied in the experiment. Defaults to an empty list. Note: to first optimize over only a subset of the parameters, set the remaining parameters to inactive, e.g. for param not in params_to_opt_over: param.parameter_active = False. To later optimize over the other subset as well, simply set those parameters to active again.

  • opt_info_list (list, optional) – List of OptimizationInfo objects describing the different optimization algorithms to be used and their parameters. Defaults to an empty list.

data_points

MxN array where M is the total number of combinations of parameter data points and N is the number of parameters. Each row represents a unique combination of parameter values to be explored in the experiment.

Type:

ndarray

__init__(experiment_name: str, system_setup: SystemSetup, parameters: List[Parameter] | None = None, opt_info_list: List[OptimizationInfo] | None = None)[source]

Initialize Experiment object.
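A minimal construction sketch (all paths, names and values are placeholders):

from yotse.pre import Experiment, Parameter, SystemSetup

experiment = Experiment(
    experiment_name="example_experiment",
    system_setup=SystemSetup(
        source_directory="/path/to/source",
        program_name="my_program.py",
    ),
    parameters=[
        Parameter(
            name="x",
            param_range=[0.0, 1.0],
            number_points=5,
            distribution="linear",
        )
    ],
)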

add_optimization_info(optimization_info: OptimizationInfo) → None[source]

Adds OptimizationInfo to the experiment.

Parameters:

optimization_info (OptimizationInfo) – The optimization step to add to the experiment.

add_parameter(parameter: Parameter) → None[source]

Adds a parameter to the experiment.

Parameters:

parameter (Parameter) – The parameter to add to the experiment.

property cost_function: Callable[[...], float] | None

Cost function of the experiment.

create_datapoint_c_product() → ndarray[source]

Create initial set of points as Cartesian product of all active parameters.

Overwrite if a different combination is needed.

generate_slurm_script(filename: str) → None[source]

Generate a SLURM script to execute the file through SLURM.

Note: after the file has been created, the process can be started by calling sbatch slurm.job.

Parameters:

filename (str) – Name of the file to be executed through SLURM. Note: this is not the filename of the SLURM script itself.

parse_slurm_arg(filename: str) → None[source]

Parse command-line arguments to determine if a SLURM script should be generated.

Parameters:

filename (str) – The filename of the script to be executed with SLURM.

qcgpilot_commandline(datapoint_item: List[Any]) → List[str | Any][source]

Creates a command line for the QCG-PilotJob executor based on the experiment configuration.

Parameters:

datapoint_item (List[float]) – Datapoint containing the specific values for each parameter e.g. (x1, y2, z1).

Returns:

A list of strings representing the command line arguments for the QCG-PilotJob executor.

Return type:

list

class yotse.pre.OptimizationInfo(name: str, blackbox_optimization: bool, opt_parameters: Dict[str, Any], is_active: bool)[source]

Bases: object

Optional input class for the Experiment. If the run is supposed to execute an optimization, it will look here for the parameters.

Parameters:
  • name (str) – Name of the optimization algorithm to be used, e.g. “GA” (genetic algorithm), “GD” (gradient descent).

  • blackbox_optimization (bool) – Whether the optimization should be a black-box optimization. (If False: a function must be supplied.)

  • opt_parameters (dict) – Dictionary containing all necessary parameters for the optimization.

  • is_active (bool) – Whether this is the currently active optimization algorithm. Can be used to perform sequential optimization with different optimization algorithms that can all be defined in a single Experiment.

  • function (callable, optional) – The objective function to be optimized. Required if blackbox_optimization is False.

__init__(name: str, blackbox_optimization: bool, opt_parameters: Dict[str, Any], is_active: bool)[source]

Initialize OptimizationInfo object.
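A construction sketch; the keys of opt_parameters depend on the chosen algorithm, and the key below is an invented placeholder rather than a documented schema:

from yotse.pre import OptimizationInfo

opt_info = OptimizationInfo(
    name="GA",  # genetic algorithm
    blackbox_optimization=True,
    opt_parameters={"num_generations": 10},  # placeholder key
    is_active=True,
)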

class yotse.pre.Parameter(name: str, param_range: List[float | int], number_points: int, distribution: str, constraints: ConstraintDict | ndarray | None = None, weights: List[float] | None = None, parameter_active: bool = True, custom_distribution: Callable[[float, float, int], ndarray] | None = None, param_type: str = 'continuous', scale_factor: float = 1.0, depends_on: ParameterDependencyDict | None = None)[source]

Bases: object

Defines a class for any type of parameter we want to vary in our simulation.

Parameters:
  • name (str) – Name of the parameter.

  • param_range (list(min, max)) – List with the min and max value of the parameter; min and max are floats.

  • number_points (int) – Number of points to be explored.

  • distribution (str) – Type of distribution of the points. Currently supports ‘linear’, ‘uniform’, ‘normal’, ‘log’ or ‘custom’. If ‘custom’ is specified then parameter custom_distribution is required.

  • constraints (dict or np.ndarray, optional) – Dictionary with constraints. Keys can be ‘low’, ‘high’ and ‘step’. Alternatively an np.ndarray of acceptable values. Defaults to None.

  • weights (list, optional) – List of weights for the parameters. Defaults to None.

  • parameter_active (bool, optional) – Whether this parameter should be used as a varied parameter in this optimization step. Can be used to perform sequential optimization of different parameters with only one pre-step. Defaults to True.

  • custom_distribution (function, optional if distribution!='custom') – Custom distribution function that takes as arguments (min_value: float, max_value: float, number_points: int) and returns an ndarray of points. Defaults to None.

  • param_type (str, optional) – Type of parameter: ‘discrete’ or ‘continuous’. Defaults to ‘continuous’.

  • scale_factor (float, optional) – Scale factor to apply to the parameter when generating new points. Defaults to 1.0.

  • depends_on (dict, optional) – Dictionary containing the two keys ‘name’ and ‘function’, specifying the name of the parameter it depends on and a function of the form function(parameter_value: float, parameter_it_depends_on_value: float) -> float. Defaults to None.

data_points

Data points for this parameter, stored in an np.ndarray for efficient computation and memory usage.

Type:

np.ndarray (1D)

__init__(name: str, param_range: List[float | int], number_points: int, distribution: str, constraints: ConstraintDict | ndarray | None = None, weights: List[float] | None = None, parameter_active: bool = True, custom_distribution: Callable[[float, float, int], ndarray] | None = None, param_type: str = 'continuous', scale_factor: float = 1.0, depends_on: ParameterDependencyDict | None = None)[source]

Initialize Parameter object.
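A sketch of a constrained, linearly distributed parameter (all values chosen purely for illustration):

from yotse.pre import Parameter

time_param = Parameter(
    name="time",
    param_range=[0.0, 10.0],
    number_points=11,
    distribution="linear",
    constraints={"low": 0.0, "high": 10.0, "step": 1.0},
)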

generate_data_points(num_points: int) → ndarray[source]

Generate set of n=num_points data points based on the specified distribution, range, and param_type of this parameter.

Parameters:

num_points (int) – Number of datapoints to generate.

Notes

  • data_points are not sorted.

  • data_points are not guaranteed to be unique.

generate_initial_data_points() → ndarray[source]

Generate initial data points based on the specified distribution and range.

property is_active: bool

Whether this parameter is active (=used for the current optimization).

update_parameter_through_dependency(parameter_list: List[Parameter]) → None[source]

Update data points and constraints for this parameter based on another parameter’s data points and constraints.

Parameters:

parameter_list (list) – List of (all) Parameter objects in the experiment. Should at least contain the parameter that this parameter depends on.

Notes

Note: the dependency is only applied once, before the start of the experiment.

class yotse.pre.ParameterDependencyDict[source]

Bases: TypedDict

Data structure to explicitly specify how to define parameter dependencies.

Parameters:
  • name (str) – Name of the parameter the initial parameter depends on. E.g. if fidelity depends on time, name = ‘time’.

  • function (Callable) – Dependency function of the form function(parameter_value: float, parameter_it_depends_on_value: float) -> float.
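Following the docstring's example of a fidelity that depends on time, a dependency could be sketched as follows (the decay function is invented purely for illustration):

from yotse.pre import ParameterDependencyDict

dependency: ParameterDependencyDict = {
    "name": "time",
    "function": lambda fidelity_value, time_value: fidelity_value * 0.99**time_value,
}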

__init__(*args, **kwargs)
clear() → None.  Remove all items from D.
copy() → a shallow copy of D
fromkeys(value=None, /)

Create a new dictionary with keys from iterable and values set to value.

function: Callable[[float, float], float]
get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
name: str
pop(k[, d]) → v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None.  Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values

class yotse.pre.SystemSetup(source_directory: str, program_name: str, command_line_arguments: Dict[str, Any] | None = None, analysis_script: str | None = None, executor: str = 'python', output_dir_name: str | None = None, output_extension: str = 'csv', venv: str | None = None, slurm_venv: str | None = None, num_nodes: int = 1, alloc_time: str = '00:15:00', slurm_args: List[str] | None = None, qcg_cfg: Dict[str, str | int] | None = None, modules: List[str] | None = None)[source]

Bases: object

Defines a class for the setup of the system parameters.

Parameters:
  • source_directory (str) – Path of the source directory.

  • program_name (str) – Name of the script that should be used for the experiment.

  • command_line_arguments (dict, optional) – Dictionary containing as keys the references of the command line arguments and as values their values. Defaults to an empty dictionary.

  • analysis_script (str, optional) – Name of the script that is used to analyse the output of the program script. Defaults to None.

  • executor (str, optional) – Executor to be passed when submitting jobs. Defaults to ‘python’.

  • output_dir_name (str, optional) – Name of the directory the output should be stored in. Defaults to ‘output’.

  • output_extension (str, optional) – Extension of the output files to be picked up by the analysis_script, e.g. ‘csv’ or ‘json’. Defaults to ‘csv’.

  • venv (str, optional) – Path to the virtual environment that should be initialized before the QCGPilot job is started. Defaults to None.

  • slurm_venv (str, optional) – Path to the environment that SLURM should activate before executing yotse. This environment needs to have yotse installed. Defaults to None.

  • num_nodes (int, optional) – Number of nodes to allocate on the HPC cluster. Defaults to 1.

  • alloc_time (str, optional) – Time to allocate on the HPC cluster in the format HH:MM:SS (or HHH:MM:SS and so forth). Defaults to ‘00:15:00’.

  • slurm_args (list, optional) – Additional arguments to pass to SLURM, e.g. ‘--exclusive’. Defaults to None.

  • qcg_cfg (dict, optional) – Configuration to pass to the QCG-PilotJob manager. Dict with supported keys ‘init_timeout’, ‘poll_delay’, ‘log_file’, ‘log_level’. See the docstring of qcg.pilotjob.api.manager.LocalManager. If None, QCG defaults are used. Defaults to None.

  • modules (list, optional) – Modules to load on the HPC cluster. Defaults to None.

stdout_basename

Basename of the file that the standard output stream (stdout) of the script should be written to. The final filename will be of the form ‘<stdout_basename><unique_job_identifier>.txt’.

Type:

str

working_directory

Name of the current working directory to be passed to QCGPilotJob.

Type:

str

__init__(source_directory: str, program_name: str, command_line_arguments: Dict[str, Any] | None = None, analysis_script: str | None = None, executor: str = 'python', output_dir_name: str | None = None, output_extension: str = 'csv', venv: str | None = None, slurm_venv: str | None = None, num_nodes: int = 1, alloc_time: str = '00:15:00', slurm_args: List[str] | None = None, qcg_cfg: Dict[str, str | int] | None = None, modules: List[str] | None = None)[source]

Initialize SystemSetup object.
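A fuller construction sketch (all paths, names and argument values are placeholders):

from yotse.pre import SystemSetup

system_setup = SystemSetup(
    source_directory="/path/to/source",
    program_name="run_simulation.py",
    command_line_arguments={"--seed": 42},  # placeholder argument
    analysis_script="analyse_output.py",
    executor="python",
    output_extension="csv",
)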

cmdline_dict_to_list() → List[str | int | float][source]

Convert the dictionary of commandline arguments to a list for QCGPilot.

property current_step_directory: str

Returns the path of the current optimization step.

yotse.pre.set_basic_directory_structure_for_job(experiment: Experiment, step_number: int, job_number: int) → None[source]

Creates a new directory for the given step number and updates the experiment’s working directory accordingly.

The basic directory structure is as follows:

source_dir/
├── output_dir/
│   ├── your_run_script.py
│   ├── analysis_script.py
│   └── step_{i}/
│       ├── analysis_output.csv
│       └── job_{j}/
│           ├── output_of_your_run_script.extension
│           └── stdout{j}.txt

Parameters:
  • experiment (Experiment) – The Experiment that is being run.

  • step_number (int) – The number of the current step.

  • job_number (int) – The number of the current job.

Module contents