From b59179a1e00e03f7b27aa8c4f70a69981cf4d41c Mon Sep 17 00:00:00 2001 From: Rafael Ferreira da Silva Date: Fri, 24 Apr 2020 12:44:24 -0700 Subject: [PATCH] #156: some minor edits to WRENCH 102 contents --- doc/wrench_101.md | 24 +++---- doc/wrench_102.md | 173 ++++++++++++++++++++++------------------------ 2 files changed, 95 insertions(+), 102 deletions(-) diff --git a/doc/wrench_101.md b/doc/wrench_101.md index 26a826dffe..7a231ab1db 100644 --- a/doc/wrench_101.md +++ b/doc/wrench_101.md @@ -63,7 +63,7 @@ header file: The state of a WRENCH simulation is defined by the `wrench::Simulation` class. A simulator must create an instance of this class and initialize it -with the `wrench::Simulation::init()` method. The bare-metal-chain +with the `wrench::Simulation::init()` member function. The bare-metal-chain simulator does this as follows: ~~~~~~~~~~~~~{.cpp} @@ -71,7 +71,7 @@ wrench::Simulation simulation; simulation.init(&argc, argv); ~~~~~~~~~~~~~ -Note that this method takes in the command-line arguments passed to the main +Note that this member function takes in the command-line arguments passed to the main function of the simulator. This is so that it can parse WRENCH-specific and [SimGrid-specific](https://simgrid.org/doc/latest/Configuring_Simgrid.html) command-line arguments. (Recall that WRENCH is based on @@ -82,7 +82,7 @@ an extensive SimGrid help message. ## Step 2: Instantiate a simulated platform # {#wrench-101-simulator-1000ft-step-2} This is done with the `wrench::Simulation::instantiatePlatform()` -method which takes as argument a [SimGrid virtual platform description +member function which takes as argument a [SimGrid virtual platform description file](https://simgrid.org/doc/latest/platform.html). 
Any SimGrid simulation, and thus any WRENCH simulation, must be provided with the description of the simulated hardware platform (compute hosts, clusters @@ -142,7 +142,7 @@ simulation.instantiatePlatform(argv[2]); While the previous step defines the hardware platform, this step defines what software services run on that hardware. -The `wrench::Simulation::add()` method is used +The `wrench::Simulation::add()` member function is used to add services to the simulation. Each class of service is created with a particular constructor, which also specifies host(s) on which the service is to be started. Typical kinds of services include compute services, @@ -205,7 +205,7 @@ are located (and is often required - see Step #4 hereafter). Every WRENCH simulator simulates the execution of a workflow, and thus must create an instance of the `wrench::Workflow` class. This class has -methods to manually create tasks and files and add them to the workflow. +member functions to manually create tasks and files and add them to the workflow. For instance, the bare-metal-chain simulator does this as follows: ~~~~~~~~~~~~~{.cpp} @@ -235,7 +235,7 @@ The above creates a "chain" workflow (hence the name of the simulator), in which output from one task is input to the next task. The number of tasks is obtained from a command-line argument. -The `wrench::Workflow` class also provides methods to import workflows from +The `wrench::Workflow` class also provides member functions to import workflows from workflow description files in standard [JSON format](https://github.com/workflowhub/workflow-schema) and [DAX format](http://workflowarchive.org). @@ -295,10 +295,10 @@ This call checks the simulation setup, and blocks until the WMS terminates. ## Step 7: Process simulation output # {#wrench-101-simulator-1000ft-step-7} Once `wrench::Simulation::launch()` has returned, simulation output can be -processed programmatically. The `wrench::Simulation::getOutput()` method +processed programmatically. 
The `wrench::Simulation::getOutput()` member function returns an instance of class `wrench::SimulationOutput`. -Note that there are methods to configure the type and amount of output generated -(see the `wrench::SimulationOutput::enable*Timestamps()` methods). +Note that there are member functions to configure the type and amount of output generated +(see the `wrench::SimulationOutput::enable*Timestamps()` member functions). The bare-metal-chain simulator does minimal output processing as: ~~~~~~~~~~~~~{.cpp} @@ -308,7 +308,7 @@ for (auto const &item : trace) { ~~~~~~~~~~~~~ Specifically, class `wrench::SimulationOutput` has a templated -`wrench::SimulationOutput::getTrace()` method to retrieve traces for +`wrench::SimulationOutput::getTrace()` member function to retrieve traces for various information types. The first line of code above returns a `std::vector` of time-stamped task completion events. The second line of code iterates through this vector and prints task @@ -323,7 +323,7 @@ which provides accounting for computing time and dissipated energy in the simulated platform. SimGrid's energy plugin requires host `pstate` definitions (levels of performance, CPU frequency) in the [XML platform description file](https://simgrid.org/doc/latest/platform.html). -The `wrench::Simulation::getEnergyConsumed()` method returns energy consumed +The `wrench::Simulation::getEnergyConsumed()` member function returns energy consumed by all hosts in the platform. **Important:** The energy plugin is NOT enabled by default in WRENCH simulations. To enable it, pass the `--activate-energy` command line option to the simulator. See @@ -333,7 +333,7 @@ description file that defines host power consumption profiles). Another option altogether is to dump all simulation output to a JSON file. This is done with the `wrench::SimulationOutput::dump*JSON()` -methods. See the documentation of each method to see the structure of the +member functions. 
See the documentation of each member function to see the structure of the JSON output, in case you want to parse/process the JSON yourself. Alternately, you can run the installed `wrench-dashboard` tool, which provides interactive visualization/inspection of simulation output. diff --git a/doc/wrench_102.md b/doc/wrench_102.md index f56f68f3d8..0d6d89937d 100644 --- a/doc/wrench_102.md +++ b/doc/wrench_102.md @@ -23,7 +23,7 @@ by including a single header file: ~~~~~~~~~~~~~ A WMS implementation must derive the `wrench::WMS` class, which means that -it can override several virtual methods, but also that a WMS is a service. +it can override several virtual member functions, but also that a WMS is a service. As such, it has a `main()` function that goes through a simple loop as follows: ~~~~~~~~~~~~~{.sh} @@ -44,17 +44,17 @@ to WMS development. ## Discovering running services # {#wrench-102-obtain-information-discovering} -The `wrench::WMS` base class implements a set of methods named +The `wrench::WMS` base class implements a set of member functions named `wrench::WMS::getAvailableComputeServices()`, `wrench::WMS::getAvailableStorageServices()`, -`wrench::WMS::getAvailableNetworkProximityServices()`, etc. These methods +`wrench::WMS::getAvailableNetworkProximityServices()`, etc. These member functions return sets of services that can be used by the WMS to execute its -workflow. Some of these methods are templated to retrieve only a +workflow. Some of these member functions are templated to retrieve only a particular kind of services. For instance, the -`wrench::WMS::getAvailableComputeServices()` takes a template argument +`wrench::WMS::getAvailableComputeServices()` takes a template argument to retrieve particular kinds of compute services.
In the example simulator in `examples/basic-examples/bare-metal-chain`, the WMS -implementation in `OneTaskAtATimeWMS.cpp` includes the following call: +implementation in `OneTaskAtATimeWMS.cpp` includes the following call: ~~~~~~~~~~~~~{.cpp} auto compute_service = *(this->getAvailableComputeServices().begin()); @@ -62,10 +62,11 @@ auto compute_service = *(this->getAvailableComputeServices()` can return an empty set. +example, the simulator always passes exactly one bare-metal service to the +WMS, so this code is valid. However, `wrench::WMS::getAvailableComputeServices()` +can return an empty set. -The above methods (as well as, for instance, `wrench::Simulation::add()`) +The above member functions (as well as, for instance, `wrench::Simulation::add()`) return shared pointers (i.e., `std::shared_ptr<>`) to the service instances. This is to free the developer from the responsibility of freeing memory. @@ -73,56 +74,57 @@ memory. ## Finding out information about running services # {#wrench-102-obtain-information-finding} -Most service classes provide methods to get information about the +Most service classes provide member functions to get information about the capabilities and properties of the services. For instance, a `wrench::ComputeService` has a `wrench::ComputeService::getNumHosts()` -method that returns how many compute hosts the +member function that returns how many compute hosts the service has access to in total. A `wrench::StorageService` has a -`wrench::StorageService::getFreeSpace()` method to find out have many bytes -of free space are available on it. And so on... +`wrench::StorageService::getFreeSpace()` member function to find out how +many bytes of free space are available on it. And so on... -To take a concrete example, consider the WMS implementation in `examples/basic-examples/batch-bag-of-tasks/TwoTasksAtATimeBatchWMS.cpp`. 
This WMS finds out the compute speed of the cores of the compute nodes -available to a `wrench::BatchComputeService` as: +To take a concrete example, consider the WMS implementation in +`examples/basic-examples/batch-bag-of-tasks/TwoTasksAtATimeBatchWMS.cpp`. +This WMS finds out the compute speed of the cores of the compute nodes +available to a `wrench::BatchComputeService` as: ~~~~~~~~~~~~~{.cpp} double core_flop_rate = (*(batch_service->getCoreFlopRate().begin())).second; ~~~~~~~~~~~~~ -Method `wrench::ComputeService::getCoreFlopRate()` returns a map of core -compute speeds indexed by hostname (the map thus has one element per +Member function `wrench::ComputeService::getCoreFlopRate()` returns a map +of core compute speeds indexed by hostname (the map thus has one element per compute node available to the service). Since the compute nodes of a batch compute service are homogeneous, the above code simply grabs the core speed value of the first element in the map. -It is important to note that these methods actually +It is important to note that these member functions actually involve communication with the service, and thus incur overhead -that is part of the simulation (as if, in the real-world, you would +that is part of the simulation (as if, in the real-world, you would contact a running service with a request for information over the network). This is why the line of code above, in that example WMS, is executed once and the core compute speed is stored in the `core_flop_rate` variable -to be re-used by the WMS repeatedly throughout its execution. - +to be re-used by the WMS repeatedly throughout its execution. # B) Interacting with services # {#wrench-102-WMS-services} A WMS can have many and complex interactions with services, especially -with compute and storage services. In this section we describe how WRENCH -makes these interactions relatively easy, providing examples for -each kind of interaction for each kind of service. 
+with compute and storage services. In this section, we describe how WRENCH +makes these interactions relatively easy, providing examples for +each kind of interaction for each kind of service. ## Job Manager and Data Movement Manager # {#wrench-102-WMS-services-managers} As expected, each service type provides its own API. For instance, a network proximity -service provides methods to query the service's host distance databases. +service provides member functions to query the service's host distance databases. The [Developer API Reference](../developer/annotated.html) provides all -necessary documentation, which also explains which methods are synchronous +necessary documentation, which also explains which member functions are synchronous and which are asynchronous (in which case some [event](@ref wrench-102-WMS-events) will occur in the future). -**However, the WRENCH developer will find that many methods that one would +**However, the WRENCH developer will find that many member functions that one would expect are nowhere to be found. For instance, the compute services do not -have methods for submitting workflow tasks for execution!** +have member functions for submitting workflow tasks for execution!** -The rationale for the above is that many methods need to be asynchronous so +The rationale for the above is that many member functions need to be asynchronous so that the WMS can use services concurrently. For instance, a WMS could submit a compute job to two distinct compute services asynchronously, and then wait for the service which completes its job first and cancel the job @@ -131,14 +133,14 @@ require that the WRENCH developer use data structures to perform the necessary bookkeeping of ongoing service interactions, and process incoming control messages from the services on the (simulated) network or alternately register many callbacks. Instead, WRENCH provides **managers**. 
One can think of -managers are separate threads that handle all asynchronous interactions +managers as separate threads that handle all asynchronous interactions with services, and which have been implemented for your convenience - to make interacting with services easy. +to make interacting with services easy. There are two managers: a **job manager** (class `wrench::JobManager`) and a **data movement manager** (class `wrench::DataMovementManager`). The base `wrench::WMS` class provides two -methods for instantiating and starting these managers: +member functions for instantiating and starting these managers: `wrench::WMS::createJobManager()` and `wrench::WMS::createDataMovementManager()`. @@ -150,7 +152,6 @@ auto job_manager = this->createJobManager(); auto data_movement_manager = this->createDataMovementManager(); ~~~~~~~~~~~~~ - Each manager has its own documented API, and is discussed further in sections below. @@ -165,17 +166,17 @@ The possible interactions between a WMS and a storage service include: - Synchronously copy a file from one storage service to another - Asynchronously copy a file from one storage service to another -The first 4 interactions above are done by calling methods of the +The first 4 interactions above are done by calling member functions of the `wrench::StorageService` class. The last two are done via a Data Movement -Manager, i.e., by calling methods of the `wrench::DataMovementManager` class. Some of -these methods take an optional `wrench::FileRegistryService` argument, in which case +Manager, i.e., by calling member functions of the `wrench::DataMovementManager` +class. Some of these member functions take an optional +`wrench::FileRegistryService` argument, in which case they will also update entries in a file registry service (e.g., removing an entry -when a file is deleted). +when a file is deleted). See [this page](@ref guide-102-simplestorage) for concrete examples of interactions with a `wrench::SimpleStorageService`. 
- ## Interacting with compute services # {#wrench-102-WMS-services-compute} ### The Job abstraction # {#wrench-102-WMS-services-compute-job} @@ -187,14 +188,14 @@ operations. The job abstraction is powerful and greatly simplifies the task of a WMS while affording flexibility. There are two kinds of jobs in WRENCH: `wrench::PilotJob` and -`wrench::StandardJob`. A pilot job (sometimes called a "placeholder job" in the literature) +`wrench::StandardJob`. A pilot job (sometimes called a "placeholder job" in the literature) is a concept that is mostly relevant for batch scheduling. In a nutshell, it is a job that allows late binding of tasks to resources. It is submitted to a compute service (provided that service supports pilot jobs), and when it starts it just looks to the WMS like a temporary (bare-metal) compute service to which standard jobs can be submitted. -The most common kind of jobs is the standard job. A standard job is a unit +The most common kind of jobs is the **standard job**. A standard job is a unit of execution by which a WMS tells a compute service to do a set of operations. More specifically, in its most complete form, a standard job specifies: @@ -218,11 +219,11 @@ Any of the above can actually be empty, and in the extreme a standard job can do nothing. Standard jobs and pilot jobs are created via the job manager, which -provides a `wrench::JobManager::createPilotJob()` method and several -versions of a `wrench::JobManager::createStandardJob()` method. Briefly +provides a `wrench::JobManager::createPilotJob()` member function and several +versions of a `wrench::JobManager::createStandardJob()` member function. Briefly put, the job manager is a job factory. -The job manager provides the following expected methods: +The job manager provides the following expected member functions: - `wrench::JobManager::submitJob()`: asynchronous submission of a job to a compute service. 
@@ -236,7 +237,6 @@ The job manager provides the following expected methods: The next section gives many examples of interactions with each kind of compute service. - Click on the following links to see detailed descriptions of and examples of how jobs are submitted to each compute service type: @@ -246,15 +246,13 @@ and examples of how jobs are submitted to each compute service type: - [Virtualized cluster compute service](@ref guide-102-virtualizedcluster) - [HTCondor compute service](@ref guide-102-htcondor) - - ## Interacting with file registry services # {#wrench-102-WMS-services-registry} Interaction with a file registry service is straightforward and done by directly -calling methods of the `wrench::FileRegistryService` class. Note that often +calling member functions of the `wrench::FileRegistryService` class. Note that often file registry service entries are managed automatically, e.g., via calls to -`wrench::DataMovementManager` and `wrench::StorageService` methods. So often -a WMS does not need to interact with the file registry service. +`wrench::DataMovementManager` and `wrench::StorageService` member functions. So often +a WMS does not need to interact with the file registry service. Adding/removing an entry to a file registry service is done as follows: @@ -274,7 +272,6 @@ for a file copy available at some storage service. Retrieving all entries for a given file is done as follows: - ~~~~~~~~~~~~~{.cpp} wrench::WorkflowFile *some_file = ...; @@ -299,12 +296,12 @@ std::shared_ptr np_service = auto entries = fr_service->lookupEntry(some_file, "ReferenceHost", np_service); ~~~~~~~~~~~~~ -See the documentation of `wrench::FileRegistryService` for more API methods. +See the documentation of `wrench::FileRegistryService` for more API member functions. ## Interacting with network proximity services # {#wrench-102-WMS-services-network} Querying a network proximity service is straightforward. 
For instance, to -obtain a measure of the network distance between hosts "Host1" and "Host3", +obtain a measure of the network distance between hosts "Host1" and "Host2", one simply does: ~~~~~~~~~~~~~{.cpp} @@ -318,28 +315,26 @@ This distance corresponds to half the round-trip-time, in seconds, between the two hosts. If the service is configured to use the Vivaldi coordinate-based system, as in our example above, this distance is actually derived from network coordinates, as computed by the Vivaldi algorithm. In -this case one can actually ask for these coordinates for any given host: +this case, one can actually ask for these coordinates for any given host: ~~~~~~~~~~~~~{.cpp} std::pair coords = np_service->getCoordinates("Host1"); ~~~~~~~~~~~~~ -See the documentation of `wrench::NetworkProximityService` for more API methods. - - - +See the documentation of `wrench::NetworkProximityService` for more API +member functions. # C) Workflow execution events # {#wrench-102-WMS-events} - Because the WMS performs asynchronous operations, it needs to wait for and re-act to events. This is done by -calling the `wrench::WMS::waitForAndProcessNextEvent()` method implemented -by the base `wrench::WMS` class. A call to this method blocks until some -event occurs and the calls a callback method. The possible event classes all derive the -`wrench::WorkflowExecutionEvent` class, and a WMS can override the callback method for -each possible event (the default method does nothing but print -some log message). These overridable callback methods are: +calling the `wrench::WMS::waitForAndProcessNextEvent()` member function implemented +by the base `wrench::WMS` class. A call to this member function blocks until some +event occurs and then calls a callback member function. 
+The possible event classes all derive from the +`wrench::WorkflowExecutionEvent` class, and a WMS can override the callback member +function for each possible event (the default member function does nothing but print +some log message). These overridable callback member functions are: - `wrench::WMS::processEventStandardJobCompletion()`: react to a standard job completion - `wrench::WMS::processEventStandardJobFailure()`: react to a standard job failure @@ -348,13 +343,12 @@ some log message). These overridable callback methods are: - `wrench::WMS::processEventFileCopyCompletion()`: react to a file copy completion - `wrench::WMS::processEventFileCopyFailure()`: react to a file copy failure -Each method above takes in an event object as parameter. -In the +Each member function above takes in an event object as parameter. In the case of failure, the event includes a `wrench::FailureCause` object, which can be accessed to analyze (or just display) the root cause of the failure. -Consider the WMS in `examples/basic-examples/bare-metal-bag-of-tasks/TwoTasksAtATimeWMS.cpp`. At each -each iteration of its main loop it does: +Consider the WMS in `examples/basic-examples/bare-metal-bag-of-tasks/TwoTasksAtATimeWMS.cpp`. +At each iteration of its main loop it does: ~~~~~~~~~~~~~{.cpp} // Submit some standard job to some compute service @@ -364,8 +358,9 @@ job_manager->submitJob(...); this->waitForAndProcessNextEvent(); ~~~~~~~~~~~~~ -In this simple example, only one of two events could occur at this point: a standard job completion -or a standard job failure. As a result, this WMS overrides the two corresponding methods as follows: +In this simple example, only one of two events could occur at this point: +a standard job completion or a standard job failure.
As a result, this +WMS overrides the two corresponding member functions as follows: ~~~~~~~~~~~~~{.cpp} void TwoTasksAtATimeWMS::processEventStandardJobCompletion( @@ -400,11 +395,10 @@ This is for clarity purposes, and especially because we have not yet explained how WRENCH does message logging. See [an upcoming section about logging](@ref wrench-102-WMS-logging). - While the above callbacks are convenient, sometimes it is desirable to do things more manually. That is, wait for an event and then process it in -the code of the main loop of the WMS rather than in a callback method. This -is done by calling the `wrench::waitForNextEvent()` method. For instance, the WMS +the code of the main loop of the WMS rather than in a callback member function. This +is done by calling the `wrench::WMS::waitForNextEvent()` member function. For instance, the WMS in `examples/basic-examples/bare-metal-data-movement/DataMovementWMS.cpp` does it as: @@ -425,10 +419,9 @@ if (auto file_copy_completion_event = std::dynamic_pointer_castsimulation` +member functions are static, but others are not. The `wrench::WMS` class includes a +`simulation` object. Thus, the WMS can call member functions on the `this->simulation` object. For instance, this fragment of code shows how a WMS can check that -a host exists (given a hostname) and if so set its -pstate (power state) to the highest possible. +a host exists (given a hostname) and if so set its `pstate` (power state) +to the highest possible. ~~~~~~~~~~~~~{.cpp} if (wrench::Simulation::doesHostExist("SomeHost")) { - this->simulation->setPstate("SomeHost", wrench::Simulation::getNumberofPstates("SomeHost")-1); } ~~~~~~~~~~~~~ @@ -474,18 +466,18 @@ plug-and-play-able for a single WMS implementation.
For this reason, the `wrench::WMS` constructor takes as parameters two objects (or null pointers if not needed): - `wrench::StandardJobScheduler`: A class that has a `wrench::PilotJobScheduler::schedulePilotJobs()` - method (to be overwritten) that can be invoked at any time by the WMS to submit pilot jobs to compute services. - `wrench::PilotJobScheduler`: A class that has a `wrench::StandardJobScheduler::scheduleTasks()` - method (to be overwritten) that can be invoked at any time by the WMS to submit tasks (inside standard jobs) to compute services. + - `wrench::StandardJobScheduler`: A class that has a `wrench::StandardJobScheduler::scheduleTasks()` + member function (to be overridden) that can be invoked at any time by the WMS to submit tasks (inside standard jobs) to compute services. + + - `wrench::PilotJobScheduler`: A class that has a `wrench::PilotJobScheduler::schedulePilotJobs()` + member function (to be overridden) that can be invoked at any time by the WMS to submit pilot jobs to compute services. Although not required, it is possible to implement most (or even all) -decision-making in these two methods so at to have a clean separation of +decision-making in these two member functions so as to have a clean separation of concern between the decision-making part of the WMS and the rest of its -functionality. This kind of design is used in the example simulators in the +functionality. This kind of design is used in the example simulators in the `examples/real-workflow-example/` directory. # Logging # {#wrench-102-WMS-logging} It is typically desirable for the WMS to print log output to the terminal. @@ -497,7 +489,8 @@ documentation](https://simgrid.org/doc/latest/outcomes.html) for all details.
Furthermore, one can change the color of the log messages with the -`wrench::TerminalOutput::setThisProcessLoggingColor()` method, which takes as parameter a color specification: +`wrench::TerminalOutput::setThisProcessLoggingColor()` member function, +which takes as parameter a color specification: - `wrench::TerminalOutput::COLOR_BLACK` - `wrench::TerminalOutput::COLOR_RED` @@ -510,7 +503,7 @@ Furthermore, one can change the color of the log messages with the When inspecting the code of the WMSs in the example simulators you will find many examples of calls to `wrench::WRENCH_INFO()`. -The logging is per .cpp file, each of which corresponds to a declared +The logging is per `.cpp` file, each of which corresponds to a declared logging category. For instance, in `examples/basic-examples/batch-bag-of-tasks/TwoTasksAtATimeBatchWMS.cpp`, you will find the typical pattern: