[SUP-629] ECFLOW getting a job to trigger on more than one event
Created: 11/Sep/13 Updated: 25/Oct/13 Resolved: 25/Oct/13
Status:            Resolved
Project:           Software Support
Component/s:       ECFLOW
Fix Version/s:     None

Type:              Bug                      Priority:               Major
Reporter:          Lennart Pontus Kristofer Brander                  Assignee:               Daniel Varela Santoalla
Resolution:        Fixed                    Votes:                  0
Labels:            smhi, sweden

Attachments:          pontus.png
                   Member State Met Service


Say I have a task t1 that shall run 3 times per day. On every run it should also trigger
another task t2.
If I just let t2 trigger on t1 == complete, t2 will only run once. If I put an ecflow_client
requeue on t2, it will run indefinitely after t1's third run.

Is there any way to tell a job to only complete after it has run a set number of times?


Pontus Brander

Comment by Daniel Varela Santoalla [ 11/Sep/13 ]
Hi Pontus

I don't think there is any direct feature like that, but I will still check, just in case. An
alternative could be to keep a separate counter of how many times t1 has run, for
example in a file, and only re-queue t2 if the counter is less than 3.

Anyway, I'll see if anybody here has better ideas.

Comment by Axel Bonet [ 11/Sep/13 ]

Shouldn't you use a repeat (or two) here?

family test
  trigger initial condition ...
  task t1
    repeat integer ID 1 3
    time +03:00 # relative time interval between each run
  task t2
    repeat integer ID 1 3
    trigger t1:ID gt t2:ID or t1 eq complete
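To see why this first variant runs t2 exactly three times, the trigger `t1:ID gt t2:ID or t1 eq complete` can be simulated stand-alone in plain Python (a deliberately simplified model of repeat/trigger scheduling, not the ecFlow engine itself):

```python
def simulate(max_id=3):
    """Simulate two `repeat integer ID 1 max_id` tasks where t2's trigger is
    "t1:ID gt t2:ID or t1 eq complete". Returns the list of t2 iterations run."""
    t1_id, t2_id = 1, 1
    t1_complete = False
    t2_runs = []
    for _ in range(max_id):
        # t1 runs: its repeat increments, or the task completes on the last pass
        if t1_id < max_id:
            t1_id += 1
        else:
            t1_complete = True
        # evaluate t2's trigger after each t1 run
        while t2_id <= max_id and (t1_id > t2_id or t1_complete):
            t2_runs.append(t2_id)
            t2_id += 1
    return t2_runs
```

Each completed t1 iteration releases one t2 iteration, and the final `t1 eq complete` clause releases t2's last iteration, so all three runs happen without any manual requeue.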

family test
  trigger initial condition ...
  repeat integer ID 1 3
  task t1
    time +03:00 # relative time interval between each run
  task t2
    trigger t1 eq complete # t1 is blocked if t2 fails in this example

Another solution is to expand the repeat into one family per run, if we cannot accept being
blocked by a previous aborted run.
(we appreciate a design with python/... there)
family f1
  task t1
    time 09:00
  task t2
    trigger t1 eq complete
family f2
  task t1
    time 12:00
  task t2
    trigger t1 eq complete
family f3
  task t1
    time 15:00
  task t2
    trigger t1 eq complete

Comment by Daniel Varela Santoalla [ 11/Sep/13 ]
Thanks for the input Axel
Comment by Lennart Pontus Kristofer Brander [ 12/Sep/13 ]
I'm sorry, but I don't understand the first solution. Could you explain it a bit more, and also
show how it would be done in python?

The second solution doesn't seem to work how I want it to. The third isn't really a solution,
since it will clutter ecflowview.

To add to the initial question: this is another suite with the same problem, where I want
t13 and its slave t14 to be triggered twice a day, but by two different tasks.

def create_family_f1():
    f1 = ecflow.Family("enlossadm")
    # Family parameters
    f1_dict = {"NODE":"lxserv707", "OWNER":"jobs.u",
    #f1.add_repeat( ecflow.RepeatDay (1))

    t8 = f1.add_task("LoadFcstET_energi_em_sum")
    t8_dict = {"SCRIPT_NAME":"loadIndata.csh",
               "PARM1":"forecast", "PARM2":"store",
    t8.add_time( "13:56" )

    t9 = f1.add_task("LoadFcstET_energi_fm_sum")
    t9_dict = {"SCRIPT_NAME":"loadIndata.csh",
               "PARM1":"forecast", "PARM2":"store",
    t9.add_time( "13:55" )

    return f1

def create_family_f2():
    #exp1 = PartExpression('../enlossadm/LoadFcstET_energi_fm_sum ==
    f2 = ecflow.Family("enlossprod")
    f2_dict = {"NODE":"lxserv707", "OWNER":"jobs.u",
    #f2.add_repeat( ecflow.RepeatDay (1))

    t13 = f2.add_task("Prod_1150")
    t13_dict = {"SCRIPT_NAME":"runForecast.csh", "PARM1":"-
== complete or ../enlossadm/LoadFcstET_energi_em_sum ==

    t14 = f2.add_task("Create_Prod_133")
    t14_dict = {"SCRIPT_NAME":"createProduct.csh",
                "PARM1":"-p 133"}
    t14.add_trigger("Prod_1150 == complete")

    return f2


PS. I would really appreciate it if you could explain in more detail why and how, and even
add "maybe instead you should do this" whenever we ask for help. Keep in mind that we
have no prior knowledge of SMS nor ECFLOW.
Comment by Axel Bonet [ 12/Sep/13 ]

I consider this question to be for educational purposes. As such, a snapshot of a possible
structure and the related py-def file is provided.

I would not consider this an operational design, since we would have to consider the
scenario "when things go wrong".
Here, one aborting task will affect the expected future runs. That means, with a 3h interval,
it gives an idea of how much time an analyst has to come and fix a problem.
For operations, the third solution mentioned above remains my preferred one.

Also, we use time dependencies as a required condition to start extraction,
as a window in which to check for the presence of an input file,
or, on the other side, as the time when to make the products available.
I would need an explanation of why we need a one-minute interval between processing tasks.

hope this helps,

Comment by Lennart Pontus Kristofer Brander [ 12/Sep/13 ]
Thank you Axel, I'll look into the script more tomorrow. I didn't get it to work right now
because:

ImportError: No module named inc_emos

A lot of the questions you'll get from me and Jonas will be of the educational variety.
Comment by Axel Bonet [ 12/Sep/13 ]
A simpler possibility is the following:

def example():
    rel1 = "enlossadm/LoadFcstET_energi_fm_sum:1"
    rel2 = "enlossadm/LoadFcstET_energi_em_sum:1"
    return Family("example").add(
        Family("enlossadm").add(
            Task("LoadFcstET_energi_em_sum").add(
                Event(1),
                Time("13:56 19:56 03:00")),
            Task("LoadFcstET_energi_fm_sum").add(
                Event(1),
                Time("13:55 19:55 03:00"))),
        Family("enlossprod").add(
            Repeat("ID", kind="integer", start=1, end=3),
            Trigger(rel1 + " or " + rel2),
            Task("Prod_1150"),
            Task("Create_Prod_133").add(
                Trigger("Prod_1150 == complete"))))


Here, the tasks setting the event shall *sleep 60s* before completion, to give the server
a chance to schedule the dependencies properly.

Such a pattern would be OK for local tasks, but it may occasionally prove problematic for
remote jobs.

You may be interested in the page
to learn about the python module we use for suite design.

Comment by Lennart Pontus Kristofer Brander [ 16/Sep/13 ]
I'm running into trouble even if I have access to the modules in

File "/local_disk/prodserver/", line 5, in <module>
import parameters as ip
ImportError: No module named parameters

Worth pointing out here is that we run python 2.6.6 at SMHI, and that it isn't upgradeable.

How would you solve the problem with the 2.7 dependency? python 2.7 virtualenv?
Comment by Axel Bonet [ 16/Sep/13 ]
It is just a file that contains the set of variables that can be used for the
setting of the suite, and can be shared across suites. This facilitates comparing settings
between research and operations here at the centre.

Comment by Lennart Pontus Kristofer Brander [ 16/Sep/13 ]
Now I get this:

File "/local_disk/prodserver/", line 200, in <module>
SELECTION = sys.argv[1]
IndexError: list index out of range

I think it would be better if, at the beginning of our work on getting to know ecflow, we
stay away from ECMWF-specific scripts and code. When we get further along in our
work to use ecflow at SMHI, we can look into how we can get help and ideas from how
you have solved certain things.
Comment by Axel Bonet [ 18/Sep/13 ]
Ok, here is the "expanded definition" version:

family example
  family enlossadm
    task LoadFcstET_energi_em_sum
      event 1
      time 13:56 19:56 03:00
    task LoadFcstET_energi_fm_sum
      event 1
      time 13:55 19:55 03:00
  family enlossprod
    trigger enlossadm/LoadFcstET_energi_fm_sum:1 or enlossadm/LoadFcstET_energi_em_sum:1
    repeat integer ID 1 3
    task Prod_1150
    task Create_Prod_133
      trigger Prod_1150 == complete

Comment by Daniel Varela Santoalla [ 24/Sep/13 ]
Hi Pontus

Is this suggestion finally working for you?

Comment by Lennart Pontus Kristofer Brander [ 25/Sep/13 ]
Actually no! What it does now is: when the first event is triggered, the family enlossprod
runs three times in a row, then it completes and can't be triggered by the second event.
Comment by John Hodkinson [ 02/Oct/13 ]
We found the example you gave a little confusing. The two timers running a minute apart,
both of which can trigger a task, is unusual, especially if your task can trigger on either of
these two tasks. Are both required? Are they backups of each other? A text description of
what you are trying to achieve is normally easier to understand.

Looking at the examples Axel gave in his initial response, my preference would be to
have three families, one for each instance of the tasks that you want to run during the day.
The advantages are: a failure in one instance will not block the continued running of the
suite, and you will preserve the output of the three separate instances.

I would not worry about ecflowview clutter, as you can minimise those families you are
not concerned with. In fact, our operators reduce clutter by setting up their ecflowview to
only show submitted, running, suspended and aborted tasks. This gives them a very
"focussed" view of what is going on.

Comment by Lennart Pontus Kristofer Brander [ 04/Oct/13 ]
I went with the solution of using a copy of the task that needs to be triggered twice.

But this solution isn't possible if, for instance, I have a cyclic job that runs every 10
mins. I'd like a feature saying that if a job is dependent on a trigger, that job should
always be triggered, independent of its own state.

t1 runs every 10 mins

t2 triggers when t1 completes

t2 should trigger even if its state is complete or aborted
Comment by Axel Bonet [ 04/Oct/13 ]
Good afternoon Pontus,

I am still not with you:

If you need the task to run for sure, you can design the shell script to ignore the problem,
and design this task as a cron or time-dependent task.

If you want to keep a memory that there was a problem at some stage with this task,
you can fit it with an event (nok, for example) and a "memo task":

task t2
  trigger t1 eq complete
  cron 00:10
  event nok

task memo # this task is just to keep memory of the problem met with t2
  trigger t2:nok

task watchdog # script to requeue t2 when aborted
  cron 00:10
  trigger t2 eq aborted

This pattern considers the case where you design the t2 task wrapper with code like:

#ksh script
if [[ ! -f $file ]] ; then xevent nok; sleep 60; fi
cmd || xevent nok # raise the event when the command has a problem
set +e # in some section of the code known to raise issues, closed with set -e

If your environment is such that t2 may become aborted due to a system/submission/disk
space problem, then you need a watchdog task that will requeue it when detected as aborted.
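The watchdog's decision logic could be sketched like this in plain Python (the function name, state strings and the retry limit are illustrative assumptions, not ecFlow features; a real watchdog script would shell out to `ecflow_client --requeue` instead of returning a string):

```python
def watchdog_check(state, retries_done, max_retries=3):
    """Decide what the watchdog cron task should do on this pass.

    state        -- current ecFlow state of t2 (e.g. "aborted", "complete")
    retries_done -- how many times the watchdog has already requeued t2
    Returns "requeue", "give-up" or "nothing".
    """
    if state != "aborted":
        return "nothing"      # t2 is fine (or still queued/running)
    if retries_done >= max_retries:
        return "give-up"      # stop looping on a persistent failure
    return "requeue"          # real script: ecflow_client --requeue <path-to-t2>
```

The retry cap avoids the endless requeue loop described at the start of this ticket: a persistently failing t2 is left aborted for an analyst instead of being requeued forever.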

There is an idea in the air for a video conference call, or maybe a site visit; you can call
me before that if you are struggling with such a pattern and the written hints are not
clear.

Kind regards,
Generated at Mon Jun 09 05:26:51 BST 2014 using JIRA 6.2.5#6262-
