paused jobs are not restored after app restart #107

Hi,
I am trying to build my own little Flask app to manage the cleaning of specific folders on my home server.
I use the SQLAlchemy jobstore to persist my jobs, so I created a file containing different methods for different jobs; all of them use the interval trigger.
Now I want to be able to disable the cleaning of some folders. I wrote a little web app which does this fine, but the behavior of APScheduler is a bit strange:
When I disable a job, the next run time in the database is set to NULL. If I restart my server, all paused jobs get a new next run time.
In my opinion, the job state (paused/not paused) should be persisted across restarts. Is there a way to get this behavior?

Comments
To elaborate a little:
|
Hi, sorry for the delay. Are you adding the jobs every time the app runs? If so, are you using the flag `replace_existing=True`? |
Hi, thanks for the reply. |
Would you be able to provide example code that replicates the issue? I'm trying to replicate it with the advanced example, using a persisted SQLite database. The only way I was able to reproduce the issue was by adding the jobs every time the app runs with the flag `replace_existing=True`. |
Here is an example:

__init__.py:

```python
import dirkules.config as config
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_apscheduler import APScheduler

app = Flask(__name__)
app.config.from_object(config)

db = SQLAlchemy(app)

import dirkules.models

db.create_all()

scheduler = APScheduler()

from dirkules import tasks

scheduler.init_app(app)
scheduler.start()

import dirkules.views
```

tasks.py:

```python
from dirkules import scheduler
import datetime

@scheduler.task('interval', id='refresh_disks', seconds=2, next_run_time=datetime.datetime.now())
def refresh_disks():
    print("Executed")
```

config.py:

```python
import os
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
# from apscheduler.jobstores.memory import MemoryJobStore

baseDir = os.path.abspath(os.path.dirname(__file__))
staticDir = os.path.join(baseDir, 'static')

SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(baseDir, 'dirkules.db')
SQLALCHEMY_TRACK_MODIFICATIONS = False

# The SCHEDULER_JOB_DEFAULTS configuration is per job; it means each job can
# execute at most 3 threads at the same time.
# SCHEDULER_EXECUTORS is a global configuration; in this case, only 1 thread
# will be used for all the jobs.
# I believe the best way for you is to use max_workers: 1 when running locally.
SCHEDULER_JOBSTORES = {'default': SQLAlchemyJobStore(url='sqlite:///' + os.path.join(baseDir, 'dirkules.db'))}
# SCHEDULER_JOBSTORES = {'default': MemoryJobStore()}
SCHEDULER_EXECUTORS = {'default': {'type': 'threadpool', 'max_workers': 3}}
SCHEDULER_JOB_DEFAULTS = {'coalesce': False, 'max_instances': 1}
SCHEDULER_API_ENABLED = True
```

views.py:

```python
from dirkules import app, scheduler  # app is needed for the route decorator
from flask import render_template

@app.route('/', methods=['GET'])
def index():
    scheduler.pause_job("refresh_disks")
    return render_template('index.html')
```

If I put the line `from dirkules import tasks` after `scheduler.start()`, I get the following error message, because the job is unknown while it is being reconstituted:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/apscheduler/jobstores/sqlalchemy.py", line 141, in _get_jobs
    jobs.append(self._reconstitute_job(row.job_state))
  File "/usr/local/lib/python3.6/dist-packages/apscheduler/jobstores/sqlalchemy.py", line 128, in _reconstitute_job
    job.__setstate__(job_state)
  File "/usr/local/lib/python3.6/dist-packages/apscheduler/job.py", line 272, in __setstate__
    self.func = ref_to_obj(self.func_ref)
  File "/usr/local/lib/python3.6/dist-packages/apscheduler/util.py", line 292, in ref_to_obj
    raise LookupError('Error resolving reference %s: error looking up object' % ref)
LookupError: Error resolving reference dirkules.tasks:refresh_disks: error looking up object
[2019-05-19 17:07:40 +0200] [4078] [INFO] Booting worker with pid: 4078
[2019-05-19 17:08:11 +0200] [4005] [CRITICAL] WORKER TIMEOUT (pid:4078)
[2019-05-19 17:08:11 +0200] [4078] [INFO] Worker exiting (pid: 4078)
Unable to restore job "refresh_disks" -- removing it
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/apscheduler/util.py", line 289, in ref_to_obj
    obj = getattr(obj, name)
AttributeError: module 'dirkules.tasks' has no attribute 'refresh_disks'
```

Maybe `next_run_time` overrides the paused job setting? |
Thanks for the code above. When using the decorator `@scheduler.task`, the task gets replaced every time you run the app; this is out of my control, as this logic comes from the APScheduler library. |
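One possible way around the decorator behavior, sketched here rather than taken from the thread, is to register the job imperatively and only when the store does not already contain it; `get_job` and `add_job` are the flask-apscheduler methods assumed, and the check needs to run after `scheduler.start()`, once the persistent job store is open:

```python
# tasks.py, sketched without the @scheduler.task decorator so the persisted
# job (including its paused state) is not replaced on every startup.
from dirkules import scheduler

def refresh_disks():
    print("Executed")

# Register the job only on the first run; on later startups the job store
# already has it, so the stored state (e.g. paused) is kept as-is.
if scheduler.get_job('refresh_disks') is None:
    scheduler.add_job(id='refresh_disks', func=refresh_disks,
                      trigger='interval', seconds=2)
```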
Thanks a lot for your help so far. |
The only difference when creating the job inside config.py is that you can control the `replace_existing` flag. A workaround would be to subclass the SQLAlchemyJobStore so that it ignores the conflict error:

```python
from apscheduler.jobstores.base import ConflictingIdError
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore


class SQLJobStore(SQLAlchemyJobStore):
    def add_job(self, job):
        try:
            super().add_job(job)
        except ConflictingIdError:
            # The job is already persisted; keep the stored version.
            pass


SCHEDULER_JOBSTORES = {'default': SQLJobStore(url='sqlite:///' + os.path.join(baseDir, 'dirkules.db'))}
```
|
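With a store like this, re-adding the same job id on a later startup becomes a no-op, so the row written on the first run, including the NULL `next_run_time` of a paused job, is left untouched.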
Hi, just another little question, @viniciuschiele: my job definition inside config.py looks like the following:
|
I'm not able to reproduce the error. I know that SQLite doesn't allow multi-threaded access; for some reason your code is shutting down the scheduler in a different thread. Try disabling the multi-thread check to see what happens:

```python
SCHEDULER_JOBSTORES = {
    'default': SQLAlchemyJobStore(url='sqlite:///test.db', engine_options={
        'connect_args': {'check_same_thread': False}
    })
}
```
|
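Note that `check_same_thread=False` only turns off sqlite3's thread-ownership check; it does not make the connection thread-safe, so this is a diagnostic step rather than a fix.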
I would like to find out why this happens, even if you are not able to reproduce it...

config.py:

```python
import os
import datetime
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
# from apscheduler.jobstores.memory import MemoryJobStore

baseDir = os.path.abspath(os.path.dirname(__file__))
staticDir = os.path.join(baseDir, 'static')

SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(baseDir, 'dirkules.db')
SQLALCHEMY_TRACK_MODIFICATIONS = False

# The SCHEDULER_JOB_DEFAULTS configuration is per job; it means each job can
# execute at most 3 threads at the same time.
# SCHEDULER_EXECUTORS is a global configuration; in this case, only 1 thread
# will be used for all the jobs.
# I believe the best way for you is to use max_workers: 1 when running locally.
SCHEDULER_JOBSTORES = {'default': SQLAlchemyJobStore(url='sqlite:///' + os.path.join(baseDir, 'dirkules_tasks.db'))}
# SCHEDULER_JOBSTORES = {'default': MemoryJobStore()}
SCHEDULER_EXECUTORS = {'default': {'type': 'threadpool', 'max_workers': 3}}
SCHEDULER_JOB_DEFAULTS = {'coalesce': False, 'max_instances': 1}
SCHEDULER_API_ENABLED = True

# should not be here in the final version
SECRET_KEY = b'gf3iz3V!R83@Ny!ri'

JOBS = [
    {
        'id': 'refresh_disks',
        'func': 'dirkules.tasks:refresh_disks',
        'trigger': 'interval',
        'next_run_time': datetime.datetime.now(),
        'replace_existing': True,
        'seconds': 3600
    }
]
```

__init__.py:

```python
import dirkules.config as config
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_apscheduler import APScheduler

app = Flask(__name__)
app.config.from_object(config)

db = SQLAlchemy(app)

import dirkules.models

# create db if it does not exist
db.create_all()

# start scheduler
scheduler = APScheduler()
scheduler.init_app(app)
scheduler.start()

# import views
import dirkules.views
```

tasks.py:

```python
import dirkules.manager.driveManager as drive_man

def refresh_disks():
    drive_man.get_drives()
    print("Drives haha refreshed")
```

And finally, the `get_drives` function used in tasks. It reads some drives and stores them in dirkules.db:

```python
# datetime, hardware_drives, Drive, and db are imported elsewhere in this module.
def get_drives():
    current_time = datetime.datetime.now()
    drive_dict = hardware_drives.get_all_drives()
    for drive in drive_dict:
        drive_obj = Drive(
            drive.get("name"), drive.get("model"), drive.get("serial"),
            drive.get("size"), drive.get("rota"), drive.get("rm"),
            drive.get("hotplug"), drive.get("state"), drive.get("smart"), current_time)
        db.session.add(drive_obj)
    db.session.commit()
```

Could the problem be that I'm accessing a different database inside the job (dirkules.db) than the one the job store uses (dirkules_tasks.db)?

Thanks for your help! |
Your code above works just fine for me. Do you see this error when the task runs or when you shut down the scheduler? |
Only on shutdown (using Ctrl+C). Here is the whole message; maybe it is more helpful:

```
Drives haha refreshed
^CException during reset or similar
Traceback (most recent call last):
  File "/home/daniel/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 680, in _finalize_fairy
    fairy._reset(pool)
  File "/home/daniel/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 867, in _reset
    pool._dialect.do_rollback(self)
  File "/home/daniel/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 502, in do_rollback
    dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140400320026368 and this is thread id 140400438904640.
Exception closing connection <sqlite3.Connection object at 0x7fb17e9fec70>
Traceback (most recent call last):
  File "/home/daniel/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 680, in _finalize_fairy
    fairy._reset(pool)
  File "/home/daniel/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 867, in _reset
    pool._dialect.do_rollback(self)
  File "/home/daniel/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 502, in do_rollback
    dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140400320026368 and this is thread id 140400438904640.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/daniel/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 270, in _close_connection
    self._dialect.do_close(connection)
  File "/home/daniel/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 508, in do_close
    dbapi_connection.close()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140400320026368 and this is thread id 140400438904640.
```

You could just clone my repo. I'm using gunicorn3 to start the project, but that shouldn't cause the issue. |
I just took a look at the threads:

```
[<_MainThread(MainThread, started 140396238772032)>, <Thread(APScheduler, started daemon 140396127033088)>, <Thread(ThreadPoolExecutor-0_0, started daemon 140396118640384)>]
```

On shutdown:

```
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140396118640384 and this is thread id 140396238772032.
```

So the SQLite object is created in the `ThreadPoolExecutor-0_0` thread (the one that runs the task) but is cleaned up from the main thread.

If I enable DEBUG logging using:

```python
import logging

logging.basicConfig()
logging.getLogger('apscheduler').setLevel(logging.DEBUG)
```

I see the following in my trace: |
Adding |
The error happens because, in the last lines, you leave the session uncommitted when `old_drives` is empty.

So instead of this:

```python
old_drives = db.session.query(Drive).filter(Drive.last_update != current_time).all()
if old_drives:
    for drive in old_drives:
        drive.missing = True
    db.session.commit()
```

use this (the commit is moved outside the `if`, so it always runs):

```python
old_drives = db.session.query(Drive).filter(Drive.last_update != current_time).all()
if old_drives:
    for drive in old_drives:
        drive.missing = True
db.session.commit()
```

This error also happens when an unhandled exception occurs within the task, leaving the session uncommitted, so to prevent it you need a try/except that rolls the session back:

```python
try:
    ...
except Exception:
    db.session.rollback()
```
|
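One way to make this commit-or-rollback pattern reusable is a small context manager; this is a sketch assuming the same `db` object from the thread, and the `session_scope` name is mine, not from the discussion:

```python
from contextlib import contextmanager

from dirkules import db  # the Flask-SQLAlchemy instance from this thread


@contextmanager
def session_scope():
    """Commit the session on success, roll back on any error."""
    try:
        yield db.session
        db.session.commit()
    except Exception:
        db.session.rollback()
        raise

# Usage inside a task:
# with session_scope() as session:
#     session.add(drive_obj)
```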
Okay! Thanks a lot for your help! |
Well, you always have to either commit or roll back a session, even for queries; those actions release the session's state/resources from memory. Flask-SQLAlchemy already does that for you after a request: it calls `db.session.remove()` on teardown. Within tasks (when using APScheduler), sessions are kept alive if you don't call `commit()` or `rollback()` yourself. SQLite is "single-threaded": only the thread that creates the session is able to close it. That is why you get this error; the thread that creates the sessions is not the main one. I hope that helps. |
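Applied to the task above, a minimal sketch (assuming the app's `db` object) that releases the session in the same worker thread that created it:

```python
import dirkules.manager.driveManager as drive_man
from dirkules import db


def refresh_disks():
    try:
        drive_man.get_drives()
    finally:
        # Dispose of the scoped session in the worker thread that created it,
        # so no SQLite object survives to be closed from the main thread.
        db.session.remove()
```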
Thanks for your explanation! |