Persistent job runs (successfully) forever #172

LouisCAD · 2016-06-01T10:17:37Z

Hi!

First, I observed the bug on v2 alpha2 and alpha3 (after refactoring according to changes).

I have one important job that needs to be sent, so I made it serializable, but I'm not using the half-default (half as I saw serialization methods in the Job class) method as if I need to update the Job implementation, a job serialized on an older app version needs to be properly deserialized on a newer version.

I implemented the SerializerProxy pattern (as seen in Effective Java, last item of the 2nd edition) to keep only the crucial information (a timestamp, and id and a boolean), and use the public constructor to ensure backward and forward compatibility.

It seems to work as there's no error during serialization and deserialization.

However, after adding the job once, the JobManager doesn't run it, but, after doing weird things (from my point of view, when I look at the logs), it deserializes the job, runs it, then redeserializes, and run it, and repeat, and repeat, forever, spamming the servers...

I saw that the job is being deserialized and run in loop by JobManager after I added Thread.dumpStack() in the constructor of my Job and analyzed the Stacktrace.

You can see the logs attached, from the user event which triggered the job creation and adding, to the point I had to kill my app to keep a trace of logs. You can clearly see that JobManager is way too chatty (afraid of performance...), and that the same job is being deserialized and run again and again.

Logs AndroidPriorityJobQueue issue.txt

I'll try to use my own JobSerializer, but if it doesn't work, I won't be able to use this library (that seemed awesome) as my app needs to be dogfoodly tested next week, then shipped ASAP...

Here's the relevant code:

The problematic job

package net.suivideflotte.drivercompanion.jobs;

import android.support.annotation.NonNull;

import net.suivideflotte.drivercompanion.DriverApp;
import net.suivideflotte.drivercompanion.event.PrivateModeEvent;
import net.suivideflotte.drivercompanion.model.DeviceStatus;
import net.suivideflotte.drivercompanion.model.DriverInfo;
import net.suivideflotte.drivercompanion.model.ImportantJob;
import net.suivideflotte.drivercompanion.model.PrivateModeSwitchRequest;
import net.suivideflotte.drivercompanion.rest.SDF;

import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.Serializable;

import retrofit2.Call;
import timber.log.Timber;

import static net.suivideflotte.drivercompanion.model.DeviceStatus.State.PRIVATE;

/**
 * @see SDF.API#switchPrivateMode(int, PrivateModeSwitchRequest)
 */
public final class SwitchPrivateModeJob extends ApiJobWithEvent<Void, PrivateModeEvent> implements ImportantJob, Serializable {

    private static final long serialVersionUID = 1L;

    private final int mDeviceId;
    private final PrivateModeSwitchRequest mRequest;

    public SwitchPrivateModeJob(boolean switchOn, int deviceId) {
        this(switchOn, deviceId, System.currentTimeMillis());
    }

    private SwitchPrivateModeJob(boolean switchOn, int deviceId, long time) {
        super(true, true);
        mDeviceId = deviceId;
        mRequest = new PrivateModeSwitchRequest(time, switchOn);
        Timber.i("Constructor called");
        Thread.dumpStack();
    }

    @NonNull
    @Override
    protected Call<Void> getCall(SDF.API api) {
        return api.switchPrivateMode(mDeviceId, mRequest);
    }

    @Override
    protected PrivateModeEvent getDoneEvent(Void aVoid) {
        //todo: Save the new private mode state
        if (mRequest.state) {
            DeviceStatus deviceStatus = DriverApp.getStickyEvent(DeviceStatus.class);
            if (deviceStatus != null) deviceStatus.state = PRIVATE;
        } else {
            DriverApp app = (DriverApp) getApplicationContext();
            DriverInfo driverInfo = DriverApp.getStickyEvent(DriverInfo.class);
            if (driverInfo != null) {
                app.getJobManager().addJobInBackground(new FetchDeviceStatusJob(driverInfo.deviceId));
            }
        }
        return PrivateModeEvent.STATE_CHANGED;
    }

    private static final class SerializationProxy implements Serializable {

        private static final long serialVersionUID = 1L;
        private final int deviceId;
        private final boolean switchOn;
        private final long time;

        public SerializationProxy(SwitchPrivateModeJob job) {
            deviceId = job.mDeviceId;
            switchOn = job.mRequest.state;
            time = job.mRequest.at.getTime();
        }

        private Object readResolve() {
            return new SwitchPrivateModeJob(switchOn, deviceId, time);
        }
    }

    private Object writeReplace() {
        return new SerializationProxy(this);
    }

    private void readObject(ObjectInputStream stream) throws InvalidObjectException {
        throw new InvalidObjectException("Use Serialization Proxy instead.");
    }
}

It's superclass

package net.suivideflotte.drivercompanion.jobs;

import android.support.annotation.CallSuper;

import net.suivideflotte.drivercompanion.model.CachedModel;

import java.io.IOException;

import retrofit2.Response;
import timber.log.Timber;

import static net.suivideflotte.drivercompanion.DriverApp.post;
import static net.suivideflotte.drivercompanion.DriverApp.postSticky;

/**
 * @see net.suivideflotte.drivercompanion.rest.SDF.API
 */
public abstract class ApiJobWithEvent<ResponseBody, Event> extends AbstractApiJob<ResponseBody> {

    public ApiJobWithEvent() {
        this(Priority.HIGH, false);
    }

    public ApiJobWithEvent(boolean postSticky) {
        super(postSticky);
    }

    public ApiJobWithEvent(boolean postSticky, boolean persistent) {
        super(postSticky, persistent);
    }

    public ApiJobWithEvent(@Priority int priority, boolean postSticky) {
        super(priority, postSticky);
    }

    public ApiJobWithEvent(@Priority int priority, boolean postSticky, boolean persistent) {
        super(priority, postSticky, persistent);
    }

    @Override
    public void onAdded() {

    }

    @CallSuper
    @Override
    protected void onCallExecuted(Response<ResponseBody> response) throws Throwable {
        if (!response.isSuccessful()) {
            throw new IOException("response unsuccessful. code" + response.code());
        }
        Event event = getDoneEvent(response.body());
        if (event == null) {
            Timber.tag(getClass().getSimpleName()).wtf("done event is null!");
        } else {
            if (event instanceof CachedModel) {
                ((CachedModel) event).toSharedPrefs(getApplicationContext());
            }
            if (mPostSticky) postSticky(event);
            else post(event);
        }
    }

    protected abstract Event getDoneEvent(ResponseBody body);
}

The superclass of it's superclass

package net.suivideflotte.drivercompanion.jobs;

import android.support.annotation.NonNull;
import android.support.annotation.Nullable;

import com.birbit.android.jobqueue.CancelReason;
import com.birbit.android.jobqueue.Job;
import com.birbit.android.jobqueue.Params;
import com.birbit.android.jobqueue.RetryConstraint;

import net.suivideflotte.drivercompanion.DriverApp;
import net.suivideflotte.drivercompanion.model.ImportantJob;
import net.suivideflotte.drivercompanion.rest.SDF;

import retrofit2.Call;
import retrofit2.Response;

/**
 * @see ApiJobWithEvent
 */
public abstract class AbstractApiJob<ResponseBody> extends Job {

    @Nullable
    private Call<ResponseBody> mCall;

    protected final boolean mPostSticky;

    public AbstractApiJob() {
        this(Priority.HIGH, false);
    }

    public AbstractApiJob(boolean postSticky) {
        this(Priority.HIGH, postSticky);
    }

    public AbstractApiJob(boolean postSticky, boolean persistent) {
        this(Priority.HIGH, postSticky, persistent);
    }

    public AbstractApiJob(@Priority int priority, boolean postSticky) {
        this(priority, postSticky, false);
    }

    public AbstractApiJob(@Priority int priority, boolean postSticky, boolean persistent) {
        super(new Params(priority).requireNetwork().setPersistent(persistent));
        mPostSticky = postSticky;
    }

    @Override
    public void onAdded() {

    }

    @Override
    public void onRun() throws Throwable {
        if (mCall == null || mCall.isExecuted()) mCall = getCall(DriverApp.getApi());
        Response<ResponseBody> response = mCall.execute();
        onCallExecuted(response);
        mCall = null;
    }

    protected abstract void onCallExecuted(Response<ResponseBody> response) throws Throwable;

    @NonNull
    protected abstract Call<ResponseBody> getCall(SDF.API api);

    @Override
    protected void onCancel(@CancelReason int cancelReason, @Nullable Throwable throwable) {
        if (mCall != null) mCall.cancel();
    }

    @Override
    protected RetryConstraint shouldReRunOnThrowable(@NonNull Throwable throwable, int runCount, int maxRunCount) {
        if (this instanceof ImportantJob) return RetryConstraint.createExponentialBackoff(runCount, 3000L);
        if (runCount > maxRunCount) return RetryConstraint.CANCEL;
        return RetryConstraint.createExponentialBackoff(runCount, 3000L);
    }
}

The code used in my Application class to initialize the JobManager

private void configureJobManager() {
        JobManagerLogger jobManagerLogger = new JobManagerLogger();
        Configuration config = new Configuration.Builder(this)
                .customLogger(jobManagerLogger)
                .minConsumerCount(1)
                .maxConsumerCount(3)
                .consumerKeepAlive(120)
                .build();
        mJobManager = new JobManager(config);
        mJobManager.addCallback(jobManagerLogger);
    }

The logger seen in the JobManager Configurationg.Builder

package net.suivideflotte.drivercompanion.util;

import android.support.annotation.NonNull;
import android.support.annotation.Nullable;

import com.birbit.android.jobqueue.Job;
import com.birbit.android.jobqueue.callback.JobManagerCallback;
import com.birbit.android.jobqueue.log.CustomLogger;

import net.suivideflotte.drivercompanion.BuildConfig;

import timber.log.Timber;

public class JobManagerLogger implements CustomLogger, JobManagerCallback {

    @Override
    public boolean isDebugEnabled() {
        // It seems that logs are still produced with false... Better for the issue report though.
        return BuildConfig.DEBUG && false;
    }

    @Override
    public void d(String text, Object... args) {
        Timber.d(text, args);
    }

    @Override
    public void e(Throwable t, String text, Object... args) {
        Timber.e(t, text, args);
    }

    @Override
    public void e(String text, Object... args) {
        Timber.e(text, args);
    }

    @Override
    public void v(String text, Object... args) {
        Timber.v(text, args);
    }

    @Override
    public void onJobAdded(@NonNull Job job) {
        Timber.i("onJobAdded(%s)", job);
    }

    @Override
    public void onJobRun(@NonNull Job job, int resultCode) {
        Timber.i("onJobRun(%s, %s)", job, getResultCodeString(resultCode));
    }

    @Override
    public void onJobCancelled(@NonNull Job job, boolean byCancelRequest, @Nullable Throwable throwable) {
        Timber.i(throwable, "onJobCancelled(%s, %b, ...)", job, byCancelRequest);
    }

    @Override
    public void onDone(@NonNull Job job) {
        Timber.i("onDone(%s)", job);
    }

    @Override
    public void onAfterJobRun(@NonNull Job job, int resultCode) {
        Timber.i("onAfterJobRun(%s, %s)", job, getResultCodeString(resultCode));
    }

    @NonNull
    private String getResultCodeString(int resultCode) {
        String resultStr;
        switch (resultCode) {
            case RESULT_SUCCEED:
                resultStr = "RESULT_SUCCEED";
                break;
            case RESULT_CANCEL_REACHED_RETRY_LIMIT:
                resultStr = "RESULT_CANCEL_REACHED_RETRY_LIMIT";
                break;
            case RESULT_CANCEL_CANCELLED_WHILE_RUNNING:
                resultStr = "RESULT_CANCEL_CANCELLED_WHILE_RUNNING";
                break;
            case RESULT_CANCEL_SINGLE_INSTANCE_WHILE_RUNNING:
                resultStr = "RESULT_CANCEL_SINGLE_INSTANCE_WHILE_RUNNING";
                break;
            case RESULT_CANCEL_CANCELLED_VIA_SHOULD_RE_RUN:
                resultStr = "RESULT_CANCEL_CANCELLED_VIA_SHOULD_RE_RUN";
                break;
            case RESULT_FAIL_WILL_RETRY:
                resultStr = "RESULT_FAIL_WILL_RETRY";
                break;
            default:
                throw new IllegalArgumentException("unknown resultCode: " + resultCode);
        }
        return resultStr;
    }
}

The code that created the Job

mApp.getJobManager().addJobInBackground(new SwitchPrivateModeJob(true, driverInfo.deviceId));

The text was updated successfully, but these errors were encountered:

LouisCAD · 2016-06-01T16:21:29Z

I tried using another serialization method (use ByteBuffer manually), and while the serialization and deserialization both work correctly, I still see the same behavior. I really wonder if I did something wrong, and anyway, I wonder what is wrong.

yigit · 2016-06-02T05:20:23Z

about deserialization, you don't seem to assign it in the configuration. You need to implement the JobSerializer interface and let the JobManager call it when necessary.
http://yigit.github.io/android-priority-jobqueue/javadoc/com/birbit/android/jobqueue/config/Configuration.Builder.html#jobSerializer(com.birbit.android.jobqueue.persistentQueue.sqlite.SqliteJobQueue.JobSerializer)

About going crazy, thats unnacceptable. I'll check the logs now but meanwhile, is it possible for you to extract the part that goes crazy so i can debug it in an app?

LouisCAD · 2016-06-02T06:57:53Z

@yigit I use the default configuration, which should automatically add a JavaSerializer, right? I also tried specifying explicitly to use a new JavaSerializer (the one written in the library) in the Configuration, but it seems to give the same behavior...

About an app so you can debug, I'm looking at what I can do.

yigit · 2016-06-02T07:06:17Z

actually, i'm afraid i broke it in v2 because base class was depending on it.
I'm changing it in #174 and making base class fully transient (since all information is already saved in the JobHolder).
Maybe it will explain the inconsistent logs. I'm writing it right now, we can try to give it a quick run.

LouisCAD · 2016-06-02T07:08:53Z

BTW, the problem doesn't seems to be serialization/deserialization itself, but what JobManager does with it. I could use hardcoded ByteBuffer serialization with a custom JobSerializer and still got the problem.

The behavior is that when the job is added (in background), it's serialized and persisted, but then, instead of just running (with available network) what it already have in memory, it doesn't do it, but retrieves the just added Job from the storage, deserializes it, which is already a wrong behavior, then starts running the job, then redoes these steps (retrieving from storage, running, and again...)

yigit · 2016-06-02T07:11:11Z

it is not a mistake that it always deserializers, rather a design decision to make things consistent.
People can easily make a mistake and have a class that does not serialize/deserialize properly. IN that case, if JobManager was using in memory job, it would hide errors in the app that only surface if job manager restarts before job is run.

LouisCAD · 2016-06-02T07:13:12Z

But it uses the storage unnecessarily. Shouldn't this just be a check which would be run only in Debug to detect serialization issues ASAP?

LouisCAD · 2016-06-02T07:21:57Z

Do you think using the v1 for now would be the best quick workaround for this bug so I can meet my deadline set to tomorrow evening (GMT+2)? I'm wondering if the backwards refactoring is worthwhile.

yigit · 2016-06-02T07:22:33Z

no. It is not unnecessary storage. For a persistent job, it HAS TO BE saved to disk before it can be accepted.
What happens if app crashes after onAdded is called before job is persisted?

JobManager makes the guarantee that when jobManager.add returns, it is saved to disk for sure and will survive app crash.

LouisCAD · 2016-06-02T07:24:21Z

@yigit I know it has to be saved, I was talking about reading from storage it just after, and deserialize it before running it while the app didn't stop.

yigit · 2016-06-02T07:35:13Z

Oh sorry but still it is better to make it consistent.
About v1 downgrade, it is really hard to comment on. It is a big change, maybe you just let it go crazy for the demo since it all happens behind the scenes.

yigit · 2016-06-02T08:33:08Z

i pushed a version into https://github.com/yigit/android-priority-jobqueue/tree/transient-job .
You can locally checkout the branch,
Change the AndroidManifest.xml version to android:versionName="2.0.0-alpha4-SNAPSHOT"
and then run ./gradlew upArc to create a local version.
Then use it in your app. I cannot push this yet because code needs a cleanup but tests are passing.

yigit · 2016-06-02T08:36:30Z

actually, i pushed a one off repo here:
https://drive.google.com/open?id=0BzQe_bXN4AuJOHB5OWtfZ2RSS2c

You can download the zip file, unzip to a folder and add it as a local maven repository.

Btw, this includes another WIP change that keeps job data on files rather than sqlite. That is also not ready to be merged but should be OK for you today.

Lmk if you get a chance to try with it. At least we can figure out whether serialization is the issue or not.

LouisCAD · 2016-06-02T09:00:25Z

Thank you very much for this, I'm going to try it out!

In the meantime, I made a Sample to reproduce the issue, and I got the bug!

How I saw the bug in my sample:

I had added all the libraries, made a Job using GitHub REST API, but I forgot to include the LoganSquare library (which generates model parsing code). The sources were in the third party retrofit library, so it compiled and run, but the code wasn't generated, making the job fail.
I added the required dependencies, run my app again (it did not use Instant Run), and BOOM!
The job which failed was properly serialized, so JobManager took care of it at restart. Once. Once again. And again, etc.

I hop GitHub won't get mad because of this small DDOS attack... The red cross to kill the app in Android Studio is very useful in this case.

You can download the sample project as a zip here: https://drive.google.com/open?id=0B59c-Qkht7tYSTRpOEJoTDMxWDA

It also includes the logs I got, but I guess the buffer was too small to contain the first lines, I couldn't be faster to kill the app 😆

LouisCAD · 2016-06-02T10:07:07Z

@yigit Ok, I tried with your alpha4 SNAPSHOT... and It's now working perfectly (even after killing and restarting the app) on the sample I linked above! I'm trying with my app, but I guess it'll work perfectly too 😄

Thanks a lot for your help!!

BTW, I'm wondering about backward compatibility for old jobs from previous versions (v2 alpha1,2,3 and v1) that may be waiting in the queue when the app will be updated with post v2 alpha4 version... Fortunately, I'm not in this case as it's first release, but I wonder if jobs get lost, or anything else, for developers that may be using your library and will update.

yigit · 2016-06-03T02:41:55Z

Cool so to clarify, once #174 gets in, your issue is gone right? That means we can single out this issue to custom serialization which is good.

About backward compatibility, since v2 is alpha, i'm not paying too much attention on that (v2 is incompatible with v1). Also, with #174, it will be a lot easier since changes in BaseJob will not affect the app's compatibility.

Glad it is resolved.

LouisCAD · 2016-06-03T07:22:28Z

Yes, I guess your next version will close this issue.

LouisCAD mentioned this issue Jun 1, 2016

JobManager going SERIOUSLY MAD #173

Closed

yigit closed this as completed Jun 5, 2016

LouisCAD mentioned this issue Jun 22, 2016

Persistence guarantee #196

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Persistent job runs (successfully) forever #172

Persistent job runs (successfully) forever #172

LouisCAD commented Jun 1, 2016

LouisCAD commented Jun 1, 2016

yigit commented Jun 2, 2016

LouisCAD commented Jun 2, 2016

yigit commented Jun 2, 2016 •

edited

Loading

LouisCAD commented Jun 2, 2016

yigit commented Jun 2, 2016

LouisCAD commented Jun 2, 2016

LouisCAD commented Jun 2, 2016

yigit commented Jun 2, 2016

LouisCAD commented Jun 2, 2016

yigit commented Jun 2, 2016

yigit commented Jun 2, 2016

yigit commented Jun 2, 2016

LouisCAD commented Jun 2, 2016

LouisCAD commented Jun 2, 2016

yigit commented Jun 3, 2016

LouisCAD commented Jun 3, 2016

Persistent job runs (successfully) forever #172

Persistent job runs (successfully) forever #172

Comments

LouisCAD commented Jun 1, 2016

LouisCAD commented Jun 1, 2016

yigit commented Jun 2, 2016

LouisCAD commented Jun 2, 2016

yigit commented Jun 2, 2016 • edited Loading

LouisCAD commented Jun 2, 2016

yigit commented Jun 2, 2016

LouisCAD commented Jun 2, 2016

LouisCAD commented Jun 2, 2016

yigit commented Jun 2, 2016

LouisCAD commented Jun 2, 2016

yigit commented Jun 2, 2016

yigit commented Jun 2, 2016

yigit commented Jun 2, 2016

LouisCAD commented Jun 2, 2016

LouisCAD commented Jun 2, 2016

yigit commented Jun 3, 2016

LouisCAD commented Jun 3, 2016

yigit commented Jun 2, 2016 •

edited

Loading