Feature/transcipt worker using whisper #7
Conversation
📝 Walkthrough

This pull request introduces an AI-powered transcription worker to the video processing pipeline.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    actor Transcode as Transcode Worker
    participant Queue as Transcribe Queue
    participant Worker as Transcribe Worker
    participant S3 as S3 Storage
    participant FFmpeg as FFmpeg
    participant Python as Whisper Python
    participant Database as PostgreSQL
    Transcode->>Queue: Enqueue transcribe job<br/>{fileId, userId}
    Queue->>Worker: Trigger job
    Worker->>Database: Update status = 'processing'
    Worker->>S3: Get presigned URL for<br/>original video
    S3-->>Worker: Presigned URL
    Worker->>Worker: Download video file
    Worker->>FFmpeg: Extract mono 16kHz WAV
    FFmpeg-->>Worker: Audio file path
    Worker->>Python: Run faster-whisper<br/>subprocess with script
    Python-->>Worker: JSON transcript result
    Worker->>S3: Upload transcript JSON
    S3-->>Worker: S3 transcript key
    Worker->>Database: Update transcript_key &<br/>status = 'completed'
    Worker->>Worker: Cleanup temp files
    Worker-->>Queue: Job completed
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 5 checks passed
Actionable comments posted: 15
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker-compose.yml`:
- Around line 51-72: The WHISPER_MODEL value is baked at image build (ARG/ENV)
but can be overridden at runtime via process.env, causing mismatches with
pre-downloaded models; to fix, prevent runtime overrides by removing
WHISPER_MODEL from runtime env sources (remove it from server/.env and from the
docker-compose environment block) so the container uses the build-time ENV only,
and add a short inline comment next to the ai-worker block documenting that
changing WHISPER_MODEL requires rebuilding the image; reference WHISPER_MODEL,
the ai-worker service in docker-compose, and
runWhisper/process.env.WHISPER_MODEL in
server/src/services/transcribe.service.ts to locate relevant code.
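One way to surface the build-time vs. runtime mismatch the comment describes is to resolve the model name once at startup and warn when an override diverges from the model baked into the image. The sketch below is illustrative and not part of the codebase; `BUILT_IN_MODEL` and `resolveWhisperModel` are assumed names, and `"base"` is assumed as the build-time default.

```typescript
// Illustrative guard: resolve the Whisper model once at startup and warn
// when a runtime override diverges from the pre-downloaded build-time model.
const BUILT_IN_MODEL = "base"; // assumption: the model baked in at image build

function resolveWhisperModel(
  env: Record<string, string | undefined> = process.env
): string {
  const requested = env.WHISPER_MODEL ?? BUILT_IN_MODEL;
  if (requested !== BUILT_IN_MODEL) {
    // A mismatch means the first job will re-download weights at runtime.
    console.warn(
      `WHISPER_MODEL=${requested} differs from pre-downloaded '${BUILT_IN_MODEL}'; ` +
        `changing the model requires rebuilding the image`
    );
  }
  return requested;
}
```

Calling this once at worker startup (instead of reading `process.env.WHISPER_MODEL` inside `runWhisper`) makes the mismatch visible in logs even if the env override is not removed.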
In `@server/Dockerfile`:
- Around line 60-68: The Dockerfile ai-worker stage currently installs system
packages but leaves the container running as root and has no HEALTHCHECK; add a
dedicated non-root user and a simple healthcheck: create a user/group (e.g.,
aiuser), add a WORKDIR, ensure required directories are owned by that user
(chown), switch to USER aiuser before finalizing the image, and add a
HEALTHCHECK that runs a lightweight command verifying binaries are present
(e.g., ffmpeg --version and python3 -m venv or python3 -c checks) so the
orchestrator can detect a wedged container; update the ai-worker stage around
the existing RUN that installs ffmpeg/python3 to create the user, set ownership,
and append a HEALTHCHECK instruction.
- Around line 75-85: Pin the faster-whisper dependency and set an explicit
HF_HOME cache path so builds are reproducible and the pre-download step caches
weights in a known location; change the pip install to install a fixed version
(e.g., faster-whisper==1.2.1) and add an ENV or ARG HF_HOME (e.g.,
HF_HOME=/app/.cache/huggingface) before the pre-download RUN, and ensure the
pre-download Python invocation that constructs WhisperModel('${WHISPER_MODEL}',
device='cpu', compute_type='int8') runs with HF_HOME exported so model weights
are stored under $HF_HOME (use the existing VIRTUAL_ENV and WHISPER_MODEL
symbols to locate the relevant blocks).
In `@server/drizzle/0001_glorious_shooting_star.sql`:
- Around line 1-2: The migration currently adds "transcript_key" to videoTable
and immediately adds a foreign key constraint videoTable_user_id_userTable_id_fk
which will validate all rows and take an ACCESS EXCLUSIVE lock; change the FK
addition to a two-step approach by adding the constraint as NOT VALID first and
then running VALIDATE CONSTRAINT videoTable_user_id_userTable_id_fk to avoid
long locks on large tables, and decide whether the delete semantics should be
changed from ON DELETE NO ACTION to ON DELETE CASCADE (or document/implement a
soft-delete alternative) if user deletes must remove related videos—update the
migration to implement the NOT VALID/VALIDATE pattern and adjust the ON DELETE
clause accordingly.
In `@server/src/models/video.model.ts`:
- Line 12: The new foreign key on videoTable (userId referencing userTable.id)
lacks an explicit onDelete setting so Drizzle defaults to NO ACTION; update the
column definition for userId to include references(() => userTable.id, {
onDelete: '<choice>' }) with your chosen behavior: use 'no action' to make the
current behavior explicit, 'cascade' to delete videos when a user is removed, or
'set null' (and remove .notNull() from userId) to orphan videos on user
deletion; also ensure existing videoTable.user_id values reference userTable
rows before running the migration to avoid failures.
In `@server/src/services/transcribe.service.ts`:
- Around line 119-137: The uploadTranscript function currently calls
s3Client.send with Bucket set from process.env.S3_BUCKET_NAME which may be
undefined; add a validation that ensures S3_BUCKET_NAME is present before using
it (preferably at module initialization/startup) and throw a descriptive
Error('S3_BUCKET_NAME is not set') if missing; update the uploadTranscript
function (referencing uploadTranscript, s3Client, and PutObjectCommand) to use
the validated bucket value or rely on the early fail so that no call is made to
the AWS SDK with an undefined Bucket.
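A minimal sketch of the fail-fast validation suggested above. `requireEnv` is an illustrative helper name, not an existing function in the codebase:

```typescript
// Illustrative helper: validate a required env var once at module init,
// so later AWS SDK calls never receive an undefined Bucket.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`${name} is not set`);
  }
  return value;
}

// At module initialization (fails at startup, not mid-upload):
// const S3_BUCKET_NAME = requireEnv("S3_BUCKET_NAME");
```

With this in place, `uploadTranscript` can use the validated constant instead of reading `process.env.S3_BUCKET_NAME` at call time.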
- Around line 93-114: The current spawn usage (spawn(PYTHON_BIN, …) -> proc)
lacks an 'error' listener and tries to JSON.parse the entire stdout, which can
hang on synchronous spawn failures and break if libs emit noise to stdout; add a
proc.on('error', (err) => reject(err)) handler to ensure the Promise rejects on
spawn/exec errors, and change the parsing logic in the proc.on('close', ...)
handler to locate the last non-empty line of stdout (trim, split by newline,
take last non-empty) and JSON.parse that line instead of the whole stdout; keep
the existing stderr aggregation and include stderr/last-stdout-line in
parse/reject error messages for debugging (references: spawn, proc, PYTHON_BIN,
stdout, stderr, proc.on('close')).
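The last-non-empty-line parsing suggested above can be isolated into a small pure function, sketched here with the assumed name `parseLastJsonLine`:

```typescript
// Illustrative helper: JSON.parse only the last non-empty line of stdout,
// so warnings or progress noise printed before the result don't break parsing.
function parseLastJsonLine(stdout: string): unknown {
  const lines = stdout
    .trim()
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0) {
    throw new Error("empty stdout — nothing to parse");
  }
  const last = lines[lines.length - 1];
  try {
    return JSON.parse(last);
  } catch {
    // Include the offending line so spawn failures are debuggable.
    throw new Error(`failed to parse transcript JSON from: ${last}`);
  }
}
```

In the `proc.on('close', ...)` handler this replaces `JSON.parse(stdout)`, and a `proc.on('error', (err) => reject(err))` listener covers synchronous spawn failures.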
- Around line 42-53: The current download loads the whole MP4 into memory via
axios.get(..., { responseType: 'arraybuffer' }) and then writes it with
fs.writeFileSync, which can OOM; change the flow in the function that calls
getDownloadUrl/signedUrl to request the file as a stream (axios.get(signedUrl, {
responseType: 'stream' })) and pipe the response.data into a
fs.createWriteStream(videoPath) (ensure downloadsDir exists as you already do),
await the stream's 'finish'/'error' with a Promise before returning videoPath,
and remove the arraybuffer + writeFileSync usage so the file is streamed
directly to disk.
In `@server/src/workers/hls.worker.ts`:
- Around line 68-74: The current hlsWorker.on("failed", ...) handler updates
videoTable.status to 'failed' on every attempt; change it to only mark final
failure after retries are exhausted by checking the job's attempts versus
configured attempts (use job.attemptsMade and job.opts?.attempts or similar) and
only set status: 'failed' when attemptsMade >= job.opts.attempts; otherwise only
update hlsStatus to 'failed' (or leave status unchanged). Update the handler
referenced as hlsWorker.on("failed", async (job, err) => { ... }) and ensure the
DB update uses that conditional logic so transient failures don't permanently
flip videoTable.status.
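The retry-exhaustion check described above reduces to a one-line predicate, sketched here with the assumed name `isFinalFailure` (BullMQ's default is a single attempt when `attempts` is unset):

```typescript
// Illustrative predicate: a failure is final only once the job has used up
// its configured attempts (defaulting to 1 when attempts is not set).
function isFinalFailure(attemptsMade: number, configuredAttempts?: number): boolean {
  return attemptsMade >= (configuredAttempts ?? 1);
}

// Hedged usage inside the handler:
// if (job && isFinalFailure(job.attemptsMade, job.opts?.attempts)) {
//   /* mark videoTable.status = 'failed' */
// }
```

Transient failures then only touch `hlsStatus`, leaving `videoTable.status` untouched until retries are exhausted.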
In `@server/src/workers/thumbnail.worker.ts`:
- Around line 42-48: The failed-event handler thumbnailWorker.on("failed", ...)
is prematurely setting videoTable.status = 'failed' on every failed attempt;
change it to only mark the DB as failed when the job has exhausted retries by
checking job.attemptsMade >= (job.opts.attempts ?? 1) (same pattern used in
transcode.worker.ts). Wrap the existing db.update(...) and
thumbnailStatus/status writes inside that conditional, keeping the console.log
for visibility but avoid updating the video row until the attempts check passes,
and reference the job, job.attemptsMade, job.opts.attempts,
thumbnailWorker.on("failed") and db.update(videoTable) to locate and modify the
code.
In `@server/src/workers/transcode.worker.ts`:
- Around line 98-100: The transcribe job (and similarly hlsQueue.add and
thumbnailQueue.add) can be enqueued multiple times on retries because no
deterministic jobId is provided; update the calls to
transcribeQueue.add("transcribe", { fileId, userId }) to include a deterministic
jobId (e.g. `jobId: \`transcribe:${fileId}\``) so BullMQ will dedupe retries,
and apply the same pattern to hlsQueue.add and thumbnailQueue.add (e.g.
`hls:${fileId}`, `thumbnail:${fileId}`); ensure you pass the jobId option in the
same call site in transcode.worker.ts and keep any existing job data
(fileId/userId) unchanged.
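The deterministic-id scheme is simple enough to centralize in one helper, sketched here with the assumed names `PipelineStage` and `jobIdFor`:

```typescript
// Illustrative helper: one id scheme for all three queues, so BullMQ
// dedupes a job when a retried transcode handler enqueues it again.
type PipelineStage = "transcribe" | "hls" | "thumbnail";

function jobIdFor(stage: PipelineStage, fileId: string): string {
  return `${stage}:${fileId}`;
}

// Hedged usage at the call sites in transcode.worker.ts:
// await transcribeQueue.add("transcribe", { fileId, userId },
//   { jobId: jobIdFor("transcribe", fileId) });
```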
In `@server/src/workers/transcribe.queue.ts`:
- Around line 12-14: The queue options currently set removeOnFail: false which
retains every failed transcription job indefinitely; update the transcribe queue
options (the removeOnFail setting in the queue configuration in
server/src/workers/transcribe.queue.ts) to a bounded retention instead — e.g.,
set removeOnFail to a number (keep last N failed jobs) or to an object with
retention rules (like { count: 100 } or { age: 7 * 24 * 3600 }) so failed jobs
are automatically removed after the bound; keep the symbol name removeOnFail and
adjust only its value in the queue options block.
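A sketch of the bounded retention, using the example values from the comment (BullMQ accepts a KeepJobs-shaped object with `age` in seconds and/or `count` for `removeOnFail`):

```typescript
// Illustrative value for the removeOnFail queue option: keep failed jobs
// for at most a week and never more than the last 100.
const failedJobRetention = {
  age: 7 * 24 * 3600, // seconds: drop failed jobs older than one week
  count: 100,         // and cap retained failed jobs at 100
};

// Hedged usage in the queue options block:
// defaultJobOptions: { removeOnFail: failedJobRetention }
```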
In `@server/src/workers/transcribe.worker.ts`:
- Around line 14-49: The worker's pipeline (transcribeWorker) can throw before
cleanupTranscribeFiles(videoPath, audioPath) is called, leaking temp files;
refactor the job handler to declare videoPath and audioPath in the outer scope,
wrap the main steps (downloadOriginalForTranscription, extractAudio, runWhisper,
uploadTranscript and DB updates) in a try block and call
cleanupTranscribeFiles(videoPath, audioPath) in a finally block so files are
always removed even on errors; keep the existing failed handler for DB status
updates but do not rely on it for file cleanup.
- Around line 55-60: The failed event handler for transcribeWorker can throw
when job or job.data is undefined; update the handler in
transcribeWorker.on("failed", ...) to guard access to job and job.data.fileId:
first check whether job is present and extract fileId with a safe check (e.g.,
job && job.data && job.data.fileId), log a clear message including err and
job?.id when job is missing or deserialization/stall occurred, and only call
db.update(videoTable).set(...).where(eq(videoTable.id, fileId)) when fileId is
defined; if fileId is missing, log a warning/error instead of issuing an update
so you don't execute eq(videoTable.id, undefined).
- Around line 47-49: The Worker instantiation is using an unnecessary type
assertion "as any" on the redis connection; remove the cast so the real IORedis
instance is passed to the Worker options (replace connection: redis as any with
connection: redis) to preserve type information and let the compiler validate
the IORedis type used by the Worker constructor.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e67d5e68-1cb8-41c6-8bdc-21be632611d2
📒 Files selected for processing (14)
- docker-compose.yml
- server/Dockerfile
- server/drizzle/0001_glorious_shooting_star.sql
- server/drizzle/meta/0001_snapshot.json
- server/drizzle/meta/_journal.json
- server/src/ai-worker.ts
- server/src/models/video.model.ts
- server/src/services/transcode.service.ts
- server/src/services/transcribe.service.ts
- server/src/workers/hls.worker.ts
- server/src/workers/thumbnail.worker.ts
- server/src/workers/transcode.worker.ts
- server/src/workers/transcribe.queue.ts
- server/src/workers/transcribe.worker.ts
```sql
ALTER TABLE "videoTable" ADD COLUMN "transcript_key" text;--> statement-breakpoint
ALTER TABLE "videoTable" ADD CONSTRAINT "videoTable_user_id_userTable_id_fk" FOREIGN KEY ("user_id") REFERENCES "public"."userTable"("id") ON DELETE no action ON UPDATE no action;
```
🧹 Nitpick | 🔵 Trivial
Confirm FK addition strategy and delete semantics.
Two small things worth confirming before this hits a populated environment:
- `ADD CONSTRAINT ... FOREIGN KEY ... REFERENCES ...` validates every existing row and takes an `ACCESS EXCLUSIVE` lock on `videoTable`. On a large table this blocks reads/writes. If non-trivial data already exists in the target environment, prefer a two-step `ADD CONSTRAINT ... NOT VALID;` followed by `VALIDATE CONSTRAINT ...;`.
- `ON DELETE no action` means deleting a `userTable` row with any videos will raise a referential-integrity error instead of cascading. If user deletion is a real flow (GDPR/account deletion), you probably want `ON DELETE CASCADE` (or an explicit soft-delete/archival path). Please confirm this is the intended semantic.
```ts
// Dispatch transcription job — picked up by the ai-worker container
await transcribeQueue.add("transcribe", { fileId, userId });
console.log(`🎙️ Transcription job queued for fileId: ${fileId}`);
```
Transcribe job can be enqueued multiple times on transcode retries — use a deterministic jobId.
The worker body is not idempotent across retries: if anything between line 75 (thumbnailQueue.add) and line 103 (the final DB status update) throws, BullMQ will retry the whole handler and this transcribeQueue.add("transcribe", …) will fire again, causing duplicate Whisper runs (expensive) and a double transcriptKey write in videoTable. The same risk exists for the hlsQueue.add at L95 and thumbnailQueue.add at L75, but it's especially painful for transcription.
Add a deterministic jobId so BullMQ deduplicates:
🛠️ Proposed fix

```diff
- // Dispatch transcription job — picked up by the ai-worker container
- await transcribeQueue.add("transcribe", { fileId, userId });
+ // Dispatch transcription job — picked up by the ai-worker container.
+ // jobId ensures idempotency across transcode retries.
+ await transcribeQueue.add(
+   "transcribe",
+   { fileId, userId },
+   { jobId: `transcribe:${fileId}` }
+ );
```

Consider the same treatment for hlsQueue.add (`` jobId: `hls:${fileId}` ``) and thumbnailQueue.add (`` jobId: `thumbnail:${fileId}` ``).
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```ts
// Dispatch transcription job — picked up by the ai-worker container.
// jobId ensures idempotency across transcode retries.
await transcribeQueue.add(
  "transcribe",
  { fileId, userId },
  { jobId: `transcribe:${fileId}` }
);
console.log(`🎙️ Transcription job queued for fileId: ${fileId}`);
```
```ts
export const transcribeWorker = new Worker("transcribeQueue", async (job: Job) => {
  const { fileId, userId } = job.data as { fileId: string; userId: string };

  console.log(`\n🎙️ Transcription job ${job.id} started for fileId: ${fileId}`);

  // Mark as processing
  await db.update(videoTable).set({
    transcriptStatus: 'processing'
  }).where(eq(videoTable.id, fileId));

  // 1. Download original video from S3
  const videoPath = await downloadOriginalForTranscription(fileId, userId, job.id!);

  // 2. Extract audio (16kHz mono WAV — Whisper's preferred format)
  const audioPath = await extractAudio(videoPath);

  // 3. Run faster-whisper
  console.log(`⏳ Running Whisper on ${audioPath} (model: ${process.env.WHISPER_MODEL ?? 'base'})...`);
  const transcript = await runWhisper(audioPath);
  console.log(`✅ Transcription complete — language: ${transcript.language}, duration: ${transcript.duration}s, segments: ${transcript.segments.length}`);

  // 4. Upload transcript JSON to S3
  const transcriptKey = await uploadTranscript(transcript, fileId, userId);

  // 5. Persist the S3 key in the DB
  await db.update(videoTable).set({
    transcriptKey,
    transcriptStatus: 'completed',
  }).where(eq(videoTable.id, fileId));

  // 6. Clean up local temp files
  cleanupTranscribeFiles(videoPath, audioPath);

}, {
  connection: redis as any,
});
```
Temp files leak on any pipeline failure — wrap in try/finally.
If downloadOriginalForTranscription, extractAudio, runWhisper, or uploadTranscript rejects, execution returns immediately (the error propagates to BullMQ), and cleanupTranscribeFiles(videoPath, audioPath) on line 45 is never reached. The downloaded MP4 and extracted WAV remain in /app/downloads/transcribe/ forever. With a 2 GB memory-capped container and retried jobs, this will fill the container disk quickly and wedge the worker.
The failed handler on lines 55–60 only updates the DB — it cannot clean up temp files because videoPath / audioPath are out of scope there.
🛡️ Proposed fix — ensure cleanup always runs

```diff
 export const transcribeWorker = new Worker("transcribeQueue", async (job: Job) => {
   const { fileId, userId } = job.data as { fileId: string; userId: string };
   console.log(`\n🎙️ Transcription job ${job.id} started for fileId: ${fileId}`);

   // Mark as processing
   await db.update(videoTable).set({
     transcriptStatus: 'processing'
   }).where(eq(videoTable.id, fileId));

-  // 1. Download original video from S3
-  const videoPath = await downloadOriginalForTranscription(fileId, userId, job.id!);
-
-  // 2. Extract audio (16kHz mono WAV — Whisper's preferred format)
-  const audioPath = await extractAudio(videoPath);
-
-  // 3. Run faster-whisper
-  console.log(`⏳ Running Whisper on ${audioPath} (model: ${process.env.WHISPER_MODEL ?? 'base'})...`);
-  const transcript = await runWhisper(audioPath);
-  console.log(`✅ Transcription complete — language: ${transcript.language}, duration: ${transcript.duration}s, segments: ${transcript.segments.length}`);
-
-  // 4. Upload transcript JSON to S3
-  const transcriptKey = await uploadTranscript(transcript, fileId, userId);
-
-  // 5. Persist the S3 key in the DB
-  await db.update(videoTable).set({
-    transcriptKey,
-    transcriptStatus: 'completed',
-  }).where(eq(videoTable.id, fileId));
-
-  // 6. Clean up local temp files
-  cleanupTranscribeFiles(videoPath, audioPath);
-
+  let videoPath: string | undefined;
+  let audioPath: string | undefined;
+  try {
+    videoPath = await downloadOriginalForTranscription(fileId, userId, job.id!);
+    audioPath = await extractAudio(videoPath);
+
+    console.log(`⏳ Running Whisper on ${audioPath} (model: ${process.env.WHISPER_MODEL ?? 'base'})...`);
+    const transcript = await runWhisper(audioPath);
+    console.log(`✅ Transcription complete — language: ${transcript.language}, duration: ${transcript.duration}s, segments: ${transcript.segments.length}`);
+
+    const transcriptKey = await uploadTranscript(transcript, fileId, userId);
+
+    await db.update(videoTable).set({
+      transcriptKey,
+      transcriptStatus: 'completed',
+    }).where(eq(videoTable.id, fileId));
+  } finally {
+    if (videoPath || audioPath) {
+      cleanupTranscribeFiles(videoPath ?? '', audioPath ?? '');
+    }
+  }
 }, {
   connection: redis as any,
 });
```
```ts
}, {
  connection: redis as any,
});
```
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Confirm the shape of the exported redis client
fd -t f 'redis.ts' -p 'config'
rg -nP --type=ts -C3 'export\s+(const|default)\s+redis\b'
```

Repository: vector17002/video-transcoding
Length of output: 366

🏁 Script executed:

```shell
cat -n server/src/config/redis.ts
```

Repository: vector17002/video-transcoding
Length of output: 314

🏁 Script executed:

```shell
sed -n '40,60p' server/src/workers/transcribe.worker.ts
```

Repository: vector17002/video-transcoding
Length of output: 687

🏁 Script executed:

```shell
rg -A5 -B5 'new Queue' server/src/workers/transcribe.worker.ts
```

Repository: vector17002/video-transcoding
Length of output: 55

🏁 Script executed:

```shell
head -50 server/src/workers/transcribe.worker.ts
```

Repository: vector17002/video-transcoding
Length of output: 1881

🏁 Script executed:

```shell
rg -n 'import.*Queue|new Worker|new Queue' server/src/workers/transcribe.worker.ts -A2
```

Repository: vector17002/video-transcoding
Length of output: 246

🏁 Script executed:

```shell
rg -n 'ConnectionOptions|connection:' server/src -t ts -B2 -A2
```

Repository: vector17002/video-transcoding
Length of output: 2470

🏁 Script executed:

```shell
# Check package.json to see BullMQ version
rg '"bullmq"' server/package.json
```

Repository: vector17002/video-transcoding
Length of output: 98

🏁 Script executed:

```shell
rg 'new Worker|new Queue' server/src -t ts -A3 | head -50
```

Repository: vector17002/video-transcoding
Length of output: 2688

🏁 Script executed:

```shell
# Check if there's a tsconfig or tsconfig.json that shows strict checking enabled
cat tsconfig.json | head -20
```

Repository: vector17002/video-transcoding
Length of output: 119

🏁 Script executed:

```shell
# Check BullMQ type definitions for what it accepts
rg -i 'connection.*redis|ConnectionOptions' server/src/workers -t ts | head -20
```

Repository: vector17002/video-transcoding
Length of output: 611

🏁 Script executed:

```shell
# Check if there's a server-level tsconfig
find server -name "tsconfig*.json" -type f
```

Repository: vector17002/video-transcoding
Length of output: 94

🏁 Script executed:

```shell
# Look for any imports of ConnectionOptions or type definitions used with connection option
rg -n 'ConnectionOptions|WorkerOptions|QueueOptions' server/src -t ts
```

Repository: vector17002/video-transcoding
Length of output: 55

🏁 Script executed:

```shell
cat server/tsconfig.json
```

Repository: vector17002/video-transcoding
Length of output: 1215

🏁 Script executed:

```shell
# Check if there are any node_modules or type stubs that would show BullMQ's expected types
# Look for any comments or documentation about the connection parameter
rg -n 'BullMQ|Worker|Queue' server/src/workers/transcribe.worker.ts -B5 | head -30
```

Repository: vector17002/video-transcoding
Length of output: 646

🏁 Script executed:

```shell
# Search for any TypeScript error suppressions or type assertions related to redis/connection
rg -n '@ts-' server/src/workers -t ts | grep -i redis
```

Repository: vector17002/video-transcoding
Length of output: 55
Remove the as any cast on the redis connection.
The redis import is already an IORedis instance (instantiated in server/src/config/redis.ts), which BullMQ's Worker accepts directly for the connection option. The as any assertion is unnecessary and masks type information that the type checker should verify.
Code context:

```ts
}, {
  connection: redis as any,
});
```

Change to:

```ts
}, {
  connection: redis,
});
```
```ts
transcribeWorker.on("failed", async (job, err) => {
  console.error(`❌ Transcription job ${job?.id} failed: ${err.message}`);
  await db.update(videoTable).set({
    transcriptStatus: 'failed',
  }).where(eq(videoTable.id, job?.data.fileId));
});
```
failed handler can itself crash — NPE on job?.data.fileId.
BullMQ does not guarantee job is defined in the failed event (e.g. stalled jobs that were removed, or deserialization errors). When job is undefined, job?.data is undefined, and then .fileId throws TypeError: Cannot read properties of undefined (reading 'fileId'), which escapes the handler and is logged as an unhandled rejection — masking the original failure.
Secondarily, if fileId ends up undefined, eq(videoTable.id, undefined) will execute an UPDATE with a bound null/undefined that matches no rows — silently succeeding but not doing what you think.
🛡️ Proposed fix

```diff
 transcribeWorker.on("failed", async (job, err) => {
   console.error(`❌ Transcription job ${job?.id} failed: ${err.message}`);
-  await db.update(videoTable).set({
-    transcriptStatus: 'failed',
-  }).where(eq(videoTable.id, job?.data.fileId));
+  const fileId = (job?.data as { fileId?: string } | undefined)?.fileId;
+  if (!fileId) {
+    console.warn(`⚠️ failed handler: missing fileId on job ${job?.id}; skipping DB update`);
+    return;
+  }
+  await db.update(videoTable).set({
+    transcriptStatus: 'failed',
+  }).where(eq(videoTable.id, fileId));
 });
```