Connect mindsdb to the bot #36

Merged
merged 3 commits into from
Jan 2, 2024

vishesh-baghel
Owner

@vishesh-baghel vishesh-baghel commented Jan 2, 2024

Summary by CodeRabbit

  • New Features

    • Implemented a new comment generation system for pull requests.
    • Added predictive risk scoring for files in repositories.
    • Introduced automated training and retraining of predictive models.
  • Improvements

    • Enhanced logging for fetch operations with more structured messages.
    • Improved error handling and debugging in the main application flow.
  • Database Enhancements

    • Established a new connection methodology for MindsDB.
    • Updated the File model to include mandatory risk scoring.
    • Created a new TrainingFile model to assist in model training processes.
  • Documentation

    • Updated comments and logging to provide clearer context and information.
  • Bug Fixes

    • Refined comment content to include predicted risk scores for files.

@vishesh-baghel vishesh-baghel linked an issue Jan 2, 2024 that may be closed by this pull request

coderabbitai bot commented Jan 2, 2024

Walkthrough

The project has been updated to extend its pull request commenting and to integrate with MindsDB for predictive modeling. The MongoDB connection helper is renamed more aptly, logging is refined, and the data schema is revised to include risk assessment fields. The codebase is also refactored to streamline GitHub event handling and to support training a machine learning model that predicts risk scores for files in repositories.

Changes

| File Path | Change Summary |
| --- | --- |
| src/constants/Comments.ts | Introduced CommentType enum and functions for PR comments. |
| src/db/.../mindsdbConnection.ts | Added functions for MindsDB connection via mindsdb-js-sdk and Probot. |
| src/db/models/File.ts | Changed riskScore to mandatory; added predictedRiskScore; renamed export to repoFile. |
| src/db/models/TrainingFile.ts & src/types/TrainingFileType.ts | Created Mongoose schema and type for training file data. |
| src/db/mongodbConnection.ts | Renamed connectDb to connectMongoDB. |
| src/fetch/... | Updated error and log messages with square brackets for clarity. |
| src/listeners/githubWebhookListener.ts | Added function for GitHub webhook event listening. |
| src/main.ts | Overhauled main function for new DB connections and model training logic. |
| src/services/commentService.ts | Added constants and updated comment construction with predictedScore. |
| src/services/predictionService.ts | Added model training and retraining functionality with MindsDB. |
| src/services/pullRequestService.ts & src/services/repositoryService.ts | Updated functions for file updates and batch processing with new risk score features. |
| src/types/FileScoreMap.ts & src/types/FileType.ts | Added predictedScore and predictedRiskScore properties. |

Poem

🐇 "In the realm of code and bit,"
"Where the data streams do flit,"
"A rabbit's touch, now models train,"
"Predicting risks, in code's domain." 🌟

Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • You can directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
  • You can tag CodeRabbit on specific lines of code or entire files in the PR by tagging @coderabbitai in a comment. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • You can tag @coderabbitai in a PR comment and ask questions about the PR and the codebase. Examples:
    • @coderabbitai generate interesting stats about this repository from git and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid.
    • @coderabbitai read the files in the src/scheduler package and generate README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@vishesh-baghel vishesh-baghel merged commit c3b92f0 into main Jan 2, 2024
1 of 2 checks passed
@vishesh-baghel vishesh-baghel deleted the 6-connect-mongodb-to-mindsdb branch January 2, 2024 17:00

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 7

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 5b0baa4 and 8edd8e6.
Files ignored due to filter (2)
  • package-lock.json
  • package.json
Files selected for processing (17)
  • src/constants/Comments.ts (1 hunks)
  • src/db/mindsdbConnection.ts (1 hunks)
  • src/db/models/File.ts (1 hunks)
  • src/db/models/TrainingFile.ts (1 hunks)
  • src/db/mongodbConnection.ts (1 hunks)
  • src/fetch/fetch.ts (2 hunks)
  • src/fetch/fetchCommits.ts (1 hunks)
  • src/fetch/fetchFiles.ts (2 hunks)
  • src/listeners/githubWebhookListener.ts (1 hunks)
  • src/main.ts (3 hunks)
  • src/services/commentService.ts (2 hunks)
  • src/services/predictionService.ts (1 hunks)
  • src/services/pullRequestService.ts (7 hunks)
  • src/services/repositoryService.ts (5 hunks)
  • src/types/FileScoreMap.ts (1 hunks)
  • src/types/FileType.ts (1 hunks)
  • src/types/TrainingFileType.ts (1 hunks)
Files skipped from review due to trivial changes (4)
  • src/db/mongodbConnection.ts
  • src/fetch/fetch.ts
  • src/fetch/fetchCommits.ts
  • src/fetch/fetchFiles.ts
Additional comments: 17
src/types/FileScoreMap.ts (1)
  • 1-5: The addition of the predictedScore property to the FileScoreMap type aligns with the PR's objective of enhancing the bot with predictive analytics capabilities.
src/types/TrainingFileType.ts (1)
  • 1-8: The introduction of the TrainingFileType with properties necessary for training predictive models is consistent with the PR's objectives to integrate predictive analytics.
src/types/FileType.ts (1)
  • 7-11: Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [1-11]

The addition of the predictedRiskScore property to the FileType interface aligns with the PR's objective of enhancing the bot with predictive analytics capabilities.

src/db/models/File.ts (1)
  • 7-16: The changes to the FileSchema are appropriate, making riskScore mandatory and adding predictedRiskScore. Ensure that the change of the export name from "File" to "repoFile" does not affect other parts of the codebase where the model might be imported.
src/services/commentService.ts (1)
  • 23-39: The modifications to the constructComment function to include predictedScore and the use of markdown generation logic are consistent with the PR's objectives to provide feedback on file scores in comments.
src/constants/Comments.ts (1)
  • 1-29: The new functions for generating markdown comments and error fallback messages are well-implemented and enhance the commenting system with predictive analytics feedback.
src/main.ts (1)
  • 15-106: Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [1-142]

The changes to the main function, including database connections and model training, are in line with the PR's objectives. Verify the usage of the debugFlag to ensure that the debug functionality is controllable and does not expose sensitive information in production environments.

src/services/pullRequestService.ts (10)
  • 9-13: Added multiple imports related to the integration with MindsDB and handling of training files.

  • 15-16: The signature of processPullRequestOpenEvent has been modified to include app: Probot as a parameter. Ensure that all invocations of this function have been updated accordingly.

Verification successful

The invocation of processPullRequestOpenEvent in src/main.ts has been updated to include the new app: Probot parameter as required.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for invocations of processPullRequestOpenEvent to ensure they include the new parameter.
ast-grep --lang typescript --pattern $'processPullRequestOpenEvent($_, $_)'

Length of output: 269

  • 52-52: The log message for a pull request not being merged into the default branch has been updated. Ensure that the log level and message are appropriate for the condition being logged.

  • 59-59: The log message for no files modified in the pull request has been updated. Verify that this condition is correctly identified and that the log message is clear and informative.

  • 99-99: A new field predictedRiskScore is being updated in the File collection. Confirm that the field exists in the schema and that the value is being calculated correctly before this update.

Verification successful

The predictedRiskScore field exists in the File model schema as shown by the script output:

  • predictedRiskScore: { type: Number, required: false },

This confirms that the field is present in the schema and can be updated as indicated in the code snippet. However, the script does not verify if the value is being calculated correctly before the update, as this would require an analysis of the logic that assigns a value to predictedRiskScore, which is not part of the provided context.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Check if the File model has been updated to include the predictedRiskScore field.
rg --type typescript "predictedRiskScore" src/db/models/File.ts

Length of output: 98



Script:

#!/bin/bash
# Re-attempt to check if the File model has been updated to include the predictedRiskScore field without specifying file type.
rg "predictedRiskScore" src/db/models/File.ts

Length of output: 102

  • 102-107: Creation of TrainingFileType object and simultaneous update to File and creation of TrainingFile. Ensure that the TrainingFile schema supports the fields being set here and that the creation of these objects is part of the intended workflow.

Verification successful

The TrainingFile schema includes the following fields:

  • installationId: Number, required
  • owner: String, required
  • repoName: String, required
  • filePath: String, required
  • numberOfCommits: Number, required
  • riskScore: Number, required
  • createdAt: Date, with a default value
  • updatedAt: Date, with a default value

These fields match the expected structure for a TrainingFileType object as indicated by the schema. Therefore, the creation of TrainingFileType objects and their subsequent use in the TrainingFile.create(trainingFile) call is consistent with the schema definition provided.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Check if the TrainingFile model supports the fields being set in the TrainingFileType object.
rg --type typescript "TrainingFile" src/db/models/TrainingFile.ts

Length of output: 100



Script:

#!/bin/bash
# Check if the TrainingFile model supports the fields being set in the TrainingFileType object.
rg "TrainingFile" src/db/models/TrainingFile.ts

Length of output: 158



Script:

#!/bin/bash
# Retrieve the schema definition for TrainingFile to verify the fields.
cat src/db/models/TrainingFile.ts

Length of output: 615

  • 120-137: Handling of added and removed files with updates to both File and TrainingFile collections. Ensure that the logic for handling different file statuses is consistent and that the database operations are correct.

  • 210-210: The predictedScore field has been added to the object returned when no file object is found. Confirm that this field is expected in all cases and that a default value of 0 is appropriate.

  • 217-217: The predictedScore field is now being returned with the riskScore. Verify that this field is being calculated and stored correctly in the File model before being used here.

Verification successful

The predictedRiskScore field is indeed present in the File model as indicated by the output of the script, which found the line predictedRiskScore: { type: Number, required: false }, in the src/db/models/File.ts file. This confirms that the File model has been updated to include the predictedRiskScore field, and it is not required for every file object.

This confirms only that the File model can store a predicted score. The script does not show that the value is calculated correctly before pullRequestService.ts returns it as predictedScore; that requires reviewing the logic that assigns predictedRiskScore.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Check if the File model has been updated to include the predictedRiskScore field.
rg --type typescript "predictedRiskScore" src/db/models/File.ts

Length of output: 98



Script:

#!/bin/bash
# Re-attempt to check if the File model has been updated to include the predictedRiskScore field without specifying file type.
rg "predictedRiskScore" src/db/models/File.ts

Length of output: 102

  • 268-275: The createTrainingFileTypeObject function has been added to create TrainingFileType objects. Ensure that the fields being set match the schema of the TrainingFile model and that this function is being used correctly in the context of the service.

Comment on lines +1 to +16
import { Probot } from "probot";

export function listeningForGithubWebhookEvents(app: Probot, events: any[]) {
return new Promise((resolve, reject) => {
try {
app.log.info(`Listening for ${events} events`);

app.on(events, async (context) => {
resolve(context);
});
} catch (error) {
app.log.error(`Error occurred while listening for ${events} events`);
reject(error);
}
});
}

The listeningForGithubWebhookEvents function is well-structured with proper error handling. Consider enhancing the error logging to include more specific details about the event that caused the error for easier debugging.

- app.log.error(`Error occurred while listening for ${events} events`);
+ app.log.error(`Error occurred while listening for ${events} events: ${error.message}`);

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
import { Probot } from "probot";
export function listeningForGithubWebhookEvents(app: Probot, events: any[]) {
return new Promise((resolve, reject) => {
try {
app.log.info(`Listening for ${events} events`);
app.on(events, async (context) => {
resolve(context);
});
} catch (error) {
app.log.error(`Error occurred while listening for ${events} events: ${error.message}`);
reject(error);
}
});
}
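
The suggested `${error.message}` interpolation assumes the caught value is an Error. Depending on compiler settings, a catch variable may be typed unknown, and anything can be thrown at runtime. A minimal sketch of a narrowing helper follows; describeError and listenErrorMessage are hypothetical names that do not exist in this PR.

```typescript
// Hypothetical helper: turns any thrown value into a loggable string.
function describeError(error: unknown): string {
  if (error instanceof Error) {
    return error.message;
  }
  if (typeof error === "string") {
    return error;
  }
  try {
    // JSON.stringify returns undefined for values such as undefined or functions.
    const json: string | undefined = JSON.stringify(error);
    return json === undefined ? String(error) : json;
  } catch {
    // JSON.stringify throws on circular structures; fall back to String().
    return String(error);
  }
}

// Hypothetical formatter for the log line in the suggestion above.
function listenErrorMessage(events: string[], error: unknown): string {
  return `Error occurred while listening for [${events.join(", ")}] events: ${describeError(error)}`;
}
```

With a helper like this, the catch block could log listenErrorMessage(events, error) without casting the caught value to any.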

Comment on lines +1 to +14
import { Schema, model } from "mongoose";

const TrainingFileSchema = new Schema({
installationId: { type: Number, required: true },
owner: { type: String, required: true },
repoName: { type: String, required: true },
filePath: { type: String, required: true },
numberOfCommits: { type: Number, required: true },
riskScore: { type: Number, required: true },
createdAt: { type: Date, default: new Date().toLocaleString() },
updatedAt: { type: Date, default: new Date().toLocaleString() },
});

export const TrainingFile = model("trainingFile", TrainingFileSchema);

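The TrainingFile schema is well-defined. However, the createdAt and updatedAt defaults call new Date().toLocaleString() once, when the schema is defined, so every document receives the same locale-formatted string, which Mongoose then has to cast back to a Date. Use Date.now so that the default is evaluated each time a document is created.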

- createdAt: { type: Date, default: new Date().toLocaleString() },
- updatedAt: { type: Date, default: new Date().toLocaleString() },
+ createdAt: { type: Date, default: Date.now },
+ updatedAt: { type: Date, default: Date.now },

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
import { Schema, model } from "mongoose";
const TrainingFileSchema = new Schema({
installationId: { type: Number, required: true },
owner: { type: String, required: true },
repoName: { type: String, required: true },
filePath: { type: String, required: true },
numberOfCommits: { type: Number, required: true },
riskScore: { type: Number, required: true },
createdAt: { type: Date, default: Date.now },
updatedAt: { type: Date, default: Date.now },
});
export const TrainingFile = model("trainingFile", TrainingFileSchema);
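
A minimal sketch of why passing a function as the default matters. It does not use Mongoose; resolveDefault is a simplified, hypothetical stand-in for the way a schema default is resolved when a document is created, and clock is a deterministic substitute for Date.now.

```typescript
// A default is either a fixed value or a function that is called per document.
type TimestampDefault = number | (() => number);

// Simplified stand-in for default resolution; not Mongoose's implementation.
function resolveDefault(value: TimestampDefault): number {
  return typeof value === "function" ? value() : value;
}

let tick = 0;
const clock = (): number => ++tick; // deterministic stand-in for Date.now

// Evaluated once, when this line runs (analogous to schema definition time).
const evaluatedOnce: TimestampDefault = clock();
// A function reference, evaluated again for every document.
const evaluatedPerDocument: TimestampDefault = clock;

function createDocument(createdAtDefault: TimestampDefault): { createdAt: number } {
  return { createdAt: resolveDefault(createdAtDefault) };
}

const firstFixed = createDocument(evaluatedOnce);
const secondFixed = createDocument(evaluatedOnce);
const firstFresh = createDocument(evaluatedPerDocument);
const secondFresh = createDocument(evaluatedPerDocument);
```

Here firstFixed and secondFixed share one timestamp, while firstFresh and secondFresh each get their own. Note that a default of Date.now only applies when a document is created, so updatedAt would not change on later updates; Mongoose's timestamps schema option is one way to have both fields maintained automatically.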

);

retrainPredictorModel(app);

The script output shows that retrainPredictorModel requires one parameter, an instance of Probot. The call under review already passes app, so it matches the function's signature.

  • The retrainPredictorModel function requires an app parameter of type Probot.
  • The function call retrainPredictorModel(app); includes the app argument, matching the function's signature.
Analysis chain

The retrainPredictorModel function is called here. Verify which parameters the function requires and that the call supplies them.


Let's correct the script to search for the retrainPredictorModel function definition without specifying the file type, as rg will search through text files by default, which should include TypeScript files.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Check the definition of retrainPredictorModel to ensure it requires no parameters.
rg --type typescript "retrainPredictorModel" src/services/predictionService.ts

Length of output: 113



Script:

#!/bin/bash
# Check the definition of retrainPredictorModel to ensure it requires no parameters.
rg "function retrainPredictorModel" src/services/predictionService.ts

Length of output: 130

@@ -228,6 +251,8 @@ async function createFileTypeObject(
filePath
);
const riskScore = calculateRiskScore(app, commits);
// fetch predicted risk score
const predictedRiskScore = 0;

The predictedRiskScore is hardcoded to 0. This should be replaced with an actual call to the prediction service to fetch the predicted risk score.

-  const predictedRiskScore = 0;
+  // TODO: Replace with actual call to prediction service to fetch the predicted risk score
+  const predictedRiskScore = await fetchPredictedRiskScore(app, filePath, installationId, owner, repoName);

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
// TODO: Replace with actual call to prediction service to fetch the predicted risk score
const predictedRiskScore = await fetchPredictedRiskScore(app, filePath, installationId, owner, repoName);
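
The fetchPredictedRiskScore function named in the suggestion is not defined anywhere in this PR. The following is a minimal sketch of what such a helper could look like, assuming the prediction call is injected as a function. The Predictor type, the PredictionInput shape, and the fallback parameter are all hypothetical, and the argument order differs from the suggestion.

```typescript
// Hypothetical input shape; field names follow the File model in this PR.
interface PredictionInput {
  installationId: number;
  owner: string;
  repoName: string;
  filePath: string;
}

// Hypothetical predictor signature; the real call would go through the prediction service.
type Predictor = (input: PredictionInput) => Promise<number>;

async function fetchPredictedRiskScore(
  predict: Predictor,
  input: PredictionInput,
  fallback = 0
): Promise<number> {
  try {
    const score = await predict(input);
    // Reject NaN and Infinity so a malformed prediction never reaches the database.
    return Number.isFinite(score) ? score : fallback;
  } catch {
    // Prediction is treated as best-effort: a failing model does not block the caller.
    return fallback;
  }
}
```

Falling back to 0 keeps the behavior of the current hardcoded value when the model is unavailable, which may or may not be the desired policy.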

Comment on lines +1 to +23
import MindsDB from "mindsdb-js-sdk";
import { Probot } from "probot";

export async function connectMindsDB(app: Probot) {
return new Promise(async (resolve, reject) => {
try {
await MindsDB.connect({
user: "",
password: "",
host: "http://127.0.0.1:47334",
});

resolve(MindsDB);
} catch (error: any) {
app.log.error("Error while connecting to mindsdb");
reject(error);
}
});
}

export async function getMindsDBInstance(app: Probot) {
return await connectMindsDB(app);
}

The connectMindsDB function should not contain hardcoded values for user, password, and host. Use environment variables to manage sensitive information and allow for configuration flexibility.

- user: "",
- password: "",
- host: "http://127.0.0.1:47334",
+ user: process.env.MINDSDB_USER,
+ password: process.env.MINDSDB_PASSWORD,
+ host: process.env.MINDSDB_HOST,

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
import MindsDB from "mindsdb-js-sdk";
import { Probot } from "probot";
export async function connectMindsDB(app: Probot) {
return new Promise(async (resolve, reject) => {
try {
await MindsDB.connect({
user: process.env.MINDSDB_USER,
password: process.env.MINDSDB_PASSWORD,
host: process.env.MINDSDB_HOST,
});
resolve(MindsDB);
} catch (error: any) {
app.log.error("Error while connecting to mindsdb");
reject(error);
}
});
}
export async function getMindsDBInstance(app: Probot) {
return await connectMindsDB(app);
}
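
Values read from process.env are typed string | undefined in Node's type definitions, so passing them straight into a connection call may not type-check and can fail late at runtime. A minimal sketch of validating them first follows; loadMindsDBConfig and the MindsDBConfig shape are hypothetical, and the variable names are the ones used in the suggestion above.

```typescript
// Hypothetical config shape matching the three options in the suggestion.
interface MindsDBConfig {
  user: string;
  password: string;
  host: string;
}

type Env = Record<string, string | undefined>;

function loadMindsDBConfig(env: Env): MindsDBConfig {
  const required = ["MINDSDB_USER", "MINDSDB_PASSWORD", "MINDSDB_HOST"];
  // Empty strings are allowed (the original code connects with empty credentials).
  const missing = required.filter((key) => env[key] === undefined);
  if (missing.length > 0) {
    // Report variable names only; never echo secret values into logs.
    throw new Error(`Missing MindsDB environment variables: ${missing.join(", ")}`);
  }
  return {
    user: env["MINDSDB_USER"] as string,
    password: env["MINDSDB_PASSWORD"] as string,
    host: env["MINDSDB_HOST"] as string,
  };
}
```

In connectMindsDB, the result of loadMindsDBConfig(process.env) could be passed to MindsDB.connect. The same names-only reporting would also suit createDatabase in predictionService.ts, whose error message currently interpolates the MongoDB password.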

Comment on lines +32 to +125
export async function retrainPredictorModel(app: Probot) {
await MindsDB.Models.retrainModel(
predictorName,
targetField,
projectName,
regressionTrainingOptions
)
.then(() => {
app.log.info(`[${predictorName}] model is retrained successfully`);
})
.catch((error: any) => {
app.log.error(
`Error occurred while retraining the model [${predictorName}]`
);
app.log.error(error);
});
}

export async function trainPredictorModel(app: Probot) {
try {
const models: Model[] = await MindsDB.Models.getAllModels(projectName);
const modelNames = models.map((model: Model) => model.name);

if (modelNames.includes(predictorName)) {
app.log.info(`[${predictorName}] model is already present in mindsdb`);
return;
}
app.log.info(`Started training the model: [${predictorName}]`);
const dbList: Database[] = await MindsDB.Databases.getAllDatabases();
const dbNames: string[] = dbList.map((db: Database) => db.name);

if (!dbNames.includes(databaseName)) {
const db: Database | undefined = await createDatabase(app);
app.log.info(`Created database: ${db?.name} in mindsdb successfully`);
}

let predictionModel: Model | undefined = await MindsDB.Models.trainModel(
predictorName,
targetField,
projectName,
regressionTrainingOptions
);

const intervalId = setInterval(async () => {
predictionModel = await MindsDB.Models.getModel(
predictorName,
projectName
);

if (predictionModel?.status.match("error")) {
app.log.info("Prediction model training is complete");
clearInterval(intervalId);
}
}, 2000);

app.log.info(`training completed for [${predictorName}]`);
} catch (error: any) {
app.log.error("Error while training the model");
app.log.error(error);
}
}

async function createDatabase(app: Probot): Promise<Database | undefined> {
if (
MONGODB_USER === undefined ||
MONGODB_PASSWORD === undefined ||
MONGODB_PORT === undefined ||
MONGODB_CONNECTION_STRING === undefined ||
MONGODB_DATABASE === undefined
) {
app.log.error(
`MindsDB environment values are undefined: DB:${MONGODB_DATABASE}, USER:${MONGODB_USER}, PASSWORD: ${MONGODB_PASSWORD}, PORT: ${MONGODB_PORT}, HOST: ${MONGODB_CONNECTION_STRING}`
);
return;
}

const connectionParams: Record<string, JsonValue> = {
username: MONGODB_USER,
password: MONGODB_PASSWORD,
port: MONGODB_PORT,
host: MONGODB_CONNECTION_STRING,
database: MONGODB_DATABASE,
};

try {
const mongoDB: Database = await MindsDB.Databases.createDatabase(
databaseName,
"mongodb",
connectionParams
);

return mongoDB;
} catch (error: any) {
app.log.error("Error while creating database in mindsdb");

The trainPredictorModel and retrainPredictorModel functions cover model training and retraining, but the status polling in trainPredictorModel needs attention. The interval is cleared only when the status matches "error", and that branch logs "Prediction model training is complete". A model that trains successfully never clears the interval, so the callback keeps polling every 2 seconds. The "training completed" message is also logged immediately after setInterval is scheduled, before the first status check runs. Clear the interval inside the callback once the model reaches a terminal status, and log the outcome there. Calling clearInterval right after setInterval would cancel the polling before its first tick.

- if (predictionModel?.status.match("error")) {
- app.log.info("Prediction model training is complete");
- clearInterval(intervalId);
- }
+ const status = predictionModel?.status;
+ if (status === "complete") {
+ app.log.info(`training completed for [${predictorName}]`);
+ clearInterval(intervalId);
+ } else if (status === "error") {
+ app.log.error(`training failed for [${predictorName}]`);
+ clearInterval(intervalId);
+ }

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
export async function trainPredictorModel(app: Probot) {
try {
const models: Model[] = await MindsDB.Models.getAllModels(projectName);
const modelNames = models.map((model: Model) => model.name);
if (modelNames.includes(predictorName)) {
app.log.info(`[${predictorName}] model is already present in mindsdb`);
return;
}
app.log.info(`Started training the model: [${predictorName}]`);
const dbList: Database[] = await MindsDB.Databases.getAllDatabases();
const dbNames: string[] = dbList.map((db: Database) => db.name);
if (!dbNames.includes(databaseName)) {
const db: Database | undefined = await createDatabase(app);
app.log.info(`Created database: ${db?.name} in mindsdb successfully`);
}
let predictionModel: Model | undefined = await MindsDB.Models.trainModel(
predictorName,
targetField,
projectName,
regressionTrainingOptions
);
const intervalId = setInterval(async () => {
predictionModel = await MindsDB.Models.getModel(
predictorName,
projectName
);
const status = predictionModel?.status;
// "complete" is assumed to be the status MindsDB reports for a trained model.
if (status === "complete") {
app.log.info(`training completed for [${predictorName}]`);
clearInterval(intervalId);
} else if (status === "error") {
app.log.error(`training failed for [${predictorName}]`);
clearInterval(intervalId);
}
}, 2000);
} catch (error: any) {
app.log.error("Error while training the model");
app.log.error(error);
}
}
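
The polling concern in this comment can also be handled with an awaited loop instead of setInterval, which makes the stop condition and an upper bound on attempts explicit. This is a minimal sketch under stated assumptions: waitForTerminalStatus and StatusReader are hypothetical names, and the status strings are assumptions to check against the values your MindsDB version reports.

```typescript
// Status strings are assumptions; verify them against your MindsDB version.
const TERMINAL_STATUSES = ["complete", "error"];

// Hypothetical reader: returns the model's current status, if known.
type StatusReader = () => Promise<string | undefined>;

async function waitForTerminalStatus(
  readStatus: StatusReader,
  intervalMs = 2000,
  maxAttempts = 150
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await readStatus();
    if (status !== undefined && TERMINAL_STATUSES.indexOf(status) !== -1) {
      return status; // the caller decides how to log success or failure
    }
    // Wait before the next check so the server is not polled in a tight loop.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Model did not reach a terminal status after ${maxAttempts} checks`);
}
```

In trainPredictorModel, readStatus could wrap MindsDB.Models.getModel(predictorName, projectName) and return the model's status, so the "training completed" log runs only after the awaited call returns "complete".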

Comment on lines 86 to 133

if (fileCommitMaps.length === 0) {
app.log.warn(
- `Cannot proceed to save files, because either file-commit-map is empty or it exceeded the allowed size limit for ${owner}/${repo.name} with installation id: ${installationId}`
+ `Cannot proceed to save files, because either file-commit-map is empty or it exceeded the allowed size limit for [${owner}/${repo.name}] with installation id: [${installationId}]`
);
return;
}

- const files = fileCommitMaps.map(({ file, commits }) => ({
+ const files: FileType[] = fileCommitMaps.map(({ file, commits }) => ({
installationId: installationId,
owner: owner,
repoName: repo.name,
filePath: file.path,
commits: commits,
riskScore: calculateRiskScore(app, commits),
+ predictedRiskScore: 0, // we have no predicted value as of yet, because this file is newly added
}));

- await Promise.all([File.insertMany(files)]);
+ const trainingFiles: TrainingFileType[] = files.map(
+ (file: FileType) => ({
+ installationId: file.installationId,
+ owner: file.owner,
+ repoName: file.repoName,
+ filePath: file.filePath,
+ riskScore: file.riskScore,
+ numberOfCommits: file.commits.length, // FIXME: length is zero in the db. it should not because commits array is not zero in files collection
+ })
+ );
+
+ await Promise.all([
+ File.insertMany(files),
+ TrainingFile.insertMany(trainingFiles),
+ ]);

app.log.info(
- `Completed the processing of ${owner}/${repo.name} repository successfully for installation id: ${installationId}`
+ `Completed the processing of [${owner}/${repo.name}] repository successfully for installation id: [${installationId}]`
);
} catch (error: any) {
app.log.error(error);
}
})
);
+
+ retrainPredictorModel(app);
})
);
} catch (error: any) {
app.log.error(`Error while processing the repository batch`);
app.log.error(error);
}
}

async function getDefaultBranch(

Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [7-131]

The updates to the repositoryService.ts file, including the separation of files and trainingFiles processing, are appropriate. However, address the FIXME comment regarding the numberOfCommits property to ensure accurate data in the database.
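
The PR does not show why numberOfCommits ends up as zero, so the following does not diagnose the FIXME. It is a minimal sketch of one way to surface the problem before writing: derive the training rows in a single function and collect the paths whose commit list is empty. FileRecord, TrainingRecord, and toTrainingRecords are hypothetical names mirroring the fields in the diff.

```typescript
// Hypothetical shapes mirroring the fields used in the diff above.
interface FileRecord {
  installationId: number;
  owner: string;
  repoName: string;
  filePath: string;
  commits: unknown[];
  riskScore: number;
}

interface TrainingRecord {
  installationId: number;
  owner: string;
  repoName: string;
  filePath: string;
  numberOfCommits: number;
  riskScore: number;
}

function toTrainingRecords(files: FileRecord[]): {
  records: TrainingRecord[];
  emptyCommitPaths: string[];
} {
  const emptyCommitPaths: string[] = [];
  const records = files.map((file) => {
    if (file.commits.length === 0) {
      // Remember which files would be stored with a zero commit count.
      emptyCommitPaths.push(file.filePath);
    }
    return {
      installationId: file.installationId,
      owner: file.owner,
      repoName: file.repoName,
      filePath: file.filePath,
      numberOfCommits: file.commits.length,
      riskScore: file.riskScore,
    };
  });
  return { records, emptyCommitPaths };
}
```

Logging emptyCommitPaths before TrainingFile.insertMany would show whether the count is already zero in memory or is lost when the rows are stored.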

Development

Successfully merging this pull request may close these issues.

Connect mongodb to mindsdb
1 participant