Security Update and Enhancement for run.py #264

MiChaelinzo · 2024-03-21T19:09:43Z

The key changes:

1.) Validate checkpoint integrity by comparing hashes
2.) Add rate limiting on inferences
3.) Use authentication for any inference endpoints
4.) Other general security best practices

This helps secure checkpoint loading, limits the blast radius of any issues, and adds authentication around API access.

Let me know if you have any other questions!

Aareon

This is unsuitable for merging as is. Formatting issues throughout, as well as an incomplete stub for authentication.

Aareon · 2024-03-22T23:47:02Z

run.py

@@ -53,7 +63,10 @@ def main():
            model=grok_1_model,
            bs_per_device=0.125,
            checkpoint_path=CKPT_PATH,
+        # Limit inference rate
+  inference_runner.rate_limit = 100


Formatting here as well

Aareon · 2024-03-22T23:47:06Z

run.py

+def validate_checkpoint(path, expected_hash):
+  calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()
+  if calculated_hash != expected_hash:
+    raise ValueError("Invalid checkpoint file!")


Could this error message be improved? It might also be nice to utilize logging

import logging

# Set up logging
logger = logging.getLogger(__name__)

def validate_checkpoint(path: Text, expected_hash: Text):
    with open(path, 'rb') as f:
        contents = f.read()
    calculated_hash = hashlib.sha256(contents).hexdigest()
    if calculated_hash != expected_hash:
        logger.error(f"Invalid checkpoint file. Expected hash: {expected_hash}, "
                     f"Actual hash: {calculated_hash}")
        raise ValueError("Checkpoint validation failed")

The key changes:

Imported the logging module and created a logger object

Logged an error with the expected and actual hash values for more detail

Updated the exception message to be more specific

This makes it clear in the logs when a validation failure happens and provides the expected and actual hashes for diagnostics.

Other enhancements could include:

Adding the checkpoint path to the log message

Logging at INFO level when validation succeeds

Configuring logging to output to a file for production debugging

It would make the code longer. Is this necessary?

Please utilize Flake8 as well as some standardized code formatter. I'm noticing many inconsistencies in code you submit. There's no problem that I notice with your usage of logging. You just have not submitted an actual commit with fixes for file context management.

100% convinced this user is just repeating garbage from an LLM.

Aareon · 2024-03-22T23:47:10Z

run.py

+
+
+def validate_checkpoint(path, expected_hash):
+  calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()


Please use a context manager (a with statement) for opening and reading the given path. It might also be in our best interest to utilize type hints in the function signature.

from typing import Text
import hashlib

CKPT_HASH = "expected_checkpoint_hash"

def validate_checkpoint(path: Text, expected_hash: Text):
    with open(path, 'rb') as f:
        contents = f.read()
    calculated_hash = hashlib.sha256(contents).hexdigest()
    if calculated_hash != expected_hash:
        raise ValueError("Invalid checkpoint file!")

The key changes:

Added type hints for the path (Text) and expected_hash (Text) parameters.

Opened the file using a with statement, which automatically closes it when done.

Stored the file contents in a variable called 'contents' to avoid re-reading the file.

Passed the contents variable to hashlib.sha256 rather than the file object.

It seems we would need to import Text from typing. Is this necessary?

I think the Text import is superfluous and could just as easily be replaced with str without importing any extra type hints.

It looks like there were some formatting issues in the code. I've taken the liberty of re-formatting it to be more readable.

MiChaelinzo

Sent some code sample to review.

MiChaelinzo · 2024-03-26T09:36:12Z

run.py

+def validate_checkpoint(path, expected_hash):
+  calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()
+  if calculated_hash != expected_hash:
+    raise ValueError("Invalid checkpoint file!")


import logging

# Set up logging
logger = logging.getLogger(__name__)

def validate_checkpoint(path: Text, expected_hash: Text):
    with open(path, 'rb') as f:
        contents = f.read()
    calculated_hash = hashlib.sha256(contents).hexdigest()
    if calculated_hash != expected_hash:
        logger.error(f"Invalid checkpoint file. Expected hash: {expected_hash}, "
                     f"Actual hash: {calculated_hash}")
        raise ValueError("Checkpoint validation failed")

The key changes:

Imported the logging module and created a logger object

Logged an error with the expected and actual hash values for more detail

Updated the exception message to be more specific

This makes it clear in the logs when a validation failure happens and provides the expected and actual hashes for diagnostics.

Other enhancements could include:

Adding the checkpoint path to the log message

Logging at INFO level when validation succeeds

Configuring logging to output to a file for production debugging

It would make the code longer. Is this necessary?

MiChaelinzo · 2024-03-26T09:40:07Z

run.py

+
+
+def validate_checkpoint(path, expected_hash):
+  calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()


from typing import Text
import hashlib

CKPT_HASH = "expected_checkpoint_hash"

def validate_checkpoint(path: Text, expected_hash: Text):
    with open(path, 'rb') as f:
        contents = f.read()
    calculated_hash = hashlib.sha256(contents).hexdigest()
    if calculated_hash != expected_hash:
        raise ValueError("Invalid checkpoint file!")

The key changes:

Added type hints for the path (Text) and expected_hash (Text) parameters.

Opened the file using a with statement, which automatically closes it when done.

Stored the file contents in a variable called 'contents' to avoid re-reading the file.

Passed the contents variable to hashlib.sha256 rather than the file object.

It seems we would need to import Text from typing. Is this necessary?

Aareon · 2024-03-27T03:59:18Z

run.py

 if __name__ == "__main__":
  logging.basicConfig(level=logging.INFO)
-    main()
+  main()


2 space indent is not standard. Please view PEP8

I'm using 1 space, and you should comment that to the original repo, you're making our lives very complicated enough with your reviews that doesn't make any sense at all!

This change is not accepted and requires fixing. A 1-space indent is not standard and worsens readability and consistency in all affected code.

Not accepted.

Aareon · 2024-03-27T04:01:55Z

run.py

+def validate_checkpoint(path, expected_hash):
+  calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()
+  if calculated_hash != expected_hash:
+    raise ValueError("Invalid checkpoint file!")


Please utilize Flake8 as well as some standardized code formatter. I'm noticing many inconsistencies in code you submit. There's no problem that I notice with your usage of logging. You just have not submitted an actual commit with fixes for file context management.

Aareon · 2024-03-27T04:02:45Z

run.py

+
+
+def validate_checkpoint(path, expected_hash):
+  calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()


I think the Text import is superfluous and could just as easily be replaced with str without importing any extra type hints.

Aareon · 2024-03-27T04:08:35Z

The initial review is still out for fixing, as the problem is still present i.e.

def validate_checkpoint(path, expected_hash):
  calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()

There are two problems with the above code: the file is opened without a context manager, so nothing guarantees it gets closed, and the styling is inconsistent.

def validate_checkpoint(path, expected_hash):
    try:  # handle issues with opening the path, as well as hashing
        with open(path, "rb") as f:
            calculated_hash = hashlib.sha256(f.read()).hexdigest()
            ...  # rest of the inner function
    except Exception as e:  # refine this exception (OSError, FileNotFoundError, etc.)
        logging.error(f"Failed to validate checkpoint path {path}: {e}")
        raise
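
For reference, the outline above could be filled in roughly as follows. This is only a sketch under a few assumptions that the thread does not confirm: the checkpoint is a single regular file, the expected hash is a hex-encoded SHA-256 digest, and the `chunk_size` parameter is an addition for illustration. It uses `str` type hints, a context manager, and `logging`, as requested in the reviews.

```python
import hashlib
import logging

logger = logging.getLogger(__name__)


def validate_checkpoint(path: str, expected_hash: str, chunk_size: int = 1 << 20) -> None:
    """Raise ValueError if the SHA-256 of the file at `path` differs from `expected_hash`."""
    digest = hashlib.sha256()
    try:
        # The context manager closes the file even if reading fails.
        with open(path, "rb") as f:
            # Hash in chunks so a large checkpoint is never held in memory at once.
            # `chunk_size` is an illustrative parameter, not part of the PR.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
    except OSError as e:
        logger.error("Failed to read checkpoint %s: %s", path, e)
        raise

    calculated_hash = digest.hexdigest()
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    if calculated_hash != expected_hash.lower():
        logger.error(
            "Checkpoint %s failed validation. Expected hash: %s, actual hash: %s",
            path, expected_hash, calculated_hash,
        )
        raise ValueError(f"Checkpoint validation failed for {path}")
    logger.info("Checkpoint %s passed validation", path)
```

Catching `OSError` rather than a bare `Exception` covers `FileNotFoundError` and `PermissionError` while letting unrelated bugs surface unchanged.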

Aareon · 2024-04-05T22:11:45Z

You requested a new review but have not addressed the previous review. Please correct the issues described and request a review.

Aareon · 2024-04-05T22:15:53Z

I recommend closing this PR as the changes made are not well implemented, have formatting issues throughout, and do not utilize known Python idioms well. The contributor has not addressed the reviews.

MiChaelinzo · 2024-04-05T23:16:23Z

I recommend closing this PR as the changes made are not well implemented, have formatting issues throughout, and do not utilize known Python idioms well. The contributor has not addressed the reviews.

I'm trying to delete your reviews since it doesn't make sense at all, are you using an AI or just blabbing nonsense? I tried using an AI with your comment too, it said your review doesn't make sense and making hard people trying to contribute for better security/code.

Aareon · 2024-04-06T00:28:10Z

I'm trying to delete your reviews since it doesn't make sense at all, are you using an AI or just blabbing nonsense? I tried using an AI with your comment too, it said your review doesn't make sense and making hard people trying to contribute for better security/code.

You cannot delete reviews. Your PR also does nothing for bettering the security of the codebase, nor the maintainability.
Bad PR is bad. Close and try again.

MiChaelinzo · 2024-04-06T10:20:23Z

You cannot delete reviews. Your PR also does nothing for bettering the security of the codebase, nor the maintainability. Bad PR is bad. Close and try again.

Alright, thanks also your review is not that good etc. that's why you don't get paid enough you should stay in your lane.

Aareon

Do not merge

Aareon · 2024-04-06T15:41:09Z

run.py

+
+
+def validate_checkpoint(path, expected_hash):
+  calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()


This is still left to fix. Calling open outside of a context manager is bad practice and not recommended for production.

Aareon · 2024-04-06T15:42:22Z

run.py

+  # Validate checkpoint integrity
+  validate_checkpoint(CKPT_PATH, CKPT_HASH)
+
+  grok_1_model = LanguageModelConfig(


The only change here is a dedent from the PEP8 standard 4 space indent.

Aareon · 2024-04-06T15:43:38Z

run.py

+      bs_per_device=0.125, 
+      checkpoint_path=CKPT_PATH,
+    # Limit inference rate
+    inference_runner.rate_limit = 100


This appears to reference inference_runner.rate_limit before it is defined.
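
To illustrate the point: in the hunk above, the assignment sits where the call's arguments are still being listed and comes before any `inference_runner` object exists, and nothing shown in this thread reads a `rate_limit` attribute. An actual limit needs code that checks it on every call. The following is a minimal sketch of one way to do that. The class name, the `guarded_inference` wrapper, and the reading of `100` as "calls per 60 seconds" are all assumptions for illustration, not part of the PR or of the project's API.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowRateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (hypothetical helper)."""

    def __init__(self, max_calls: int, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls: Deque[float] = deque()

    def allow(self) -> bool:
        now = self._clock()
        # Drop timestamps that have fallen out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True


# Assumed meaning of the PR's "100": at most 100 inferences per minute.
limiter = SlidingWindowRateLimiter(max_calls=100, period=60.0)


def guarded_inference(run_inference: Callable[[str], str], prompt: str) -> str:
    """Call `run_inference(prompt)` only if the limiter has capacity."""
    if not limiter.allow():
        raise RuntimeError("Inference rate limit exceeded")
    return run_inference(prompt)
```

The limiter is created before anything uses it and is consulted on each request, which is the ordering the review comment asks for.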

Aareon · 2024-04-06T15:45:04Z

run.py

+
+    name="local",
+    load=CKPT_PATH,
+    tokenizer_path="./tokenizer.model",


If you were to improve anything, I'd suggest improving how file paths are defined by utilizing pathlib
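
A sketch of what that suggestion might look like, assuming a hypothetical layout in which the checkpoint directory and tokenizer live under one base directory. The `"checkpoints"` name and both helper functions are illustrative; only `tokenizer.model` appears in the diff.

```python
from pathlib import Path
from typing import Dict


def resolve_paths(base_dir: Path) -> Dict[str, Path]:
    """Build the paths from one base directory instead of string literals."""
    return {
        "load": base_dir / "checkpoints",            # assumed directory name
        "tokenizer_path": base_dir / "tokenizer.model",
    }


def require_exists(path: Path) -> Path:
    """Fail early with a clear message if a required path is missing."""
    if not path.exists():
        raise FileNotFoundError(f"Required path does not exist: {path}")
    return path
```

A caller could then pass `str(require_exists(paths["tokenizer_path"]))` wherever a plain string path is still expected.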

Aareon · 2024-04-06T15:49:17Z

run.py

-    logging.basicConfig(level=logging.INFO)
-    main()
+  logging.basicConfig(level=logging.INFO)
+  main()


Overall, a complete waste of a PR. Nothing of value was added.

Aareon · 2024-04-06T15:52:45Z

@Aareon Looking at your reviews at other PR's we can smell your comments/reviews miles away you need to take a long week bath. Also, you'll be out of jobs and internet anyway...

Who is we? Contributors with the barest grasp on Python that have access to ChatGPT? Oh no...

Aareon · 2024-04-06T15:53:15Z

Alright, thanks also your review is not that good etc. that's why you don't get paid enough you should stay in your lane.

My pay is acceptable. This PR is not.

Aareon · 2024-04-06T15:55:23Z

I'm going to go ahead and unsubscribe to updates from this PR as the user is using abusive language because of the reviews that this PR has received.

MiChaelinzo · 2024-04-06T20:41:51Z

Who is we? Contributors with the barest grasp on Python that have access to ChatGPT? Oh no...

I haven't used ChatGPT in here or in months, you need to find a job. Rather than reviewing like you think your a Python All-knowing AI or something.

MiChaelinzo · 2024-04-06T20:42:15Z

I'm going to go ahead and unsubscribe to updates from this PR as the user is using abusive language because of the reviews that this PR has received.

Good, thanks for the review.

Aareon · 2024-04-06T20:58:03Z

I haven't used ChatGPT in here or in months, you need to find a job. Rather than reviewing like you think your a Python All-knowing God or something.

Your attitude shows you've never worked professionally in software development. These PRs are public and will be seen by future recruiters, just so you know. I'd suggest a change in attitude, because my reviews were objective and professional.

MiChaelinzo · 2024-04-06T23:10:18Z

Your attitude shows you've never worked professionally in software development. These PRs are public and will be seen by future recruiters, just so you know. I'd suggest a change in attitude, because my reviews were objective and professional.

You don't know me and I made more projects than you can count with your fingers in the past years. Your reviews are professionally time-consuming and pretty useless etc. as some can confirm in the PR. It's professional but it doesn't do anything, or modifying it add more code. Also you haven't worked above board/ become an investor or make a hard decision to fire people. You think you can keep your job? tech industry are offloading, you are looking for new jobs.

Aareon · 2024-04-07T00:25:01Z

You don't know me and I made more projects than you can count with your fingers in the past years. Your reviews are professionally time-consuming and pretty useless etc. as some can confirm in the PR. It's professional but it doesn't do anything, or modifying it add more code. Also you haven't worked above board/ become an investor or make a hard decision to fire people. You think you can keep your job? tech industry are offloading, you are looking for new jobs.

No idea what you're on about. But your code smells so whatever it is you're selling, I'm not buying.

MiChaelinzo · 2024-04-07T08:53:16Z

No idea what you're on about. But your code smells so whatever it is you're selling, I'm not buying.

I'm not selling anything I'm just doing a pull request and I already reviewed my code in the latest LLM's from AWS, I don't know why you are reviewing it like you're an AI more accurate or can give more than this LLM's.

Aareon · 2024-04-07T16:11:24Z

I have 10+ years of Python experience. I don't need an LLM to review code. I use Python in my workplace daily. LLM's are not infallible, and have been known to generate bad code. I could tell from the start that you used something to generate code for you, because there wasn't a single bit of code in your entire PR that didn't smell like an LLM. You literally pushed a stub function and called it an implementation. Get out of here with that, dude. No project maintainer wants someone who can't even read code to push garbage spewed out by an LLM. And your attitude on top of everything is the icing. You have zero self awareness.

MiChaelinzo · 2024-04-07T21:12:06Z

I have 10+ years of Python experience. I don't need an LLM to review code. I use Python in my workplace daily. LLM's are not infallible, and have been known to generate bad code. I could tell from the start that you used something to generate code for you, because there wasn't a single bit of code in your entire PR that didn't smell like an LLM. You literally pushed a stub function and called it an implementation. Get out of here with that, dude. No project maintainer wants someone who can't even read code to push garbage spewed out by an LLM. And your attitude on top of everything is the icing. You have zero self awareness.

Your python coding is excellent but your 10 years' experience is nothing against the LLM's, you make mistake etc. your not perfect even the AI's/LLM's, my attitude is not wrong for complaining for your spamming reviews. And I didn't generate the code, stub X devs could use it for their future integrations/API. If I'm ever hiring people for a blockchain project I wouldn't hire you seem to would take several years before the project is finished/done. Companies are laying off because they are seeing the inefficiency of human employees.

Aareon · 2024-04-07T21:16:11Z

Your python coding is excellent but your 10 years' experience is nothing against the LLM's, you make mistake etc. your not perfect even the AI's/LLM's, my attitude is not wrong for complaining for your spamming reviews. And I didn't generate the code, stub X devs could use it for their future integrations/API. If I'm ever hiring people for a blockchain project I wouldn't hire you seem to would take several years before the project is finished/done. Companies are laying off because they are seeing the inefficiency of human employees.

No worries there, I'm selective when it comes to employers, and you're not one I'd ever consider working for. LLMs are a thousand miles away from replacing software engineers. Also, define "spamming reviews". What's the point of a PR if it's not an implementation? This project has absolutely no need for stub functions in production. That's just bad practice. You would know that if you had any wherewithal. Who cares how long it takes to develop something? I'm one person, and I work on things as I see fit.

MiChaelinzo · 2024-04-07T21:39:30Z

No worries there, I'm selective when it comes to employers, and you're not one I'd ever consider working for. LLMs are a thousand miles away from replacing software engineers. Also, define "spamming reviews". What's the point of a PR if it's not an implementation? This project has absolutely no need for stub functions in production. That's just bad practice. You would know that if you had any wherewithal. Who cares how long it takes to develop something? I'm one person, and I work on things as I see fit.

I don't know about not replacing jobs but it seems they are replacing a lot of jobs recently., we hold several millions on mutual funds in banks/malls/tech/real-estate back home, AI/AGI seems very efficiency that's why investors give them millions/billions of dollars, also WEF plans and sees this too. Even Sam Altman going to the government of UAE asking for trillions of dollars he should go to Qatar trillion here is available.

MiChaelinzo changed the title ~~Update run.py~~ Security Update and Enhancement for run.py Mar 21, 2024

Aareon suggested changes Mar 22, 2024

re-formatting it to be more readable run.py

6ed2d78

It looks like there were some formatting issues in the code. I've taken the liberty of re-formatting it to be more readable.

MiChaelinzo commented Mar 26, 2024

Aareon suggested changes Mar 27, 2024

MiChaelinzo requested a review from Aareon April 3, 2024 15:38

Aareon suggested changes Apr 6, 2024
