#86 Decoupling the scraper from the backend and editing the scraper to make it more versatile with different quarters #92
base: master
…ture backend operations
Make sure to work on a separate branch when you start doing the Golang development. Don't worry about the frontend for now; if the request doesn't include a quarter, default it to the latest one.
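A minimal sketch of that fallback, assuming the quarter arrives as a `quarter` query parameter and the latest quarter comes from the backend config. `resolveQuarter`, `latestQuarter`, and the value "SP19" are hypothetical names chosen for illustration, not identifiers from this PR.

```go
package main

import (
	"fmt"
	"net/http"
)

// latestQuarter is a hypothetical config value; the real backend would
// load it from its own config entry.
const latestQuarter = "SP19"

// resolveQuarter returns the requested quarter, or latest when the
// request did not supply one.
func resolveQuarter(requested, latest string) string {
	if requested == "" {
		return latest
	}
	return requested
}

// handler shows where the fallback would sit inside a route.
func handler(w http.ResponseWriter, r *http.Request) {
	quarter := resolveQuarter(r.URL.Query().Get("quarter"), latestQuarter)
	fmt.Fprintf(w, "quarter=%s\n", quarter)
}

func main() {
	fmt.Println(resolveQuarter("FA18", latestQuarter)) // explicit quarter wins
	fmt.Println(resolveQuarter("", latestQuarter))     // falls back to the latest
}
```

Keeping the fallback in a small pure function makes it testable without starting an HTTP server.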
For the "latest one", since the backend and scraper do not currently share the same config, I just created another config entry for the backend. I don't know if that works.
For the current error:
I did some searching and realized that this is a version mismatch between Chrome and chrome-driver. The driver in the driver directory does not seem to match the version the container needs. Also, I don't quite understand the reason for using this image.
…-Schedule-Planner into scratch/issue86
We might need to prune the requirements.txt a little bit. I'm trying to figure it out.
Still working on pruning. It is hard to test since each scrape takes at least 20 minutes. Maybe we want a feature that lets the scraper scrape only one department for testing. I don't know if the new Dockerfile and requirements.txt are good.
…rse data functionality
…-Schedule-Planner into scratch/issue86
…tually get cloned
Correct me if I'm wrong: it seems that the code creates a new db connection each time data is requested, which should not be the case, right? I can take a look at this.
docker-compose.yml
Outdated
@@ -4,9 +4,11 @@ services:
  sdschedule-database:
    container_name: sdschedule-database
    image: mariadb:latest
    ports:
      - 10800:3306 # TODEL: DEV port
TODEL: Temporary Development Code
I am confused about how you tested the code previously. I believe testing should not use a real database. I fixed all the tests except one; this is the output:
I did not change the code for getClassData at all, but the JSON output does not seem to work.
Brief:
Sample code:
    const logTagDepartment = "[Department]"

    // GetDepartments is a pre-http.HandlerFunc for department route and will become a closure with *DatabaseStruct
    func GetDepartments(writer http.ResponseWriter, request *http.Request, ds *db.DatabaseStruct) {
        if request.Method != "GET" {
            errInvalidMethod(logTagDepartment, writer, request)
            return
        }
        queryAndResponse(
            ds, logTagDepartment, writer, request,
            rowScannerOneString,
            "SELECT DISTINCT DEPT_CODE FROM DEPARTMENT",
            "DEPARTMENT")
    }
Two function calls are exactly what is needed.
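The "closure with *DatabaseStruct" mentioned in the comment above can be sketched as follows. This is an illustration under assumptions, not the PR's actual `route.MakeHandler`: `DatabaseStruct` here is a local stand-in with a made-up field, and `makeHandler`, `getDepartments`, and `statusFor` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// DatabaseStruct stands in for db.DatabaseStruct; its field is hypothetical.
type DatabaseStruct struct {
	Name string
}

// preHandler is the three-argument shape used by GetDepartments.
type preHandler func(http.ResponseWriter, *http.Request, *DatabaseStruct)

// makeHandler captures ds in a closure so the result is a plain
// http.HandlerFunc that can be registered with http.HandleFunc.
func makeHandler(f preHandler, ds *DatabaseStruct) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		f(w, r, ds)
	}
}

// getDepartments is a simplified stand-in that only checks the method.
func getDepartments(w http.ResponseWriter, r *http.Request, ds *DatabaseStruct) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	fmt.Fprintf(w, "querying %s", ds.Name)
}

// statusFor sends one in-memory request through the wrapped handler
// and returns the resulting status code.
func statusFor(method string) int {
	h := makeHandler(getDepartments, &DatabaseStruct{Name: "demo"})
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(method, "/api_department", nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(http.MethodGet), statusFor(http.MethodPost))
}
```

Because the same `*DatabaseStruct` is captured once at registration time, every request served by the handler reuses it instead of opening a new connection.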
So if there is a test failing, that is an issue, because I wrote the tests so that the JSON format is what the frontend expects. Also, I agree that it's not great to require a database connection in order to run the tests, but these integration tests are what's needed for full confidence; unit tests alone won't guarantee anything.
So overall I do like the changes you've made. There are a lot of issues, though. I think you've fallen for premature optimization, but everyone does. You've tried to make a method for each individual use case that you might have, and I think that is not a great way to proceed. Sure, if a specific pattern is used a lot we can make a method for it, but most of these cases we have only used once or twice.
Moreover some granularity is good. Instead of 10 custom error methods, you can do something like
err(ERR_STRING, type)
And you can have a bunch of different types of error messages instead of a bunch of different error methods. Try to build up your intuition on this. It might take some time, but always remember that you program to make stuff easier for yourself. So if you rip out a bunch of stuff to make things look nicer, but adding a new error message or a new use case then requires a bunch of boilerplate code, consider whether you are making the right design choices.
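One way the `err(ERR_STRING, type)` idea could look in Go is a single helper that takes an error kind as a parameter. This is only a sketch: `errKind`, the `kind...` constants, `statusOf`, `respondError`, and the status mapping are all hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// errKind enumerates error categories; the names are hypothetical.
type errKind int

const (
	kindInvalidMethod errKind = iota
	kindMissingInput
	kindQueryFailed
)

// statusByKind maps each kind to an HTTP status code (an assumed mapping).
var statusByKind = map[errKind]int{
	kindInvalidMethod: http.StatusMethodNotAllowed,
	kindMissingInput:  http.StatusBadRequest,
	kindQueryFailed:   http.StatusInternalServerError,
}

// statusOf returns the status for kind, falling back to 500 for unknown kinds.
func statusOf(kind errKind) int {
	if status, ok := statusByKind[kind]; ok {
		return status
	}
	return http.StatusInternalServerError
}

// respondError logs the message with the route's tag and writes the
// status that matches the error kind.
func respondError(w http.ResponseWriter, logTag string, kind errKind, msg string) {
	log.Printf("%s %s", logTag, msg)
	http.Error(w, msg, statusOf(kind))
}

func main() {
	fmt.Println(statusOf(kindInvalidMethod), statusOf(kindMissingInput), statusOf(errKind(99)))
}
```

With this shape, adding a new error case means adding one constant and one map entry rather than another helper function.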
Thank you so much for your feedback! I will try to follow it and dive deeper! I found this SQL query interesting: https://stackoverflow.com/questions/1136380/sql-where-in-clause-multiple-columns
Let's use the username "splanner", shall we? I refactored things. It compiles.
Next step is
This is a long pull request. I finally figured out why the class data route test fails. The test expects a plain JSON array, but the code returns a map like this: {"ABC 11": [...], "MATH 12": [...]}
Let's first reach a consensus about our API:
Overall, looks really good. Don't merge this yet of course but I'm impressed by your code quality. Just a few things here and there. If the tests fail because they expect some format, that is the format that the frontend expects.
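If the class data route keeps building a map keyed by course internally, one way to produce the plain array that the test expects is to flatten the map before encoding. The sketch below assumes a made-up `section` type and field names; the real response fields are not shown in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// section is a hypothetical row type; the real fields are not shown in the PR.
type section struct {
	Course string `json:"course"`
	Code   string `json:"code"`
}

// flatten turns {"ABC 11": [...], "MATH 12": [...]} into one slice.
// Keys are sorted so the output order is deterministic.
func flatten(byCourse map[string][]section) []section {
	keys := make([]string, 0, len(byCourse))
	for k := range byCourse {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	out := []section{} // non-nil so an empty result encodes as [] rather than null
	for _, k := range keys {
		out = append(out, byCourse[k]...)
	}
	return out
}

func main() {
	byCourse := map[string][]section{
		"MATH 12": {{Course: "MATH 12", Code: "A00"}},
		"ABC 11":  {{Course: "ABC 11", Code: "A00"}, {Course: "ABC 11", Code: "B00"}},
	}
	encoded, err := json.Marshal(flatten(byCourse))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(encoded))
}
```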
backend/backend.go
Outdated

// TODO: use env in handlers
// TODO: make a handler factory
http.HandleFunc("/api_course_nums", route.MakeHandler(route.GetCourseNums, db, route.LogPrefixCourseNums))
What is this LogPrefix?
)

// LogPrefixClassData log prefix for ClassData route
const LogPrefixClassData = "[ClassData]"
If there is only one prefix, can't you just name it LogPrefix instead of having to put ClassData at the end? I realize you have this in other places too, but can't you use Package.LogPrefix instead of LogPrefixPackage in those other spots?
Since they are all under the route package, they all share the same namespace. I actually struggled with that too, but decided to go with this.
[]interface{}{"LTEN", "26", "LTEN", "26"},
},
{
fmt.Sprintf(
I see that you are combining all statements together instead of making separate statements. This is probably for a performance increase, but have you noticed any marked difference in performance? If the code is much more readable doing each query separately and you don't have full confidence in your method, then I would say the performance drop is justified. However, if you are confident, then I'm willing to go along with it.
}

func TestGetClassData(t *testing.T) {
Add more tests for getting class data, departments, and such; these are the big ones that the web app uses.
Will add them after we reach a consensus on the REST API.
Ok sure, I can write something up on the contract.
That would be great! It should include, but not be limited to: API names (do we really need the api prefix? I have no idea), API format, API input format, API input checks, and cacheability.
This should work as a first draft, but some things that existed before, like caching, are not implemented yet.
About the query: the performance of a simple query always wins. However, back-and-forth database calls should generally be avoided. Those are synchronous calls between components, which puts the components in series. Of course, there is never a perfect solution that works optimally for all situations. I think I will be more confident once I read the front-end code that calls that API.
Sure, I agree with you that it will be more performant, but I'm saying that combining all the queries into one trades readability and maintainability for that performance. Doing it this way adds more edge cases and potential mishaps. That's why you should be confident in your current algorithm before switching to this, but I agree that doing it in one query is more performant.
Ok let's define the format. We have
If we restrict the quarter to be identical per query (it's weird for people to schedule multiple quarters on one calendar), then my SQL-building function will be much simpler (just creating the correct number of question marks).
I was thinking that using an api.example.com domain for APIs would be better than the "api_" prefix.
api_department?quarter={QUARTER}
api_classreview?quarter={QUARTER}&department={DEPARTMENT}
api_classdata POST with a list of requests
I would like more details about the response format that best fits the front end's needs.
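With a single quarter per query, the "correct number of question marks" step could be sketched as below. The table name `CLASS_DATA`, the column names, and the `classKey` type are assumptions made for illustration; only the row-constructor `IN` form comes from the Stack Overflow link earlier in the thread.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// classKey identifies one requested class; the field names are hypothetical.
type classKey struct {
	Department string
	CourseNum  string
}

// pairPlaceholders returns n "(?, ?)" groups joined by commas.
func pairPlaceholders(n int) string {
	groups := make([]string, n)
	for i := range groups {
		groups[i] = "(?, ?)"
	}
	return strings.Join(groups, ", ")
}

// buildClassQuery builds one parameterized query for all requested classes
// in a single quarter. Table and column names are assumed, not from the PR.
func buildClassQuery(quarter string, classes []classKey) (string, []interface{}, error) {
	if len(classes) == 0 {
		return "", nil, errors.New("no classes requested")
	}
	query := "SELECT * FROM CLASS_DATA WHERE QUARTER = ? AND (DEPARTMENT, COURSE_NUM) IN (" +
		pairPlaceholders(len(classes)) + ")"

	args := make([]interface{}, 0, 1+2*len(classes))
	args = append(args, quarter)
	for _, c := range classes {
		args = append(args, c.Department, c.CourseNum)
	}
	return query, args, nil
}

func main() {
	query, args, err := buildClassQuery("SP19", []classKey{{"LTEN", "26"}, {"MATH", "12"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(query)
	fmt.Println(args...)
}
```

Only placeholders are concatenated into the SQL string; the user-supplied values travel as arguments, so the query stays parameterized.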
I couldn't figure out why it's easier to process one class per request. On the front end, we would send an array of promises and wait_all; otherwise it would hurt performance. On the back end, the number of requests would grow quickly with more users, since they will add more and more classes. https://restfulapi.net/rest-api-design-tutorial-with-example/
P.S. About row-constructor performance: according to this page, it seems that as long as we create a proper index for the composite keys, there should be no huge difference. And it should be safe to conclude that doing the same thing with multiple queries will be slower than with one query.
Finally, I came up with a solution that works! We just need to support two HTTP methods for "class_data": GET with a query string returns one class, and POST with a JSON array returns an array. We could also hard-code an upper limit on the POST request and perhaps even check for duplicates inside it. I'm looking forward to your input!
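The two-method proposal could be sketched roughly like this. Everything specific here is an assumption: the `classRequest` JSON field names, the GET parameter names, the `maxBatch` value of 20, and the choice to echo the normalized requests back in place of the real database lookup.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// classRequest is one requested class; the JSON field names are hypothetical.
type classRequest struct {
	Department string `json:"department"`
	CourseNum  string `json:"course_num"`
}

// maxBatch is an assumed upper limit on classes per POST.
const maxBatch = 20

// normalize rejects oversized batches and drops duplicate requests,
// keeping the first occurrence of each.
func normalize(reqs []classRequest) ([]classRequest, error) {
	if len(reqs) > maxBatch {
		return nil, errors.New("too many classes in one request")
	}
	seen := make(map[classRequest]bool, len(reqs))
	out := make([]classRequest, 0, len(reqs))
	for _, r := range reqs {
		if !seen[r] {
			seen[r] = true
			out = append(out, r)
		}
	}
	return out, nil
}

// classData accepts GET for one class and POST for a JSON array of classes.
func classData(w http.ResponseWriter, r *http.Request) {
	var reqs []classRequest
	switch r.Method {
	case http.MethodGet:
		q := r.URL.Query()
		reqs = []classRequest{{Department: q.Get("department"), CourseNum: q.Get("course_num")}}
	case http.MethodPost:
		if err := json.NewDecoder(r.Body).Decode(&reqs); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	reqs, err := normalize(reqs)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// Placeholder: a real handler would query the database here.
	json.NewEncoder(w).Encode(reqs)
}

func main() {
	out, err := normalize([]classRequest{{"LTEN", "26"}, {"LTEN", "26"}, {"MATH", "12"}})
	fmt.Println(len(out), err)
}
```

Sharing `normalize` between both methods keeps the limit and duplicate checks in one place, whichever response shape is agreed on.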