#86 Decoupling the scraper from the backend and editing the scraper to make it more versatile with different quarters #92

CTrando · 2019-04-03T05:05:26Z

No description provided.

… not yet hooked up yet

scraper/scraper_impl/capes_scraper.py

scraper/scraper_impl/course_scraper.py

…ture backend operations

…eded query

CTrando · 2019-04-08T19:51:51Z

Make sure to work on a separate branch when you start doing the golang development. Don't worry about the frontend for now, if the request doesn't have a quarter default it to the latest one.

snowme34 · 2019-04-10T02:32:07Z

Make sure to work on a separate branch when you start doing the golang development. Don't worry about the frontend for now, if the request doesn't have a quarter default it to the latest one.

For the "latest one", since the backend and scraper do not share the same config currently, I just created another config entry for the backend. Don't know if that works

snowme34 · 2019-04-15T07:35:32Z

For the current error:

sdschedule-scraper     | Traceback (most recent call last):
sdschedule-scraper     |   File "./webreg_scrape_upload.py", line 51, in <module>
sdschedule-scraper     |     main()
sdschedule-scraper     |   File "./webreg_scrape_upload.py", line 23, in main
sdschedule-scraper     |     department_scraper = DepartmentScraper()
sdschedule-scraper     |   File "/app/scraper_impl/department_scraper.py", line 20, in __init__
sdschedule-scraper     |     self.browser = webdriver.Chrome(chrome_options=options, executable_path=DRIVER_PATH)
sdschedule-scraper     |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
sdschedule-scraper     |     desired_capabilities=desired_capabilities)
sdschedule-scraper     |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 154, in __init__
sdschedule-scraper     |     self.start_session(desired_capabilities, browser_profile)
sdschedule-scraper     |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 243, in start_session
sdschedule-scraper     |     response = self.execute(Command.NEW_SESSION, parameters)
sdschedule-scraper     |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 311, in execute
sdschedule-scraper     |     self.error_handler.check_response(response)
sdschedule-scraper     |   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 237, in check_response
sdschedule-scraper     |     raise exception_class(message, screen, stacktrace)
sdschedule-scraper     | selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome version must be >= 69.0.3497.0
sdschedule-scraper     |   (Driver info: chromedriver=72.0.3626.69 (3c16f8a135abc0d4da2dff33804db79b849a7c38),platform=Linux 4.9.0-8-amd64 x86_64)
sdschedule-scraper     |

I did some search and realized that this is a problem with versions of chrome and chrome-driver.

The driver in the driver directory seems to mismatch with the version needed by the container.

And I don't quite understand the reason for using this image.

…-Schedule-Planner into scratch/issue86

snowme34 · 2019-04-19T17:23:09Z

We might need to prune the requirements.txt a little bit. I'm trying to figure it out

snowme34 · 2019-04-19T18:47:38Z

Still working on pruning.

Hard to test since each scraping takes at least 20min.

Maybe we want a feature that our scraper only scrapes for one department for testing.

Don't know if the new Dockerfile and requirements.txt are good

…rse data functionality

…-Schedule-Planner into scratch/issue86

…tually get cloned

snowme34 · 2019-05-30T17:40:40Z

Correct me if I'm wrong: it seems that the code creates a new db connection each time data is requested, which should not be the case, right?

I can take a look at this

backend/routes/RoutesCommon.go

docker-compose.yml

snowme34 · 2019-06-03T16:03:01Z

docker-compose.yml

@@ -4,9 +4,11 @@ services:
  sdschedule-database:
    container_name: sdschedule-database
    image: mariadb:latest
+    ports:
+      - 10800:3306 # TODEL: DEV port


TODEL: Temporary Development Code

backend/db/db.go

…ing one

…ot exported

snowme34 · 2019-06-06T04:07:38Z

I am confused how you tested the code previously. I believe testing should not use real database.

I fixed the tests except one, this is the output:

go test -v ./...                                                                                 
=== RUN   TestGetClassData                                                                         
--- FAIL: TestGetClassData (0.00s)                                                                 
    GetClassData_test.go:73: json: cannot unmarshal object into Go value of type []routes.Subclass 
    GetClassData_test.go:77: Subclasses for CSE 11 should definitely not be 0!                     
=== RUN   TestGetRequestClassDataFails                                                             
--- PASS: TestGetRequestClassDataFails (0.00s)                                                     
=== RUN   TestGetDepartmentSummary                                                                 
--- PASS: TestGetDepartmentSummary (0.00s)                                                         
=== RUN   TestGetDepartmentSummaryFailsOnPost                                                      
--- PASS: TestGetDepartmentSummaryFailsOnPost (0.00s)                                              
=== RUN   TestGetDepartments                                                                       
--- PASS: TestGetDepartments (0.00s)                                                               
=== RUN   TestGetDepartmentsOnPostFails                                                            
--- PASS: TestGetDepartmentsOnPostFails (0.00s)                                                    
=== RUN   TestGetInstructors                                                                       
--- PASS: TestGetInstructors (0.00s)                                                               
=== RUN   TestGetTypes                                                                             
--- PASS: TestGetTypes (0.00s)                                                                     
FAIL                                                                                               
FAIL    github.com/ucsdscheduleplanner/UCSD-Schedule-Planner/backend/test       0.314s

I did not change the code for getClassData at all but the JSON thing does not seem to be working.

snowme34 · 2019-06-06T04:19:34Z

Brief:

I employed principles including but not limited to SRP, DRY, OO, OCP and so on when typing on the keyboard

Sample code:

const logTagDepartment = "[Department]"

// GetDepartments is a pre-http.HandlerFunc for department route and will become a closure with *DatabaseStruct
func GetDepartments(writer http.ResponseWriter, request *http.Request, ds *db.DatabaseStruct) {

	if request.Method != "GET" {
		errInvalidMethod(logTagDepartment, writer, request)
		return
	}

	queryAndResponse(
		ds, logTagDepartment, writer, request,
		rowScannerOneString,
		"SELECT DISTINCT DEPT_CODE FROM DEPARTMENT",
		"DEPARTMENT")
}

Two functions calls is exactly what is needed.

I did not change the way of query for GetClassData since I do not quite understand the reason behind using a for loop rather than using one query
I did not put too much efforts on re-writing the testing files (no major programming design principles applied) but one test is still failing
Perhaps instead of "db", we should call that file "ds" or "Env" for the environment of the whole server-side software
I left several TODO and TODEL in the code
I'm not extremely confident about the way to handle errors and logging (for routes)
The next step is to write Dockerfile and try to integrate it with other componenets

backend/routes/ClassData.go

CTrando · 2019-06-16T05:18:33Z

So if there is a test failing that is an issue because I made the tests such that the JSON format is what the frontend expects. Also I agree that it's not great to require a database connection in order to run the tests, but these integration tests are what's needed for full confidence, unit tests won't guarantee anything.

CTrando

So overall I do like the changes you've done. There are a lot of issues - I think you've fallen for premature optimization but everyone does. You've tried to make methods for each individual usecase that you might have and I think that is not a great way to proceed - sure if a specific pattern is done a lot we can make a method for it, but most of these cases we have only used once or twice.

Moreover some granularity is good. Instead of 10 custom error methods, you can do something like

err(ERR_STRING, type)

And you can have a bunch of different types of error messages instead of having a bunch of different types of error methods. Try to build up your intuition on this - it might take some time but always remember that you program to make stuff easier for yourself. So if you rip out a bunch of stuff to make things look nicer but when you want to add a new error message or a new usecase you have to write a bunch of boilerplate code then consider if you are making the right design choices.

backend/backend.go

backend/db/db.go

backend/routes/Department.go

backend/routes/Instructors.go

backend/routes/RoutesCommon.go

docker-compose.yml

snowme34 · 2019-06-17T06:11:23Z

Thank you so much for your feedback!

I will try to follow them and try to dive deeper! I found this sql query interesting: https://stackoverflow.com/questions/1136380/sql-where-in-clause-multiple-columns

snowme34 · 2019-08-03T07:57:07Z

Let's use the username: "splanner", shall we?

I refactored things. It compiles

add several structs, like ErrorStruct and QueryStruct
separate queryAndResponse into query and response
rename files based on the recommendations here
more comments
rewrite the beast, class data route, to create a query on the fly

Next step is

add env to remove awkward hard-coded variable
use docker to add integration (test)
more refactor!
~~fix the awkward multiple query for class data by means of row constructor~~
improve error according to https://tools.ietf.org/html/rfc7807
use env instead of *db
closure the query function for handlers with env.db
add caching

snowme34 · 2019-08-04T20:05:38Z

This is a long pull request.

I finally figure out why the class data route test fails. The test expects a pure array of json. But the code returns a map like this: {"ABC 11": [...], "MATH 12": [...]}

unit tests are suggested to be put right next to the source code by effective go
test directory is fine with integration test

Let's first reach a consensus about out API:

contract
- format
- keys and values
duplicates
- what happens if user requests same classes multiple times

CTrando

Overall, looks really good. Don't merge this yet of course but I'm impressed by your code quality. Just a few things here and there. If the tests fail because they expect some format, that is the format that the frontend expects.

backend/backend.go

CTrando · 2019-08-10T04:34:02Z

backend/backend.go

+
+	// TODO: use env in handlers
+	// TODO: make a handler factory
+	http.HandleFunc("/api_course_nums", route.MakeHandler(route.GetCourseNums, db, route.LogPrefixCourseNums))


What is this LogPrefix?

backend-py/config/config.example.ini

CTrando · 2019-08-10T04:37:53Z

backend/route/classdata.go

+)
+
+// LogPrefixClassData log prefix for ClassData route
+const LogPrefixClassData = "[ClassData]"


If there is only one prefix can't you just make it named LogPrefix instead of having to put ClassData at the end? I realize you have this in other places too, but can't you do Package.LogPrefix instead of LogPrefixPackage in those other spots?

Since they are all under the route package, they are all in the same name scope. I actually struggled with that too but decided to go with this

backend/route/classdata.go

backend/route/types.go

backend/store/db.go

backend/util/util.go

CTrando · 2019-08-10T05:02:01Z

backend/route/classdata_test.go

+			[]interface{}{"LTEN", "26", "LTEN", "26"},
+		},
+		{
+			fmt.Sprintf(


I see that you are combining all statements together instead of making separate statements. This is probably for a performance increase, but have you noticed any markably different performance differences? If the code is must more reasonable doing each query separately and you don't have full confidence in your method of doing this, then I would say the performance drop is justified. However, if you are confident then I'm willing to go along with it.

CTrando · 2019-08-10T05:02:43Z

backend/route/classdata_test.go

+
+}
+
+func TestGetClassData(t *testing.T) {


More tests for get class data and departments and such, these are the big ones that the web app uses.

Will add after a consensus for REST API

CTrando · 2019-08-10T05:05:20Z

Ok sure, I can write something up on the contract.

snowme34 · 2019-08-10T05:52:38Z

Ok sure, I can write something up on the contract.

That will be great! Include but not limit to, api names (do we really need the api prefix? I have no idea), api format, api input format, api input check, cache-ability.

Should work as the first scratch, but somethings existed before, like caching, is not implemented yet.

About the query

Performance of simple query always triumphs. However, back and forth database calls should be generally avoided. That is sync calls between components, which is to put components in series.

But of course, there is never a perfect solution that works optimally for all situations. I think I will be confident if I read the front-end code which calls that api.

CTrando · 2019-08-12T05:30:00Z

Performance of simple query always triumphs. However, back and forth database calls should be generally avoided. That is sync calls between components, which is to put components in series.

Sure I agree with you that it will be more performant, but I'm saying there is a tradeoff between readability and maintainability in combining all the queries into one. By doing this way we add more edge cases and potential mishaps. That's why you should be confident in your current algorithm before switching to this, but I agree doing it in one query is more performant.

CTrando · 2019-08-12T06:18:36Z

Ok let's define the format.

We have

api_department?quarter={QUARTER} : Returns a list of departments for the given quarter, e.g ["CSE", "ECE", etc] - This is called when the page initially loads up to get a list of departments to show.
api_classpreview?quarter={QUARTER}&department={DEPARTMENT} : Returns a list of objects of the form:
{courseNum: COURSE_NUM, data : {description: DESCRIPTION, instructors: [INSTRUCTORS], types: [TYPES]}}. It's used to show the user a preview of the class, for example it shows the user that 11 is a course number for department CSE. INSTRUCTORS and TYPES are a list of instructors and types such as [Joe Politz, Rick Ord], [LE, DI].
api_classdata: This one is up for debate. We can either make it take in only one class (a department + courseNum pair) and return the data for just that class, or we can have it such that it takes in a list of pairs and it gives results for that entire one. I think maybe it would be better to do it as multiple small requests because it would be easier to process on the backend and frontend.

snowme34 · 2019-08-12T07:35:05Z

Performance of simple query always triumphs. However, back and forth database calls should be generally avoided. That is sync calls between components, which is to put components in series.

Sure I agree with you that it will be more performant, but I'm saying there is a tradeoff between readability and maintainability in combining all the queries into one. By doing this way we add more edge cases and potential mishaps. That's why you should be confident in your current algorithm before switching to this, but I agree doing it in one query is more performant.

If we restrict the quarter to be identical per query (it's weird for people to schedule multiple quarters on one calendar), then my SQL building function will be much simpler (just creating correct number of question marks)

I was thinking using a api.example.com domain for apis would be better than the "api_" prefix

api_department?quarter={QUARTER}

[{"AAA", "BBB"}]

api_classreview?quarter={QUARTER}&department={DEPARTMENT}

[{courseNum: 99, data : {description: abyss, instructors: ["A", "B"], types: ["LE", "DI"]}}, {courseNum: 100, data : {description: void, instructors: ["A", "B"], types: ["LE", "DI"]}}]

api_classdata

POST with a list of requests

{
"quarter": "CLASS_DATA",
"classes": [{"department": "AAA", "courseNum": "1"}, {"department": "AAA", "courseNum": "2"}] 
}

I would like more details about the response format that fits the front-end 's needs well.

[...]

I couldn't figure out why it's easier to process with one class per request. For front end, we will send an array of promises and wait_all, otherwise it will be an overkill for performance. For back end, we will have exponential number of requests with more users since they will add more and more classes.

https://restfulapi.net/rest-api-design-tutorial-with-example/

P.S. about row-constructor performance, according to this page, it seems that as long as we create proper index for composite keys, there should be no huge difference. And it should be safe to conclude that doing the same thing with multiple queries will be slower than with one query.

Finally I came up with a solution that works! We just need to support 2 HTTP methods for "class_data", the GET with query strings returns one class. the POST with a json array returns an array. And we could potentially hard-code some upper limit about the post request and may even check the duplications inside. I'm looking forward to your input!

CTrando and others added 3 commits April 2, 2019 20:40

Making new scraper folder and editing docker file to accommodate

84b7ca3

#86: decouple scraper from backend

59321d7

#86: Allowing the scraper to work with multiple quarters - backend is…

0d19ed8

… not yet hooked up yet

CTrando assigned snowme34 Apr 3, 2019

CTrando commented Apr 3, 2019

View reviewed changes

scraper/scraper_impl/capes_scraper.py Outdated Show resolved Hide resolved

scraper/scraper_impl/course_scraper.py Show resolved Hide resolved

CTrando and others added 5 commits April 4, 2019 10:12

Fixing comment and pruning quarter insert script

279b401

Making webreg upload script create database and quarters table for fu…

c9afcc0

…ture backend operations

Adding necessary config file

4e9ebc5

add quarter parameter to application

db0f27e

bug fix, add quarter variable, note frontend still not sending the ne…

922b633

…eded query

snowme34 added 2 commits April 8, 2019 17:40

#86 trying to add default quarter support for backend

4cc3bb1

#86 fix import path

490ca06

CTrando added 2 commits April 15, 2019 10:02

Fixing chrome version

e3f76a9

Merge branch 'scratch/issue86' of github.com:ucsdscheduleplanner/UCSD…

47faaba

…-Schedule-Planner into scratch/issue86

snowme34 added 5 commits April 19, 2019 11:42

#86 trying to prune Dockerfile

682a549

#86 working on pruning requirements.txt

4452996

#86 add note for possible caching bug

3c6a080

#86 disable WI19, no one cares

9ad923d

#86 add mysql non-root user, based on config.ini

498ce1e

CTrando added 5 commits April 25, 2019 17:45

Adding golang routes and adding back department functionality and cou…

29e49a2

…rse data functionality

Merge branch 'scratch/issue86' of github.com:ucsdscheduleplanner/UCSD…

6862d9a

…-Schedule-Planner into scratch/issue86

Adding other github repos, they will not be submodules so will not ac…

2872b6a

…tually get cloned

Finished moving backend to Golang

03b4504

Adding more tests and working out compatability issues with frontend

72ca1a6

snowme34 reviewed Jun 3, 2019

View reviewed changes

backend/routes/RoutesCommon.go Outdated Show resolved Hide resolved

snowme34 reviewed Jun 3, 2019

View reviewed changes

docker-compose.yml Outdated Show resolved Hide resolved

snowme34 reviewed Jun 3, 2019

View reviewed changes

snowme34 reviewed Jun 4, 2019

View reviewed changes

backend/db/db.go Outdated Show resolved Hide resolved

snowme34 added 5 commits June 5, 2019 10:06

Update RoutesCommon functions and error processing

0eb55b0

Cleanup Routes Code and update comments

a7bde26

Make a straightforward constructor for db struct and update the exist…

d8a97fa

…ing one

Add miltiple constructors for db since lower-case struct fields are n…

fafe144

…ot exported

Test using mock db

513216b

snowme34 reviewed Jun 6, 2019

View reviewed changes

backend/routes/ClassData.go Outdated Show resolved Hide resolved

CTrando commented Jun 16, 2019

View reviewed changes

refactor, fix error handling

98c989a

refacotr more

49ddc45

snowme34 added 2 commits August 4, 2019 21:36

refactor, restructure, add a ctx package with env

f6af1f6

change embarrassing package name

9eb7de2

CTrando commented Aug 10, 2019

View reviewed changes

snowme34 added 2 commits August 11, 2019 01:08

refactor, add handler factoary

06feb39

remove python code

3dd064a

add Docker support for go backend

ddf6944

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#86 Decoupling the scraper from the backend and editing the scraper to make it more versatile with different quarters #92

#86 Decoupling the scraper from the backend and editing the scraper to make it more versatile with different quarters #92

CTrando commented Apr 3, 2019

CTrando commented Apr 8, 2019

snowme34 commented Apr 10, 2019

snowme34 commented Apr 15, 2019

snowme34 commented Apr 19, 2019

snowme34 commented Apr 19, 2019 •

edited

Loading

snowme34 commented May 30, 2019 •

edited

Loading

snowme34 Jun 3, 2019

snowme34 commented Jun 6, 2019

snowme34 commented Jun 6, 2019

CTrando commented Jun 16, 2019

CTrando left a comment

snowme34 commented Jun 17, 2019

snowme34 commented Aug 3, 2019 •

edited

Loading

snowme34 commented Aug 4, 2019 •

edited

Loading

CTrando left a comment

CTrando Aug 10, 2019

CTrando Aug 10, 2019

snowme34 Aug 10, 2019

CTrando Aug 10, 2019

CTrando Aug 10, 2019

snowme34 Aug 11, 2019 •

edited

Loading

CTrando commented Aug 10, 2019

snowme34 commented Aug 10, 2019 •

edited

Loading

CTrando commented Aug 12, 2019

CTrando commented Aug 12, 2019

snowme34 commented Aug 12, 2019 •

edited

Loading

#86 Decoupling the scraper from the backend and editing the scraper to make it more versatile with different quarters #92

Are you sure you want to change the base?

#86 Decoupling the scraper from the backend and editing the scraper to make it more versatile with different quarters #92

Conversation

CTrando commented Apr 3, 2019

CTrando commented Apr 8, 2019

snowme34 commented Apr 10, 2019

snowme34 commented Apr 15, 2019

snowme34 commented Apr 19, 2019

snowme34 commented Apr 19, 2019 • edited Loading

snowme34 commented May 30, 2019 • edited Loading

snowme34 Jun 3, 2019

Choose a reason for hiding this comment

snowme34 commented Jun 6, 2019

snowme34 commented Jun 6, 2019

CTrando commented Jun 16, 2019

CTrando left a comment

Choose a reason for hiding this comment

snowme34 commented Jun 17, 2019

snowme34 commented Aug 3, 2019 • edited Loading

snowme34 commented Aug 4, 2019 • edited Loading

CTrando left a comment

Choose a reason for hiding this comment

CTrando Aug 10, 2019

Choose a reason for hiding this comment

CTrando Aug 10, 2019

Choose a reason for hiding this comment

snowme34 Aug 10, 2019

Choose a reason for hiding this comment

CTrando Aug 10, 2019

Choose a reason for hiding this comment

CTrando Aug 10, 2019

Choose a reason for hiding this comment

snowme34 Aug 11, 2019 • edited Loading

Choose a reason for hiding this comment

CTrando commented Aug 10, 2019

snowme34 commented Aug 10, 2019 • edited Loading

CTrando commented Aug 12, 2019

CTrando commented Aug 12, 2019

snowme34 commented Aug 12, 2019 • edited Loading

snowme34 commented Apr 19, 2019 •

edited

Loading

snowme34 commented May 30, 2019 •

edited

Loading

snowme34 commented Aug 3, 2019 •

edited

Loading

snowme34 commented Aug 4, 2019 •

edited

Loading

snowme34 Aug 11, 2019 •

edited

Loading

snowme34 commented Aug 10, 2019 •

edited

Loading

snowme34 commented Aug 12, 2019 •

edited

Loading