Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

breaking-change: change fetch command #214

Merged
merged 10 commits into from Sep 13, 2021
Merged

Conversation

MaineK00n
Copy link
Collaborator

@MaineK00n MaineK00n commented Sep 12, 2021

Breaking change:

The options for the fetch command have been changed.

$ go-cve-dictionary fetch nvd
$ go-cve-dictionary fetch jvn

What did you implement:

Removed years, latest, and last2y from fetch command.
By reviewing the fetch procedure, insert speed is stabilized overall, and memory consumption is greatly reduced.
Since we implemented fetching all the years each time, the list command is not so necessary and removed.

// upstream/master
// first insert(NVD)
$ /usr/bin/time -f "real(s): %e, memory(kB): %M" ./go-cve-dictionary fetch nvd --force --years 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
real(s): 204.56, memory(kB): 6199864
// first insert(JVN)
$ /usr/bin/time -f "real(s): %e, memory(kB): %M" ./go-cve-dictionary fetch jvn --force --years 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
real(s): 135.31, memory(kB): 3236148

// second insert(NVD)
$ /usr/bin/time -f "real(s): %e, memory(kB): %M" ./go-cve-dictionary fetch nvd --force --years 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
real(s): 1616.14, memory(kB): 5538592
// second insert(JVN)
$ /usr/bin/time -f "real(s): %e, memory(kB): %M" ./go-cve-dictionary fetch jvn --force --years 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
real(s): 189.53, memory(kB): 3281564

// PR
// first insert(NVD)
$ /usr/bin/time -f "real(s): %e, memory(kB): %M" ./go-cve-dictionary fetch nvd
real(s): 201.97, memory(kB): 742148
// first insert(JVN)
$ /usr/bin/time -f "real(s): %e, memory(kB): %M" ./go-cve-dictionary fetch jvn
real(s): 127.89, memory(kB): 254640

// second insert(NVD)
$ /usr/bin/time -f "real(s): %e, memory(kB): %M" ./go-cve-dictionary fetch nvd
real(s): 222.95, memory(kB): 725916
// second insert(JVN)
$ /usr/bin/time -f "real(s): %e, memory(kB): %M" ./go-cve-dictionary fetch jvn
real(s): 127.31, memory(kB): 237708

In addition, in the part of the JVN fetch that collects Cert, there is an implementation that assumes html as shown below, but in reality there is also txt, etc., so I changed it to parse only for html.
https://github.com/kotakanbe/go-cve-dictionary/blob/69922490c76abccd9bd828f0556b2df49217eae8/fetcher/jvn/jvn.go#L423-L494

Type of change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

$ git checkout upstream/master
$ patch -p1 < old.patch
$ make build
$ mv go-cve-dictionary integration/go-cve.old
$ git reset --hard

$ git checkout MaineK00n/change-fetch
$ make build
$ mv go-cve-dictionary integration/go-cve.new

$ for i in `seq 2002 2021`; do integration/go-cve.old fetch nvd --dbpath=$(PWD)/integration/cve.new.sqlite3 --years $i; done
$ for i in `seq 1998 2021`; do integration/go-cve.old fetch jvn --dbpath=$(PWD)/integration/cve.new.sqlite3 --years $i; done
$ for i in `seq 2002 2021`; do integration/go-cve.old fetch nvd --dbtype redis --dbpath "redis://127.0.0.1:6379/0" --years $i; done
$ for i in `seq 1998 2021`; do integration/go-cve.old fetch jvn --dbtype redis --dbpath "redis://127.0.0.1:6379/0" --years $i; done

$ integration/go-cve.new fetch nvd --dbpath=$(PWD)/integration/cve.new.sqlite3
$ integration/go-cve.new fetch jvn --dbpath=$(PWD)/integration/cve.new.sqlite3
$ integration/go-cve.new fetch nvd --dbtype redis --dbpath "redis://127.0.0.1:6380/0"
$ integration/go-cve.new fetch jvn --dbtype redis --dbpath "redis://127.0.0.1:6380/0"

$ make diff-server-rdb && make diff-server-redis
  • old.patch
diff --git a/fetcher/jvn/jvn.go b/fetcher/jvn/jvn.go
index 7c987d1..69c445d 100644
--- a/fetcher/jvn/jvn.go
+++ b/fetcher/jvn/jvn.go
@@ -448,28 +448,31 @@ func collectCertLinks(links []CertLink) (certs []models.JvnCert, err error) {
 	for _, l := range links {
 		tasks <- func() {
 			url := <-reqChan
-			req, err := http.NewRequest("GET", url, nil)
-			if err != nil {
-				log.Debugf("Failed to get %s: err: %s", url, err)
-				errChan <- err
-				return
-			}
+			var title string
+			if strings.HasSuffix(url, ".html") {
+				req, err := http.NewRequest("GET", url, nil)
+				if err != nil {
+					log.Debugf("Failed to get %s: err: %s", url, err)
+					errChan <- err
+					return
+				}
 
-			res, err := httpClient.Do(req)
-			if err != nil {
-				log.Debugf("Failed to get %s: err: %s", url, err)
-				errChan <- err
-				return
-			}
-			defer res.Body.Close()
+				res, err := httpClient.Do(req)
+				if err != nil {
+					log.Debugf("Failed to get %s: err: %s", url, err)
+					errChan <- err
+					return
+				}
+				defer res.Body.Close()
 
-			doc, err := goquery.NewDocumentFromReader(res.Body)
-			if err != nil {
-				log.Debugf("Failed to get %s: err: %s", url, err)
-				errChan <- err
-				return
+				doc, err := goquery.NewDocumentFromReader(res.Body)
+				if err != nil {
+					log.Debugf("Failed to get %s: err: %s", url, err)
+					errChan <- err
+					return
+				}
+				title = doc.Find("title").Text()
 			}
-			title := doc.Find("title").Text()
 			resChan <- models.JvnCert{
 				Cert: models.Cert{
 					Title: title,

Checklist:

You don't have to satisfy all of the following.

  • Write tests
  • Write documentation
  • Check that there aren't other open pull requests for the same issue/feature
  • Format your source code by make fmt
  • Pass the test by make test
  • Provide verification config / commands
  • Enable "Allow edits from maintainers" for this PR
  • Update the messages below

Is this ready for review?: YES

Reference

@MaineK00n MaineK00n self-assigned this Sep 12, 2021
@MaineK00n MaineK00n changed the title [WIP] feat: change fetch command feat: change fetch command Sep 13, 2021
Copy link
Collaborator

@kotakanbe kotakanbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

README.md Outdated

Global Flags:
--config string config file (default is $HOME/.go-cve-dictionary.yaml)
--dbpath string /path/to/sqlite3 or SQL connection string (default "$PWD/cve.sqlite3")
--dbpath string /path/to/sqlite3 or SQL connection string (default "$HOME/cve.sqlite3")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$HOME -> $PWD

@kotakanbe kotakanbe merged commit 6c93275 into master Sep 13, 2021
@kotakanbe kotakanbe changed the title feat: change fetch command breaking-change: change fetch command Sep 13, 2021
@MaineK00n MaineK00n deleted the MaineK00n/change-fetch branch September 13, 2021 11:54
This was referenced Sep 13, 2021
@MaineK00n MaineK00n mentioned this pull request Dec 23, 2021
9 tasks
MaineK00n pushed a commit to vulsdoc/vuls that referenced this pull request Jul 26, 2022
> Since we implemented fetching all the years each time, the list command is not so necessary and removed.
vulsio/go-cve-dictionary#214
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants