How to build a binary without embedded dictionary included? #5

Closed
1 of 3 tasks
AmazingRise opened this issue May 13, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

AmazingRise commented May 13, 2022

Sorry for the earlier wrong proposal. The problem appears to be specific to gse-bleve.

  • Gse version (or commit ref): 0.70.1
  • Go version: 1.17
  • Operating system and bit: Ubuntu 20.04 64bit
  • Can you reproduce the bug at Examples:
    • Yes (provide example code)
    • No
    • Not relevant
  • Provide example code:
package main

import (
	"fmt"
	"os"

	"github.com/blevesearch/bleve/v2"
	gse "github.com/vcaesar/gse-bleve"
)

func main() {
	opt := gse.Option{
		Index: "test.blv",
		// Dicts: "embed, ja",
		// Dicts: "embed, zh",
		Dicts: "dict.txt",
		Stop:  "",
		Opt:   "search-hmm",
		Trim:  "trim",
	}

	index, err := gse.New(opt)
	if err != nil {
		fmt.Println("new mapping error is: ", err)
		return
	}

	text := `見解では、謙虚なヴォードヴィリアンのベテランは、運命の犠牲者と悪役の両方の変遷として代償を払っています`
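	// Check the error from every Index call, not only the first one.
	if err = index.Index("1", text); err != nil {
		fmt.Println("index error: ", err)
	}
	if err = index.Index("3", text+"浮き沈み"); err != nil {
		fmt.Println("index error: ", err)
	}
	if err = index.Index("4", `In view, a humble vaudevillian veteran cast vicariously as both victim and villain vicissitudes of fate.`); err != nil {
		fmt.Println("index error: ", err)
	}
	if err = index.Index("2", `It's difficult to understand the sum of a person's life.`); err != nil {
		fmt.Println("index error: ", err)
	}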

	query := "運命の犠牲者"
	req := bleve.NewSearchRequest(bleve.NewQueryStringQuery(query))
	req.Highlight = bleve.NewHighlight()
	res, err := index.Search(req)
	fmt.Println(res, err)

	os.RemoveAll("test.blv")
}

I tested the following dictionary configurations to try to track down the bug:

  • Using "embed, zh" as Dicts: the resulting binary is 43 MB.
  • Using dict.txt (a custom dictionary of only 3 lines) as Dicts: the resulting binary is 43 MB.
  • Replacing gse-bleve with plain bleve: the resulting binary is 11 MB.

It seems that whatever dictionary is configured, the embedded dictionaries are always included in the binary. I tried to find the cause myself, but failed.

Thanks for your help.

vcaesar (Owner) commented May 18, 2022

Registering the tokenizer with bleve's RegisterTokenizer loads all of the embedded files. I will think about how to optimize this.

vcaesar added the enhancement (New feature or request) label May 18, 2022

vcaesar (Owner) commented May 19, 2022

You can update to the latest version and build with go build -tags=ne -v; a build made this way will not load the embedded files.
