Add a command to extract <title> from external links #837

Closed
ubitux opened this issue Mar 23, 2020 · 11 comments

Comments

@ubitux
Contributor

ubitux commented Mar 23, 2020

A feature I miss in Vimwiki is the ability to name external links in a more meaningful way by remotely querying the URL to extract its <title>.

One typical use case: I tend to aggregate many external links in my notes. Unfortunately, many URLs carry no meaning whatsoever. Take YouTube, for example, where I sometimes come across a piece of music I want to downl^Wbuy. I have to write [Foobar - Blabaz](https://youtube.com/...) manually. A command that inserts the title automatically from the URL would help a lot. And if the link dies someday, at least I still have the reference.

There are various oneliners available to do that, but here is mine as a PoC:

python -c 'import bs4,urllib.request;print(bs4.BeautifulSoup(urllib.request.urlopen("https://wikipedia.org/"),"html.parser").head.title.contents[0])'

Python and BeautifulSoup are generally packaged by distributions, so I don't think they would be a heavy requirement as optional dependencies. Of course, some curl and regex magic could do the job as well.

@brennen
Member

brennen commented Mar 24, 2020

This might be a better fit for something in utils.

@ubitux
Contributor Author

ubitux commented Mar 26, 2020

I managed to hack something:

nnoremap <Leader>u :call FormatCurrentURL()<CR>

" ...

function! FormatCurrentURL()
    " Grab the URL under the cursor and quote it for the Python snippet
    let l:url = expand("<cWORD>")
    let l:esc_url = shellescape(l:url)
    let l:useragent = shellescape('Mozilla/5.0')
    " Build a Python one-liner that fetches the page and prints its <title>
    let l:pycode  = 'import bs4, urllib.request;'
    let l:pycode .= 'req = urllib.request.Request('.l:esc_url.', headers={''User-Agent'': '.l:useragent.'});'
    let l:pycode .= 'print(bs4.BeautifulSoup(urllib.request.urlopen(req), ''html.parser'').head.title.contents[0])'
    let l:cmd = 'python -c "'.l:pycode.'"'
    " Replace every [ and ] in the title so they cannot break the link syntax
    let l:title = substitute(substitute(trim(system(l:cmd)), '\[', '<', 'g'), '\]', '>', 'g')
    " Wrap the URL under the cursor as [title](url)
    execute "normal! viW\<ESC>`>a)\<ESC>`<i[".l:title."]("
endfunction

So if I have the following under my cursor:

https://github.com/vimwiki/vimwiki

And I press <Leader>u, I get:

[GitHub - vimwiki/vimwiki: Personal Wiki for Vim](https://github.com/vimwiki/vimwiki)

It doesn't:

  • support the vimwiki language
  • escape URLs properly (parentheses need to be replaced with %28 and %29)
  • handle a failed request sanely
  • check for the python3 + bs4 dependency first

But so far it's already very useful for me as I paste a lot of URLs in my wiki.
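As a rough illustration of the URL-escaping item in the list above, here is a minimal Python sketch. The helper name `format_markdown_link` is hypothetical and not part of Vimwiki; it assumes Markdown link syntax and reuses the bracket-to-angle-bracket substitution from the Vimscript function.

```python
def format_markdown_link(title: str, url: str) -> str:
    """Build [title](url) with both escapes applied (illustrative helper)."""
    # Percent-encode parentheses so they cannot close the link target early.
    safe_url = url.replace("(", "%28").replace(")", "%29")
    # Swap square brackets in the title, as the Vimscript hack does.
    safe_title = title.replace("[", "<").replace("]", ">")
    return f"[{safe_title}]({safe_url})"


if __name__ == "__main__":
    print(format_markdown_link(
        "Vim (text editor) [wiki]",
        "https://en.wikipedia.org/wiki/Vim_(text_editor)",
    ))
```

Only the parentheses are percent-encoded here, so the rest of the URL stays byte-for-byte what was pasted.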

@tinmarino
Member

Vimwiki is a standalone plugin, so no Python-dependent code will be added. As you have a custom solution, I advise you to keep it.
Otherwise, the Vim way to go is :h netrw. For example:

enew
Nread https://github.com/vimwiki/vimwiki

Then:

  1. Get only the head
  2. Put it in a variable and not a buffer
  3. Grep <title>HERE</title>

It is about 4 hours of work. If you make a PR, it will be accepted, since this is a useful feature.
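For illustration only, the three steps above can be sketched in Python (the plugin itself would need pure Vimscript). The function name `grep_title` and the 64 KiB `limit` are assumptions made for this sketch, not anything from Vimwiki or netrw.

```python
import re
from typing import Optional

# Non-greedy match; DOTALL lets the title span several lines,
# IGNORECASE also accepts <TITLE>.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def grep_title(html: str, limit: int = 65536) -> Optional[str]:
    """Return the first <title> found in the first `limit` characters, or None."""
    # Steps 1 and 2: keep only the beginning of the page, held in a variable.
    head = html[:limit]
    # Step 3: grep <title>HERE</title>.
    match = TITLE_RE.search(head)
    if match is None:
        return None
    # Collapse runs of whitespace, including line breaks, into single spaces.
    return " ".join(match.group(1).split())
```

A regex is fragile on real-world HTML, but it matches the "grep" approach described here and needs no parser.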

@jeromg
Contributor

jeromg commented Aug 5, 2020

Hi @tinmarino

I'm willing to have a look at this one. As I'm a complete beginner in Vimscript, I don't see how to get the result of execute('Nread https://github.com/vimwiki/vimwiki') into a variable.

When I try this, the content of the page gets inserted into the current buffer. I've looked around, including in Vim's default autoload scripts where netrw.vim lives. I found one example in spellfile.vim, but it doesn't put the output of Nread into a variable.

Do I have to use a hidden buffer or is there a better way?
Any suggestion welcome!

Cheers,
Jerome

@tinmarino
Member

Hi @jeromg,

Thanks for your willingness to implement this very nice feature. I would quote from Top Gun: "There is nothing stronger than the heart of a volunteer."

1 Analyse

Since :h netrw is large, I read the code instead (:e $VIMRUNTIME/autoload/netrw.vim). It shows that netrw can only work through a file:

  • Because the intelligence is bound to the file writing, which is cool if the network is slow, the file is big, or both
" netrw#NetRead: responsible for reading a file over the net {{{2
"   mode: =0 read remote file and insert before current line
"         =1 read remote file and insert after current line
"         =2 replace with remote file
"         =3 obtain file, but leave in temporary format
fun! netrw#NetRead(mode,...)

2 Solutions

So you have to write to a file, as far as I know. To get the content into a variable there is an easy solution (quick and dirty, for testing) and a harder but cleaner one (for production, because it has no side effects such as moving the cursor, saving the buffer, or modifying the undo list).

2.1 Dirty

  1. In testing: I would read the page in at the end of the file, then get the result by deleting it
dG
:let net_content = @"

2.2 Clean

  1. In production: I would get it to a temporary file
:3Nread  https://github.com/vimwiki/vimwiki  # To get content in a temporary buffer
:exe 'e ' . b:netrw_tmpfile   # To edit the temp file

then get the content of the file into a variable, as in this answer:

:let buff=join(getline(1, '$'), "\n")

In your case it will be a little harder: you'll have to load a file, use getbufline() to get the content of another buffer, then delete it (bwipeout).

3 Note on VimDoc

The most useful parts of the Vim documentation for:

  • Vimscript: the list of functions, :h function-list
  • finding the release that introduced a function: :h version8 or :h version7, then search

Yours in free software,
Tinmarino.

@ubitux
Contributor Author

ubitux commented Aug 5, 2020

If I may, here is a list of things to be aware of while working on this feature:

  • <title> may be present outside of the <head> (and it's valid)
  • <title> can contain line breaks
  • there is a need to handle charset/encoding (this one is hard)
  • HTML escaping is required as it's pretty common (&gt;, &nbsp;, &#x200F;, ...)
  • conflict with selected vimwiki syntax requires escaping: for example, if a title contains [...] it may break the markdown syntax for links, and I'm not sure vimwiki properly supports syntax such as [hello \[world\](foobar)](https://...)
  • similarly, you want to URL-encode special characters such as parentheses (so that a URL containing them doesn't break the vimwiki syntax, but without backslash escaping this time, so it stays valid verbatim for a browser)
  • you will need to spoof the user agent for most websites to accept the query
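A few of the points above (line breaks, charset, HTML escaping) can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: `decode_body` and `clean_title` are hypothetical helper names, and the charset is taken only from a Content-Type header value, not from a `<meta>` tag inside the page.

```python
import html
import re
from typing import Optional


def decode_body(raw: bytes, content_type: Optional[str]) -> str:
    """Decode bytes using the charset from a Content-Type value, else UTF-8."""
    charset = "utf-8"
    if content_type:
        found = re.search(r'charset="?([\w-]+)', content_type, re.IGNORECASE)
        if found:
            charset = found.group(1)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to UTF-8 rather than fail.
        return raw.decode("utf-8", errors="replace")


def clean_title(raw_title: str) -> str:
    """Unescape HTML entities, then collapse whitespace and line breaks."""
    text = html.unescape(raw_title)
    # str.split() with no argument also splits on the no-break space
    # that &nbsp; turns into.
    return " ".join(text.split())
```

Invisible characters such as the right-to-left mark (&#x200F;) are not whitespace and survive this cleanup, so they would need separate handling.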

@tinmarino
Member

@jeromg glad to see people from different backgrounds (project manager => CTO, in your case) using somewhat obscure tools like vimwiki.

You are an interesting user because you sit between the computer people (your co-workers) and the computer novices (your clients).

This clearly leads to a different usage from mine (I work 100% in Vim), so feel free to raise issues, ideally with screenshots of your config.

@tinmarino
Member

tinmarino commented Aug 5, 2020

Colorscheme Advice (just for reference, not related to this issue)

If I may: it took me 10 years to gather the motivation to invest in searching for a good colorscheme; now there is a vimcolor site. A nice colorscheme changes your life and may make you loyal ("fideliser") to vim.

Try the gruvbox-material colorscheme, or the list I forked (I added 800 more), or go directly to rafi's list, the source of the fork.

From net advice and tested

  1. gruvbox-material
  2. apprentice (grey)
  3. eighties (base16, contrasted and clear)
  4. jellybeans
  5. hybrid_material
  6. neverland-darker (nice idea to highlight the current cursor line number)
  7. solarized8
  8. dracula

From my experience

  1. dante (of course, but bad for diff)
  2. evening (low contrast)
  3. fromthehell (red and yellow, nice contrast)
  4. frood (exactly the opposite blue)
  5. fruit (almost like dante)
  6. gravity
  7. gryffin (many color and darker than grishin)
  8. gurunew (purple comments)
  9. hackerman (like aquaman, this is blue !!)
  10. harlequin (super !!!)
  11. hearld (not bad)
  12. inkpot (brown, white, light pink)
  13. duoduo
  14. kkruby (colorful)

@jeromg
Contributor

jeromg commented Aug 11, 2020

Hi there, quick update: I do have a working prototype that seems to work reasonably well in pure vimscript. I still need to add some error handling and also need to do some testing with vim7/8 (I developed and tested the prototype with neovim).

Where shall I add the two functions needed in the vimwiki source tree for the PR? I was thinking of base.vim, thoughts?

Cheers,

@tinmarino
Member

tinmarino commented Aug 12, 2020

@jeromg yes, base.vim: there is no other place yet.
We will eventually add link, head and maybe some parse or cmd.
As long as you put them together, with comments, in base.vim or a new cmd.vim, we will refactor later (I raised an issue).
My vote would be to put these independent commands in cmd, though, so they do not get loaded at startup but only when the user launches a command.

jeromg added a commit to jeromg/vimwiki that referenced this issue Aug 12, 2020
@jeromg jeromg mentioned this issue Aug 12, 2020
Closed
6 tasks
@tinmarino
Member

Fixed by PR #982, which was integrated by hand.
