helpful tools for scraping
using npm
npm i @trosckey/scraper-utils
using yarn
yarn add @trosckey/scraper-utils
removes HTML tags, character codes and whitespace control characters (\n, \t, etc.)
import { clearHtml } from '@trosckey/scraper-utils'
clearHtml('<div>Hello, =World!\n</div>') // "Hello, World!"
parses date strings; returns a Date object, or null if the date cannot be parsed
parses Russian date formats
import { parseRuDate } from '@trosckey/scraper-utils'
parseRuDate('01 ноя 2020').toDateString() // "Sun Nov 01 2020"
parseRuDate('28 Декабря 2016').toDateString() // "Wed Dec 28 2016"
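To illustrate the kind of parsing described above, here is a minimal sketch and not the package's actual implementation. The name `parseRuDateSketch`, the prefix table and the accepted "day month year" layout are all assumptions made for this example.

```javascript
// Hypothetical sketch; the real parseRuDate may accept more formats.
// Month names are matched by prefix so that both abbreviated ("ноя")
// and full genitive ("декабря") forms resolve to the same month.
// 'мар' is listed before 'ма', so "марта" resolves to March and "мая" to May.
const RU_MONTH_PREFIXES = [
  'янв', 'фев', 'мар', 'апр', 'ма', 'июн',
  'июл', 'авг', 'сен', 'окт', 'ноя', 'дек'
]

function parseRuDateSketch(input) {
  // Expect "<day> <month name> <4-digit year>", e.g. "01 ноя 2020"
  const match = /^\s*(\d{1,2})\s+([А-Яа-яЁё]+)\.?\s+(\d{4})\s*$/.exec(input)
  if (!match) return null

  const day = Number(match[1])
  const monthName = match[2].toLowerCase()
  const year = Number(match[3])

  const month = RU_MONTH_PREFIXES.findIndex(prefix => monthName.startsWith(prefix))
  if (month === -1) return null

  const date = new Date(year, month, day)
  // Reject overflowed dates such as "31 фев 2020"
  return date.getDate() === day ? date : null
}
```

Returning null instead of throwing mirrors the behaviour described above, so callers can skip unparseable values with a simple null check.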
replaces double spaces with a single space
import { removeDoubleSpaces } from '@trosckey/scraper-utils'
removeDoubleSpaces('Hello  ,  World  !  ') // 'Hello , World ! '
calls the function until it succeeds or runs out of attempts (3 by default); returns a promise that resolves to the function's result, or to null if every attempt failed
retryOnError(
// function to execute
func: Function,
{
// The number of times to retry, 3 by default
retries?: number,
// called with each error; use it if you only want to log failed attempts
logError?: (error: Error) => Promise<any> | any,
// executes on every error
onError?: ({
// error from executed function
error: Error,
// `true` if this was the final attempt
isFinalTry: boolean,
}) => Promise<any> | any
}
)
examples:
import { retryOnError } from '@trosckey/scraper-utils'
await retryOnError(() => {
  throw new Error("error >.>")
}, {
  retries: 2,
  logError: console.error
})
// ...
await retryOnError(() => { throw new Error("error >.>") })
// ...
const data = await retryOnError(async () => {
const response = await fetch('https://example.com/')
return response.text()
}, {
retries: 5,
onError: ({ error, isFinalTry }) => {
if (isFinalTry) {
console.error("Cannot download page :(", error)
}
}
})
console.log(data) // '<!doctype html><html><head>...' or null
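For readers curious how a helper with this contract might work internally, here is a minimal sketch, not the package's source. `retryOnErrorSketch` is a hypothetical name, and the sketch assumes that `retries` counts total attempts, which the description above does not state explicitly.

```javascript
// Hypothetical sketch of a retry helper with the option names documented above.
// Assumption: `retries` is the total number of attempts, not extra attempts.
async function retryOnErrorSketch(func, { retries = 3, logError, onError } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      // `await` handles both sync and async functions
      return await func()
    } catch (error) {
      const isFinalTry = attempt === retries
      if (logError) await logError(error)
      if (onError) await onError({ error, isFinalTry })
    }
  }
  // Every attempt threw: resolve to null instead of rejecting
  return null
}
```

Resolving to null rather than rejecting keeps call sites free of try/catch, at the cost of not being able to tell a failure apart from a function that legitimately returns null.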
simple sleep function; takes a duration in milliseconds and returns a promise
import { sleep } from '@trosckey/scraper-utils'
await sleep(5000)
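A typical use for such a helper in scraping is pausing between requests. The sketch below is illustrative only: `sleepSketch` is a hypothetical stand-in built on `setTimeout`, and `visitWithDelaySketch` and its parameters are names invented for this example.

```javascript
// Hypothetical stand-in for sleep; resolves after roughly `ms` milliseconds.
const sleepSketch = ms => new Promise(resolve => setTimeout(resolve, ms))

// Calls visit(url) for each URL in order, pausing delayMs between calls
// so the target server is not hit with back-to-back requests.
async function visitWithDelaySketch(urls, delayMs, visit) {
  const results = []
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleepSketch(delayMs)
    results.push(await visit(urls[i]))
  }
  return results
}
```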
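slices text into parts using the given marker words; each marker becomes a key whose value is the text that follows it, up to the next marker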
import { sliceTextByWords } from '@trosckey/scraper-utils'
sliceTextByWords(
`
Lorem Ipsum: It was popularised in the 1960s
Ipsum passages: and more recently with desktop
publishing: software like Aldus PageMaker including versions of.
`,
[
'Lorem Ipsum:',
'Ipsum passages:',
'publishing:'
]
)
/**
 * {
 *   'Lorem Ipsum:': ' It was popularised in the 1960s',
 *   'Ipsum passages:': ' and more recently with desktop',
 *   'publishing:': ' software like Aldus PageMaker including versions of.'
 * }
 */
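One way the slicing shown above could be implemented is sketched here; this is an assumption for illustration, not the package's code. `sliceTextByWordsSketch` is a hypothetical name, and unlike the output shown above it leaves whitespace and line breaks around each part untouched.

```javascript
// Hypothetical sketch: split `text` at each marker and key the pieces by marker.
function sliceTextByWordsSketch(text, words) {
  // Locate each marker, drop the ones that are absent, order by position
  const found = words
    .map(word => ({ word, index: text.indexOf(word) }))
    .filter(({ index }) => index !== -1)
    .sort((a, b) => a.index - b.index)

  const result = {}
  found.forEach(({ word, index }, i) => {
    const start = index + word.length
    // A part ends where the next marker begins, or at the end of the text
    const end = i + 1 < found.length ? found[i + 1].index : text.length
    result[word] = text.slice(start, end)
  })
  return result
}
```

Sorting by position means the markers can be passed in any order; only the first occurrence of each marker is considered in this sketch.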
searches for a node in a tree; returns the found node, or null if no node matches
import { treeSearch } from '@trosckey/scraper-utils'
const tree = {
id: 1,
children: [
{ id: 2 },
{
id: 3,
children: [
{ id: 4 }
]
}
]
}
treeSearch(
tree,
'children',
node => node.id === 4
) // { id: 4 }
treeSearch(
tree,
'children',
node => node.id === 5
) // null
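A search with this signature can be sketched as a recursive depth-first traversal. The code below is an illustration under that assumption, not the package's source; `treeSearchSketch` is a hypothetical name, and the real function may visit nodes in a different order.

```javascript
// Hypothetical sketch: depth-first, pre-order search.
// `childrenKey` names the property holding each node's child array.
function treeSearchSketch(node, childrenKey, predicate) {
  if (predicate(node)) return node

  // Nodes without the children property are treated as leaves
  for (const child of node[childrenKey] ?? []) {
    const found = treeSearchSketch(child, childrenKey, predicate)
    if (found) return found
  }
  return null
}
```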