This repository has been archived by the owner on Mar 8, 2019. It is now read-only.

additional parsing option, conditional parsing and callbacks for attribute checking #152

Open: Mipme wants to merge 6 commits into base: master

@Mipme Mipme commented Jul 17, 2012

Added support for an "extract" parser option for tags.

Consider the following case:

<div><p>some text</p></div>

parser rule

"div": "extract"

gives you

<p>some text</p>
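
To make the "extract" behaviour concrete, here is a minimal sketch on a plain-object tree instead of the real DOM. The node shape (`tag`, `children`) and the helpers `applyExtract` and `serialize` are assumptions made for illustration; they are not the code in this pull request.

```javascript
// Hypothetical tree model: text nodes are strings, elements are
// { tag: string, children: Array }.
function applyExtract(node, rules) {
  if (typeof node === 'string') { return [node]; }
  // Process children first so nested extracts are handled too.
  var children = [];
  node.children.forEach(function (child) {
    children = children.concat(applyExtract(child, rules));
  });
  if (rules[node.tag] === 'extract') {
    // Drop the wrapper, keep its (already processed) children.
    return children;
  }
  return [{ tag: node.tag, children: children }];
}

// Turn the (attribute-less) sketch tree back into markup.
function serialize(nodes) {
  return nodes.map(function (n) {
    if (typeof n === 'string') { return n; }
    return '<' + n.tag + '>' + serialize(n.children) + '</' + n.tag + '>';
  }).join('');
}

var input = { tag: 'div', children: [{ tag: 'p', children: ['some text'] }] };
var output = serialize(applyExtract(input, { div: 'extract' }));
console.log(output); // <p>some text</p>
```

Without a rule for "div" the same call leaves the wrapper in place.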

Added support for conditions on parser options:

<div class="dontlike"><p>text A</p><div class="like">text B</div></div>

parser rule

"div": {
  "extract": {
    "if": {
      "class": "^dontlike$"
    }
  }
}

result:

<p>text A</p><div class="like">text B</div>
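
The condition check could be sketched roughly as follows. It assumes that each value under "if" is a regular-expression source string tested against the attribute of the same name, which the `^…$` anchors in the rule suggest; `matchesCondition` is a hypothetical helper, not a function from this pull request.

```javascript
// Hypothetical helper: true when every attribute named in `condition`
// exists and matches its regular-expression source string.
function matchesCondition(attrs, condition) {
  return Object.keys(condition).every(function (name) {
    var value = attrs[name];
    return typeof value === 'string' && new RegExp(condition[name]).test(value);
  });
}

var rule = { extract: { 'if': { 'class': '^dontlike$' } } };
var cond = rule.extract['if'];

console.log(matchesCondition({ 'class': 'dontlike' }, cond)); // true
console.log(matchesCondition({ 'class': 'like' }, cond));     // false
console.log(matchesCondition({}, cond));                      // false
```

Only the outer div matches, so only that wrapper is extracted and the inner div with class "like" stays. An "ifnot" condition would presumably negate the result of the same check.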

And finally, callbacks for attribute checking:

"a": {
  "check_attributes": {
    "rel": { "func": "checkLinkRelation" }
  }
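}

function checkLinkRelation(attr, node) {
  // Guard against links that have no href attribute at all.
  var href = node.getAttribute('href') || '';
  // Absolute http(s) and ftp(s) URLs are treated as external links.
  var external = /^(ht|f)tps?:\/\//;
  if (external.test(href)) { return 'nofollow'; }
  return false; // no rel attribute for everything else
}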

So if a link without an http(s) or ftp(s) prefix is pasted or set, no rel attribute is set.
Inserting a link to an external URL (for example Google), on the other hand, results in: rel="nofollow"
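
The callback can be tried outside the editor with a stub node, as in the sketch below. `fakeLink` is a hypothetical stand-in for a DOM element, and passing the attribute name as the first argument is a guess, since the description does not say what `attr` contains.

```javascript
// Same idea as checkLinkRelation above, with a guard for a missing href.
function checkLinkRelation(attr, node) {
  var href = node.getAttribute('href') || '';
  return /^(ht|f)tps?:\/\//.test(href) ? 'nofollow' : false;
}

// Hypothetical stand-in for a DOM element; only getAttribute is needed.
function fakeLink(href) {
  return {
    getAttribute: function (name) { return name === 'href' ? href : null; }
  };
}

var externalRel = checkLinkRelation('rel', fakeLink('http://www.google.com/'));
var relativeRel = checkLinkRelation('rel', fakeLink('/about'));
var missingRel = checkLinkRelation('rel', fakeLink(null));
console.log(externalRel, relativeRel, missingRel); // nofollow false false
```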

Oh, yes, and for convenience I sorted the advanced tag rules...

…ttribute checks

- Added support for node extraction
- All operations (remove, rename_tag, extract, keep) can be conditioned by if and ifnot depending on the attributes of the node.
- Attributes can be checked by a callback

This reverts commit d349863.
@p3drosola

I think it's important to be able to extend the rule parser, but I think a better approach would be something like this.

#166

where we could write real functions in the rule set, something like:

"div": {
  "content" : function (oldNode){
    return $(oldNode).text();
  }
}

to perform an extract, for example.

@Mipme
Author

Mipme commented Jul 26, 2012

Hi Pedro,

That's of course also a way to achieve it. My only wish would be to simply provide the name of the function, since I prefer to keep the functions in my main script file and not in the config settings.

Maybe I'll have time this weekend to code it. Cheers!

@p3drosola

You already can. Use something like:

"div": {
  "content" : App.helpers.extract
}

Just pass the actual function object, defined in your own script:

App.helpers.extract = function() { ... }
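
A small sketch of that pattern, assuming the "content" rule key proposed in #166 (not an existing option) and a node with a `textContent`-like property in place of the jQuery call:

```javascript
// Helpers live in the main script (names taken from the comment above).
var App = { helpers: {} };
App.helpers.extract = function (oldNode) {
  // Stand-in for $(oldNode).text(); assumes a textContent-like property.
  return oldNode.textContent;
};

// The rule set only references the helper. The helper must already be
// defined at this point, otherwise the rule would hold undefined.
var parserRules = {
  "div": { "content": App.helpers.extract }
};

var handler = parserRules.div.content;
console.log(handler === App.helpers.extract);       // true
console.log(handler({ textContent: 'some text' })); // some text
```

The rule stores a reference, so the function body stays in the main script and the config only names where to find it.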

@Mipme
Author

Mipme commented Jul 30, 2012

Oh, cool, thanks for the hint! This way I can of course check the attributes as well.

By the way, is it possible to mark a tag+attribute as non-deletable?
For example, for a columnisation?

@p3drosola

I think if you pass

{
   "attribute" : true
}

it might work
