
Merge branch '3.x'

technosophos committed Jun 1, 2012
2 parents 1073f41 + 4c7665c commit ec3c0aa38b8d8bdaa8191afbb29fd5779e2834f2
Showing with 228 additions and 35 deletions.
  1. +83 −33 README.md
  2. +2 −2 composer.json
  3. +30 −0 examples/at_a_glance.php
  4. +113 −0 quickstart-guide.md
View
116 README.md
@@ -2,7 +2,7 @@
**New development is happening on the `3.x` branch.**
-Authors: Matt Butcher (lead), Emily Brand, and others
+Authors: Matt Butcher (lead), Emily Brand, and many others
[Website](http://querypath.org) |
[API Docs](http://api.querypath.org) |
@@ -11,16 +11,63 @@ Authors: Matt Butcher (lead), Emily Brand, and others
[Developer List](http://groups.google.com/group/devel-querypath) |
[Pear channel](http://pear.querypath.org) |
-This package is licensed under the GNU LGPL 2.1 (COPYING-LGPL.txt) or, at your choice, an MIT-style
-license (COPYING-MIT.txt). The licenses should have been distributed with this library.
+This package is licensed under an MIT license (COPYING-MIT.txt) or, at your option, the
+LGPL version 2.1 or later. The licenses should have been distributed with this library.
+
+## At A Glance
+
+QueryPath is a jQuery-like library for working with XML and HTML
+documents in PHP.
+
+Say we have a document like this:
+```xml
+<?xml version="1.0"?>
+<table>
+ <tr id="row1">
+ <td>one</td><td>two</td><td>three</td>
+ </tr>
+ <tr id="row2">
+ <td>four</td><td>five</td><td>six</td>
+ </tr>
+</table>
+```
+
+And say that the above is stored in the variable `$xml`. Now
+we can use QueryPath like this:
+
+```php
+<?php
+// Get all of the <td> elements in the document and add the
+// attribute `foo='bar'`:
+qp($xml, 'td')->attr('foo', 'bar');
+
+// Or print the contents of the third TD in the second row:
+print qp($xml, '#row2>td:nth(3)')->text();
+
+// Or append another row to the XML and then write the
+// result to standard output:
+qp($xml, 'tr:last')->after('<tr><td/><td/><td/></tr>')->writeXML();
+
+?>
+```
+
+(This example is in `examples/at_a_glance.php`.)
+
+With over 60 functions and robust support for chaining, you can
+accomplish sophisticated XML and HTML processing using QueryPath.
## QueryPath Installers
The following packages of QueryPath are available:
- * PEAR package (`pear install querypath/QueryPath`): Installs the library and documentation.
- * Download from the [GitHub Tags page](https://github.com/technosophos/querypath/tags).
- * [Composer](http://packagist.org): Add this to the 'require' section of your `composer.json`:
+ * A PEAR package (`pear install querypath/QueryPath`): Installs the library and documentation.
+ * A download from the [GitHub Tags page](https://github.com/technosophos/querypath/tags).
+ * Via [Composer](http://getcomposer.org)
+
+### Composer
+
+To add QueryPath as a library in your project, add this to the 'require'
+section of your `composer.json`:
```json
{
@@ -30,63 +77,66 @@ The following packages of QueryPath are available:
}
```
-Or if you prefer PEAR:
+Then run `php composer.phar install` in that directory.
+
+### Pear
+
+To install QueryPath as a server-wide library, you may wish to use
+PEAR or Pyrus. See [pear.querypath.org](http://pear.querypath.org)
+for more information, or simply run these commands:
```
$ pear channel-discover pear.querypath.org
$ pear install querypath/QueryPath
```
-### Downloads (for manual installation)
-
- * Phar (QueryPath-VERSION.phar): This is a Phar package which can be used as-is. Its size has been
- minimized by stripping comments. It is designed for direct inclusion in PHP 5.3 applications.
- * Minimal (QueryPath-VERSION-minimal.tgz): This contains *only* the QueryPath library, with no
- documentation or additional build environment. It is designed for production systems.
- * Full (QueryPath-VERSION.tgz): This contains QueryPath, its unit tests, its documentation,
- examples, and all supporting material. If you are starting with QueryPath, this might be the
- best package.
- * Docs (QueryPath-VERSION-docs.tgz): This package contains *only* the documentation for QueryPath.
- Generally, this is useful to install as a complement to the minimal package.
- * Git repository clone: You can always clone [this repository](http://github.com/technosophos/querypath) and work from that code.
+### Manual
-
-If in doubt, you probably want the PEAR version or the [Full package](http://github.com/technosophos/querypath/downloads).
+You can either download a stable release from the
+[GitHub Tags page](https://github.com/technosophos/querypath/tags)
+or you can use `git` to clone
+[this repository](http://github.com/technosophos/querypath) and work from
+the code. `master` typically has the latest stable, while `3.x` is where
+active development is happening.
## Including QueryPath
-If you installed QueryPath as a PEAR package, use it like this:
-
+As of QueryPath 3.x, QueryPath uses the Composer autoloader if you
+installed with composer:
```php
<?php
-require 'QueryPath/QueryPath.php';
+require 'vendor/autoload.php';
+
+// Optional: Use this to load `qp()` and `htmlqp()`
+require 'vendor/querypath/QueryPath/src/qp.php';
?>
```
-From the Full Install:
+If you installed QueryPath as a PEAR package, use it like this:
```php
<?php
-require 'QueryPath/src/QueryPath/QueryPath.php';
+require 'QueryPath/qp.php';
?>
```
-With the Phar archive, you can include QueryPath like this:
+From the download or git clone:
```php
<?php
-require 'QueryPath.phar';
+require 'QueryPath/src/QueryPath/qp.php';
?>
```
-Unfortunately, in the 2.1 branch of QueryPath, the Composer include is:
+With the Phar archive, you can include QueryPath like this:
```php
<?php
-require 'vendor/querypath/QueryPath/src/QueryPath/QueryPath.php';
+require 'QueryPath.phar';
?>
```
-The next major release of QueryPath will support Composer autoloading.
-
-From there, the main functions you will want to use are `qp()` and `htmlqp()`. Start with the [API docs](http://api.querypath.org/docs).
+From there, the main functions you will want to use are `qp()`
+(alias of `QueryPath::with()`) and `htmlqp()` (alias of
+`QueryPath::withHTML()`). Start with the
+[API docs](http://api.querypath.org/docs).
View
@@ -1,9 +1,9 @@
{
"name": "querypath/QueryPath",
"type": "library",
- "description": "HTML/XML querying and processing (like jQuery)",
+ "description": "HTML/XML querying (CSS 4 or XPath) and processing (like jQuery)",
"homepage": "https://github.com/technosophos/querypath",
- "license": "MIT-style",
+ "license": "MIT",
"keywords": ["xml", "html", "css", "jquery", "xslt"],
"require" : {
"php" : ">=5.3.0"
View
@@ -0,0 +1,30 @@
+<?php
+require '../src/qp.php';
+$xml =<<<EOF
+<?xml version="1.0"?>
+<table>
+ <tr id="row1">
+ <td>one</td><td>two</td><td>three</td>
+ </tr>
+ <tr id="row2">
+ <td>four</td><td>five</td><td>six</td>
+ </tr>
+ </table>
+EOF;
+
+print "\nExample 1: \n";
+// Get all of the <td> elements in the document and add the
+// attribute `foo='bar'`:
+qp($xml, 'td')->attr('foo', 'bar')->writeXML();
+
+print "\nExample 2: \n";
+
+// Or print the contents of the third TD in the second row:
+print qp($xml, '#row2>td:nth(3)')->text();
+
+print "\nExample 3: \n";
+// Or append another row to the XML and then write the
+// result to standard output:
+qp($xml, 'tr:last')->after('<tr><td/><td/><td/></tr>')->writeXML();
+
+?>
View
@@ -0,0 +1,113 @@
+# QueryPath QuickStart
+
+This short guide is intended to help you get started with QueryPath 3.
+
+## Using QueryPath in Your Project
+
+To use QueryPath inside of your own application, you will need to make sure that PHP can find the QueryPath library. There are a few possible ways of doing this. The first is to use an autoloader. The second is to include QueryPath manually. We'll look briefly at each.
+
+### Autoloaders and QueryPath
+
+In recent years, PHP has standardized a method of automatically importing classes by name. This is often called [PSR-0 autoloading](https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-0.md). Symfony, Composer, and many other PHP projects use PSR-0 autoloaders, and QueryPath should work with those. In addition, QueryPath has its own autoloader in `qp.php`.
+
+To use QueryPath's autoloader, all you need to do is include `qp.php`. This will detect if another autoloader is already in place, and if not, it will configure its own autoloader:
+
+```{php}
+<?php
+require 'qp.php';
+
+print QueryPath::withHTML('http://technosophos.com', 'title')->text();
+
+print htmlqp('http://technosophos.com', 'title')->text();
+?>
+```
+
+The above requires QueryPath's autoloader. Note that in this case we don't need to do anything else to get the `QueryPath` class or the `htmlqp()` function.
+
+QueryPath also ships with [Composer](http://getcomposer.org) support. Composer provides PSR-0 autoloading. To use Composer's autoloader, you can do this:
+
+```{php}
+<?php
+// The composer autoloader.
+require 'vendor/autoload.php';
+
+print QueryPath::withHTML('http://technosophos.com', 'title')->text();
+
+// THIS DOESN'T WORK!
+// print htmlqp('http://technosophos.com', 'title')->text();
+?>
+```
+
+Notice, though, that the `qp()` and `htmlqp()` functions *will not work* with this method. Why? Because PHP's autoloader does not know about functions. It operates on classes only. So you can use QueryPath's Object-Oriented API (`QueryPath::with()`, `QueryPath::withHTML()`, `QueryPath::withXML()`), but not the `qp()` and `htmlqp()` functions. If you want to use those, too, simply include `qp.php`:
+
+```{php}
+<?php
+// The composer autoloader.
+require 'vendor/autoload.php';
+require 'qp.php';
+
+print QueryPath::withHTML('http://technosophos.com', 'title')->text();
+
+// This works because qp.php was imported
+print htmlqp('http://technosophos.com', 'title')->text();
+?>
+```
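The classes-only behavior of PHP autoloading can be seen without QueryPath at all. The following is a minimal sketch, not QueryPath code: `DemoLibrary` and `demo_helper()` are hypothetical names invented for the illustration, and the inline class declaration stands in for loading a class file from disk.

```php
<?php
// Record every class name PHP asks the autoloader to resolve.
$requested = [];

spl_autoload_register(function (string $class) use (&$requested): void {
    $requested[] = $class;
    // Hypothetical stand-in for requiring a class file from disk.
    if ($class === 'DemoLibrary') {
        class DemoLibrary
        {
            public static function hello(): string
            {
                return 'hi';
            }
        }
    }
});

// Referencing an unknown class triggers the autoloader.
$greeting = DemoLibrary::hello();

// Calling an unknown function never consults the autoloader;
// PHP raises an Error instead.
$error = '';
try {
    demo_helper();
} catch (Error $e) {
    $error = $e->getMessage();
}

print $greeting . "\n";
print $error . "\n";
```

After this runs, `$requested` contains only `DemoLibrary`. The function call was never offered to the autoloader, which is why a file that defines functions, such as `qp.php`, has to be required explicitly.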
+
+## A Simple Example
+
+So far, we have seen a few variations of the same program. Let's learn what it does. Here's the program:
+
+```{php}
+<?php
+require 'qp.php';
+
+print QueryPath::withHTML('http://technosophos.com', 'title')->text();
+
+print htmlqp('http://technosophos.com', 'title')->text();
+?>
+```
+
+This does the same thing two different ways. Let's look at the first `print` statement:
+
+```{php}
+<?php
+print QueryPath::withHTML('http://technosophos.com', 'title')->text();
+?>
+```
+
+This line does three things:
+
+1. It loads and parses the HTML document it finds at `http://technosophos.com`. QueryPath can load documents locally and remotely. It can also load strings of HTML or XML, as well as `SimpleXML` objects and `DOMDocument` objects. It should be easy to get your HTML or XML loaded into QueryPath.
+2. It performs a search for the tag named `title`. QueryPath uses CSS 4 Selectors (as the current draft stands) as a query language -- just like jQuery and CSS. (If you prefer XPath, check out the `xpath()` method on QueryPath). Of course, `title` is a very basic selector. You can do more advanced selectors like `#bar-one table>tr:odd td>a:first-of-type()`, which looks for the element with ID `bar-one` and then fetches every odd row from its table, then from each cell in the row, it finds the first hyperlink.
+3. Finally, the example calls `text()`, which will fetch the text content of the first element it has found (in this case, the `title` tag in the HTML head). If no title is found, this will return an empty string. Otherwise it will return the text of that tag.
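For comparison, the same three steps can be written against PHP's built-in DOM extension. This is a rough sketch of the equivalent work, not a description of QueryPath's internals. It assumes the `dom` extension is enabled, and it uses an inline HTML string (made up for the example) in place of the remote page so that it runs offline.

```php
<?php
// Inline HTML stands in for the remote page so the sketch runs offline.
$html = '<html><head><title>Example Page</title></head><body><p>Hi</p></body></html>';

// Step 1: load and parse the document.
$doc = new DOMDocument();
$doc->loadHTML($html);

// Step 2: search for the <title> element.
$xpath = new DOMXPath($doc);
$matches = $xpath->query('//title');

// Step 3: take the text of the first match, or '' when nothing matched.
$title = $matches->length > 0 ? $matches->item(0)->textContent : '';

print $title . "\n";
```

The QueryPath one-liner folds the parsing, the search, and the empty-result check into a single chained expression.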
+
+QueryPath has well over 60 methods like `text()`. Some are for navigating, like `top()`, `children()`, `next()`, and `prev()`. Some are for manipulating the parts of an HTML or XML element, like `attr()`. Others are for doing sophisticated finding and filtering operations (`find()`, `filter()`, `filterCallback()`, `map()`, and so on). And, of course, there are methods for modifying the document (`append()`, `before()`, `after()`, `attr()`, `text()`, and many more).
+
+The goal of QueryPath is to make it easy for you to process XML and HTML documents. There may be a lot of methods to learn (just like jQuery), but those methods are there to make your life simpler.
+
+## HTML vs XML
+
+When QueryPath was first introduced, it did not distinguish between XML and HTML documents. At that time, momentum was behind XHTML, and it looked like the future was XML. But over time, it has become abundantly clear that HTML documents cannot be treated as XML during parsing and processing, or during output.
+
+So there are now separate parser functions for HTML and XML -- as well as a generic parser function that inspects the document and attempts to determine whether it is XML or HTML:
+
+* `QueryPath::withXML()`: This *only* handles XML documents. If you give it an HTML document, it will attempt to force XML parsing on that document.
+* `htmlqp()`, `QueryPath::withHTML()`: This will force QueryPath to use the HTML parser. It will also make a number of adjustments to QueryPath to accommodate common HTML breakages.
+* `qp()`, `QueryPath::with()`: This will attempt to guess whether the document is XML or HTML. In general, it favors XML slightly. Guessing may be done by…
+ - File extension
+ - XML declaration
+ - The suggestions made by any options passed into the document
+
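To make the guessing idea concrete, here is a hypothetical sketch of such a heuristic. It is not QueryPath's implementation: the function name `guessParser()`, the `parser` option key, and the order in which the three signals are checked are all assumptions made for the illustration.

```php
<?php
/**
 * Hypothetical helper: guess 'xml' or 'html' for a document source.
 * Not part of QueryPath; the rule order here is an assumption.
 */
function guessParser(string $source, array $options = []): string
{
    // An explicit hint in the options wins (hypothetical 'parser' key).
    if (isset($options['parser'])) {
        return $options['parser'];
    }
    // An XML declaration at the start marks the content as XML.
    if (strncmp(ltrim($source), '<?xml', 5) === 0) {
        return 'xml';
    }
    // Otherwise treat the source as a file name and check its extension.
    $ext = strtolower(pathinfo($source, PATHINFO_EXTENSION));
    if ($ext === 'html' || $ext === 'htm') {
        return 'html';
    }
    // Default to XML, mirroring the slight preference for XML.
    return 'xml';
}

print guessParser('index.html') . "\n";
print guessParser('<?xml version="1.0"?><root/>') . "\n";
```

When a guess like this could go wrong for your documents, calling `QueryPath::withXML()` or `QueryPath::withHTML()` directly avoids the guesswork.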
+### … And Character Encoding
+
+XML suggests that all documents be encoded as UTF-8. Most HTML documents are encoded using one of the ISO specifications (typically ISO-8859-1). And web servers are often misconfigured to report that documents are using one character set when they are actually using another.
+
+To work around all of these issues, QueryPath attempts to convert documents automatically. It does this using PHP's internal character detection libraries. But sometimes it guesses wrong. You can adjust this feature manually by passing in language settings in the `$options` array. See the documentation on `qp()` for details.
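If you would rather normalize a document yourself before handing it to QueryPath, a conversion step might look like the sketch below. It assumes the `mbstring` extension is enabled, and it is not QueryPath's own conversion logic. The helper `toUtf8()` is a hypothetical name, and it does not detect anything: it only checks whether the bytes are valid UTF-8 and otherwise assumes a fallback encoding.

```php
<?php
/**
 * Return $bytes as UTF-8. Valid UTF-8 passes through unchanged;
 * anything else is assumed to be in $fallback (an assumption, not a detection).
 */
function toUtf8(string $bytes, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

$latin1 = "caf\xE9";      // "café" encoded as ISO-8859-1
$utf8   = "caf\xC3\xA9";  // "café" encoded as UTF-8

var_dump(toUtf8($latin1) === $utf8); // bool(true)
var_dump(toUtf8($utf8) === $utf8);   // bool(true)
```

A validity check is used here instead of detection because it is deterministic. The trade-off is that the fallback encoding has to be known or guessed by the caller.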
+
+
+## Where to go from here
+
+* [QueryPath.org](http://querypath.org) has pointers to other resources.
+* [The API docs](http://api.querypath.org) have detailed explanations of every single part of QueryPath.
+
+
