This repository has been archived by the owner on Jan 23, 2020. It is now read-only.

Commit

Refactors client materials moving them from "pig" to "client" module
Also:

- Replaces ./bin/ambrose-package with `mvn package`
- Adds Eclipse Jetty to ambrose-pig deps and removes references to Hadoop installation materials from pig-ambrose invocation script
- Adds demo pig script and data to ambrose-pig
- Updates ambrose-demo script to work with new client web resources location
- Updates README.md quick start text
sagemintblue committed Jul 10, 2012
1 parent b41546c commit a65ce09
Showing 27 changed files with 221 additions and 86 deletions.
4 changes: 4 additions & 0 deletions NOTICE
Expand Up @@ -11,6 +11,10 @@ Apache Pig
Apache Public License 2.0
http://pig.apache.org/

Eclipse Jetty
Apache Public License 2.0
http://www.eclipse.org/jetty/

D3.js
Copyright 2012, Michael Bostock
BSD License
Expand Down
9 changes: 4 additions & 5 deletions README.md
Expand Up @@ -69,19 +69,18 @@ following command and then browse to
```

Finally, you can run Ambrose with an actual Pig script. To do so, you'll need to build the
Ambrose distribution and untar it:
Ambrose Pig distribution and untar it:

```
./bin/ambrose-package
VERSION=0.1.0-SNAPSHOT
tar zxvf ambrose-$VERSION.tar.gz
mvn package
tar zxvf pig/target/ambrose-pig-$VERSION-bin.tar.gz
```

You can then run the following commands to execute `path/to/my/script.pig` with an Ambrose app server
embedded in the Pig client:

```
cd ambrose-$VERSION
cd ambrose-pig-$VERSION
./bin/pig-ambrose -f path/to/my/script.pig
```

Expand Down
21 changes: 12 additions & 9 deletions bin/ambrose-demo
Expand Up @@ -13,16 +13,19 @@
# See the License for the specific language governing permissions and
# limitations under the License.

WEBROOT="$(dirname $0)/../pig/src/main/resources"
function log { echo "$@" >&2; }
function die { log "$@"; exit 1; }

WEBROOT="$(dirname $0)/../client/src/main/resources"
PORT="${1:-8080}"

cd "$WEBROOT"
cd "$WEBROOT" \
|| die "Failed to cd to WEBROOT '$WEBROOT'"

echo "Starting web server on port $PORT" >&2
log "Starting web server on port '$PORT'"
log "Browse to either of the following URLs to see Ambrose in action:"
log " http://localhost:$PORT/web/index.html?localdata=small"
log " http://localhost:$PORT/web/index.html?localdata=large"
log "Hit ctrl-c to stop the web server"

# To instead see a larger job, pass localdata=large below
echo "Starting demo Ambrose server on port $PORT. Browse to either of the following URLs to see Ambrose in action:"
echo " http://localhost:$PORT/web/index.html?localdata=small"
echo " http://localhost:$PORT/web/index.html?localdata=large"
echo "Hit ctrl-c to stop the demo Ambrose server"
exec python -m SimpleHTTPServer "$PORT"
exec python -m SimpleHTTPServer "$PORT"
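
The `log` and `die` helpers added to `bin/ambrose-demo` above follow a common shell pattern: diagnostics go to stderr, and a failed required step aborts the script with a non-zero status. A minimal, self-contained sketch of that pattern follows. The `enter_webroot` wrapper and the `/no/such/dir` path are hypothetical names used only for illustration, and unlike the committed script this sketch silences `cd`'s own error message so that only the `die` text is printed.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the log/die helpers from the diff above.
log() { echo "$@" >&2; }      # write a message to stderr
die() { log "$@"; exit 1; }   # log the message, then exit with status 1

# Hypothetical wrapper (not in the commit): cd into a web root or abort.
enter_webroot() {
  cd "$1" 2>/dev/null || die "Failed to cd to WEBROOT '$1'"
}

# Run in a subshell so that die's exit does not end the calling shell.
if ! (enter_webroot /no/such/dir); then
  log "enter_webroot failed as expected"
fi
```

As a side note, `python -m SimpleHTTPServer` is the Python 2 module name; on a Python 3 interpreter the equivalent invocation would be `python3 -m http.server "$PORT"`.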
39 changes: 0 additions & 39 deletions bin/ambrose-package

This file was deleted.

19 changes: 19 additions & 0 deletions client/pom.xml
@@ -0,0 +1,19 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>com.twitter.ambrose</groupId>
<artifactId>ambrose</artifactId>
<version>0.1.0-SNAPSHOT</version>
<relativePath>..</relativePath>
</parent>

<groupId>com.twitter.ambrose</groupId>
<artifactId>ambrose-client</artifactId>
<version>0.1.0-SNAPSHOT</version>
<name>Ambrose Client</name>

</project>
10 changes: 5 additions & 5 deletions pig/README.md
Expand Up @@ -2,20 +2,20 @@

## Implementation

Ambrose integrates with Pig via Pig's `PigProgressNotificationListener` interface. The `ambrose-pig`
Ambrose integrates with Pig via Pig's `PigProgressNotificationListener` interface. The `./bin/pig-ambrose`
script launches Pig with the Ambrose implementation of PPNL. This implementation starts an embedded
[Jetty](http://jetty.codehaus.org/jetty/) server that exposes job runtime information to the Ambrose web UI.

## Known issues

* Ambrose currently requires Apache Pig's `0.11.0-SNAPSHOT` build, which is not a production release.
* Pig scripts with `exec` statements in them are not currently supported.
* Pig scripts which include `exec` statements are not currently supported.

## Pig patches

The Ambrose Pig integration requires a number of patches that are committed on the Pig trunk and
scheduled for release in Pig 0.11.0. Hence, the Ambrose distribution includes a Pig 0.11.0-SNAPSHOT
build. Note that running the `pig-ambrose` script will result in the script being executed with
scheduled for release in Pig 0.11.0. Hence, the Ambrose distribution references a Pig 0.11.0-SNAPSHOT
build. Note that running the `./bin/pig-ambrose` script will result in the script being executed with
the Pig 0.11.0-SNAPSHOT runtime.

Running Ambrose with a released version of Pig < 0.11.0 should be possible by applying these patches
Expand All @@ -24,4 +24,4 @@ to the release:
* [PIG-2660](https://issues.apache.org/jira/browse/PIG-2660) - PPNL should get notified of plan before it gets executed (ready for commit)
* [PIG-2663](https://issues.apache.org/jira/browse/PIG-2663) - Expose helpful ScriptState methods
* [PIG-2664](https://issues.apache.org/jira/browse/PIG-2664) - Allow PPNL impls to get more job info during the run
* [PIG-2525](https://issues.apache.org/jira/browse/PIG-2525) - Support pluggable PigProgressNotificationListeners on the command line
* [PIG-2525](https://issues.apache.org/jira/browse/PIG-2525) - Support pluggable PigProgressNotificationListeners on the command line
41 changes: 40 additions & 1 deletion pig/pom.xml
Expand Up @@ -27,6 +27,12 @@
</repositories>

<dependencies>
<!-- ambrose -->
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>ambrose-client</artifactId>
</dependency>

<!-- testing -->
<dependency>
<groupId>junit</groupId>
Expand All @@ -43,12 +49,22 @@
<artifactId>slf4j-simple</artifactId>
</dependency>

<!-- pig runtime -->
<!-- pig -->
<dependency>
<groupId>org.apache.pig</groupId>
<artifactId>pig</artifactId>
</dependency>

<!-- web -->
<dependency>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
</dependency>
<dependency>
<groupId>org.eclipse.jetty.aggregate</groupId>
<artifactId>jetty-webapp</artifactId>
</dependency>

<!-- hadoop -->
<dependency>
<groupId>org.apache.hadoop</groupId>
Expand All @@ -61,4 +77,27 @@
<version>1.0</version>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
<configuration>
<descriptors>
<descriptor>src/main/assembly/bin.xml</descriptor>
</descriptors>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>

</project>
64 changes: 64 additions & 0 deletions pig/src/main/assembly/bin.xml
@@ -0,0 +1,64 @@
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
<id>bin</id>
<formats>
<format>tar.gz</format>
</formats>

<fileSets>
<!-- scripts -->
<fileSet>
<directory>src/main/scripts</directory>
<outputDirectory>bin</outputDirectory>
<fileMode>0755</fileMode>
<includes>
<include>*</include>
</includes>
</fileSet>

<!-- demo -->
<fileSet>
<directory>src/main/demo</directory>
<outputDirectory>demo</outputDirectory>
<includes>
<include>*</include>
</includes>
</fileSet>

<!-- documentation from parent -->
<fileSet>
<directory>${project.basedir}/..</directory>
<outputDirectory>/</outputDirectory>
<includes>
<include>README*</include>
<include>LICENSE*</include>
<include>NOTICE*</include>
<include>docs/**/*</include>
</includes>
</fileSet>
</fileSets>

<files>
<!-- documentation -->
<file>
<source>README.md</source>
<outputDirectory>/</outputDirectory>
<destName>README-PIG.md</destName>
</file>
</files>

<!-- jars -->
<dependencySets>
<dependencySet>
<outputDirectory>lib</outputDirectory>
<includes>
<include>*:ambrose-client</include>
<include>*:ambrose-pig</include>
<include>org.apache.pig:pig</include>
<include>org.eclipse.jetty.aggregate:jetty-webapp</include>
</includes>
<useTransitiveDependencies>false</useTransitiveDependencies>
</dependencySet>
</dependencySets>
</assembly>
6 changes: 6 additions & 0 deletions pig/src/main/demo/data.tsv
@@ -0,0 +1,6 @@
1 John {(2),(3)} {(4)}
2 Mary {(1),(3)} {}
3 Mike {(1),(2),(4)} {}
4 Jane {(3),(4)} {(1)}
5 Andy {} {(5)}
6 Unicorn Sparklepants {(1),(2),(3),(4),(5)} {}
49 changes: 49 additions & 0 deletions pig/src/main/demo/demo.pig
@@ -0,0 +1,49 @@
/*
Demo pig script for use with pig-ambrose. Reads some data and does some stuff.
*/

%default BASEDIR `pwd`;
%default INPUT_PATH 'file://$BASEDIR/demo';
%default OUTPUT_PATH 'file://$BASEDIR/demo/output';

user = LOAD '$INPUT_PATH/data.tsv' AS (
user_id: long, name: chararray, friends: {(user_id: long)}, enemies: {(user_id: long)}
);

user_counts = FOREACH user GENERATE
user_id, COUNT(friends) AS num_friends, COUNT(enemies) AS num_enemies;

friends_users_histogram = FOREACH (GROUP user_counts BY num_friends) GENERATE
group AS friends, COUNT(user_counts) AS users;

enemies_users_histogram = FOREACH (GROUP user_counts BY num_enemies) GENERATE
group AS enemies, COUNT(user_counts) AS users;

narcissists = FOREACH user GENERATE user_id, name, FLATTEN(friends) AS (friend_id);
narcissists = FILTER narcissists BY user_id == friend_id;
narcissists = FOREACH narcissists GENERATE user_id, name;

masochists = FOREACH user GENERATE user_id, name, FLATTEN(enemies) AS (enemy_id);
masochists = FILTER masochists BY user_id == enemy_id;
masochists = FOREACH masochists GENERATE user_id, name;

user_enemy = FOREACH user GENERATE user_id, name, FLATTEN(enemies) AS (enemy_id);
user_enemy2 = FOREACH user_enemy GENERATE user_id, enemy_id;
user_enemy_enemy = FOREACH (
JOIN user_enemy BY enemy_id, user_enemy2 BY user_id
) GENERATE user_enemy::user_id AS user_id, user_enemy::name AS name,
user_enemy2::enemy_id AS enemy_enemy_id;
user_enemies_of_enemies = FOREACH (GROUP user_enemy_enemy BY user_id) {
name = LIMIT user_enemy_enemy.name 1;
enemies_of_enemies = DISTINCT user_enemy_enemy.enemy_enemy_id;
GENERATE group AS user_id, FLATTEN(name), enemies_of_enemies AS enemies_of_enemies;
}

rmf $OUTPUT_PATH
STORE friends_users_histogram INTO '$OUTPUT_PATH/friends_users_hist';
STORE enemies_users_histogram INTO '$OUTPUT_PATH/enemies_users_hist';
STORE narcissists INTO '$OUTPUT_PATH/narcissists';
STORE masochists INTO '$OUTPUT_PATH/masochists';
STORE user_enemies_of_enemies INTO '$OUTPUT_PATH/frenemies';
31 changes: 4 additions & 27 deletions bin/pig-ambrose → pig/src/main/scripts/pig-ambrose
Expand Up @@ -29,33 +29,10 @@ AMBROSE_HOME="${AMBROSE_HOME:-$(dirname "$0")/..}"
AMBROSE_PORT="${AMBROSE_PORT:-8080}"

# configure paths
LIB_DIR="$AMBROSE_HOME/lib"

# find pig and ambrose jars
if [ ! -f "$PIG_JAR" ]; then
PIG_JAR=$(ls "$LIB_DIR"/pig-*.jar) \
|| die "Failed to find pig jar in path '$LIB_DIR'"
fi
if [ ! -f "$AMBROSE_JAR" ]; then
AMBROSE_JAR=$(ls "$LIB_DIR"/ambrose-*.jar) \
|| die "Failed to find ambrose jar in path '$LIB_DIR'"
fi

# find extra dependencies in Hadoop's lib
if [ ! -d "$HADOOP_HOME" ]; then
HADOOP_HOME=$(ls -d /usr/lib/hadoop) \
|| die "Failed to find HADOOP_HOME"
fi

# Insert some magic into the classpath that pig uses
HADOOP_LIB_DIR="$HADOOP_HOME/lib"
JETTY_JAR=$(ls "$HADOOP_LIB_DIR"/jetty-6*.jar) \
|| die "Failed to find jetty 6 jar within '$HADOOP_LIB_DIR'"
JETTY_UTIL_JAR=$(ls "$HADOOP_LIB_DIR"/jetty-util-6*.jar) \
|| die "Failed to find jetty 6 util jar within '$HADOOP_LIB_DIR'"
SERVLET_API_JAR=$(ls "$HADOOP_LIB_DIR"/servlet-api-*.jar) \
|| die "Failed to find servlet api jar within '$HADOOP_LIB_DIR'"
export PIG_CLASSPATH="$PIG_JAR:$AMBROSE_JAR:$JETTY_JAR:$JETTY_UTIL_JAR:$SERVLET_API_JAR"
PIG_CLASSPATH=\
$(find "$AMBROSE_HOME/lib" -name '*.jar' -exec printf '%s:' '{}' '+')\
"$PIG_CLASSPATH"
export PIG_CLASSPATH="${PIG_CLASSPATH%%:}"

# configure the ambrose pig notification listener and port
export PIG_OPTS="\
Expand Down
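
The new `PIG_CLASSPATH` lines in `pig-ambrose` replace the per-jar lookups with a single `find` over `$AMBROSE_HOME/lib`. The sketch below isolates that idea so it can be tried outside a distribution. It assumes a throwaway directory created with `mktemp`, and `build_classpath`, `example.jar`, and `/opt/extra.jar` are hypothetical names, not part of the commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): join every jar under a
# directory into a colon-separated classpath, then append an existing value.
build_classpath() {
  local lib_dir="$1"
  local existing="$2"
  local jars
  # printf reuses the '%s:' format for each path that find passes via '+',
  # so the output looks like "a.jar:b.jar:" with a trailing colon.
  jars=$(find "$lib_dir" -name '*.jar' -exec printf '%s:' '{}' '+')
  local joined="${jars}${existing}"
  # Strip the trailing colon that remains when "existing" is empty.
  printf '%s\n' "${joined%:}"
}

# Demo against a throwaway directory holding a single empty jar file.
demo_dir=$(mktemp -d)
touch "$demo_dir/example.jar"
classpath_empty=$(build_classpath "$demo_dir" "")
classpath_extra=$(build_classpath "$demo_dir" "/opt/extra.jar")
echo "$classpath_empty"
echo "$classpath_extra"
```

Because the jars found under `lib` are placed ahead of any pre-existing `PIG_CLASSPATH` value, they take precedence over entries the caller supplied. The order in which `find` lists multiple jars is not guaranteed, which is why the demo uses a single file.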
