Initial Commit
tlhunter committed May 3, 2012
0 parents commit 533c2cb
Showing 13 changed files with 1,055 additions and 0 deletions.
52 changes: 52 additions & 0 deletions README.md
@@ -0,0 +1,52 @@
Eve Crawl
===

This is a project I started and then abandoned in 2009. It is a spider built
specifically for crawling websites that host EVE kill mails. In EVE, every time
you kill another player, an in-game 'mail' containing information about the kill
is sent to you. Players would copy this information and paste it into a website,
which would show statistics about the kill.

The goal was to take the information about all of these kills and display it on an
interactive 2D heatmap, showing which areas of space were the most dangerous. A
slider representing time would let the user move through the history of kills.

Unfortunately, I stopped playing EVE and abandoned the project. The crawler, which
worked in 2009, is likely no longer useful. It downloaded pages and ran regular
expressions against them to find data, which worked because most 'kill mail' websites
used the same software. I'm sure that software has changed by now.
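
The regex-based extraction described above can be sketched roughly as follows. This
is a minimal illustration rather than the crawler itself: the `extract_kill_fields`
helper and the sample HTML are made up for this example, the two patterns are modeled
on the ones in `crawler.php`, and the type hints assume PHP 7.1 or later.

```php
<?php
/**
 * Pull the solar system and timestamp out of a kill mail page.
 * Returns null when either field is missing, mirroring how the crawler
 * skips pages it cannot parse. (Hypothetical helper, for illustration only.)
 */
function extract_kill_fields(string $page): ?array
{
    // Patterns modeled on the location and date patterns in crawler.php.
    $location_format = '#system_detail&amp;sys_id=([0-9]+)">([a-zA-Z0-9\- \']+)</a></b>#';
    $date_format = '#<td class=kb-table-cell>([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})</td>#';

    if (!preg_match($location_format, $page, $loc) || !preg_match($date_format, $page, $date)) {
        return null;
    }

    return [
        'system_id'   => (int) $loc[1],
        'system_name' => $loc[2],
        'time'        => $date[1],
    ];
}

// Made-up fragment shaped like the 2009-era kill mail pages.
$sample = '<b><a href="?a=system_detail&amp;sys_id=123">Jita</a></b>'
        . '<td class=kb-table-cell>2009-06-01 12:34:56</td>';

print_r(extract_kill_fields($sample));
```

Because the patterns are tied to exact markup, any change to the kill board's HTML
makes the function return `null`, which is why the crawler is unlikely to work against
current sites.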

File Information
==

| File | Description |
| --- | --- |
| cache.txt | ID of the last crawled page |
| ccp_map_data.zip | Solar system coordinate data; released by CCP |
| crawler.php | Main crawler |
| downloader.php | Page downloading class |
| eve-map-bg.jpg | Background graphic for map |
| eve-map.xml.php | PHP script for generating XML data for flash |
| evecrawler.sql.7z | 189MB database file of a bunch of data |
| map.fla | Raw map building flash file |
| map.swf | Compiled map flash file |
| mysql.ssi.php | Configuration file for database settings |
| rendered.xml | Example of the rendered XML data |
| sample-crawl.htm | An example of the kill mail pages circa 2009 |

What Works
==

I honestly don't remember what state the project is in. I do know that the crawler
worked just fine; I was able to grab a couple million records. The flash map stuff is
probably all broken. Back in 2009 we didn't have any fancy canvas rendering; a better
developer would replace the flash with canvas.

The crawler wasn't as efficient as it could be. Most of the slowness was due to
network latency, so parallel downloads should be used.
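
One way parallel downloads could look is sketched below using PHP's `curl_multi_*`
functions. This is an assumption about how it might be done, not code from the
project: `kill_urls` and `fetch_batch` are hypothetical helpers, the base URL is the
one used in `crawler.php`, and the type hints assume PHP 7 or later.

```php
<?php
/**
 * Build kill-detail URLs for a range of kill IDs, keyed by ID.
 * (Hypothetical helper; the URL format is taken from crawler.php.)
 */
function kill_urls(int $first, int $last): array
{
    $urls = [];
    for ($id = $first; $id <= $last; $id++) {
        $urls[$id] = "http://www.eve-kill.net/?a=kill_detail&kll_id=$id";
    }
    return $urls;
}

/**
 * Download a batch of URLs concurrently.
 * Returns the response bodies keyed the same way as the input array.
 */
function fetch_batch(array $urls, int $timeout = 20): array
{
    if (!$urls) {
        return [];
    }

    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            // Wait for network activity instead of spinning the CPU.
            curl_multi_select($mh, 1.0);
        }
    } while ($running && $status === CURLM_OK);

    $pages = [];
    foreach ($handles as $key => $ch) {
        // May be empty if that particular transfer failed.
        $pages[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    return $pages;
}
```

For example, `fetch_batch(kill_urls(2000000, 2000009))` would request ten pages at
once, and each body could then be parsed the same way as before. The batch size of ten
is arbitrary; keeping batches small and pausing between them would be kinder to the
site being crawled.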

None of the interface development for clicking regions, doing heat maps, or scrolling
through time was ever implemented.

License
==

Released under the BSD License.
1 change: 1 addition & 0 deletions cache.txt
@@ -0,0 +1 @@
2000000
Binary file added ccp_map_data.zip
Binary file not shown.
182 changes: 182 additions & 0 deletions crawler.php
@@ -0,0 +1,182 @@
<?php
include("mysql.ssi.php");
# Naming conventions: http://support.eve-online.com/Pages/KB/Article.aspx?id=37
# Sample page format: http://www.eve-kill.net/?a=kill_detail&kll_id=1000000

# Resume from the last kill ID recorded in cache.txt.
$first_item = (int) trim(file_get_contents("cache.txt"));
$last_item = 3300000;

for ($i = $first_item; $i <= $last_item; $i++) {
    eve_crawl_url("http://www.eve-kill.net/?a=kill_detail&kll_id=$i", $i);
}


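function eve_crawl_url($url, $iteration) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"); # pretend we're IE
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $page = curl_exec($ch);

    # curl_exec() returns false on failure; record progress and skip the page.
    if ($page === false) {
        echo "ERROR: #$iteration, DOWNLOAD FAILED.<br />\n";
        write_cache($iteration);
        return false;
    }

    # strpos() can legitimately return 0, so compare against false explicitly.
    if (strpos($page, "That kill doesn't exist.") !== false) {
        echo "ERROR: Kill Doesn't Exist.<br />\n";
        write_cache($iteration);
        return false;
    }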

    # Solar system where the kill happened.
    $location_format = "#system_detail&amp;sys_id=([0-9]+)\">([a-zA-Z0-9- ']+)</a></b>#";
    preg_match($location_format, $page, $matches);
    $location = isset($matches[2]) ? $matches[2] : '';
    if (empty($location)) {
        echo "ERROR: #$iteration, LOCATION.<br />\n";
        write_cache($iteration);
        return false;
    }
    $location_id = item_to_id($location, "wt_systems");

    # Timestamp of the kill, e.g. 2009-06-01 12:34:56.
    $date_format = "#<td class=kb-table-cell>([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})</td>#";
    preg_match($date_format, $page, $matches);
    $date = isset($matches[1]) ? $matches[1] : '';
    if (empty($date)) {
        echo "ERROR: #$iteration, DATE.<br />\n";
        write_cache($iteration);
        return false;
    }


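    # ISK value of the loss, e.g. 1,234,567.00; a missing value is stored as 0.
    $loss_isk_format = "#<td class=kb-table-cell>([0-9,]+\.[0-9]{2})</td>#";
    preg_match($loss_isk_format, $page, $matches);
    # preg_replace() is used because ereg_replace() was removed in PHP 7.
    $loss_isk = isset($matches[1]) ? round((float) preg_replace("/[^0-9.]/", "", $matches[1])) : 0;
    if (empty($loss_isk)) {
        echo "ERROR: #$iteration, LOSS ISK. Continuing...<br />\n";
        write_cache($iteration);
        $loss_isk = 0;
    }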

    # Victim pilot.
    $victim_format = '#<td class=kb-table-cell><b><a href="\?a=pilot_detail&plt_id=([0-9]+)">([a-zA-Z0-9- \']+)</a></b></td>#';
    preg_match($victim_format, $page, $matches);
    $victim_name = isset($matches[2]) ? $matches[2] : '';
    if (empty($victim_name)) {
        echo "ERROR: #$iteration, VICTIM NAME.<br />\n";
        write_cache($iteration);
        return false;
    }
    $victim_id = item_to_id($victim_name, "wt_player");

    # Victim corporation.
    $victim_corp_format = '#<td class=kb-table-cell><b><a href="\?a=corp_detail&crp_id=([0-9]+)">([a-zA-Z0-9 \'\-.]+)</a></b></td>#';
    preg_match($victim_corp_format, $page, $matches);
    $victim_corp_name = isset($matches[2]) ? $matches[2] : '';
    if (empty($victim_corp_name)) {
        echo "ERROR: #$iteration, VICTIM CORP NAME.<br />\n";
        write_cache($iteration);
        return false;
    }
    $victim_corp_id = item_to_id($victim_corp_name, "wt_corporation");

    # Victim alliance; the closing </a> is treated as optional.
    $victim_alliance_format = '#<b><a href="\?a=alliance_detail&all_id=[0-9]+">([a-zA-Z0-9 \'\-.]+)(?:</a>)?</b></td>#';
    preg_match($victim_alliance_format, $page, $matches);
    $victim_alliance_name = isset($matches[1]) ? $matches[1] : '';
    if (empty($victim_alliance_name)) {
        echo "ERROR: #$iteration, VICTIM ALLIANCE NAME.<br />\n";
        write_cache($iteration);
        return false;
    }
    $victim_alliance_id = item_to_id($victim_alliance_name, "wt_alliance");

    # Victim ship.
    $victim_ship_format = '#<td class=kb-table-cell><b><a href="\?a=invtype&id=([0-9]+)">([a-zA-Z0-9 \'\-.]+)</a></b></td>#';
    preg_match($victim_ship_format, $page, $matches);
    $victim_ship_name = isset($matches[2]) ? $matches[2] : '';
    if (empty($victim_ship_name)) {
        echo "ERROR: #$iteration, VICTIM SHIP NAME.<br />\n";
        write_cache($iteration);
        return false;
    }
    $victim_ship_id = item_to_id($victim_ship_name, "wt_ships");

    # Killers: each list below should end up with one entry per involved pilot.
    # " (Final Blow)" is an optional literal suffix after the pilot name.
    $killer_name_format = '#<a href="\?a=pilot_detail&plt_id=[0-9]+"><b>([0-9a-zA-Z \']+)(?: \(Final Blow\))?</b></a></td>#';
    preg_match_all($killer_name_format, $page, $matches);
    #print_r($matches[1]);
    $killer_names = $matches[1];

    # Hyphens in the character classes are escaped so they match literally
    # instead of forming a range.
    $killer_corp_format = '#<a href="\?a=corp_detail&crp_id=[0-9]+">([0-9a-zA-Z \-.\']+)</a></td>#';
    preg_match_all($killer_corp_format, $page, $matches);
    #print_r($matches[1]);
    $killer_corps = $matches[1];

    $killer_ship_format = '#1px;"><b><a href="\?a=invtype&id=[0-9]+">([0-9a-zA-Z ]+)</a></b></td>#';
    preg_match_all($killer_ship_format, $page, $matches);
    #print_r($matches[1]);
    $killer_ships = $matches[1];

    $killer_alliance_format = '#style="padding-top: 1px; padding-bottom: 1px;"><a href="\?a=alliance_detail&all_id=[0-9]+">([0-9a-zA-Z\-. \']+)</a></td>#';
    preg_match_all($killer_alliance_format, $page, $matches);
    #print_r($matches[1]);
    $killer_alliances = $matches[1];

    # All four killer lists must be the same length to be paired up by index.
    $size_killer_names = sizeof($killer_names);
    $size_killer_corps = sizeof($killer_corps);
    $size_killer_ships = sizeof($killer_ships);
    $size_killer_alliances = sizeof($killer_alliances);
    if ($size_killer_names != $size_killer_corps || $size_killer_ships != $size_killer_alliances || $size_killer_names != $size_killer_ships) {
        echo "ERROR: #$iteration, KILLER ARRAY MISMATCH [n:$size_killer_names, c:$size_killer_corps, s:$size_killer_ships, a:$size_killer_alliances].<br />\n";
        write_cache($iteration);
        return false;
    }

    # Items fitted to the victim's ship.
    $victim_fit_format = '# <td class="kb-table-cell">([0-9a-zA-Z \-\'/.]+)</td>#';
    preg_match_all($victim_fit_format, $page, $matches);
    #print_r($matches[1]);
    $victim_fits = $matches[1];

    # Record the victim's sighting and the ship loss itself.
    runQuery("INSERT INTO wt_player_sighting SET player_id = '$victim_id', corp_id = '$victim_corp_id', alliance_id = '$victim_alliance_id', ship_id = '$victim_ship_id', time='$date', system_id = '$location_id'");
    $player_sighting_id = mysql_insert_id();
    runQuery("INSERT INTO wt_ship_loss SET cost = '$loss_isk', player_sighting_id = '$player_sighting_id'");
    $ship_loss_id = mysql_insert_id();

    # Record a sighting for every killer and link it to the loss.
    for ($i = 0; $i < sizeof($killer_names); $i++) {
        $killer_player_id = item_to_id($killer_names[$i], "wt_player");
        $killer_corp_id = item_to_id($killer_corps[$i], "wt_corporation");
        $killer_ship_id = item_to_id($killer_ships[$i], "wt_ships");
        $killer_alliance_id = item_to_id($killer_alliances[$i], "wt_alliance");
        runQuery("INSERT INTO wt_player_sighting SET player_id = '$killer_player_id', corp_id = '$killer_corp_id', alliance_id = '$killer_alliance_id', ship_id = '$killer_ship_id', time='$date', system_id = '$location_id'");
        $killer_sighting_id = mysql_insert_id();
        runQuery("INSERT INTO wt_ship_kill SET player_instance_id = '$killer_sighting_id', ship_loss_id = '$ship_loss_id'");
    }

    # Link every fitted item to the loss.
    for ($i = 0; $i < sizeof($victim_fits); $i++) {
        $item_id = item_to_id($victim_fits[$i], "wt_items");
        runQuery("INSERT INTO wt_item_to_ship_loss SET ship_loss_id = '$ship_loss_id', item_id = '$item_id'");
    }
    write_cache($iteration);

    # Restart PHP's execution timer after each successfully stored kill.
    set_time_limit(10);

    return true;
}

function item_to_id($value, $table, $column = 'name') {
    # We are given a string to add, and the table to add it to.
    # If the value already exists, we return its ID.
    # Otherwise, we insert it and return the new ID.
    $value = addslashes($value);
    $sql = "SELECT id FROM $table WHERE $column = '$value' LIMIT 1";
    $result = runQuery($sql);
    if (mysql_num_rows($result)) {
        $row = mysql_fetch_assoc($result);
        return $row['id'];
    } else {
        $sql = "INSERT INTO $table SET $column = '$value'";
        runQuery($sql);
        return mysql_insert_id();
    }
}

function write_cache($iteration) {
    # Persist the last processed kill ID so a later run can resume from it.
    $fp = fopen("cache.txt", 'w');
    fwrite($fp, $iteration);
    fclose($fp);
}

1 change: 1 addition & 0 deletions downloader.php
@@ -0,0 +1 @@
<style>* { font-size: 10px; font-family: verdana;}</style><?php
$base_url = "http://www.eve-kill.net/?a=kill_detail&kll_id=";
$folder = "./cache/" . getDomain($base_url) . "/";

# Make sure the target folder exists before writing pages into it.
if (!is_dir($folder)) {
    mkdir($folder, 0777, true);
}

$ch = curl_init();
for ($i = 1; $i <= 3237300; $i++) {
    $url = $base_url . $i;
    echo "$url<br />\n";
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"); # pretend we're IE
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $page = curl_exec($ch);
    echo strlen($page) . " bytes<br />\n";
    $filename = "$folder$i.htm";
    echo "$filename<br />\n";
    $fp = fopen($filename, 'w');
    fwrite($fp, $page);
    fclose($fp);
    echo "<br />\n";
    if ($i > 100) exit(); # stop after a small test batch
}

function getDomain($url) {
    # FILTER_VALIDATE_URL already requires a host, so no extra flag is needed.
    if (filter_var($url, FILTER_VALIDATE_URL) === FALSE) {
        return false;
    }
    /*** get the url parts ***/
    $parts = parse_url($url);
    /*** return the host domain ***/
    return str_replace("www.", "", $parts['host']);
}
Binary file added eve-map-bg.jpg
59 changes: 59 additions & 0 deletions eve-map.xml.php
@@ -0,0 +1,59 @@
<?php
define("WIDTH", 800);
define("HEIGHT", 600);
define("MARGIN", 20);
$region = 'Curse';

include("mysql.ssi.php");

$sql = "SELECT id, name, x/1e17 AS x, z/1e17 AS y
        FROM wt_systems_map
        WHERE region_id = (
            SELECT id
            FROM `wt_regions`
            WHERE name = '$region')";

$result = runQuery($sql);

# Track the bounding box of the region's systems.
$mm['max_x'] = -INF;
$mm['max_y'] = -INF;
$mm['min_x'] = INF;
$mm['min_y'] = INF;
$rows = array();

while ($row = mysql_fetch_assoc($result)) {
    if ($row['x'] > $mm['max_x'])
        $mm['max_x'] = $row['x'];
    if ($row['y'] > $mm['max_y'])
        $mm['max_y'] = $row['y'];
    if ($row['x'] < $mm['min_x'])
        $mm['min_x'] = $row['x'];
    if ($row['y'] < $mm['min_y'])
        $mm['min_y'] = $row['y'];
    $rows[] = $row;
}

$effective_width = WIDTH - MARGIN * 2;
$effective_height = HEIGHT - MARGIN * 2;

# Extent of the region along each axis; zero if every system shares a coordinate.
$range_x = $mm['max_x'] - $mm['min_x'];
$range_y = $mm['max_y'] - $mm['min_y'];

echo "<systems count='" . count($rows) . "'>\n";
foreach ($rows AS $system) {
    # Normalize to 0..1, then scale into the drawable area (centered if the range is zero).
    $system['x'] = $range_x ? ($system['x'] - $mm['min_x']) / $range_x : 0.5;
    $system['x'] = round($system['x'] * $effective_width + MARGIN); # may want to offload this to Flash eventually
    $system['y'] = $range_y ? ($system['y'] - $mm['min_y']) / $range_y : 0.5;
    $system['y'] = round($system['y'] * $effective_height + MARGIN); # may want to offload this to Flash eventually
    # Escape the name so quotes in it cannot break the attribute.
    $name = htmlspecialchars($system['name'], ENT_QUOTES);
    echo "\t<system systemId='{$system['id']}' systemName='{$name}' xPixel='{$system['x']}' yPixel='{$system['y']}' />\n";
}
echo "</systems>\n";

$sql = "SELECT from_system_id AS sFrom, to_system_id AS sTo FROM wt_systems_jumps WHERE from_region_id = (
            SELECT id
            FROM `wt_regions`
            WHERE name = '$region' )";
$result = runQuery($sql);
echo "<connections count='" . mysql_num_rows($result) . "'>\n";
while ($row = mysql_fetch_assoc($result)) {
    echo "\t<connection fromSystemId='{$row['sFrom']}' toSystemId='{$row['sTo']}' />\n";
}
echo "</connections>\n";

#print_r($rows);
Binary file added evecrawler.sql.7z
Binary file not shown.
Binary file added map.fla
Binary file not shown.
Binary file added map.swf
Binary file not shown.
14 changes: 14 additions & 0 deletions mysql.ssi.php
@@ -0,0 +1,14 @@
<?php
# The mysql_* functions used throughout this project were removed in PHP 7;
# running it on a current PHP requires porting to mysqli or PDO.
function runQuery($query) {
    $mySqlServer = 'localhost';
    $mySqlUser = 'root';
    $mySqlPass = '';
    $mySqlDB = 'evecrawler';
    $connect = mysql_connect($mySqlServer, $mySqlUser, $mySqlPass);
    if (!$connect) {
        die("<div class=\"error\">" . mysql_error() . "</div>");
    }
    mysql_select_db($mySqlDB, $connect);
    $result = mysql_query($query, $connect);
    return $result;
}