-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PerfEnforce on Myria #868
PerfEnforce on Myria #868
Conversation
|
||
public PerfEnforceStatisticsEncoding( | ||
final String table_name, final long table_size, final List<String> selectivityList) { | ||
this( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just merge these two constructors since the second one is never called except by the first one.
*/ | ||
@JsonCreator | ||
public PerfEnforceTableEncoding( | ||
@JsonProperty(value = "relationKey", required = true) final RelationKey relationKey, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason of using @JsonProperty
here but not in PerfEnforceStatisticsEncoding
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. I just didn't use the JSON encoding for PerfEnforceStatisticsEncoding
. I will add it to keep it all consistent.
final Operator child, | ||
final TupleWriter tupleWriter, | ||
final DataSink dataSink, | ||
boolean includeColumnHeader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final boolean
currentWorkerFeatures = | ||
currentWorkerFeatures.substring(currentWorkerFeatures.indexOf("cost")); | ||
currentWorkerFeatures = currentWorkerFeatures.replace("\"", " "); | ||
String[] cmd = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be replaced by String.replace()
to save launching a process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This is better
* @param keys the set of columns to concatenate | ||
* @param schema the schema to create | ||
*/ | ||
public static Schema getAttributeKeySchema(final Set<Integer> keys, final Schema schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return schema.getSubSchema(Ints.toArray(keys));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know this existed! So much nicer now
* @param keys the set of columns to concatenate | ||
* @param schema the schema of the relation | ||
*/ | ||
public static String getAttributeKeyString(final Set<Integer> keys, final Schema schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Joiner.on(",").join(getAttributeKeySchema(keys, schema).getColumnNames())
String explainQuery = "EXPLAIN " + sqlQuery; | ||
List<String> featuresList = new ArrayList<String>(); | ||
String maxFeature = ""; | ||
double maxCost = Integer.MIN_VALUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to check: is Integer.MIN_VALUE
small enough? (Is cost
always positive?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that the minimum cost for query execution from Postgres is 0.00. I can't find exact document to back this up, but I can't imagine there would be a query cost less than 0. In addition, I'm also only running SPJ queries. I'm not running indexes, stored procedures, or anything outside the ordinary :)
if (queryRuntime != 0) { | ||
String[] parts = result.split(","); | ||
result = | ||
parts[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Joiner would save some space.
Process p = Runtime.getRuntime().exec(arrayCommand); | ||
|
||
BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream())); | ||
while ((reader.readLine()) != null) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intended? If you only want to launch a process I believe there exist ways to discard its output. (There is another place above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I ran into an issue where the process would just hang even when I tried to flush the error stream. I ended up following suggestions from http://stackoverflow.com/questions/5483830/process-waitfor-never-returns.
while ((currentLine = streamReader.readLine()) != null) { | ||
nextQueryPrediction = Double.parseDouble((currentLine.split(",")[0]).split(":")[1]); | ||
} | ||
streamReader.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put it in finally
is safer. (Also there are several places with the similar issue.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try-with-resources, not finally
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently I need to update my knowledge on it :)
@jortiz16 I made a pass, although I have to admit that I didn't run any test :) But if it works on your side then I believe it's okay. BTW this branch has conflicts to be resolved. |
Also just curious: is it going to run in Windows? |
Another thing is: many Javadocs are missing. If you use Eclipse, I think it checks them automatically if the option is turned on. |
0d07b7b
to
756bb12
Compare
I had to do a rebase, but the only new commits are the last two ("addressing comments" and "modify based on rebase"). I still need to test to make sure that the new changes did not introduce bugs. For example, I had to change the PerfEnforceDataPreparation because I wasn't using the new |
756bb12
to
a70ed21
Compare
Ok ready for another pass. I tested these changes on my own cluster deployment. Thanks! |
Hey @jingjingwang would you happen to know why the coveralls test fails by any chance? I looked at some of the classes they reference, but it did not seem relevant to the changes I made... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only nitpicking on HTTP methods.
public Response prepareData(final List<PerfEnforceTableEncoding> tableList) | ||
throws PerfEnforceException, DbException, Exception { | ||
server.getPerfEnforceDriver().preparePSLA(tableList); | ||
return Response.noContent().build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this maps to HTTP return code 204. I'm not sure if 201 is more appropriate, if this POST creates any resource then 201 sounds better.
*/ | ||
@POST | ||
@Path("/setTier") | ||
@Consumes(MediaType.MULTIPART_FORM_DATA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using MULTIPART_FORM_DATA
usually means that the input form contains some files, which seems to be not your case. (Same in other places)
@Path("/findSLA") | ||
@Consumes(MediaType.MULTIPART_FORM_DATA) | ||
public Response findSLA(@FormDataParam("querySQL") final String querySQL) throws Exception { | ||
server.getPerfEnforceDriver().findSLA(querySQL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's only getting some data but not modifying any server state, then GET
is more appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm misunderstanding, but I have this as a POST
since /findSLA
requires data to be submitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/uwescience/myria/blob/master/src/edu/washington/escience/myria/api/QueryResource.java#L238 is an example of using GET with String parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
LGTM! |
This PR adds the PerfEnforce research component to Myria.
I tried to keep PerfEnforce as self-contained as possible, but there are a couple features I did have to add to Myria in order to get it to work:
The entry point for the PerfEnforce code is through the
PerfEnforceDriver
class. This class focuses on two steps:PerfEnforceDataPreparation
class.PerfEnforceOnlineLearning
class.To actually see this work, the user will need to specify the "PerfEnforce" flag through myria-cluster, which will create a 12-node cluster. The web interface will be sending calls to the PerfEnforce API found in the
PerfEnforceResource
class.