Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PerfEnforce on Myria #868

Merged
merged 24 commits into from
Mar 7, 2017
Merged

PerfEnforce on Myria #868

merged 24 commits into from
Mar 7, 2017

Conversation

jortiz16
Copy link
Contributor

This PR adds the PerfEnforce research component to Myria.

I tried to keep PerfEnforce as self-contained as possible, but there are a couple features I did have to add to Myria in order to get it to work:

  • Ability to create materialized views
  • Ability to execute a SQL command directly on Postgres (uses the DbExecute operator)
  • Ability to execute a SQL command and return the results in an array
  • For TupleSink, I added the option to remove the column header

The entry point for the PerfEnforce code is through the PerfEnforceDriver class. This class focuses on two steps:

  1. Ingesting data, collecting statistics and generating the PSLA. This is all enclosed in the PerfEnforceDataPreparation class.
  2. The learning model that will schedule the incoming queries at different cluster sizes in order to meet the SLAs. This can be seen in the PerfEnforceOnlineLearning class.

To actually see this work, the user will need to specify the "PerfEnforce" flag through myria-cluster, which will create a 12-node cluster. The web interface will be sending calls to the PerfEnforce API found in the PerfEnforceResource class.


public PerfEnforceStatisticsEncoding(
final String table_name, final long table_size, final List<String> selectivityList) {
this(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just merge these two constructors since the second one is never called except by the first one.

*/
@JsonCreator
public PerfEnforceTableEncoding(
@JsonProperty(value = "relationKey", required = true) final RelationKey relationKey,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason of using @JsonProperty here but not in PerfEnforceStatisticsEncoding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. I just didn't use the JSON encoding for PerfEnforceStatisticsEncoding. I will add it to keep it all consistent.

final Operator child,
final TupleWriter tupleWriter,
final DataSink dataSink,
boolean includeColumnHeader) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final boolean

currentWorkerFeatures =
currentWorkerFeatures.substring(currentWorkerFeatures.indexOf("cost"));
currentWorkerFeatures = currentWorkerFeatures.replace("\"", " ");
String[] cmd = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be replaced by String.replace() to save launching a process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is better

* @param keys the set of columns to concatenate
* @param schema the schema to create
*/
public static Schema getAttributeKeySchema(final Set<Integer> keys, final Schema schema) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return schema.getSubSchema(Ints.toArray(keys));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know this existed! So much nicer now

* @param keys the set of columns to concatenate
* @param schema the schema of the relation
*/
public static String getAttributeKeyString(final Set<Integer> keys, final Schema schema) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joiner.on(",").join(getAttributeKeySchema(keys, schema).getColumnNames())

String explainQuery = "EXPLAIN " + sqlQuery;
List<String> featuresList = new ArrayList<String>();
String maxFeature = "";
double maxCost = Integer.MIN_VALUE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to check: is Integer.MIN_VALUE small enough? (Is cost always positive?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that the minimum cost for query execution from Postgres is 0.00. I can't find exact document to back this up, but I can't imagine there would be a query cost less than 0. In addition, I'm also only running SPJ queries. I'm not running indexes, stored procedures, or anything outside the ordinary :)

if (queryRuntime != 0) {
String[] parts = result.split(",");
result =
parts[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Joiner would save some space.

Process p = Runtime.getRuntime().exec(arrayCommand);

BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
while ((reader.readLine()) != null) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended? If you only want to launch a process I believe there exist ways to discard its output. (There is another place above)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I ran into an issue where the process would just hang even when I tried to flush the error stream. I ended up following suggestions from http://stackoverflow.com/questions/5483830/process-waitfor-never-returns.

while ((currentLine = streamReader.readLine()) != null) {
nextQueryPrediction = Double.parseDouble((currentLine.split(",")[0]).split(":")[1]);
}
streamReader.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put it in finally is safer. (Also there are several places with the similar issue.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try-with-resources, not finally :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently I need to update my knowledge on it :)

@jingjingwang
Copy link
Contributor

@jortiz16 I made a pass, although I have to admit that I didn't run any test :) But if it works on your side then I believe it's okay. BTW this branch has conflicts to be resolved.

@jingjingwang
Copy link
Contributor

Also just curious: is it going to run in Windows?

@jingjingwang
Copy link
Contributor

Another thing is: many Javadocs are missing. If you use Eclipse, I think it checks them automatically if the option is turned on.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.6%) to 26.497% when pulling 756bb12 on perfenforce-merge into 6c56e3e on master.

@jortiz16
Copy link
Contributor Author

I had to do a rebase, but the only new commits are the last two ("addressing comments" and "modify based on rebase"). I still need to test to make sure that the new changes did not introduce bugs. For example, I had to change the PerfEnforceDataPreparation because I wasn't using the new BroadcastDistributeFunction.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.6%) to 26.457% when pulling a70ed21 on perfenforce-merge into d282c44 on master.

@jortiz16
Copy link
Contributor Author

Ok ready for another pass. I tested these changes on my own cluster deployment. Thanks!

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.6%) to 26.44% when pulling 0eced3c on perfenforce-merge into d282c44 on master.

@jortiz16
Copy link
Contributor Author

Hey @jingjingwang would you happen to know why the coveralls test fails by any chance? I looked at some of the classes they reference, but it did not seem relevant to the changes I made...

Copy link
Contributor

@jingjingwang jingjingwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only nitpicking on HTTP methods.

public Response prepareData(final List<PerfEnforceTableEncoding> tableList)
throws PerfEnforceException, DbException, Exception {
server.getPerfEnforceDriver().preparePSLA(tableList);
return Response.noContent().build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this maps to HTTP return code 204. I'm not sure if 201 is more appropriate, if this POST creates any resource then 201 sounds better.

*/
@POST
@Path("/setTier")
@Consumes(MediaType.MULTIPART_FORM_DATA)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using MULTIPART_FORM_DATA usually means that the input form contains some files, which seems to be not your case. (Same in other places)

@Path("/findSLA")
@Consumes(MediaType.MULTIPART_FORM_DATA)
public Response findSLA(@FormDataParam("querySQL") final String querySQL) throws Exception {
server.getPerfEnforceDriver().findSLA(querySQL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's only getting some data but not modifying any server state, then GET is more appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm misunderstanding, but I have this as a POST since /findSLA requires data to be submitted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.4%) to 25.656% when pulling f5984fc on perfenforce-merge into d282c44 on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.4%) to 25.697% when pulling f5984fc on perfenforce-merge into d282c44 on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.4%) to 25.682% when pulling e4d22a8 on perfenforce-merge into d282c44 on master.

@jingjingwang
Copy link
Contributor

LGTM!

@jingjingwang jingjingwang reopened this Mar 7, 2017
@jingjingwang jingjingwang merged commit b413944 into master Mar 7, 2017
@jingjingwang jingjingwang deleted the perfenforce-merge branch March 7, 2017 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants