New feature: database API #6
Thank you so much for your awesome words. I love your idea, but I really don't have enough time to do it. It is a huge amount of work, you should know. |
Hi @thieu1995 I'm glad to hear that you like the idea. I am willing to do this myself in a pull request; I would just need you to supervise it and give feedback. Would you please take the time to comment on each of these points, after which I will complete the database based on the set of functions already available in the opfunu library.

**The shape of the data structure**
I think the current shape (a list of dictionaries) is a good start. It can easily be transformed into a pandas dataframe.

**Where to put this code**
I have a few options for this one

**The fields to include and how to represent each of them**
Are the fields I've added correct / useful? I added these fields based on what I found useful, but also based on what I could consistently add. Here I have listed them for you along with how I believe they ought to be represented (they are in alphabetical order)

**The correctness of each field**
Although I can manually add these fields, or even write a web scraper to get many of them, both of those methods may result in errors. I don't know how to assert that they are correct (other than double-checking), but I guess that's open source for you: people can point out the mistakes and fix them. |
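As a concrete illustration of the list-of-dictionaries shape: it converts to a pandas dataframe in one call and then supports attribute-based filtering. The field names below are invented examples for the sketch, not opfunu's actual schema:

```python
import pandas as pd

# Hypothetical entries in the proposed database; the field names are
# illustrative examples only, not opfunu's actual schema.
data = [
    dict(name='Adjiman', continuous=True, convex=False, n_dimensions=2),
    dict(name='Sphere', continuous=True, convex=True, n_dimensions=None),
]

# One call turns the list of dictionaries into a dataframe.
df = pd.DataFrame(data)

# Filtering benchmarks by an attribute becomes a one-liner.
convex_names = df[df['convex']]['name'].tolist()
print(convex_names)
```

The same list could later feed other consumers (docs generation, notebooks) without changing the storage format.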
Oh, after I re-read all of what you have written here, I remember: I once thought about writing documents for opfunu, with all of the properties you listed above. I tried with readthedocs, because it would reduce the time to write the documents. But I failed like 2 or 3 times. You can actually see its document here: I did use the comment format as required but it wasn't showing the document on the website, so I gave up. But now I know what caused the problem, because I've successfully built another library with complete documents You can use search to find anything you want in there, any algorithm or tag or property. So what do you think? Instead of writing such a field to opfunu, we can just re-update the comments and fix the bug with readthedocs. I still don't know what we can do with pandas functionalities when adding a module db.py to each module (type_based, multi_model, ...)? Because if users want to know about function characteristics, they can search on the doc's website. However, opfunu is still missing an important functionality, which is drawing. I also tried it long ago but was not successful with 3D plotting. Now I found a really good repository where the author implements the drawing functions and code in a very clear way. And another question: I would like to ask your suggestion. Currently, there are 2 types of programming in opfunu (functional and OOP-class). |
Greetings @thieu1995 . I appreciate this conversation and hope that it will benefit the repository.

**Docs vs the Database**
I agree with your point on adding the details to the documentation, but I believe having it in physical code is also important. What it boils down to is being able to programmatically filter benchmark functions based on attributes, run simulations of each benchmark (with its own meta-parameters) and export results, all in one pipeline. Having all of the details of each method, including the physical implementation, will allow users to programmatically run experiments and draw conclusions (something that I wish I had when working on my projects and writing papers). But I think we should have both the database and the docs. Even better, the docs can be populated from the database (thus we'd only need to update the database and the docs would automatically be updated).

**3D plotting**
I was actually thinking of adding 3D plotting to opfunu next. I made lots of plots in my previous projects for my course on Computational Intelligence. It was some simple matplotlib code, but it is rather specific, so having it built in for the user's convenience would be good. As for the library https://github.com/AxelThevenot/Python_Benchmark_Test_Optimization_Function_Single_Objective/blob/main/pybenchfunction/function.py, I agree, it does seem very useful. We can even incorporate the code for 3D plotting into opfunu (or invite them to add it). I also think that the details that @AxelThevenot added to each method will be instrumental in speeding up adding new fields to the database and asserting its correctness.

**Functional vs OOP**
I would love to contribute my suggestion, but unfortunately, I don't have enough information. Could you perhaps post examples comparing the two? |
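To make the "all in one pipeline" idea concrete, here is a rough sketch: filter by attribute, run an experiment on each selected benchmark, export the results. The database entries, the `method` callables, and the toy random-sampling experiment are hypothetical stand-ins, not opfunu's real API:

```python
import random

# Hypothetical database entries; each carries metadata plus the
# callable implementation (both invented for this sketch).
database = [
    dict(name='Adjiman', convex=False, method=lambda x, y: x + y),
    dict(name='Sphere',  convex=True,  method=lambda x, y: x**2 + y**2),
]

# 1. Filter benchmarks programmatically on an attribute.
selected = [entry for entry in database if not entry['convex']]

# 2. Run a toy experiment on each selected benchmark.
results = []
for entry in selected:
    random.seed(0)  # reproducible toy run
    best = min(entry['method'](random.uniform(-1, 1), random.uniform(-1, 1))
               for _ in range(100))
    results.append({'name': entry['name'], 'best': best})

# 3. Export the results (printed here; could be a csv or dataframe).
for row in results:
    print(row['name'], row['best'])
```

The point of the sketch is only that all three stages operate on the same data structure, so they compose into one script.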
**Docs vs Database**
I get it now and agree with you. I'm not sure how to do it with one pipeline, but I think with your imagination we can do it.

**3D plotting**
I think I can spend time to re-structure opfunu as that guy did in his repository, so any function can pull out its 2D or 3D figures.

**Functional vs OOP**
You can see it in the readme.md file; I give an example of how to call a function or class in opfunu. For example, the CEC-2014 module

Anyway, it is just a way to structure the code and the way to call the function out.

**Where to put this code**
I think we can start with option 3.

**The fields to include and how to represent each of them**
|
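As a sketch of what built-in drawing could look like, here is minimal matplotlib code for a 3D surface of a two-variable function. The Adjiman formula is used purely as an example, and opfunu has no such plotting helper yet; this is an assumption of what the feature might be:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render off-screen so the sketch runs anywhere
import matplotlib.pyplot as plt

def adjiman(x, y):
    # f(x, y) = cos(x)sin(y) - x / (y^2 + 1)
    return np.cos(x) * np.sin(y) - x / (y**2 + 1)

# Evaluate the function on a grid and draw its 3D surface.
X, Y = np.meshgrid(np.linspace(-2, 2, 100), np.linspace(-2, 2, 100))
Z = adjiman(X, Y)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_title('Adjiman')
fig.savefig('adjiman_3d.png')
```

A built-in helper along these lines would only need the function and its input domain, both of which the proposed database already records.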
**Docs vs Database**
I'm glad we can agree on this. I'll implement the database first and then we can look at automatically generating the docs from there.

**3D plotting**
I agree with using his 3D plotting, but I am sceptical about restructuring the package to use OOP (see next point).

**Functional vs OOP**
The implementation of my database approach essentially creates a dictionary for each benchmark function, where the dictionary contains "metadata" of the benchmark along with the actual benchmark python implementation which can simply be called using the

If you wish to take the OOP approach, then my database implementation will introduce redundancy, since the values in each dictionary (such as the latex formula) would duplicate the class attributes. For example, given the class

```python
class Adjiman:
    name = 'Adjiman'
    latex_formula = r'f(x, y)=cos(x)sin(y) - \frac{x}{y^2+1}'
```

the database implementation would be

```python
data = [
    dict(
        name='Adjiman',
        latex_formula=r'f(x, y)=cos(x)sin(y) - \frac{x}{y^2+1}',
    ),
    ...
]
```

Thus, both would have the fields

If you wish to use OOP for each benchmark, we can convert the database to loading the classes in memory and calling the metadata that python reveals for us, such as

```python
>>> # retrieving from the class itself
... Adjiman.__dict__
mappingproxy({'__module__': '__main__',
              'name': 'Adjiman',
              'latex_formula': 'f(x, y)=cos(x)sin(y) - \\frac{x}{y^2+1}',
              '__dict__': <attribute '__dict__' of 'Adjiman' objects>,
              '__weakref__': <attribute '__weakref__' of 'Adjiman' objects>,
              '__doc__': None})
>>> # retrieving from an instance
... a = Adjiman()
... a.__class__.__dict__
mappingproxy({'__module__': '__main__',
              'name': 'Adjiman',
              'latex_formula': 'f(x, y)=cos(x)sin(y) - \\frac{x}{y^2+1}',
              '__dict__': <attribute '__dict__' of 'Adjiman' objects>,
              '__weakref__': <attribute '__weakref__' of 'Adjiman' objects>,
              '__doc__': None})
```

Of course, this approach is ugly, as it also includes entries such as `__module__` and `__doc__`.

In this approach of using OOP, it would be desirable for the classes to inherit from a base class.

**Continuous development plan if we take the OOP approach**
To conclude, I am open to the idea of using OOP. Personally, it doesn't matter to me. I do think OOP gives us more control over customizing a benchmark function, so perhaps we should go for it. We'd have to take a continuous development approach:
|
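If the OOP route were taken, the "ugly" dunder entries mentioned above could be filtered out generically. A minimal sketch, with the `Adjiman` class re-declared for self-containment and the helper name `class_metadata` being my own invention, not an opfunu API:

```python
class Adjiman:
    name = 'Adjiman'
    latex_formula = r'f(x, y)=cos(x)sin(y) - \frac{x}{y^2+1}'

def class_metadata(cls):
    """Return the class attributes that look like metadata, dropping
    Python's dunder entries such as __module__ and __doc__."""
    return {k: v for k, v in vars(cls).items() if not k.startswith('__')}

print(class_metadata(Adjiman))
```

This would let a database be built from the classes themselves, avoiding the duplication concern raised above.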
Ah, I see. Then I think we should keep the functional style. And now, because we are using a database, I think we don't need to split the benchmark functions into multiple modules as I did (type_based, or dimension_based). I think we can group them into 1 single module. |
@thieu1995 , I updated my message, and I believe I finished the edit after you had already started formulating a response. I just want to confirm: is your message made with respect to the latest version of my message that contains the "Continuous development plan if we take the OOP approach" section? |
To be clear, I don't know whether the functional approach or OOP is better. The easiest would be to just stick to opfunu's current functional approach and then add the database. The question is whether OOP offers a benefit. Does OOP give us some desired control over benchmark methods, such as, say, parameterized versions of benchmarks? |
Lol, I only read the part above that you wrote. So that is the point of my suggestion: in the repo above, he has already implemented the OOP style, and you can search the functions by properties as you wish with the database.

**Continuous development plan if we take the OOP approach**
Right now, I only consider the non-parameterized benchmark functions. But we may think about creating a new module for parameterized functions in the future.

**My suggestion**
Yes, we should stick to the current functional style. But I still want to re-name the functions as public functions. For example:

Also, can you try the database with the type_based and dimension_based modules first? Leave the cec for later. I want to see how the database works with them before moving to the cec functions. |
**Continuous development plan if we take the OOP approach**
I agree. Let's only consider parameterized benchmarks for a future OOP overhaul.

**My suggestion**
I'll leave the name change to you for later. Right now I will write my suggested database approach with references to benchmarks in |
What about my question above? Do you suggest grouping type_based and dimension_based into 1 module? Because, like I said, several functions have been duplicated in both modules. And please create a new branch when you want to push something new. The branch name should be "dev/feature_name" or "dev/your_name", I don't mind. |
@thieu1995, sorry for missing your question. I think we can combine them, yes. Should I do it in this PR or leave it for later? I was thinking for later. Understood, I will name the branch accordingly. |
I guess it depends on you. Do you want to create the database first and then group them, or do you want to group them into a single module first and then design the database? Besides, to not waste your time, you should try to create the database for some functions only and then test the pipeline or whatever you want first. If it works as you expect, then you can apply it to the rest of the functions. |
We want to group the functions in any case, so let's group them in a separate PR (or you can do it yourself). If we group them in this PR and end up scrapping this PR, then we'd have to group them again or do some commit-picking black magic to extract the grouping part of the PR. I agree. I will make the database for the two files we discussed and then make some notebooks showing off use cases to ensure that they work in the way I believe the user would desire. |
Yeah, then let's leave it for later and for another PR. |
I am populating the fields of each benchmark using the following criteria (in order)
**Question**
If I cannot find a tag such as
Looking at Thevenot's implementation of EggHolder, I see that it has convex set to False: https://github.com/AxelThevenot/Python_Benchmark_Test_Optimization_Function_Single_Objective/blob/91c37d9d0f1f3366064004fdb3dd23e5c2681712/pybenchfunction/function.py#L981. For now, I will assume the answer to this question is "yes". |
Yes, if not convex, you can tag it non-convex. We can change it later if it is convex. Just do what you think is good. |
Hello 👋 I saw you were speaking about refactoring your project like mine in some ways. About the question of whether EggHolder is convex or not: it is possible I made a mistake. It was hard work, so maybe there is more than one mistake, so do not take my parameters as if they were perfect :) |
Hi @AxelThevenot , yes, we spoke about refactoring the project. But now we have decided to keep the current style and test the new features first. |
**Progress Update**
I've realised how much manual labour this is and have written a web scraper to get the data from
I've successfully crawled the data from those two websites. My next goal is to parse the data from the markdown files in https://github.com/mazhar-ansari-ardeh/BenchmarkFcns/tree/gh-pages/benchmarkfcns. After that, I will select the cleanest combination of the data and then test the database in some notebooks where I run experiments showing how the database can be used. If some of the data disagree with each other, I will flag it here so you can advise (e.g. one source claims that a method is convex while another says it is non-convex). |
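For the markdown-parsing step, something along these lines may be the shape of the work. The sample text, the keys, and the `Key: value` layout are invented for illustration and do not reflect the real layout of the BenchmarkFcns pages:

```python
import re

# An invented markdown snippet standing in for one benchmark's page.
sample_md = """
# Ackley Function
Continuous: yes
Convex: no
Dimensions: d
"""

def parse_properties(text):
    """Pull simple 'Key: value' lines out of a markdown page and
    return them as a lowercase-keyed dictionary."""
    props = {}
    for key, value in re.findall(r'^(\w+):\s*(.+)$', text, flags=re.MULTILINE):
        props[key.lower()] = value.strip()
    return props

print(parse_properties(sample_md))
```

The real parser would need to match each page's actual headings and tables, but the output shape (one dictionary per function) would slot straight into the proposed database.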
Yeah, I appreciate the heads-up! |
**Progress Update**
Been doing other work the last week, but yesterday I finished crawling the markdown files in https://github.com/mazhar-ansari-ardeh/BenchmarkFcns/tree/gh-pages/benchmarkfcns. I'm currently matching all the functions across the different sources. So the next step is to find how I can best combine them (for instance, how do I decide which source's input domain to keep?). I'm making good progress. |
Here is a preview of the data I've collected so far. There is also a jupyter notebook in the same directory showcasing how I collected the data. Each item in the list is a dictionary where the keys are

Some dictionaries don't contain data for things like

Have a look when you get the chance. |
These are just data that overlap with each other from different sources, I still need to add the data that is from individual sources and then still map the data to benchmark functions that you've implemented. Then I need to find a way to concisely list them as a database. |
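One possible way to combine the per-source data while surfacing disagreements for review, as a sketch (the source names, function names, and fields here are illustrative, not the actual crawled data):

```python
# Invented per-source data; each source maps function name -> fields.
sources = {
    'benchmarkfcns': {'eggholder': {'convex': False}},
    'thevenot':      {'eggholder': {'convex': True}},
}

def merge_with_conflicts(sources):
    """Merge per-source field dictionaries; when two sources disagree
    on a field, keep the first value and record the conflict."""
    merged, conflicts = {}, []
    for source, functions in sources.items():
        for fname, fields in functions.items():
            entry = merged.setdefault(fname, {})
            for field, value in fields.items():
                if field in entry and entry[field] != value:
                    conflicts.append((fname, field))
                else:
                    entry[field] = value
    return merged, conflicts

merged, conflicts = merge_with_conflicts(sources)
print(conflicts)  # disagreements to flag for manual review
```

Recording conflicts rather than silently picking a winner matches the plan above of flagging disagreements here for advice.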
That is a really great job. But I think you should test with a few functions first, then build the database, functionalities, and pipeline that you want. Don't spend too much time correcting each function's properties right now. When the database and functionalities that you design work as you expect, then we can come back and finish all the other functions. |
You are 100% correct. I'm currently switching gears and building notebooks to show off what we can do. Here are the main things I want to showcase using the database:
Will keep you posted. |
Any news on your progress? |
Hi @thieu1995 I haven't made any updates since my last comment - been busy with work and other hobby projects. But I've hit an obstacle with some of it, so I think it would be good to take my mind off of it and continue with my work here |
Thanks for letting me know. |
I really like this repository and used it in my Computational Intelligence course back in university. However, I wish that it also included a database for each benchmark method.
This would be really useful if someone would like to use and compare benchmark methods and wants to know how to draw meaningful conclusions from their tests. For instance, knowing what tags a benchmark method has (e.g. continuous vs discontinuous, non-convex vs convex, etc.) and knowing the dimension of the benchmark at hand would speed up the process of, say, concluding whether an algorithm performs better in lower dimensions, on convex landscapes, etc.
Would you be interested if I attempted to add such a thing? This can either be by csv or by physically adding a list of dictionaries.
Here is a preview of what I suggest:

api.py:

whereafter one would use the `data` list of dictionaries to build a dataframe, for instance using
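To compare the two storage options mentioned above (a csv file vs physically adding a list of dictionaries), here is a sketch assuming pandas; the fields and values are illustrative, not the proposed schema:

```python
import io
import pandas as pd

# Option 1: physically adding a list of dictionaries in code
# (the api.py style); field names are invented for this sketch.
data = [
    dict(name='Adjiman', continuous=True, convex=False),
]
df_from_dicts = pd.DataFrame(data)

# Option 2: the same content stored as a csv file
# (a string stands in for the file here).
csv_text = "name,continuous,convex\nAdjiman,True,False\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_dicts['name'].tolist())
print(df_from_csv['name'].tolist())
```

Both routes end in the same dataframe; the in-code list has the advantage of carrying the callable implementations alongside the metadata, which a csv cannot.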