
Add Data from wind-turbine-models.com #120

Open
maurerle opened this issue Nov 26, 2021 · 1 comment

Comments

@maurerle
Contributor

maurerle commented Nov 26, 2021

Hello everybody,

I wrote a small script to parse the data from wind-turbine-models.com.
Since the turbine data already contains records from wind-turbine-models.com (https://github.com/wind-python/windpowerlib/blob/master/windpowerlib/oedb/turbine_data.csv), I hope that the legal issues discussed in OpenEnergyPlatform/data-preprocessing#28 (comment) are resolved and the data can be used.

It would be very good if the power curves could additionally be integrated into the OEP database.
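Such an integration could start from the wide CSV the script below produces. As a rough sketch (the table name, the long/tidy layout, and the use of SQLite are my assumptions for illustration, not the OEP's actual schema or upload workflow), the curves could be melted into a long format and written to a SQL table:

```python
import sqlite3
import pandas as pd

# Hypothetical wide-format curves, shaped like the script's output:
# index = wind speed in m/s, one column per turbine (power in kW).
curves = pd.DataFrame(
    {'Turbine A': [0.0, 120.0, 800.0], 'Turbine B': [0.0, 90.0, 650.0]},
    index=pd.Index([3, 5, 8], name='wind_speed'),
)

# Melt into a long (tidy) format: one row per (wind_speed, turbine) pair.
long = curves.reset_index().melt(
    id_vars='wind_speed', var_name='turbine', value_name='power_kw'
)

# Write to a database; an in-memory SQLite stands in for the real OEP here.
with sqlite3.connect(':memory:') as con:
    long.to_sql('supply_wind_turbine_power_curve', con, index=False)
    rows = con.execute(
        'SELECT COUNT(*) FROM supply_wind_turbine_power_curve'
    ).fetchone()[0]
print(rows)  # 6 rows: 3 wind speeds x 2 turbines
```

The long layout avoids a schema change every time a new turbine column is added.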

The code below is available under the MIT License and free for anyone to use:

from bs4 import BeautifulSoup  # parse HTML
import requests
import json5  # parse the embedded JS dict literal into Python
import pandas as pd
from tqdm import tqdm  # progress bar for the download loop

# Create a list of turbines with available power curves.
page = requests.get('https://www.wind-turbine-models.com/powercurves')
soup = BeautifulSoup(page.text, 'html.parser')
# Pull the turbine ids from the <option> elements of the select box.
name_list = soup.find(class_='chosen-select')

wind_turbines_with_curve = []
for option in name_list.find_all('option'):
    wind_turbines_with_curve.append(option.get('value'))

def download_turbine_curve(turbine_id, start=0, stop=25):
    """Fetch one turbine's power curve as a DataFrame indexed by wind speed."""
    url = 'https://www.wind-turbine-models.com/powercurves'
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    data = {'_action': 'compare', 'turbines[]': turbine_id,
            'windrange[]': [start, stop]}

    resp = requests.post(url, headers=headers, data=data)
    # The response embeds a chart configuration as JavaScript;
    # cut out the dict literal that follows 'data:'.
    strings = resp.json()['result']
    begin = strings.find('data:')
    end = strings.find('"}]', begin)
    relevant_js = '{' + strings[begin:end + 3] + '}}'
    curve_as_dict = json5.loads(relevant_js)

    x = curve_as_dict['data']['labels']                    # wind speeds
    y = curve_as_dict['data']['datasets'][0]['data']       # power values
    label = curve_as_dict['data']['datasets'][0]['label']  # turbine name
    df = pd.DataFrame(y, index=x, columns=[label])
    df.index.name = 'wind_speed'
    return df

curves = []
for turbine_id in tqdm(wind_turbines_with_curve):
    curves.append(download_turbine_curve(turbine_id))

c = pd.concat(curves, axis=1)
d = c[c.any(axis=1)]  # drop wind speeds where every turbine produces zero power
d.to_csv('down.csv')
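Once down.csv exists, the curves can be used directly, for example to estimate power output at arbitrary wind speeds. A minimal sketch (the turbine name and sample values below are made up for illustration; the DataFrame mimics the shape the script writes out) using linear interpolation via numpy.interp:

```python
import numpy as np
import pandas as pd

# Hypothetical power curve in the same shape as down.csv:
# index = wind_speed in m/s, one column per turbine (power in kW).
curve = pd.DataFrame(
    {'Example Turbine 3MW': [0.0, 0.0, 150.0, 900.0, 2300.0, 3000.0, 3000.0]},
    index=pd.Index([0, 3, 5, 8, 11, 13, 25], name='wind_speed'),
)

def power_at(df, turbine, wind_speed):
    """Linearly interpolate the power output (kW) at a given wind speed."""
    series = df[turbine].dropna()
    return float(np.interp(wind_speed, series.index, series.values))

print(power_at(curve, 'Example Turbine 3MW', 4.0))  # 75.0, halfway between 0 and 150
```

Note that np.interp clamps outside the sampled range, so wind speeds above the last point return the last power value rather than zero; a real model would also handle the cut-out speed.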
@Ludee
Collaborator

Ludee commented Dec 14, 2021

Dear @maurerle,
thanks for pushing this collaborative database.
From my point of view, scraping and collecting data from the website is a legal grey area.
It can be considered legal if only a part of it is being collected.

As a rule, web scraping for empirical research is legally permissible. The terms of use that websites frequently employ do not change this. Technical access barriers are a different matter; these must not be circumvented.
Anyone who wants to be on the safe side can ask the producer of the database for permission and have it granted, preferably in text form (for example by e-mail). In cases of doubt, the legal departments of the research institutions can provide advice.

"Grenzen des 'Web Scrapings'" (Limits of web scraping) - https://www.forschung-und-lehre.de/recht/grenzen-des-web-scrapings-2421/

But publication under an open license is definitely not possible.
I contacted the website owners some time ago, but there was no interest in collaboration.

This is why I started gathering the original sources to build a new open database under an appropriate open data license!
