
Add Data from wind-turbine-models.com #120

Open
maurerle opened this issue Nov 26, 2021 · 1 comment

Comments

@maurerle
Contributor

maurerle commented Nov 26, 2021

Hello everybody,

I wrote a small script to parse the data from wind-turbine-models.com.
Since the turbine data already contains records from wind-turbine-models.com (https://github.com/wind-python/windpowerlib/blob/master/windpowerlib/oedb/turbine_data.csv), I hope that the legal issues discussed in OpenEnergyPlatform/data-preprocessing#28 (comment) are resolved and the data can be used.

It would be very good if the power curves could additionally be integrated into the OEP database.
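Such an integration could start from the wide CSV the script below produces. As a rough sketch (the table name, the long/tidy layout, and the use of SQLite are my assumptions for illustration, not the OEP's actual schema or upload workflow), the curves could be melted into a long format and written to a SQL table:

```python
import sqlite3
import pandas as pd

# Hypothetical wide-format curves, shaped like the script's output:
# index = wind speed in m/s, one column per turbine (power in kW).
curves = pd.DataFrame(
    {'Turbine A': [0.0, 120.0, 800.0], 'Turbine B': [0.0, 90.0, 650.0]},
    index=pd.Index([3, 5, 8], name='wind_speed'),
)

# Melt into a long (tidy) format: one row per (wind_speed, turbine) pair.
long = curves.reset_index().melt(
    id_vars='wind_speed', var_name='turbine', value_name='power_kw'
)

# Write to a database; an in-memory SQLite stands in for the real OEP here.
with sqlite3.connect(':memory:') as con:
    long.to_sql('supply_wind_turbine_power_curve', con, index=False)
    rows = con.execute(
        'SELECT COUNT(*) FROM supply_wind_turbine_power_curve'
    ).fetchone()[0]
print(rows)  # 6 rows: 3 wind speeds x 2 turbines
```

The long layout avoids a schema change every time a new turbine column is added.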

The code below is available under the MIT License and free for anyone to use:

from bs4 import BeautifulSoup  # parse HTML
import requests
import json5  # parse the embedded JS dict literal into Python
import pandas as pd
from tqdm import tqdm  # progress bar for the download loop

# Create a list of turbines with available power curves.
page = requests.get('https://www.wind-turbine-models.com/powercurves')
soup = BeautifulSoup(page.text, 'html.parser')
# Pull the turbine ids from the <option> elements of the select box.
name_list = soup.find(class_='chosen-select')

wind_turbines_with_curve = []
for option in name_list.find_all('option'):
    wind_turbines_with_curve.append(option.get('value'))

def download_turbine_curve(turbine_id, start=0, stop=25):
    """Fetch one turbine's power curve as a DataFrame indexed by wind speed."""
    url = 'https://www.wind-turbine-models.com/powercurves'
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    data = {'_action': 'compare', 'turbines[]': turbine_id,
            'windrange[]': [start, stop]}

    resp = requests.post(url, headers=headers, data=data)
    # The response embeds a chart configuration as JavaScript;
    # cut out the dict literal that follows 'data:'.
    strings = resp.json()['result']
    begin = strings.find('data:')
    end = strings.find('"}]', begin)
    relevant_js = '{' + strings[begin:end + 3] + '}}'
    curve_as_dict = json5.loads(relevant_js)

    x = curve_as_dict['data']['labels']                    # wind speeds
    y = curve_as_dict['data']['datasets'][0]['data']       # power values
    label = curve_as_dict['data']['datasets'][0]['label']  # turbine name
    df = pd.DataFrame(y, index=x, columns=[label])
    df.index.name = 'wind_speed'
    return df

curves = []
for turbine_id in tqdm(wind_turbines_with_curve):
    curves.append(download_turbine_curve(turbine_id))

c = pd.concat(curves, axis=1)
d = c[c.any(axis=1)]  # drop wind speeds where every turbine produces zero power
d.to_csv('down.csv')
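Once down.csv exists, the curves can be used directly, for example to estimate power output at arbitrary wind speeds. A minimal sketch (the turbine name and sample values below are made up for illustration; the DataFrame mimics the shape the script writes out) using linear interpolation via numpy.interp:

```python
import numpy as np
import pandas as pd

# Hypothetical power curve in the same shape as down.csv:
# index = wind_speed in m/s, one column per turbine (power in kW).
curve = pd.DataFrame(
    {'Example Turbine 3MW': [0.0, 0.0, 150.0, 900.0, 2300.0, 3000.0, 3000.0]},
    index=pd.Index([0, 3, 5, 8, 11, 13, 25], name='wind_speed'),
)

def power_at(df, turbine, wind_speed):
    """Linearly interpolate the power output (kW) at a given wind speed."""
    series = df[turbine].dropna()
    return float(np.interp(wind_speed, series.index, series.values))

print(power_at(curve, 'Example Turbine 3MW', 4.0))  # 75.0, halfway between 0 and 150
```

Note that np.interp clamps outside the sampled range, so wind speeds above the last point return the last power value rather than zero; a real model would also handle the cut-out speed.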
@Ludee
Collaborator

Ludee commented Dec 14, 2021

Dear @maurerle,
thanks for pushing this collaborative database.
From my point of view, scraping and collecting data from the website is a legal grey area.
It can be considered legal if only a part of it is being collected.

As a rule, web scraping for empirical research is legally permissible. The terms of use that websites frequently employ do not change this. Technical access barriers are a different matter; these must not be circumvented.
Anyone who wants to be on the safe side can ask the producer of the database for permission and have it granted, preferably in text form (for example by e-mail). In cases of doubt, the legal departments of the research institutions can provide advice.

"Grenzen des 'Web Scrapings'" (Limits of web scraping) - https://www.forschung-und-lehre.de/recht/grenzen-des-web-scrapings-2421/

But publication under an open license is definitely not possible.
I contacted the website owners some time ago, but there was no interest in collaboration.

This is why I started gathering the original sources to build a new open database under an appropriate open data license!
