While discussing the ongoing digitalization of the agricultural industry with a friend over Christmas, I started to wonder whether I could get access to some agricultural production data myself. Luckily, my partner's uncle is a dairy farmer who has been using a milking robot for the past couple of years on his small dairy farm in Heusden, the Netherlands.
Starting from a backup of a database generated by a DeLaval milking system, I set up a Docker instance on my MacBook to run a local SQL server so that I could access this database. Initially I used Azure Data Studio to explore the database and find the right tables to use in an Exploratory Data Analysis, or EDA. In this project, however, we will connect directly to the database running on this SQL server using the Python package sqlalchemy.
Using an Amazon S3 bucket and RDS, I managed to host the database in the cloud. Any further analyses will be done by connecting to this database in the cloud.
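As a rough illustration of what connecting with sqlalchemy can look like, here is a minimal sketch. It assumes the database is a Microsoft SQL Server instance reached through the pyodbc driver, which is an assumption on my part rather than something fixed by the setup described above. The helper names `build_connection_url` and `read_query`, the driver name, the credentials and the database name `milking_db` are all hypothetical placeholders.

```python
from urllib.parse import quote_plus


def build_connection_url(user, password, host, database, port=1433,
                         driver="ODBC Driver 17 for SQL Server"):
    """Build a SQLAlchemy URL for SQL Server via pyodbc (assumed setup)."""
    # Percent-encode credentials and driver name so characters such as
    # '@' or spaces do not break the URL.
    return (
        f"mssql+pyodbc://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}?driver={quote_plus(driver)}"
    )


def read_query(url, query):
    """Run a SQL query and return the result as a pandas DataFrame."""
    # Imported lazily so the URL helper works without these packages.
    import pandas as pd
    from sqlalchemy import create_engine, text

    engine = create_engine(url)
    with engine.connect() as connection:
        return pd.read_sql(text(query), connection)


# Hypothetical placeholder values; replace the host with the local Docker
# address or the RDS endpoint, and use real credentials.
url = build_connection_url("sa", "p@ssword", "localhost", "milking_db")
```

With a reachable server, something like `read_query(url, "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES")` should list the available tables, which is a convenient starting point for the EDA. Keeping the URL construction separate from the query helper makes it easy to switch between the local Docker instance and the cloud-hosted database.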
This data science project will contain several components which we will add along the way, such as:
- A simple EDA, getting to know the data in the milking robot database
- Estimating a linear regression model to see which factors drive milk production and to predict future production
- Running a k-means clustering algorithm to find segments of cows within the herd
- Applying the XGBoost algorithm to predict Invalid Yield