My name is Guillermo and I am a Data Scientist. I love explaining data with statistics, artificial intelligence models, and visualization tools. I have analyzed hundreds of datasets, some comprising billions of records, on a wide variety of topics: from social networks, real estate, video games, and e-commerce businesses to academic papers and experimental research data. I'm currently a Ph.D. candidate (I have finished my research and am waiting to defend my dissertation), with expected graduation in March 2021.
My Ph.D. research focused on new methods for characterizing hydrodynamic phenomena in pumps working as turbines, using CFD simulations, artificial intelligence, and experimental techniques. I believe there is always something an organization can do better, and my skills allow me to find exactly that. My main interest is applying artificial intelligence to predict the behavior of data and to build tools that improve processes.
Below are the projects I have worked on. Most are private repositories because they are part of commercial products or belong to research that is being published. I'm currently building a website that will describe the code behind each project and present the results. If you want to know more about my work, please contact me.
These projects focus on extracting data from websites and APIs using popular Python libraries such as Scrapy, Selenium, Beautiful Soup, and Requests.
- E-commerce web scraping (eBay, Amazon, MercadoLibre): Keyword-based scraping of product listings on eBay, Amazon, and MercadoLibre.
- Real-Estate web scraping (Finca-raiz, Ciencuadras, Metrocuadrado): Scraping of all properties for sale in Colombia listed on these sites.
- Social media web scraping (Facebook, Twitter, LinkedIn, Instagram): Scraping of social media information from Facebook, Twitter, LinkedIn, and Instagram.
- Job postings web scraping (Elempleo, Computrabajo): Scraping of all job postings on Elempleo and Computrabajo to analyze salaries in Colombia.
- Pet store web scraping (Tierragro, Kanu): Product monitoring for a new pet store located in Medellín.
- Lingoda web scraping: Scraping of all course material.
- League of Legends data mining: Data mining of matches and players in the popular online game.
- SIPRA web scraping: Scraping of geolocation information about crops in Colombia.
- DANE data mining: Data mining of Colombia's consumer price index (IPC) data.
Here are the tools I have built over the years to solve different kinds of problems, including software I have sold.
- NotiJob: Upwork notification tool that alerts the user when a new job matching a set of keywords is posted. Made in Python.
- CFX_Design: Generator of CFX files to automate simulations. Made in Python.
- Arbox: Real estate management software. Made in Visual Basic.
- JProperties: Bridge between real estate databases for zona-prop.com.co. Made in Java.
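The core idea behind a keyword-based job notifier like NotiJob can be sketched in a few lines. This is a minimal sketch, not NotiJob's actual code: it assumes the postings have already been fetched (how they are obtained from Upwork is not shown), and the `Posting` type and `new_matching_postings` function are hypothetical names introduced for illustration. It returns only postings that have not been seen before and whose title or description contains at least one keyword, ignoring case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    """Hypothetical minimal job posting record (not an Upwork API type)."""
    job_id: str
    title: str
    description: str


def new_matching_postings(postings, keywords, seen_ids):
    """Return unseen postings that mention any keyword; updates seen_ids in place."""
    lowered = [k.lower() for k in keywords]
    matches = []
    for posting in postings:
        if posting.job_id in seen_ids:
            continue  # already processed in an earlier poll
        seen_ids.add(posting.job_id)
        text = f"{posting.title} {posting.description}".lower()
        if any(k in text for k in lowered):
            matches.append(posting)
    return matches


if __name__ == "__main__":
    seen = set()
    batch = [
        Posting("1", "Web scraping with Scrapy", "Extract product data"),
        Posting("2", "Logo design", "Need a new logo"),
    ]
    for p in new_matching_postings(batch, ["scraping", "python"], seen):
        print(f"New job: {p.title}")  # a real tool would send a notification here
    # Polling the same batch again reports nothing new.
    print(new_matching_postings(batch, ["scraping"], seen))
```

Keeping a set of seen IDs is what turns a keyword search into a notifier: each poll reports only the postings that appeared since the previous one.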
I've done a lot of data analysis over the years; the most recent projects are:
- Job Market Salaries in Colombia: Analysis of salaries in Colombia based on job postings on Computrabajo.com and Elempleo.com.
- Analysis for LOL Amateur Team: Analysis of the player data of an amateur League of Legends team. The deliverable is a report on what the team is doing wrong and where it can improve.
- Analysis of probe simulation: Design of experiments analysis of simulations run in COMSOL.
- Design of experiments analysis of CFX simulations: Design of experiments analysis of simulations of a pump working as a turbine.
I have used many image analysis tools, but the one I have the most experience with is the Python OpenCV library:
- Tracking a ball: Script that tracks a tennis ball in a video.
- Measuring Viscosity in the Lab: Script that tracks a ball falling through a tube, measures its velocity in the fluid, and then calculates the viscosity of the fluid.
- Vortex segmentation and 3D reconstruction: Image processing of more than 200,000 photos of a vortex in a tube. Operations on the images made it possible to segment the vortices.
- Flow visualization in a pump working as turbine: Tracking of small polymer particles in a pump, which made it possible to visualize the flow inside the pump.
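The viscosity measurement described above follows the falling-ball principle, which can be sketched with Stokes' law. This is a minimal sketch, assuming the ball has reached terminal velocity in creeping flow (Reynolds number well below 1) and ignoring wall effects from the tube, so that mu = 2 (rho_s - rho_f) g r^2 / (9 v). In the original script the positions would come from the OpenCV tracking step; here they are plain lists, and the function names and sample numbers are hypothetical.

```python
"""Falling-ball viscometer sketch (hypothetical; not the original script)."""

G = 9.81  # gravitational acceleration, m/s^2


def terminal_velocity(times, positions):
    """Least-squares slope of position (m) vs. time (s), i.e. the ball's speed in m/s.

    Assumes the ball is already at terminal velocity over the tracked frames.
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two tracked points")
    mean_t = sum(times) / n
    mean_y = sum(positions) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, positions))
    den = sum((t - mean_t) ** 2 for t in times)
    return abs(num / den)


def stokes_viscosity(velocity, radius, rho_sphere, rho_fluid, g=G):
    """Dynamic viscosity (Pa s) from Stokes' law: mu = 2 (rho_s - rho_f) g r^2 / (9 v)."""
    return 2.0 * (rho_sphere - rho_fluid) * g * radius**2 / (9.0 * velocity)


def reynolds_number(velocity, radius, rho_fluid, mu):
    """Particle Reynolds number based on diameter; Stokes' law requires Re << 1."""
    return rho_fluid * velocity * (2.0 * radius) / mu


if __name__ == "__main__":
    # Hypothetical tracked data: a 1 mm radius steel ball falling at about 1 cm/s.
    t = [0.0, 0.1, 0.2, 0.3, 0.4]
    y = [0.000, 0.001, 0.002, 0.003, 0.004]
    v = terminal_velocity(t, y)
    mu = stokes_viscosity(v, radius=1e-3, rho_sphere=7800.0, rho_fluid=1260.0)
    re = reynolds_number(v, 1e-3, 1260.0, mu)
    print(f"v = {v:.4f} m/s, mu = {mu:.3f} Pa s, Re = {re:.4f}")
```

Fitting a line to all tracked positions, rather than differencing two frames, averages out per-frame detection noise; checking the Reynolds number afterwards confirms whether Stokes' law was a valid assumption for that run.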
So far, I have only used classification and clustering models in machine learning, but I'm looking forward to applying other models to new problems:
- ADR - Prediction of projects: This was the final project of my data science course by Correlation One and MinTIC. It produced two products: a dashboard that gives ADR management statistics on how its projects are composed, and an Android app that predicts whether a project will be financed by ADR.
- Real Estate Prices: This was the final project of my IBM data science course. In this work, I predicted property prices based on location and number of rooms.
Natural language processing (NLP) can solve many problems for an organization. To test my skills, I analyzed tweets, reviews, and scientific papers.
- Using NLP to review Scientific Papers: Reviewing scientific papers is very time-consuming and expensive. During my Ph.D. I had to read more than 5,000 papers; to do this efficiently, I created an algorithm based on statistics and NLP that let me review all of them quickly.
- Sentiment analysis of Tweets: Classification of tweets about an organization as positive, neutral, or negative.
- Sentiment analysis of reviews from Amazon: Classification of Amazon reviews as positive, neutral, or negative.
I have worked on two deep learning projects using TensorFlow:
- Image Segmentation of a Vortex: Using a dataset of more than 200,000 images, I built an application that segments them with a U-Net neural network.
- Phenomena prediction in Hydraulics: Using TensorFlow, I predicted when a hydraulic phenomenon occurs in a pump working as a turbine, analyzing a dataset of more than 1 billion records.