pandas-cli

Pandas CLI! Note that this project is still in its infancy, and there are not many bells and whistles yet.

Big thanks to pandas for such an incredible tool, and to PyInquirer for making a beautiful CLI possible.

The Problem

I am personally really passionate about finding ways to bridge the gap between software engineering and data science. Being a part of the data science community while working as a data analyst/software engineer, I see this gap becoming more evident all the time.

So what am I trying to solve?

First and foremost: create a tool that holds your hand through the dirty work of cleaning data.

This is always such a chore, and it can eat up a large share of your time as a data scientist (a commonly quoted estimate is around 70%). Rather than spinning up a Jupyter notebook and playing around with the data for hours, I want a nice interface that walks me through the process. How does the data need to be read in? CSV? JSON? Excel? What are the arguments for those functions? What does my data look like?
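As a rough illustration of the "how does the data need to be read in?" question, the sketch below maps a file type to the matching pandas reader and forwards any extra keyword arguments. The `READERS` table and the `load` helper are hypothetical names chosen for this example; they are not taken from the pandas-cli source.

```python
import io

import pandas as pd

# Hypothetical lookup table: file type -> pandas reader function.
READERS = {
    "csv": pd.read_csv,
    "json": pd.read_json,
    "excel": pd.read_excel,
}


def load(file_type, source, **kwargs):
    """Read `source` with the reader registered for `file_type`."""
    try:
        reader = READERS[file_type]
    except KeyError:
        raise ValueError(f"Unsupported file type: {file_type!r}") from None
    return reader(source, **kwargs)


# An in-memory CSV stands in for a real file path.
df = load("csv", io.StringIO("name,age\nAda,36\nLinus,54\n"))
print(df.head())
```

Keeping the readers in a dictionary means a prompt can list `READERS.keys()` as the available choices without any further branching.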

Once the data is read in, can I get a nice breakdown of the columns I specify? Is there a way to automatically determine column types? How about renaming? Subsetting?

These are all parts of the process that are required and repetitive. I hope pandas-cli can relieve at least some of that pain.
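For reference, here is what those steps look like when done by hand in plain pandas. This is only a sketch of the manual workflow the tool aims to guide you through, using a made-up in-memory dataset; it does not show how pandas-cli itself performs them.

```python
import io

import pandas as pd

# Made-up sample data standing in for a real file.
raw = io.StringIO("Name,Age,Joined\nAda,36,2020-01-15\nLinus,54,2019-07-01\n")

# Read only the columns of interest; pandas infers a type for each one.
df = pd.read_csv(raw, usecols=["Name", "Age"])
print(df.dtypes)

# Rename columns to a consistent lowercase style.
df = df.rename(columns={"Name": "name", "Age": "age"})

# Subset rows with a boolean mask.
over_40 = df[df["age"] > 40]
print(over_40)
```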

Save my process.

A current challenge in data science is version control, especially with Jupyter notebooks. No one wants to read through a 2000+ line diff on an .ipynb just because an SVG changed and is stored inside the notebook's JSON. As I slice and dice my data, I want to keep track of my progress, have the ability to replicate it, and allow other developers to easily contribute to it.
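One way to picture a diff-friendly record of the process is a plain list of steps serialized as JSON, which can be replayed against a fresh DataFrame. The step format and the `replay` helper below are assumptions made for illustration only; pandas-cli may store its history differently.

```python
import io
import json

import pandas as pd

# Hypothetical step format: a DataFrame method name plus keyword arguments.
steps = [
    {"method": "rename", "kwargs": {"columns": {"Name": "name"}}},
    {"method": "dropna", "kwargs": {}},
]


def replay(df, steps):
    """Apply each recorded step to the DataFrame, in order."""
    for step in steps:
        df = getattr(df, step["method"])(**step["kwargs"])
    return df


# Sorted, indented JSON keeps version-control diffs small and readable.
saved = json.dumps(steps, indent=2, sort_keys=True)

df = pd.read_csv(io.StringIO("Name,Age\nAda,36\nLinus,\n"))
result = replay(df, json.loads(saved))
print(result)
```

Because the saved text contains only method names and arguments, a change to one step shows up as a change to a few lines rather than to an entire notebook.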

Import to Jupyter notebook / console.

At the end of the day, I still love Jupyter. I still want to create some graphs, find correlations, and do some ML. The ability to go from pandas-cli to a Jupyter notebook and seamlessly pick up where I left off is a must-have.

What next?

This project lives on! I welcome any and all help. Please reach out with any questions about usage, and please, please reach out with suggestions on how to make this better. Thanks in advance!

Running Locally

git clone <project url>
Create a virtualenv of your choice, then run pip install -r requirements.txt (packaging will come later)
python main.py

Current Features

Create
- Read in a file and save it as a dataframe.
- You can specify the file type.
- Arguments are shown as a dropdown, specific to the method chosen (for example, pd.read_csv).
- Positional arguments (args) are required; keyword arguments (kwargs) are optional.
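The required/optional split described above can be approximated with the standard library's `inspect.signature`. The sketch below is one possible approach, not necessarily how pandas-cli builds its dropdowns; `split_parameters` and `read_example` are hypothetical names, with `read_example` standing in for a reader such as pd.read_csv.

```python
import inspect


def split_parameters(func):
    """Return (required names, {optional name: default}) for a callable."""
    required, optional = [], {}
    for name, param in inspect.signature(func).parameters.items():
        # Skip *args / **kwargs catch-alls; they cannot be prompted for by name.
        if param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue
        if param.default is inspect.Parameter.empty:
            required.append(name)
        else:
            optional[name] = param.default
    return required, optional


# Hypothetical stand-in for a pandas reader function.
def read_example(path, sep=",", header=0):
    return path, sep, header


required, optional = split_parameters(read_example)
print(required)  # ['path']
print(optional)  # {'sep': ',', 'header': 0}
```

A prompt could then ask for every name in `required` and offer the keys of `optional` as a pick list, pre-filled with their defaults.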

taylorperkins/pandas-cli