Automate Data Processing in Python with Mito

| | ,

Home » Automate Data Processing in Python with Mito

A spreadsheet can be an excellent way to slice, dice, and manipulate data depending on what one needs to do. Spreadsheets are a popular and commonly used form of software that is used to help manage various bits of information and data and can be used to manipulate that data to create charts, graphs, and other visual representations of the information that are important for understanding and comparing data. Microsoft Excel is one of the most commonly used spreadsheets. There are certain kinds of information in a dataset that might not look so easy at first glance but a good spreadsheet user can find ways to sort through it and turn it into something more useful or understandable. This blog will talk about Mito and its features and walk you through how you can automate data processing in Python with Mito.

The MITO library is a Jupyter notebook extension and Python library for reading, writing, manipulating, and displaying tabular data. It makes it super easy to manipulate data and build dashboards with rich visualizations in a spreadsheet environment with GUI support,  which is quite similar to pivot tables in Excel. Mito can even generate full-fleshed Python code for every single edit we make in its interface.

Installation and Configuration

First of all, Mito requires you to have Python 3.6 or above to even install it. You can check your Python by opening a command line and running the following command:

python --version  #Windows
python3 --version  #Mac/Linux

If you already have a sufficient version of Python you can continue. But if you don’t, you’ll have to install and or update Python on your system. You can find links to it here.  

Let’s create a virtual environment for our project to avoid any plugin conflict in the future. If you don’t have virtualenv installed already, you can easily install it using the pip package.

#Install virtualenv
pip install virtualenv

#Create a virtual environment
virtualenv mito_env

#Activate the environment
source mito_env/bin/activate

Now we can install Mito, and if you noticed,  Mito is also a Python library so it can be installed via the pip package manager. First, we’ll have to download the Mito installer:

python -m pip install mitoinstaller

Then, run the installer with:

python -m mitoinstaller install

After this finishes, it would either open up a Jupyter lab on its own or ask you to open one yourself something like this:

If this happens just run the following command and it will open a new Jupyter lab window in your default browser.

Jupyter lab

MitoSheets Interface

In the new Jupyter lab window, you would see an interface like this:

import mitosheet
mitosheet.sheet()

Here Mito has already imported the mitosheet package, you can run this window with the little Play button on top or by pressing shift+enter then it will start the Mito interface. 

For the first time, it will ask you to create a user account on Mito, so fill out the form and continue with the rest of the process. After that is done, you will be redirected to the GUI spreadsheet interface of mito. It is worth taking a moment to discuss how changes in the GUI are converted to Python equivalent code.

Loading Dataset:

Loading a dataset into MitoSheets couldn’t be easier. Click “Import,” then choose whether you would like to add a file from the current folder or by clicking on the “path of the file”. After importing the dataset you will see the spreadsheet something like this here you can edit the spreadsheet as you will on ms excel and it will give you the live Python code for it.

Generated code:

from mitosheet import *; register_analysis('UUID-c4d5c3c0-5fb8-49ae-a24a-6084d9194f25')

# Imported movies.csv
import pandas as pd
movies = pd.read_csv(r'movies.csv')

# Set column Film at index 6 in movies to WALL-EE
movies.at[6, 'Film'] = "WALL-EE"

Creating a graph:

You can create a graph by going to the graph section on the menu bar and then selecting the value for both the x-axis and y-axis after that it will generate a graph for you it’s a straightforward process. And you can also copy the code for the graph from here.

Copied code from for the graph:

# Import plotly and create a figure
import plotly.graph_objects as go
fig = go.Figure()



# Add the bar chart traces to the graph
for column_header in ['Lead Studio']:
    fig.add_trace(
        go.Bar(
            x=movies[column_header],
            y=movies['Worldwide Gross'],
            name=str(column_header)
        )
    )

# Update the title and stacking mode of the graph
# See Plotly documentation for customizations: https://plotly.com/python/reference/bar/
fig.update_layout(
    xaxis_title="Lead Studio",
    yaxis_title="Worldwide Gross",
    title="Lead Studio, Worldwide Gross bar chart",
    barmode='group',
)
fig.show(renderer="iframe")

Output:

Merging Datasets:

Merging datasets in mito is easy, follow along the steps to see how to do that and in the end, it will also give you the code for doing that.

  1. Import both the files in the mito.
  2. After they are in the mito you can see them both in the footer section and you can switch between both of them to see what column you want to merge by comparing them.
  3. Press on the merge button and it will open a new tab, there you can see that it is automatically detected, that the two of the columns are the same as each other hence mito has automatically merged both of the datasets. You can see the new dataset in the footer section named df3.

And you can see that the code is also created for all processes we have done here 

from mitosheet import *; register_analysis('UUID-18ffd57b-0e96-421a-9969-91983d3f4ae3')

# Imported movies_names.csv
import pandas as pd
movies_names = pd.read_csv(r'movies_names.csv')

# Imported movies.csv
import pandas as pd
movies = pd.read_csv(r'movies.csv')

# Merged movies_names and movies
temp_df = movies.drop_duplicates(subset='Film') # Remove duplicates so lookup merge only returns first match
df3 = movies_names.merge(temp_df, left_on=['Film'], right_on=['Film'], how='left', suffixes=['_movies_names', '_movies'])

Pivoting Data:

To pivot data follow the process given below

  1. Import the dataset
  2. Press on the pivot button present on the top menu.
  3. It will open a new tab where you can select the row, column, and value then it will automatically create a new pivot table in the new file, you can see the file in the footer section.

Generated code:

from mitosheet import *; register_analysis('UUID-90e1618c-7405-465c-a29c-29da8a616bf2')

# Imported movies.csv
import pandas as pd
movies = pd.read_csv(r'movies.csv')

# Pivoted movies into df2
unused_columns = movies.columns.difference(set(['Lead Studio']).union(set([])).union(set({'Rotten Tomatoes %'})))
tmp_df = movies.drop(unused_columns, axis=1)
pivot_table = tmp_df.pivot_table(
    index=['Lead Studio'],
    values=['Rotten Tomatoes %'],
    aggfunc={'Rotten Tomatoes %': ['sum']}
)
pivot_table.columns = [flatten_column_header(col) for col in pivot_table.columns.values]
movies_pivot = pivot_table.reset_index()

Shorting and filtring Data:

Follow the steps given bellow

  1. Press on the edit filter button.
  2. It will open a new tab, here for shorting select the ascending or descending button in what order you want to display the data.
  3. For filtering press on add filter button then select the type of filter you want.

Generated code:

from mitosheet import *; register_analysis('UUID-90e1618c-7405-465c-a29c-29da8a616bf2')

# Imported movies.csv
import pandas as pd
movies = pd.read_csv(r'movies.csv')

# Sorted Rotten Tomatoes % sum in movies_pivot in descending order
movies_pivot = movies_pivot.sort_values(by='Rotten Tomatoes % sum', ascending=False, na_position='last')

# Sorted Rotten Tomatoes % sum in movies_pivot in ascending order
movies_pivot = movies_pivot.sort_values(by='Rotten Tomatoes % sum', ascending=True, na_position='first')

# Filtered Rotten Tomatoes % sum in movies_pivot
movies_pivot = movies_pivot[movies_pivot['Rotten Tomatoes % sum'] > 200]

Final Words

In this blog post, we learned about how to use the Mito add-on for JupyterLab for Exploratory Data Analysis. The tool is powerful, but it is missing some of the features. However, it has a great community behind it, so we may see more features and expansion with its capabilities as time goes on. Thank you for reading this blog post.

Spread the love
 
   
Previous

Getting started with Apache Cassandra in Python

How to Create a Car Racing Game in Python

Next

Leave a Comment