Working With Excel Spreadsheets in Python

Gaurav Kumar
8 min readJun 21, 2024

Outline: Working With Excel Spreadsheets in Python

1. Introduction

  • Overview of Excel’s importance in data management and analysis
  • Benefits of using Python for Excel tasks
  • Brief introduction to libraries used in Python for Excel

2. Setting Up Your Environment

  • Installing Python and essential libraries (pandas, openpyxl, xlrd, xlwt, xlsxwriter)
  • Setting up a virtual environment
  • Verifying installations

3. Reading Excel Files

  • Using pandas to read Excel files
  • Reading multiple sheets
  • Handling different file formats (.xls, .xlsx)
  • Examples and code snippets

4. Writing to Excel Files

  • Creating new Excel files
  • Writing DataFrames to Excel using pandas
  • Writing to multiple sheets
  • Formatting cells and sheets
  • Examples and code snippets

5. Modifying Existing Excel Files

  • Adding new data to existing files
  • Modifying cell values
  • Inserting and deleting rows/columns
  • Examples and code snippets

6. Advanced Operations

  • Using formulas in Excel through Python
  • Creating charts and graphs
  • Conditional formatting
  • Examples and code snippets

7. Automating Excel Tasks

  • Automating repetitive tasks using scripts
  • Scheduling scripts with cron jobs or task scheduler
  • Examples and practical use cases

8. Error Handling and Debugging

  • Common errors and how to handle them
  • Debugging tips
  • Logging errors

9. Integration with Other Tools

  • Integrating Excel with databases
  • Exporting Excel data to other formats (CSV, JSON)
  • Examples and code snippets

10. Best Practices and Tips

  • Writing clean and efficient code
  • Managing large datasets
  • Ensuring data integrity and security

11. Conclusion

  • Summary of key points
  • Encouragement to explore further
  • Additional resources for learning

Excel spreadsheets are a key tool for managing and analyzing data. Whether you’re handling financial reports or scientific data, Excel’s versatility makes it a favorite across many fields. However, working with large datasets or performing repetitive tasks manually in Excel can be time-consuming and prone to errors. Python, with its powerful libraries and straightforward syntax, offers a way to automate and simplify these tasks. This article will guide you through using Python to work with Excel spreadsheets efficiently.

Setting Up Your Environment

Before we dive into working with Excel files, let’s set up our Python environment.

Installing Python and Essential Libraries

First, make sure Python is installed on your computer. You can download it from the official Python website. After installing Python, you need to install some libraries that will help us work with Excel files. Open your command prompt or terminal and run:

pip install pandas openpyxl xlrd xlwt xlsxwriter

Setting Up a Virtual Environment

It’s a good idea to create a virtual environment for your project. This helps manage dependencies and keeps your project organized. Run these commands:

python -m venv excel_env
source excel_env/bin/activate # On Windows use `excel_env\Scripts\activate`

Verifying Installations

To ensure everything is set up correctly, create a Python script and add the following code:

import pandas as pd
import openpyxl
import xlrd
import xlwt
import xlsxwriter

print("Libraries installed successfully")

Run the script. If you see the message “Libraries installed successfully,” you’re ready to go!

Reading Excel Files

One of the most common tasks is reading data from Excel files. The pandas library makes this easy.

Using pandas to Read Excel Files

To read an Excel file, use the read_excel function from pandas:

import pandas as pd

# Read a single sheet
df = pd.read_excel('sample.xlsx', sheet_name='Allocation')
print(df.head())
# Read multiple sheets
sheets = pd.read_excel('sample.xlsx', sheet_name=['Allocation', 'Holding'])
print(sheets)

Handling Different File Formats

You can read both .xls and .xlsx files using the same function:

# Reading .xls file
df_xls = pd.read_excel('sample.xls')

# Reading .xlsx file
df_xlsx = pd.read_excel('sample.xlsx')

Writing to Excel Files

Writing data to Excel files is straightforward with pandas.

Creating New Excel Files

Here’s how to create a new Excel file and write data to it:

# Create a DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'], 'Age': [28, 24, 35, 32]}
df = pd.DataFrame(data)

# Write DataFrame to Excel
df.to_excel('output.xlsx', index=False)

Writing to Multiple Sheets

You can write data to multiple sheets using the ExcelWriter object:

with pd.ExcelWriter('output_multi_sheets.xlsx') as writer:
df.to_excel(writer, sheet_name='Sheet1', index=False)
df.to_excel(writer, sheet_name='Sheet2', index=False)

Formatting Cells and Sheets

For more advanced formatting, you can use the xlsxwriter library:

# Create a DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'], 'Age': [28, 24, 35, 32]}
df = pd.DataFrame(data)

# Write DataFrame to Excel with formatting
with pd.ExcelWriter('formatted_output.xlsx', engine='xlsxwriter') as writer:
df.to_excel(writer, sheet_name='Sheet1', index=False)

# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']

# Add a format.
format1 = workbook.add_format({'num_format': '0.00'})

# Set the column width and format.
worksheet.set_column('B:B', 18, format1)

Modifying Existing Excel Files

You can also modify existing Excel files by reading them into a DataFrame, making changes, and saving them back.

Adding New Data to Existing Files

Here’s how to add new data to an existing Excel file:

# Load existing file
df_existing = pd.read_excel('output.xlsx')

# New data
new_data = {'Name': ['Chris', 'Sarah'], 'Age': [22, 30]}
df_new = pd.DataFrame(new_data)

# Append new data
df_combined = df_existing._append(df_new, ignore_index=True)

# Save to the same file
df_combined.to_excel('output.xlsx', index=False)

Modifying Cell Values

To modify specific cell values, locate the cell and update its value:

# Load existing file
df = pd.read_excel('output.xlsx')

# Modify a cell value
df.at[1, 'Age'] = 30

# Save the changes
df.to_excel('output.xlsx', index=False)

Inserting and Deleting Rows/Columns

You can insert or delete rows and columns as needed:

# Load existing file
df = pd.read_excel('excels files/output.xlsx')

# Insert a new column
df['New_Column'] = ['Value1', 'Value2', 'Value3', 'Value4', 'Value5', 'Value6']

# Save the changes
df.to_excel('excels files/output.xlsx', index=False)
df.drop(columns=['New_Column'], inplace=True)

Advanced Operations

Python also allows for more advanced Excel operations like using formulas, creating charts, and applying conditional formatting.

Using Formulas in Excel Through Python

You can add formulas to your Excel sheets:

# Create a DataFrame
data = {'Number1': [10, 20, 30], 'Number2': [1, 2, 3]}
df = pd.DataFrame(data)

# Write DataFrame to Excel with formulas
with pd.ExcelWriter('formulas.xlsx', engine='xlsxwriter') as writer:
df.to_excel(writer, sheet_name='Sheet1', index=False)

workbook = writer.book
worksheet = writer.sheets['Sheet1']

# Write a formula
worksheet.write_formula('C2', '=A2+B2')

Creating Charts and Graphs

You can create charts using xlsxwriter:

# Create a DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'], 'Age': [28, 24, 35, 32]}
df = pd.DataFrame(data)

# Write DataFrame to Excel and add a chart
with pd.ExcelWriter('charts.xlsx', engine='xlsxwriter') as writer:
df.to_excel(writer, sheet_name='Sheet1', index=False)

workbook = writer.book
worksheet = writer.sheets['Sheet1']

# Create a chart object.
chart = workbook.add_chart({'type': 'column'})

# Configure the chart with data series.
chart.add_series({
'categories': ['Sheet1', 1, 0, 4, 0],
'values': ['Sheet1', 1, 1, 4, 1],
})

# Insert the chart into the worksheet.
worksheet.insert_chart('E2', chart)

Conditional Formatting

You can apply conditional formatting to highlight data:

# Create a DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'], 'Score': [85, 78, 92, 88]}
df = pd.DataFrame(data)

# Write DataFrame to Excel with conditional formatting
with pd.ExcelWriter('conditional_formatting.xlsx', engine='xlsxwriter') as writer:
df.to_excel(writer, sheet_name='Sheet1', index=False)

workbook = writer.book
worksheet = writer.sheets['Sheet1']

# Apply conditional formatting
worksheet.conditional_format('B2:B5', {'type': '3_color_scale'})

Automating Excel Tasks

Automating repetitive tasks can save you a lot of time.

Automating with Scripts

You can write Python scripts to automate tasks:

import pandas as pd

# Load data
df = pd.read_excel('monthly_report.xlsx')

# Perform some operations
df['Total'] = df['Quantity'] * df['Price']

# Save the updated file
df.to_excel('updated_report.xlsx', index=False)

Scheduling Scripts with Cron Jobs or Task Scheduler

You can schedule these scripts to run automatically at specific times using Task Scheduler (Windows).

Example Task Scheduler (Windows)

  1. Open Task Scheduler.
  2. Create a new task.
  3. Set the trigger to daily at your desired time.
  4. Set the action to start a program and choose your Python interpreter and script path.

Error Handling and Debugging

Proper error handling ensures your scripts run smoothly.

Common Errors and How to Handle Them

Some common errors include file not found, incorrect sheet names, or data type mismatches. Handle these errors using try-except blocks:

try:
df = pd.read_excel('non_existent_file.xlsx')
except FileNotFoundError as e:
print(f"Error: {e}")

Debugging Tips

Use logging to keep track of errors and debug your scripts:

import logging

# Configure logging
logging.basicConfig(filename='app.log', level=logging.ERROR)

try:
df = pd.read_excel('non_existent_file.xlsx')
except FileNotFoundError as e:
logging.error(f"Error: {e}")

Integration with Other Tools

Python can integrate Excel with various other tools for a seamless workflow.

Integrating Excel with Databases

You can read from and write to databases using libraries like SQLAlchemy:

from sqlalchemy import create_engine

# Create an engine
engine = create_engine('sqlite:///my_database.db')

# Read from database
df = pd.read_sql('SELECT * FROM my_table', engine)

# Write to Excel
df.to_excel('from_database.xlsx', index=False)

df

Exporting Excel Data to Other Formats

You can easily export Excel data to CSV or JSON:

# Load data from Excel
df = pd.read_excel('output.xlsx')

# Export to CSV
df.to_csv('output.csv', index=False)

# Export to JSON
df.to_json('output.json', orient='records')

Best Practices and Tips

Adopting best practices ensures efficient and maintainable code.

Writing Clean and Efficient Code

  • Use meaningful variable names.
  • Write functions for repetitive tasks.
  • Comment your code for clarity.

Managing Large Datasets

  • Use the chunksize parameter to process large files in smaller chunks.
  • Optimize DataFrame operations to avoid memory issues.

Ensuring Data Integrity and Security

  • Validate data before processing.
  • Handle sensitive data carefully to avoid leaks.

In case you want to get all the codes and excel resources to try your hands on with the libraries and the codes, you can clone the repo here.

Conclusion

Python’s capabilities for working with Excel are vast and powerful. From basic reading and writing to advanced operations and automation, Python can significantly enhance your productivity and efficiency when dealing with Excel spreadsheets. By leveraging libraries like pandas, openpyxl, and xlsxwriter, you can automate tedious tasks, handle large datasets, and integrate Excel with other tools seamlessly. With the knowledge and examples provided in this article, you are well-equipped to explore and master Excel operations in Python.

Going forward, I would like to build a live dashboard in Excel with data input using python. A multi-page financial dashboard is something I’ve been planning to do since long. Let me know your interest for the same by commenting and sharing the story. Thanks !!!

--

--