Reading and Writing Excel Files in Python - Part 1
Excel Introduction
Excel is a spreadsheet software developed by Microsoft for Windows and macOS operating systems. With its intuitive interface, excellent calculation functions and chart tools, plus successful marketing, Excel has always been the most popular personal computer data processing software. Of course, Excel also has many competitors, such as Google Sheets, LibreOffice Calc, Numbers, etc. These competitors can basically be compatible with Excel, at least they can read and write newer versions of Excel files, but of course these are not the focus of our discussion. Mastering how to operate Excel files with Python programs can make daily office automation work easier and more pleasant, and in many commercial projects, importing and exporting Excel files are particularly common functions.
Python operations on Excel require the support of third-party libraries. If you want to be compatible with versions before Excel 2007, that is, Excel files in xls format, you can use the third-party libraries xlrd and xlwt. The former is used to read Excel files, and the latter is used to write Excel files. If you use newer versions of Excel, that is, Excel files in xlsx format, you can use the openpyxl library. Of course, this library can not only operate Excel, but also operate other spreadsheet files based on Office Open XML.
In this chapter, we will first explain how to operate Excel files based on xlwt and xlrd. You can first use the following command to install these two third-party libraries and the tool module xlutils used in conjunction.
pip install xlwt xlrd xlutils
Reading Excel Files
For example, if there is an Excel file named “Alibaba 2020 Stock Data.xls” in the current folder, if you want to read and display the contents of the file, you can complete it through the code shown below.
import xlrd
# Use the open_workbook function of the xlrd module to open the specified Excel file and get a Book object (workbook)
wb = xlrd.open_workbook('Alibaba 2020 Stock Data.xls')
# Get all sheet names through the sheet_names method of the Book object
sheetnames = wb.sheet_names()
print(sheetnames)
# Get the Sheet object (worksheet) through the specified sheet name
sheet = wb.sheet_by_name(sheetnames[0])
# Get the number of rows and columns of the sheet through the nrows and ncols attributes of the Sheet object
print(sheet.nrows, sheet.ncols)
for row in range(sheet.nrows):
for col in range(sheet.ncols):
# Get the specified Cell object (cell) through the cell method of the Sheet object
# Get the value in the cell through the value attribute of the Cell object
value = sheet.cell(row, col).value
# Format data for rows other than the first row
if row > 0:
# Convert the xldate type in column 1 to a tuple first, then format it as "year-month-day" format
if col == 0:
# The second parameter of the xldate_as_tuple function has only two values: 0 and 1
# Where 0 represents a date based on 1900-01-01, and 1 represents a date based on 1904-01-01
value = xlrd.xldate_as_tuple(value, 0)
value = f'{value[0]}年{value[1]:>02d}月{value[2]:>02d}日'
# Process other columns of number type as floating-point numbers with two decimal places
else:
value = f'{value:.2f}'
print(value, end='\t')
print()
# Get the data type of the last cell
# 0 - empty, 1 - string, 2 - number, 3 - date, 4 - boolean, 5 - error
last_cell_type = sheet.cell_type(sheet.nrows - 1, sheet.ncols - 1)
print(last_cell_type)
# Get the values of the first row (list)
print(sheet.row_values(0))
# Get data in a specified row and column range (list)
# The first parameter represents the row index, the second and third parameters represent the start (inclusive) and end (exclusive) column indexes
print(sheet.row_slice(3, 0, 5))
Tip: The Excel file “Alibaba 2020 Stock Data.xls” used in the above code can be obtained through the Baidu Cloud link below. Link: https://pan.baidu.com/s/1rQujl5RQn9R7PadB2Z5g_g Extraction code: e7b4.
I believe that through the above code, you have already understood how to read an Excel file. If you want to know more about the xlrd module, you can read its official documentation.
Writing Excel Files
Writing to Excel files can be done by creating a workbook object through the Workbook class of the xlwt module. Through the workbook object’s add_sheet method, you can add worksheets. Through the worksheet object’s write method, you can write data to specified cells. Finally, through the workbook object’s save method, you can write the workbook to a specified file or memory. The following code implements the operation of writing the exam scores of 5 students in 3 courses to an Excel file.
import random
import xlwt
student_names = ['Guan Yu', 'Zhang Fei', 'Zhao Yun', 'Ma Chao', 'Huang Zhong']
scores = [[random.randrange(50, 101) for _ in range(3)] for _ in range(5)]
# Create a workbook object (Workbook)
wb = xlwt.Workbook()
# Create a worksheet object (Worksheet)
sheet = wb.add_sheet('Grade 1 Class 2')
# Add header data
titles = ('Name', 'Chinese', 'Math', 'English')
for index, title in enumerate(titles):
sheet.write(0, index, title)
# Write student names and exam scores to cells
for row in range(len(scores)):
sheet.write(row + 1, 0, student_names[row])
for col in range(len(scores[row])):
sheet.write(row + 1, col + 1, scores[row][col])
# Save the Excel workbook
wb.save('Exam Scores.xls')
Adjusting Cell Styles
When writing Excel files, we can also set styles for cells, mainly including font (Font), alignment (Alignment), border (Border), and background (Background) settings. xlwt has encapsulated corresponding classes to support these settings. To set cell styles, you first need to create an XFStyle object, and then set fonts, alignment, borders, etc. through the properties of this object. For example, in the above example, if you want to change the background color of the header cells to yellow, you can do it as follows.
header_style = xlwt.XFStyle()
pattern = xlwt.Pattern()
pattern.pattern = xlwt.Pattern.SOLID_PATTERN
# 0 - black, 1 - white, 2 - red, 3 - green, 4 - blue, 5 - yellow, 6 - pink, 7 - cyan
pattern.pattern_fore_colour = 5
header_style.pattern = pattern
titles = ('Name', 'Chinese', 'Math', 'English')
for index, title in enumerate(titles):
sheet.write(0, index, title, header_style)
If you want to set a specified font for the header, you can use the Font class and add the code shown below.
font = xlwt.Font()
# Font name
font.name = 'KaiTi'
# Font size (20 is the base unit, 18 means 18px)
font.height = 20 * 18
# Whether to use bold
font.bold = True
# Whether to use italic
font.italic = False
# Font color
font.colour_index = 1
header_style.font = font
Note: The font name specified in the above code (
font.name) should be a font available on the local system, for example, on my computer there is a font named “KaiTi”.
If you want the header to be vertically and horizontally centered, you can use the following code to set it.
align = xlwt.Alignment()
# Vertical alignment
align.vert = xlwt.Alignment.VERT_CENTER
# Horizontal alignment
align.horz = xlwt.Alignment.HORZ_CENTER
header_style.alignment = align
If you want to add a yellow dashed border to the header, you can use the following code to set it.
borders = xlwt.Borders()
props = (
('top', 'top_colour'), ('right', 'right_colour'),
('bottom', 'bottom_colour'), ('left', 'left_colour')
)
# Set border style and color for all four directions through a loop
for position, color in props:
# Use the setattr built-in function to dynamically assign values to specified attributes of the object
setattr(borders, position, xlwt.Borders.DASHED)
setattr(borders, color, 5)
header_style.borders = borders
If you want to adjust the cell width (column width) and header height (row height), you can do it according to the following code.
# Set row height to 40px
sheet.row(0).set_style(xlwt.easyxf(f'font:height {20 * 40}'))
titles = ('Name', 'Chinese', 'Math', 'English')
for index, title in enumerate(titles):
# Set column width to 200px
sheet.col(index).width = 20 * 200
# Set cell data and style
sheet.write(0, index, title, header_style)
Formula Calculation
For the previously opened “Alibaba 2020 Stock Data.xls” file, if you want to calculate the average of the closing price (Close field) for the whole year and the total trading volume (Volume field) for the whole year, you can use Excel’s formula calculation. We can first use xlrd to read the Excel file, then use the copy function provided by the xlutils third-party library to convert the read Excel file into a Workbook object for write operations. When calling the write method, you can write a Formula object to a cell.
The code to implement formula calculation is shown below.
import xlrd
import xlwt
from xlutils.copy import copy
wb_for_read = xlrd.open_workbook('Alibaba 2020 Stock Data.xls')
sheet1 = wb_for_read.sheet_by_index(0)
nrows, ncols = sheet1.nrows, sheet1.ncols
wb_for_write = copy(wb_for_read)
sheet2 = wb_for_write.get_sheet(0)
sheet2.write(nrows, 4, xlwt.Formula(f'average(E2:E{nrows})'))
sheet2.write(nrows, 6, xlwt.Formula(f'sum(G2:G{nrows})'))
wb_for_write.save('Alibaba 2020 Stock Data Summary.xls')
Note: The above code has some minor flaws. Interested readers can explore on their own and think about how to solve them.
Summary
Mastering the method of operating Excel with Python programs can solve many tedious tasks of processing Excel spreadsheets in daily office work. The most common is merging multiple Excel files with the same data format into one file and extracting specified data from multiple Excel files or sheets. Of course, if you want to process table data, using pandas, one of Python’s data analysis tools, may be more convenient.