What can Python do for Excel? If you’ve experienced unexpected workbook crashes, calculation errors, and tedious manual operations, then you’ll want to know the answer to this question. This book is a comprehensive yet concise guide to getting started with Python for spreadsheet users. Don’t let fear keep you from learning to program; Felix provides an excellent entry point to Python that even experienced programmers can benefit from. At the same time, he has organized the book in a way that makes it easy for Excel users like you to understand and apply. It’s a guide to maximizing the power of Excel with the help of Python. If you want to know what Excel can do when combined with Python, then Felix is the right person to answer that question. I hope you’ll enjoy this masterclass as much as I did.
Excel is not going away; it will continue to be a versatile desktop tool in businesses and homes. This book bridges the gap between these two worlds. It explains how you should integrate Python into Excel, and how to free yourself from the giant workbooks, thousands of formulas, and odd VBA code that you can’t hide from. This book is probably the most useful book I’ve ever read on Excel, and is a must-read for every advanced Excel user. Highly recommended!
Excel has always been a fundamental tool in the financial world, but there are a large number of Excel applications that are not very useful. This book does a great job of teaching the reader how to build better, more robust Excel applications with the help of xlwings.
Whenever spreadsheet tools hit a bottleneck, Excel users start to question them. It’s not uncommon for Excel workbooks to become slower and even crash when they hold too much data and too many formulas. But before things get serious, it might be a good idea to think about the way you work. Let’s say you’re dealing with a very important workbook, where an error could cause financial or reputational damage, or you spend hours every day manually updating your Excel workbook. If you encounter any of these situations, you should learn how to use a programming language to automate these operations. Automation prevents human error and allows you to spend more time on more productive tasks, not on copying and pasting data into Excel worksheets.
Now that you know why Python can be a “good partner” for Excel, it’s time to configure your environment and start writing your first line of Python code!
Development Environment
2.1.1 Installation
Go to the Anaconda homepage and download the latest version of the Anaconda installer (Individual Edition). Be sure to download the 64-bit graphical installer for Python 3.x. Once the download is complete, double-click the installer to start the installation, making sure to leave all options at their default values. Refer to the official Anaconda documentation for more details on the installation process.
Once Anaconda is installed, you can launch Anaconda Prompt to start learning. Here’s a look at what it is and how it works.
2.1.2 Anaconda Prompt
Anaconda Prompt is really just a command prompt on Windows or a terminal on macOS that has been set up to run with the correct Python interpreter and third-party packages, including the command-line tools provided by various packages.
2.1.3 Python REPL: Interactive Python Sessions
In Anaconda Prompt, an interactive Python session can be started by executing the python command.
(base) C:\Users\felix>python
Python 3.8.5 (default, Sep 3 2020, 21:29:08) […] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
The text displayed in the macOS terminal may be a little different, but it works the same way. This book is written for Python version 3.8, so if you want to use a newer version, be sure to read the instructions on the book’s home page.
2.1.4 Package managers: Conda and pip
I mentioned earlier Python’s package manager, pip, which is responsible for downloading, installing, updating, and uninstalling Python packages and their dependencies and subdependencies. While Anaconda also works with pip, it also has a built-in package manager called Conda, which has the advantage of being able to install not only Python packages, but also multiple versions of the Python interpreter. In a nutshell: packages can add features to Python that aren’t available in the standard library. One such package is pandas, which you’ll see in Chapter 5. Since the Anaconda Python distribution already comes with these package managers pre-installed, there is no need to install them manually.
2.1.5 The Conda Environment
You’re probably wondering what the (base) at the beginning of each line of the Anaconda Prompt is. It is the name of the currently active Conda environment, which is an isolated “Python world” with a specific version of Python and a set of installed packages. Why is this necessary? When you’re working on multiple projects at once, each project will have different needs: one project might need Python 3.8 and pandas 0.25.0, while another might need Python 3.9 and pandas 1.0.0. Code written for pandas 0.25.0 often needs changes before it runs with pandas 1.0.0, so you can’t just update Python and pandas and leave the code intact. Configuring a Conda environment for each project ensures that each one runs with the correct dependencies. While Conda environments are specific to the Anaconda distribution, virtual environments are available in all Python distributions. Conda environments are the more powerful of the two, because they let you manage not only multiple versions of packages, but also different versions of the Python interpreter with ease.
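To check from inside Python which interpreter and environment a script is actually running in, a minimal sketch along the following lines can help. It uses only the standard library; the helper name meets_requirement is a hypothetical illustration and not part of Conda or of the book's code.

```python
import sys

# Version of the interpreter that is currently running
major, minor = sys.version_info[:2]
print(f"Python {major}.{minor}")

# Root folder of the active environment; with Conda this is usually the
# Anaconda installation folder (base) or a subfolder of its "envs" directory
print(sys.prefix)


def meets_requirement(required, current=sys.version_info):
    """Hypothetical helper: True if `current` is at least version `required`."""
    return tuple(current[:2]) >= tuple(required)
```

Calling meets_requirement((3, 8)) at the top of a project script is one simple way to fail early when the wrong environment is active.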
Getting Started with Python
3.1 Data Types
Like other programming languages, Python treats data like numbers, text, and booleans differently; Python does this by assigning different data types to them. The most common data types are integer, floating point, boolean, and string. This section will introduce each of them with some examples. To understand what a data type is, it is necessary to first explain what an object is.
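As a quick illustration of the four data types named above, the following sketch prints the type of a few sample values and converts between types with the built-in constructors. The sample values and variable names are chosen purely for illustration.

```python
# Every value is an object, and type() reveals its data type
print(type(42))       # <class 'int'>
print(type(3.14))     # <class 'float'>
print(type(True))     # <class 'bool'>
print(type("Excel"))  # <class 'str'>

# The constructors int, float, and str convert between data types
whole = int("10")     # string to integer
ratio = float(whole)  # integer to floating point
label = str(ratio)    # floating point to string
```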
3.2 Indexing and Slicing
Indexes and slices allow you to access specified elements of a sequence. Strings are sequences of characters, and we can learn this mechanism with strings. Other sequences that support indexing and slicing, such as lists and tuples, are described in the next section.
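A short sketch of indexing and slicing on a string may make the mechanism concrete. The sample string is arbitrary; the same syntax applies to lists and tuples.

```python
language = "PYTHON"

first = language[0]             # "P": indexes start at zero
last = language[-1]             # "N": negative indexes count from the end
middle = language[1:4]          # "YTH": from index 1 up to, but not including, 4
every_second = language[::2]    # "PTO": the third value is the step size
reversed_text = language[::-1]  # "NOHTYP": a negative step walks backward
```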
3.3 Data Structures
Python provides powerful data structures to facilitate working with collections of objects. This section introduces lists, dictionaries, tuples, and sets. While each data structure has its own characteristics, they all have the common feature of being able to store multiple objects. In VBA, you may have used collections or arrays to store multiple values. VBA also provides a data structure called a dictionary, which works much like a dictionary in Python, but it is only available on Windows. Let’s start with the most commonly used data structure, the list.
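The four data structures can be sketched side by side as follows. All names and values are made up for illustration.

```python
# List: ordered and mutable
currencies = ["USD", "EUR"]
currencies.append("CHF")

# Dictionary: maps keys to values
exchange_rates = {"EURUSD": 1.1, "GBPUSD": 1.3}
exchange_rates["CHFUSD"] = 1.05

# Tuple: ordered like a list, but immutable
point = (1, 2)

# Set: unordered collection without duplicates
unique_currencies = set(["USD", "USD", "EUR"])
```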
3.4 Control Flow
This section will introduce if statements, for loops, and while loops. The if statement only executes specific code when certain conditions are met, and the for and while loops repeatedly execute the code in a block. At the end of this section, I will also introduce the list comprehension, which can be used instead of a for loop to construct a list. This section begins with the definition of a code block and introduces one of Python’s most notable features: significant whitespace.
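The following sketch shows the constructs from this section in one place. Note how indentation alone marks where each code block begins and ends. The values and thresholds are arbitrary.

```python
values = [3, 8, 12]

# for loop with if/elif/else: the indented lines form the code blocks
labels = []
for value in values:
    if value < 5:
        labels.append("low")
    elif value < 10:
        labels.append("medium")
    else:
        labels.append("high")

# while loop: repeats for as long as the condition holds
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1

# List comprehension: builds a list without an explicit for loop
doubled = [value * 2 for value in values]
```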
3.5 Organizing Code
In this section, we learn how to make code into a maintainable structure: first, we introduce the nuts and bolts of functions, and then we show you how to break up your code into different Python modules. At the end of this section, we’ll apply what we’ve learned by looking at the datetime module in the standard library.
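As a small sketch of these ideas, the function below wraps a date calculation that relies on the datetime module from the standard library. The function name days_between is a hypothetical example; saving it in a file of its own would turn it into a module that other scripts can import.

```python
import datetime as dt


def days_between(start, end):
    """Return the number of days from start to end (both dt.date objects)."""
    return (end - start).days


# Subtracting two dates yields a timedelta object with a days attribute
elapsed = days_between(dt.date(2020, 1, 1), dt.date(2020, 1, 31))
print(elapsed)  # 30
```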
3.6 PEP 8: Python Style Guide
You’re probably wondering why I sometimes use underscores in variable names and sometimes write them in all capital letters. In this section, I’ll explain my formatting choices while introducing Python’s official style guide. Python uses so-called Python Enhancement Proposals (PEPs) to discuss the introduction of new language features, and the style guide for Python code is one of them. These proposals are generally referred to by number, and the code style guide is called PEP 8. PEP 8 is a set of style recommendations for the Python community. If everyone working on the same code follows the same style, the code will be more readable. In the open source world, where many programmers who don’t know each other work on the same project, it’s especially important to follow the same style.
This short Python file in Example 3-2 demonstrates the most important programming conventions.
Example 3-2 pep8_sample.py
"""This script shows some of the rules of PEP 8
"""

import datetime as dt


TEMPERATURE_SCALES = ("fahrenheit", "kelvin",
                      "celsius")


class TemperatureConverter:
    pass  # Do nothing for now


def convert_to_celsius(degrees, source="fahrenheit"):
    """This function converts degrees Fahrenheit or Kelvin to Celsius
    """
    if source.lower() == "fahrenheit":
        return (degrees-32) * (5/9)
    elif source.lower() == "kelvin":
        return degrees - 273.15
    else:
        return f"Don't know how to convert from {source}"


celsius = convert_to_celsius(44, source="fahrenheit")
non_celsius_scales = TEMPERATURE_SCALES[:-1]

print("Current time: " + dt.datetime.now().isoformat())
print(f"The temperature in Celsius is: {celsius}")
1. Use a docstring at the top of the file to explain what the script or module does. A docstring is a special kind of string that is enclosed in triple quotes. In addition to serving as documentation for code, it can be used to write strings that span multiple lines. If you have a string with many double or single quotes, you can also use a docstring to avoid escaping. We will see in Chapter 11 that docstrings also work well when writing SQL queries that span multiple lines.
2. All import statements should be placed at the top of the file, one per line. Imports from the standard library go first, followed by third-party packages, and finally modules written by yourself. Only the standard library is used in this example.
3. Use capital letters and underscores to indicate constants. The length of each line should not exceed 79 characters. Whenever possible, break long lines by relying on the implicit line continuation inside parentheses, square brackets, or curly braces.
4. Separate classes, functions, and other code with two blank lines.
5. Although many classes are named using lowercase letters like datetime, you should use CapitalizedWords for your own classes. See Appendix C for more information on classes.
6. Inline comments should be separated from the code by at least two spaces. Code blocks should be indented with 4 spaces.
7. Functions and arguments should be named with lowercase letters, with underscores where they improve readability. Do not use spaces between parameter names and default values.
8. The docstring for a function should list the function parameters and explain their meaning. I didn’t do this to keep the example short, but as we will see in the companion code base in Chapter 8, excel.py has full docstrings.
9. Do not use spaces before and after colons.
10. You can use spaces before and after arithmetic operators. If operators with different priorities are used at the same time, you should consider adding spaces only before and after the operator with the lowest priority. In this case, the multiplication sign has the lowest priority, so it is preceded and followed by spaces.
11. Use lowercase letters for variable names. Use underscores where they improve readability. When assigning a value to a variable, add spaces before and after the equal sign. When calling functions, however, do not use spaces around the equal sign of keyword arguments. When indexing and slicing, do not use spaces before and after square brackets.
This is just a brief introduction to PEP 8, and after you start using Python in earnest, you should read the original PEP 8. PEP 8 makes it clear that these rules are just recommendations, and that your own style guide should take precedence. After all, consistency is the most important thing. If you’re interested in other publicly available style guides, you can also take a look at Google’s Python style guide, which is reasonably close to PEP 8. Most Python programmers don’t actually follow PEP 8 to the letter, and the most common violation is exceeding 79 characters per line. It can be difficult to keep the formatting consistent while writing code, but you can use tools that automatically check whether the code follows a certain style. The next section will show you how to use VS Code for automatic formatting.
Getting Started with pandas
4.1 NumPy arrays
As described in Chapter 3, if you want to perform array operations on nested lists, you can use loops to do so. For example, to add 1 to each element in a nested list, you can use the following nested list comprehension.
In [1]: matrix = [[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]]
In [2]: [[i + 1 for i in row] for row in matrix]
Out[2]: [[2, 3, 4], [5, 6, 7], [8, 9, 10]]
But such code is not very readable. More critically, looping through the entire array can be very slow when dealing with larger arrays. Depending on the use case and the array size, using NumPy arrays can be hundreds of times faster than Python lists. To achieve this performance, NumPy takes advantage of code written in C and Fortran (both compiled languages, which are much faster than Python). NumPy arrays are N-dimensional arrays that hold homogeneous data. “Homogeneous” means that all data in the array must be of the same type. The most common case is dealing with one- and two-dimensional arrays of floating-point numbers.
Here’s how to create a one-dimensional array and a two-dimensional array, both of which will be used throughout this chapter.
In [3]: # First import NumPy
        import numpy as np
In [4]: # Construct a one-dimensional array using a list
        array1 = np.array([10, 100, 1000.])
In [5]: # Construct a two-dimensional array using a nested list
        array2 = np.array([[1., 2., 3.],
                           [4., 5., 6.]])
Array dimensionality
It is important to note the difference between one-dimensional arrays and two-dimensional arrays. One-dimensional arrays have only one axis and therefore do not distinguish between row and column arrays. This is similar to arrays in VBA, but if you are coming from a language such as MATLAB (where 1D arrays distinguish between row and column arrays), it may take some time to get used to the NumPy approach.
Even though array1 consists of integers except for the last element (a floating point number), NumPy’s homogeneity requirement forces the whole array to use a single data type, float64, which is able to hold all the elements. To find out the data type of an array, you can access its dtype attribute.
In [6]: array1.dtype
Out[6]: dtype('float64')
dtype returns float64 instead of the float described in Chapter 3. As you may have guessed, NumPy uses its own numeric data types, which are more fine-grained than Python’s data types. This is usually not a problem, because most of the time the different data types in Python and NumPy are converted automatically. If you need to explicitly convert a NumPy data type to a basic Python data type, just use the corresponding constructor (I’ll go into more detail on how to access the elements of an array later):
In [7]: float(array1[0])
Out[7]: 10.0
A complete list of NumPy data types can be found in the NumPy documentation. We will see right away that with NumPy arrays, it is possible to perform array operations in a concise way.
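As a rough sketch of what such concise array operations can look like, the following example reuses array1 and array2 from above and performs element-wise arithmetic without any loops. The result variable names are only illustrative.

```python
import numpy as np

array1 = np.array([10, 100, 1000.])
array2 = np.array([[1., 2., 3.],
                   [4., 5., 6.]])

# A scalar is applied to every element
plus_one = array2 + 1

# Two arrays of the same shape are combined element by element
squared = array2 * array2

# Broadcasting: the one-dimensional array is applied to each row
scaled = array2 * array1
```

Compared with the nested list comprehension shown earlier, adding 1 to every element is now a single expression.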
5.2 Data manipulation
Real-world data doesn’t just fall out of the sky, and it needs to be cleaned up and made easier to understand before it can be used. This section starts with a look at how to select data from a DataFrame, how to modify it, and how to handle missing and duplicate data. It then performs some calculations on DataFrames and shows how to work with text data. By the end of this section, you’ll understand when pandas returns a view and when it returns a copy of the data. Many of the concepts in this section are related to what we saw with NumPy arrays in Chapter 4.
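A minimal sketch of a few of these operations is shown below, assuming a small made-up DataFrame. The column names and values are invented for illustration and do not come from the book's data.

```python
import numpy as np
import pandas as pd

# Made-up sample data with stray whitespace, a duplicate row, and a missing value
df = pd.DataFrame({"name": [" Mark", "John ", "John ", "Jenny"],
                   "score": [4.5, 6.7, 6.7, np.nan]})

# Select a single value by row label and column label
first_score = df.loc[0, "score"]

# Work with text data: remove leading and trailing whitespace
df["name"] = df["name"].str.strip()

# Drop duplicate rows, then replace missing scores with 0
cleaned = df.drop_duplicates().fillna({"score": 0})
```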
5.3 Combining DataFrames
Combining different data sets in Excel can be a hassle, and often requires the use of many VLOOKUP formulas. Fortunately, combining DataFrames is one of the strengths of pandas, and its data alignment mechanism makes this easy while reducing the possibility of errors. There are many ways to combine and merge DataFrames, and this section will cover the most common ones: concat, join, and merge. While their functionality overlaps, each of these functions makes a particular class of work easier. I will first introduce the concat function, then explain the different options of the join function, and finally introduce the merge function, which is the most versatile.
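The three functions can be sketched on small made-up DataFrames as follows. The city data and the population figures are illustrative values only, and the how arguments shown are just two of the available options.

```python
import pandas as pd

# Made-up sample data
df1 = pd.DataFrame({"city": ["Paris", "Rome"],
                    "country": ["France", "Italy"]})
df2 = pd.DataFrame({"city": ["Berlin"], "country": ["Germany"]})
population = pd.DataFrame({"city": ["Paris", "Berlin"],
                           "population_m": [2.1, 3.6]})

# concat: glue DataFrames together, here by stacking the rows
stacked = pd.concat([df1, df2], ignore_index=True)

# merge: combine on a column, similar to a VLOOKUP; a left merge keeps
# all rows of the left DataFrame and fills unmatched rows with NaN
merged = stacked.merge(population, how="left", on="city")

# join: combine on the index; an inner join keeps only matching labels
joined = stacked.set_index("city").join(population.set_index("city"),
                                        how="inner")
```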
5.4 Descriptive statistics and data aggregation
One way to make large datasets more organized is to compute descriptive statistics, such as sums or means, over the entire dataset or subsets. This section first describes how to compute these statistics in pandas, and then describes two ways to aggregate data into subsets: the groupby method and the pivot_table function.
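The following sketch, built on a made-up sales table, shows descriptive statistics over the whole dataset first and then both aggregation approaches. The column names and numbers are invented for illustration.

```python
import pandas as pd

# Made-up sample data
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "fruit": ["Apple", "Orange", "Apple", "Orange"],
    "revenue": [10, 20, 30, 40],
})

# Descriptive statistics over the whole dataset
total = sales["revenue"].sum()
average = sales["revenue"].mean()

# groupby: split the rows into subsets and aggregate each subset
by_region = sales.groupby("region")["revenue"].sum()

# pivot_table: the same idea, laid out like an Excel pivot table
pivot = pd.pivot_table(sales, index="region", columns="fruit",
                       values="revenue", aggfunc="sum")
```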
5.5 Plotting
Plotting allows you to visualize the results of your data analysis, which is probably the most important step in the whole data analysis process. We will need to use two libraries for plotting, first looking at Matplotlib, the default plotting library for pandas, and then looking at another modern plotting library, Plotly, which we can use for a better interactive experience in Jupyter notebooks.
5.6 Importing and exporting DataFrames
So far, we have constructed DataFrames in various ways: from nested lists, dictionaries, and NumPy arrays. It is necessary to know these techniques, but often the data already exists and you just need to load it into a DataFrame. To do this, pandas provides various read functions. Even if you want to access a proprietary system for which pandas doesn’t provide a built-in reader, there is usually a Python package to connect to that system, and once you have the data, it’s easy to load it into a DataFrame. In Excel, data import is usually a job for Power Query. After analyzing and modifying a data set, you may want to push the results back to a database, export them to a CSV file, or, as the book’s title suggests, put them into an Excel workbook for your manager to see. To export a pandas DataFrame, you can use the export methods provided by DataFrame. Table