I've noticed that many of my friends write Python scripts haphazardly, either with no functions at all or with functions defined all over the place, so that at a glance you can't tell where the first executed line of code is. I recommend structuring a script like this instead:
def main():
    # do something
    print("do something.")


if __name__ == "__main__":
    main()
You may object: I'll write it however I like, so why should I listen to you and type an extra if __name__…?
Bear with me. Here are three reasons.
First, it makes the role of a Python file clearer
To begin with, you need to understand what __name__ does. When a file is executed directly by the Python interpreter, __name__ is "__main__"; when the file is imported by another Python program, __name__ is the module name, that is, the file name without the .py extension. You can verify this in the Python interpreter. Suppose there is a some_script.py with the following contents:
print("some_script.py")
print(__name__)
Import it in the Python interpreter.
❯ vim some_script.py
❯ python
Python 3.8.5 (v3.8.5:580fbb018f, Jul 20 2020, 12:11:27)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import some_script
some_script.py
some_script
>>>
You can see that the value of __name__ is the module name some_script, that is, the script's file name without the .py extension.
That means the code under if __name__ == "__main__": won't run when the file is imported.
With this in mind, if __name__ == "__main__": can serve as a marker that distinguishes a script from a library: when we see it, we know the file is a script that can be run directly, and when we don't, we assume the file is a library meant to be imported by other programs. Explicit is better than implicit, isn't it?
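To see both cases side by side, here is a minimal sketch that writes a small throwaway file, imports it, and then runs it directly with the interpreter. The file name demo_module.py, its contents, and the helper names are hypothetical and exist only for this demonstration.

```python
import contextlib
import importlib.util
import io
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents, used only for this demonstration.
SOURCE = (
    "print(__name__)\n"
    'if __name__ == "__main__":\n'
    '    print("running as a script")\n'
)


def lines_when_imported(path):
    """Import the file as a module and return the lines it printed."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        spec.loader.exec_module(module)
    return buffer.getvalue().splitlines()


def lines_when_run(path):
    """Run the file directly with the interpreter and return the lines it printed."""
    done = subprocess.run(
        [sys.executable, str(path)], capture_output=True, text=True, check=True
    )
    return done.stdout.splitlines()


def compare():
    """Return (output when imported, output when run directly)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_module.py"
        path.write_text(SOURCE)
        return lines_when_imported(path), lines_when_run(path)


if __name__ == "__main__":
    imported, executed = compare()
    print("imported:", imported)
    print("executed:", executed)
```

When imported, the file prints only its module name and the guarded line stays silent; when run directly, it prints __main__ followed by the guarded line.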
Another example.
Suppose you write a script called bad_script.py without if __name__ == "__main__":, with the following contents:
def useful_function(x):
    return x * x


class UsefulClass:
    def __init__(self, x):
        self.x = x


# You tested it yourself.
for i in range(7):
    print(useful_function(i))
Someone else has written a useful.py that references your useful_function.
from bad_script import useful_function


def main():
    print(f'{useful_function(3)=}')


if __name__ == '__main__':
    main()
As soon as they run it, they find that it prints output they never asked for (shown in red in the original figure): the numbers 0, 1, 4, 9, 16, 25 and 36 from your test loop appear before their own line, useful_function(3)=9.
When they trace the stray output back to your script, won't they curse you for it?
The same goes for global variables: if your script defines them at the top level and someone else writes from bad_script import * in an inappropriate place, your globals are imported along with everything else and can overwrite their variables, which easily leads to bugs.
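One possible repair of bad_script.py is sketched below, assuming the test loop is only meant to run when the file is executed directly. The function name main() is just the conventional choice, not something the original script defined.

```python
def useful_function(x):
    return x * x


class UsefulClass:
    def __init__(self, x):
        self.x = x


def main():
    # The self-test now lives inside a function, so the loop variable i is
    # local to main() instead of being a module-level global.
    for i in range(7):
        print(useful_function(i))


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main()
```

With this layout, from bad_script import useful_function prints nothing, and a star import no longer picks up the loop variable i, because it is no longer a module-level name.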
Second, it makes Python files more readable and IDE-friendly
With if __name__ == "__main__":, a Python program effectively has an entry function. All variables are defined and used from there, and we can see clearly where the program's logic starts (provided, of course, that we consciously put all of the program's logic there).
In fact, this is what PyCharm recommends. When you create a new project, the main.py it generates by default already contains an if __name__ == '__main__': block.
PyCharm also shows a green run button in the far left margin of the if __name__ == "__main__": line; click it to run the script.
Why do so many well-regarded programming languages, such as C, Java, Golang, and C++, have a main entry function? I think one important reason is that a single, uniform entry point makes programs easier to read.
Third, in multiprocessing scenarios you must use if __name__ == "__main__"
For example, let's say you use multiprocessing to do parallel computing and write code like this:
import multiprocessing as mp


def useful_function(x):
    return x * x


print("processing in parallel")
with mp.Pool() as p:
    results = p.map(useful_function, [1, 2, 3, 4])
    print(results)
When you run it on a platform where multiprocessing starts child processes with the spawn method (the default on Windows, and on macOS since Python 3.8), the program keeps creating processes and keeps reporting RuntimeError, and even Ctrl+C can't terminate it. If you add if __name__ == "__main__":, the program runs as expected.
import multiprocessing as mp


def useful_function(x):
    return x * x


if __name__ == '__main__':
    print("processing in parallel")
    with mp.Pool() as p:
        results = p.map(useful_function, [1, 2, 3, 4])
        print(results)
Why is this?
Here is how I understand it: when Python's multiprocessing spawns child processes, it launches multiple Python interpreters, each of which imports your script so that the child process gets its own copy of the global variables and functions. The code that creates the processes therefore has to sit under if __name__ == "__main__":. Otherwise, it is executed again when each child imports the script, and child processes are created in infinite recursion. Python 3 reports a RuntimeError, but the process is created first and the error is reported afterwards, so processes keep being created, errors keep being reported, Ctrl+C can't stop it, and the only way out is to kill the whole terminal. Here is the official explanation [1].
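The re-import can be simulated without creating any real processes. The sketch below is an illustration, not how you would write production code: it assumes a spawned child re-runs the parent's file under a module name other than "__main__" (as far as I know, CPython uses "__mp_main__"), and it uses two hypothetical one-variable scripts that merely record whether they would have started a pool.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical stand-ins for a script's top level. Instead of really creating
# a pool, each version only records that it *would* have created one.
UNGUARDED = "pool_started = True\n"
GUARDED = (
    "pool_started = False\n"
    'if __name__ == "__main__":\n'
    "    pool_started = True\n"
)


def would_start_pool(source, run_name):
    """Execute `source` as a file under the given module name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "script.py"
        path.write_text(source)
        return runpy.run_path(str(path), run_name=run_name)["pool_started"]


def simulate(source):
    # The parent runs the file as "__main__"; the child is assumed to re-run
    # the same file under the name "__mp_main__".
    return {
        "parent": would_start_pool(source, "__main__"),
        "child": would_start_pool(source, "__mp_main__"),
    }


if __name__ == "__main__":
    print("unguarded:", simulate(UNGUARDED))
    print("guarded:  ", simulate(GUARDED))
```

In the unguarded version the simulated child also reaches the pool-creating code, which in a real program means it tries to start children of its own; in the guarded version only the parent does.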
Final words
if __name__ == "__main__": is not mandatory, but I highly recommend it for the three reasons above. It is also a Python community convention, in line with the Zen of Python's preference for clarity over obscurity ("Explicit is better than implicit"). It works like using _ as a variable name, which tells the reader that the variable is unimportant and won't be used later: when you see a Python script with if __name__ == "__main__":, you know that it is an executable script and that this part of the code won't run when the file is imported by another program. In a multiprocessing program, the guard is required.