Introduction to multi-threading
1.What is a thread?
- A thread, also called a lightweight process, is the smallest unit of computation that the operating system can schedule; it is contained within a process and is the actual unit of execution in the process.
- A thread owns no system resources of its own, apart from the few that are essential for its operation, but it shares all of the resources owned by the process with the other threads in the same process.
- A thread can create and cancel other threads, and multiple threads in the same process can execute concurrently.
2.Why should we use multi-threading?
Threads are independent, concurrent streams of execution within a program. Compared with separate processes, the threads in a process are less isolated from one another: they share memory, file handles, and other per-process state.
Because threads are divided at a finer granularity than processes, multi-threaded programs are highly concurrent. Processes have separate memory units during execution, while multiple threads share memory, which greatly increases program efficiency.
Threads perform better than processes because threads in the same process share the process's virtual address space. The shared environment of threads includes the process code segment, the process's common data, and so on; using this shared data, communication between threads is easy to achieve.
While the operating system must allocate a separate memory space and a large number of related resources when creating a new process, creating a thread is much simpler. Therefore, using multiple threads to achieve concurrency performs far better than using multiple processes.
3.To summarize, using multi-threaded programming has several advantages as follows.
- Memory cannot be shared between processes, but it is very easy to share memory between threads.
- When the operating system creates a process, it must allocate system resources for that process, but the cost of creating a thread is much smaller. Therefore, using multiple threads to execute multiple tasks concurrently is more efficient than using multiple processes.
- Python has built-in support for multi-threading, rather than merely wrapping the underlying operating system's scheduler, which simplifies multi-threaded python programming.
Code example:
import threading
import time
'''
1. Ordinary creation method
'''
# def run(n):
#     print('task', n)
#     time.sleep(1)
#     print('2s')
#     time.sleep(1)
#     print('1s')
#     time.sleep(1)
#     print('0s')
#
# if __name__ == '__main__':
#     # target is the function object to execute (not a call), and args is a tuple of its arguments
#     t1 = threading.Thread(target=run, args=('t1',))
#     t2 = threading.Thread(target=run, args=('t2',))
#     t1.start()
#     t2.start()
'''
2. Custom threads: inherit from threading.Thread
The essence is to override the run method of the Thread class.
'''
# class MyThread(threading.Thread):
#     def __init__(self, n):
#         super(MyThread, self).__init__()  # the parent constructor must be called when overriding run
#         self.n = n
#
#     def run(self):
#         print('task', self.n)
#         time.sleep(1)
#         print('2s')
#         time.sleep(1)
#         print('1s')
#         time.sleep(1)
#         print('0s')
#
# if __name__ == '__main__':
#     t1 = MyThread('t1')
#     t2 = MyThread('t2')
#     t1.start()
#     t2.start()
'''
3. Daemon threads
In the example below, the child thread is turned into a daemon of the main thread using setDaemon(True),
so when the main thread ends, the daemon thread ends with it and the whole program exits.
A "daemon thread" means the main thread does not care about its execution: once the main thread and all
non-daemon threads have finished, the program closes without waiting for the daemon thread.
The main thread is considered finished when all non-daemon threads are finished (the daemon threads are
reclaimed at that point). Because the end of the main thread means the end of the process, the resources
of the whole process are reclaimed, and the process must wait for every non-daemon thread to finish
before it can end.
'''
# def run(n):
#     print('task', n)
#     time.sleep(1)
#     print('3s')
#     time.sleep(1)
#     print('2s')
#     time.sleep(1)
#     print('1s')
#
# if __name__ == '__main__':
#     t = threading.Thread(target=run, args=('t1',))
#     t.setDaemon(True)  # equivalent to t.daemon = True (setDaemon is a deprecated alias since Python 3.10)
#     t.start()
#     print('end')
'''
As you can see from the output, after setting the daemon flag, the child thread ends immediately when the main thread ends and does not continue executing.
'''
'''
4. The main thread waits for the child thread to finish
To make the main thread end only after the daemon thread has finished, use the join method: the main thread waits for the child thread to finish before ending.
'''
# def run(n):
#     print('task', n)
#     time.sleep(2)
#     print('5s')
#     time.sleep(2)
#     print('3s')
#     time.sleep(2)
#     print('1s')
#
# if __name__ == '__main__':
#     t = threading.Thread(target=run, args=('t1',))
#     t.setDaemon(True)  # make the child thread a daemon; must be set before start()
#     t.start()
#     t.join()  # make the main thread wait for the child thread to finish
#     print('end')
'''
5. Multiple threads share global variables
A thread is the execution unit of a process, and a process is the smallest unit for which the system allocates resources, so multiple threads in the same process share its resources.
'''
# g_num = 100
#
# def work1():
#     global g_num
#     for i in range(3):
#         g_num += 1
#     print('in work1 g_num is : %d' % g_num)
#
# def work2():
#     global g_num
#     print('in work2 g_num is : %d' % g_num)
#
# if __name__ == '__main__':
#     t1 = threading.Thread(target=work1)
#     t1.start()
#     time.sleep(1)
#     t2 = threading.Thread(target=work2)
#     t2.start()
'''
6. Mutual exclusion locks (Lock)
Because threads are scheduled unpredictably, dirty data can appear when multiple threads modify the same
data at the same time. Hence the thread lock: only one thread at a time is allowed to perform the
protected operation. Locks are used to guard resources, and multiple locks can be defined; when a
resource must be exclusive, any lock can guard it, just as the same door can be secured with different
locks. If more than one thread operates on an object at the same time and the object is not properly
protected, the result is unpredictable; this is called "thread unsafety". The mutual exclusion lock
(Lock) exists to prevent exactly this situation.
'''
# def work():
#     global n
#     lock.acquire()
#     temp = n
#     time.sleep(0.1)
#     n = temp - 1
#     lock.release()
#
# if __name__ == '__main__':
#     lock = threading.Lock()
#     n = 100
#     l = []
#     for i in range(100):
#         p = threading.Thread(target=work)
#         l.append(p)
#         p.start()
#     for p in l:
#         p.join()
#     print(n)  # 0: with the lock, every decrement is applied exactly once
'''
7. Recursive locks: the RLock class behaves like the Lock class, but it supports nesting.
RLock stands for Reentrant Lock: the same thread can acquire it multiple times and release it multiple
times. With RLock, acquire() and release() calls must occur in pairs: if acquire() is called n times to
take the lock, release() must be called n times to free it. In other words, RLock is reentrant: the same
thread can re-acquire an RLock it already holds. The RLock object maintains a counter that tracks nested
acquire() calls, and the thread must explicitly call release() once for each acquire(). As a result, a
method protected by the lock can call another method protected by the same lock.
'''
# def func(lock):
#     global gl_num
#     lock.acquire()
#     lock.acquire()  # reentrant: the same thread may acquire the RLock again
#     gl_num += 1
#     time.sleep(1)
#     print(gl_num)
#     lock.release()
#     lock.release()  # must be released as many times as it was acquired
#
# if __name__ == '__main__':
#     gl_num = 0
#     lock = threading.RLock()
#     for i in range(10):
#         t = threading.Thread(target=func, args=(lock,))
#         t.start()
'''
8. Semaphore (BoundedSemaphore class)
A mutex allows only one thread at a time to change the data, while a semaphore allows a fixed number of
threads to change the data at the same time. For example, if a toilet has 3 stalls, at most 3 people can
use it at once, and the people behind can only enter after someone comes out.
'''
# def run(n, semaphore):
#     semaphore.acquire()  # take one slot
#     time.sleep(3)
#     print('run the thread:%s\n' % n)
#     semaphore.release()  # free the slot
#
# if __name__ == '__main__':
#     num = 0
#     semaphore = threading.BoundedSemaphore(5)  # allow at most 5 threads to run simultaneously
#     for i in range(22):
#         t = threading.Thread(target=run, args=('t-%s' % i, semaphore))
#         t.start()
#     while threading.active_count() != 1:
#         pass
#     else:
#         print('----------all threads done-----------')
'''
9. Python thread events (Event)
Used by the main thread to control the execution of other threads. An event is a simple thread
synchronization object that provides the following methods:
clear   sets the flag to False
set     sets the flag to True
is_set  returns whether the flag is set
wait    blocks while watching the flag; if the flag is not set, it stays blocked
Event handling mechanism: a flag is defined globally. When the flag is False, event.wait() blocks;
when the flag is True, event.wait() no longer blocks.
'''
event = threading.Event()

def lighter():
    count = 0
    event.set()  # the light starts out green
    while True:
        if 5 < count <= 10:
            event.clear()  # red light: clear the flag
            print("\033[41;1mred light is on...\033[0m")
        elif count > 10:
            event.set()  # green light: set the flag
            count = 0
        else:
            print('\033[42;1mgreen light is on...\033[0m')
        time.sleep(1)
        count += 1

def car(name):
    while True:
        if event.is_set():  # check whether the flag is set
            print('[%s] running ...' % name)
            time.sleep(1)
        else:
            print('[%s] sees red light, waiting...' % name)
            event.wait()
            print('[%s] green light is on, starting going...' % name)

# startTime = time.time()
light = threading.Thread(target=lighter,)
light.start()
car_thread = threading.Thread(target=car, args=('MINT',))
car_thread.start()
# endTime = time.time()
# print('Time taken:', endTime - startTime)
'''
GIL (Global Interpreter Lock)
Outside python, only one task can execute at a time on a single core, while multiple cores can support
multiple threads executing simultaneously. But in python, no matter how many cores there are, only one
thread can execute at any given moment. The reason is the GIL: a global interpreter lock, a design
decision made in python for the sake of data safety. A thread that wants to execute must first obtain
the GIL; you can think of the GIL as a "pass", and there is only one GIL per python process. Threads
that do not hold the pass are not allowed to execute on the CPU. The GIL exists only in CPython, because
CPython uses the operating system's native threads and cannot manipulate the CPU directly, so it relies
on the GIL to guarantee that only one thread can access the data at a time. Jython, by contrast, has
no GIL.
'''
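The GIL's effect on CPU-bound code can be observed with a rough timing sketch (a minimal illustration added here, not part of the original example set; exact numbers depend on the machine): splitting a pure-python countdown across two threads does not halve the runtime, because only one thread holds the GIL at a time.

```python
import threading
import time

def count_down(n):
    # pure-python CPU-bound work; it never releases the GIL voluntarily
    while n > 0:
        n -= 1

N = 2_000_000

# sequential: two countdowns one after the other in the main thread
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# threaded: the same total work split across two threads
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
start = time.perf_counter()
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print('sequential: %.2fs, threaded: %.2fs' % (sequential, threaded))
```

On CPython the threaded time typically comes out close to (or even worse than) the sequential time rather than half of it, because the two threads take turns holding the GIL.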
'''
The efficiency of python execution also differs by code type:
1. CPU-intensive code (loops, calculations, etc.): because of the sheer amount of computation, the tick
counter quickly reaches its threshold, triggering a GIL release and renewed competition for it
(and switching back and forth between threads of course consumes resources). So python's
multi-threading is unfriendly to CPU-intensive code.
2. IO-intensive code (file handling, web crawlers, and other read/write-heavy work): multi-threading can
effectively improve efficiency. Single-threaded IO operations sit waiting on IO, wasting time
unnecessarily, while with multiple threads the interpreter can switch to thread B while thread A is
waiting, so CPU resources are not wasted and program efficiency improves. So python's multi-threading
is friendlier to IO-intensive code.
'''
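The IO-friendly case can be sketched with concurrent.futures on top of the threading module (a minimal illustration added here; time.sleep stands in for a blocking read and releases the GIL just as real IO would):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(task_id):
    # simulates a blocking IO call; sleep releases the GIL so other threads run
    time.sleep(0.2)
    return task_id * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    # all five "IO waits" overlap instead of running one after another
    results = list(pool.map(fake_io, range(5)))
elapsed = time.perf_counter() - start

print(results)
print('elapsed: %.2fs' % elapsed)  # about 0.2s, not the ~1s a sequential loop would take
```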
'''
It mainly depends on the type of task, which we classify as IO-intensive and computation-intensive;
multi-threaded switching, in turn, is divided into IO switching and time-slice switching.
If the task is IO-intensive and we do not use multi-threading, each IO operation must wait for the
previous one to complete before the next can start, and the CPU sits idle during the wait. With
multi-threading, the CPU can switch to another IO task during that wait, so the CPU is fully utilized,
idle time is avoided, and efficiency improves.
However, if the multi-threaded tasks are computational, the CPU is already busy the whole time; the only
switching is the time-slice switching between threads, which does not improve performance and may even
waste time and resources on the switches themselves, degrading performance. This explains why the two
kinds of multi-threaded workloads above behave so differently.
Conclusion: for IO-intensive tasks, multi-threading is recommended, and multi-processing plus coroutines
can also be used (for example, crawlers mostly use multi-threading to process the crawled data).
For computation-intensive tasks, python's multi-threading is not a good fit.
'''
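For computation-intensive work, the standard workaround is the multiprocessing module, which sidesteps the GIL by giving each worker its own interpreter process (a minimal sketch added here; the worker function name square is illustrative):

```python
from multiprocessing import Pool

def square(x):
    # CPU-bound work; each worker process has its own interpreter and its own GIL
    return x * x

if __name__ == '__main__':
    # the pool distributes the inputs across 4 worker processes
    with Pool(processes=4) as pool:
        print(pool.map(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Unlike threads, the worker processes do not share memory, so inputs and results are pickled and passed between processes; the `if __name__ == '__main__':` guard is required on platforms that spawn workers by re-importing the module.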