According to the 2019 Stack Overflow Developer Survey, the Python programming language earned 73.1% approval among developers. It ranked second only to Rust and continues to dominate Data Science and Machine Learning (ML).
Python is a developer favorite. It is a high-level language known for its robustness and its core philosophy: simplicity over complexity. However, the performance of Python applications is another story. Just like any other application, a Python application has its share of performance issues.
Most of the time, APM tools such as Retrace can help solve application performance issues. But what if your Python application has been running for four hours and the server runs out of memory? That is a specific problem involving memory resources.
It is called a memory leak, and developers need to find the culprit. That is when Python memory profilers come in.
Let’s explore further.
Profiling an application always involves resources such as CPU and memory. However, Python applications are especially prone to memory management issues. This is primarily because Python is widely used for Data Science and ML applications, which work with vast amounts of data. Also, Python manages memory itself by default instead of leaving it to the user.
When Python code runs in containers under a distributed processing framework, each container has a fixed amount of memory. If the code’s execution exceeds the memory limit, the container is terminated. This is when developers encounter memory errors.
However, that is not always the case. There are instances where developers don’t know what’s going on. Maybe an object is holding on to a reference when it’s not supposed to, and memory builds up over time. Once usage reaches its peak, memory problems occur.
The quick-fix solution is to increase the memory allocation. However, this is not practical, because it can waste resources. It may also jeopardize the stability of the application due to unpredictable memory spikes.
Hence, we need the help of Python memory profilers. Their purpose is to find memory leaks and optimize memory usage in your Python applications. They also help you understand the space efficiency of your code and the packages it uses.
Although Python manages memory automatically, developers still need these tools, because long-running Python jobs can consume a lot of memory. In most cases, these jobs will not return the memory to the operating system until the process ends, even if garbage collection runs properly.
Here is a list of known Python memory profilers:
Jean Brouwers, Ludwig Haehne, and Robert Schuppenies built Pympler in August 2008. They introduced the process of pympling, wherein Pympler obtains details of the size and the lifetime of Python objects.
Pympler’s Python memory profiler analyzes the memory behavior of Python objects inside a running application. It provides a complete, stand-alone Python memory profiling solution. It can also point out possible errors in runtime behavior, such as memory bloat and other “pymples.”
Pympler has three separate modules: asizeof, muppy, and the Class Tracker.
First, let’s use asizeof to investigate how much memory certain Python objects consume.
>>> from pympler import asizeof
>>> obj = ['i', 3, 2, 1, (6, 4)]
>>> print(asizeof.asized(obj, detail=1).format())
['i', 3, 2, 1, (6, 4)] size=192 flat=48
    (6, 4) size=64 flat=32
    'i' size=32 flat=32
    3 size=16 flat=16
    2 size=16 flat=16
    1 size=16 flat=16
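In this output, flat is the size of the object itself, while size also counts the objects it references. As a rough standard-library approximation (not Pympler’s actual algorithm), `sys.getsizeof` gives a flat size and `gc.get_referents` lists what a container points to. The `deep_sizeof` helper below is hypothetical and assumes the input is a container of built-in values. Its numbers will not match Pympler’s exactly, because Pympler applies its own accounting rules.

```python
import gc
import sys


def deep_sizeof(obj, _seen=None):
    """Approximate a deep size by summing flat sizes over everything obj references.

    Assumption: obj is a container of built-in values. Types, modules, and
    functions are not expected, so the walk does not guard against them.
    """
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:  # count shared objects only once
        return 0
    _seen.add(id(obj))
    total = sys.getsizeof(obj)  # flat size of this object alone
    for ref in gc.get_referents(obj):  # objects this container points to
        total += deep_sizeof(ref, _seen)
    return total


obj = ['i', 3, 2, 1, (6, 4)]
print("flat:", sys.getsizeof(obj), "deep:", deep_sizeof(obj))
```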
Second, let’s use the muppy module:
>>> from pympler import muppy, summary
>>> all_objects = muppy.get_objects()
>>> sum1 = summary.summarize(all_objects)
>>> summary.print_(sum1)
                       types |   # objects |   total size
============================ | =========== | ============
                         str |       13262 |      1.01 MB
                        dict |        2120 |    659.99 KB
                        code |        3362 |    343.73 KB
                        list |        2587 |    247.46 KB
                        type |         639 |    242.61 KB
                       tuple |        2069 |     58.83 KB
                         set |          86 |     44.57 KB
          wrapper_descriptor |        1247 |     43.84 KB
  builtin_function_or_method |        1014 |     35.65 KB
           method_descriptor |         937 |     32.94 KB
                 abc.ABCMeta |          66 |     32.38 KB
                     weakref |         818 |     28.76 KB
                         int |        1612 |     24.72 KB
           getset_descriptor |         555 |     17.34 KB
                   frozenset |          90 |     16.87 KB
The muppy module lets you view all Python objects on the heap. You can take another summary later and compare the two to check whether some objects are leaking memory. Learn more about the muppy module in the Pympler documentation.
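The compare-two-summaries idea can be sketched with the standard library alone, assuming you only need object counts per type. This is not muppy: `gc.get_objects()` returns only objects tracked by the garbage collector, so untracked objects such as most ints and strings are missing from the counts. The `Session` class and its 500 leaked instances are hypothetical.

```python
import gc
from collections import Counter


def type_summary():
    """Count gc-tracked objects by type name, a rough stand-in for a muppy summary."""
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())


class Session:  # hypothetical class whose instances we suspect are leaking
    pass


before = type_summary()
leaked = [Session() for _ in range(500)]  # kept alive on purpose to simulate a leak
after = type_summary()

growth = after - before  # Counter subtraction keeps only types whose count went up
for name, count in growth.most_common(5):
    print(f"{name:<20} +{count}")
```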
The third module in the Pympler profiler is the Class Tracker. It tracks the lifetime of objects of certain classes. Thus, it provides insight into instantiation patterns and helps developers understand how specific objects contribute to the memory footprint in the long run.
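>>> from pympler import classtracker
>>> tr = classtracker.ClassTracker()
>>> tr.track_class(Document)  # Document: placeholder for your own class
>>> tr.create_snapshot(description='Snapshot 1')
>>> doc = create_document()  # create_document(): placeholder for code that builds a Document
>>> tr.create_snapshot(description='Snapshot 2')
>>> tr.stats.print_summary()
---- SUMMARY ------------------------------------------------------------------
Snapshot 1                               active      0     B      average   pct
Snapshot 2                               active      0     B      average   pct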
To learn more about Class Tracker, see the Pympler documentation.
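To show the idea behind lifetime tracking without installing Pympler, the sketch below counts live instances at labeled snapshots using `weakref.WeakSet`. `InstanceTracker`, its `track()` and `snapshot()` methods, and the `Document` class are all hypothetical names, not Pympler’s API. The immediate drop to zero after `del` relies on CPython’s reference counting.

```python
import weakref


class InstanceTracker:
    """Hypothetical helper: count live instances at labeled snapshots via weak references."""

    def __init__(self):
        self._live = weakref.WeakSet()  # entries vanish when the object is freed
        self.snapshots = []

    def track(self, obj):
        self._live.add(obj)
        return obj

    def snapshot(self, description):
        self.snapshots.append((description, len(self._live)))


class Document:  # hypothetical tracked class
    def __init__(self, text):
        self.text = text


tracker = InstanceTracker()
tracker.snapshot("Snapshot 1")
docs = [tracker.track(Document(f"page {i}")) for i in range(3)]
tracker.snapshot("Snapshot 2")
del docs  # drop the only strong references to the documents
tracker.snapshot("Snapshot 3")
print(tracker.snapshots)  # [('Snapshot 1', 0), ('Snapshot 2', 3), ('Snapshot 3', 0)]
```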
Guppy3 (also known as Heapy) is a Python programming environment and a heap analysis toolset. It is a package that contains the following sub-packages:
Guppy3 is a fork of Guppy-PE, which Sverker Nilsson built for Python 2.
Note: Using this Python memory profiler requires Python 3.5, 3.6, 3.7, or 3.8. The package works with CPython only, so PyPy and other Python implementations are not supported. To use the graphical browser, it needs Tkinter. Threading must also be available to use the remote monitor.
Here is how to take advantage of this Python memory profiler: take a snapshot of the heap before and after a critical process, then compare the total memory and pinpoint possible memory spikes within common objects.
>>> from guppy import hpy
>>> h = hpy()
>>> h.heap()
Partition of a set of 34090 objects. Total size = 2366226 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     6    448   1   130964   6    1872193  79 dict of type
     7     94   0    83532   4    1955725  83 dict of module
     8    242   1    56524   2    2012249  85 dict (no owner)
<118 more rows. Type e.g. '_.more' to view.>
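The same before-and-after workflow can be sketched with the standard library’s `tracemalloc` module, swapped in here instead of Guppy3 so the example needs no third-party package. `critical_process()` is a hypothetical placeholder for the code under investigation, and `compare_to()` groups the differences by source line.

```python
import tracemalloc


def critical_process():  # hypothetical workload standing in for your own code
    return [bytes(1000) for _ in range(1000)]


tracemalloc.start()
before = tracemalloc.take_snapshot()
result = critical_process()
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group the differences by source line; the largest growth is listed first.
stats = after.compare_to(before, "lineno")
for stat in stats[:3]:
    print(stat)

net_growth = sum(stat.size_diff for stat in stats)
print(f"net growth: {net_growth / 1024:.1f} KiB")
```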
Memory Profiler is a pure Python module that uses the psutil module. It monitors the memory consumption of a Python job process. It can also perform a line-by-line analysis of the application’s memory consumption.
The line-by-line memory usage mode works in the same way as line_profiler.
In the following example, let’s profile a simple function called my_func, which builds a list of integers in a loop.
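@profile
def my_func():
    a = []
    for i in range(1000):
        a.append(i)

my_func()  # run with: python -m memory_profiler example.py (example file name)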
Line #    Mem usage    Increment   Line Contents
================================================
     1   13.859 MiB   13.859 MiB   @profile
     2                             def my_func():
     3   13.859 MiB    0.000 MiB       a = []
     4   13.859 MiB    0.000 MiB       for i in range(1000):
     5   13.859 MiB    0.000 MiB           a.append(i)
The first column (Line #) is the line number of the profiled code. The second column (Mem usage) is the memory usage of the Python interpreter after that line executes. The third column (Increment) is the difference in memory between the current line and the previous one. The last column (Line Contents) displays the profiled code.
To see how this Python memory profiler works, let’s change the range value in the function above to 1000000 and run it again. Here is the output:
Line #    Mem usage    Increment   Line Contents
================================================
     1   13.844 MiB   13.844 MiB   @profile
     2                             def my_func():
     3   13.844 MiB    0.000 MiB       a = []
     4   33.387 MiB    0.016 MiB       for i in range(1000000):
     5   33.387 MiB    0.293 MiB           a.append(i)
Lines 4 and 5 show an increase in memory usage, which demonstrates that this profiler analyzes memory consumption line by line.
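If you cannot install memory_profiler, a function-level approximation can be built on `tracemalloc`. Note the difference: memory_profiler measures the process’s memory usage through psutil, while `tracemalloc` counts only allocations made through Python’s allocators, so the numbers will not match the tables above. The `traced_profile` decorator is a hypothetical helper, and `my_func` takes `n` as a parameter here only so the size is easy to change.

```python
import functools
import tracemalloc


def traced_profile(func):
    """Hypothetical decorator: report net and peak traced allocations for one call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            result = func(*args, **kwargs)
            current, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()  # stop tracing even if func raises
        wrapper.last = (current, peak)
        print(f"{func.__name__}: net={current / 2**20:.3f} MiB, peak={peak / 2**20:.3f} MiB")
        return result

    return wrapper


@traced_profile
def my_func(n):
    a = []
    for i in range(n):
        a.append(i)
    return a


my_func(100_000)
```

Unlike memory_profiler, this reports one pair of numbers per call rather than one row per line.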
Fil profiler is an open-source Python memory profiler. It is suitable for data processing and scientific computing applications. Currently, it is still in the development stage and runs on Linux and macOS only.
Many data scientists and Python developers run into memory problems in their data pipelines. When a pipeline uses too much memory, it is difficult to pinpoint where exactly all the memory is going.
Consider these two scenarios:
Because servers run non-stop, memory leaks are a common cause of performance failures. Developers often neglect small memory leaks, because most servers process small amounts of data at a time. However, a small leak per call adds up over tens of thousands of calls, and over time this can create severe production issues.
When processing large chunks of data, spikes in memory usage pose a serious threat to data pipelines. For example, your application might use 1 GB of RAM for quite some time and then suddenly need 16 GB. In that case, you need to identify what causes the sudden memory spike.
That is Fil’s main goal: to diagnose memory usage spikes, regardless of the amount of data being processed. It pinpoints where exactly peak memory usage occurs and which code is responsible for that spike.
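The difference between peak and final memory, which is what Fil focuses on, can be illustrated with `tracemalloc`, which records both the current and the peak traced size. This is a stand-in for illustration, not how Fil works: Fil also reports which code is responsible for the peak. `process_batch()` is a hypothetical pipeline step whose large temporary list is freed before the function returns, so the current figure ends up far below the peak.

```python
import tracemalloc


def process_batch(n):  # hypothetical pipeline step
    rows = [str(i) for i in range(n)]  # large temporary list, alive only inside this call
    return sum(len(row) for row in rows)  # rows is freed when the function returns


tracemalloc.start()
result = process_batch(200_000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"result={result}, current={current / 2**20:.2f} MiB, peak={peak / 2**20:.2f} MiB")
```

Measuring memory only after the step finishes would miss this spike entirely.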
Although existing Python memory profilers measure memory usage, they have limitations. One of them is dealing with vast amounts of data in batch processing. Many Python applications are batch processing applications: they constantly read data, process it, and output the result.
That problem is answered by our next profiler.
With a highly dynamic language like Python, many developers run into memory issues only after deployment. This makes it hard to tell what is happening to memory usage. Developers try to optimize, but they lack the right tools.
Blackfire is a proprietary Python memory profiler (maybe the first one). It uses Python’s memory manager to trace every memory block allocated by Python, including by C extensions. Blackfire is new to the field and aims to solve memory leak issues such as:
For these use cases, Blackfire states that it has very limited overhead and does not impact end users, because it measures the Python application’s memory consumption at the function call level.
Like tracemalloc, the Blackfire Python memory profiler uses the PyMem_SetAllocator API to trace memory allocations. At present, Blackfire supports Python 3.5 and up. You can visit its site to learn more.
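Blackfire’s internals are proprietary, but the `tracemalloc` side of this comparison can be shown directly. The sketch below assumes nothing beyond the standard library. It records up to 10 frames per allocation and groups live memory blocks by full call stack. `load_rows()` is a hypothetical function.

```python
import tracemalloc


def load_rows(n):  # hypothetical function that allocates many small objects
    return [{"id": i, "name": f"row-{i}"} for i in range(n)]


tracemalloc.start(10)  # store up to 10 frames for each allocation's traceback
rows = load_rows(20_000)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group live allocations by full call stack and show the largest group.
top = snapshot.statistics("traceback")[0]
print(f"{top.count} memory blocks, {top.size / 1024:.1f} KiB")
for line in top.traceback.format():
    print(line)
```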
If you work with Python, you have probably noticed that it doesn’t always release memory back to the operating system immediately. One remedy is to run the code in a separate process, so that its memory is released when that process finishes. A related approach is the “small test case”: a minimal script that runs only the code suspected of leaking memory.
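Here is a minimal sketch of running a memory-heavy step in a separate process with `multiprocessing`. `memory_heavy_task()` and `run_isolated()` are hypothetical names, and the sketch assumes a Unix-like OS where the `fork` start method is available. On Windows, only the default `spawn` method exists.

```python
import multiprocessing as mp


def memory_heavy_task(n, results):  # hypothetical workload under investigation
    chunks = [bytes(1024) for _ in range(n)]  # roughly n KiB, allocated only in the child
    results.put(sum(len(chunk) for chunk in chunks))


def run_isolated(n):
    """Run the task in a child process; its memory goes back to the OS when the child exits."""
    ctx = mp.get_context("fork")  # assumption: a Unix-like OS where "fork" is available
    results = ctx.Queue()
    proc = ctx.Process(target=memory_heavy_task, args=(n, results))
    proc.start()
    total = results.get()  # read before join() so a full queue cannot block the child
    proc.join()
    return total, proc.exitcode


if __name__ == "__main__":
    print(run_isolated(10_000))  # (10240000, 0)
```

Only the small result crosses the process boundary. The large list lives and dies in the child.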
When dealing with large amounts of data, profile a randomly sampled subset of the data. Also, run memory-intensive tasks in separate processes, and use debuggers to inspect references to objects. However, keep in mind that any objects you create or reference manually from a breakpoint debugger such as pdb will remain in the memory profile. This can create a false impression of a memory leak, because those objects are not released on time. Finally, look into the packages you depend on, since some Python libraries can leak memory themselves.
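When hunting for whatever keeps an object alive, `gc.get_referrers()` can be called from a debugger prompt or a script. Keep the pdb caveat above in mind. Also note that this call scans every tracked object, so it is slow on large heaps. `Payload`, `registry`, and `find_holders()` are hypothetical names used for illustration.

```python
import gc


class Payload:  # hypothetical object we expected to be freed
    pass


registry = {}  # hypothetical container that unexpectedly keeps objects alive


def find_holders(obj):
    """Return gc-tracked containers that directly reference obj."""
    gc.collect()
    return [ref for ref in gc.get_referrers(obj) if isinstance(ref, (dict, list, set, tuple))]


item = Payload()
registry["latest"] = item
for holder in find_holders(item):
    print(type(holder).__name__, "is registry:", holder is registry)
```

In a script like this one, the module’s globals dictionary also shows up, because `item` is a global variable.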
By now, you know how Python memory profilers work and the common memory problems in Python. But tools like Retrace, with centralized logging, error tracking, and code profiling, can help you diagnose Python issues on a larger scale. Retrace from Stackify will help you deal with all kinds of performance pitfalls and keep your code running well.
Start your 14-day FREE Retrace trial today!
If you would like to be a guest contributor to the Stackify blog, please reach out to [email protected]