Python Memory Profilers

Top 5 Python Memory Profilers

Iryne Somera Developer Tips, Tricks & Resources

According to the 2019 Stack Overflow Developer Survey, the Python programming language earned a 73.1% approval rating among developers. It ranks second only to Rust and continues to dominate in Data Science and Machine Learning (ML).

Python is a developer favorite. It is a high-level language known for its robustness and its core philosophy: simplicity over complexity. However, the performance of Python applications is another story. Like any other application, a Python application has its share of performance issues.

Most of the time, APM tools such as Retrace can help solve application performance issues. But what if your Python application has been running for four hours and the server runs out of memory? That is a specific problem involving memory resources.

The likely cause is a memory leak, and developers need to find the culprit. That is where Python memory profilers come in.

Let’s explore further. 



What are Python memory profilers?

Profiling an application usually means looking at resources such as CPU and memory. Python applications, however, are particularly prone to memory management issues. This is primarily because Python is widely used for Data Science and ML applications that work with vast amounts of data. Also, Python relies on its own memory management system by default instead of leaving memory management to the user.

When Python code runs in containers under a distributed processing framework, each container has a fixed amount of memory. If the code exceeds the memory limit, the container is terminated. This is when developers run into memory errors.

However, that is not always the case. There are instances where developers don’t know what’s going on. Maybe an object is being held by a reference that should have been dropped, and such objects build up over time. Once memory usage reaches the limit, memory problems occur.

The quick fix is to increase the memory allocation. However, this is not practical because it may waste resources. It may also jeopardize the stability of the application because of unpredictable memory spikes.

Hence, we need the help of Python memory profilers. Their purpose is to find memory leaks and optimize memory usage in your Python applications. These profilers help you understand the space efficiency of your code and of the packages it uses.

Top Python Memory Profilers

Although Python manages memory automatically, it still needs profiling tools because long-running Python jobs can consume a lot of memory. In most cases, these jobs will not return the memory to the operating system until the process ends, even if garbage collection runs properly.

Here is a list of known Python memory profilers:

Pympler

Jean Brouwers, Ludwig Haehne, and Robert Schuppenies built Pympler in August 2008. They introduced the process of pympling, wherein Pympler obtains details of the size and the lifetime of Python objects.

Pympler’s Python memory profiler analyzes the memory behavior of Python objects inside a running application. It provides a complete, stand-alone Python memory profiling solution. It also helps identify possible problems in runtime behavior, such as memory bloat and other “pymples.”

There are three separate modules inside Pympler. 

  • The asizeof module provides the Python object’s size information.
  • The muppy module caters to the on-line monitoring of a Python application.
  • The Class Tracker module provides off-line analysis of the lifetime of selected Python objects.

First, let’s use asizeof to investigate how much memory certain Python objects consume. 

>>> from pympler import asizeof
>>> obj = ['i', 3, 2, 1, (6, 4)]
>>> asizeof.asizeof(obj)
192
>>> print(asizeof.asized(obj, detail=1).format())
['i', 3, 2, 1, (6, 4)] size=192 flat=48
    (6, 4) size=64 flat=32
    'i' size=32 flat=32
    3 size=16 flat=16
    2 size=16 flat=16
    1 size=16 flat=16
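
In the output above, "flat" is the size of the object itself and "size" also includes everything it refers to. The following sketch illustrates that distinction with the standard library only. It is not how Pympler computes sizes; `deep_size` is a hypothetical helper, it handles only common built-in containers, and its byte counts will differ from asizeof's.

```python
import sys
from collections.abc import Mapping


def deep_size(obj, _seen=None):
    """Approximate size of obj plus the objects it contains, counted once each."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:  # already counted (shared or cyclic reference)
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)  # the "flat" (shallow) size
    if isinstance(obj, (str, bytes, bytearray)):
        return size
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            size += deep_size(key, seen) + deep_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_size(item, seen)
    return size


obj = ['i', 3, 2, 1, (6, 4)]
flat = sys.getsizeof(obj)  # the list object alone
total = deep_size(obj)     # the list plus its elements
print(f"flat={flat} total={total}")
```

The `seen` set ensures that an object referenced twice is counted only once, which is also why total sizes are not simply the sum of per-element sizes.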

Second, let’s implement the muppy module: 

>>> from pympler import muppy
>>> allObjects = muppy.get_objects()
>>> len(allObjects)
36189
>>> from pympler import summary
>>> sum1 = summary.summarize(allObjects)  # named sum1 to avoid shadowing the built-in sum
>>> summary.print_(sum1)
types                      | # objects | total size
========================== | ========= | ==========
str                        |     13262 |    1.01 MB
dict                       |      2120 |  659.99 KB
code                       |      3362 |  343.73 KB
list                       |      2587 |  247.46 KB
type                       |       639 |  242.61 KB
tuple                      |      2069 |   58.83 KB
set                        |        86 |   44.57 KB
wrapper_descriptor         |      1247 |   43.84 KB
builtin_function_or_method |      1014 |   35.65 KB
method_descriptor          |       937 |   32.94 KB
abc.ABCMeta                |        66 |   32.38 KB
weakref                    |       818 |   28.76 KB
int                        |      1612 |   24.72 KB
getset_descriptor          |       555 |   17.34 KB
frozenset                  |        90 |   16.87 KB

Here, you can view all Python objects on the heap using the muppy module. You can create another summary later and compare the two to check whether certain object types keep growing, which points to a memory leak. See the Pympler documentation to learn more about the muppy module.
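
The idea of comparing two summaries can be sketched with the standard library alone. This is a rough stand-in rather than muppy itself: `gc.get_objects()` only sees objects tracked by the garbage collector (containers and class instances, not plain ints or strings), and `type_counts` and `Record` are hypothetical names used for illustration.

```python
import gc
from collections import Counter


def type_counts():
    """Count GC-tracked objects by type name (a crude heap summary)."""
    return Counter(type(o).__name__ for o in gc.get_objects())


class Record:
    """Hypothetical class whose instances we suspect are piling up."""


before = type_counts()
leaked = [Record() for _ in range(1000)]  # simulate objects that accumulate
after = type_counts()

# Counter subtraction keeps only the types whose count increased.
growth = after - before
for name, count in growth.most_common(3):
    print(f"{name:<20} +{count}")
```

A type that keeps climbing between summaries taken at the same point of a repeated workload is a good candidate for closer inspection.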

The third module in the Pympler profiler is the Class Tracker. It tracks the lifetime of objects of certain classes. Thus, it provides insight into instantiation patterns and helps developers understand how specific objects contribute to the memory footprint in the long run.

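>>> from pympler import classtracker
>>> # Document and create_document() stand in for your own class and factory
>>> tr = classtracker.ClassTracker()
>>> tr.track_class(Document)
>>> tr.create_snapshot(description='Snapshot 1')
>>> doc = create_document()
>>> tr.create_snapshot(description='Snapshot 2')
>>> tr.stats.print_summary()
---- SUMMARY ------------------------------------------------------------------
Snapshot 1                               active      0     B      average   pct
Snapshot 2                               active      0     B      average   pct
-------------------------------------------------------------------------------

See the Pympler documentation to learn more about Class Tracker.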

Guppy3

Guppy3 (also known as Heapy) is a Python programming environment and a heap analysis toolset. It is a package that contains the following sub-packages:

  • etc – This is a support module that has the Glue protocol module.
  • gsl – The subpackage that contains the Guppy Specification Language implementation. It creates documents and tests from a common source.
  • heapy – The heap analysis toolset provides object information about the heap and displays the information.
  • sets – This contains Bitsets and nodesets.

Guppy3 is a fork of Guppy-PE, which Sverker Nilsson built for Python 2.

Note: Using this Python memory profiler requires Python 3.5, 3.6, 3.7, or 3.8. This package works with CPython only, so PyPy and other Python implementations are not supported. The graphical browser requires Tkinter, and threading must be available to use the remote monitor.

Here is how to take advantage of this Python memory profiler. You can take a snapshot of the heap before and after a critical process. Then compare the total memory and pinpoint possible memory spikes involved within common objects.

>>> from guppy import hpy
>>> h = hpy()
>>> h.heap()
Partition of a set of 34090 objects. Total size = 2366226 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  10279  30   666873  28     666873  28 str
     1   4697  14   259576  11     926449  39 bytes
     2   2413   7   251684  11    1178133  50 types.CodeType
     3   6825  20   238084  10    1416217  60 tuple
     4    448   1   174868   7    1591085  67 type
     5   2208   6   150144   6    1741229  74 function
     6    448   1   130964   6    1872193  79 dict of type
     7     94   0    83532   4    1955725  83 dict of module
     8    242   1    56524   2    2012249  85 dict (no owner)
     9   1133   3    40788   2    2053037  87 types.WrapperDescriptorType
<118 more rows. Type e.g. '_.more' to view.>

Memory Profiler

Memory Profiler is a pure Python module that depends on the psutil module. It monitors the memory consumption of a Python process and can also perform a line-by-line analysis of an application’s memory consumption.

The line-by-line memory usage mode works in the same way as line_profiler:

  1. Decorate the function you would like to profile with the @profile decorator.
  2. Run the script through the profiler by passing special arguments to the Python interpreter, for example python -m memory_profiler example.py, where example.py is your script.

In the following example, let’s have a simple function called my_func. This function creates a list with a specified range. 

@profile
def my_func():
    a=[]
    for i in range(1000):
        a.append(i)
my_func()

This outputs:

Line #    Mem usage    Increment   Line Contents
     1   13.859 MiB   13.859 MiB   @profile
     2                             def my_func():
     3   13.859 MiB    0.000 MiB       a=[]
     4   13.859 MiB    0.000 MiB       for i in range(1000):
     5   13.859 MiB    0.000 MiB           a.append(i)

The first column is the line number of the profiled code. The second column (Mem usage) is the memory usage of the Python interpreter after that line has executed. The third column (Increment) is the difference in memory between the current line and the previous one. The last column (Line Contents) displays the profiled code.

To see how this Python memory profiler works, let’s change the range value to 1000000 in the function above and execute it. Here is the output:

Line #    Mem usage    Increment   Line Contents
     1   13.844 MiB   13.844 MiB   @profile
     2                             def my_func():
     3   13.844 MiB    0.000 MiB       a=[]
     4   33.387 MiB    0.016 MiB       for i in range(1000000):
     5   33.387 MiB    0.293 MiB           a.append(i)

Lines 4 and 5 show an increase in memory usage, which demonstrates that this profiler performs a line-by-line analysis of memory consumption.

Fil 

Fil profiler is an open-source Python memory profiler. It is suitable for data processing and scientific computing applications. Currently, it is still in the development stage and runs on Linux and macOS only. 

Most Data Scientists and Python developers face memory problems with Python data pipelines. When a pipeline uses too much memory, it is difficult to pinpoint exactly where all the memory is going.

For example, let’s cite two scenarios:

Servers

Because servers run non-stop, memory leaks are often the cause of performance failures. Developers tend to ignore small leaks because most servers process small amounts of data at a time. However, these small leaks add up over tens of thousands of calls and can create severe production issues over time.

Data pipelines 

When processing large chunks of data, spikes in memory usage pose a huge threat to data pipelines. For example, your application might use 1GB of RAM for quite some time and then suddenly need 16GB. There is a great need to identify what causes such sudden memory spikes.

That is Fil’s main goal: to diagnose memory usage spikes, regardless of the amount of data being processed. It pinpoints exactly where peak memory usage occurs and which code is responsible for that spike.

Although existing Python memory profilers measure memory usage, they have limitations. One of them is dealing with vast amounts of data, as in batch processing. Python applications are mostly batch processing applications that read data, process it, and output the result.
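
To illustrate why peak usage matters more than final usage, here is a small sketch that uses the standard library's tracemalloc module rather than Fil itself. `pipeline_step` is a hypothetical function made up for this example; it builds a large temporary list and returns only a small result.

```python
import tracemalloc


def pipeline_step(n):
    """Hypothetical step: builds a large temporary, returns a small result."""
    temp = list(range(n))  # large, short-lived allocation
    return sum(temp)       # the temporary is freed when the function returns


tracemalloc.start()
result = pipeline_step(200_000)
current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
tracemalloc.stop()

print(f"current={current / 1024:.0f} KiB  peak={peak / 1024:.0f} KiB")
```

Measured after the call, current usage is small, yet the peak shows how much memory the step needed at its worst moment. That peak is what determines whether a container or server runs out of memory.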

That problem is answered by our next profiler.

Blackfire

For a highly dynamic language like Python, most developers experience memory issues during deployment. This leads to some confusion as to what happens to memory usage. Developers tend to perform optimizations but don’t have the right tools to use.

Blackfire is a proprietary Python memory profiler (maybe the first). It uses Python’s memory manager to trace every memory block allocated by Python, including by C extensions. Blackfire is new to the field and aims to solve memory issues such as:

  • large objects in memory which are not released
  • reference cycles
  • invalid reference counting in C extensions causing memory leaks
  • sudden memory spikes
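
Of the issues listed above, reference cycles are easy to reproduce with the standard library. The sketch below is independent of Blackfire; `Node` is a hypothetical class, and automatic collection is switched off temporarily only to make the demonstration deterministic on CPython.

```python
import gc
import weakref


class Node:
    """Hypothetical node that can point at another node."""

    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()  # keep the cycle collector from running on its own
try:
    a, b = Node(), Node()
    a.other, b.other = b, a  # a and b now reference each other
    probe = weakref.ref(a)
    del a, b                 # no names left, but the cycle keeps both alive
    alive_after_del = probe() is not None
    unreachable = gc.collect()  # the cycle collector finds and frees them
    alive_after_collect = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_after_del, unreachable, alive_after_collect)
```

Reference counting alone cannot free the two nodes because each keeps the other's count above zero, so they linger until the cycle collector runs.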

With these use cases, Blackfire assures users that it has a very limited overhead and does not impact end-users because it measures the Python application’s memory consumption at the function call level. 

The Blackfire Python memory profiler uses the PyMem_SetAllocator API to trace memory allocations, as tracemalloc does. At present, Blackfire supports Python 3.5 and up. You can visit its site to learn more.
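
Since tracemalloc ships with Python, it is an easy way to see what allocation tracing looks like in practice. This sketch shows tracemalloc only, not Blackfire's interface; the `retained` list is an invented stand-in for memory that a program keeps holding.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Roughly 1 MB that stays referenced, standing in for a leak.
retained = [bytes(1000) for _ in range(1000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences grouped by source line, largest change first.
top_stats = after.compare_to(before, "lineno")
total_growth = sum(stat.size_diff for stat in top_stats)

for stat in top_stats[:3]:
    print(stat)
print(f"total growth: {total_growth / 1024:.0f} KiB")
```

Each printed entry names a file and line together with the change in allocated bytes, which is usually enough to locate the code that is holding on to memory.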

Profiling with Retrace

If you work with Python, you have probably noticed that it doesn’t immediately release memory back to the operating system. One remedy is to run a piece of code in a separate process so that its memory is released when the process exits. A related, useful approach is the “small test case,” in which you run only the code suspected of leaking memory.
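
One way to run a small test case in its own process is sketched below, assuming Python 3.7 or later. `run_isolated` and `CHILD_CODE` are hypothetical names; the child interpreter measures its own peak traced memory with tracemalloc and prints it.

```python
import subprocess
import sys

# Snippet to run in isolation; it reports its own peak traced memory in bytes.
CHILD_CODE = """
import tracemalloc
tracemalloc.start()
data = [0] * 1_000_000  # the memory-hungry code under test
print(tracemalloc.get_traced_memory()[1])
"""


def run_isolated(code):
    """Run code in a fresh interpreter; its memory goes back to the OS on exit."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with an error
    )
    return completed.stdout.strip()


child_peak = int(run_isolated(CHILD_CODE))
print(f"child peak: {child_peak / 1024 / 1024:.1f} MiB")
```

Because the allocation happens in the child, the parent's memory footprint is unaffected, and everything the child used is reclaimed as soon as it exits.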

When dealing with large amounts of data, use a randomly sampled subset of the data. Also, run memory-intensive tasks in separate processes, and take care when using debuggers, because they add references to objects. With a breakpoint debugger such as pdb, any objects created and referenced manually from the debugger will remain in the memory profile. This gives a false impression of a memory leak, since those objects are not released on time. Additionally, consider looking into packages that can be leaky. Some Python libraries have memory leaks of their own.

By now, you know how Python memory profilers work and which memory problems are common in Python. Tools like Retrace, with centralized logging, error tracking, and code profiling, can help you diagnose Python issues on a larger scale. Retrace from Stackify will help you deal with any kind of performance pitfall and keep your code running well.
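
Drawing a reproducible random subset takes only a few lines. In this sketch, `sample_rows` is a hypothetical helper and the list of integers stands in for a real dataset.

```python
import random


def sample_rows(rows, fraction, seed=0):
    """Return a reproducible random subset holding about `fraction` of rows."""
    rng = random.Random(seed)  # a fixed seed makes profiling runs comparable
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


rows = list(range(100_000))  # stand-in for a large dataset
subset = sample_rows(rows, 0.01)
print(len(subset))
```

Using the same seed for every run means that differences in memory usage between runs come from code changes rather than from a different sample.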


About Iryne Somera

Iryne Somera is a professor in the Department of Computer Engineering. She loves to research and write articles related to computing technology such as Computer Hardware Fundamentals, Management Information Systems, Software Development and Project Management. In her spare time, she loves to experiment in her kitchen with her healthy breakfast ideas.