Stackify is now BMC. Read theBlog

Python Performance Tuning: 20 Simple Tips

By: Kevin Cunningham
  |  July 26, 2019
Python Performance Tuning: 20 Simple Tips

Python is a powerful and versatile higher-order programming language. Whether you’re developing a web application or working with machine learning, this language has you covered. Python does well at optimizing developer productivity. You can quickly create a program that solves a business problem or fills a practical need. However, the solutions you reach when developing quickly aren’t always optimized for python performance.

When you’re trying to shave seconds—or even minutes—from execution time, it’s good to get a reminder of strategies that might help. The strategies on this list can help you make your applications as fast as possible.

Some of the things on this list might be obvious to you, but others may be less so. Some will have a big impact on execution, and others will have smaller, more subtle effects. Check out this list, and consider bookmarking this page for future reference.

Highway

1. Use list comprehensions.

When you’re working in Python, loops are common. You’ve probably come across list comprehensions before. They’re a concise and speedy way to create new lists.

For example, let’s say you wanted to find the cubes of all the odd numbers in a given range. Using a for loop, that task might look like this:

cube_numbers = []
  for n in range(0,10):
    if n % 2 == 1:
      cube_numbers.append(n**3)

In contrast, a list comprehension approach would just be one line:

cube_numbers = [n**3 for n in range(1,10) if n%2 == 1]

The list comprehension approach is shorter and more concise, of course. More important, it’s notably faster when running in code. As with all these tips, in small code bases that have small ranges, using this approach may not make much of a difference. But in other situations, it may make all the difference when you’re trying to save some time.

2. Remember the built-In functions.

Python comes with a lot of batteries included. You can write high-quality, efficient code, but it’s hard to beat the underlying libraries. These have been optimized and are tested rigorously (like your code, no doubt). Read the list of the built-ins, and check if you’re duplicating any of this functionality in your code.

3. Use xrange() instead of range().

Python 2 used the functions range() and xrange() to iterate over loops. The first of these functions stored all the numbers in the range in memory and got linearly large as the range did. The second, xrange(), returned the generator object. When looping with this object, the numbers are in memory only on demand.

import sys
numbers = range(1, 1000000)
print(sys.getsizeof(numbers))

This returns 8000064, whereas the same range of numbers with xrange returns 40. If your application is in Python 2, then swapping these functions can have a big impact on memory usage. The good news is that Python 3 implements the xrange() functionality by default. So, while there’s no xrange() function, the range() function already acts like this.

4. Consider writing your own generator.

The previous tip hints at a general pattern for optimization—namely, that it’s better to use generators where possible. These allow you to return an item at a time rather than all the items at once. As mentioned, the xrange() function is a generator in Python 2, as is the range() function in Python 3.

If you’re working with lists, consider writing your own generator to take advantage of this lazy loading and memory efficiency. Generators are particularly useful when reading a large number of large files. It’s possible to process single chunks without worrying about the size of the files.

Here’s an example you might use when web scraping and crawling recursively.

import requests
import re

def get_pages(link):
  pages_to_visit = []
  pages_to_visit.append(link)
  pattern = re.compile('https?')
  while pages_to_visit:
    current_page = pages_to_visit.pop(0)
    page = requests.get(current_page)
    for url in re.findall('<a href="([^"]+)">', str(page.content)):
      if url[0] == '/':
        url = current_page + url[1:]
      if pattern.match(url):
        pages_to_visit.append(url)
    yield current_page
webpage = get_pages('http://www.example.com')
for result in webpage:
  print(result)

This example simply returns a page at a time and performs an action of some sort. In this case, you’re printing the link. Without a generator, you’d need to fetch and process at the same time or gather all the links before you started processing. This code is cleaner, faster, and easier to test.

5. Use “in” if possible.

To check if membership of a list, it’s generally faster to use the “in” keyword.

for name in member_list:
  print('{} is a member'.format(name))

6. Be lazy with your module importing.

When you started learning Python, you probably got advice to import all the modules you’re using at the start of your program. Maybe you still sort these alphabetically.

This approach makes it easier to keep track of what dependencies your program has. However, the disadvantage is that all your imports load at startup.

Why not try a different approach? You can load the modules only when you need them. This technique helps distribute the loading time for modules more evenly, which may reduce peaks of memory usage.

Laptop

7. Use sets and unions.

I’ve mentioned loops a few times in this list already. Most experts agree that too much looping puts unnecessary strain on your server. It’s rarely the most efficient approach.

Say you wanted to get the overlapping values in two lists. You could do this using nested for loops, like this:

a = [1,2,3,4,5]
b = [2,3,4,5,6]

overlaps = []
for x in a:
  for y in b:
    if x==y:
      overlaps.append(x)

print(overlaps)

This will print the list [2, 3, 4, 5]. The number of comparisons here will get very large, very quickly.

Another approach would be:

a = [1,2,3,4,5]
b = [2,3,4,5,6]

overlaps = set(a) & set(b)

print(overlaps)

This will print the dictionary {2, 3, 4, 5}. You’re leaning on the built-in functions and getting a big speed and memory bump as a result.

8. Remember to use multiple assignment.

Python has an elegant way to assign the values of multiple variables.

first_name, last_name, city = "Kevin", "Cunningham", "Brighton"

You can use this method to swap the values of variables.

x, y = y, x

This approach is much quicker and cleaner than:

temp = x 
x = y
y = temp

9. Avoid global variables.

Using few global variables is an effective design pattern because it helps you keep track of scope and unnecessary memory usage. Also, Python is faster retrieving a local variable than a global one. So, avoid that global keyword as much as you can.

Python Code

10. Use join() to concatenate strings.

In Python, you can concatenate strings using “+”. However, strings in Python are immutable, and the “+” operation involves creating a new string and copying the old content at each step. A more efficient approach would be to use the array module to modify the individual characters and then use the join() function to re-create your final string.

new = "This" + "is" + "going" + "to" + "require" + "a" + "new" + "string" + "for" + "every" + "word"
print(new)

That code will print:

Thisisgoingtorequireanewstringforeveryword

On the other hand, this code:

new = " ".join(["This", "will", "only", "create", "one", "string", "and", "we", "can", "add", "spaces."])
print(new)

will print:

This will only create one string and we can add spaces.

This is cleaner, more elegant, and faster.

11. Keep up-to-date on the latest Python releases.

The Python maintainers are passionate about continually making the language faster and more robust. In general, each new release of the language has improved python performance and security. Just be sure that the libraries you want to use are compatible with the newest version before you make the leap.

12. Use “while 1” for an infinite loop.

If you’re listening on a socket, then you’ll probably want to use an infinite loop. The normal route to achieve this is to use while True. This works, but you can achieve the same effect slightly faster by using while 1. This is a single jump operation, as it is a numerical comparison.

13. Try another way.

Once you’ve used a coding approach in your application, it can be easy to rely on that method again and again. However, experimenting can allow you to see which techniques are better. Not only will this keep you learning and thinking about the code you write, but it can also encourage you to be more innovative. Think about how you can creatively apply new coding techniques to get faster results in your application.

14. Exit early.

Try to leave a function as soon as you know it can do no more meaningful work. Doing this reduces the indentation of your program and makes it more readable. It also allows you to avoid nested if statements.

if positive_case:
  if particular_example: 
    do_something
else:
  raise exception

You can test the input in a few ways before carrying out your actions. Another approach is to raise the exception early and to carry out the main action in the else part of the loop.

if not positive_case:
  raise exception
if not particular_example:
  raise exception
do_something 

Now you can see what this block of code is trying to achieve at first glance. You don’t need to follow the chain of logic in the conditionals. Also, you can clearly see when this function would raise an exception.

15. Learn itertools.

It’s been called a gem. If you haven’t heard of it, then you’re missing out on a great part of the Python standard library. You can use the functions in itertools to create code that’s fast, memory efficient, and elegant.

Dive into the documentation, and look for tutorials to get the most out of this library. One example is the permutations function. Let’s say you wanted to generate all the permutations of [“Alice”, “Bob”, “Carol”].

import itertools
iter = itertools.permutations(["Alice", "Bob", "Carol"])
list(iter)

This function will return all possible permutations:

[('Alice', 'Bob', 'Carol'),
 ('Alice', 'Carol', 'Bob'),
 ('Bob', 'Alice', 'Carol'),
 ('Bob', 'Carol', 'Alice'),
 ('Carol', 'Alice', 'Bob'),
 ('Carol', 'Bob', 'Alice')]

It’s really useful and blazingly fast!

16. Try decorator caching.

Memoization is a specific type of caching that optimizes software running speeds. Basically, a cache stores the results of an operation for later use. The results could be rendered web pages or the results of complex calculations.

You can try this yourself with calculating the 100th Fibonacci number. If you haven’t come across these numbers, each one is the sum of the previous two numbers. Fibonacci was an Italian mathematician who discovered that these numbers cropped up in lots of places. From the number of petals on a flower to legs on insects or branches on a tree, these numbers are common in nature. The first few are 1, 1, 2, 3, 5.

One algorithm to calculate these is:

def fibonacci(n):
  if n == 0: # There is no 0'th number
    return 0
  elif n == 1: # We define the first number as 1
    return 1
  return fibonacci(n - 1) + fibonacci(n-2)

When I used this algorithm to find the 36th Fibonacci number, fibonacci(36), my computer sounded like it was going to take off! The calculation took five seconds, and (in case you’re curious) the answer was 14,930,352.

When you introduce caching from the standard library, however, things change. It takes only a few lines of code.

import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
  if n == 0:
    return 0
  elif n == 1:
    return 1
  return fibonacci(n - 1) + fibonacci(n-2)

In Python, a decorator function takes another function and extends its functionality. We denote these functions with the @ symbol. In the example above, I’ve used the decorator functools.lru_cache function provided by the functools module. I’ve passed the maximum number of items to store in my cache at the same time as an argument. There are other forms of decorator caching, including writing your own, but this is quick and built-in. How quick? Well, this time the calculation took 0.7 seconds, and reassuringly, the answer was the same.

Keys

17. Use keys for sorts.

If you search for some examples of sorting, a lot of the code examples you find will work but could be outdated. Often these examples create a custom sort and cost time in the setup and in performing the sort. The best way to sort items is to use keys and the default sort() method whenever possible. I’ve mentioned already that the built-in functions are generally faster, and this is one of those times.

import operator
my_list = [("Josh", "Grobin", "Singer"), ("Marco", "Polo", "General"), ("Ada", "Lovelace", "Scientist")]
my_list.sort(key=operator.itemgetter(0))
my_list

This will sort the list by the first keys:

[('Ada', 'Lovelace', 'Scientist'),
 ('Josh', 'Grobin', 'Singer'),
 ('Marco', 'Polo', 'General')]

You can easily sort by the second key, like so:

my_list.sort(key=operator.itemgetter(1))
my_list

This will return the list below. You can see it’s sorted by the second names.

[('Josh', 'Grobin', 'Singer'),
 ('Ada', 'Lovelace', 'Scientist'),
 ('Marco', 'Polo', 'General')]

In each case, the list is sorted according to the index you select as part of the key argument. This approach works with numbers and strings, and it’s readable and fast.

18. Don’t construct a set for a conditional.

Sometimes you might find yourself wanting to optimize your code with something like this:

if animal in set(animals):

This idea seems to make sense. There might be a lot of animals, and de-duplicating them feels like it might be faster.

if animal in animals:

Even though there may be significantly more animals in the list to check, the interpreter is optimized so much that applying the set function is likely to slow things down. Checking “in” a long list is almost always a faster operation without using the set function.

19. Use linked lists.

The Python list datatype implements as an array. That means adding an element to the start of the list is a costly operation, as every item has to be moved forward. A linked list is a datatype that may come in handy. It differs from arrays, as each item has a link to the next item in the list—hence the name!

An array needs the memory for the list allocated up front. That allocation can be expensive and wasteful, especially if you don’t know the size of the array in advance.

A linked list lets you allocate the memory when you need it. Each item can be stored in different parts of memory, and the links join the items.

The gotcha here is that lookup times are slower. You’ll need to do some thorough profiling to work out whether this is a better method for you.

20. Use a cloud-based python performance tool.

When you’re working locally, you can use profiling tools that will give you insight into the bottlenecks in your application. If your application will be deployed to the web, however, things are different. Stackify will allow you to see how well your application performs under production load. It also provides code profiling, error tracking, and server metrics. Jump over to the Python section to find out how this could work with your application.

Conclusion

Hopefully, some of these tips will help your code run faster and allow you to get better python performance from your application.

Any list of tips is not going to do your thinking for you. However, this list points out some common pitfalls and poses questions for you to ask of your code. It also encourages you to ask questions about architecture and design that will make your applications run faster and more efficiently.

Start Free Trial

Improve Your Code with Retrace APM

Stackify's APM tools are used by thousands of .NET, Java, PHP, Node.js, Python, & Ruby developers all over the world.
Explore Retrace's product features to learn more.

Learn More

Want to contribute to the Stackify blog?

If you would like to be a guest contributor to the Stackify blog please reach out to [email protected]