
Shallow and deep copy in Python

  • Michele Iarossi
  • Apr 23, 2021

Updated: Apr 30, 2021

There is no doubt that, compared to traditional programming languages such as C or Java, Python is easy to learn and use. You don't need to compile your program before running it, and you don't need to declare the type and size of variables. It has powerful built-in data types such as lists and dictionaries, extensive libraries, and automatic garbage collection and memory management. It is also open source and free. So far so good.


But like any other programming language, Python also has its own subtleties, which you'll stumble upon as soon as you start writing more complex programs. One of these subtleties concerns variables and the assignment statement when (a) shared references and (b) compound objects are involved.


Comparing assignments in C and Python


To illustrate, let's start with a simple example program written in C:


#include <stdio.h>
#include <string.h>

#define MAX_LENGTH 129u

typedef struct
{
    char attr1[MAX_LENGTH];
    unsigned int attr2;
} MyStruct;

typedef char VarName[MAX_LENGTH];

void printObj(VarName name, MyStruct obj)
{
    printf("\n>>> %s\n",name);
    printf("MyStruct('%s',%u)\n",obj.attr1,obj.attr2);
    return;
}

int main(void)
{
    MyStruct obj1, obj2;

    strcpy(obj1.attr1,"Mike");
    obj1.attr2 = 50u;
    printObj("obj1",obj1);

    obj2 = obj1;
    printObj("obj2",obj2);

    strcpy(obj1.attr1,"John");
    obj1.attr2 = 19u;
    printObj("obj1",obj1);
    printObj("obj2",obj2);
    
    return 0;
}

In the main() function we declare two variables, obj1 and obj2, of type MyStruct. We then assign values to the fields of obj1 and print them out:

    MyStruct obj1, obj2;

    strcpy(obj1.attr1,"Mike");
    obj1.attr2 = 50u;
    printObj("obj1",obj1);
>>> obj1
MyStruct('Mike',50)

We continue by assigning obj1 directly to obj2. This seems to work, because when we print obj2 it holds the same values:

    obj2 = obj1;
    printObj("obj2",obj2);
>>> obj2
MyStruct('Mike',50)

Unhappy with 'Mike', we decide to change it to 'John' and update the attributes of obj1 accordingly:

    strcpy(obj1.attr1,"John");
    obj1.attr2 = 19u;
    printObj("obj1",obj1);
>>> obj1
MyStruct('John',19)

But what about obj2? Depending on your programming experience, you are probably not surprised that it still holds the same values as before:

    printObj("obj2",obj2);
>>> obj2
MyStruct('Mike',50)

Although the code is pretty trivial, it illustrates some key points about how C works:

  • a variable is an identifier which denotes an object,

  • objects are guaranteed a certain amount of storage for the duration of their lifetime,

  • the amount of storage and the meaning of the values stored in it depend on the variable type,

  • in the simple assignment (=), the value of the right operand replaces the value stored in the object designated by the left operand.

With respect to our example, this means that during the lifetime of the main() function:

  • the variables obj1 and obj2 are linked to specific memory areas where the values of the objects denoted by them are stored,

  • the type MyStruct determines the amount of memory required for each object, i.e. enough to store a string of up to 128 characters (plus its terminating null character) and an unsigned integer,

  • when obj1 is assigned to obj2, the data fields of obj1 are physically copied into the data fields of obj2,

  • when obj1 is changed, obj2 is unaffected because it denotes a different object, stored at a different memory location than obj1.

The following picture illustrates the memory state at the end of the program:








Now let's write the same program in Python.


This is the definition of the MyStruct class equivalent to the one in C above:

>>> class MyStruct:
	def __init__(self,a1,a2):
		self.attr1 = a1
		self.attr2 = a2
	def __repr__(self):
		return f'MyStruct({self.attr1!r},{self.attr2!r})'

We create an instance of MyStruct, assign it to obj1 and print it:

>>> obj1 = MyStruct('Mike',50)
>>> obj1
MyStruct('Mike',50)

As we did in C, we assign obj1 directly to obj2:

>>> obj2 = obj1
>>> obj2
MyStruct('Mike',50)

Believing that obj1 is now safely saved in obj2, we modify its values as we did before in C:

>>> obj1.attr1 = 'John'
>>> obj1.attr2 = 19
>>> obj1
MyStruct('John',19)

Then we check that obj2 still holds the original values:

>>> obj2
MyStruct('John',19)

But surprise, obj2 has changed as well!


This simple example shows what happens when shared references are involved. It also illustrates the following key points about how Python works:

  • all data in a Python program is represented by objects or by relations between objects,

  • every object has an identity, a type and a value,

  • variables (identifiers) are names,

  • assignment statements are used to (re)bind names to values and to modify attributes or items of mutable objects.
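The identity, type and value of an object can be inspected directly with the built-ins id() and type(). The minimal sketch below redefines MyStruct exactly as in this post so that it runs on its own. The actual integer printed by id() varies from run to run. The last lines show that the name itself has no type: it can simply be rebound to an object of a different type.

```python
# Minimal sketch: inspecting the identity, type and value of an object.
# MyStruct is redefined here, as in the post, so the snippet is self-contained.
class MyStruct:
    def __init__(self, a1, a2):
        self.attr1 = a1
        self.attr2 = a2

    def __repr__(self):
        return f'MyStruct({self.attr1!r},{self.attr2!r})'


obj1 = MyStruct('Mike', 50)
first_id = id(obj1)

print(first_id)    # identity: an integer that is unique while the object is alive
print(type(obj1))  # type: the class the object was created from
print(obj1)        # value: shown here through __repr__

# The name obj1 has no type of its own: it can be rebound to any object.
obj1 = 'now a string'
print(type(obj1))  # <class 'str'>
```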

When we look at the Python code, this means that:

  • the data is represented by the object MyStruct('Mike',50), which is stored in memory and whose type is the class MyStruct. Its attribute attr1 refers to the string 'Mike', whereas its attribute attr2 refers to the integer 50,

  • the variables obj1 and obj2 are just names: they don't have a type and are not tied to any specific object (storage area) in memory,

  • when MyStruct('Mike',50) is assigned to obj1, the name obj1 is bound to the object MyStruct('Mike',50) in memory, i.e. it becomes a reference to this object. Notice that assignment in Python does not replace the value of the left operand with that of the right operand. This is not possible, because obj1 is just a name: it has no type, and it does not denote any specific object (storage area) in memory into which the value of the right operand could be copied,

  • when obj1 is assigned to obj2, obj2 is bound to the same object MyStruct('Mike',50). In other words, obj1 and obj2 now refer to the same object in memory and are shared references. This explains the side effect seen in the example: whichever name is used to change the attributes of MyStruct('Mike',50), the change is also seen through the other name, because the underlying object is the same.
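The is operator tells whether two names refer to the same object. The sketch below reuses the MyStruct class from the post. It contrasts modifying the shared object in place with rebinding one of the names. The second MyStruct('Paul',30) object is just illustrative data.

```python
# Sketch: shared references versus rebinding, using the MyStruct class of the post.
class MyStruct:
    def __init__(self, a1, a2):
        self.attr1 = a1
        self.attr2 = a2

    def __repr__(self):
        return f'MyStruct({self.attr1!r},{self.attr2!r})'


obj1 = MyStruct('Mike', 50)
obj2 = obj1                      # bind a second name to the same object

print(obj2 is obj1)              # True: one object, two names
print(id(obj2) == id(obj1))      # True: same identity

obj1.attr1 = 'John'              # modify the shared object in place...
print(obj2.attr1)                # John: ...so the change is visible through obj2

obj1 = MyStruct('Paul', 30)      # rebind obj1 to a brand-new object
print(obj2)                      # MyStruct('John',50): obj2 is not affected
print(obj2 is obj1)              # False
```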

The following picture illustrates the memory state at the end of the program:







Copying objects: shallow and deep copy


In order to avoid the shared reference issue above, we need a copy of the object referenced by obj1 and we need to have this copy referenced by obj2.


This can be done with the function copy() from the copy module of the Python Standard Library, which performs a shallow copy of our object:

>>> import copy
>>> obj1 = MyStruct('Mike',50)
>>> obj2 = copy.copy(obj1)

Now obj1 and obj2 refer to different objects in memory:








Having copied the object MyStruct('Mike',50) referenced by obj1, we modify it as done before:

>>> obj1.attr1 = 'John'
>>> obj1.attr2 = 19
>>> obj1
MyStruct('John',19)

and we verify that the copy referenced by obj2 didn't change:

>>> obj2
MyStruct('Mike',50)

No surprise, since different objects are referenced:








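The shallow copy is sufficient here because the attributes of MyStruct('Mike',50) refer only to immutable objects: a string and an integer. The copy shares these objects with the original. However, they cannot be changed in place, so changing the attributes of obj1 means rebinding them to new objects, and that leaves obj2 untouched.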
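A quick way to see what copy.copy() actually does is to compare identities. In the sketch below, MyStruct is redefined as in the post. The outer object is new, but right after the copy its attributes refer to the very same string and integer objects as the original.

```python
# Sketch: a shallow copy creates a new outer object but shares its attribute objects.
import copy


class MyStruct:
    def __init__(self, a1, a2):
        self.attr1 = a1
        self.attr2 = a2

    def __repr__(self):
        return f'MyStruct({self.attr1!r},{self.attr2!r})'


obj1 = MyStruct('Mike', 50)
obj2 = copy.copy(obj1)

print(obj2 is obj1)                       # False: copy.copy() created a new MyStruct object
shared_before = obj2.attr1 is obj1.attr1
print(shared_before)                      # True: both refer to the same str object

obj1.attr1 = 'John'                       # rebind obj1's attributes to other objects
obj1.attr2 = 19
print(obj2)                               # MyStruct('Mike',50): obj2 still refers to the old ones
```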


Let's move on and see what happens when compound objects are involved. Since Python is a dynamically typed language, an instance of the class MyStruct can store another instance of MyStruct as one of its attributes:

>>> obj3 = MyStruct('Sandra',96)
>>> obj1 = MyStruct('Mike',obj3)
>>> obj1
MyStruct('Mike',MyStruct('Sandra',96))

The variable obj1 now refers to an object that has as its first attribute a string and as its second attribute a reference to the object MyStruct('Sandra',96) referenced by obj3.
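The fact that attr2 holds a reference rather than a copy can be checked directly. In this self-contained sketch, MyStruct is redefined as in the post. A change made through obj3 is then visible through obj1. The value 97 is arbitrary.

```python
# Sketch: a compound MyStruct stores a reference to the inner object, not a copy.
class MyStruct:
    def __init__(self, a1, a2):
        self.attr1 = a1
        self.attr2 = a2

    def __repr__(self):
        return f'MyStruct({self.attr1!r},{self.attr2!r})'


obj3 = MyStruct('Sandra', 96)
obj1 = MyStruct('Mike', obj3)

print(obj1.attr2 is obj3)   # True: attr2 refers to the object named obj3

obj3.attr2 = 97             # modify the inner object through the name obj3...
print(obj1)                 # MyStruct('Mike',MyStruct('Sandra',97)): ...and see it through obj1
```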


As before, since we want to change obj1 later, we make a copy of it by means of copy.copy():

>>> obj2 = copy.copy(obj1)
>>> obj2
MyStruct('Mike',MyStruct('Sandra',96))

At this point we apply the changes:

>>> obj1.attr1 = 'John'
>>> obj1.attr2.attr2 = 74
>>> obj1
MyStruct('John',MyStruct('Sandra',74))

and hope that the object referenced by obj2 didn't change:

>>> obj2
MyStruct('Mike',MyStruct('Sandra',74))

But alas, it did change! The shallow copy did create a new object referenced by obj2, but it only copied into it the reference to obj3 held by obj1. The net result is that obj1.attr2, obj2.attr2 and obj3 all refer to the same object, MyStruct('Sandra',74).
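The same situation can be confirmed with identity checks instead of printed values. This sketch redefines MyStruct as in the post. Only the outer object is new after copy.copy(), while the inner one is reachable through all three names.

```python
# Sketch: after a shallow copy of a compound object, the inner object is shared.
import copy


class MyStruct:
    def __init__(self, a1, a2):
        self.attr1 = a1
        self.attr2 = a2

    def __repr__(self):
        return f'MyStruct({self.attr1!r},{self.attr2!r})'


obj3 = MyStruct('Sandra', 96)
obj1 = MyStruct('Mike', obj3)
obj2 = copy.copy(obj1)

print(obj2 is obj1)              # False: the outer object was copied
print(obj2.attr2 is obj1.attr2)  # True: only the reference to the inner object was copied
print(obj2.attr2 is obj3)        # True: obj1, obj2 and obj3 reach the same inner object
```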


The following picture illustrates this:











To solve this issue, we need a deep copy of obj1 instead of a shallow one. A deep copy recursively copies the objects contained in obj1 as well, so the new object refers to these new copies rather than to the originals. This can be done with the function deepcopy() from the copy module of the Python Standard Library.
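To make the idea of a recursive copy concrete, here is a hand-written deep copy that works only for MyStruct. The helper deep_copy_mystruct is hypothetical and is not how copy.deepcopy() is implemented. The real function handles arbitrary types and uses a memo dictionary to cope with shared sub-objects and cycles, whereas this simplified sketch would recurse forever on a cyclic structure.

```python
# Simplified sketch of a recursive (deep) copy, for illustration only.
# deep_copy_mystruct is a hypothetical helper, not part of the copy module.
class MyStruct:
    def __init__(self, a1, a2):
        self.attr1 = a1
        self.attr2 = a2

    def __repr__(self):
        return f'MyStruct({self.attr1!r},{self.attr2!r})'


def deep_copy_mystruct(obj):
    """Recursively copy a MyStruct and any MyStruct stored in its attributes."""
    if isinstance(obj, MyStruct):
        # Build a new outer object whose attributes are themselves deep copies.
        return MyStruct(deep_copy_mystruct(obj.attr1), deep_copy_mystruct(obj.attr2))
    # Strings and integers are immutable, so sharing them is safe.
    return obj


obj3 = MyStruct('Sandra', 96)
obj1 = MyStruct('Mike', obj3)
obj2 = deep_copy_mystruct(obj1)

obj1.attr2.attr2 = 74   # changes obj3, which obj2 no longer refers to
print(obj1)             # MyStruct('Mike',MyStruct('Sandra',74))
print(obj2)             # MyStruct('Mike',MyStruct('Sandra',96))
```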


Let's recreate obj1 from obj3 as done previously:

>>> obj3 = MyStruct('Sandra',96)
>>> obj1 = MyStruct('Mike',obj3)
>>> obj1
MyStruct('Mike',MyStruct('Sandra',96))

But now we make a deep copy by means of copy.deepcopy():

>>> obj2 = copy.deepcopy(obj1)
>>> obj2
MyStruct('Mike',MyStruct('Sandra',96))

At this point we apply the changes:

>>> obj1.attr1 = 'John'
>>> obj1.attr2.attr2 = 74
>>> obj1
MyStruct('John',MyStruct('Sandra',74))

and verify that the object referenced by obj2 didn't change:

>>> obj2
MyStruct('Mike',MyStruct('Sandra',96))

The following picture illustrates what happened:












As expected, copy.deepcopy() created a new object from obj1, including a new copy of its contained object obj3. Even though obj1 was changed, obj2 is unaffected.
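Identity checks make the difference from the shallow case visible. In this sketch (MyStruct redefined as in the post), the contained object is now a distinct copy. The immutable string 'Mike', however, is still the same object, because copy.deepcopy() returns atomic immutable values such as strings as-is instead of duplicating them.

```python
# Sketch: after a deep copy, the contained MyStruct is a new object.
import copy


class MyStruct:
    def __init__(self, a1, a2):
        self.attr1 = a1
        self.attr2 = a2

    def __repr__(self):
        return f'MyStruct({self.attr1!r},{self.attr2!r})'


obj3 = MyStruct('Sandra', 96)
obj1 = MyStruct('Mike', obj3)
obj2 = copy.deepcopy(obj1)

print(obj2 is obj1)              # False: new outer object
print(obj2.attr2 is obj3)        # False: the contained object was copied as well
print(obj2.attr1 is obj1.attr1)  # True: immutable strings are not duplicated
```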


Conclusion and references


From the examples shown so far, we can summarise the key points investigated in this post.


Long story short, remember that in Python:

  • variables are just names without any type

  • variables are references to objects in memory which do have a type

  • variables are (re)bound to any type of object by means of the assignment statement

  • copies of objects are made by calling copy.copy() or copy.deepcopy(); the latter is required when a compound object contains mutable objects that must not be shared between the original and the copy
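The same rule applies to built-in compound objects such as the lists mentioned at the beginning of the post. The sketch below uses a list that contains another list. It modifies the inner list in place and compares a shallow copy with a deep copy.

```python
# Sketch: shallow vs deep copy of a built-in compound object (a nested list).
import copy

outer = [1, [2, 3]]           # a list containing another (mutable) list
shallow = copy.copy(outer)    # new outer list, same inner list
deep = copy.deepcopy(outer)   # new outer list and new inner list

outer[1].append(4)            # modify the inner list in place

print(shallow)  # [1, [2, 3, 4]]: the inner list is shared with outer
print(deep)     # [1, [2, 3]]: the inner list was copied
```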

Finally, the following references have been consulted for creating this post:

  1. Any working draft of the C language ISO standard (ISO/IEC 9899)

  2. The massive tome Learning Python, 5th edition, written by Mark Lutz

  3. The official Python reference manual

  4. The Python Standard Library documentation of the copy module




©2021 by MathSophy.
