Immutable and mutable types in Python

Mutable vs Immutable Objects
In general, data types in Python can be distinguished based on whether objects of the type are mutable or immutable. The content of objects of immutable types cannot be changed after they are created.

Some immutable types
int, float, complex
str
bytes
tuple
frozenset
bool

Some mutable types
array
bytearray
list
set
dict
Only mutable objects support operations that change the object in place. Assigning to a sequence slice, for example, works for lists but raises a TypeError for tuples and strings.
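As a small illustration of the slice-assignment point above, the sketch below tries the same operation on a list, a tuple, and a string. The names numbers and rejected are made up for this example.

```python
# Slice assignment changes a list in place...
numbers = [1, 2, 3, 4]
numbers[1:3] = ["two", "three"]   # replace the two middle items
print(numbers)                    # [1, 'two', 'three', 4]

# ...but immutable sequences reject it with a TypeError.
rejected = []
for immutable in ((1, 2, 3, 4), "abcd"):
    try:
        immutable[1:3] = immutable[0:2]
    except TypeError as exc:
        rejected.append(type(immutable).__name__)
        print(type(immutable).__name__, "->", exc)
```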

It is important to understand that variables in Python are really just references to objects in memory. If you assign an object to a variable as below,

a = 1
s = 'abc'
l = ['a string', 456, ('a', 'tuple', 'inside', 'a', 'list')]
all you really do is make this variable (a, s, or l) point to the object (1, 'abc', ['a string', 456, ('a', 'tuple', 'inside', 'a', 'list')]), which is kept somewhere in memory, as a convenient way of accessing it. If you reassign a variable as below,

a = 7
s = 'xyz'
l = ['a simpler list', 99, 10]
you make the variable point to a different object (newly created ones in our examples). As stated above, only mutable objects can be changed in place (l[0] = 1 is fine in our example, but s[0] = 'a' raises an error). This becomes tricky when an operation does not explicitly ask for a change in place, as is the case for the += (augmented assignment) operator. When used on an immutable object (as in a += 1 or s += 'qwertz'), Python silently creates a new object and makes the variable point to it. When used on a mutable object (as in l += [1,2,3]), however, the object the variable points to is changed in place.

In most situations you do not have to know about this difference, but it matters when several variables point to the same object. In our example, assume you set p = s and m = l, then run s += 'etc' and l += [9,8,7]. This rebinds s to a new string and leaves p unaffected, but it changes both m and l, since both point to the same list object. Python's built-in id() function, which returns an identifier that is unique to an object for as long as that object exists, can be used to trace what is happening under the hood.
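The aliasing scenario just described can be traced roughly like this. It is only a sketch that reuses the names s, l, p and m from the text; l_id_before is a helper name added for the example.

```python
s = "xyz"
l = ["a simpler list", 99, 10]
p = s   # p and s now name the same str object
m = l   # m and l now name the same list object

l_id_before = id(l)

s += "etc"       # str is immutable: a new object is built and s is rebound to it
l += [9, 8, 7]   # list is mutable: the existing object is extended in place

print(p)                      # xyz
print(s)                      # xyzetc
print(p is s)                 # False
print(m)                      # ['a simpler list', 99, 10, 9, 8, 7]
print(m is l)                 # True
print(id(l) == l_id_before)   # True
```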
Typically, this behavior of Python causes confusion in functions. As an illustration, consider this code:

def append_to_sequence(myseq):
    myseq += (9, 9, 9)
    return myseq

tuple1 = (1,2,3) # tuples are immutable
list1 = [1,2,3] # lists are mutable

tuple2 = append_to_sequence(tuple1)
list2 = append_to_sequence(list1)

print('tuple1 =', tuple1)  # outputs (1, 2, 3)
print('tuple2 =', tuple2)  # outputs (1, 2, 3, 9, 9, 9)
print('list1 =', list1)    # outputs [1, 2, 3, 9, 9, 9]
print('list2 =', list2)    # outputs [1, 2, 3, 9, 9, 9]
This will give the above indicated, and usually unintended, output. myseq is a local variable of the append_to_sequence function, but when the function is called, myseq nevertheless points to the same object as the variable we pass in (tuple1 or list1 in our example). If that object is immutable (like a tuple), there is no problem: the += operator creates a new tuple, and myseq is set to point to it. If we pass in a reference to a mutable object, however, that object is manipulated in place (so list1 and list2, in our case, end up pointing to the same list object).
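If the in-place change is not wanted, one possible way around it is to build a new object with the binary + operator instead of +=. The function below is a hypothetical variant written for this note (append_to_sequence_copy is not part of the original example), and it assumes the argument is a list or a tuple.

```python
def append_to_sequence_copy(myseq):
    """Hypothetical variant: return an extended copy and leave the argument alone."""
    extra = (9, 9, 9)
    # type(myseq)(extra) converts the tuple to a list when myseq is a list,
    # and the binary + always builds a new object instead of extending in place.
    return myseq + type(myseq)(extra)

tuple1 = (1, 2, 3)
list1 = [1, 2, 3]

tuple2 = append_to_sequence_copy(tuple1)
list2 = append_to_sequence_copy(list1)

print(tuple1, tuple2)   # (1, 2, 3) (1, 2, 3, 9, 9, 9)
print(list1, list2)     # [1, 2, 3] [1, 2, 3, 9, 9, 9]
print(list1 is list2)   # False
```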

Links:
3.1. Objects, values and types, The Python Language Reference, docs.python.org
5.6.4. Mutable Sequence Types, The Python Standard Library, docs.python.org
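object: the actual data. variable: an object reference, that is, a reference to the actual data, similar to a pointer.
Every object is distinct and has its own unique ID, which can be obtained with the id() function.
a = 1
b = 1
id(a)  # result: 1454604336
id(b)  # result: 1454604336
As the output shows, a and b point to the same object; there are not two objects with the value 1, one for a and one for b. This saves memory. (CPython caches small integers, so this sharing is an implementation detail rather than a guarantee for every value.)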
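A quick way to check this on your own interpreter is sketched below. Whether two equal integers share one object depends on the implementation, so the identity results are printed without an expected value; big1 and big2 are names made up for this example.

```python
a = 1
b = 1
print(id(a) == id(b))   # same object? (implementation-dependent)
print(a is b)           # "is" compares identity, i.e. the same check as above
print(a == b)           # True: equality of values always holds here

# Larger values built at run time may or may not share an object.
big1 = int("1000000")
big2 = int("1000000")
print(big1 == big2)     # True
print(big1 is big2)     # implementation-dependent, do not rely on it
```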
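An immutable object cannot be changed; objects can only be added to or taken from the object pool. When no reference points to an object any more, Python's garbage collection destroys it.
A variable of a basic type (int, str, etc.) holds only a single object id (similar to an address). A variable of a collection type (list, set, etc.) actually holds the ids of several objects and does not contain the objects themselves. The built-in mutable types are all collection types. The logic behind this design may be the following: suppose a and b point to the same list. If changing an element through a left b unchanged, a separate list of object ids would have to be kept for each of a and b (which is the effect you get from a copy, for example with deepcopy). For a very long list that would take a lot of space, so the underlying rationale is again saving memory.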
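The difference between sharing one list and keeping separate lists of object ids can be sketched with the standard copy module. The names a, b, c and d are chosen only for this example.

```python
import copy

a = [[1, 2], [3, 4]]
b = a                  # alias: the same list object, nothing is copied
c = list(a)            # shallow copy: new outer list, same inner objects
d = copy.deepcopy(a)   # deep copy: new outer list and new inner lists

a[0] = "changed"       # replace one slot of the outer list
print(b[0])            # changed  (b is the same object as a)
print(c[0])            # [1, 2]   (c has its own slots)
print(d[0])            # [1, 2]

a[1].append(5)         # mutate an inner list that a and c share
print(c[1])            # [3, 4, 5]
print(d[1])            # [3, 4]
```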

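When you use a for item in items loop to try to change the immutable elements (int, str) of an iterable, rebinding item in the loop body does not change the original values in items.
For example:
a = [1, 2]
for i in a:
    i += 1
After this runs, the values in a are unchanged. The execution goes roughly as follows:
The for loop calls next() on an iterator over a and binds i to the next element of a.
When i += 1 executes, a new object is created and i is rebound to it, while the object ids stored in a do not change. This can be traced with the id() function (https://softwareengineering.stackexchange.com/questions/341179/why-does-python-only-make-a-copy-of-the-individual-element-when-iterating-a-list).
So when you need to change the immutable elements of a list inside a loop, do not use the for item in items form; loop over the list by index instead, which lets you replace the object ids stored in the list.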
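The index-based loop recommended above might look like the sketch below; enumerate is shown as an equivalent way to get the index and the value together.

```python
a = [1, 2]

# Rebinding the loop variable does not touch the list.
for i in a:
    i += 1
print(a)   # [1, 2]

# Assigning through the index replaces the reference stored in the list.
for index in range(len(a)):
    a[index] += 1
print(a)   # [2, 3]

# enumerate yields (index, value) pairs, which avoids range(len(...)).
for index, value in enumerate(a):
    a[index] = value * 10
print(a)   # [20, 30]
```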

The primary reason that this doesn't work as you expect is that in Python, when you write:

i += 1
it is not doing what you think it's doing. Integers are immutable. This can be seen when you look into what the object actually is in Python:

a = 0
print('ID of the first integer:', id(a))
a += 1
print('ID of the first integer +=1:', id(a))
The id function returns a value that is unique and constant for an object during its lifetime. Conceptually, it maps loosely to a memory address in C/C++ (in CPython it is the object's address). Running the above code:

ID of the first integer: 140444342529056
ID of the first integer +=1: 140444342529088
This means the first a is no longer the same object as the second a, because their ids are different. Effectively, they are at different locations in memory.

With a custom mutable object, however, things work differently. I have overridden the += operator here:

class CustomInt:
    def __iadd__(self, other):
        # Override += for this class: mutate in place and return the same object
        self.value = self.value + other.value
        return self

    def __init__(self, v):
        self.value = v

ints = []
for i in range(5):
    custom = CustomInt(i)  # renamed from "int" to avoid shadowing the built-in
    print('ID={}, value={}'.format(id(custom), i))
    ints.append(custom)

for i in ints:
    i += CustomInt(i.value)

print("######")
for i in ints:
    print('ID={}, value={}'.format(id(i), i.value))
Running this results in the following output:

ID=140444284275400, value=0
ID=140444284275120, value=1
ID=140444284275064, value=2
ID=140444284310752, value=3
ID=140444284310864, value=4
######
ID=140444284275400, value=0
ID=140444284275120, value=2
ID=140444284275064, value=4
ID=140444284310752, value=6
ID=140444284310864, value=8
Notice that the id of each object is the same in both loops, even though the value it holds has changed (you could also print the id of the int value each object holds, which would change as the object mutates, because integers are immutable).

Compare this to when you run the same exercise with an immutable object:

ints_primitives = []
for i in range(5):
    n = i  # renamed from "int" to avoid shadowing the built-in
    ints_primitives.append(n)
    print('ID={}, value={}'.format(id(n), i))

print("######")
for i in ints_primitives:
    i += 1
    print('ID={}, value={}'.format(id(i), i))

print("######")
for i in ints_primitives:
    print('ID={}, value={}'.format(id(i), i))
This outputs:

ID=140023258889248, value=0
ID=140023258889280, value=1
ID=140023258889312, value=2
ID=140023258889344, value=3
ID=140023258889376, value=4
######
ID=140023258889280, value=1
ID=140023258889312, value=2
ID=140023258889344, value=3
ID=140023258889376, value=4
ID=140023258889408, value=5
######
ID=140023258889248, value=0
ID=140023258889280, value=1
ID=140023258889312, value=2
ID=140023258889344, value=3
ID=140023258889376, value=4
A few things to notice here. First, in the loop with +=, you are no longer adding to the original object: because ints are among the immutable types in Python, i += 1 produces a different object with a different id, and the list still refers to the original ones. It is also interesting to note that Python can use the same underlying object for multiple variables with the same immutable value (an optimization of the implementation, not a guarantee):

a = 1999
b = 1999
c = 1999

print('id a:', id(a))
print('id b:', id(b))
print('id c:', id(c))

id a: 139846953372048
id b: 139846953372048
id c: 139846953372048
tl;dr - Python has a handful of immutable types, which cause the behavior you see. For all mutable types, your expectation is correct.