Python remembers...
Issues coming from interpreted languages - Python
I don't like dynamically typed languages much, and Python is no exception. The fact that I can pass whatever I feel like to a function and watch it crash at runtime, or have to guard against it by hand, is a big negative for me. With that in mind, I want to take you through a small look at Python from the C perspective.
Misleading function signatures
Here we have a function declaration that shows one thing but subtly hides another. The function seems to have a "default argument": when the caller doesn't supply the desired argument, the function falls back to a default value, in our case an empty list.
Function definitions are executed by the Python interpreter at runtime and stored as function objects on the heap. The kicker is that the default values of the parameters are evaluated exactly once, right then and there, when the function object is created.
This means that by the time add_item is defined, box has already been initialized to an empty list. Every subsequent call that relies on the default keeps appending to that exact same object at the address of box, which produces code that acts interestingly...
# function that adds an item to a list if supplied, and adds to an empty one if none were supplied.... or does it?
def add_item(item, box=[]):
box.append(item)
return box
print(add_item(1)) # [1]
print(add_item(1)) # [1, 1]
print(add_item(1)) # [1, 1, 1]
It should serve as a big warning that even JavaScript doesn't allow this kind of behaviour: default parameter values there are re-evaluated on every call, so if you define that a parameter defaults to an empty list, each call will in fact run with an EMPTY list.
And this gets better, because now if we want to store this data somewhere and later add to a specific box, you might think the rule is simple: either you pass in a box, or you get a fresh empty one.
a_box = add_item(1)
add_item(1, a_box)
add_item(1, a_box)
add_item(1)
print(a_box) # [1, 1, 1, 1]
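You can actually see the default object hanging off the function itself. Every Python function stores its evaluated defaults in its `__defaults__` attribute, and it is the very same list that the calls above keep mutating:

```python
def add_item(item, box=[]):
    box.append(item)
    return box

# the evaluated default is stored on the function object itself
print(add_item.__defaults__)  # ([],)

add_item(1)
add_item(2)

# same object, mutated in place across calls
print(add_item.__defaults__)  # ([1, 2],)
```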
But this becomes truly incredible when we have "more" than one list. For your own enjoyment or terror run this and reflect...
a_box = add_item(1)
add_item(2) # implicit add to a_box
add_item(3, a_box) # explicit add to a_box
b_box = add_item(4) # "created" another list which IS the a_box list
add_item(5)
add_item(6, b_box)
print(a_box)
print(b_box)
We are going to add type annotations, and then we will add some "safety" by introducing a conditional. This helps with general documentation, and the annotations can then be checked by static code analyzers such as Mypy.
from typing import List, Optional
def add_item(item: int, box: Optional[List[int]] = None) -> List[int]:
if box is None:
box = []
box.append(item)
return box
print(add_item(1)) # [1]
print(add_item(1)) # [1]
print(add_item(1)) # [1]
There are two things happening here. We have changed the default value of our parameter to None, and we have added a type hint Optional which represents the existence, or lack thereof, of a value.
By adding the Optional hint around the box we have type "safety" because there are only two logical cases. Either there is no list given, so we make a new one and add the element, or there is already a list given, so we add to the current object. I say "safety" because even though it is defined now as an Optional, at runtime anything may still be passed. This is because types in Python are just suggestions, meaning they aren't enforced anywhere by Python itself, so we need to make manual checks.
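A quick sketch of what "suggestions" means in practice: the annotated function below runs happily with an argument that violates its own hints. A static checker like Mypy would flag the call, but the interpreter itself never looks at the annotations.

```python
from typing import List, Optional

def add_item(item: int, box: Optional[List[int]] = None) -> List[int]:
    if box is None:
        box = []
    box.append(item)
    return box

# mypy would complain here (str is not int), but Python itself
# never checks the hints, so this runs without a peep
print(add_item("oops"))  # ['oops']
```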
Fun detour
If you only ever observe the id of the boxes created by this function through the function calls themselves (meaning you never store the returned box in a variable), then Python sees that the object's reference count drops to zero right after each call and recycles its memory...
which means that these calls may all report the same box id: since you never keep a reference anywhere, the next call's fresh list lands in the SAME spot the previous one just vacated. (This reuse is a CPython implementation detail, not something guaranteed by the language.)
print(id(add_item(1))) # 139279713620544
print(id(add_item(1))) # 139279713620544
print(id(add_item(1))) # 139279713620544
# and these calls will have a different id because the box space cannot be reused since on each call we create a new one
a = add_item(1)
print(id(a)) # 134502629954112
b = add_item(1)
print(id(b)) # 134502629952320
c = add_item(1)
print(id(c)) # 134502627583232
Connection to C
Here we have function definitions executed at runtime, with their default arguments stored on the heap. This is a huge difference, because now you have to keep track of an object's lifetime manually instead of it being implied by the declaration. In C, every function call gets its own space to work in: a frame on the stack it can use for its local variables. Python, by contrast, keeps these default objects alive in memory, attached to the function object itself.
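The contrast is easy to demonstrate in Python itself: a plain local variable behaves like a C stack variable, rebuilt on every call, while a mutable default persists with the function object.

```python
def fresh(item):
    box = []          # local: recreated on every call, like a C stack variable
    box.append(item)
    return box

def sticky(item, box=[]):
    box.append(item)  # default: one object, shared across all calls
    return box

print(fresh(1), fresh(1))    # [1] [1]
print(sticky(1), sticky(1))  # [1] [1, 1]
```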
The real cost of flexibility
In Python, everything is an object. This is nice in terms of flexibility because it implies that "everything" is just a vtable away, right? However, everything is fat in Python. There are no strategies to reduce the memory footprint like those employed by many compiled or statically typed languages. Languages like Java, C#, or Go have primitive types—raw memory representations—that hold exactly what they say they do, without the overhead of object headers pointing to class definitions or reference counting or any other fun stuff found in Python.
import sys
print(sys.getsizeof(69)) # 28 bytes
Those 28 bytes break down into fields shared by all integers: a reference count (8 bytes), a type pointer (8 bytes), a length field (8 bytes), and finally the integer value itself (4 bytes), giving us a total of 28 bytes.
We have the reference count needed for garbage collection, our "vtable" (the type pointer) which points to the definition of the class, and then Python has still another trick up its sleeve. It can do math on arbitrarily long integers, since Python treats every integer as a BigInteger, and it needs that length field to know how many digits the number occupies so the math still works.
Which means you can actually run something like this in your interpreter and it just won't complain...
102938576743887545323458483 * 256565789008798765453234584
In most languages this would require types beyond the standard ones, explicitly chosen for the kind and magnitude of the operations to be performed.
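You can watch that length field at work with sys.getsizeof: an int's footprint grows as the number needs more internal digits. (The byte counts in the comments are from a 64-bit CPython 3.11 build, where ints grow in 4-byte digit increments; other versions may differ slightly.)

```python
import sys

# footprint grows with magnitude, one 4-byte digit at a time
print(sys.getsizeof(69))       # 28
print(sys.getsizeof(2**30))    # 32
print(sys.getsizeof(2**60))    # 36
print(sys.getsizeof(10**100))  # 72
```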
There is also a funny part here, since doubles (float in Python) aren't given the promotion to BigIntegers like integers are. So to save space when working with "lesser" integers, you may deliberately store them as floats. Any low-level programmer just died a bit inside reading that.
To go back to our earlier example...
import sys
print(sys.getsizeof(69.00)) # 24 bytes
We have dropped the length field (no more counting digits) and upgraded the space for the actual value from 4 bytes to 8. Under IEEE 754, a double has a 53-bit significand (52 stored bits plus one implicit bit), with the remaining bits going to the exponent and sign. Every integer up to 2^53 is therefore exactly representable, which means you can pull this trick safely only if you can prove that every value your variable will ever take stays below 2^53, which is approximately nine quadrillion.
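A minimal check of that 2^53 boundary: every integer up to it survives the round-trip through a float, and the very next one does not.

```python
# 2**53 is the edge: below it, every integer is exactly
# representable as an IEEE 754 double
assert float(2**53 - 1) == 2**53 - 1
assert float(2**53) == 2**53
assert float(2**53 + 1) == float(2**53)  # 2**53 + 1 rounds back down

print(2**53)  # 9007199254740992, about nine quadrillion
```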
Puzzle
Write a function, in Python or C, that takes 2 integers and returns the larger one. The catch: you cannot use if/else or match statements.
Solution
#include <stdint.h>

// exactly one of (a >= b) and (a < b) evaluates to 1,
// so exactly one term survives the sum
static inline int32_t branchless_max(const int32_t a, const int32_t b) {
    return (a >= b) * a + (a < b) * b;
}
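The same trick ports directly to Python, since bool is a subclass of int and multiplies like one:

```python
def branchless_max(a: int, b: int) -> int:
    # exactly one comparison is True (== 1), so one term survives
    return (a >= b) * a + (a < b) * b

print(branchless_max(3, 7))    # 7
print(branchless_max(-2, -9))  # -2
```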