statslink

linking statistics to all

Some Python interview questions (Part I)

Python is essential for data science, but you do not need advanced Python to be a successful data scientist. Unfortunately, most employers still test you on fairly advanced Python. I recommend reviewing an introductory Python course, plus some additional review of how pandas fits into the workflow.

For example, I had one interview where the three Python questions were:

  1. What is the difference between *args and **kwargs?
  2. What is a decorator?
  3. What is iterrows() used for?

The first two are concepts I’ve never had to implement in my two and a half years as a data scientist. The third question is specific to pandas; I’ve used iterrows() maybe once, and it rarely comes up because I use PySpark (the Python API for Apache Spark), which abstracts away row-by-row calculations.

Here are some resources for reference:

Answers according to ChatGPT:

  1. *args and **kwargs are used in function definitions to allow for passing a variable number of arguments to a function.
    • *args (arguments without keywords)
    • **kwargs (arguments specified with a keyword or name)
    • OK, so that raises the question: what is a keyword argument? Don’t confuse keyword arguments with keywords. A keyword is a reserved word that is part of the syntax and has a special meaning to the Python interpreter, for example def, class, if, else. A keyword argument, by contrast, is an argument passed to a function by parameter name, in the form key=value.
#a) arguments without keywords
def print_me(*args):
    for mystring in args:
        print(mystring)
    print(type(args))
print_me('Hello', 'Stats-Link')

### output ###
Hello
Stats-Link
<class 'tuple'>

#b) arguments with keywords
def print_me(**kwargs):
    for mykey, myvalue in kwargs.items():
        print(mykey, myvalue)
    print(kwargs)
    print(type(kwargs))
print_me(a='Hello', b='Stats-Link')

### output ###
a Hello
b Stats-Link
{'a': 'Hello', 'b': 'Stats-Link'}
<class 'dict'>

The *args function is pretty straightforward. The only thing you need to know is that args is stored as a tuple (an ordered collection of items), in this case your two strings 'Hello' and 'Stats-Link'. A key feature is that these are positional arguments: tuples are ordered, so the order matters. If you passed in 'Stats-Link' and then 'Hello', the lines would print in the opposite order. The lines are printed separately because the print function appends a newline ('\n') to its output on every call by default.
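To make the ordering point concrete, here is a minimal sketch. The helper name collect is made up for this illustration; it simply returns whatever *args captured so the two call orders can be compared directly.

```python
def collect(*args):
    # args arrives as a tuple of the positional arguments, in call order
    # (collect is a hypothetical helper used only for this illustration)
    return args

first = collect('Hello', 'Stats-Link')
second = collect('Stats-Link', 'Hello')

print(first)            # ('Hello', 'Stats-Link')
print(second)           # ('Stats-Link', 'Hello')
print(first == second)  # False: same items, different order

# print() ends each call with '\n' by default; the end parameter overrides it
for word in first:
    print(word, end=' ')  # both words land on one line
print()
```

Swapping the arguments changes the tuple, which is why the two calls do not compare equal.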

The **kwargs function is a bit more involved. First, notice the two stars ** instead of one star; this is what defines the argument as a keyword argument, not the name kwargs itself. As you can see from the output, kwargs is a dictionary (a collection of key-value pairs), not a tuple. Unlike the previous example, you need to pass in key=value pairs such as a='Hello' and b='Stats-Link'. The two keys are a and b, and their corresponding values are 'Hello' and 'Stats-Link'. Notice that instead of using position to assign values, here values are assigned to keys, because kwargs is a dictionary, not a tuple.
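As a rough sketch of the point about the stars, the function below uses the made-up names items and options in place of args and kwargs, and takes both kinds of arguments in one call. It also uses the standard library keyword module to show that a keyword-argument name like a is not a reserved keyword.

```python
import keyword

def describe(*items, **options):
    # The stars do the work; the names "items" and "options" are arbitrary
    # (describe is a hypothetical function used only for this illustration)
    return items, options

items, options = describe('Hello', 'Stats-Link', a=1, b=2)
print(items)          # ('Hello', 'Stats-Link')
print(options)        # {'a': 1, 'b': 2}
print(type(items))    # <class 'tuple'>
print(type(options))  # <class 'dict'>

# Reserved keywords vs. keyword-argument names
print(keyword.iskeyword('def'))  # True: part of Python's syntax
print(keyword.iskeyword('a'))    # False: just a name used as a key
```

Positional values end up in the tuple and key=value pairs end up in the dictionary, regardless of what the two parameters are called.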