Python is essential for data science, but you don't need to know a lot of Python to be a successful data scientist. Unfortunately, most employers still test you on fairly advanced Python. I recommend reviewing an introductory Python course, plus some additional review of how pandas fits into the workflow.
For example, I had one interview where the three Python questions were:
- What is the difference between `*args` and `**kwargs`?
- What is a decorator?
- What is `iterrows()` used for?
The first two are concepts I’ve never had to implement in my two and a half years as a data scientist. The third question is specific to pandas; I’ve used `iterrows()` maybe once, and it doesn’t come up often because I use PySpark (the Python API for Apache Spark), which abstracts away row-level calculations.
Here are some resources to reference.
Answers according to ChatGPT:
`*args` and `**kwargs` are used in function definitions to allow for passing a variable number of arguments to a function.
- `*args` (arguments without keywords)
- `**kwargs` (arguments specified with a keyword or name)
OK, then that raises the question: what is a keyword argument? Don’t confuse keyword arguments with keywords. A keyword is a reserved word that is part of the syntax and has a special meaning to the Python interpreter, for example `def`, `class`, `if`, `else`. A keyword argument, however, is an argument passed to a function by parameter name, as a key-value pair in the form `key=value`.
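To make the distinction concrete, here is a minimal sketch of the same call written with positional arguments and with keyword arguments. The function `describe` and its parameters are made up for illustration; they are not from the interview.

```python
# Hypothetical function, used only to illustrate the two call styles.
def describe(name, role):
    return f"{name} works as a {role}"

# Positional arguments: values are matched to parameters by position.
positional = describe("Ada", "data scientist")

# Keyword arguments: values are matched by parameter name,
# so they can be written in any order.
keyword = describe(role="data scientist", name="Ada")

print(positional)
print(keyword)
print(positional == keyword)  # True
```

Both calls build the same string; only the way the values are matched to `name` and `role` differs.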
# a) arguments without keywords
def print_me(*args):
    for mystring in args:
        print(mystring)
    print(type(args))

print_me('Hello', 'Stats-Link')

### output ###
Hello
Stats-Link
<class 'tuple'>

# b) arguments with keywords
def print_me(**kwargs):
    for mykey, myvalue in kwargs.items():
        print(mykey, myvalue)
    print(kwargs)
    print(type(kwargs))

print_me(a='Hello', b='Stats-Link')

### output ###
a Hello
b Stats-Link
{'a': 'Hello', 'b': 'Stats-Link'}
<class 'dict'>
The `*args` function is pretty straightforward. The only thing you need to know is that `args` is stored as a tuple (an ordered collection of items), in this case your two strings `'Hello'` and `'Stats-Link'`. A key feature is that these are positional arguments: since tuples are ordered, the order matters. If you passed in `'Stats-Link'` and then `'Hello'`, you would get a different output. The lines are printed separately because the `print` function by default appends a newline (`'\n'`) at the end of each call.
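A quick way to check the points about ordering and the trailing newline is the sketch below. The helper `collect` is a hypothetical name introduced here; it simply returns whatever tuple it receives.

```python
def collect(*args):
    # args is always a tuple, even when no arguments are passed
    return args

first = collect('Hello', 'Stats-Link')
swapped = collect('Stats-Link', 'Hello')

print(first)             # ('Hello', 'Stats-Link')
print(first == swapped)  # False, because tuples compare element by element in order
print(collect())         # ()

# print() ends each call with '\n' by default; the end parameter overrides that,
# so both words land on one line here.
for word in first:
    print(word, end=' ')
print()
```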
The `**kwargs` function is a bit more involved. First, notice the two stars `**` instead of one star. This is actually what defines the argument as a keyword argument, not the name `kwargs` itself. As you can see from the output, `kwargs` is actually a dictionary (a collection of key-value pairs), not a tuple. Unlike the previous example, you need to pass in `key=value` pairs such as `a='Hello'` and `b='Stats-Link'`. The two keys are `a` and `b`, and their corresponding values are `'Hello'` and `'Stats-Link'`. Notice that instead of using positional order to assign values, here values are assigned to keys, because it is a dictionary and not a tuple.
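As a rough sketch of the two points above (the stars matter, not the names, and keyword arguments are matched by key rather than position), the example below uses the made-up names `summarize`, `values`, and `options` in place of `args` and `kwargs`, and accepts both kinds of argument in one function.

```python
# Hypothetical function: the names "values" and "options" are arbitrary.
# The * and ** prefixes are what make them behave like *args and **kwargs.
def summarize(*values, **options):
    return type(values).__name__, type(options).__name__, values, options

result = summarize('Hello', 'Stats-Link', a=1, b=2)
print(result)
# ('tuple', 'dict', ('Hello', 'Stats-Link'), {'a': 1, 'b': 2})

# Keyword arguments are matched by name, so swapping them gives an equal dict.
same = summarize(a=1, b=2)[3] == summarize(b=2, a=1)[3]
print(same)  # True
```

When a function takes both, the positional arguments must come before the keyword arguments in the call.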