Kaggle Micro-Courses: Python
Kaggle's official tutorial covering the Python knowledge you need for data science.
I. Syntax, Variables and Numbers
Code example 1:
spam_amount = 0
print(spam_amount)
# Ordering Spam, egg, Spam, Spam, bacon and Spam (4 more servings of Spam)
spam_amount = spam_amount + 4
if spam_amount > 0:
    print("But I don't want ANY spam!")
viking_song = "Spam " * spam_amount
print(viking_song)
Key points:
1.Variable assignment
spam_amount = 0
2.Function calls
print(spam_amount)
3.Comments
# Ordering Spam, egg, Spam, Spam, bacon and Spam (4 more servings of Spam)
4.Number types: int, float
5.Strings
6.Arithmetic
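A small sketch of the two number types and the arithmetic operators listed above; the variable names `a` and `b` are illustrative only. Note that `/` always produces a float, while `//` rounds down to an integer result.

```python
a = 7      # int
b = 2.5    # float
print(type(a), type(b))  # <class 'int'> <class 'float'>

print(a + 2)   # addition: 9
print(a - 2)   # subtraction: 5
print(a * 2)   # multiplication: 14
print(a / 2)   # true division always gives a float: 3.5
print(a // 2)  # floor division rounds down: 3
print(a % 2)   # modulus (remainder): 1
print(a ** 2)  # exponentiation: 49
print(a * b)   # mixing int and float gives a float: 17.5
```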
II. Functions and Getting Help
1.Getting help
help(round)
2.Defining functions
def least_difference(a, b, c):
    diff1 = abs(a - b)
    diff2 = abs(b - c)
    diff3 = abs(a - c)
    return min(diff1, diff2, diff3)
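Calling the function defined above with a few inputs shows what it computes: the three pairwise distances are taken with `abs()`, and `min()` picks the smallest. The inputs below are arbitrary examples.

```python
def least_difference(a, b, c):
    """Return the smallest difference between any two of a, b and c."""
    diff1 = abs(a - b)
    diff2 = abs(b - c)
    diff3 = abs(a - c)
    return min(diff1, diff2, diff3)

print(least_difference(1, 10, 100))  # 9   (differences 9, 90, 99)
print(least_difference(1, 10, 10))   # 0   (b and c are equal)
print(least_difference(5, 6, 7))     # 1
```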
3.Docstrings
def least_difference(a, b, c):
    """Return the smallest difference between any two numbers
    among a, b and c.
    >>> least_difference(1, 5, -5)
    4
    """
    diff1 = abs(a - b)
    diff2 = abs(b - c)
    diff3 = abs(a - c)
    return min(diff1, diff2, diff3)
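The `>>>` lines in the docstring follow the format of Python's standard `doctest` module. As an optional extra (not something the course requires), this sketch reads the docstring through `__doc__`, which is what `help()` displays, and runs its example with `doctest.run_docstring_examples`, which prints nothing when the example passes.

```python
import doctest

def least_difference(a, b, c):
    """Return the smallest difference between any two numbers
    among a, b and c.
    >>> least_difference(1, 5, -5)
    4
    """
    diff1 = abs(a - b)
    diff2 = abs(b - c)
    diff3 = abs(a - c)
    return min(diff1, diff2, diff3)

# The raw docstring text that help(least_difference) shows
print(least_difference.__doc__)

# Execute the >>> example in the docstring; silent on success
doctest.run_docstring_examples(least_difference, {"least_difference": least_difference})
```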
4.Default arguments
print(1, 2, 3)
print(1, 2, 3, sep=' < ')
def greet(who="Colin"):
    print("Hello,", who)
greet()
greet(who="Kaggle")
# (In this case, we don't need to specify the name of the argument, because it's unambiguous.)
greet("world")
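A function can have several default arguments, and keyword arguments can be passed in any order. The helper `describe` below is hypothetical, written only to illustrate this.

```python
def describe(planet, moons=0, noun="moons"):
    """Hypothetical helper: moons and noun both have default values."""
    return "{} has {} {}".format(planet, moons, noun)

print(describe("Mercury"))                           # Mercury has 0 moons
print(describe("Earth", 1, "moon"))                  # Earth has 1 moon
print(describe("Mars", noun="satellites", moons=2))  # Mars has 2 satellites
```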
III. Booleans and Conditionals
1.Booleans
x = True
print(x)
print(type(x))
2.Comparison Operations
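A few comparison operators (`==`, `!=`, `<`, `<=`, `>`, `>=`) and the bool values they produce. The function `is_adult` is a hypothetical example showing that a comparison can be returned like any other value.

```python
# Each comparison evaluates to a bool
print(3 == 3.0)    # True: an int and a float compare by value
print('3' == 3)    # False: a string is never equal to a number
print(5 != 4)      # True
print(10 >= 11)    # False
print(1 < 2 <= 2)  # True: comparisons can be chained

def is_adult(age):
    """Hypothetical helper used only for illustration."""
    return age >= 18

print(is_adult(19), is_adult(12))  # True False
```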
3.Combining Boolean Values: and, or, and not.
True or True and False
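`and` has higher precedence than `or`, so the expression above is evaluated as `True or (True and False)`. This sketch compares it with an explicitly parenthesized version that groups differently.

```python
# `and` is evaluated before `or`, so these two lines mean the same thing
print(True or True and False)     # True
print(True or (True and False))   # True

# Explicit parentheses change the grouping, and therefore the result
print((True or True) and False)   # False

# `not` flips a single boolean
print(not False)                  # True
```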
4.Conditionals
def inspect(x):
    if x == 0:
        print(x, "is zero")
    elif x > 0:
        print(x, "is positive")
    elif x < 0:
        print(x, "is negative")
    else:
        print(x, "is unlike anything I've ever seen...")
inspect(0)
inspect(-15)
5.Boolean conversion
print(bool(1)) # all numbers are treated as true, except 0
print(bool(0))
print(bool("asf")) # all strings are treated as true, except the empty string ""
print(bool(""))
# Generally, empty sequences (strings, lists, and other types we've yet to see, like tuples)
# are "falsey" and the rest are "truthy"
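Because of this conversion, any value can be used directly as an `if` condition. The helper `describe_list` below is hypothetical and only illustrates that an empty list behaves like `False`.

```python
def describe_list(items):
    """Hypothetical helper: relies on empty lists being falsey."""
    if items:  # same effect as: if len(items) > 0
        return "{} item(s)".format(len(items))
    else:
        return "empty"

print(describe_list([]))         # empty
print(describe_list([2, 3, 5]))  # 3 item(s)
print(bool(0.0), bool([]), bool(()), bool([0]))  # False False False True
```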
IV. Lists
1.Lists represent ordered sequences of values.
primes = [2, 3, 5, 7]
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
hands = [
    ['J', 'Q', 'K'],
    ['2', '2', '2'],
    ['6', 'A', 'K'], # (Comma after the last element is optional)
]
# (I could also have written this on one line, but it can get hard to read)
hands = [['J', 'Q', 'K'], ['2', '2', '2'], ['6', 'A', 'K']]
# A list can contain a mix of different types of variables
my_favourite_things = [32, 'raindrops on roses', help]
2.Indexing
planets[0]
# Elements at the end of the list can be accessed with negative numbers, starting from -1
planets[-1]
3.Slicing
# Slices are half-open intervals: the start index is included and the end index is excluded
planets[0:3]
planets[:3]
planets[3:]
# All the planets except the first and last
planets[1:-1]
# The last 3 planets
planets[-3:]
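The slices above, printed out so the half-open rule is visible. A consequence of that rule, shown at the end, is that `planets[:k]` and `planets[k:]` split the list with no overlap and nothing missing.

```python
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']

print(planets[0:3])   # ['Mercury', 'Venus', 'Earth']  (indices 0, 1, 2; index 3 excluded)
print(planets[3:])    # ['Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
print(planets[1:-1])  # everything except 'Mercury' and 'Neptune'
print(planets[-3:])   # ['Saturn', 'Uranus', 'Neptune']

k = 3
print(planets[:k] + planets[k:] == planets)  # True
```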
4.List functions
# How many planets are there?
len(planets)
# The planets sorted in alphabetical order
sorted(planets)
primes = [2, 3, 5, 7]
sum(primes)
max(primes)
5.List methods
# Pluto is a planet darn it!
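planets.append('Pluto')  # add an item to the end of the list
planets.pop()  # remove and return the last item
planets.index('Earth')  # position of the first occurrence of 'Earth'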
# we can use the in operator to determine whether a list contains a particular value
# Is Earth a planet?
"Earth" in planets
# to learn about all the methods and attributes attached to a particular object
help(planets)
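A sketch of what these methods return. `append` changes the list in place and returns `None`, `pop` both removes and returns an element, and the built-in `sorted()` returns a new list without touching the original.

```python
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']

result = planets.append('Pluto')  # modifies planets in place...
print(result)                     # ...and returns None
print(len(planets))               # 9

removed = planets.pop()           # removes AND returns the last element
print(removed)                    # Pluto
print(planets.index('Earth'))     # 2

alphabetical = sorted(planets)       # a new, sorted list
print(alphabetical[0], planets[0])   # Earth Mercury (original order unchanged)
```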
6.Tuples
# Tuples are almost exactly the same as lists. They differ in just two ways.
# 1: The syntax for creating them uses parentheses instead of square brackets
t = (1, 2, 3)
t = 1, 2, 3 # equivalent to above
# 2: They cannot be modified (they are immutable).
t[0] = 100  # raises TypeError: tuples cannot be modified
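This sketch catches the `TypeError` instead of letting it stop the program, then shows two common uses of tuples: unpacking several returned values (here from the built-in float method `as_integer_ratio`) and swapping two variables.

```python
t = (1, 2, 3)
try:
    t[0] = 100
except TypeError as err:
    print("TypeError:", err)  # 'tuple' object does not support item assignment

# A method that returns a tuple, unpacked into two variables
x = 0.125
numerator, denominator = x.as_integer_ratio()
print(numerator, denominator)  # 1 8

# Swapping two variables with tuple packing and unpacking
a, b = 1, 0
a, b = b, a
print(a, b)  # 0 1
```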
V. Loops and List Comprehensions
1.for loops
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
for planet in planets:
    print(planet, end=' ')  # print all on same line
# The for loop specifies the variable name to use (in this case, planet) and the set of values to loop over (in this case, planets)
# range() is a function that returns a sequence of numbers
for i in range(5):
    print("Doing important work. i =", i)
2.while loops
i = 0
while i < 10:
    print(i, end=' ')
    i += 1
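`range()` also accepts a start and a step, and in every form the stop value is excluded. The last loop prints the same numbers as the `while` loop above, which is usually the more idiomatic choice when the number of iterations is known.

```python
# range(stop), range(start, stop) and range(start, stop, step)
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 6)))      # [2, 3, 4, 5]
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]

# Same output as the while loop: 0 1 2 3 4 5 6 7 8 9
for i in range(10):
    print(i, end=' ')
print()  # finish the line
```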
3.List comprehensions
squares = [n**2 for n in range(10)]
short_planets = [planet for planet in planets if len(planet) < 6]
loud_short_planets = [planet.upper() + '!' for planet in planets if len(planet) < 6]
# you might find the structure clearer when it's split up over 3 lines
# Continuing the SQL analogy, you could think of these three lines as SELECT, FROM, and WHERE
[
    planet.upper() + '!'
    for planet in planets
    if len(planet) < 6
]
# Compare the following three definitions of count_negatives, which all do the same thing.
def count_negatives(nums):
    """Return the number of negative numbers in the given list.
    >>> count_negatives([5, -1, -2, 0, 3])
    2
    """
    n_negative = 0
    for num in nums:
        if num < 0:
            n_negative = n_negative + 1
    return n_negative
def count_negatives(nums):
    return len([num for num in nums if num < 0])
def count_negatives(nums):
    # Reminder: in the "booleans and conditionals" exercises, we learned about a quirk of
    # Python where it calculates something like True + True + False + True to be equal to 3.
    return sum([num < 0 for num in nums])
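Since each `def` above replaces the previous one, this sketch gives the three versions distinct (hypothetical) names so they can be run side by side on the docstring's example input.

```python
def count_negatives_loop(nums):
    n_negative = 0
    for num in nums:
        if num < 0:
            n_negative = n_negative + 1
    return n_negative

def count_negatives_comprehension(nums):
    # Keep only the negatives, then count them
    return len([num for num in nums if num < 0])

def count_negatives_sum(nums):
    # Each comparison is a bool; True counts as 1 and False as 0 when summed
    return sum([num < 0 for num in nums])

sample = [5, -1, -2, 0, 3]
print(count_negatives_loop(sample),
      count_negatives_comprehension(sample),
      count_negatives_sum(sample))  # 2 2 2
```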
VI. Strings and Dictionaries
1.String syntax
# strings in Python can be defined using either single or double quotations
x = 'Pluto is a planet'
y = "Pluto is a planet"
x == y
# Double quotes are convenient if your string contains a single quote character
print("Pluto's a planet!")
print('My dog is named "Pluto"')
2.Strings are sequences
# Indexing
planet = 'Pluto'
planet[0]
# Slicing
planet[-3:]
# How long is this string?
len(planet)
# Yes, we can even loop over them
[char+'! ' for char in planet]
# But a major way in which they differ from lists is that they are immutable. We can't modify them.
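Trying to assign to a character of a string raises a `TypeError`, just like with a tuple. To "change" a string you build a new one, for example by concatenating a slice.

```python
planet = 'Pluto'
try:
    planet[0] = 'B'
except TypeError as err:
    print("TypeError:", err)  # 'str' object does not support item assignment

# Build a new string instead; the original is unchanged
new_planet = 'B' + planet[1:]
print(new_planet, planet)  # Bluto Pluto
```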
3.String methods
# ALL CAPS
claim = "Pluto is a planet!"
claim.upper()
# all lowercase
claim.lower()
# Searching for the first index of a substring
claim.index('plan')
4.Going between strings and lists
words = claim.split()
words
# ['Pluto', 'is', 'a', 'planet!']
datestr = '1956-01-31'
year, month, day = datestr.split('-')
'/'.join([month, day, year])
# '01/31/1956'
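`split()` with no argument splits on runs of whitespace, and `split('-')` splits on the given separator; `join()` is called on the separator string and goes the other way. When words are separated by single spaces, joining with `' '` undoes the split exactly.

```python
claim = "Pluto is a planet!"
words = claim.split()             # split on whitespace
print(words)                      # ['Pluto', 'is', 'a', 'planet!']
print(' '.join(words) == claim)   # True

datestr = '1956-01-31'
year, month, day = datestr.split('-')  # split on a given separator
print('/'.join([month, day, year]))    # 01/31/1956
```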
5.Building strings with format()
position = 9
"{}, you'll always be the {}th planet to me.".format(planet, position)
pluto_mass = 1.303 * 10**22
earth_mass = 5.9722 * 10**24
population = 52910390
# {:.2} = 2 significant digits, {:.3%} = percent with 3 decimal places, {:,} = comma as thousands separator
"{} weighs about {:.2} kilograms ({:.3%} of Earth's mass). It is home to {:,} Plutonians.".format(
    planet, pluto_mass, pluto_mass / earth_mass, population,
)
# "Pluto weighs about 1.3e+22 kilograms (0.218% of Earth's mass). It is home to 52,910,390 Plutonians."
# Referring to format() arguments by index, starting from 0
s = """Pluto's a {0}.
No, it's a {1}.
{0}!
{1}!""".format('planet', 'dwarf planet')
print(s)
# Pluto's a planet.
# No, it's a dwarf planet.
# planet!
# dwarf planet!
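The format specifiers used above, one at a time. Note that `{:.2}` with no type letter means two significant digits (switching to scientific notation for large numbers), whereas `{:.2f}` means two digits after the decimal point. The numbers are taken from the example above, plus an arbitrary `3.14159`.

```python
pluto_mass = 1.303 * 10**22
print("{:.2}".format(pluto_mass))       # 1.3e+22  (2 significant digits)
print("{:.2f}".format(3.14159))         # 3.14     (2 digits after the decimal point)
print("{:.3%}".format(0.0021818))       # 0.218%   (multiply by 100, 3 decimals, add %)
print("{:,}".format(52910390))          # 52,910,390
print("{0}-{1}-{0}".format('a', 'b'))   # a-b-a    (indices let an argument be reused)
```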
6.Dictionaries: mapping keys to values
numbers = {'one':1, 'two':2, 'three':3}
numbers['one']
numbers['eleven'] = 11
numbers
numbers['one'] = 'Pluto'
numbers
#dictionary comprehensions
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
planet_to_initial = {planet: planet[0] for planet in planets}
planet_to_initial
# {'Mercury': 'M',
#  'Venus': 'V',
#  'Earth': 'E',
#  'Mars': 'M',
#  'Jupiter': 'J',
#  'Saturn': 'S',
#  'Uranus': 'U',
#  'Neptune': 'N'}
#The in operator tells us whether something is a key in the dictionary
'Saturn' in planet_to_initial
# A for loop over a dictionary will loop over its keys
for k in numbers:
    print("{} = {}".format(k, numbers[k]))
# To read a full inventory of dictionaries' methods
help(dict)
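Two of the dictionary methods that `help(dict)` lists: `items()` yields `(key, value)` pairs so the loop needs no separate lookup, and `get()` returns a default instead of raising `KeyError` for a missing key. The `in` check at the end tests keys only, not values.

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# Loop over (key, value) pairs directly
for key, value in numbers.items():
    print("{} = {}".format(key, value))

print(numbers.get('eleven'))         # None (numbers['eleven'] would raise KeyError)
print(numbers.get('eleven', 0))      # 0
print('two' in numbers, 2 in numbers)  # True False
```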
VII. External Libraries
1.Imports
import math
math.pi
import math as mt
mt.pi
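# Import every public name from math (convenient, but it can silently overwrite names you already have)
from math import *
print(pi, log(32, 2))
# Import only the specific names you need
from math import log, pi
from numpy import asarray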
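A sketch of why importing specific names is usually safer than `import *`. It uses only the standard library: `cmath` (complex-number math) also defines a function named `sqrt`, so a later star import from `cmath` silently replaces the one from `math`.

```python
import math
from math import log, pi  # bring in just these two names

print(math.pi == pi)  # True: both names refer to the same value
print(log(32, 2))     # 5.0 (logarithm of 32 in base 2)

# Star imports can silently replace names that are already defined
from math import *
from cmath import *   # cmath.sqrt now shadows math.sqrt
print(sqrt(-1))       # 1j; math.sqrt(-1) would raise ValueError instead
```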
2.Three tools for understanding strange objects
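import numpy as np
# Roll 10 dice (randint's high bound is exclusive, so high=7 gives values 1 to 6)
rolls = np.random.randint(low=1, high=7, size=10)
# 1: type() (what is this thing?)
type(rolls)
# numpy.ndarray
# 2: dir() (what can I do with it?)
print(dir(rolls))
# 3: help() (tell me more)
help(rolls.ravel)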
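The same three tools work on any object. This sketch applies them to a plain list (named `dice` here for illustration) so that it runs without NumPy; filtering out the double-underscore names from `dir()` leaves the ordinary methods, and `__doc__` is the docstring that `help()` formats and displays.

```python
dice = [3, 1, 6, 2]

# 1: type()
print(type(dice))  # <class 'list'>

# 2: dir(), keeping only the names without double underscores
methods = [name for name in dir(dice) if not name.startswith('__')]
print(methods)     # ['append', 'clear', 'copy', 'count', ...]

# 3: help(dice.count) would show this docstring, nicely formatted
print(dice.count.__doc__)
```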
3.Operator overloading
# double-underscores are directly related to operator overloading
# Generally, names that follow this double-underscore format have a special meaning to Python
print(dir(list))
# ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
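For lists, the `+` operator is backed by the `__add__` method and `len()` by `__len__`. A class can define these methods itself to support the same syntax; `Vector2` below is a hypothetical minimal example, not part of any library.

```python
a = [1, 2]
b = [3]
print(a + b)                 # [1, 2, 3]
print(a.__add__(b))          # [1, 2, 3], the method behind +
print(len(a), a.__len__())   # 2 2

class Vector2:
    """Hypothetical 2D vector that supports + via __add__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Vector2(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return "Vector2({}, {})".format(self.x, self.y)

print(Vector2(1, 2) + Vector2(3, 4))  # Vector2(4, 6)
```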