代码量少,但能实现强大的算法或功能,主打一个程序设计之美
分块
给定具体的大小,定义一个函数以按照这个大小切割列表。
from math import ceil
def chunk(lst, size):
return list(map(lambda x: lst[x*size: x*size + size], list(range(0, ceil(len(lst) / size)))))
chunk([1,2,3,4,5],2)
# [[1,2],[3,4],5]
解包
将打包好的成对列表解开成两组不同的元组。
array = [['a', 'b'], ['c', 'd'], ['e', 'f']]
transposed = zip(*array)
print(transposed)
# [('a', 'c', 'e'), ('b', 'd', 'f')]
展开列表
通过递归的方式将列表的嵌套展开为单个列表。
def spread(arg):
ret = []
for i in arg:
if isinstance(i, list):
ret.extend(i)
else:
ret.append(i)
return ret
def deep_flatten(lst):
result = []
result.extend(spread(list(map(lambda x: deep_flatten(x) if type(x) == list else x, lst))))
return result
deep_flatten([1, [2], [[3], 4], 5]) # [1,2,3,4,5]
将两个列表转化为字典
把两个列表转化为单个字典。
def to_dictionary(keys, values):
return dict(zip(keys, values))
keys = ["a", "b", "c"]
values = [2, 3, 4]
print(to_dictionary(keys, values))
# {'a': 2, 'c': 4, 'b': 3}
元素频率
根据元素频率取列表中最常见的元素。
def most_frequent(list):
return max(set(list), key = list.count)
list = [1,2,1,2,3,2,1,4,2]
most_frequent(list)
Shuffle
打乱列表元素的顺序,它主要会通过 Fisher-Yates 算法对新列表进行排序。
from copy import deepcopy
from random import randint
def shuffle(lst):
temp_lst = deepcopy(lst)
m = len(temp_lst)
while (m):
m -= 1
i = randint(0, m)
temp_lst[m], temp_lst[i] = temp_lst[i], temp_lst[m]
return temp_lst
foo = [1,2,3]
shuffle(foo) # [2,3,1], foo=[1,2,3]
快排
高效的排序算法,时间复杂度:O(n log n)
def quicksort(arr):
if len(arr) <= 1:
return arr
pivot = arr[len(arr) // 2]
left = [x for x in arr if x < pivot]
middle = [x for x in arr if x == pivot]
right = [x for x in arr if x > pivot]
return quicksort(left) + middle + quicksort(right)
Dijkstra算法
用于计算图中单源最短路径。其核心思想是通过贪心策略逐步扩展最短路径树。
import heapq
def dijkstra(graph, start):
queue = [(0, start)]
distances = {vertex: float('inf') for vertex in graph}
distances[start] = 0
while queue:
current_distance, current_vertex = heapq.heappop(queue)
if current_distance > distances[current_vertex]:
continue
for neighbor, weight in graph[current_vertex].items():
distance = current_distance + weight
if distance < distances[neighbor]:
distances[neighbor] = distance
heapq.heappush(queue, (distance, neighbor))
return distances
二分查找
一种在有序数组中查找元素的高效算法,其时间复杂度为O(log n)
def binary_search(arr, target):
low, high = 0, len(arr) - 1
while low <= high:
mid = (low + high) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
low = mid + 1
else:
high = mid - 1
return -1
K-means
一种简单且广泛使用的聚类算法,用于将数据点分组到K个簇中。
import numpy as np
def k_means(data, k, max_iters=100):
centroids = data[np.random.choice(range(len(data)), k, replace=False)]
for _ in range(max_iters):
clusters = {i: [] for i in range(k)}
for point in data:
distances = [np.linalg.norm(point - centroid) for centroid in centroids]
cluster_idx = np.argmin(distances)
clusters[cluster_idx].append(point)
for i in range(k):
centroids[i] = np.mean(clusters[i], axis=0)
return centroids, clusters
蒙特卡洛方法(Monte Carlo Method)
通过随机抽样方法估计圆周率π的值。
import random
def estimate_pi(num_samples):
inside_circle = 0
for _ in range(num_samples):
x, y = random.random(), random.random()
if x**2 + y**2 <= 1:
inside_circle += 1
return 4 * inside_circle / num_samples
霍夫曼编码(Huffman Coding)
一种用于无损数据压缩的算法,其核心思想是通过构建最优前缀码来减少数据的大小。
import heapq
from collections import defaultdict, Counter
def huffman_encoding(data):
freq = Counter(data)
heap = [[weight, [char, ""]] for char, weight in freq.items()]
heapq.heapify(heap)
while len(heap) > 1:
lo = heapq.heappop(heap)
hi = heapq.heappop(heap)
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
return sorted(heapq.heappop(heap)[1:], key=lambda p: (len(p[-1]), p))
最长公共子序列(Longest Common Subsequence, LCS)
用于计算两个序列之间的最长公共子序列的长度。
def lcs(X, Y):
m = len(X)
n = len(Y)
L = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(m):
for j in range(n):
if X[i] == Y[j]:
L[i + 1][j + 1] = L[i][j] + 1
else:
L[i + 1][j + 1] = max(L[i + 1][j], L[i][j + 1])
return L[-1][-1]