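Inertia is a way to measure how well K-Means has grouped the data. It is the sum of the squared distances between each point and the center of its cluster (centroid). Lower inertia means tighter, more compact clusters, but inertia generally keeps falling as you add clusters, so on its own it cannot tell you the best number of clusters.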
```python
# Calculate the Euclidean distance between two 2-D points
import numpy as np

def calculate_distance(a, b):
    return np.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

point1 = [1, 2]
point2 = [4, 6]
print(calculate_distance(point1, point2))  # 5.0
```
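To connect the distance function to inertia, the sketch below recomputes inertia by hand and compares it with scikit-learn's inertia_ attribute. It reuses the toy data from this cheatsheet; n_init=10 and random_state=0 are assumptions added only to make the run reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same toy data used elsewhere in this cheatsheet
data_samples = np.array([[1, 2], [2, 3], [5, 6], [8, 9]], dtype=float)

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data_samples)

# Centroid assigned to each sample
assigned_centers = model.cluster_centers_[model.labels_]

# Inertia = sum of squared Euclidean distances from each point to its centroid
manual_inertia = np.sum((data_samples - assigned_centers) ** 2)

print(manual_inertia, model.inertia_)  # the two values should agree up to float rounding
```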
In unsupervised learning, we find patterns in data without predefined labels. K-Means clustering is one such method: it groups data into K clusters. A good model has low inertia and a sensible number of clusters, and these two goals pull against each other: adding clusters lowers inertia, but too many clusters split natural groups apart.
```python
from sklearn.cluster import KMeans

# Example data
data_samples = [[1, 2], [2, 3], [5, 6], [8, 9]]

# Create and fit K-Means model (n_init and random_state make runs reproducible)
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(data_samples)

labels = model.predict(data_samples)
print(labels)  # the cluster each sample belongs to
```
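The tradeoff is easiest to see at the extreme. In this sketch (same toy data, reproducibility settings assumed), asking for as many clusters as there are samples drives inertia to zero, yet the result says nothing useful about structure in the data.

```python
from sklearn.cluster import KMeans

data_samples = [[1, 2], [2, 3], [5, 6], [8, 9]]

# One cluster per sample: the lowest possible inertia, but no grouping at all
model = KMeans(n_clusters=len(data_samples), n_init=10, random_state=0)
model.fit(data_samples)

print(model.inertia_)  # zero: every point is its own centroid
print(model.labels_)   # every sample gets a different label
```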
To find the best number of clusters (K) for K-Means, use the Elbow method. Plot inertia for different values of K and look for the point where adding more clusters no longer significantly improves the model. This point is called the ‘elbow’.
```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Sample data
data_samples = [[1, 2], [2, 3], [5, 6], [8, 9]]

# Calculate inertia for different K values
inertia = []
for k in range(1, 5):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(data_samples)
    inertia.append(model.inertia_)

# Plot the Elbow graph
plt.plot(range(1, 5), inertia, marker='o')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.title('Elbow Method')
plt.show()
```
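Reading the elbow off a plot is subjective. One common heuristic, shown here as an assumption rather than part of the Elbow method itself, picks the K whose point on the normalized curve lies farthest from the straight line joining the first and last points. The data is synthetic with four well-separated groups, and find_elbow is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 well-separated groups (hypothetical example data)
data_samples, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.8, random_state=42,
)

k_values = np.arange(1, 10)
inertia = np.array([
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(data_samples).inertia_
    for k in k_values
])

def find_elbow(ks, values):
    """Hypothetical helper: K whose point is farthest from the first-to-last chord."""
    # Normalize both axes to [0, 1] so the inertia scale does not dominate
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (values - values.min()) / (values.max() - values.min())
    x0, y0, x1, y1 = x[0], y[0], x[-1], y[-1]
    # Perpendicular distance of every point from the chord
    distances = np.abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / np.hypot(x1 - x0, y1 - y0)
    return int(ks[np.argmax(distances)])

print(find_elbow(k_values, inertia))
```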
K-Means is a clustering algorithm that groups data into K clusters. It works in steps: first, K centroids are placed (randomly in the basic version; scikit-learn uses k-means++ initialization by default). Each data point is then assigned to its nearest centroid, each centroid is moved to the mean of the points assigned to it, and these two steps repeat until the centroids no longer move significantly.
```python
from sklearn.cluster import KMeans

# Sample data
data_samples = [[1, 2], [2, 3], [5, 6], [8, 9]]

# Create and fit K-Means model
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(data_samples)

print(model.cluster_centers_)  # the center point of each cluster
```
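The steps described above can be sketched from scratch in NumPy. This is a minimal version of the basic (Lloyd's) algorithm with random initialization, not scikit-learn's implementation; the function names, tol, and seed are assumptions. With a single random start it can settle in a local optimum, which is why scikit-learn runs several starts (n_init) and keeps the best.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid for every point."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Hypothetical sketch of basic K-Means with random initialization."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid
        labels = assign(X, centroids)
        # Step 3: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids barely move
        moved = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if moved < tol:
            break
    return centroids, assign(X, centroids)

centroids, labels = kmeans([[1, 2], [2, 3], [5, 6], [8, 9]], k=2)
print(centroids)
print(labels)
```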
Scikit-Learn provides an easy-to-use implementation of K-Means in sklearn.cluster.KMeans. You specify the number of clusters with n_clusters; calling fit runs the iterations, after which labels_ (or predict) gives each point's cluster and cluster_centers_ gives the centroids.
```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate sample data
data_samples, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Create and fit K-Means model
model = KMeans(n_clusters=4, n_init=10, random_state=0)
model.fit(data_samples)

labels = model.predict(data_samples)
print(labels)  # the cluster assignment for each data point
```
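Once fitted, the same model can place new, unseen points. This sketch uses fit_predict (fit and label in one call), predict on two hypothetical new points, and transform, which returns each point's distance to every centroid. The data and the new points are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

data_samples, _ = make_blobs(n_samples=300, centers=4, random_state=42)

model = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = model.fit_predict(data_samples)  # fit and return labels in one call

# Assign previously unseen (hypothetical) points to the learned clusters
new_points = [[0.0, 0.0], [5.0, 5.0]]
new_labels = model.predict(new_points)
print(new_labels)

# Distance from each new point to every centroid; predict picks the smallest
new_distances = model.transform(new_points)
print(new_distances)
```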
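Cross tabulation compares the clustering results with actual categories or labels, showing how well the clusters match known groups. Cluster numbers are arbitrary, so cluster 0 need not correspond to label 0; a good match shows one dominant cell in each row and column.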
```python
import pandas as pd

# Sample data with predicted cluster labels and actual (known) labels
pred_labels = [1, 0, 1, 0]
user_labels = [1, 0, 1, 0]

# Create a cross-tabulation: rows = predicted cluster, columns = actual label
cross_tab = pd.crosstab(pd.Series(pred_labels, name="cluster"), pd.Series(user_labels, name="label"))
print(cross_tab)  # table comparing predicted vs actual labels
```
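A slightly more realistic sketch: generate synthetic data whose true groups are known, cluster it, and cross-tabulate. The well-separated centers are an assumption chosen so the clustering should recover the groups; the cluster numbers may still be a permutation of the true labels.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data where the true group of each point is known
data_samples, true_labels = make_blobs(
    n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]], cluster_std=0.5, random_state=0
)
pred_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data_samples)

cross_tab = pd.crosstab(
    pd.Series(true_labels, name="true group"),
    pd.Series(pred_labels, name="cluster"),
)
print(cross_tab)  # one non-zero cell per row means each group landed in a single cluster
```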
Convergence in K-Means occurs when the centroids (cluster centers) no longer change significantly from one iteration to the next. The algorithm stops when it reaches this state, meaning the clusters are stable, or when a maximum number of iterations is reached.
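In scikit-learn, the stopping rule is controlled by tol (a relative tolerance on how much the centroids move between two iterations) and max_iter (a hard cap per run), and n_iter_ reports how many iterations the best run used. The values below are the library defaults written out explicitly; the data is an illustrative assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

data_samples, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# tol: relative tolerance on centroid movement; max_iter: iteration cap per run
model = KMeans(n_clusters=4, n_init=10, max_iter=300, tol=1e-4, random_state=0)
model.fit(data_samples)

print(model.n_iter_)  # iterations used by the best run before it stopped
```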
In the K-Means algorithm, each data point is assigned to the cluster whose centroid is closest. This is done by calculating the distance from the data point to each centroid and selecting the nearest one.
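The assignment step can be written without explicit loops using NumPy broadcasting: build a matrix of point-to-centroid distances, then take the index of the smallest entry in each row. The centroids here are hypothetical values chosen for illustration.

```python
import numpy as np

points = np.array([[1, 2], [2, 3], [5, 6], [8, 9]], dtype=float)
centroids = np.array([[1.5, 2.5], [6.5, 7.5]])  # hypothetical current centroids

# distances[i, j] = Euclidean distance from point i to centroid j
distances = np.linalg.norm(points[:, np.newaxis, :] - centroids[np.newaxis, :, :], axis=2)

# Each point goes to the centroid with the smallest distance
labels = distances.argmin(axis=1)
print(labels)  # [0 0 1 1]
```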
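The first step in K-Means is choosing initial positions for the centroids. These positions are then updated iteratively to reduce inertia. Because K-Means can settle in a local optimum, the starting positions affect the final result, which is why scikit-learn uses k-means++ initialization and runs several initializations (n_init), keeping the one with the lowest inertia.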
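This sketch contrasts two ways to choose the starting centroids: plain random selection of k data points, and scikit-learn's kmeans_plusplus function. It then passes the chosen centroids to KMeans explicitly. The data, seeds, and k are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans, kmeans_plusplus
from sklearn.datasets import make_blobs

data_samples, _ = make_blobs(n_samples=300, centers=4, random_state=42)
k = 4

# Random initialization: k distinct data points chosen uniformly at random
rng = np.random.default_rng(0)
random_centers = data_samples[rng.choice(len(data_samples), size=k, replace=False)]

# k-means++ (scikit-learn's default): each new centroid is sampled with
# probability proportional to its squared distance to the nearest centroid
# already chosen (scikit-learn's variant keeps the best of several candidates)
pp_centers, pp_indices = kmeans_plusplus(data_samples, n_clusters=k, random_state=0)

# Fixed starting centroids can be passed via init; n_init=1 since the start is fixed
model = KMeans(n_clusters=k, init=pp_centers, n_init=1).fit(data_samples)
print(model.inertia_)
```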