Standard deviation measures how spread out the values in a dataset are. It tells us how much the data varies from the average. To find it, we calculate the square root of the variance. The result is in the same units as the original data.
    import numpy as np

    values = np.array([1, 3, 4, 2, 6, 3, 4, 5])

    # Population standard deviation (divides by n)
    std_dev = np.std(values)
    # Sample standard deviation (divides by n - 1)
    std_dev_sample = np.std(values, ddof=1)

    print(std_dev)         # Output: 1.5
    print(std_dev_sample)  # Output: 1.60 (approx.)
Variance measures how much the data values differ from the average. In Python, you can calculate it using the NumPy `var()` function.
    import numpy as np

    values = np.array([1, 3, 4, 2, 6, 3, 4, 5])

    # Sample variance (ddof=1 divides by n - 1 instead of n)
    variance_sample = np.var(values, ddof=1)

    print(variance_sample)  # Output: 2.57 (approx.)
Standard deviation gives context to the average of a dataset. For example, the dataset [3, 5, 10, 14] has a mean of 8.0 and a population standard deviation of about 4.3. The value 14 lies 6 units above the mean, so it is more than one standard deviation away from the average.
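The [3, 5, 10, 14] example can be checked with a short sketch. It assumes the population standard deviation (NumPy's default, `ddof=0`) and expresses each value's distance from the mean in standard deviations; the variable names are illustrative only.

```python
import numpy as np

data = np.array([3, 5, 10, 14])
mean = data.mean()        # 8.0
std_pop = data.std()      # population standard deviation, about 4.30

# Distance of each value from the mean, in units of standard deviation
z_scores = (data - mean) / std_pop

print(mean, round(std_pop, 2))
print(np.round(z_scores, 2))  # the last entry (for 14) is about 1.39
```

A value whose entry is larger than 1 in absolute terms is more than one standard deviation from the mean.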
To find the standard deviation in Python, use the NumPy `std()` function. By default it returns the population standard deviation; pass `ddof=1` to get the sample standard deviation.
Variance shows how spread out the data is. A high variance means the data points are far from the mean. A variance of 0 means all values are the same.
Variance measures the average of the squared differences from the mean. It shows how much the data varies. The result is in squared units of the original data.
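As a rough sketch of that definition, the variance can be computed step by step and compared with `np.var()`. The sample data below is chosen for illustration, and the intermediate variable names are not part of any API.

```python
import numpy as np

values = np.array([1, 3, 4, 2, 6, 3, 4, 5])

mean = values.mean()                    # 3.5
squared_diffs = (values - mean) ** 2    # squared distance of each value from the mean

variance_population = squared_diffs.sum() / len(values)        # divide by n
variance_sample = squared_diffs.sum() / (len(values) - 1)      # divide by n - 1

print(variance_population, np.var(values))          # both 2.25
print(variance_sample, np.var(values, ddof=1))      # both about 2.571
print(np.sqrt(variance_population), np.std(values)) # both 1.5
```

Taking the square root of the variance brings the result back to the original units, which is the standard deviation.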
Quantiles are the cut points that divide a sorted dataset into equal-sized parts. For example, if we split the data into 4 equal parts, the three values dividing these parts are quantiles known as quartiles.
    import numpy as np

    data = [1, 3, 5, 9, 20]

    Q1 = np.quantile(data, 0.25)
    Q2 = np.quantile(data, 0.5)  # Median
    Q3 = np.quantile(data, 0.75)

    print(f"Q1: {Q1}, Q2 (Median): {Q2}, Q3: {Q3}")
    # Output: Q1: 3.0, Q2 (Median): 5.0, Q3: 9.0
Quartiles split the data into four equal parts. The three cut points dividing these parts are the quartiles: Q1 (the 25th percentile), Q2 (the median), and Q3 (the 75th percentile).
In Python, you can use `numpy.quantile()` to find values that divide the data into parts. For example, `numpy.quantile(data, 0.25)` gives the value at the first quartile.
If you have n quantile cut points, your dataset will be split into n+1 groups of (approximately) equal size.
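A small sketch of this rule, using three cut points (the quartiles) on illustrative data from 1 to 12. `np.digitize()` assigns each value to the group between cut points, and `np.bincount()` counts the group sizes.

```python
import numpy as np

data = np.arange(1, 13)  # the values 1 through 12

# n = 3 cut points
cuts = np.quantile(data, [0.25, 0.5, 0.75])
print(cuts)  # [3.75 6.5  9.25]

# Label each value with the index of the group it falls into
groups = np.digitize(data, cuts)
group_sizes = np.bincount(groups)

print(len(cuts), len(group_sizes))  # 3 cut points, 4 groups
print(group_sizes)                  # [3 3 3 3]
```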
The median is the middle value of a dataset, splitting it into two halves. It is also called the 50th percentile or second quartile.
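The equivalence between the median, the 50th percentile, and the second quartile can be checked with a quick sketch on example data:

```python
import numpy as np

data = [1, 3, 5, 9, 20]

median = np.median(data)
q2 = np.quantile(data, 0.5)     # second quartile
p50 = np.percentile(data, 50)   # 50th percentile

print(median, q2, p50)  # 5.0 5.0 5.0

# With an even number of values, the median is the mean of the two middle values
even_median = np.median([1, 3, 5, 9])
print(even_median)  # 4.0
```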
The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1): IQR = Q3 - Q1. It is the width of the range in which the middle 50% of the data lies.
The IQR is useful because it is largely unaffected by extreme values (outliers), unlike the full range or the standard deviation. This makes it a more robust measure of the spread of the middle 50% of the data.
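A minimal sketch of the IQR and its resistance to outliers. The helper name `iqr` and the example data are assumptions made for illustration; the helper is not a NumPy function.

```python
import numpy as np

def iqr(values):
    """Interquartile range: Q3 - Q1 (hypothetical helper for this example)."""
    q1, q3 = np.quantile(values, [0.25, 0.75])
    return q3 - q1

data = [1, 3, 5, 9, 20]
with_outlier = [1, 3, 5, 9, 2000]  # same data, but the largest value is extreme

print(iqr(data), iqr(with_outlier))  # 6.0 6.0

# The full range (max - min), by contrast, changes drastically
print(max(data) - min(data), max(with_outlier) - min(with_outlier))  # 19 1999
```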
A Type I error occurs when we incorrectly reject a true null hypothesis (a false positive). The acceptable rate for this error, called the significance level (alpha), is usually set to 0.05 (5%) or 0.01 (1%).
A Type II error occurs when we fail to reject a false null hypothesis (a false negative). This means we miss detecting an effect that is actually there.
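Both error types can be illustrated with a simulation. The sketch below makes several assumptions that are not from the text above: a two-sided z-test of the null hypothesis "mean = 0" with known standard deviation 1, samples of size 25, a 5% significance level (critical value 1.96), and a true mean of 0.5 for the case where the null hypothesis is false.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, trials = 25, 1.0, 10_000
z_crit = 1.96  # two-sided critical value for alpha = 0.05

def rejection_rate(true_mean):
    """Fraction of simulated samples in which H0 (mean = 0) is rejected."""
    samples = rng.normal(true_mean, sigma, size=(trials, n))
    z = samples.mean(axis=1) / (sigma / np.sqrt(n))
    return np.mean(np.abs(z) > z_crit)

# H0 is true: every rejection is a false positive (Type I error)
type_1_rate = rejection_rate(0.0)

# H0 is false: every failure to reject is a false negative (Type II error)
type_2_rate = 1 - rejection_rate(0.5)

print(type_1_rate)  # should land near alpha = 0.05
print(type_2_rate)  # depends on the effect size and the sample size
```

Under these assumptions, increasing the sample size or the true effect lowers the Type II error rate, while the Type I error rate stays near the chosen alpha.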
The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the samples are independent and the population has a finite variance.
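One way to see this is a simulation sketch. It assumes an exponential population (strongly right-skewed, with mean 1 and standard deviation 1), samples of size 50, and 20,000 repetitions; all of these choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 20_000

# Draw many samples from a skewed population and take the mean of each one
samples = rng.exponential(scale=1.0, size=(trials, n))
sample_means = samples.mean(axis=1)

standard_error = 1.0 / np.sqrt(n)  # population std / sqrt(n), about 0.141

print(sample_means.mean())  # close to the population mean of 1.0
print(sample_means.std())   # close to the standard error

# For a normal distribution, about 95% of values lie within 1.96 standard errors
within = np.mean(np.abs(sample_means - 1.0) < 1.96 * standard_error)
print(within)
```

Although individual exponential values are far from normally distributed, the share of sample means within 1.96 standard errors should come out close to the 95% expected for a normal distribution.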
In statistics, a p-value is the probability of observing results at least as extreme as our sample results, assuming the null hypothesis is true. It indicates the strength of the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence.
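As a minimal sketch, a two-sided p-value can be computed from a z-statistic using only the standard library. This assumes the test statistic follows a standard normal distribution under the null hypothesis; the function name `two_sided_p_value` is hypothetical.

```python
import math

def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as z."""
    return math.erfc(abs(z) / math.sqrt(2))

print(round(two_sided_p_value(1.96), 3))  # 0.05
print(round(two_sided_p_value(0.5), 3))   # 0.617
```

With a significance level of 0.05, a z-statistic of 1.96 sits right at the rejection threshold, while a z-statistic of 0.5 provides little evidence against the null hypothesis.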