If you work with datasets long enough, eventually you will have to deal with statistics. Ask the average person what statistics are, and they’ll probably throw out words like “numbers,” “figures,” and “research.”

Statistics is a branch of mathematics that involves the collection, classification, analysis, interpretation, and presentation of numerical facts and data. It is particularly useful when dealing with populations that are too large and widespread for specific, detailed measurements. Statistics is critical for drawing general conclusions about a whole set of data from a sample of it.

Statistics are further divided into two types: descriptive and inferential. Today we look at descriptive statistics, including a definition, types of descriptive statistics, and the differences between descriptive statistics and inferential statistics.

Descriptive statistics defined

Descriptive statistics describe, display, and summarize the main characteristics of a data set found in a study, presented in a summary that describes the data sample and its measurements. They help analysts better understand the data.

Descriptive statistics represent the available data sample and do not involve theories, inferences, probabilities, or conclusions. Those are jobs for inferential statistics.

Examples of descriptive statistics

If you want a good example of descriptive statistics, look no further than a student’s grade point average (GPA). GPA collects the data points created by a large selection of grades, courses, and exams, then averages them together and presents an overall picture of a student’s average academic performance. Note that GPA does not predict future results, nor does it represent any conclusions. Instead, it provides a direct summary of students’ academic performance based on data-derived values.

Here’s an even simpler example. Take a data set of 2, 3, 4, 5, and 6, whose values sum to 20. The mean of the data set is 4, obtained by dividing the sum by the number of values (20 divided by 5 equals 4).
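The same arithmetic can be checked in a couple of lines of Python. This is only a minimal sketch; the variable names are chosen here for illustration.

```python
# Data set from the example above.
values = [2, 3, 4, 5, 6]

total = sum(values)         # add all the values together
mean = total / len(values)  # divide the sum by the number of values

print(total, mean)  # 20 4.0
```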

Analysts often use charts and graphs to present descriptive statistics. If you stood outside a movie theater, asked 50 audience members whether they liked the movie they saw, then plotted your findings on a pie chart, that would be descriptive statistics. In this example, descriptive statistics count the yes and no responses and show how many people in that particular group liked or disliked the movie. If you try to draw any other conclusions, you’ll be wandering into the territory of inferential statistics, but we’ll get to that point later.

Finally, a political poll counts as descriptive statistics provided it simply presents the specific facts (respondents’ answers) without drawing any conclusions. Such polls ask relatively straightforward questions: “Who did you vote for in the last presidential election?”

Types of descriptive statistics

Descriptive statistics are divided into several types, characteristics or measures. Some authors say there are two types. Others say three or even four. In the spirit of working with averages, we will use three types.

• Distribution, which deals with the frequency of each value
• Central tendency, which covers the average values
• Variability (or dispersion), which indicates how spread out the values are

Distribution (also called frequency distribution)

Data sets consist of a distribution of scores or values. Statisticians use graphs and tables to summarize the frequency of each possible value of a variable, expressed as percentages or counts. For example, if you ran a poll to determine people’s favorite Beatle, you would set up one column with all the possible values (John, Paul, George, and Ringo) and another with the number of votes each received.

Frequency distributions are presented either as a graph or as a table.
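A frequency table like the Beatles poll can be sketched with Python's `collections.Counter`. The votes below are invented purely for illustration; they are not results from any real poll.

```python
from collections import Counter

# Hypothetical poll responses, made up for this example only.
votes = ["John", "Paul", "George", "Paul", "Ringo",
         "John", "Paul", "George", "John", "Paul"]

counts = Counter(votes)
total = len(votes)

# One row per value: the name, its count, and its share of all responses.
frequency_table = [
    (name, count, 100 * count / total)
    for name, count in counts.most_common()
]

for name, count, percent in frequency_table:
    print(f"{name:<8}{count:>3}{percent:>7.1f}%")
```

Each row pairs a possible value with how often it occurred, which is the "one column of values, one column of votes" layout described above.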

Measures of central tendency

Measures of central tendency estimate the average or center of a data set using three methods: mean, mode, and median.

Mean. The mean, also written “M,” is the most common way of finding an average value. You get the mean by adding all the response values together and dividing the sum by the number of responses, or “N.” For example, say someone wants to figure out how many hours a night they sleep over a week. The data set is the hours recorded each night (e.g., 6, 8, 7, 10, 8, 4, 9), and the sum of these values is 52. There are seven values, so N = 7. Divide the sum of 52 by N, or 7, to find M, which in this case is about 7.43.

Mode. The mode is simply the most frequently occurring value. A data set can have any number of modes, including none. You can find the mode by sorting your data set from lowest to highest value and then looking for the value that appears most often. Using our sleep study from the previous part: 4, 6, 7, 8, 8, 9, 10. As you can see, the mode is eight.

Median. Finally, we have the median, defined as the value at the exact center of the sorted data set. Sort the values in ascending order (as we did for the mode) and look for the number in the middle of the set. In this case, the median is eight, the fourth of the seven values.
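The three measures can be computed for the sleep data with Python's standard `statistics` module. This is a small sketch rather than a prescribed method; `multimode` (available from Python 3.8) is used because it returns every most-common value instead of assuming there is exactly one.

```python
import statistics

# Hours of sleep recorded over one week, as in the example above.
sleep_hours = [6, 8, 7, 10, 8, 4, 9]

n = len(sleep_hours)                        # N, the number of responses
total = sum(sleep_hours)                    # sum of all response values
mean = total / n                            # M = sum / N
median = statistics.median(sleep_hours)     # middle value after sorting
modes = statistics.multimode(sleep_hours)   # list of the most common value(s)

print(f"N = {n}, sum = {total}")
print(f"mean = {mean:.2f}")
print(f"median = {median}")
print(f"mode(s) = {modes}")
```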

Variability (also called dispersion)

A measure of variability gives a statistician an idea of how spread out the responses are. The spread has three aspects: range, standard deviation, and variance.

Range. Use the range to determine how far apart the extreme values are. Subtract the lowest value of the data set from its highest value. Once again we turn to our study of sleep: 4, 6, 7, 8, 8, 9, 10. We subtract four (the lowest) from ten (the highest) and get six. There’s your range.
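In Python, the range of the sleep data is a one-line subtraction. A minimal sketch, with variable names chosen for illustration (`value_range` avoids shadowing the built-in `range`):

```python
sleep_hours = [4, 6, 7, 8, 8, 9, 10]

lowest = min(sleep_hours)
highest = max(sleep_hours)
value_range = highest - lowest  # highest value minus lowest value

print(value_range)  # 6
```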

Standard deviation. This aspect requires a bit more work. The standard deviation (s) measures the typical amount of variability in your data set: it tells you, on average, how far each result lies from the mean. The larger your standard deviation, the greater the variability of your data set. Follow these six steps:

1. List each result and find the mean.
2. Find each deviation by subtracting the mean from each result.
3. Square each deviation.
4. Sum all squared deviations.
5. Divide the sum of the squares of the deviations by N-1.
6. Find the square root of the result.
Applying these steps to the sleep data, with M = 52/7 ≈ 7.43 (figures are rounded to two decimals; the sums are taken from the unrounded values):

| Raw number/data | Deviation from the mean | Deviation squared |
|---|---|---|
| 4 | 4 - 7.43 = -3.43 | 11.76 |
| 6 | 6 - 7.43 = -1.43 | 2.04 |
| 7 | 7 - 7.43 = -0.43 | 0.18 |
| 8 | 8 - 7.43 = 0.57 | 0.33 |
| 8 | 8 - 7.43 = 0.57 | 0.33 |
| 9 | 9 - 7.43 = 1.57 | 2.47 |
| 10 | 10 - 7.43 = 2.57 | 6.61 |
| M = 7.43 | Sum = 0 | Sum of squares = 23.71 |

When you divide the sum of the squared deviations by 6 (N-1), 23.71/6, you get 3.952, and the square root of that result is 1.988. As a result, we now know that the scores typically deviate from the mean by about 1.988 hours.

Variance. Variance reflects the degree to which the data set is spread out. The more spread out the data, the larger the variance. You can get the variance by simply squaring the standard deviation; equivalently, it is the result of step 5, before the square root is taken. Using the above example, we square 1.988 and arrive at about 3.952.
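The six steps can be followed line by line in Python. The sketch below keeps the mean unrounded, so its printed figures may differ slightly from values worked by hand with a rounded mean. It also cross-checks the result against `statistics.variance` and `statistics.stdev`, which use the same N-1 divisor.

```python
import math
import statistics

sleep_hours = [4, 6, 7, 8, 8, 9, 10]

n = len(sleep_hours)
mean = sum(sleep_hours) / n                    # step 1: find the mean
deviations = [x - mean for x in sleep_hours]   # step 2: deviation of each result
squared = [d ** 2 for d in deviations]         # step 3: square each deviation
sum_of_squares = sum(squared)                  # step 4: sum the squared deviations
variance = sum_of_squares / (n - 1)            # step 5: divide by N-1
std_dev = math.sqrt(variance)                  # step 6: take the square root

# Cross-check against the standard library's sample variance and deviation.
assert math.isclose(variance, statistics.variance(sleep_hours))
assert math.isclose(std_dev, statistics.stdev(sleep_hours))

print(f"sum of squares = {sum_of_squares:.2f}")
print(f"variance = {variance:.3f}")
print(f"standard deviation = {std_dev:.3f}")
```

With the exact mean, the deviations in step 2 always sum to zero (up to floating-point error), which is a quick sanity check on the arithmetic.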

What is the difference between descriptive statistics and inferential statistics?

So what is the difference between the two statistical forms? We already touched on this when we mentioned that descriptive statistics do not make any conclusions or predictions, which implies that inferential statistics do.

Inferential statistics takes a random sample drawn from a population and uses it to make inferences about the entire population. For example, if you ask 50 people whether they liked the movie they just saw, inferential statistics builds on those answers and assumes the results will hold for the moviegoing population as a whole.

So, if you stand outside the movie theater and poll 50 people who have just seen Rocky 20: Enough Already! and 38 of them didn’t like it (76 percent), you can extrapolate that about 76 percent of the rest of the moviegoing world won’t like it either, even though you don’t have the means, time, or opportunity to ask all those people.
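To make the contrast concrete, the sketch below first computes the descriptive figure (the share of the sample that disliked the movie) and then adds one common inferential step: an approximate 95 percent interval for the wider population. The interval uses the normal approximation for a proportion and assumes the 50 people are a random sample; both are assumptions added here for illustration, not something the example above specifies.

```python
import math

sample_size = 50
disliked = 38

# Descriptive: what the sample itself shows.
p_hat = disliked / sample_size

# Inferential (assumed method): normal-approximation interval for a proportion.
standard_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)
z = 1.96  # approximate multiplier for a 95 percent interval
margin = z * standard_error
low, high = p_hat - margin, p_hat + margin

print(f"sample share that disliked the movie: {p_hat:.0%}")
print(f"approximate 95% interval for the population: {low:.1%} to {high:.1%}")
```

The first number only describes the 50 people polled; the interval is a hedged statement about everyone else, and it is only as good as the sampling behind it.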

Simply put: Descriptive statistics give you a clear picture of what your current data is showing. Inferential statistics make predictions based on this data.

Why not become a data scientist?

Whether you like descriptive or inferential statistics, you can find many opportunities in data analytics and data science. Simplilearn’s PG program in data science gives you broad access to key data science concepts and tools like Python, R, machine learning, and more. Hands-on labs and project work in this acclaimed program bring ideas to life with qualified trainers and assistants to guide you along the way.

The Bootcamp, held in partnership with Purdue University and in collaboration with IBM, features the perfect mix of theory, case studies and extensive hands-on practice. The Economic Times ranked this Data Science certification program at the top of its list.

According to Glassdoor.com, data scientists earn an average of $113,309 per year. Payscale.com shows that a data scientist in India makes an average of ₹817,366 per year. Data science is a great career choice if you’re looking for a challenge in a secure profession and want to be paid well in the process!

Check out Simplilearn’s data science courses today and jump into this exciting new opportunity!

https://www.simplilearn.com/what-is-descriptive-statistics-article
