In statistics, a statistical population is a set of similar items or events which is of interest for some question or experiment.[1] A statistical population can be a group of existing objects (e.g., all the stars within the Milky Way Galaxy) or a hypothetical and potentially infinite group of objects conceived as a generalization (e.g., the set of all possible hands in a game of poker).[2] A common aim of statistical analysis is to produce information about some chosen population.[3]

In statistical inference, a subset of the population, known as a statistical sample, is chosen to represent the population in a statistical analysis.[4]

The statistical sample should be unbiased and should accurately represent the population. In simple random sampling, for example, every unit of the population has an equal chance of selection. The ratio of the size of the statistical sample to the size of the population is called the sampling fraction. Using appropriate sample statistics, it is then possible to estimate the population parameters.
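As a minimal sketch of these ideas, the following Python snippet draws a simple random sample without replacement and computes the sampling fraction. The population of 1,000 unit ids, the sample size of 50, and the seed are assumptions chosen purely for illustration.

```python
import random

# Hypothetical finite population: 1,000 unit ids (assumed for illustration).
population = list(range(1000))

rng = random.Random(0)  # fixed seed so the draw is reproducible

# Simple random sample without replacement:
# every unit has the same chance of being selected.
sample = rng.sample(population, k=50)

# Sampling fraction = sample size / population size.
sampling_fraction = len(sample) / len(population)
print(sampling_fraction)  # 0.05
```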

## Mean

The population mean, or population expected value, is a measure of the central tendency of a probability distribution or of a random variable characterized by that distribution.[5] In a discrete probability distribution of a random variable X, the mean is equal to the sum over every possible value weighted by the probability of that value; that is, it is computed by taking the product of each possible value x of X and its probability p(x), and then adding all these products together, giving ${\displaystyle \mu =\sum _{x}x\,p(x)}$.[6][7] An analogous formula applies to the case of a continuous probability distribution. Not every probability distribution has a defined mean; the Cauchy distribution is one example. For some distributions, the mean is infinite.
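The weighted sum can be illustrated with a short Python sketch. The distribution used here, a fair six-sided die with p(x) = 1/6 for each face, is an assumed example and is not taken from the cited sources. Exact fractions are used so that rounding does not obscure the result.

```python
from fractions import Fraction

# Assumed example distribution: a fair six-sided die, p(x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# A valid probability mass function must sum to one.
assert sum(pmf.values()) == 1

# mu = sum over all possible values x of x * p(x)
mu = sum(x * p for x, p in pmf.items())
print(mu)  # 7/2
```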

For a finite population, the population mean of a property is equal to the arithmetic mean of that property taken over every member of the population. For example, the population mean height is equal to the sum of the heights of all individuals divided by the total number of individuals. The sample mean may differ from the population mean, especially for small samples. The law of large numbers states that the larger the size of the sample, the more likely it is that the sample mean will be close to the population mean.[8]
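The behaviour described by the law of large numbers can be explored with a small simulation. The sketch below assumes a synthetic population of 100,000 heights generated from a normal distribution (mean 170 cm, standard deviation 10 cm); these figures and the helper `sample_mean` are illustrative assumptions, not real data.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility

# Hypothetical finite population of heights in cm (synthetic, not real data).
heights = [rng.gauss(170, 10) for _ in range(100_000)]
population_mean = statistics.fmean(heights)

def sample_mean(n):
    """Mean of a simple random sample of size n drawn without replacement."""
    return statistics.fmean(rng.sample(heights, n))

# Compare the sample mean with the population mean for growing sample sizes.
for n in (10, 100, 1_000, 10_000):
    error = abs(sample_mean(n) - population_mean)
    print(f"n={n:>6}  |sample mean - population mean| = {error:.3f}")
```

The error tends to shrink as n grows, but any single run need not decrease at every step: the law concerns how likely a large deviation is, not a guaranteed improvement from one sample to the next.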

## Subpopulation

A subpopulation, also referred to as a subclass, subgroup, or subsample, is a subset of the population characterized by one or more attributes. Analyzing a subpopulation can be done conditionally or unconditionally:[9]

• A conditional[10] analysis of a subpopulation is performed by restricting the data to the subpopulation of interest, such as performing a regression exclusively on Egyptian males within a broader sample.
• An unconditional analysis uses the full sample and models the subpopulation explicitly, for instance by including gender as an independent variable in a regression.

## Universe of data

A statistical population is one of the fundamental concepts in statistics and data science. It refers to the complete set of items or events that share a common attribute and from which data can be gathered and analyzed. It is often considered the universe of data, encompassing all possible subjects of interest: every person, object, or event that fits the criteria being studied. For instance, the population in a study on global healthcare could include every person on Earth.

Populations can be finite or infinite. Finite populations can be counted, such as the students in a school. Although an infinite population can be countable in the mathematical sense, from an empirical perspective infinite or very large populations are generally treated as uncountable[citation needed], as with the grains of sand on a beach or the stars in the universe. When studying a population, researchers can collect data either from every member (a census) or from a subset (a sample). A census avoids sampling error but is expensive and time-consuming. Sampling is more practical but requires statistical methods to ensure that the sample represents the population well.

The target population is the entire group the researcher is interested in, while the accessible population is the portion that can actually be studied due to practical constraints. For example, a study might target all high school students in a country but only access students from a few schools.

The concept of a population is crucial for statistical inference. Inferences made from samples are generalized to the population. The goal is to make accurate conclusions about the population based on sample data, often involving probabilities and confidence intervals.
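As an illustration of generalizing from a sample, the sketch below computes an approximate 95% confidence interval for a population mean using the normal approximation. The synthetic population, the sample size of 400, and the seed are assumptions made for the example. Other interval constructions (for instance, ones based on the t distribution) may be preferable for small samples.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Synthetic population (assumed for illustration) and a simple random sample from it.
population = [rng.gauss(50, 12) for _ in range(20_000)]
sample = rng.sample(population, 400)

n = len(sample)
mean = statistics.fmean(sample)
std_err = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error of the mean

# Two-sided 95% interval from the normal approximation (z is about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
ci = (mean - z * std_err, mean + z * std_err)
print(f"95% CI for the population mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Over repeated sampling, about 95% of intervals built this way would be expected to contain the population mean. Any single interval either contains it or does not.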

Populations can change over time. A dynamic population is one where the membership can change, such as the population of a city, which fluctuates due to births, deaths, and migration. This complicates studies, requiring adjustments in statistical models.

Populations are described by parameters like the mean (average), variance, and proportion. These parameters are often unknown and are estimated through sample statistics. For example, the mean income of a country’s population might be estimated from a sample of taxpayers.
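The distinction between a parameter and the statistic that estimates it can be sketched in Python. The income values below are synthetic (drawn from an assumed log-normal distribution), and the threshold of 30,000 is an arbitrary cutoff used only to define a proportion.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical population of incomes (synthetic values, not real data).
incomes = [rng.lognormvariate(10, 0.5) for _ in range(20_000)]
threshold = 30_000  # arbitrary cutoff used to define a proportion

# Population parameters: computed from every member (usually unknown in practice).
mu = statistics.fmean(incomes)
sigma2 = statistics.pvariance(incomes)  # population variance, divides by N
p = sum(v > threshold for v in incomes) / len(incomes)

# Sample statistics: estimates computed from a simple random sample.
sample = rng.sample(incomes, 500)
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample)  # sample variance, divides by n - 1
p_hat = sum(v > threshold for v in sample) / len(sample)

print(f"mean:       parameter {mu:.0f}, estimate {x_bar:.0f}")
print(f"variance:   parameter {sigma2:.0f}, estimate {s2:.0f}")
print(f"proportion: parameter {p:.3f}, estimate {p_hat:.3f}")
```

Note the two variance functions: `statistics.pvariance` treats its input as the whole population, while `statistics.variance` applies the n - 1 correction appropriate when the input is a sample.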

In ecology, a population refers to a group of organisms of the same species living in a particular geographic area. Ecologists study population dynamics, including birth rates, death rates, and migration patterns, to understand species survival and ecosystem health.[11]

In the era of big data, the concept of population is evolving. With access to massive datasets, researchers can sometimes work with entire populations of data (e.g., all Twitter users). This reduces reliance on sampling, but it also introduces challenges in data management and interpretation.

When defining and studying populations, especially human populations, ethical considerations are paramount. Issues like informed consent, privacy, and the potential for bias in selecting samples must be carefully managed to ensure that research is both valid and respectful of participants' rights.

## Importance of population

"It is important to understand the target population being studied, so you can understand who or what the data are referring to. If you have not clearly defined who or what you want in your population, you may end up with data that are not useful to you."

## References

1. ^ "Glossary of statistical terms: Population". Statistics.com. Retrieved 22 February 2016.
2. ^
3. ^ Yates, Daniel S.; Moore, David S; Starnes, Daren S. (2003). The Practice of Statistics (2nd ed.). New York: Freeman. ISBN 978-0-7167-4773-4. Archived from the original on 2005-02-09.
4. ^ "Glossary of statistical terms: Sample". Statistics.com. Retrieved 22 February 2016.
5. ^ Feller, William (1950). Introduction to Probability Theory and its Applications, Vol I. Wiley. p. 221. ISBN 0471257087.
6. ^ Elementary Statistics by Robert R. Johnson and Patricia J. Kuby, p. 279
7. ^ Weisstein, Eric W. "Population Mean". mathworld.wolfram.com. Retrieved 2020-08-21.
8. ^ Schaum's Outline of Theory and Problems of Probability by Seymour Lipschutz and Marc Lipson, p. 141
9. ^ "Understanding Subpopulations: Definition, Importance, and Examples". Retrieved 2024-08-31.
10. ^ Sanderson, Warren C.; Scherbov, Sergei; O'Neill, Brian C.; Lutz, Wolfgang (2004). "Conditional Probabilistic Population Forecasting". International Statistical Review / Revue Internationale de Statistique. 72 (2): 157–166. ISSN 0306-7734.
11. ^ "What factors affect the survival of a species within an ecosystem? | TutorChase". www.tutorchase.com. Retrieved 2024-08-30.
12. ^ "What Is a Population in Statistics?". ThoughtCo. Retrieved 2024-08-28.