Prepared by : Saleem.TK, MSN, Manipal College of Nursing, Manipal University
Last updated on 19-01-2010
Sampling
Sampling is the process of selecting a fraction of the sampling units (a sampling unit being a collection with specified dimensions) of the target population for inclusion in the study. Sampling can be probability sampling or non-probability sampling.
Probability Sampling or Random sampling
Probability sampling, also called random sampling, is a selection process in which every member of the population has a known, nonzero probability of being selected; in simple random sampling that probability is the same for every member. Samples are selected on the basis of probability theory, which describes how likely events are to occur by chance. Random sampling is the best method for ensuring that a sample is representative of the larger population. Random sampling can be simple random sampling, stratified random sampling, or cluster sampling.
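The three random sampling designs named above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the ward labels and the 10 percent sampling fraction are assumptions made for the example and do not come from the text.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units so that every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_random_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from every stratum.

    strata: dict mapping a stratum name to the list of its units.
    """
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        k = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, k))
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical population of 100 patients spread over three wards.
population = list(range(1, 101))
wards = {
    "A": list(range(1, 41)),
    "B": list(range(41, 71)),
    "C": list(range(71, 101)),
}

srs = simple_random_sample(population, 10, seed=1)
strat = stratified_random_sample(wards, 0.1, seed=1)
clus = cluster_sample(wards, 1, seed=1)
```

In the stratified design every ward is represented in proportion to its size, whereas the cluster design keeps one entire ward and none of the others.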
Nonprobability sampling
Nonprobability sampling is a selection process in which the probability that any one individual or subject is selected is not equal to the probability that another individual or subject is chosen. The probability of inclusion and the degree to which the sample represents the population are unknown. The major problem with nonprobability sampling is that sampling bias can occur. Nonprobability sampling can be convenience sampling, purposive sampling or quota sampling.
Sampling Error (Standard Error)
Sampling error refers to the discrepancies that inevitably occur when a small group (sample) is selected to represent the characteristics of a larger group (population). It is defined as the difference between a parameter and an estimate of that parameter which is derived from a sample (Lindquist, 1968:8). The means and standard deviations calculated from the data collected on a given sample will not be exactly the same as those calculated from data collected from the entire population. It is this discrepancy between the characteristics of the sample and those of the population that constitutes sampling error.
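A small simulation can make the idea concrete. The sketch below builds an artificial population (the mean of 120, the standard deviation of 15 and the sample sizes are arbitrary assumptions), draws repeated random samples, and records how far each sample mean lands from the population mean.

```python
import random
import statistics

rng = random.Random(42)

# Artificial population of 10,000 values; the parameters are illustrative only.
population = [rng.gauss(120, 15) for _ in range(10_000)]
mu = statistics.fmean(population)  # the population parameter

def sampling_error(n):
    """Difference between one sample mean (the estimate) and the parameter."""
    sample = rng.sample(population, n)
    return statistics.fmean(sample) - mu

# Average size of the sampling error over 500 samples of each size.
errors_small = [abs(sampling_error(10)) for _ in range(500)]
errors_large = [abs(sampling_error(400)) for _ in range(500)]

print("mean |error|, n=10: ", statistics.fmean(errors_small))
print("mean |error|, n=400:", statistics.fmean(errors_large))
```

The error never disappears entirely, but on average it shrinks as the sample size grows.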
Descriptive statistics
Descriptive statistics are techniques which help the investigator to organize, summarize and describe measures of a sample. Here no predictions or inferences are made regarding population parameters. Descriptive statistics are used to summarize observations and to place these observations within context. The most common descriptive statistics include measures of central tendency and measures of variability.
Central tendency or “measures of the middle”
Three measures of central tendency are commonly used: the mean, the median, and the mode. They identify the average, the most typical and the most common values, respectively, among the data collected. The mean is the arithmetic average, the median is the point representing the 50th percentile in a distribution, and the mode is the most common score. The three measures coincide when the distribution of scores is normal, but under most circumstances they will not be exactly the same. The mode is most likely to misrepresent the underlying distribution and is rarely used in statistical analysis. The mean and the median are the most commonly reported measures of central tendency.
The major consideration in choosing between them is how much weight should be given to extreme scores. The mean takes into account each score in the distribution; the median finds only the halfway point. Because the mean best represents all subjects and has desirable mathematical properties, it is typically favored in statistical analysis. Despite the advantages of the mean, there are also some advantages to the median. In particular, the median disregards outlier cases, whereas the mean moves further in the direction of the outliers. Thus, the median is often used when the investigator does not want scores in the extreme of the distribution to have a strong impact. The median is also valuable for summarizing data for a measure that might be insensitive toward the higher ranges of the scale. For instance, a very easy test may show a ceiling effect, which occurs when the test is too easy to measure the true ability of the best students. Thus, if some scores stack up at the extreme, the median may be more accurate than the mean. If the high scores had not been bounded by the highest obtainable score, the mean might actually have been higher.
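The different sensitivity of the mean and the median to extreme scores can be checked with Python's standard statistics module. The scores below are invented for illustration.

```python
import statistics

# Hypothetical test scores, then the same scores plus one extreme value.
scores = [62, 65, 65, 70, 71, 74, 78]
with_outlier = scores + [250]

for label, data in (("original", scores), ("with outlier", with_outlier)):
    print(
        label,
        "mean =", round(statistics.mean(data), 2),
        "median =", statistics.median(data),
        "mode =", statistics.mode(data),
    )
```

Adding the single extreme score pulls the mean from about 69.3 up to 91.9, while the median moves only from 70 to 70.5 and the mode stays at 65.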
The mean, median, and mode are exactly the same in a normal distribution. However, not all distributions of scores have a normal or bell-shaped appearance. The highest point in a distribution of scores is called the modal peak. A distribution with the modal peak off to one side or the other is described as skewed. The word skew literally means "slanted."
The direction of skew is determined by the location of the tail or flat area of the distribution. Positive skew occurs when the tail goes off to the right of the distribution. Negative skew occurs when the tail or low point is on the left side of the distribution. The mode is the most frequent score in the distribution. In a skewed distribution, the mode remains at the peak whereas the mean and the median shift away from the mode in the direction of the skewness. The mean moves furthest in the direction of the skewness, and the median typically falls between the mean and the mode. Mode is the best measure of central tendency when nominal variables are used. Median is the best measure of central tendency when ordinal variables are used. Mean is the best measure of central tendency when interval or ratio scales are used.
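The ordering of the three measures in a skewed distribution can be seen in a short example. The two data sets are made up; the second is simply the mirror image of the first.

```python
import statistics

# Invented scores with a long tail to the right (positive skew).
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image of the same scores: long tail to the left (negative skew).
negative_skew = [15 - x for x in positive_skew]

def summarize(data):
    """Return (mode, median, mean) for a list of scores."""
    return statistics.mode(data), statistics.median(data), statistics.mean(data)

pos_mode, pos_median, pos_mean = summarize(positive_skew)
neg_mode, neg_median, neg_mean = summarize(negative_skew)

print("positive skew:", pos_mode, pos_median, pos_mean)
print("negative skew:", neg_mode, neg_median, neg_mean)
```

With positive skew the mode (2) stays at the peak, the median (3) shifts toward the tail and the mean (4.5) shifts furthest; with negative skew the order is reversed (13, 12 and 10.5).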
Measures of Variability
If there were no variability within populations there would be no need for statistics: a single item or sampling unit would tell us all that is needed to know about the population as a whole. Three indices are used to measure variation or dispersion among scores: (1) range, (2) variance, and (3) standard deviation (Cozby, 2000). The range describes the difference between the largest and smallest observations made; the variance and standard deviation are based on the average difference, or deviation, of observations from the mean.
Measures of central tendency, such as the mean and median, are used to summarize information. They are important because they provide information about the average score in the distribution. Knowing the average score, however, does not provide all the information required to describe a group of scores. In addition, measures of variability are required. The simplest method of describing variability is the range, which is simply the difference between the highest score and lowest score.
Another statistic, known as the interquartile range, describes the interval of scores bounded by the 25th and 75th percentile ranks; it covers the scores that represent the middle 50 percent of the distribution. In contrast to ranges, which are used infrequently in statistical analysis, the variance and standard deviation are used commonly. Since the mean is the average score in a distribution, the sum of the deviations around the mean will always equal zero, so that sum cannot serve as an index of spread. Yet, in order to understand the characteristics of a distribution of scores, some estimate of deviation around the mean is important. The squared deviations around the mean can yield a meaningful index: the variance is the sum of the squared deviations around the mean divided by the number of cases.
Range
Range is the simplest method of examining variation among scores and refers to the difference between the highest and lowest values produced. It shows how wide the distribution is over which the measurements are spread. For continuous variables, the range is the arithmetic difference between the highest and lowest observations in the sample. In the case of counts or measurements, 1 should be added to the difference because the range is inclusive of the extreme observations. The range takes account of only the most extreme observations. It is therefore limited in its usefulness, because it gives no information about how observations are distributed. The interquartile range is the area between the lowest quartile and the highest quartile, that is, the middle 50% of the scores.
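Both ranges can be computed as follows. The scores are hypothetical, and it is worth noting as an assumption that textbooks and software use slightly different conventions for locating quartiles; this sketch relies on the default ("exclusive") method of Python's statistics.quantiles.

```python
import statistics

# Hypothetical scores.
scores = [4, 7, 8, 10, 12, 15, 19]

# Range: difference between the highest and lowest observations.
value_range = max(scores) - min(scores)

# Inclusive range for counts: add 1 so both extremes are included.
inclusive_range = value_range + 1

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # width of the middle 50% of the scores

print("range:", value_range)
print("inclusive range:", inclusive_range)
print("quartiles:", q1, q2, q3)
print("interquartile range:", iqr)
```

For these scores the range is 15 (16 when counted inclusively), and the interquartile range is 8, running from 7 to 15.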
Variance
The variance is a very useful statistic and is commonly employed in data analysis. However, its calculation requires finding the squared deviations around the mean rather than the simple or absolute deviations around the mean. Thus, when the variance is calculated, the result is expressed in squared units of the original measure. Taking the square root of the variance puts the observations back into their original metric. The square root of the variance is known as the standard deviation. Although the standard deviation is not technically equal to the average deviation, it gives an approximation of how much the average score deviates from the mean. One method for calculating the variance is to first calculate the deviation scores, that is, each score minus the mean. The sum of the set of deviation scores is equal to zero, so the deviation scores are squared before they are averaged. The variance is the square of the standard deviation; conversely, the standard deviation is the square root of the variance.
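The calculation described above can be followed step by step. The eight scores are invented, and the variance is divided by the number of cases, as in the definition given in the text.

```python
import math

# Invented scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)

mean = sum(scores) / n

# Deviation scores: each score minus the mean. They always sum to zero.
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)

# Variance: sum of squared deviations divided by the number of cases.
variance = sum(d ** 2 for d in deviations) / n

# Standard deviation: square root of the variance, back in original units.
standard_deviation = math.sqrt(variance)

# Average absolute deviation, shown for comparison with the standard deviation.
mean_absolute_deviation = sum(abs(d) for d in deviations) / n

print(mean, sum_of_deviations, variance, standard_deviation, mean_absolute_deviation)
```

Here the mean is 5, the deviations sum to 0, the variance is 4 and the standard deviation is 2, while the average absolute deviation is 1.5. The last two figures show why the standard deviation is only an approximation of the average deviation.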
Standard Deviation
The standard deviation is the most widely applied measure of variability. When observations have been obtained from every item or sampling unit in a population, the symbol for the standard deviation is σ (lower case sigma). This is a parameter of the population. When it is calculated from a sample it is symbolized s. The standard deviation of a distribution of scores is the square root of the variance. Large standard deviations suggest that scores do not cluster around the mean: they are probably widely scattered. Similarly, small standard deviations suggest that there is very little difference among scores.
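Python's statistics module offers both versions. One point goes beyond what the text states and should be read as an added assumption: by common convention the sample standard deviation s divides the sum of squared deviations by n - 1 rather than by n, so it comes out slightly larger than the population formula applied to the same scores.

```python
import statistics

# Invented scores, treated first as a whole population, then as a sample.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation (sigma): divides by n.
sigma = statistics.pstdev(scores)

# Sample standard deviation (s): divides by n - 1.
s = statistics.stdev(scores)

print("sigma =", sigma)
print("s     =", round(s, 3))
```

For these eight scores sigma is 2.0 and s is about 2.138; the gap narrows as the number of observations grows.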
Normal Distribution
The normal distribution is a mathematical construct which suggests that naturally occurring observations follow a given pattern. The pattern is the normal curve, which places most observations near the mean and fewer observations at either extreme. This curve, or bell-shaped distribution, reflects the tendency of the observations concerning a specific variable to cluster in a particular manner.
The normal curve can be described for any set of data given the mean and standard deviation of the data and the assumption that the characteristic under study is normally distributed within the population. In a normal distribution, about 68% of observations fall within one standard deviation of the mean, about 95% fall within two standard deviations of the mean, and about 99.7% fall within three standard deviations of the mean. Theoretically, the range of the curve is unlimited.
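The proportions of a normal distribution lying within one, two and three standard deviations of the mean can be verified with the NormalDist class from Python's standard library. The helper name proportion_within is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")

# Cumulative proportion below +3 SD (one tail only), for comparison.
below_plus_3 = standard_normal.cdf(3)
print(f"below +3 SD: {below_plus_3:.4f}")
```

The two-sided proportions come out near 0.6827, 0.9545 and 0.9973. The figure 0.9987 sometimes quoted for three standard deviations is the cumulative proportion below +3 standard deviations, which leaves out only the upper tail.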
Standard Scores
One of the problems with means and standard deviations is that their meanings are not independent of context. For example, a mean of 45.6 means little unless the scale of measurement is known. The Z-score is a transformation into standardized units that provides a context for the interpretation of scores. The Z-score is the difference between the score and the mean, divided by the standard deviation. To make comparisons between groups, standard scores rather than raw scores can be used. Standard scores enable the investigator to examine the position of a given score by expressing its deviation from the mean of all scores in standard deviation units.
Most often, the units on the x axis of the normal distribution are in Z-units. Any variable transformed into Z-units will have a mean of 0 and a standard deviation of 1. Translation of Z-scores into percentile ranks is accomplished using a table for the standard normal distribution. Certain Z-scores are of particular interest in statistics and psychological testing. The Z-score 1.96 represents the 97.5th percentile in a distribution whereas -1.96 represents the 2.5th percentile. A Z-score of less than -1.96 or greater than +1.96 falls outside of a 95 percent interval bounding the mean of the Z-distribution. Some statistical definitions of abnormality view these defined deviations as cutoff points. Thus, a person who is more than 1.96 standard deviations from the mean on some attribute (a Z-score beyond -1.96 or +1.96) might be regarded as abnormal. In addition to the interval bounded by 95 percent of the cases, the interval including 99 percent of all cases is also commonly used in statistics.
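The Z-score transformation and the lookup of percentile ranks can be sketched as follows. The raw scores are invented, the function names are illustrative, and the percentile conversion assumes the scores are normally distributed, as the table lookup described in the text does.

```python
from statistics import NormalDist, fmean, pstdev

def z_scores(scores):
    """Convert raw scores to Z-scores: (score - mean) / standard deviation."""
    mean = fmean(scores)
    sd = pstdev(scores)
    return [(x - mean) / sd for x in scores]

def percentile_rank(z):
    """Percentile rank of a Z-score under the standard normal distribution."""
    return NormalDist().cdf(z) * 100

raw = [2, 4, 4, 4, 5, 5, 7, 9]  # invented scores: mean 5, standard deviation 2
z = z_scores(raw)

print("Z-scores:", z)
print("mean of Z:", fmean(z), "SD of Z:", pstdev(z))
print("percentile rank of Z = +1.96:", round(percentile_rank(1.96), 1))
print("percentile rank of Z = -1.96:", round(percentile_rank(-1.96), 1))

# Z-value that bounds the middle 99 percent of cases.
z_99 = NormalDist().inv_cdf(0.995)
print("Z bounding the 99 percent interval:", round(z_99, 3))
```

The transformed scores have a mean of 0 and a standard deviation of 1, Z-scores of +1.96 and -1.96 map to the 97.5th and 2.5th percentiles, and the 99 percent interval is bounded by Z-scores of roughly -2.576 and +2.576.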
Confidence Intervals
In most statistical inference problems the sample mean is used to estimate the population mean. Each sample mean is considered to be an unbiased estimate of the population mean. Although a single sample mean is unlikely to be exactly the same as the population mean, the means of repeated random samples form a sampling distribution of sample means, and the mean of that sampling distribution is an unbiased estimate of the population mean. However, taking repeated random samples from the population is often difficult and expensive. Instead, it is necessary to estimate the population mean based on a single sample; this is done by creating an interval around the sample mean.
The first step in creating this interval is finding the standard error of the mean. The standard error of the mean is the standard deviation divided by the square root of the sample size. Statistical inference is used to estimate the probability that the population mean will fall within some defined interval. Because sample means are distributed normally around the population mean, the sample mean is most probably near the population value. However, it is possible that the sample mean is an overestimate or an underestimate of the population mean. Using information about the standard error of the mean, it is possible to put a single observation of a mean into context.
The ranges that are likely to capture the population mean are called confidence intervals. Confidence intervals are bounded by confidence limits, which are the values of the points that bound the interval. The confidence interval is defined as a range of values with a specified probability of including the population mean. A confidence interval is typically associated with a certain probability level. For example, a 95 percent confidence interval is constructed so that, across repeated samples, about 95 percent of such intervals include the population mean. A 99 percent confidence interval is expected to capture the true mean in 99 of each 100 cases. Creating a confidence interval requires a mean, a standard error of the mean, and the Z-value associated with the interval.
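The three ingredients listed above can be combined in a short sketch. Several choices here are assumptions made for illustration: the data are invented, the standard deviation is estimated from the sample itself, and the Z-value is taken from the standard normal distribution as the text describes. For small samples many analysts would use a t-value in place of the Z-value, which this sketch does not do.

```python
import math
from statistics import NormalDist, fmean, stdev

def confidence_interval(sample, level=0.95):
    """Z-based confidence interval for the mean: mean +/- z * standard error."""
    n = len(sample)
    mean = fmean(sample)
    # Standard error of the mean: standard deviation / square root of sample size.
    standard_error = stdev(sample) / math.sqrt(n)
    # Z-value leaving (1 - level) / 2 in each tail, e.g. about 1.96 for 95 percent.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * standard_error, mean + z * standard_error

# Invented sample of ten measurements with a mean of 100.
sample = [98, 102, 101, 97, 105, 99, 100, 103, 96, 99]

low95, high95 = confidence_interval(sample, 0.95)
low99, high99 = confidence_interval(sample, 0.99)

print(f"95 percent confidence limits: {low95:.2f} to {high95:.2f}")
print(f"99 percent confidence limits: {low99:.2f} to {high99:.2f}")
```

Both intervals are centered on the sample mean, and the 99 percent interval is wider than the 95 percent interval because a larger Z-value is needed to capture the population mean more often.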