Inferential statistics are mathematical procedures that help the investigator predict or infer population parameters from sample measures. This is done by a process of inductive reasoning based on the mathematical theory of probability (Fowler, Jarvis & Chevannes, 2002).
The idea of probability is basic to inferential statistics. The goal of all inferential statistical techniques is the same: to determine as precisely as possible the probability of an occurrence, that is, to quantify the chance that a stated outcome of an event will take place. In hypothesis testing, probability refers to the likelihood that the differences between the groups under study are the result of chance. Probability theory assigns a probability to any given event out of all possible outcomes, and the probabilities of a set of mutually exclusive, exhaustive outcomes add up to one. When a coin is tossed it has two outcomes, head or tail, each with a chance of 0.5; added together, these chances give 1. Similarly, in a class of fifty students, the chance of any one student coming first in the class is 1 in 50 (i.e. 0.02). By convention, probability values fall on a scale between 0 (impossibility) and 1 (certainty), but they are sometimes expressed as percentages, so the probability scale has much in common with the proportion scale. The chance of committing a Type I error is controlled by testing the hypothesis against a chosen probability value. In the behavioural sciences, < .05 is usually taken as the alpha value for testing the hypothesis; when more stringent outcomes are required, < .01 or < .001 is taken as the alpha (or p) value.
Statistical Significance (alpha level)
The level of significance (or alpha level) is set to identify the probability that the difference between the groups has occurred by chance rather than in response to the manipulation of variables. Whether the null hypothesis should be rejected depends on the level of error that can be tolerated, and this tolerance is expressed as the level of significance or alpha level. The usual level of significance is 0.05, although levels of 0.01 or 0.001 may be used when a higher degree of certainty is required. In testing the significance of an obtained statistic, an investigator who rejects the null hypothesis when, in fact, it is true commits a Type I (alpha) error, and an investigator who accepts the null hypothesis when, in fact, it is false commits a Type II (beta) error (Singh, 2002).
Parametric and Non-parametric Tests
Parametric and non-parametric tests are both commonly employed in behavioural research.
A parametric test is one that specifies certain conditions about the parameters of the population from which the sample is taken. Such tests are considered more powerful than non-parametric tests and should be used whenever their basic requirements or assumptions are met. The assumptions for using parametric tests are:
- The observations must be independent.
- The observations must be drawn from a normally distributed population.
- The samples drawn from the populations must have equal variances (homogeneity of variance); this condition is especially important when the samples are small.
- The variables must be expressed on interval or ratio scales.
- The variables under study should be continuous.
Examples of parametric tests are the t-test, z-test and F-test.
A non-parametric test is one that does not specify any conditions about the parameters of the population from which the sample is drawn. These tests are also called distribution-free statistics. The assumptions generally required are only that the observations be independent and, for some tests, that the underlying variable be continuous. The requisites for using a non-parametric statistical test are:
- The shape of the distribution of the population from which the sample is drawn is not known to be a normal curve.
- The variables have been quantified on the basis of nominal measures (frequency counts).
- The variables have been quantified on the basis of ordinal measures or rankings.
- A non-parametric test should be used only when the parametric assumptions cannot be met.
Common non-parametric tests
- Chi-square test
- Mann-Whitney U test
- Rank difference methods (Spearman's rho and Kendall's tau)
- Coefficient of concordance (W)
- Median test
- Kruskal-Wallis test
- Friedman test
Tips on using appropriate tests in experimental design
Two unmatched (unrelated) groups, experimental and control (e.g. patients receiving a prepared therapeutic intervention for depression and a control group of patients on routine care):
- Check the distribution, whether normal or non-normal.
- If normal, use a parametric test (independent t-test).
- If non-normal, go for a non-parametric test (Mann-Whitney U test), or make the data normal through a natural log transformation or z-transformation (see the sketch after this list).
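A minimal sketch of this decision in Python with scipy (all data invented for illustration; the Shapiro-Wilk check and the 0.05 cut-off are conventions, not fixed rules):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treatment = rng.normal(12, 3, 25)  # invented post-treatment depression scores
control = rng.normal(15, 3, 25)    # invented routine-care scores

# If both groups look normal, use the independent t-test;
# otherwise fall back on the Mann-Whitney U test.
if stats.shapiro(treatment).pvalue > 0.05 and stats.shapiro(control).pvalue > 0.05:
    stat, p = stats.ttest_ind(treatment, control)
else:
    stat, p = stats.mannwhitneyu(treatment, control, alternative='two-sided')
print(f"statistic = {stat:.3f}, p = {p:.4f}")
```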
Two matched (related) groups, pre-post design (the same group is rated before the intervention and rated again after the intervention period, i.e. two ratings in the same or related group):
- Check the distribution, whether normal or non-normal.
- If normal, use the parametric paired t-test.
- If non-normal, use the non-parametric Wilcoxon signed-rank test (see the sketch below).
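The same decision for a pre-post design, sketched in Python with invented ratings; the normality check is applied to the paired differences, since those are what the paired t-test assumes to be normal:

```python
import numpy as np
from scipy import stats

pre  = np.array([22, 25, 19, 30, 27, 24, 21, 28, 26, 23])  # invented pre-intervention ratings
post = np.array([18, 21, 17, 26, 25, 20, 19, 24, 22, 20])  # invented post-intervention ratings

diff = post - pre
if stats.shapiro(diff).pvalue > 0.05:
    stat, p = stats.ttest_rel(pre, post)   # parametric paired t-test
else:
    stat, p = stats.wilcoxon(pre, post)    # non-parametric Wilcoxon signed-rank test
print(f"statistic = {stat:.3f}, p = {p:.4f}")
```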
More than two unmatched (unrelated) groups (for example, three groups: schizophrenia, bipolar and control):
- Check the distribution, whether normal or non-normal.
- If normally distributed, use parametric one-way ANOVA.
- If non-normal, use the non-parametric Kruskal-Wallis test (see the sketch below).
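A hedged sketch for three independent groups, using invented scores for the diagnostic groups named above:

```python
import numpy as np
from scipy import stats

scz     = np.array([14, 16, 15, 18, 13, 17])  # invented scores, schizophrenia group
bipolar = np.array([12, 11, 14, 13, 10, 12])  # invented scores, bipolar group
control = np.array([8, 9, 7, 10, 9, 8])       # invented scores, control group

groups = [scz, bipolar, control]
if all(stats.shapiro(g).pvalue > 0.05 for g in groups):
    stat, p = stats.f_oneway(*groups)  # parametric one-way ANOVA
else:
    stat, p = stats.kruskal(*groups)   # non-parametric Kruskal-Wallis test
print(f"statistic = {stat:.3f}, p = {p:.4f}")
```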
More than two matched (related) groups (for example, in an ongoing intervention, ratings at different times: t1, t2, t3, t4 …):
- Check the distribution, normal or non-normal.
- If the data are normal, use parametric repeated-measures ANOVA.
- If the data are non-normal, use the non-parametric Friedman test (see the sketch below).
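For repeated ratings on the same subjects, scipy provides the non-parametric Friedman test; a parametric repeated-measures ANOVA needs an additional package (for example statsmodels' AnovaRM). A minimal sketch with invented ratings at four time points:

```python
import numpy as np
from scipy import stats

# Invented ratings for the same six subjects at four time points
t1 = np.array([30, 28, 33, 27, 31, 29])
t2 = np.array([26, 25, 30, 24, 28, 26])
t3 = np.array([22, 23, 27, 21, 25, 24])
t4 = np.array([20, 21, 24, 19, 22, 21])

stat, p = stats.friedmanchisquare(t1, t2, t3, t4)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
```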
Matched (related) and unmatched (unrelated) observations
When analyzing bivariate data such as correlations, a single sample unit gives a pair of observations representing two different variables. Observations comprising a pair that are uniquely linked are said to be matched or paired. For example, the systolic blood pressures of 10 patients before administration of a drug and the measurements of another 10 patients after administration are unmatched; measurements of the same 10 patients before and after administration of the drug, however, are matched. More sensitive analysis is possible when the observations are matched.
Common Statistical tests
Chi-square (χ²) Test (analyzing frequencies)
The chi-square test is one of the important non-parametric tests; Guilford (1956) has called it the 'general-purpose statistic'. Chi-square tests are widely used as tests of homogeneity, randomness, association, independence and goodness of fit. The chi-square test is used when the data are expressed in terms of frequencies, proportions or percentages. The test applies only to discrete data, but continuous data can be reduced to categories in such a way that they can be treated as discrete. The chi-square statistic is used to evaluate the relative frequency or proportion of events in a population that fall into well-defined categories. For each category there is an expected frequency, obtained from knowledge of the population or from some other theoretical perspective, and an observed frequency, obtained from the observations made by the investigator. The chi-square statistic expresses the discrepancy between the observed and the expected frequencies.
There are several uses of the chi-square test:
1. The chi-square test can be used as a test of the equal-probability hypothesis (by the equal-probability hypothesis is meant that the frequencies in all the given categories are expected to be equal).
2. It can be used to test the significance of the independence hypothesis (the independence hypothesis means that one variable is not affected by, or related to, another variable, and hence the two variables are independent).
3. It can be used to test a hypothesis regarding the normal shape of a frequency distribution (goodness of fit).
4. It is used in testing the significance of several statistics, such as the phi coefficient, the coefficient of concordance and the coefficient of contingency.
5. In the chi-square test, the frequencies we observe are compared with those we expect on the basis of some null hypothesis. If the discrepancy between the observed and expected frequencies is great, the value of the calculated test statistic will exceed the critical value at the appropriate number of degrees of freedom, and the null hypothesis is rejected in favour of some alternative. Mastery of the method lies not so much in the computation of the test statistic itself as in the calculation of the expected frequencies.
6. The chi-square statistic does not give any information regarding the strength of a relationship: it only conveys the existence or non-existence of a relationship between the variables investigated. To establish the extent and nature of the relationship, additional statistics such as phi, Cramer's V or the contingency coefficient can be used (Brockopp & Hastings-Tolsma, 2003).
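Two of the uses above (the equal-probability hypothesis and the independence hypothesis) sketched in Python with invented counts:

```python
import numpy as np
from scipy import stats

# 1. Equal-probability (goodness of fit): are 60 patients spread
#    evenly across three clinics? Expected frequencies default to equal.
observed = np.array([26, 19, 15])
stat, p = stats.chisquare(observed)
print(f"goodness of fit: chi-square = {stat:.3f}, p = {p:.4f}")

# 2. Independence: is outcome (improved / not improved) independent of
#    treatment group? Rows are groups, columns are outcomes.
table = np.array([[30, 10],
                  [18, 22]])
stat, p, df, expected = stats.chi2_contingency(table)
print(f"independence: chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
```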
Tips on analyzing frequencies
- All versions of the chi-square test compare the agreement between a set of observed frequencies and those expected if some null hypothesis is true.
- All objects are counted on the nominal scale, or unambiguous intervals on a continuous scale (such as successive days or months) may be regarded as categories for the application of the tests.
- Apply Yates' correction in the chi-square test when there is only one degree of freedom, i.e. in a 'one-way' test with two categories or in a 2×2 contingency table (see the sketch below).
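In scipy, chi2_contingency applies Yates' continuity correction by default whenever the table yields one degree of freedom, as in a 2×2 table; a minimal sketch with invented counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[12, 5],
                  [7, 16]])
chi2_c, p_c, _, _ = chi2_contingency(table)                    # Yates' correction applied (default)
chi2_u, p_u, _, _ = chi2_contingency(table, correction=False)  # uncorrected, for comparison
print(f"corrected p = {p_c:.4f}, uncorrected p = {p_u:.4f}")
```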
Testing the normality of data
Parametric statistical techniques depend upon the mathematical properties of the normal curve: they usually assume that samples are drawn from populations that are normally distributed. Before adopting a statistical test, it is therefore essential to determine whether the data are normal or non-normal. The normality of data can be checked in two ways: by plotting the data to see if they look normal, or by using formal statistical procedures. The commonest such test is the Kolmogorov-Smirnov test. If the test is non-significant (p > .05), the data may be treated as normal and a parametric test can be used for the analysis; if it is significant (p < .05), a non-parametric test should be used. The Shapiro-Wilk test is another test of normality, generally preferred for smaller samples. Statistical packages like SPSS can be used to perform these tests.
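Both tests named above are also available in Python's scipy; a minimal sketch with invented data (note that feeding the sample's own mean and SD to the Kolmogorov-Smirnov test is a common shortcut that strictly calls for the Lilliefors variant):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(50, 10, 40)  # invented measurements

sw = stats.shapiro(x)  # Shapiro-Wilk, generally preferred for small samples
ks = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1)))  # Kolmogorov-Smirnov

print(f"Shapiro-Wilk p = {sw.pvalue:.4f}, K-S p = {ks.pvalue:.4f}")
# p > .05 on both: no evidence against normality, so a parametric test is defensible
```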
t-test and z-test (comparing means)
In experimental sciences, comparisons between groups are very common. Usually, one group is the treatment, or experimental, group, while the other is the untreated, or control, group. If patients are randomly assigned to these two groups, it is assumed that they differ only by chance prior to treatment, so differences between the groups after treatment are used to estimate the treatment effect. The task of the statistician is to determine whether any observed differences between the groups following treatment should be attributed to chance or to the treatment. The t-test is commonly used for this purpose, and there are actually several different types of t-tests.
Types of t-Tests
- Comparison of a sample mean with a hypothetical population mean.
- Comparison between two scores in the same group of individuals.
- Comparison between observations made on two independent groups.
The t-test and z-test are parametric inferential statistical techniques used when a comparison of two means is required. They test the null hypothesis that there is no difference in means between the two groups. The reporting of t-test results generally includes the df, the t-value and the probability level. A t-test can be one-tailed or two-tailed: if the hypothesis is directional, a one-tailed test is generally used, and if the hypothesis is non-directional, a two-tailed test is used. The t-test is used when the sample size is less than 30, and the z-test when the sample size is more than 30.
There are dependent and independent t-tests, and the formula used to calculate a t-test differs depending on whether the samples involved are dependent or independent. Samples are independent when there are two separate groups, such as an experimental and a control group. Samples are dependent when the participants in the two groups are paired in some manner; the form of the t-test used with a dependent sample may be termed paired, dependent, matched or correlated (Brockopp & Hastings-Tolsma, 2003).
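The three t-test forms listed above, sketched in Python with invented data (a minimal illustration, assuming scipy is available):

```python
import numpy as np
from scipy import stats

# 1. One-sample: sample mean against a hypothetical population mean of 100
scores = np.array([102, 98, 110, 95, 105, 99, 108, 101])
print(stats.ttest_1samp(scores, popmean=100))

# 2. Paired (dependent): two scores from the same individuals
before = np.array([12, 15, 11, 14, 13, 16, 12, 15])
after  = np.array([10, 13, 10, 12, 12, 14, 11, 13])
print(stats.ttest_rel(before, after))

# 3. Independent: two separate groups, e.g. experimental vs. control
group_a = np.array([7, 9, 8, 10, 6, 9])
group_b = np.array([5, 6, 7, 5, 6, 4])
print(stats.ttest_ind(group_a, group_b))
```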
Degrees of freedom (df)
Degrees of freedom (df) is a mathematical concept describing the number of events or observations that are free to vary. For each statistical test there is a formula for calculating the appropriate degrees of freedom (e.g. n − 1 for a single sample).
Mann-Whitney U-test (comparing medians)
The Mann-Whitney U test is a non-parametric substitute for the parametric t-test, used for comparing the medians of two unmatched samples. For application of the U test, the data must be obtained on an ordinal or interval scale. We can use the Mann-Whitney U-test, for example, to compare the median time taken to perform a task by a sample of subjects who had not drunk alcohol with that of another sample who had drunk a standardized volume of alcohol. The test is used to examine group differences when the data are non-normal and the groups are independent, and it can be applied to groups of equal or unequal size.
Some key points about using the Mann-Whitney U-test are:
- The test can be applied to interval data (measurements), to counts of things, to derived variables (proportions and indices) and to ordinal data (rank scales, etc.).
- Unlike some test statistics, the calculated value of U has to be smaller than the tabulated critical value in order to reject the null hypothesis.
- The test is for a difference in medians. It is a common error to record a statement like 'the Mann-Whitney U-test showed there is a significant difference in means'. There is, however, no need to calculate the medians of each sample to perform the test (see the sketch below).
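A sketch of the alcohol example above with invented task times; note the unequal group sizes, which the test accepts, and that the result is reported in terms of medians:

```python
import numpy as np
from scipy import stats

no_alcohol = np.array([31, 28, 35, 30, 27, 33, 29])      # invented times (s)
alcohol    = np.array([38, 41, 36, 44, 39, 42, 37, 40])  # invented times (s)

res = stats.mannwhitneyu(no_alcohol, alcohol, alternative='two-sided')
print(f"U = {res.statistic:.1f}, p = {res.pvalue:.4f}")
print(f"medians: {np.median(no_alcohol)} vs {np.median(alcohol)}")
```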
Wilcoxon test (matched pairs)
The Wilcoxon test for matched pairs is a non-parametric test for comparing the medians of two matched samples. It calls for a test statistic T whose probability distribution is known. The observations must be drawn on an interval scale; it is not possible to use this test on ordinal measurements. The test is for a difference in medians and assumes that the samples have been drawn from parent populations that are symmetrically, though not necessarily normally, distributed.
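A minimal sketch of the matched-pairs test in Python, with invented before/after measurements on the same subjects:

```python
import numpy as np
from scipy import stats

before = np.array([140, 152, 138, 160, 147, 151, 143, 156])  # invented values
after  = np.array([132, 148, 135, 151, 140, 149, 138, 150])

res = stats.wilcoxon(before, after)  # signed-rank test on the paired differences
print(f"T = {res.statistic:.1f}, p = {res.pvalue:.4f}")
```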
Pearson Product-Moment Correlation Coefficient
The Pearson product-moment correlation is a parametric test and a common method of assessing the association between two variables under study. In this test an estimation of at least one parameter is involved, measurement is at the interval level, and it is assumed that the variables under study are normally distributed within the population.
Spearman Rank Correlation Coefficient
Spearman's rho is a non-parametric test equivalent to the parametric Pearson r. Spearman's rank correlation technique is used when the conditions for the product-moment correlation coefficient do not apply. The test is widely used by health scientists; it uses the ranks of the x and y observations, and the raw data themselves are discarded.
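A sketch contrasting the two coefficients on the same invented bivariate data:

```python
import numpy as np
from scipy import stats

x = np.array([1.2, 2.4, 3.1, 4.8, 5.5, 6.7, 7.3, 8.9])  # invented paired observations
y = np.array([2.0, 2.9, 3.5, 5.1, 5.0, 6.8, 7.9, 8.5])

r, p_r = stats.pearsonr(x, y)       # parametric: interval data, bivariate normal
rho, p_rho = stats.spearmanr(x, y)  # non-parametric: works on the ranks
print(f"Pearson r = {r:.3f} (p = {p_r:.4f}); Spearman rho = {rho:.3f} (p = {p_rho:.4f})")
```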
Tips on using correlation tests
- When observations of one or both variables are on an ordinal scale, or are proportions, percentages, indices or counts of things, use Spearman's rank correlation coefficient. The number of units in the sample, i.e. the number of paired observations, should be between 7 and 30.
- When observations are measured on an interval scale, the product-moment correlation coefficient should be considered. Sample units must be obtained randomly, and the data should be bivariate normal, i.e. both x and y normally distributed.
- The relationship between the variables should be rectilinear (a straight line), not curved. Certain mathematical transformations (e.g. logarithmic transformation) will 'straighten up' curved relationships.
- A strong and significant correlation does not mean that one variable is necessarily the cause of the other. It is possible that some additional, unidentified factor is the underlying source of variability in both variables.
- Correlations measured in samples estimate correlations in the populations. A correlation in a sample is not 'improved' or strengthened by obtaining more observations; however, larger samples may be required to confirm the statistical significance of weaker correlations.