2018
COMMERCE
Paper: 203
(Research Methodology and Statistical Analysis)
Full Marks: 80
Time: Three Hours
The figures in the margin indicate full marks for the questions.
1. (a) What are the different types of research? Explain them in brief. (4+12=16)
-> Research is about using established methods to investigate a problem or question in detail with the aim of generating new knowledge about it.
It is a vital tool for scientific advancement because it allows researchers to support or refute hypotheses based on clearly defined parameters, environments and assumptions. Because these conditions are made explicit, research can be verified and replicated, which allows us to contribute to knowledge with confidence.
Knowing the types of research and what each of them focuses on will allow you to plan your project better, use the most appropriate methodologies and techniques, and communicate your findings more clearly to other researchers and supervisors.
Classification of Types of Research
There are various types of research that are classified according to their objective, depth of study, analysed data, time required to study the phenomenon and other factors. It’s important to note that a research project will not be limited to one type of research, but will likely use several.
According to its Purpose
Theoretical Research
Theoretical research, also referred to as pure or basic research, focuses on generating knowledge, regardless of its practical application. Here, data collection is used to generate new general concepts for a better understanding of a particular field or to answer a theoretical research question.
Results of this kind are usually oriented towards the formulation of theories and are typically based on documentary analysis, the development of mathematical formulas and the reflections of experienced researchers.
An example is a philosophical dissertation, since the aim is to generate new approaches from existing data without considering how its findings can be applied or implemented in practice.
Applied Research
Here, the goal is to find strategies that can be used to address a specific research problem. Applied research draws on theory to generate practical scientific knowledge, and its use is very common in STEM fields such as engineering, computer science and medicine.
This type of research is subdivided into two types:
1. Technological applied research: looks towards improving efficiency in a particular productive sector through the improvement of processes or machinery related to said productive processes.
2. Scientific applied research: has predictive purposes. Through this type of research design, we can measure certain variables to predict behaviors useful to the goods and services sector, such as consumption patterns and the viability of commercial projects.
For example, market research: by examining consumption patterns, strategies can be devised for new products, marketing campaigns, etc.
According to its Depth of Scope
Exploratory Research
Exploratory research is used for the preliminary investigation of a subject that is not yet well understood or sufficiently researched. It serves to establish a frame of reference and a hypothesis from which an in-depth study can be developed that will enable conclusive results to be generated.
Because exploratory research is based on the study of little-studied phenomena, it relies less on theory and more on the collection of data to identify patterns that explain these phenomena.
For example, an investigation of the role of social media in the perception of self-image.
Descriptive Research
The primary objective of descriptive research is to define the characteristics of a particular phenomenon without necessarily investigating the causes that produce it.
In this type of research, the researcher must take particular care not to intervene in the observed object or phenomenon, as its behaviour may change if an external factor is involved.
For example, investigating how the public census of influential government officials differs between urban and non-urban areas.
Explanatory Research
Explanatory research is the most common type of research method and is responsible for establishing cause-and-effect relationships that allow generalisations to be extended to similar realities. It is closely related to descriptive research, although it provides additional information about the observed object and its interactions with the environment.
For example, investigating the brittle behaviour of a specific material when under compressive load.
Correlational Research
The purpose of this type of scientific research is to identify the relationship between two or more variables. A correlational study aims to determine whether, and by how much, the other elements of the observed system change when one variable changes.
According to the Type of Data Used
Qualitative Research
Qualitative research is often used in the social sciences to collect, compare and interpret information. It has a linguistic-semiotic basis and uses techniques such as discourse analysis, interviews, surveys, records and participant observation.
In order to use statistical methods to validate their results, the observations collected must be evaluated numerically. Qualitative research, however, tends to be subjective, since not all data can be fully controlled. Therefore, this type of research design is better suited to extracting meaning from an event or phenomenon (the ‘why’) than its cause (the ‘how’).
For example, examining the effects of sleep deprivation on mood.
Quantitative Research
Quantitative research studies a phenomenon through quantitative data collection, using mathematical, statistical and computer-aided tools to measure it. This allows generalised conclusions to be projected over time.
For example, conducting a computer simulation on vehicle strike impacts to collect quantitative data.
Experimental Research
It involves designing or replicating a phenomenon whose variables are manipulated under strictly controlled conditions in order to identify or discover their effect on another, dependent variable or object. The phenomenon under study is measured through study and control groups, according to the guidelines of the scientific method.
For example, randomised controlled trial studies for measuring the effectiveness of new pharmaceutical drugs on human subjects.
Non-Experimental Research
Also known as an observational study, it focuses on the analysis of a phenomenon in its natural context. As such, the researcher does not intervene directly, but limits their involvement to measuring the variables required for the study. Due to its observational nature, it is often used in descriptive research.
For example, a study on the effects of the use of certain chemical substances in a particular population group can be considered a non-experimental study.
Quasi-Experimental Research
It controls only some of the variables of the phenomenon under investigation and is therefore not entirely experimental. In this case, the study and control groups cannot be randomly selected, but are chosen from existing groups or populations. This helps ensure that the collected data is relevant and that the knowledge, perspectives and opinions of the population can be incorporated into the study.
For example, assessing the effectiveness of an intervention measure in reducing the spread of antibiotic-resistant bacteria.
According to the Type of Inference
Deductive Investigation
In this type of research, reality is explained by general laws that point to certain conclusions; the conclusions are expected to follow from the premises of the research problem and are considered correct if the premises are valid and the deductive method is applied correctly.
Inductive Research
In this type of research, knowledge is generated from an observation to achieve a generalisation. It is based on the collection of specific data to develop new theories.
Hypothetical-Deductive Investigation
It is based on observing reality to make a hypothesis, then using deduction to obtain a conclusion, and finally verifying or rejecting the hypothesis through experience.
According to the Time in Which it is Carried Out
Longitudinal Study (also referred to as Diachronic Research)
It involves monitoring the same event, individual or group over a defined period of time. It aims to track changes in a number of variables and see how they evolve over time. It is often used in medical, psychological and social research.
For example, a cohort study that analyses changes in a particular indigenous population over a period of 15 years.
Cross-Sectional Study (also referred to as Synchronous Research)
Cross-sectional research design is used to observe phenomena, an individual or a group of research subjects at a given time.
According to The Sources of Information
Primary Research
This fundamental research type is defined by the fact that the data is collected directly from the source, that is, it consists of primary, first-hand information.
Secondary research
Unlike primary research, secondary research is developed with information from secondary sources, which are generally based on scientific literature and other documents compiled by another researcher.
According to How the Data is Obtained
Documentary (cabinet)
Documentary research, which draws on secondary sources, is based on a systematic review of existing sources of information on a particular subject. This type of scientific research is commonly used when undertaking literature reviews or producing a case study.
Field
Field research involves the direct collection of information at the location where the observed phenomenon occurs.
From Laboratory
Laboratory research is carried out in a controlled environment in order to isolate a dependent variable and establish its relationship with other variables through scientific methods.
Mixed-Method: Documentary, Field and/or Laboratory
Mixed research methodologies combine results from both secondary (documentary) sources and primary sources through field or laboratory research.
(b) What do you mean by data? Distinguish between primary data and secondary data. (4+12=16)
-> Data refers to distinct pieces of information, usually formatted and stored in a way that is concordant with a specific purpose. Data can exist in various forms: as numbers or text recorded on paper, as bits or bytes stored in electronic memory, or as facts living in a person’s mind. Since the advent of computer science in the mid-1900s, however, data most commonly refers to information that is transmitted or stored electronically.
Broadly, there are two ways to analyze the data:
1. Data Analysis in Qualitative Research
2. Data Analysis in Quantitative Research
1. Data Analysis in Qualitative Research
Data analysis in qualitative research works a little differently from the analysis of numerical data, because qualitative data is made up of words, descriptions, pictures, objects and sometimes symbols. Extracting insight from such complex information is a complicated process; hence qualitative analysis is typically used for exploratory research.
Finding Patterns in the Qualitative Data
Although there are several ways to discover patterns in text data, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, data analysis in qualitative research is largely manual: researchers usually read the available data and look for repetitive or frequently used words.
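A minimal Python sketch of the word-based approach described above, assuming a handful of hypothetical open-ended survey responses and a small hand-picked stopword list (both invented for illustration). It only counts how often each remaining word occurs; real qualitative analysis would still involve a researcher reading and interpreting the text.

```python
import re
from collections import Counter

# Hypothetical open-ended survey responses (illustrative only)
responses = [
    "Delivery was slow but the staff were friendly.",
    "Friendly staff, slow delivery and high prices.",
    "Prices are high; delivery could be faster.",
]

# Assumed list of common function words that carry little meaning on their own
STOPWORDS = {"the", "was", "but", "were", "and", "are", "could", "be"}

def frequent_words(texts, top_n=3):
    """Count how often each non-stopword appears across all responses."""
    words = []
    for text in texts:
        # lowercase and keep only alphabetic word tokens
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                words.append(word)
    return Counter(words).most_common(top_n)

print(frequent_words(responses))
```

Here "delivery" surfaces as the most frequent theme (three mentions), which is the kind of pattern a researcher would then examine more closely.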
2. Data Analysis in Quantitative Research
Preparing Data for Analysis
The first stage in research and data analysis is to prepare the data for examination so that raw data can be converted into something meaningful. Data preparation comprises the following steps:
1. Data Validation
2. Data Editing
3. Data Coding
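One simplified way to picture the three preparation steps in code is sketched below. The survey records, the plausible age range and the numeric coding scheme are all hypothetical, chosen only to illustrate validation (rejecting implausible records), editing (cleaning inconsistent entries) and coding (turning categories into numbers).

```python
# Hypothetical raw survey records: (respondent_id, age, satisfaction answer)
raw = [
    (1, 34, " Satisfied"),
    (2, 210, "satisfied"),      # implausible age, should fail validation
    (3, 45, "Dissatisfied "),
    (4, 29, "neutral"),
]

VALID_AGES = range(18, 100)  # assumed plausible age range for this survey
CODES = {"dissatisfied": 1, "neutral": 2, "satisfied": 3}  # assumed coding scheme

def prepare(records):
    prepared = []
    for rid, age, answer in records:
        # 1. Validation: drop records that fail a basic plausibility check
        if age not in VALID_AGES:
            continue
        # 2. Editing: clean inconsistent spacing and capitalisation
        answer = answer.strip().lower()
        # 3. Coding: convert the category into a numeric code for analysis
        prepared.append((rid, age, CODES[answer]))
    return prepared

print(prepare(raw))  # [(1, 34, 3), (3, 45, 1), (4, 29, 2)]
```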
For quantitative statistical research, descriptive analysis often gives absolute numbers. However, the analysis alone is never sufficient to explain the reasons behind those numbers. It is therefore important to consider which research and data analysis method best fits your survey and the story researchers want to tell.
Consequently, enterprises that want to survive in a hypercompetitive world must be able to analyze complex research data, derive actionable insights and adapt to new market needs.
The main differences between primary and secondary data are as follows:
- Definition
Primary data is the type of data that is collected by researchers directly from main sources while secondary data is the data that has already been collected through primary sources and made readily available for researchers to use for their own research.
The main difference between these 2 definitions is the fact that primary data is collected from the main source of data, while secondary data is not.
The secondary data made available to researchers from existing sources was originally primary data collected for earlier research. The availability of secondary data therefore depends heavily on the primary researcher’s decision whether or not to share their data publicly.
- Examples:
An example of primary data is the national census data collected by the government while an example of secondary data is the data collected from online sources. The secondary data collected from an online source could be the primary data collected by another researcher.
For example, after conducting the national census, the government shares the results in newspapers, online magazines, press releases, etc. Another government agency that is trying to allocate the state budget for healthcare, education, etc. may need to access the census results.
With access to this information, the number of children who need education can be analyzed and used to determine the amount that should be allocated to the education sector. Similarly, knowing the number of old people will help in allocating funds for them in the health sector.
- Data Types
The data provided by primary research is current, while secondary data may be outdated. Researchers have access to the most recent data when conducting primary research, which may not be the case for secondary data.
Secondary research depends on primary data that was collected in the past. In some cases, the researcher may be lucky enough that the data was collected close to the time of his or her own research, which reduces the gap between the secondary data being used and the most recent data.
- Process
Researchers are usually heavily involved in the primary data collection process, while secondary data is quick and easy to collect. This is because primary research is often longitudinal.
Researchers therefore have to spend a long time performing research, recording information and analyzing the data, whereas secondary data can often be collected and analyzed within a few hours.
For example, an organization may spend a long time analyzing the market size for transport companies looking to move into the ride-hailing sector. A potential investor can then take this data and use it to decide whether or not to invest in the sector.
- Availability
Primary data is available in crude form while secondary data is available in a refined form. That is, secondary data is usually made available to the public in a simple form for a layman to understand while primary data are usually raw and will have to be simplified by the researcher.
Secondary data take this form because they have already been processed by the researchers who originally collected the primary data. A good example is the Thomson Reuters annual market reports that are made available to the public.
When Thomson Reuters first collects this data, it is raw and may be difficult to understand. The results are then simplified by presenting them with graphs, charts and written explanations.
- Data Collection Tools
Primary data can be collected using surveys and questionnaires, while secondary data is collected from libraries, bots, etc. The differences between these data collection tools are clear, and they cannot be used interchangeably.
When collecting primary data, researchers look for a tool that is easy to use and collects reliable data. Web-based form builders such as Formplus are one example of tools designed for this purpose, aiming to collect reliable data while increasing the response rate from respondents.
- Sources
Primary data sources include surveys, observations, experiments, questionnaires, focus groups, interviews, etc., while secondary data sources include books, journals, articles, web pages, blogs, etc. These sources differ clearly, and there is no overlap between primary and secondary data sources.
Primary data sources are sources that require a deep commitment from researchers and require interaction with the subject of study. Secondary data, on the other hand, do not require interaction with the subject of study before it can be collected.
In most cases, secondary researchers do not have any interaction with the subject of research.
- Specific
Primary data is always specific to the researcher’s needs, while secondary data may or may not be specific to the researcher’s need. It depends solely on the kind of data the researcher was able to lay hands on.
Secondary researchers may be lucky enough to have access to data tailored specifically to their needs, but this is not always the case. For example, a market researcher studying the purchasing power of people in a particular community may not have access to data on that community.
Alternatively, there may be another community with a similar standard of living whose data is available. The researcher may settle for this data and use it to inform conclusions about the subject community.
- Advantage
Some common advantages of primary data are its authenticity, specific nature and up-to-date information, while secondary data is cheap and not time-consuming to obtain.
Primary data is very reliable because it is usually objective and collected directly from the original source. It also gives more up-to-date information about a research topic than secondary data.
Secondary data, on the other hand, is inexpensive, which makes secondary research easy to conduct. It does not take much time, and many secondary data sources can be accessed for free.
- Disadvantage
The disadvantage of primary data is the cost and time spent on data collection, while secondary data may be outdated or irrelevant. Primary data collection is costly and time-consuming because of the processes involved in carrying out primary research.
For example, when physically interviewing research subjects, one may need one or more professionals, including the interviewers, videographers who record the interview in some cases, and the people involved in preparing for the interview. Apart from the time required, the cost of doing this may be relatively high.
Secondary data may be outdated and irrelevant. Researchers often have to sift through irrelevant data before finally finding the data relevant to their research purpose.
- Accuracy and Reliability
Primary data is more accurate and reliable while secondary data is relatively less reliable and accurate. This is mainly because the secondary data sources are not regulated and are subject to personal bias.
A good example of this is business owners who pay bloggers to write good reviews of their product just to gain more customers. This is not the case with primary data, which is collected by the researcher himself or herself.
One of the researcher’s aims when gathering primary data will be to gather accurate data so as to arrive at correct conclusions. Therefore, biases will be avoided at all costs (e.g. when businesses collect feedback directly from their customers).
- Cost-effectiveness
Primary data is very expensive while secondary data is economical. When working on a low budget, it is better for researchers to work with secondary data, then analyze it to uncover new trends.
In fact, a researcher might work with both primary and secondary data in a single study. This is advisable when the available secondary data does not fully meet the research needs.
A small extension of the available data can then be collected, which also saves cost. For example, a researcher may require a market report covering 2010 to 2019 while the available reports stop at 2018.
- Collection Time
The time required to collect primary data is usually long while that required to collect secondary data is usually short. The primary data collection process is sometimes longitudinal in nature.
Therefore, researchers may need to observe the research subject for some time while taking down important data. For example, when observing the behavior of a group of people or particular species, researchers have to observe them for a while.
Secondary data can, however, be collected in a matter of minutes and analyzed to draw conclusions, taking less time than primary data. In some rare cases, especially when collecting little data, secondary data may take longer to collect because of the difficulty of consulting different data sources to find the right data.
2. (a) What are measures of skewness? Explain the three different formulas used to compute the coefficient of skewness. (6+10=16)
(b) What are the different measures of central tendency? How are these measures used in analyzing data under different circumstances? (4+12=16)
-> A measure of central tendency (also referred to as measures of centre
or central location) is a summary measure that attempts to describe a whole
set of data with a single value that represents the middle or centre of its
distribution.
There are three main measures of central tendency: the mode, the median and
the mean. Each of these measures describes a different indication of the
typical or central value in the distribution.
Mode
The mode is the most commonly occurring value in a distribution.
Consider this dataset showing the retirement age of 11 people, in whole
years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
This table shows a simple frequency distribution of the retirement age
data.
Age | Frequency |
54 | 3 |
55 | 1 |
56 | 1 |
57 | 2 |
58 | 2 |
60 | 2 |
The most commonly occurring value is 54; therefore the mode of this
distribution is 54 years.
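The frequency table and the mode can be reproduced with Python's standard library, as a small sketch using the retirement-age data above. Collecting every value that shares the highest frequency also handles the bimodal and multimodal cases discussed below; `statistics.multimode` does the same job.

```python
from collections import Counter
from statistics import multimode

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

freq = Counter(ages)  # frequency distribution, as in the table above
highest = max(freq.values())
# every value with the highest frequency is a mode (handles multi-modal data)
modes = [age for age, count in freq.items() if count == highest]

print(dict(freq))       # {54: 3, 55: 1, 56: 1, 57: 2, 58: 2, 60: 2}
print(modes)            # [54]
print(multimode(ages))  # [54]
```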
Advantage of the mode:
The mode has an advantage over the median and the mean as it can be found
for both numerical and categorical (non-numerical) data.
Limitations of the mode:
There are some limitations to using the mode. In some distributions, the
mode may not reflect the centre of the distribution very well. When the
distribution of retirement age is ordered from lowest to highest value, it
is easy to see that the centre of the distribution is 57 years, but the
mode is lower, at 54 years.
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
It is also possible for there to be more than one mode for the same
distribution of data (bimodal or multimodal). The presence of more than
one mode can limit the ability of the mode in describing the centre or
typical value of the distribution because a single value to describe the
centre cannot be identified.
In some cases, particularly where the data are continuous, the distribution
may have no mode at all (i.e. if all values are different).
In cases such as these, it may be better to consider using the median or
mean, or group the data into appropriate intervals and find the modal
class.
Median
The median is the middle value in a distribution when the values are arranged
in ascending or descending order.
The median divides the distribution in half (50% of observations lie
on either side of the median value). In a distribution with an odd number
of observations, the median value is the middle value.
Looking at the retirement age distribution (which has 11 observations), the
median is the middle value, which is 57 years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
When the distribution has an even number of observations, the median value
is the mean of the two middle values. In the following distribution, the
two middle values are 56 and 57, therefore the median equals 56.5 years:
52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
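A short Python sketch of the rule just described, applied to both retirement-age lists from the text: sort the values, then take the middle value (odd count) or the mean of the two middle values (even count). The hand-written helper is illustrative; `statistics.median` is used only as a cross-check.

```python
from statistics import median

odd = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]       # 11 observations
even = [52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]  # 12 observations

def median_by_hand(values):
    """Sort, then return the middle value (odd n) or the mean of the two middle values (even n)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median_by_hand(odd), median(odd))    # 57 57
print(median_by_hand(even), median(even))  # 56.5 56.5
```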
Advantage of the median:
The median is less affected by outliers and skewed data than the mean, and
is usually the preferred measure of central tendency when the distribution
is not symmetrical.
Limitation of the median:
The median cannot be identified for categorical nominal data, as such data
cannot be logically ordered.
Mean
The mean is the sum of the value of each observation in a dataset divided
by the number of observations. This is also known as the arithmetic
average.
Looking at the retirement age distribution again:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The mean is calculated by adding together all the values
(54+54+54+55+56+57+57+58+58+60+60 = 623) and dividing by the number of
observations (11), which gives approximately 56.6 years.
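The same arithmetic in Python, as a minimal sketch on the retirement-age data: the sum of the values divided by their count, rounded to one decimal place as in the text.

```python
from statistics import mean

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

total = sum(ages)            # 623
average = total / len(ages)  # 623 / 11 = 56.636...
print(total, round(average, 1))  # 623 56.6
print(round(mean(ages), 1))      # 56.6, same result from the standard library
```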
Advantage of the mean:
The mean can be used for both continuous and discrete numeric data.
Limitations of the mean:
The mean cannot be calculated for categorical data, as the values cannot be
summed.
As the mean includes every value in the distribution, it is influenced
by outliers and skewed distributions.
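To illustrate this point, the sketch below replaces one of the retirement ages with a hypothetical outlier of 95 (an invented value, e.g. a data-entry error) and compares the mean with the median. The mean moves by more than three years, while the median does not move at all.

```python
from statistics import mean, median

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
with_outlier = ages[:-1] + [95]  # hypothetical: last value recorded as 95 instead of 60

print(round(mean(ages), 1), median(ages))                  # 56.6 57
print(round(mean(with_outlier), 1), median(with_outlier))  # 59.8 57
```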
3. (a) Explain what is probability in statistics. State the situation under which multiplicative probability model is used. (4+12=16)
(b) State Bayes’ theorem and illustrate it with an example. (8+8=16)
-> In statistics and probability theory, the Bayes’ theorem (also known as the Bayes’ rule) is a mathematical formula used to determine the conditional probability of events. Essentially, the Bayes’ theorem describes the probability of an event based on prior knowledge of the conditions that might be relevant to the event.
The theorem is named after the English statistician Thomas Bayes, whose essay containing the result was published posthumously in 1763. It is considered the foundation of the statistical inference approach called Bayesian inference.
Besides statistics, the Bayes’ theorem is also used in various disciplines, with medicine and pharmacology as the most notable examples. In addition, the theorem is commonly employed in different fields of finance. Some of the applications include, but are not limited to, modeling the risk of lending money to borrowers or forecasting the probability of the success of an investment.
Formula for Bayes’ Theorem
The Bayes’ theorem is expressed in the following formula:
$$ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} $$
Where:
- P(A|B) – the probability of event A occurring, given event B has occurred
- P(B|A) – the probability of event B occurring, given event A has occurred
- P(A) – the probability of event A
- P(B) – the probability of event B (which must be non-zero)
Note that events A and B do not need to be independent. If they were independent, P(A|B) would simply equal P(A); the theorem is useful precisely because the occurrence of B carries information about A.
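A special case of the Bayes’ theorem is when event A is a binary variable. In such a case, the theorem is expressed in the following way:
$$ P(A^{+} \mid B) = \frac{P(B \mid A^{+})\,P(A^{+})}{P(B \mid A^{+})\,P(A^{+}) + P(B \mid A^{-})\,P(A^{-})} $$
Where:
- P(B|A^{-}) – the probability of event B occurring given that event A^{-} has occurred
- P(B|A^{+}) – the probability of event B occurring given that event A^{+} has occurred
- P(A^{+}), P(A^{-}) – the probabilities of the two outcomes of A, which sum to 1
In the special case above, events A^{-} and A^{+} are mutually exclusive outcomes of event A, and the denominator is P(B) expanded by the law of total probability.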
Example of Bayes’ Theorem
Imagine you are a financial analyst at an investment bank. According to your research of publicly-traded companies, 60% of the companies that increased their share price by more than 5% in the last three years replaced their CEOs during the period.
At the same time, only 35% of the companies that did not increase their share price by more than 5% in the same period replaced their CEOs. Knowing that the probability that the stock prices grow by more than 5% is 4%, find the probability that the shares of a company that fires its CEO will increase by more than 5%.
Before finding the probabilities, you must first define the notation of the probabilities.
· P(A) – the probability that the stock price increases by more than 5% (here 4%)
· P(B) – the probability that the CEO is replaced
· P(A|B) – the probability that the stock price increases by more than 5% given that the CEO has been replaced
· P(B|A) – the probability that the CEO is replaced given that the stock price has increased by more than 5% (here 60%)
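Using the Bayes’ theorem in its binary form (a company either does or does not increase its share price by more than 5%; write A^{c} for "does not", so P(A^{c}) = 0.96 and P(B|A^{c}) = 0.35), we can find the required probability:
$$ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^{c})\,P(A^{c})} = \frac{0.60 \times 0.04}{0.60 \times 0.04 + 0.35 \times 0.96} = \frac{0.024}{0.36} \approx 0.0667 $$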
Thus, the probability that the shares of a company that replaces its CEO will grow by more than 5% is 6.67%.
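The arithmetic of the CEO example can be checked with a few lines of Python. The sketch below simply plugs the figures given in the question into the theorem, computing P(B) with the law of total probability. The helper name `bayes` is illustrative.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_a = 0.04              # P(A): share price rises by more than 5%
p_b_given_a = 0.60      # P(B|A): CEO replaced, given such a rise
p_b_given_not_a = 0.35  # P(B|not A): CEO replaced, given no such rise

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # 0.024 + 0.336 = 0.36
p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(f"P(B) = {p_b:.2f}, P(A|B) = {p_a_given_b:.4f}")  # P(B) = 0.36, P(A|B) = 0.0667
```

Note how the low prior P(A) = 4% dominates: even though CEO replacement is more common among companies whose share price rose, only about 1 in 15 companies that replace their CEO see such a rise.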
4. (a) Discuss the methods of determining sample size. (16)
(b) Distinguish between Type I error and Type II error. (16)
-> Type I error vs. Type II error
Basis for comparison | Type I error | Type II error |
Definition | Type I error, in statistical hypothesis testing, is the error caused by rejecting a null hypothesis when it is true. | Type II error is the error that occurs when the null hypothesis is not rejected (accepted) when it is false. |
Also termed | Type I error is equivalent to false positive. | Type II error is equivalent to a false negative. |
Meaning | It is a false rejection of a true hypothesis. | It is the false acceptance of an incorrect hypothesis. |
Symbol | Type I error is denoted by α. | Type II error is denoted by β. |
Probability | The probability of type I error is equal to the level of significance. | The probability of type II error is equal to one minus the power of the test. |
Reduced | It can be reduced by decreasing the level of significance. | It can be reduced by increasing the level of significance. |
Cause | It is caused by luck or chance. | It is caused by a smaller sample size or a less powerful test. |
What is it? | Type I error is similar to a false hit. | Type II error is similar to a miss. |
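Hypothesis | Type I error is associated with rejecting a true null hypothesis. | Type II error is associated with failing to reject a false null hypothesis (i.e. wrongly rejecting a true alternative hypothesis). |
When does it happen? | It is more likely when the criterion for rejecting the null hypothesis is too lenient (a high significance level). | It is more likely when the criterion for rejecting the null hypothesis is too stringent (a low significance level). |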
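A small simulation can make the table concrete. The sketch below assumes a two-sided one-sample z-test with known standard deviation (sigma = 1), samples of size 20, and a true mean of 0.5 whenever the null hypothesis (mean = 0) is false. All of these settings are illustrative choices, not part of the original answer. When H0 is true, every rejection is a Type I error, so the rejection rate should sit near alpha. When H0 is false, every non-rejection is a Type II error. Lowering alpha from 0.05 to 0.01 should reduce the Type I rate and raise the Type II rate, as the "Reduced" row states.

```python
import random
from statistics import NormalDist, fmean

def rejects_h0(sample, mu0, sigma, alpha):
    """Two-sided one-sample z-test (sigma known). True means H0: mean == mu0 is rejected."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

def rejection_rate(true_mean, alpha, mu0=0.0, sigma=1.0, n=20, trials=10_000, seed=1):
    """Share of simulated samples (drawn with the given true mean) in which H0 is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        rejected += rejects_h0(sample, mu0, sigma, alpha)
    return rejected / trials

for alpha in (0.05, 0.01):
    type_i = rejection_rate(true_mean=0.0, alpha=alpha)       # H0 true: rejections are Type I errors
    type_ii = 1 - rejection_rate(true_mean=0.5, alpha=alpha)  # H0 false: non-rejections are Type II errors
    print(f"alpha={alpha}: Type I rate ~ {type_i:.3f}, Type II rate ~ {type_ii:.3f}")
```

With these settings the theoretical Type II rate at alpha = 0.05 is about 0.39 (power about 0.61). The simulated figures vary slightly with the seed. Increasing the sample size `n` would reduce the Type II rate without changing alpha, which matches the "Cause" row.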
5. (a) What is a non-parametric statistical test? Under what circumstances are non-parametric tests used? (6+10=16)
-> In statistics, nonparametric tests are methods of statistical analysis that do not require the data to follow a particular distribution (in particular, they do not require the data to be normally distributed). For this reason, they are sometimes referred to as distribution-free tests. Nonparametric tests serve as an alternative to parametric tests such as the t-test or ANOVA, which can be employed only if the underlying data satisfy certain criteria and assumptions.
Nonparametric tests are used as an alternative method to parametric tests, not as their substitutes. In other words, if the data meets the required assumptions for performing the parametric tests, the relevant parametric test must be applied.
In addition, in some cases, even if the data do not meet the necessary assumptions but the sample size of the data is large enough, we can still apply the parametric tests instead of the nonparametric tests.
Reasons to Use Nonparametric Tests
In order to achieve correct results from statistical analysis, we should know the situations in which nonparametric tests are appropriate. The main reasons to apply a nonparametric test include the following:
1. The underlying data do not meet the assumptions about the population sample
Generally, the application of parametric tests requires various assumptions to be satisfied: for example, that the data follow a normal distribution and that the population variances are homogeneous. However, some data samples show skewed distributions.
Skewness makes parametric tests less powerful, because the mean is no longer the best measure of central tendency when it is strongly affected by extreme values. Nonparametric tests, on the other hand, work well with skewed distributions and with distributions that are better represented by the median.
2. The population sample size is too small
The sample size is an important consideration in selecting the appropriate statistical method. If the sample size is reasonably large, the applicable parametric test can be used. However, if the sample size is too small, it may not be possible to validate the distribution of the data, so a nonparametric test is usually the safer option.
3. The analyzed data is ordinal or nominal
Unlike parametric tests that can work only with continuous data, nonparametric tests can be applied to other data types such as ordinal or nominal data. For such types of variables, the nonparametric tests are the only appropriate solution.
Types of Tests
Nonparametric tests include numerous methods and models. Below are the most common tests and their corresponding parametric counterparts:
1. Mann-Whitney U Test
The Mann-Whitney U Test is a nonparametric version of the independent samples t-test. The test primarily deals with two independent samples that contain ordinal data.
2. Wilcoxon Signed Rank Test
The Wilcoxon Signed Rank Test is a nonparametric counterpart of the paired samples t-test. The test compares two dependent samples with ordinal data.
3. The Kruskal-Wallis Test
The Kruskal-Wallis Test is a nonparametric alternative to the one-way ANOVA. The Kruskal-Wallis test is used to compare more than two independent groups with ordinal data.
(b) Explain the Wilcoxon Signed Rank test with appropriate examples. (16)
-> The Wilcoxon signed rank test (also called the Wilcoxon signed rank sum test) is a non-parametric test for comparing data. When the term “non-parametric” is used in statistics, it does not mean that you know nothing about the population; it usually means that you cannot assume the population data follow a normal distribution. The Wilcoxon signed rank test should be used if the differences between pairs of data are not normally distributed.
Two slightly different versions of the test exist:
- The Wilcoxon signed rank test compares your sample median against a hypothetical median.
- The Wilcoxon matched-pairs signed rank test computes the difference between each set of matched pairs, then follows the same procedure as the signed rank test to compare the sample against some median.
The term “Wilcoxon” is often used for either test. This usually isn’t confusing, as it should be obvious if the data is matched, or not matched.
The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used to compare two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ (i.e., it is a paired difference test). It can be used as an alternative to the paired Student’s t-test (also known as the “t-test for matched pairs” or “t-test for dependent samples”) when the differences between the paired observations cannot be assumed to be normally distributed. It can also be used to determine whether two dependent samples were selected from populations having the same distribution.
Assumptions
1. Data are paired and come from the same population.
2. Each pair is chosen randomly and independently.
3. The data are measured on at least an interval scale when, as is usual, within-pair differences are calculated to perform the test (though it suffices for within-pair comparisons to be on an ordinal scale).
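As a worked example, the matched-pairs procedure can be sketched in Python. The before/after scores for eight people are hypothetical, invented for illustration. The steps are: take each pair's difference, drop zero differences, rank the absolute differences (tied values share the average rank), and sum the ranks of the positive and negative differences separately. The test statistic T is the smaller of the two sums. For small samples, the exact two-sided p-value can be found by enumerating every possible +/- sign pattern, since under H0 each sign pattern is equally likely.

```python
from itertools import product

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def wilcoxon_signed_rank(before, after):
    """Return (W+, W-, T, exact two-sided p) for paired samples; zero differences are dropped."""
    diffs = [b - a for a, b in zip(before, after) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    t_stat = min(w_plus, w_minus)
    # Under H0 each rank is equally likely to carry a + or - sign, so count
    # the sign patterns at least as extreme as the observed one (small n only).
    total = w_plus + w_minus
    extreme = 0
    for signs in product((False, True), repeat=len(ranks)):
        w = sum(r for positive, r in zip(signs, ranks) if positive)
        if min(w, total - w) <= t_stat:
            extreme += 1
    return w_plus, w_minus, t_stat, extreme / 2 ** len(ranks)

# Hypothetical scores of eight people before and after a training course
before = [72, 65, 80, 58, 90, 77, 69, 84]
after = [78, 70, 79, 66, 95, 77, 75, 88]
print(wilcoxon_signed_rank(before, after))  # (27.0, 1.0, 1.0, 0.03125)
```

For these numbers the differences are 6, 5, -1, 8, 5, 0, 6, 4. The zero is dropped, leaving n = 7 with W+ = 27, W- = 1 and T = 1. The exact two-sided p-value is 4/128 = 0.03125. At the 5% level, the null hypothesis of no difference would therefore be rejected. The one-sample version follows the same steps applied to the differences from the hypothesised median. In practice, a library routine such as SciPy's `scipy.stats.wilcoxon` would usually be used instead of hand-written code.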