2015
COMMERCE
Paper: 203
(Research Methodology and Statistical Analysis)
Full Marks: 80
Time: 3 hours
The figures in the margin indicate full marks for the questions
1. (a) What are the factors that have to be taken into consideration by a researcher while choosing the type of research design? (16)
-> Research methodology is a scientific and logical technique that helps you decide on an appropriate research method to collect data. When you start your research, the first question that hits your mind is “What type of research do I need to meet my research objectives?” Selecting a research methodology is one of the most critical factors that can make or break your research project. Following are the factors to be considered while deciding your research methodology:
Research Goal
Think of your research goals. Consider what your research project wants to accomplish, which will help you to decide on the research design. Do you need to find out all the information in one fell swoop, or do you want to conduct follow-up research? If you have an outline of the information that you need at the end of the research project, you will be able to use the right methodology to choose the right research method.
Statistical significance
Another essential factor to consider while choosing the research methodology is statistical results. If you need clear and highly data-driven research results or statistical answers, you will need quantitative data. However, if your research questions are based on the understanding of reasons, opinions, perceptions and motivations, your data will be less statistical and more thematic.
Quantitative vs qualitative data
Your research methodology will decide whether you need qualitative or quantitative methods, or both. If you want to capture insights into a problem to develop ideas for a solution, you will use qualitative data, collected with tools such as open-ended interviews with the target audience. However, if your questions call for measurable, numerical answers, quantitative tools such as surveys can be the best approach to achieve the desired results.
Sample size
While considering a research methodology, the sample size is an important consideration. How big does your sample size need to be to answer the research questions and meet the research objectives? Will you prefer surveying 50 or 1,000 people? If you need a large sample size, you should avoid time-consuming methods such as face-to-face interviews.
Timing
The availability of time is another crucial factor that comes into play when deciding on a research methodology. If you need results within a short time frame, you might consider using tools and techniques that allow data collection in just a few days. For instance, random or convenience sampling can be your preferred data collection technique. However, if your prescribed data collection period is relatively long, you can organize in-person interviews with your samples.
(b) A soft drink company wants to launch a carbonated soft drink in an island inhabited by tribal people who have never tasted such a drink. As a consultant to the firm, you have been assigned the task of finding out the possibility of success of such a drink and the possible mode of operation for a ‘product launch’. What is the type of research design you are likely to adopt and why? 16
2. (a) Explain the three well known methods of measuring central tendency. (16)
A measure of central tendency (also referred to as measures of centre or
central location) is a summary measure that attempts to describe a whole
set of data with a single value that represents the middle or centre of its
distribution.
There are three main measures of central tendency: the mode, the median and
the mean. Each of these measures describes a different indication of the
typical or central value in the distribution.
Mode
The mode is the most commonly occurring value in a distribution.
Consider this dataset showing the retirement age of 11 people, in whole
years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
This table shows a simple frequency distribution of the retirement age
data.
Age | Frequency |
54 | 3 |
55 | 1 |
56 | 1 |
57 | 2 |
58 | 2 |
60 | 2 |
The most commonly occurring value is 54; therefore the mode of this
distribution is 54 years.
Advantage of the mode:
The mode has an advantage over the median and the mean as it can be found
for both numerical and categorical (non-numerical) data.
Limitations of the mode:
There are some limitations to using the mode. In some distributions, the
mode may not reflect the centre of the distribution very well. When the
distribution of retirement age is ordered from lowest to highest value, it
is easy to see that the centre of the distribution is 57 years, but the
mode is lower, at 54 years.
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, and 60
It is also possible for there to be more than one mode for the same
distribution of data (bi-modal or multi-modal). The presence of more than
one mode can limit the ability of the mode in describing the centre or
typical value of the distribution because a single value to describe the
centre cannot be identified.
In some cases, particularly where the data are continuous, the distribution
may have no mode at all (i.e. if all values are different).
In cases such as these, it may be better to consider using the median or
mean, or to group the data into appropriate intervals and find the modal
class.
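The mode calculation above can be verified with a short Python sketch using the standard library's statistics module; multimode also covers the bi-modal and multi-modal cases mentioned above.

```python
import statistics

# Retirement ages from the example above
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# statistics.mode returns the single most common value;
# statistics.multimode returns all of them, covering bi-modal data
print(statistics.mode(ages))       # 54
print(statistics.multimode(ages))  # [54]
```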
Median
The median is the middle value in distribution when the values are arranged
in ascending or descending order.
The median divides the distribution in half (there are 50% of observations
on either side of the median value). In a distribution with an odd number
of observations, the median value is the middle value.
Looking at the retirement age distribution (which has 11 observations), the
median is the middle value, which is 57 years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
When the distribution has an even number of observations, the median value
is the mean of the two middle values. In the following distribution, the
two middle values are 56 and 57, therefore the median equals 56.5 years:
52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
Advantage of the median:
The median is less affected by outliers and skewed data than the mean, and
is usually the preferred measure of central tendency when the distribution
is not symmetrical.
Limitation of the median:
The median cannot be identified for categorical nominal data, as it cannot
be logically ordered.
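Both median cases from the example can be checked with Python's statistics module:

```python
import statistics

# 11 observations (odd): the median is the middle value
odd_data = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
print(statistics.median(odd_data))   # 57

# 12 observations (even): the median is the mean of the two middle values
even_data = [52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
print(statistics.median(even_data))  # 56.5
```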
Mean
The mean is the sum of the value of each observation in a dataset divided
by the number of observations. This is also known as the arithmetic
average.
Looking at the retirement age distribution again:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The mean is calculated by adding together all the values
(54+54+54+55+56+57+57+58+58+60+60 = 623) and dividing by the number of
observations (11) which equals 56.6 years.
Advantage of the mean:
The mean can be used for both continuous and discrete numeric data.
Limitations of the mean:
The mean cannot be calculated for categorical data, as the values cannot be
summed.
As the mean includes every value in the distribution the mean is influenced
by outliers and skewed distributions.
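A quick check of the arithmetic in Python, using the same retirement-age data:

```python
import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

total = sum(ages)             # 623
mean = total / len(ages)      # 623 / 11
print(total, round(mean, 1))  # 623 56.6
assert mean == statistics.mean(ages)
```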
3. (a) Explain the statistical Decision Theory? (16)
-> Every individual has to make some decision or other regarding his everyday activities. Decisions of a routine nature do not involve high risks and are consequently trivial in nature. When business executives make decisions, their decisions affect other people such as consumers of the product, shareholders of the business unit, and employees of the organization.
Such decisions, which affect other people in society, call for a very careful and objective analysis of their consequences. The statistician's task is to split a decision problem into its simple components and study whether any or some of them are amenable to scientific treatment; he then tries to work out a method by which these components can be woven into a coherent and consistent solution of the problem as a whole.
The decision problems can be classified into five types and they are:
1. Decision Making Under Certainty:
There are a few problems where the decision maker gets almost complete information, so that he knows all the facts about the states of nature, which state of nature will occur, and the consequences of that state of nature. In such a situation, the problem of decision making is simple because the decision maker has only to choose the strategy which will give him the maximum pay-off in terms of utility.
In cases where the number of possible strategies is very large and it is impossible even to list them, techniques of operational research like linear and non-linear programming and geometric programming would have to be used to arrive at the optimal strategy.
2. Decision Making Under Risk:
A problem of this kind arises when the state of nature is unknown, but based on the objective or empirical evidence, we can possibly assign probabilities to various states of nature. In a number of problems on the basis of historical data and past experience, we are able to assign probabilities to various states of nature. In such cases, the pay-off matrix is of immense help for reaching an optimal decision by assigning probabilities to various states of nature.
3. Decision Making Under Uncertainty:
The process of making decision under conditions of uncertainty takes place when there is hardly any knowledge about states of nature and no objective information about their probabilities of occurrence. In such cases of absence of historical data and relative frequency, the probability of the occurrence of the particular state of nature cannot be indicated.
Such situations arise when a new product is introduced or a new plant is set up. Of course, even in such cases some market surveys are conducted and relevant information is gathered though it is not generally sufficient to indicate a probability figure for the occurrence of a particular state of nature.
4. Decision Making Under Partial Information:
This type of situation is somewhere between the conditions of risk and conditions of uncertainty. As regards conditions of risk, we have seen that the probability of the occurrence of various states of nature are known as the basis of past experience, and in conditions of uncertainty, there is no such data available. But many situations arise where there is partial availability of data. In such circumstances, we can say that decision making is done on the basis of partial information.
5. Decision Making Under Conflict:
A condition of conflict is supposed to occur when we are dealing with a rational opponent rather than the state of nature. The decision maker, therefore, has to choose a strategy taking into consideration the action or counter-action of his opponent. Brand competition, military weapons, the market place, etc. are problems which come under this category. The strategy choice is made on the basis of game theory, where a decision maker anticipates the action of the opponent and then determines his own strategy.
The main purpose of studying decision theory is to put the problem into a suitable logical framework. This involves identifying the problem (personal perception and innovativeness are two essentials for this), then generating alternative courses of action, and finally evolving criteria for evaluating the different alternatives to arrive at the best choice of action.
The basic components of a decision situation are the following:
1. Acts:
There are many alternative courses of action in any decision problem. But only some relevant alternatives need be considered. For instance, the business firm may decide to market its goods within the state or within the country or beyond the boundaries of the country. Here, there are three alternatives. There may be more such alternatives. The final choice of any one will depend upon the payoffs from each strategy.
2. States of Nature:
These are the possible events or states of nature which are uncertain but are vital for the choice of any one of the alternative acts. For example, the radio dealer does not know how many radios he will be able to sell. There is an element of uncertainty about it, and for this reason he cannot decide how many radios to buy. This uncertainty is known as the state of nature or the state of the world.
3. Outcomes:
There is an outcome for the combination of each of the likely acts and possible states of nature. This is otherwise known as the conditional value. The outcome has little significance unless we calculate the pay-offs in terms of monetary gain or loss for each outcome. Thus, outcome refers to the result of the combination of an act and each of the states of nature.
4. Pay-off:
The pay-off deals with the monetary gain or loss from each of the outcomes. It can also be in terms of cost-saving or time-saving, but the pay-off should always be expressed in quantitative terms to permit precise analysis. Where the value of an outcome is expressed directly as a gain in money, it is called the pay-off. The calculation of the pay-off or utility of each outcome has to be done carefully.
5. Expected Values of Each Act:
In practical business situation, there is risk and uncertainty. In the case of risk, the probability of each state of nature is known, and in uncertainty, it is unknown. Therefore, each likely outcome of an act has to be appraised with reference to the probability of occurrence.
The expected value of a given act A_j can be calculated by the following formula:
EV(A_j) = P_1O_1j + P_2O_2j + … + P_nO_nj = Σ_{i=1}^{n} P_iO_ij
where P_1 to P_n refer to the probabilities of events E_1 to E_n, and O_ij is the pay-off of the outcome resulting from the combination of event E_i and act A_j. The expected value of each alternative is thus calculated with reference to the probability assigned to each state of nature.
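A minimal Python sketch of the expected-value rule; the acts, states of nature, probabilities, and pay-offs below are invented for illustration:

```python
# Hypothetical decision problem: three acts, three states of nature
probabilities = [0.3, 0.5, 0.2]        # P_1..P_n for states E_1..E_n

# payoffs[act] = [O_1j, O_2j, O_3j], the pay-off under each state
payoffs = {
    "stock low":    [20, 20, 20],
    "stock medium": [10, 35, 35],
    "stock high":   [0, 25, 50],
}

# EV(A_j) = sum over i of P_i * O_ij
expected_values = {
    act: sum(p * o for p, o in zip(probabilities, outcomes))
    for act, outcomes in payoffs.items()
}
best_act = max(expected_values, key=expected_values.get)
print(expected_values)
print(best_act)  # the act with the highest expected pay-off
```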
(b) Explain the Bayes Theorem. (16)
-> In statistics and probability theory, the Bayes’ theorem (also known as the Bayes’ rule) is a mathematical formula used to determine the conditional probability of events. Essentially, the Bayes’ theorem describes the probability of an event based on prior knowledge of the conditions that might be relevant to the event.
The theorem is named after the English statistician Thomas Bayes, whose formulation was published posthumously in 1763. It is considered the foundation of the special statistical inference approach called Bayesian inference.
Besides statistics, the Bayes’ theorem is also used in various disciplines, with medicine and pharmacology as the most notable examples. In addition, the theorem is commonly employed in different fields of finance. Some of the applications include, but are not limited to, modeling the risk of lending money to borrowers or forecasting the probability of the success of an investment.
Formula for Bayes’ Theorem
The Bayes’ theorem is expressed in the following formula:
P(A|B) = [P(B|A) × P(A)] / P(B)
Where:
· P(A|B) – the probability of event A occurring, given event B has occurred
· P(B|A) – the probability of event B occurring, given event A has occurred
- P(A) – the probability of event A
- P(B) – the probability of event B
Note that Bayes’ theorem does not require events A and B to be independent; indeed, if A and B were independent, then P(A|B) would simply equal P(A) and the theorem would be uninformative.
A special case of the Bayes’ theorem is when event A is a binary variable. In such a case, the theorem is expressed in the following way:
P(A^{+}|B) = [P(B|A^{+}) × P(A^{+})] / [P(B|A^{+}) × P(A^{+}) + P(B|A^{–}) × P(A^{–})]
Where:
- P(B|A^{–}) – the probability of event B occurring given that event A^{–} has occurred
- P(B|A^{+}) – the probability of event B occurring given that event A^{+} has occurred
In the special case above, events A^{–} and A^{+} are mutually exclusive outcomes of event A.
Example of Bayes’ Theorem
Imagine you are a financial analyst at an investment bank. According to your research of publicly-traded companies, 60% of the companies that increased their share price by more than 5% in the last three years replaced their CEOs during the period.
At the same time, only 35% of the companies that did not increase their share price by more than 5% in the same period replaced their CEOs. Knowing that the probability that the stock prices grow by more than 5% is 4%, find the probability that the shares of a company that fires its CEO will increase by more than 5%.
Before finding the probabilities, you must first define the notation of the probabilities.
- P(A) – the probability that the stock price increases by more than 5%
- P(B) – the probability that the CEO is replaced
- P(A|B) – the probability that the stock price increases by more than 5%, given that the CEO has been replaced
- P(B|A) – the probability that the CEO is replaced, given that the stock price has increased by more than 5%
Using the Bayes’ theorem, we can find the required probability:
P(A|B) = [P(B|A) × P(A)] / [P(B|A) × P(A) + P(B|A^{–}) × P(A^{–})]
= (0.60 × 0.04) / (0.60 × 0.04 + 0.35 × 0.96)
= 0.024 / 0.36 ≈ 0.0667
Thus, the probability that the shares of a company that replaces its CEO will grow by more than 5% is 6.67%.
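The arithmetic can be verified with a few lines of Python using the figures from the example:

```python
# A = share price grows by more than 5%; B = CEO is replaced
p_b_given_a     = 0.60   # P(B|A)
p_b_given_not_a = 0.35   # P(B|A-)
p_a             = 0.04   # P(A)

# Bayes' theorem, with the law of total probability in the denominator
p_a_given_b = (p_b_given_a * p_a) / (
    p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
)
print(round(p_a_given_b, 4))  # 0.0667
```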
4. (a) What is Null Hypothesis and Alternate Hypothesis? (16)
Null hypothesis definition
The null hypothesis is a general statement that states that there is no relationship between two phenomena under consideration or that there is no association between two groups.
· A hypothesis, in general, is an assumption that is yet to be proved with sufficient pieces of evidence. A null hypothesis thus is the hypothesis a researcher is trying to disprove.
· A null hypothesis is a hypothesis capable of being objectively verified, tested, and even rejected.
· If a study is to compare method A with method B, and if the study proceeds on the assumption that both methods are equally good, then this assumption is termed the null hypothesis.
· The null hypothesis should always be a specific hypothesis, i.e., it should state an exact value rather than an approximate one.
Null hypothesis symbol
- The symbol for the null hypothesis is H_{0}, and it is read as H-null, H-zero, or H-naught.
· The null hypothesis is usually associated with the ‘equals to’ sign, as a null hypothesis can either be accepted or rejected.
Null hypothesis purpose
· The main purpose of a null hypothesis is to verify/ disprove the proposed statistical assumptions.
- Some scientific null hypotheses help to advance a theory.
· The null hypothesis is also used to verify the consistency of results across multiple experiments. For example, a null hypothesis stating that there is no relation between some medication and the age of the patients supports the conclusion that the medication is generally effective, and allows recommendations across age groups.
Null hypothesis principle
- The principle of the null hypothesis is to collect data and determine the chances of observing such data in a random sample if the null hypothesis were true.
· In studies where the collected data don’t meet the expectations of the null hypothesis, it is concluded that the data don’t provide sufficient or reliable evidence to support the null hypothesis, and it is thus rejected.
· The data collected are tested through a statistical tool designed to measure the extent of departure of the data from the null hypothesis.
· The procedure decides whether the observed departure obtained from the statistical tool is larger than a defined value, chosen so that the probability of such a large departure is very small under the null hypothesis.
· However, some data might not contradict the null hypothesis; in that case only a weak conclusion can be made: the data don’t provide strong evidence against the null hypothesis, which might or might not be true.
· Under some other conditions, if the data collected is sufficient and is capable of providing enough evidence, the null hypothesis can be considered valid, indicating no relationship between the phenomena.
Alternative hypothesis definition
An alternative hypothesis is a statement that asserts that there is a relationship between two selected variables in a study.
· An alternative hypothesis is usually used to state that a new theory is preferable to the old one (null hypothesis).
· This hypothesis can be simply termed as an alternative to the null hypothesis.
· The alternative hypothesis is the hypothesis to be proved; it indicates that the results of a study are significant and that the sample observations result not just from chance but from some non-random cause.
· If a study is to compare method A with method B, and we assume that method A is superior or that method B is inferior, then such a statement is termed an alternative hypothesis.
· Alternative hypotheses should be clearly stated, considering the nature of the research problem.
Alternative hypothesis symbol
- The symbol of the alternative hypothesis is either H_{1} or H_{a}, and it is stated using the ‘less than’, ‘greater than’, or ‘not equal to’ signs.
Alternative hypothesis purpose
· An alternative hypothesis provides the researchers with some specific restatements and clarifications of the research problem.
· An alternative hypothesis provides a direction to the study, which then can be utilized by the researcher to obtain the desired results.
· Since the alternative hypothesis is selected before conducting the study, it allows the test to prove that the study is supported by evidence, separating it from the researchers’ desires and values.
· An alternative hypothesis provides a chance of discovering new theories that can disprove an existing one that might not be supported by evidence.
· The alternative hypothesis is important as it asserts that a relationship exists between the two selected variables and that the results of the study conducted are relevant and significant.
Alternative hypothesis principle
· The principle behind the alternative hypothesis is similar to that of the null hypothesis.
· The alternative hypothesis is based on the concept that when sufficient evidence is collected from the data of random sample, it provides a basis for proving the assumption made by the researcher regarding the study.
· Like in the null hypothesis, the data collected from a random sample is passed through a statistical tool that measures the extent of departure of the data from the null hypothesis.
· If the departure is large under the selected level of significance, the alternative hypothesis is accepted and the null hypothesis is rejected.
· If the collected data are unlikely to have arisen merely by chance in a random sample and instead reflect a real relationship within the population, the alternative hypothesis stands true.
Null hypothesis vs Alternative hypothesis
Basis of comparison | Null hypothesis | Alternative hypothesis |
Definition | The null hypothesis is a general statement that states that there is no relationship between two phenomena under consideration or that there is no association between two groups. | An alternative hypothesis is a statement that asserts that there is a relationship between two selected variables in a study. |
Symbol | It is denoted by H_{0}. | It is denoted by H_{1} or H_{a}. |
Mathematical expression | It is followed by ‘equals to’ sign. | It is followed by not equals to, ‘less than’ or ‘greater than’ sign. |
Observation | The null hypothesis believes that the results are observed as a result of chance. | The alternative hypothesis believes that the results are observed as a result of some real causes. |
Nature | It is the hypothesis that the researcher tries to disprove. | It is a hypothesis that the researcher tries to prove. |
Result | The result of the null hypothesis indicates no changes in opinions or actions. | The result of an alternative hypothesis causes changes in opinions and actions. |
Significance of data | If the null hypothesis is accepted, the results of the study become insignificant. | If an alternative hypothesis is accepted, the results of the study become significant. |
Acceptance | If the p-value is greater than the level of significance, the null hypothesis is accepted. | If the p-value is smaller than the level of significance, an alternative hypothesis is accepted. |
Importance | The null hypothesis allows the acceptance of correct existing theories and the consistency of multiple experiments. | The alternative hypothesis is important as it establishes a relationship between two variables, resulting in new, improved theories. |
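The decision rule in the table above (compare the p-value with the chosen level of significance) can be sketched in Python. The one-sample z-test and the numbers below are hypothetical illustrations, not part of the original answer:

```python
import math

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed z-test of H0: mu = pop_mean against H1: mu != pop_mean."""
    z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
    # two-tailed p-value from the standard normal CDF (via math.erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

# Hypothetical sample: n = 36, sample mean 52, claimed population mean 50,
# known population standard deviation 6
z, p, decision = one_sample_z_test(52, 50, 6, 36)
print(round(z, 2), round(p, 4), decision)  # 2.0 0.0455 reject H0
```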
(b) How can you express the reliability of an estimated population mean? (16)
-> The most fundamental point and interval estimation process involves the estimation of a population mean. Suppose it is of interest to estimate the population mean, μ, for a quantitative variable. Data collected from a simple random sample can be used to compute the sample mean, x̄, where the value of x̄ provides a point estimate of μ.
When the sample mean is used as a point estimate of the population mean, some error can be expected owing to the fact that a sample, or subset of the population, is used to compute the point estimate. The absolute value of the difference between the sample mean, x̄, and the population mean, μ, written |x̄ − μ|, is called the sampling error. Interval estimation incorporates a probability statement about the magnitude of the sampling error. The sampling distribution of x̄ provides the basis for such a statement.
Statisticians have shown that the mean of the sampling distribution of x̄ is equal to the population mean, μ, and that the standard deviation is given by σ/√n, where σ is the population standard deviation. The standard deviation of a sampling distribution is called the standard error. For large sample sizes, the central limit theorem indicates that the sampling distribution of x̄ can be approximated by a normal probability distribution. As a matter of practice, statisticians usually consider samples of size 30 or more to be large.
In the large-sample case, a 95% confidence interval estimate for the population mean is given by x̄ ± 1.96σ/√n. When the population standard deviation, σ, is unknown, the sample standard deviation is used to estimate σ in the confidence interval formula. The quantity 1.96σ/√n is often called the margin of error for the estimate. The quantity σ/√n is the standard error, and 1.96 is the number of standard errors from the mean necessary to include 95% of the values in a normal distribution. The interpretation of a 95% confidence interval is that 95% of the intervals constructed in this manner will contain the population mean. Thus, any interval computed in this manner has a 95% confidence of containing the population mean. By changing the constant from 1.96 to 1.645, a 90% confidence interval can be obtained. It should be noted from the formula for an interval estimate that a 90% confidence interval is narrower than a 95% confidence interval and as such has a slightly smaller confidence of including the population mean. Lower levels of confidence lead to even narrower intervals. In practice, a 95% confidence interval is the most widely used.
Owing to the presence of the √n term in the denominator of the formula for an interval estimate, the sample size affects the margin of error. Larger sample sizes lead to smaller margins of error. This observation forms the basis for procedures used to select the sample size. Sample sizes can be chosen such that the confidence interval satisfies any desired requirements about the size of the margin of error.
The procedure just described for developing interval estimates of a population mean is based on the use of a large sample. In the small-sample case—i.e., where the sample size n is less than 30—the t distribution is used when specifying the margin of error and constructing a confidence interval estimate. For example, at a 95% level of confidence, a value from the t distribution, determined by the value of n, would replace the 1.96 value obtained from the normal distribution. The t values will always be larger, leading to wider confidence intervals, but, as the sample size becomes larger, the t values get closer to the corresponding values from a normal distribution. With a sample size of 25, the t value used would be 2.064, as compared with the normal probability distribution value of 1.96 in the large-sample case.
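The large-sample procedure can be sketched in Python; the sample data here are invented, and the 1.96 multiplier is taken from the text:

```python
import math
import statistics

# Hypothetical sample of 40 observations (n >= 30, so the large-sample case)
sample = [48, 52, 55, 47, 53, 50, 49, 51, 54, 46] * 4

n = len(sample)
x_bar = statistics.mean(sample)          # point estimate of mu
s = statistics.stdev(sample)             # estimates sigma when it is unknown
standard_error = s / math.sqrt(n)        # sigma / sqrt(n)

margin_of_error = 1.96 * standard_error  # 95% confidence level
interval = (x_bar - margin_of_error, x_bar + margin_of_error)
print(round(interval[0], 2), round(interval[1], 2))
```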
Estimation of other parameters
For qualitative variables, the population proportion is a parameter of interest. A point estimate of the population proportion is given by the sample proportion. With knowledge of the sampling distribution of the sample proportion, an interval estimate of a population proportion is obtained in much the same fashion as for a population mean. Point and interval estimation procedures such as these can be applied to other population parameters as well. For instance, interval estimation of a population variance, standard deviation, and total can be required in other applications.
Estimation procedures for two populations
The estimation procedures can be extended to two populations for comparative studies. For example, suppose a study is being conducted to determine differences between the salaries paid to a population of men and a population of women. Two independent simple random samples, one from the population of men and one from the population of women, would provide two sample means, x̄_{1} and x̄_{2}. The difference between the two sample means, x̄_{1} − x̄_{2}, would be used as a point estimate of the difference between the two population means. The sampling distribution of x̄_{1} − x̄_{2} would provide the basis for a confidence interval estimate of the difference between the two population means. For qualitative variables, point and interval estimates of the difference between population proportions can be constructed by considering the difference between sample proportions.
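A minimal sketch of the two-population case, with hypothetical independent samples and the large-sample 1.96 multiplier:

```python
import math
import statistics

# Hypothetical independent simple random samples (salaries, in thousands)
men   = [52, 55, 49, 58, 53, 50, 56, 54] * 5   # n1 = 40
women = [48, 51, 47, 53, 50, 46, 52, 49] * 5   # n2 = 40

x1, x2 = statistics.mean(men), statistics.mean(women)
point_estimate = x1 - x2                        # estimate of mu1 - mu2

# standard error of the difference between two independent sample means
se = math.sqrt(statistics.variance(men) / len(men)
               + statistics.variance(women) / len(women))
margin = 1.96 * se                              # 95% large-sample interval
print(round(point_estimate, 2),
      (round(point_estimate - margin, 2), round(point_estimate + margin, 2)))
```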
5. (a) How is Chi-square used as a non parametric test? (16)
-> Reasons why Chi-square is suitable as a non-parametric test:
1. If the sample size is very large, the test will almost always prove significant; hence it is not reliable for very large samples.
2. Chi-square makes no assumptions about population parameters and is not used to determine whether the sample represents the population in a parametric sense. This is why Chi-square behaves well as a non-parametric technique.
3. The statistical data types suitable for Chi-square are the nominal and ordinal types, which fall under descriptive statistics.
4. To avoid confusion, it should be stated that Chi-square also behaves well for inferential statistics if the population is known and small, with no need for probability sampling.
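As an illustration of chi-square used as a non-parametric (distribution-free) test, the sketch below runs a goodness-of-fit test on invented die-roll counts; the critical value 11.07 (df = 5 at the 5% level) is taken from standard chi-square tables:

```python
# Hypothetical observed frequencies for 60 rolls of a die
observed = [5, 8, 9, 8, 10, 20]
expected = [10] * 6                 # H0: the die is fair (uniform)

# chi-square statistic: sum of (O - E)^2 / E over all categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical = 11.07                    # df = 6 - 1 = 5, alpha = 0.05
decision = "reject H0" if chi_sq > critical else "fail to reject H0"
print(round(chi_sq, 2), decision)   # 13.4 reject H0
```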
(b) Briefly outline the steps in writing the final report in case of an empirical study. (16)
->