08 Oct Discovering statistics using SPSS
To be used in conjunction with Field, A. P. (2009). Discovering statistics using SPSS (third edition). London: Sage. Questions are listed under the chapter they best represent; however, they should not be given to students with the chapter numbers indicated (or else it will make the answers to some questions fairly obvious!). Correct answers are denoted with a .
Chapter 1 – Everything you wanted to know about Statistics
The standard deviation is the square root of
the coefficient of determination
sum of squares
variance
range
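The relationship asked about above can be checked numerically. The following is a minimal Python sketch; the scores are invented for illustration and are not taken from the book.

```python
import math
import statistics

# Hypothetical scores, invented purely for illustration.
scores = [22, 26, 30, 24, 28]

variance = statistics.variance(scores)  # sample variance (N - 1 denominator)
std_dev = statistics.stdev(scores)      # sample standard deviation

# The standard deviation is the square root of the variance.
print(math.isclose(std_dev, math.sqrt(variance)))
```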
A frequency distribution in which low scores are most frequent (i.e. bars on the graph are highest on the left hand side) is said to be:
Positively skewed
Leptokurtic
Platykurtic
Negatively skewed
If the scores on a test have a mean of 26 and a standard deviation of 4, what is the z-score for a score of 18?
a. –2
b. 11
c. 2
d. –1.41
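The arithmetic behind this question can be sketched in a few lines of Python. The function name `z_score` is an illustrative choice; the numbers are those given in the question.

```python
def z_score(score, mean, sd):
    """Return how many standard deviations `score` lies from `mean`."""
    return (score - mean) / sd

# Values from the question: mean 26, standard deviation 4, score 18.
z = z_score(18, mean=26, sd=4)
print(z)  # -2.0
```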
Which of the following is true about a 95% confidence interval of the mean of a given sample:
95 out of 100 sample means will fall within the limits of the confidence interval.
There is a 95% chance that the population mean will fall within the limits of the confidence interval.
95 out of 100 population means will fall within the limits of the confidence interval.
There is a 0.05 probability that the population mean falls within the limits of the confidence interval.
What does a significant test statistic tell us?
There is an important effect.
The null hypothesis is false.
There is an effect in the population of sufficient magnitude to be scientifically interesting.
All of the above.
A type I error is when
We conclude that there is a meaningful effect in the population when in fact there is not.
We conclude that there is not a meaningful effect in the population when in fact there is.
We conclude that the test statistic is significant when in fact it is not.
The data we have typed into SPSS is different to the data collected.
If we calculated an effect size and found it was r = .42, which expression would best describe the size of the effect?
small
small-to-medium
large
medium-to-large
Which of these statements about statistical power is not true:
Power is the ability of a test to detect an effect.
We can use power to determine how big a sample is required to detect an effect of a certain size.
Power is linked to the probability of making a type I error.
All of the above are true.
What is a significance level?
The level at which statistics finally become meaningful to a student
The impact that reporting statistics incorrectly could have
A pre-set level of probability that the results are correct
A pre-set level of probability at which it will be accepted that results are due to chance or not.
What is the conventional level of probability that is often accepted when conducting statistical tests?
a. 0.1
b. 0.05
c. 0.5
d. 0.001
A null hypothesis:
states that the experimental treatment will have an effect
is rarely used in experiments
predicts that the experimental treatment will have no effect
none of the above
Which of the following terms best describes the sentence: ‘In a blind-tasting, people will not be able to tell the difference between margarine and butter’
a directional hypothesis
an operational definition
a null hypothesis
a non-directional hypothesis
The aim of experimental research is to:
be a phenomenon
cause a phenomenon
investigate what caused a phenomenon
to prevent a phenomenon
‘Sleep deprivation will reduce the ability to perform a complex cognitive task’. State the direction of this hypothesis:
Directional
Non-Directional
Both
Not enough information given
In experiments the independent variable is manipulated to determine:
effects on the individual participants
effect on the dependent variable
effects of certain stimuli
relation to other variables
Chapter 2 – The SPSS Environment
Which of the following could not be represented by columns in the SPSS Data editor:
Levels of repeated measures variables
Items on a questionnaire
Levels of between-group variables
Total values from different questionnaires.
Ordinal level data are characterised by:
data that can be meaningfully arranged by order of magnitude
equal intervals between each adjacent score
a fixed zero
none of the above
What is the advantage of using SPSS over calculating statistics by hand?
Quantitative data analysis is so complex today it is essential to use a stats package
It reduces the chance of making errors in your calculations
It equips you with a useful transferable skill
All of the above
In SPSS, what is the ‘Data Viewer’?
A table summarising the frequencies of data for one variable
A spreadsheet into which data can be entered
A dialog box that allows you to choose a statistical test
A screen in which variables can be defined and labelled
How is a variable name different from a variable label?
It is shorter and less detailed
It is longer and more detailed
It is abstract and unspecific
It refers to codes rather than variables
What does the operation ‘Recode Into Different Variables’ do to the data?
Replaces missing data with some random scores
Reverses the position of the independent and dependent variable on a graph
Redistributes a range of values into a new set of categories and creates a new variable
Represents the data in the form of a pie chart
How would you use the drop-down menus in SPSS to generate a frequency table?
Open the Output Viewer and click: Save As → Pie Chart
Click on: Analyze → Descriptive Statistics → Frequencies
Click on: Graphs → Frequencies → Pearson
Open the Variable Viewer and recode the value labels
When crosstabulating two variables, it is conventional to:
represent the independent variable in rows and the dependent variable in columns.
assign both the dependent and independent variables to columns.
represent the dependent variable in rows and the independent variable in columns.
assign both the dependent and independent variables to rows.
In which sub-dialog box can the Chi Square test be found?
Frequencies: Percentages
Crosstabs: Statistics
Bivariate: Pearson
Sex : Female
To generate a correlation coefficient between two variables with ordinal data, which set of instructions should you give SPSS?
Analyze → Crosstabs → Descriptive Statistics → Spearman → OK
Graphs → Frequencies → [select variables] → Spearman → OK
Analyze → Compare Means → Anova table → First layer → Spearman → OK
Analyze → Correlate → Bivariate → [select variables] → Spearman → OK
Which of the following is NOT a file extension for files saved in SPSS?
.sav
.spo
.sps
.doc
If you are constructing a data file for a repeated measures design with 10 subjects and three conditions, how many columns and rows will the file have?
Ten columns and four rows
Four columns and four rows
Ten columns and ten rows
Four columns and ten rows
Why might a data file have “missing data”?
Some of a participant’s responses might be missing
There has been a mistake in saving the SPSS data file
A participant did not take part in the whole study
None of the above
What might be an appropriate way to deal with missing data?
Ignore it
Go back to the participant and demand an answer
Define missing values using the “recode” function
Start the study again taking more care with data recording
What is the correct way to record non-numerical values?
You can’t, SPSS only uses numbers
Define the variable as “string”
Recode all the values as numbers
Define the variable as “date”
Chapter 3 – Exploring Data
Which of the following are assumptions underlying the use of parametric tests (based on the normal distribution)?
the data should be normally distributed
the samples being tested should have approximately equal variances
your data should be at least interval level
all of the above
Which of the following does a box-whisker plot not display:
The range
The inter-quartile range
The lower quartile
The mean
Which of the following is least affected by outliers?
The range
The mean
The median
The standard deviation
I collected some data about how much buyers of my book liked it, on a scale of 1 (it’s utter rubbish) to 10 (I never read anything else). I ended up with a sample of 15467 people. When I looked at the distribution, I found a skew of 1.23 (SE = .65). The mean rating was 4.78. What is the z-score for the skew of my data?
a. 1.89
b. 0.53
c. -3.92
d. 3.36
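The z-score for skew is the skew value divided by its standard error. A minimal Python sketch, using the figures from the question (the variable names are illustrative):

```python
# Figures from the question: skew = 1.23, standard error of skew = 0.65.
skew = 1.23
se_skew = 0.65

# z-score for skew = skew / standard error of skew.
z_skew = skew / se_skew
print(round(z_skew, 2))  # 1.89
```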
Which of the following would be the best way to decide whether the skew in the example above is problematic?
See if the z-score is bigger than 1.96 or smaller than -1.96
See if the skew is significant at p < .05.
Use the Kolmogorov-Smirnov test.
None of the above because of the large sample size.
Which of the following is not a transformation that can be used to correct skewed data?
Log transformation
Tangent transformation
Square root transformation
Reciprocal transformation
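The three genuine transformations listed above can be sketched in Python as follows. The scores are hypothetical, and base-10 logarithms are an assumption made for the example; any base changes the result only by a constant factor.

```python
import math

# Hypothetical positively skewed scores (all > 0, as log and reciprocal require).
scores = [1, 2, 2, 3, 4, 10, 25]

log_scores = [math.log10(x) for x in scores]   # log transformation
sqrt_scores = [math.sqrt(x) for x in scores]   # square root transformation
reciprocal_scores = [1 / x for x in scores]    # reciprocal transformation

# Each transformation pulls in the long right tail; note that the
# reciprocal also reverses the order of the scores.
print(max(scores) - min(scores))
print(max(log_scores) - min(log_scores))
print(max(sqrt_scores) - min(sqrt_scores))
```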
The Kolmogorov-Smirnov test can be used to test:
Whether data are normally-distributed.
Whether group variances are equal.
Whether scores are measured at the interval level.
Whether group means differ.
The assumption of homogeneity of variance is met when:
The variance in one group is twice as big as that of a different group.
Variances in different groups are approximately equal.
The variance across groups is proportional to the means of those groups.
The variance is the same as the inter-quartile range.
If a Kolmogorov-Smirnov test is conducted and the result is significant, what does this mean for the data sample?
The data sample is normally distributed
The comparison used in the test is not valid
The data sample is not normally distributed
The test is wrong
Which of the following tests whether variances are homogenous?
Levene’s test
Bartlett’s test
Neither
Both
If a distribution is multimodal, what does this mean?
It will not be a normal distribution
The data has been entered incorrectly
It will be a normal distribution
It will have to be checked with a Levene’s test
What is an outlier?
A set of data outside the data file
A single score that is very different from the others
A score derived from a participant who has lied
A variable that cannot be quantified
Why are z-scores used to check for outliers?
They standardise scores for a known mean and standard deviation, allowing comparison
They allow you to allocate letters for missing values
A z-score is an outlier
They standardise scores in order to convert them to values closer to the mean
What does independence of data mean?
That we must never collect two sets of data from one person
That independent researchers must collect the data
That scores from one participant are free from influences from other participants
That scores in one condition are free from influences from other conditions
Which of the following is NOT a property of a variance ratio?
It can be used to demonstrate homogeneity of variances
It is one variance divided by another
It is one variance multiplied by another
It can show the effect of a treatment on several groups
Chapter 4 – Correlation
The covariance is
An unstandardized version of the correlation coefficient.
A measure of the strength of relationship between two variables.
Dependent on the units of measurement of the variables.
All of the above.
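To illustrate why the covariance depends on the units of measurement while the correlation coefficient does not, here is a minimal Python sketch. The height and weight data are invented, and the helper names `covariance` and `pearson_r` are illustrative.

```python
import math

def covariance(x, y):
    """Sample covariance: sum of cross-product deviations divided by N - 1."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Covariance standardised by the two standard deviations."""
    sd_x = math.sqrt(covariance(x, x))
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)

# Hypothetical data: the same heights expressed in metres and in centimetres.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
height_cm = [h * 100 for h in height_m]
weight_kg = [55, 68, 70, 80, 92]

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
r_m = pearson_r(height_m, weight_kg)
r_cm = pearson_r(height_cm, weight_kg)

print(cov_m, cov_cm)  # the covariance is 100 times larger in centimetres
print(r_m, r_cm)      # the correlation is the same in both units
```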
A scatterplot shows
The frequency with which values appear in the data.
The average value of groups of data.
Scores on one variable plotted against scores on a second variable.
The proportion of data falling into different categories.
Which of the following statements about Pearson’s correlation coefficient is not true?
It can be used as an effect size measure
It varies between -1 and +1
It cannot be used with binary variables (those taking on a value of 0 or 1).
It can be used on ranked data.
The correlation between two variables A and B is .12 with a significance of p < .01. What can we conclude?
That there is a substantial relationship between A and B.
That there is a small relationship between A and B.
That variable A causes variable B.
All of the above.
How much variance has been explained by a correlation of .9?
a. 81%
b. 18%
c. 9%
d. None of the above
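The proportion of variance explained is the square of the correlation coefficient. A minimal Python sketch of that arithmetic for the value in the question:

```python
# Correlation coefficient from the question.
r = 0.9

# Squaring r gives the proportion of shared variance.
r_squared = r ** 2
percent_explained = round(r_squared * 100)
print(percent_explained)  # 81
```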
When interpreting a correlation coefficient, it is important to look at:
The significance of the correlation coefficient.
The magnitude of the correlation coefficient.
The +/– sign of the correlation coefficient.
All of the above.
The relationship between two variables controlling for the effect that a third variable has on one of those variables can be expressed using a:
Semi-partial correlation.
Bivariate correlation.
Point-biserial correlation.
Partial correlation.
20 people took part in a study in which they completed two questionnaires: one that measured their musical ability and one that measured their mathematical aptitude. The two sets of scores were then analysed to determine whether the two skills were related. Which research design was used in the study?
an observational study
a case study
a correlational study
an experiment
If there were a perfect positive correlation between two interval/ratio variables, the Pearson’s r test would give a correlation coefficient of:
a. –0.33
b. +1
c. +0.88
d. –1
What is the name of the test that is used to assess the relationship between two ordinal variables?
Spearman’s rho
Phi
Cramer’s V
Chi Square
What is meant by a ‘spurious’ relationship between two variables?
One that is so illogical it cannot possibly be true
An apparent relationship that is so curious it demands further attention
A relationship that appears to be true because each variable is related to a third one
One that produces a perfect negative correlation on a scatter diagram
A researcher conducts some research in which they identify a significant positive correlation (r = 0.42) between the number of children a person has and their life satisfaction. Which of the following is it inappropriate to conclude from this research?
That having children makes people more satisfied with their life.
That someone who has children is likely to be more happy than someone who doesn’t.
That the consequences of having children are unclear.
That it is possible to predict someone’s life happiness partly on the basis of the number of children they have.
One of the factors that affects the reliability of findings from studies using correlations is:
the number of variables being investigated
the type of relationship that is found
the level of significance set at the start of the study
the number of people who take part
Correlational studies allow the researcher to:
test for differences between two variables
predict the effect of one variable upon another
make causal inferences about the relationship between two variables
identify the relationship between two variables
A positive correlation shows that:
two variables are unrelated
as one score increases so does the other
as one score increases so the other decreases
both a and b
Chapter 5 – Regression
R² is
The percentage of variance in the predictor accounted for by the outcome variable.
The proportion of variance in the outcome accounted for by the predictor variable or variables.
The proportion of variance in the predictor accounted for by the outcome variable.
The percentage of variance in the outcome accounted for by the predictor variable or variables.
Which of the following statements about the t-statistic in regression is not true?
The t-statistic tests whether the regression coefficient, b, is equal to 0.
The t-statistic provides some idea of how well a predictor predicts the outcome variable.
The t-statistic can be used to see whether a predictor variable makes a statistically significant contribution to the regression model.
The t-statistic is equal to the regression coefficient divided by its standard deviation.
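For reference, the t-statistic for a predictor is its regression coefficient b divided by the standard error of b. A minimal Python sketch; the values of b and its standard error are hypothetical and are not taken from any output in the book.

```python
def t_statistic(b, se_b):
    """t = regression coefficient divided by its standard error."""
    return b / se_b

# Hypothetical coefficient and standard error, invented for illustration.
b = 0.5
se_b = 0.125

t = t_statistic(b, se_b)
print(t)  # 4.0
```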
Which of the following statements about the F-ratio is true:
The F-ratio is the ratio of variance explained by the model to the error in the model.
The F-ratio is the ratio of variance explained by the model to the total variance in the outcome variable.
The F-ratio is the ratio of error variance to the total variance.
The F-ratio is the proportion of variance explained by the regression model.
Which of the following statements about outliers is not true?
Outliers are values very different from the rest of the data.
Outliers bias the mean.
Outliers bias regression parameters.
Outliers are influential cases.
What is multicollinearity?
When predictor variables correlate very highly with each other.
When predictor variables have a linear relationship with the outcome variable.
When predictor variables are correlated with variables not in the regression model.
When predictor variables are independent.
For which regression assumption does the Durbin-Watson statistic test?
Linearity.
Independence of errors.
Homoscedasticity.
Multicollinearity.
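The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, and values near 2 are usually read as showing little serial correlation. A minimal Python sketch, using made-up residuals:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals in case order, invented for illustration.
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, -0.1]
dw = durbin_watson(residuals)
print(dw)

# Perfectly alternating residuals push the statistic towards 4;
# identical adjacent residuals push it towards 0.
print(durbin_watson([1, -1, 1, -1]))  # 3.0
print(durbin_watson([1, 1, 1]))       # 0.0
```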
Which of the following is not a reason why multicollinearity is a problem in regression?
It limits the size of R.
It makes it difficult to assess the importance of individual predictors.
It leads to unstable regression coefficients.
It creates heteroscedasticity in the data.
Using the model in Chapter 5 (equation 5.12), how many records would be sold if £29000 was spent on advertising, it was played 19 times on radio and the band were rated 7 on the attractiveness scale?
a. 2,461,660 records
b. 2435 records
c. 2488 records
d. 2,435,050 records
Which of these statements is not true?
If the average variance inflation factor is greater than 1 then the regression model might be biased.
Tolerance values above 0.2 may indicate multicollinearity in the data.
Multicollinearity in the data is shown by a VIF (variance inflation factor) greater than 10.
The tolerance is 1 divided by the VIF (variance inflation factor).
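Tolerance and VIF are reciprocals of one another. The sketch below assumes the usual definition in which the tolerance for predictor j is 1 minus the R² obtained when predictor j is regressed on the other predictors. The R² values are hypothetical.

```python
import math

def tolerance(r_squared_j):
    """Tolerance = 1 - R^2 from regressing predictor j on the other predictors."""
    return 1.0 - r_squared_j

def vif(r_squared_j):
    """Variance inflation factor = 1 / tolerance."""
    return 1.0 / tolerance(r_squared_j)

# Hypothetical R^2 values for one predictor regressed on the others.
for r2 in (0.5, 0.9):
    print(r2, tolerance(r2), vif(r2))

# The two statistics always multiply to 1.
print(math.isclose(tolerance(0.9) * vif(0.9), 1.0))
```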
The following graph shows:
Heteroscedasticity.
Non-linearity.
Heteroscedasticity and non-linearity.
Regression assumptions that have been met.
A researcher had a categorical variable that they wanted to include as a predictor in a regression equation. The researcher was trying to predict the success of a back pain intervention, and the categorical variable was the duration of the back pain prior to treatment with 4 categories: less than 6 months, 6-12 months, 1-2 years, more than 2 years. They needed to code these variables into dummy variables for the regression using less than 6 months as their control category. Which of the following represents the correct coding scheme?
Duration of Pain
Dummy 1
(Under 6 Months vs
6-12 Months)
Dummy 2
(Under 6 Months vs 1-2 Years)
Dummy 3
(Under 6 Months vs Over 2 Years)
Under 6 Months
0
0
0
6-12 Months
1
0
0
1-2 Years
0
1
0
More Than 2 Years
0
0
1
Duration of Pain
Dummy 1
(Under 6 Months vs
6-12 Months)
Dummy 2
(Under 6 Months vs 1-2 Years)
Dummy 3
(Under 6 Months vs Over 2 Years)
Under 6 Months
1
1
1
6-12 Months
1
0
0
1-2 Years
0
1
0
More Than 2 Years
0
0
1
b.
Duration of Pain
Dummy 1
(Under 6 Months vs
6-12 Months)
Dummy 2
(Under 6 Months vs 1-2 Years)
Dummy 3
(Under 6 Months vs Over 2 Years)
Under 6 Months
0
0
0
6-12 Months
0
1
1
1-2 Years
1
0
1
More Than 2 Years
1
1
0
c.
Duration of Pain
Dummy 1
(Under 6 Months vs
6-12 Months)
Dummy 2
(Under 6 Months vs 1-2 Years)
Dummy 3
(Under 6 Months vs Over 2 Years)
Under 6 Months
1
1
1
6-12 Months
0
1
1
1-2 Years
1
0
1
More Than 2 Years
1
1
0
d.
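The dummy-coding idea in the question above (a baseline category coded 0 on every dummy, and each other category coded 1 on exactly one dummy) can be sketched in Python. The function name `dummy_code` is an illustrative choice; the category labels come from the question.

```python
# Category labels from the question; "Under 6 Months" is the control category.
categories = ["Under 6 Months", "6-12 Months", "1-2 Years", "More Than 2 Years"]
baseline = "Under 6 Months"

def dummy_code(categories, baseline):
    """Return {category: [d1, d2, ...]} with the baseline coded 0 on every dummy."""
    # One dummy variable per non-baseline category, in the order given.
    contrasts = [c for c in categories if c != baseline]
    return {
        c: [1 if c == target else 0 for target in contrasts] for c in categories
    }

coding = dummy_code(categories, baseline)
for category, dummies in coding.items():
    print(category, dummies)
```

With k categories this produces k - 1 dummy variables, here three for the four durations.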
The difficulty with using one regression equation to predict values in a different set of data is called
Shrinkage
Contraction
Reduction
Washing
The distance of cases from the model mean is called
Leverage values
Hat values
Standard distances
Mahalanobis distances
A way of representing discrete variables in multiple regression is by constructing
Stupid variables
Dummy variables
Imitation variables
Faking variables