Discovering statistics using SPSS

08 Oct Discovering statistics using SPSS

Posted at 16:34h in business by

To be used in conjunction with Field, A. P. (2009). Discovering statistics using SPSS (third edition). London: Sage. Questions are listed under the chapter they best represent; however, they should not be given to students with the chapter numbers indicated (or else it will make the answers to some questions fairly obvious!). Correct answers are denoted with a .

Chapter 1 – Everything you wanted to know about Statistics

The standard deviation is the square root of

the coefficient of determination

sum of squares

variance

range
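
The relationship this question turns on can be checked numerically. The sketch below uses Python's standard `statistics` module on a small set of scores invented purely for illustration; it is not data from the book.

```python
import math
import statistics

# Hypothetical test scores, invented for illustration only.
scores = [22, 24, 26, 28, 30]

variance = statistics.variance(scores)  # sample variance (divides by N - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

# Taking the square root of the variance gives the same value as stdev().
sd_from_variance = math.sqrt(variance)

print(variance)
print(round(std_dev, 3), round(sd_from_variance, 3))
```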

A frequency distribution in which low scores are most frequent (i.e. bars on the graph are highest on the left hand side) is said to be:

Positively skewed

Leptokurtic

Platykurtic

Negatively skewed

If the scores on a test have a mean of 26 and a standard deviation of 4, what is the z-score for a score of 18?

a. –2

d. –1.41
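
As a worked check of the arithmetic, a z-score is the distance of a score from the mean expressed in standard deviation units. The helper name `z_score` below is hypothetical and exists only for this sketch.

```python
def z_score(score, mean, sd):
    """Return how many standard deviations `score` lies from `mean`."""
    return (score - mean) / sd

# Values taken from the question: mean 26, standard deviation 4, score 18.
z = z_score(18, mean=26, sd=4)
print(z)  # -2.0
```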

Which of the following is true about a 95% confidence interval of the mean of a given sample:

95 out of 100 sample means will fall within the limits of the confidence interval.

There is a 95% chance that the population mean will fall within the limits of the confidence interval.

95 out of 100 population means will fall within the limits of the confidence interval.

There is a 0.05 probability that the population mean falls within the limits of the confidence interval.

What does a significant test statistic tell us?

There is an important effect.

The null hypothesis is false.

There is an effect in the population of sufficient magnitude to be scientifically interesting.

All of the above.

A type I error is when

We conclude that there is a meaningful effect in the population when in fact there is not.

We conclude that there is not a meaningful effect in the population when in fact there is.

We conclude that the test statistic is significant when in fact it is not.

The data we have typed into SPSS is different to the data collected.

If we calculated an effect size and found it was r = .42, which expression would best describe the size of the effect?

small

small-to-medium

large

medium-to-large

Which of these statements about statistical power is not true:

Power is the ability of a test to detect an effect.

We can use power to determine how big a sample is required to detect an effect of a certain size.

Power is linked to the probability of making a type I error.

All of the above are true.

What is a significance level?

The level at which statistics finally become meaningful to a stein

The impact that reporting statistics incorrectly could have

A pre-set level of probability that the results are correct

A pre-set level of probability at which it will be accepted that results are due to chance or not.

What is the conventional level of probability that is often accepted when conducting statistical tests?

a. 0.1

b. 0.05

c. 0.5

d. 0.001

A null hypothesis:

states that the experimental treatment will have an effect

is rarely used in experiments

predicts that the experimental treatment will have no effect

none of the above

Which of the following terms best describes the sentence: ‘In a blind-tasting, people will not be able to tell the difference between margarine and butter’

a directional hypothesis

an operational definition

a null hypothesis

a non-directional hypothesis

The aim of experimental research is to:

be a phenomenon

cause a phenomenon

investigate what caused a phenomenon

prevent a phenomenon

‘Sleep deprivation will reduce the ability to perform a complex cognitive task’. State the direction of this hypothesis:

Directional

Non-Directional

Both

Not enough information given

In experiments the independent variable is manipulated to determine:

effects on the individual participants

effect on the dependent variable

effects of certain stimuli

relation to other variables

Chapter 2 – The SPSS Environment

Which of the following could not be represented by columns in the SPSS Data editor:

Levels of repeated measures variables

Items on a questionnaire

Levels of between-group variables

Total values from different questionnaires.

Ordinal level data are characterised by:

data that can be meaningfully arranged by order of magnitude

equal intervals between each adjacent score

a fixed zero

none of the above

What is the advantage of using SPSS over calculating statistics by hand?

Quantitative data analysis is so complex today it is essential to use a stats package

It reduces the chance of making errors in your calculations

It equips you with a useful transferable skill

All of the above

In SPSS, what is the ‘Data Viewer’?

A table summarising the frequencies of data for one variable

A spreadsheet into which data can be entered

A dialog box that allows you to choose a statistical test

A screen in which variables can be defined and labelled

How is a variable name different from a variable label?

It is shorter and less detailed

It is longer and more detailed

It is abstract and unspecific

It refers to codes rather than variables

What does the operation ‘Recode Into Different Variables’ do to the data?

Replaces missing data with some random scores

Reverses the position of the independent and dependent variable on a graph

Redistributes a range of values into a new set of categories and creates a new variable

Represents the data in the form of a pie chart

How would you use the drop-down menus in SPSS to generate a frequency table?

Open the Output Viewer and click: Save As → Pie Chart

Click on: Analyze → Descriptive Statistics → Frequencies

Click on: Graphs → Frequencies → Pearson

Open the Variable Viewer and recode the value labels

When crosstabulating two variables, it is conventional to:

represent the independent variable in rows and the dependent variable in columns.

assign both the dependent and independent variables to columns.

represent the dependent variable in rows and the independent variable in columns.

assign both the dependent and independent variables to rows.

In which sub-dialog box can the Chi Square test be found?

Frequencies: Percentages

Crosstabs: Statistics

Bivariate: Pearson

Sex : Female

To generate a correlation coefficient between two variables with ordinal data, which set of instructions should you give SPSS?

Analyze → Crosstabs → Descriptive Statistics → Spearman → ok

Graphs → Frequencies → [select variables]→ Spearman → ok

Analyze → Compare Means → Anova table → First layer → Spearman → ok

Analyze → Correlate → Bivariate →[select variables] → Spearman → ok

Which of the following is NOT a file extension for files saved in SPSS?

.sav

.spo

.sps

.doc

If you are constructing a data file for a repeated measures design with 10 subjects and three conditions, how many columns and rows will the file have?

Ten columns and four rows

Four columns and four rows

Ten columns and ten rows

Four columns and ten rows
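
A repeated-measures file in SPSS holds one participant per row and one condition per column. The sketch below mimics that layout with plain Python lists. It assumes a participant identifier column alongside the three conditions; that is an assumption of this example, since the question does not say what any extra column contains, and all names are hypothetical.

```python
# One row per participant, one column per condition (wide format).
n_participants = 10
conditions = ["condition_1", "condition_2", "condition_3"]

# Assumed extra column: a participant identifier.
columns = ["participant"] + conditions

# Scores are left as None because no data are given in the question.
rows = [[pid] + [None] * len(conditions) for pid in range(1, n_participants + 1)]

print(len(columns), "columns,", len(rows), "rows")
```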

Why might a data file have “missing data”?

Some of a participant’s responses might be missing

There has been a mistake in saving the SPSS data file

A participant did not take part in the whole study

None of the above

What might be an appropriate way to deal with missing data?

Ignore it

Go back to the participant and demand an answer

Define missing values using the “recode” function

Start the study again taking more care with data recording

What is the correct way to record non-numerical values?

You can’t, SPSS only uses numbers

Define the variable as “string”

Recode all the values as numbers

Define the variable as “date”

Chapter 3 – Exploring Data

Which of the following are assumptions underlying the use of parametric tests (based on the normal distribution)?

the data should be normally distributed

the samples being tested should have approximately equal variances

your data should be at least interval level

all of the above

Which of the following does a box-whisker plot not display:

The range

The inter-quartile range

The lower quartile

The mean

Which of the following is least affected by outliers?

The range

The mean

The median

The standard deviation
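
The sensitivity of each statistic to an extreme score can be seen by adding one outlier to a small data set. The scores below are invented for illustration and the comparison is only a sketch, not a formal robustness analysis.

```python
import statistics

# Hypothetical ratings; the second list adds a single extreme score.
scores = [4, 5, 5, 6, 7]
with_outlier = scores + [60]

for label, data in (("original", scores), ("with outlier", with_outlier)):
    print(
        label,
        "mean =", statistics.mean(data),
        "median =", statistics.median(data),
        "range =", max(data) - min(data),
        "sd =", round(statistics.stdev(data), 2),
    )
```

The mean, range and standard deviation all shift noticeably when the extreme score is added, while the median moves only slightly.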

I collected some data about how much buyers of my book liked it, on a scale from 1 (it’s utter rubbish) to 10 (I never read anything else). I ended up with a sample of 15467 people. When I looked at the distribution, I found a skew of 1.23 (SE = .65). The mean rating was 4.78. What is the z-score for the skew of my data?

a. 1.89

b. 0.53

c. -3.92

d. 3.36
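
The calculation behind this question divides the skew statistic by its standard error. A minimal sketch using the two values quoted in the question (the variable names are illustrative):

```python
# Values quoted in the question.
skew = 1.23
se_skew = 0.65

# z-score for skew: the statistic divided by its standard error.
z_skew = skew / se_skew
print(round(z_skew, 2))  # 1.89
```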

Which of the following would be the best way to decide whether the skew in the example above is problematic?

See if the z-score is bigger than 1.96 or smaller than -1.96

See if the skew is significant at p < .05.

Use the Kolmogorov-Smirnov test.

None of the above because of the large sample size.

Which of the following is not a transformation that can be used to correct skewed data?

Log transformation

Tangent transformation

Square root transformation

Reciprocal transformation

The Kolmogorov-Smirnov test can be used to test:

Whether data are normally-distributed.

Whether group variances are equal.

Whether scores are measured at the interval level.

Whether group means differ.

The assumption of homogeneity of variance is met when:

The variance in one group is twice as big as that of a different group.

Variances in different groups are approximately equal.

The variance across groups is proportional to the means of those groups.

The variance is the same as the inter-quartile range.

If a Kolmogorov-Smirnov test is conducted and the result is significant, what does this mean for the data sample?

The data sample is normally distributed

The comparison used in the test is not valid

The data sample is not normally distributed

The test is wrong

Which of the following tests whether variances are homogeneous?

Levene’s test

Bartlett’s test

Neither

Both

If a distribution is multimodal, what does this mean?

It will not be a normal distribution

The data has been entered incorrectly

It will be a normal distribution

It will have to be checked with a Levene’s test

What is an outlier?

A set of data outside the data file

A single score that is very different from the others

A score derived from a participant who has lied

A variable that cannot be quantified

Why are z-scores used to check for outliers?

They standardise scores for a known mean and standard deviation, allowing comparison

They allow you to allocate letters for missing values

A z-score is an outlier

They standardise scores in order to convert them to values closer to the mean

What does independence of data mean?

That we must never collect two sets of data from one person

That independent researchers must collect the data

That scores from one participant are free from influences from other participants

That scores in one condition are free from influences from other conditions

Which of the following is NOT a property of a variance ratio?

It can be used to demonstrate homogeneity of variances

It is one variance divided by another

It is one variance multiplied by another

It can show the effect of a treatment on several groups

Chapter 4 – Correlation

The covariance is

An unstandardized version of the correlation coefficient.

A measure of the strength of relationship between two variables.

Dependent on the units of measurement of the variables.

All of the above.
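
The difference between covariance and the correlation coefficient can be illustrated by rescaling one variable. The sketch below computes both by hand; the function names and the data are hypothetical and chosen only for illustration.

```python
import math

def covariance(x, y):
    """Sample covariance: mean cross-product of deviations (N - 1 denominator)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Covariance standardised by the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Illustrative data only.
adverts = [5, 4, 4, 6, 8]
packets = [8, 9, 10, 13, 15]

# Express the first variable in different units (multiplied by 10).
adverts_rescaled = [a * 10 for a in adverts]

print(covariance(adverts, packets), covariance(adverts_rescaled, packets))
print(round(pearson_r(adverts, packets), 3), round(pearson_r(adverts_rescaled, packets), 3))
```

Changing the units changes the covariance by the same factor, while the correlation coefficient stays the same.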

A scatterplot shows

The frequency with which values appear in the data.

The average value of groups of data.

Scores on one variable plotted against scores on a second variable.

The proportion of data falling into different categories.

Which of the following statements about Pearson’s correlation coefficient is not true?

It can be used as an effect size measure

It varies between -1 and +1

It cannot be used with binary variables (those taking on a value of 0 or 1).

It can be used on ranked data.

The correlation between two variables A and B is .12 with a significance of p < .01. What can we conclude?

That there is a substantial relationship between A and B.

That there is a small relationship between A and B.

That variable A causes variable B.

All of the above.

How much variance has been explained by a correlation of .9?

a. 81%

b. 18%

None of the above
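
The proportion of variance shared by two variables is the square of their correlation coefficient. A short sketch of that arithmetic, with illustrative variable names:

```python
# Correlation quoted in the question.
r = 0.9

# Squaring r gives the proportion of shared variance.
variance_explained = r ** 2
print(f"{variance_explained:.0%}")  # 81%
```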

When interpreting a correlation coefficient, it is important to look at:

The significance of the correlation coefficient.

The magnitude of the correlation coefficient.

The +/– sign of the correlation coefficient.

All of the above.

The relationship between two variables controlling for the effect that a third variable has on one of those variables can be expressed using a:

Semi-partial correlation.

Bivariate correlation.

Point-biserial correlation.

Partial correlation.

Twenty people took part in a study in which they completed two questionnaires: one that measured musical ability and one that measured mathematical aptitude. The two sets of scores were then analysed to determine whether the two skills were related. Which research design was used in the study?

an observational study

a case study

a correlational study

an experiment

If there were a perfect positive correlation between two interval/ratio variables, the Pearson’s r test would give a correlation coefficient of:

a. –0.33.

b. +1.

c. +0.88.

d. –1.

What is the name of the test that is used to assess the relationship between two ordinal variables?

Spearman’s rho

Phi

Cramer’s V

Chi Square

What is meant by a ‘spurious’ relationship between two variables?

One that is so illogical it cannot possibly be true

An apparent relationship that is so curious it demands further attention

A relationship that appears to be true because each variable is related to a third one

One that produces a perfect negative correlation on a scatter diagram

A researcher conducts some research in which they identify a significant positive correlation (r = 0.42) between the number of children a person has and their life satisfaction. Which of the following is it inappropriate to conclude from this research?

That having children makes people more satisfied with their life.

That someone who has children is likely to be happier than someone who doesn’t.

That the consequences of having children are unclear.

That it is possible to predict someone’s life happiness partly on the basis of the number of children they have.

One of the factors that affects the reliability of findings from studies using correlations is:

the number of variables being investigated

the type of relationship that is found

the level of significance set at the start of the study

the number of people who take part

Correlational studies allow the researcher to:

test for differences between two variables

predict the effect of one variable upon another

make causal inferences about the relationship between two variables

identify the relationship between two variables

A positive correlation shows that:

two variables are unrelated

as one score increases so does the other

as one score increases so the other decreases

both a and b

Chapter 5 – Regression

R² is

The percentage of variance in the predictor accounted for by the outcome variable.

The proportion of variance in the outcome accounted for by the predictor variable or variables.

The proportion of variance in the predictor accounted for by the outcome variable.

The percentage of variance in the outcome accounted for by the predictor variable or variables.

Which of the following statements about the t-statistic in regression is not true?

The t-statistic tests whether the regression coefficient, b, is equal to 0.

The t-statistic provides some idea of how well a predictor predicts the outcome variable.

The t-statistic can be used to see whether a predictor variable makes a statistically significant contribution to the regression model.

The t-statistic is equal to the regression coefficient divided by its standard deviation.
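
In regression, the t-statistic for a predictor is its coefficient divided by the standard error of that coefficient. The numbers below are invented for illustration and do not come from any output in the book.

```python
# Hypothetical regression output for one predictor.
b = 0.096     # unstandardised regression coefficient
se_b = 0.010  # standard error of the coefficient

# t tests whether b differs from 0: coefficient divided by its standard error.
t = b / se_b
print(round(t, 2))  # 9.6
```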

Which of the following statements about the F-ratio is true:

The F-ratio is the ratio of variance explained by the model to the error in the model.

The F-ratio is the ratio of variance explained by the model to the total variance in the outcome variable.

The F-ratio is the ratio of error variance to the total variance.

The F-ratio is the proportion of variance explained by the regression model.

Which of the following statements about outliers is not true?

Outliers are values very different from the rest of the data.

Outliers bias the mean.

Outliers bias regression parameters.

Outliers are influential cases.

What is multicollinearity?

When predictor variables correlate very highly with each other.

When predictor variables have a linear relationship with the outcome variable.

When predictor variables are correlated with variables not in the regression model.

When predictor variables are independent.

Which regression assumption does the Durbin-Watson statistic test?

Linearity.

Independence of errors.

Homoscedasticity.

Multicollinearity.

Which of the following is not a reason why multicollinearity is a problem in regression?

It limits the size of R.

It makes it difficult to assess the importance of individual predictors.

It leads to unstable regression coefficients.

It creates heteroscedasticity in the data.

Using the model in Chapter 5 (equation 5.12), how many records would be sold if £29000 was spent on advertising, it was played 19 times on radio and the band were rated 7 on the attractiveness scale?

2,461,660 records

2435 records

2488 records

d. 2,435,050 records

Which of these statements is not true?

If the average variance inflation factor is greater than 1 then the regression model might be biased.

Tolerance values above 0.2 may indicate multicollinearity in the data.

Multicollinearity in the data is shown by a VIF (variance inflation factor) greater than 10.

The tolerance is 1 divided by the VIF (variance inflation factor).
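
The reciprocal relationship between tolerance and VIF, and the VIF-above-10 guideline mentioned in the options, can be sketched as follows. The predictor names and VIF values are hypothetical, and the cut-off is a rule of thumb rather than a formal test.

```python
# Hypothetical VIF values for three predictors.
vifs = {"predictor_a": 1.5, "predictor_b": 12.0, "predictor_c": 4.0}

# Tolerance is the reciprocal of the VIF.
tolerances = {name: 1 / vif for name, vif in vifs.items()}

# Rule of thumb from the options above: a VIF greater than 10 signals multicollinearity.
flagged = [name for name, vif in vifs.items() if vif > 10]

# The average VIF is also inspected; values well above 1 suggest possible bias.
mean_vif = sum(vifs.values()) / len(vifs)

print(tolerances)
print(flagged, round(mean_vif, 2))
```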

The following graph shows:

Heteroscedasticity.

Non-linearity.

Heteroscedasticity and non-linearity.

Regression assumptions that have been met.

A researcher had a categorical variable that they wanted to include as a predictor in a regression equation. The researcher was trying to predict the success of a back pain intervention, and the categorical variable was the duration of the back pain prior to treatment with 4 categories: less than 6 months, 6-12 months, 1-2 years, more than 2 years. They needed to code these variables into dummy variables for the regression using less than 6 months as their control category. Which of the following represents the correct coding scheme?

Duration of Pain

Dummy 1

(Under 6 Months vs

6-12 Months)

Dummy 2

(Under 6 Months vs 1-2 Years)

Dummy 3

(Under 6 Months vs Over 2 Years)

Under 6 Months

6-12 Months

1-2 Years

More Than 2 Years

Duration of Pain

Dummy 1

(Under 6 Months vs

6-12 Months)

Dummy 2

(Under 6 Months vs 1-2 Years)

Dummy 3

(Under 6 Months vs Over 2 Years)

Under 6 Months

6-12 Months

1-2 Years

More Than 2 Years

Duration of Pain

Dummy 1

(Under 6 Months vs

6-12 Months)

Dummy 2

(Under 6 Months vs 1-2 Years)

Dummy 3

(Under 6 Months vs Over 2 Years)

Under 6 Months

6-12 Months

1-2 Years

More Than 2 Years

Duration of Pain

Dummy 1

(Under 6 Months vs

6-12 Months)

Dummy 2

(Under 6 Months vs 1-2 Years)

Dummy 3

(Under 6 Months vs Over 2 Years)

Under 6 Months

6-12 Months

1-2 Years

More Than 2 Years
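
Dummy coding with a control (baseline) category can be sketched in a few lines. The category labels follow the question; the function name `dummy_code` and the list-based representation are assumptions made for this example, not SPSS syntax.

```python
CATEGORIES = ["less than 6 months", "6-12 months", "1-2 years", "more than 2 years"]
BASELINE = "less than 6 months"  # control category named in the question

def dummy_code(value, categories=CATEGORIES, baseline=BASELINE):
    """Return one 0/1 indicator for each non-baseline category.

    The baseline scores 0 on every dummy variable; each other category
    scores 1 on its own dummy variable and 0 on the rest.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0
            for category in categories if category != baseline]

for category in CATEGORIES:
    print(category, dummy_code(category))
```

With four categories this yields three dummy variables, one fewer than the number of groups.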

The difficulty with using one regression equation to predict values in a different set of data is called

Shrinkage

Contraction

Reduction

Washing

The distance of cases from the model mean is called

Leverage values

Hat values

Standard distances

Mahalanobis distances

A way of representing discrete variables in multiple regression is by constructing

Stupid variables

Dummy variables

Imitation variables

Faking variables
