17 Aug For each game, calculate the total global sales as well as the total sales (still for each game) in each of North America, Europe, Japan, and other parts of the world.
Question 1 (3 points)
Use the vgsales data from the file vgsales.xlsx. For each game, calculate the total global sales as well as the total sales (still for each game) in each of North America, Europe, Japan, and other parts of the world.
Put these data together as one table where the left-most column is the game name, the middle 4 columns are the total sales in NA, EU, JP, and other regions, and the right-most column is the total global sales. Sort the data by total global sales in descending order. Show only the top-10 rows (i.e., the 10 games with the highest total global sales).
It is fine to take a screenshot of your data in RStudio, provided that the font is large enough that the TA can read it; you do not need to export the data from R and create a “pretty” table in another program.
Hint re importing data: To import the vgsales data into R, you can first convert the data to a CSV file and import the data as demonstrated in class with read_csv(). Or you can use the read_excel() function from the readxl package.
Hint re calculations: once the data are in R, you should be able to create the requested table with one set of piped-together commands. This is not a requirement, and you will be awarded full credit as long as you create the requested table using R any way you like.
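As a rough illustration of the "one set of piped-together commands" hint, the sketch below runs the group, summarise, sort, and top-N steps on a small made-up table. The column names (Name, NA_Sales, EU_Sales, JP_Sales, Other_Sales, Global_Sales) and all values are assumptions for illustration; check them against the actual columns in vgsales.xlsx.

```r
library(dplyr)

# Toy stand-in for the vgsales data. Column names and values are assumed,
# not taken from the real file.
vgsales <- tibble(
  Name         = c("A", "A", "B", "C"),
  NA_Sales     = c(1.0, 2.0, 0.5, 4.0),
  EU_Sales     = c(0.5, 1.0, 0.2, 1.0),
  JP_Sales     = c(0.1, 0.4, 0.1, 0.5),
  Other_Sales  = c(0.1, 0.1, 0.0, 0.5),
  Global_Sales = c(1.7, 3.5, 0.8, 6.0)
)

top_games <- vgsales %>%
  group_by(Name) %>%                      # one group per game
  summarise(
    na_total     = sum(NA_Sales),
    eu_total     = sum(EU_Sales),
    jp_total     = sum(JP_Sales),
    other_total  = sum(Other_Sales),
    global_total = sum(Global_Sales),
    .groups = "drop"
  ) %>%
  arrange(desc(global_total)) %>%         # highest global sales first
  slice_head(n = 10)                      # keep at most the top 10 rows

print(top_games)
```

With the real data, the toy tibble would be replaced by the result of read_csv() or readxl::read_excel(); the rest of the pipe should carry over if the column names match.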
Question 2 (5 points)
Import the Order and OrderDetail datasets from order.csv and orderdetail.csv. Use these datasets to calculate the total revenue (in millions) per shipping region. Also calculate the percent of revenue for each shipping region. Order the rows by total revenue such that the shipping region with the largest total revenue is at the top.
Revenue can be calculated as Unit Price * Quantity * (1 – Discount).
Your table should have nine rows (one per shipping region) and three columns (the shipping region, the total revenue in millions, and revenue per region as a percent of all revenue).
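One possible shape for this calculation is sketched below on made-up data: join the line items to their orders, compute revenue per line item with the formula above, then total it per region. The column names (orderid, shipregion, unitprice, quantity, discount), the join key, and the choice of inner_join() are all assumptions; the real files may differ.

```r
library(dplyr)

# Toy stand-ins for order.csv and orderdetail.csv (hypothetical columns).
orders <- tibble(
  orderid    = c(1, 2, 3),
  shipregion = c("North America", "Western Europe", "North America")
)
orderdetail <- tibble(
  orderid   = c(1, 1, 2, 3),
  unitprice = c(10, 20, 50, 40),
  quantity  = c(2, 1, 4, 5),
  discount  = c(0, 0.1, 0.25, 0)
)

# One row per line item, with revenue = Unit Price * Quantity * (1 - Discount).
line_items <- orderdetail %>%
  inner_join(orders, by = "orderid") %>%
  mutate(revenue = unitprice * quantity * (1 - discount))

region_revenue <- line_items %>%
  group_by(shipregion) %>%
  summarise(revenue_millions = sum(revenue) / 1e6, .groups = "drop") %>%
  mutate(pct_of_revenue = 100 * revenue_millions / sum(revenue_millions)) %>%
  arrange(desc(revenue_millions))         # largest region at the top

print(region_revenue)
```

The percent column is computed after summarise() so that sum(revenue_millions) is the total across all regions, not within one group.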
Question 3 (5 points)
Continue to use the Order and OrderDetail data from question 2, as well as the revenue values you calculated. Use the “unaggregated” dataset with 621,883 rows (each row is a line item from an order). Drop the 73 rows with a missing (i.e., N/A) value for the shipping year.
Then use facet_grid() to create a grid of histograms on this line-item data.
Each plot in the grid will be a histogram of log(revenue).
Revenue has a strongly positively skewed distribution, so we plot the natural log of revenue (R's log() function calculates the natural log by default).
Each row of the grid should be a Shipping Region.
Each column of the grid will be the year in which an order was shipped.
Before creating the plot, you will need to create a ship_year variable. To do that, use the following code to help you:
mutate( ship_date = as.Date(shippeddate, "%m/%d/%Y"), ship_year = lubridate::year(ship_date))
This code says that we want to convert the shippeddate variable into a date using the as.Date() function. We have to tell the as.Date() function how the date is currently written, and so we use "%m/%d/%Y". Then we use the year() function from the lubridate package to “extract” the year values.
You can use code like filter(!is.na(ship_year)) to remove rows where the shipping year has a missing value.
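Putting these pieces together, a minimal sketch of the data preparation and the faceted histogram might look like the following. The toy line_items table and the shipregion and revenue column names are assumptions; only shippeddate, ship_date, and ship_year come from the hint above. The bins value is an arbitrary choice.

```r
library(dplyr)
library(ggplot2)

# Toy line-item data (hypothetical values); one shipped date is missing.
line_items <- tibble(
  shipregion  = c("North America", "North America", "Western Europe",
                  "Western Europe", "North America"),
  shippeddate = c("01/15/2014", "03/02/2015", "07/30/2014", NA, "11/05/2015"),
  revenue     = c(38, 200, 150, 75, 12)
)

plot_data <- line_items %>%
  mutate(
    ship_date = as.Date(shippeddate, "%m/%d/%Y"),  # parse month/day/4-digit year
    ship_year = lubridate::year(ship_date)         # extract the year
  ) %>%
  filter(!is.na(ship_year))                        # drop rows with no ship year

# Rows of the grid are shipping regions, columns are shipping years.
p <- ggplot(plot_data, aes(x = log(revenue))) +
  geom_histogram(bins = 30) +
  facet_grid(rows = vars(shipregion), cols = vars(ship_year))

print(p)
```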
Question 4 (5 points)
Use the smartphone customer dataset. Scale the 6 phone-use variables (gaming, chat, maps, video, social, and reading). Then run k-means on all 6 variables with K=3. Answer the following two questions:
- How many customers are in each cluster?
- What is the within-cluster sum of squares value?
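For orientation, the sketch below runs the scale-then-cluster steps in base R on randomly generated data; the six column names follow the assignment, but the values, the seed, and nstart = 25 are assumptions. kmeans() reports both a per-cluster within sum of squares (withinss) and its total (tot.withinss), so check which one the question expects.

```r
set.seed(42)  # k-means uses random starts; a seed makes the run repeatable

# Simulated phone-use minutes (made-up values, not the course dataset).
n <- 150
phone_use <- as.data.frame(matrix(rexp(n * 6, rate = 1 / 30), ncol = 6))
names(phone_use) <- c("gaming", "chat", "maps", "video", "social", "reading")

# Standardise each variable to mean 0 and standard deviation 1.
scaled <- scale(phone_use)

# K = 3 clusters on all six scaled variables.
fit <- kmeans(scaled, centers = 3, nstart = 25)

print(table(fit$cluster))  # number of customers in each cluster
print(fit$withinss)        # within-cluster sum of squares, one per cluster
print(fit$tot.withinss)    # total within-cluster sum of squares
```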
Question 5 (2 points)
Plot gaming vs reading minutes as a scatter plot and color the points according to their cluster assignment from question 4. Why do the clusters “overlap” in this plot (i.e., the points “mix” near the cluster boundaries) when the clusters did not overlap in the k-means example from class?
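A hedged sketch of the plotting step, again on simulated data rather than the course dataset: the cluster labels are converted to a factor so that ggplot2 assigns discrete colors. Note that the clusters here are fit on all six scaled variables, while the plot displays only two of them.

```r
library(ggplot2)

set.seed(42)

# Simulated phone-use minutes (made-up values).
n <- 150
phone_use <- as.data.frame(matrix(rexp(n * 6, rate = 1 / 30), ncol = 6))
names(phone_use) <- c("gaming", "chat", "maps", "video", "social", "reading")

# Cluster on all six scaled variables, then attach the labels.
fit <- kmeans(scale(phone_use), centers = 3, nstart = 25)
phone_use$cluster <- factor(fit$cluster)

# Scatter plot of two of the six variables, colored by cluster.
p <- ggplot(phone_use, aes(x = gaming, y = reading, color = cluster)) +
  geom_point()

print(p)
```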