09 Apr Cleansing Week4 R script
Background
· For this assignment, you will use the Cleansing_Week4.R script and the data.csv file.
· The code is written in the R programming language. Open RStudio, then open the file Cleansing_Week4.R. Follow the steps in the code and answer each of the questions below.
· Some modification and rework of the code is required. Each step is explained in detail in the code.
· Complete Steps 0 through 5, inclusive.
Instructions
Complete all the steps provided in the code and answer the following questions in a report.
After you complete the readings and watch the provided videos (required), proceed with this implementation and report.
1. Introduction
· Describe the language, GUI, and data file you are using in this assignment. Use references to support the importance of the language, its advantages and disadvantages, and how it compares with other languages used in data science.
· In this section, provide the value stored in the variable Randomizer in your code, along with your Student ID. Take a screenshot of the output in your console and paste it here.
2. Data Presentation before Cleansing
Run Step 0 and answer the following questions.
A. What is the data file format, and which command did you use to read the data? Does the file have headers?
B. How many observations are there?
C. How many variables are in the data?
D. What is the purpose of the command str(df)? Take a screenshot of the output in your console and paste it here.
E. Run summary(df). Find out what its output means and explain it in your paper.
F. Answer the following questions:
a. What types of variables does your file include?
b. What are their specific data types?
c. Are they read properly?
d. Are there any issues?
e. Does your file include both NAs and blanks? How did you identify them?
f. How many NAs do you have?
g. How many blanks do you have?
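The Step 0 questions can be explored with a few base R calls. The sketch below is illustrative only. The real columns of data.csv are not given here, so it reads a small hypothetical inline CSV (columns id, age, gender) that contains one NA and one blank. The commands in Cleansing_Week4.R may differ. To use the real file, replace text = csv_text with the path to data.csv.

```r
# Hypothetical stand-in for data.csv: the real file's columns are unknown.
csv_text <- "id,age,gender
1,34,M
2,NA,F
3,29,
4,41,F"

# read.csv() reads comma-separated text; header = TRUE (the default) takes
# column names from the first line. Literal "NA" becomes a missing value,
# while an empty field in a character column stays as the blank string "".
df <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

n_obs  <- nrow(df)   # number of observations (rows)
n_vars <- ncol(df)   # number of variables (columns)
cat("Observations:", n_obs, " Variables:", n_vars, "\n")

str(df)              # compact structure: type and first values of each column
summary(df)          # quartiles, mean and NA count for numeric columns;
                     # length/class/mode for character columns

col_types <- sapply(df, class)   # data type R assigned to each column
print(col_types)

# NAs and blanks are different things in R, so count them separately.
na_per_col    <- colSums(is.na(df))
blank_per_col <- sapply(df, function(col) sum(trimws(col) == "", na.rm = TRUE))
cat("Total NAs:", sum(na_per_col), " Total blanks:", sum(blank_per_col), "\n")
print(na_per_col)
print(blank_per_col)
```

Counting blanks separately matters because is.na() does not flag empty strings. A character column can therefore look complete in an NA count while still containing missing entries.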
3. Data Preprocessing
A. Before running the remaining steps in your code, summarize the preprocessing steps you expect to complete. Recommend methods for imputing NAs in each variable and/or observation where needed. Review the literature and suggest imputation methods for categorical and numeric variables.
B. Run Step 1 in your code. How did this step affect the NAs and the blanks in your variables? (You can run summary(df) to determine this.) Take a screenshot of the output in your console and paste it here.
C. For each numeric variable, record the mean and the median. For each categorical variable, record the counts. Present these values in a table in your paper.
D. Run Steps 2, 3, and 4. How many observations include NAs? How many variables include NAs? What percentage of rows and columns have NAs? If we were to eliminate those, what would be the approximate size of the remaining dataset? Is this a proper method of handling missing values?
E. Run Step 5 and answer the following questions:
a. What method of imputation is described? What does linear interpolation mean? Research and discuss whether this is an appropriate method. This method of imputation has changed some of the statistics of your variables.
· Run summary(df) and compare the output with the previous statistics. Take a screenshot of the output in your console and paste it here.
· Do you observe any undesired changes? Explain in detail how you could have avoided them.
· Are there any more NAs in your file?
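The Step 1 to Step 5 questions can be explored in a similar way. This is a minimal sketch and does not reproduce the contents of Cleansing_Week4.R. It makes several assumptions:

· It uses a hypothetical ordered dataset with the columns day, temp, and gender.
· It converts blank strings to NA, which is one common way of handling blanks, though not necessarily what Step 1 does.
· It counts the rows and columns that contain NAs.
· It records means, medians, and category counts.
· It fills numeric NAs by linear interpolation with base R's approx().

The helper names blank_to_na and interpolate_linear are hypothetical.

```r
# Hypothetical ordered dataset standing in for data.csv (real columns unknown).
csv_text <- "day,temp,gender
1,20,M
2,NA,F
3,24,
4,NA,F
5,30,M"
df <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: turn blank or whitespace-only strings into NA
# (one common approach; Step 1 of the script may do something else).
blank_to_na <- function(col) {
  if (is.character(col)) col[which(trimws(col) == "")] <- NA
  col
}
df[] <- lapply(df, blank_to_na)
summary(df)

# Rows and columns that contain at least one NA, and their percentages.
rows_with_na <- sum(!complete.cases(df))
cols_with_na <- sum(colSums(is.na(df)) > 0)
pct_rows <- 100 * rows_with_na / nrow(df)
pct_cols <- 100 * cols_with_na / ncol(df)
remaining <- nrow(na.omit(df))   # rows left if every row with an NA is dropped
cat("Rows with NA:", rows_with_na, sprintf("(%.1f%%)", pct_rows), "\n")
cat("Columns with NA:", cols_with_na, sprintf("(%.1f%%)", pct_cols), "\n")
cat("Rows left after listwise deletion:", remaining, "\n")

# Mean and median for numeric columns, counts for categorical columns.
num_cols <- sapply(df, is.numeric)
num_stats <- function(d) sapply(d[num_cols], function(x)
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE)))
print(num_stats(df))
print(lapply(df[!num_cols], table, useNA = "ifany"))

# Hypothetical helper: linear interpolation over row order with approx().
# rule = 2 copies the nearest observed value into leading/trailing NAs.
# It needs at least two non-NA values in the column.
interpolate_linear <- function(y) {
  idx <- seq_along(y)
  ok <- !is.na(y)
  approx(x = idx[ok], y = y[ok], xout = idx, rule = 2)$y
}
df$temp <- interpolate_linear(df$temp)

# Compare statistics after imputation and count any remaining NAs.
print(num_stats(df))
summary(df)
cat("NAs remaining:", sum(is.na(df)), "\n")
```

In this toy data, interpolation fills temp with 22 and 27 and shifts its mean from about 24.67 to 24.6. The categorical NA in gender is left untouched. Linear interpolation also assumes that the row order is meaningful (for example, time). That is one of the points to weigh when discussing whether it suits a given variable.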
Length: This assignment must be 4-5 pages (excluding the title and reference pages).