Chat with us, powered by LiveChat Prepare Data for Listings123 - Essayabode

Prepare Data for Listings123

Prepare Data for Listings123 ?

Use the information in the email you received from your manager, Stevie, along with the full Listings123 data (DataExport_2020). You will be preparing your data and applying data cleaning best practices to the DataExport_2020 data. Make sure you fill out the Prompts sheet when completing each step.

This project’s email thread is located below, in case you need to reference it.

You will prepare your data for analysis by applying the following steps:

  1. Explore the Data
  1. Summarize Data Using a Pivot Table
  1. Clean Inconsistent, Invalid, Duplicate, and Missing Data
  1. Clean Text Data I (Search and Count)
  1. Clean Text Data II (Split Text to Columns)
  1. Merge Tables
  1. Summarize Data Wrangling

IMPORTANT
Before submitting your project, make sure:

⎯⎯ All the column names AND sheet names are named exactly as stated in the instructions. Do not change any existing names in the spreadsheet unless instructed.

⎯⎯ The indicated columns are organized in the correct sheet.

⎯⎯ True and False values are stated as TRUE and FALSE, not 1/0, T/F, etc.

Otherwise, your project will be graded incorrectly.

TIP
If you’d like more of a challenging experience or less guidance when completing this project do not open the HINT drop downs until after you’ve attempted the project step.

STEP 1: Explore the Data

Before jumping into wrangling your data, it’s important to explore the data and address the requests mentioned in the email.

INSTRUCTIONS

  1. Go to the Step 1 in the Prompts sheet in your spreadsheet and answer Questions 1-3. You will need to reference the DataExport_2020 sheet to answer these questions.
  1. Before answering Question 4 in the Prompts sheets, calculate the Number of Reviews Per Day column in the DataExport_2020 sheet.HINT
  1. Then proceed to answer Step 1, Question 4 in the Prompts sheet
  2. NOTE
    All of your work for Step 1 should be in the Prompts (Step 1) and DataExport_2020 sheets. STEP 2: Summarize Data Using A Pivot Table Listings123 is hoping to find specific traits within the listings data that might indicate the success of a host. Listings123 believes it might be different for each city. Summarize the data by city using a Pivot Table. INSTRUCTIONS
    1. Create a Pivot Table in the Summary By City sheet using the data from the DataExport_2020 sheet. The pivot table should contain the following fields summarized by City.
    1. COUNT of Host ID
      (Rename this field as “COUNT of Listings”)
  3. COUNTUNIQUE of Host ID
  1. AVERAGE of Bathrooms
  1. AVERAGE of Bedrooms
  1. AVERAGE of Beds
  2.  
  3. SUM of Number of Reviews
  1. AVERAGE of Days Listed
  1. AVERAGE of Review Scores Location
  1. AVERAGE of Review Scores Value
  1. HINT
  2. Create a new field in column L of the Summary By City sheet that finds the percentage of listings with existing hosts. Name this column “% of Listings with Repeat Hosts”HINT
  3.  
  4. Format your pivot table to make it legible. Apply customization best practices that include your new column.HINT
    1.  
    1. Go to the Step 2 in the Prompts sheet and answer Questions 1-3. You will need to reference the Summary By City sheet to answer these questions.

NOTE
All of your work for Step 2 should be in the Prompts (Step 2) and Summary By City sheets.

STEP 3: Clean Inconsistent, Invalid, Duplicate, and Missing Data

Mid-way through your project, you’re messaging David (Database Administrator) directly, for more information about the quality of the data, and you receive the following Slack message: Analysts do find bad listings in the data. We once found a listing with 852 bedrooms! I have been removing these listings from the data, and the engineering team has worked to implement validation checks to prevent this type of dirty data.

I’d recommend you clean DataExport_2020 before continuing your project.

As an analyst, you should always check your data before starting your analysis whether you are cleaning your data for dirty data or to prepare it to fit the analysis.

For this step you’ll be working in a copy of DataExport_2020 sheet called Copy of DataExport_2020.

 

IMPORTANT
You’ll notice there are new blank columns (J:Q) in the Copy of DataExport_2020 sheet. Do not remove them as part of your data cleaning process. Feel free to ignore them for now. You will come back to them later in the project.

INSTRUCTIONS

  1. Go to the Copy of DataExport_2020 sheet in your spreadsheet and examine the data.

    Use the following checklist to start your data cleaning process:

  2.  
  3. HINT
  1. Go through the Copy of DataExport_2020 sheet and clean the inconsistent data types.
  1. HINT
  1.  
  2. Clean the invalid values in the Copy of DataExport_2020 sheet. Ignore columns I and X.
    1. HINT
    1. Clean the duplicate rows in the Copy of DataExport_2020 sheet. HINT
    1. Clean the missing values in the Copy of DataExport_2020 sheet. Ignore columns J:Q.
      Based on information you gathered from David you are able to replace missing values, such as:
      ⎯⎯ If the Room Type is a “Private room” or “Shared room” replace the blank Bathrooms with 1, Beds with 1, or Bedrooms with 1.
      ⎯⎯ There are ~17 empty values in the Amenities column. If this is less than 10% of the data delete the rows with empty amenity values. Otherwise, replace accordingly.

 

  1. You want to get rid of the #DIV/0! error message in the Number of Reviews Per Day column. (You originally calculated this in DataExport_2020.) You know this error is happening because there are values being divided by 0. To remove the error, you want to adjust your formula to account for specific conditions.
    Go to the Number of Reviews Per Day (Clean) column in the Copy of DataExport_2020 sheet and adjust your formula to divide the Number of Reviews by the Days Listed, if the value for the days listed is greater than 0, otherwise, divide the Number of Reviews by 1.
  1. HINT
  1. Go to the Step 3 in the Prompts sheet and answer Questions 1. You will need to reference the Copy of DataExport_2020 sheet to answer these questions.

TIP
In Step 7, Question 1 in your Prompts sheet you will have to document all the data cleaning steps you have completed in your analysis. It’s recommended you start noting them down now.

NOTE
All of your work for Step 3 should be in the Prompts (Step 3) and Copy of DataExport_2020 sheets.

STEP 4 – Clean Text Data I (Search and Count)

You know that amenities are a key factor for renters choosing between listings. This is a column you want to analyze in the future, but notice that right now it is not in a clean format.

Continue using the following checklist in your data cleaning process:

HINT

INSTRUCTIONS

  1. When pulling the data, there were some amenities that didn’t transfer correctly. Specifically, there are two non-descriptive amenities in the amenities columns. Find and replace the following amenities with “Other” in the Copy of DataExport_2020 sheet:
    ⎯⎯ “translation missing: en.hosting_amenity_49″
    ⎯⎯“translation missing: en.hosting_amenity_50”HINT
  2.  
  3. In the same sheet indicate whether each listing has the amenities listed below in its assigned column. Return True or False.
  1. Internet (column J)
  2. Kitchen (column K)
    1. 24-Hour Check-In (column L)
    1. Free Parking (column M)
    1. Private Entrance (column N)
    1. Smoking Allowed (column O)
  3. Other (column P)
    1. HINT
    1. Go to the Step 4 in the Prompts sheet and answer Question 1. You will need to reference the Copy of DataExport_2020 sheet to answer this question.HINT
    2.  
    3. TIP
      In Step 7, Question 1 in your Prompts sheet you will have to document all the data cleaning steps you have completed in your analysis. It’s recommended you continue noting them down now.
  4. NOTE
    All of your work for Step 4 should be in the Prompts (Step 4), Copy of DataExport_2020, and Scratch Work sheets.

STEP 5 – Clean Text Data II (Split Text to Columns)

Continue using the following checklist in your data cleaning process:

INSTRUCTIONS

  1. Copy the Amenities column from the Copy of DataExport_2020 sheet and paste it in column A of the Scratch Work sheet.
  1. Split the text within the Amenities column into separate columns in the Scratch Work sheet.
  1. Calculate the Number of Amenities column in the Copy of DataExport_2020 sheet.HINT
  1. Go to the Step 5 in the Prompts sheet and answer Questions 1-2. You will need to reference the Copy of DataExport_2020 sheet to answer these questions. HINT
  2. TIP
    In Step 7, Question 1 in your Prompts sheet you will have to document all the data cleaning steps you have completed in your analysis. It’s recommended you continue noting them down now. NOTE
    All of your work for Step 5 should be in the Prompts (Step 5), Copy of DataExport_2020, and Scratch Work sheets. STEP 6: Merge Tables While preparing your data your manager, Stevie sends another email message:

    STEP 6: Merge Tables While preparing your data your manager, Stevie sends another email message:


  3. Hi, great to meet you! I have attached the data below. Let me know if you have any questions. ? Click here to access Alexis’ data INSTRUCTIONS
    1. Manually copy the data into your current spreadsheet in the sheet called Restaurants Hotels. Update the formatting if necessary.
    1. Fill in the missing values in the Number of full-service restaurants column in the Restaurants Hotels sheet by manually looking up the data in city-data.com.HINT
    1. In the Merged City Data sheet paste the values from the Summary By City sheet (excluding the Grand Total row). Merge the Number of full-service restaurants and Fire-safe hotels and motels data with the data in the Restaurants Hotels sheet.HINT
  4. You’ll notice there is no data for New Orleans, LA. Alexis has been using a computer program to automatically pull data from city-data.com, but due to an error, the program is not able to collect data for New Orleans, Look up Number of full-service restaurants and Fire-safe hotels and motels data from city-data.com for New Orleans, LA, and manually enter it into the Merged City Data sheet.
  5. Go to the Step 6 in the Prompts sheet and answer Questions 1-2. You will need to reference the Merged City Data sheet to answer these questions.

TIP
In Step 7, Question 1 in your Prompts sheet you will have to document all the data cleaning steps you have completed in your analysis. It’s recommended you continue noting them down now.

NOTE
All of your work for Step 6 should be in the Prompts (Step 6), Restaurants Hotels, and Merged City Data sheets.

 

STEP 7: Summarize Data Wrangling

Suppose a teammate on the analytics team is onboarding on the project and asked about your data wrangling steps thus far. The goal is for your teammate to have a smooth onboarding process given the clear organization and in-depth detail of your response.

INSTRUCTIONS

  1. Go to the Step 7 in the Prompts sheet and answer Question 1. Use numbered steps to document your wrangling process.

NOTE
All of your work for Step 7 should be in the Prompts (Step 7) sheet.

That’s a wrap! ?

 

Hooray!!! ? You’ve completed Part 2 of the final project. That was a big lift, but very important work you will complete as an analyst.

Our website has a team of professional writers who can help you write any of your homework. They will write your papers from scratch. We also have a team of editors just to make sure all papers are of HIGH QUALITY & PLAGIARISM FREE. To make an Order you only need to click Ask A Question and we will direct you to our Order Page at WriteDemy. Then fill Our Order Form with all your assignment instructions. Select your deadline and pay for your paper. You will get it few hours before your set deadline.

Fill in all the assignment paper details that are required in the order form with the standard information being the page count, deadline, academic level and type of paper. It is advisable to have this information at hand so that you can quickly fill in the necessary information needed in the form for the essay writer to be immediately assigned to your writing project. Make payment for the custom essay order to enable us to assign a suitable writer to your order. Payments are made through Paypal on a secured billing page. Finally, sit back and relax.

Do you need an answer to this or any other questions?