Homework 1: Data Analysis

Due January 29th at 11:59 PM

Topics: CSV File Parsing, Data Cleaning, Destructuring, Array Methods, Control Flow, Template Literals, Data Analysis/Aggregation

This homework will help you get acquainted with JavaScript syntax and with developing moderately complex JavaScript projects. For this assignment, you will analyze a dataset of multilingual mobile app reviews, provided to you as a CSV file. You will write code to parse, clean, prepare, and analyze this data to extract meaningful insights. This is a classic exercise in data wrangling and preprocessing!

Your analysis of this dataset will be mostly exploratory, within the bounds of what you've learned in JavaScript so far. We do not expect analysis on the level of a Big Data class; however, you may find that you can already do quite a lot of interesting analysis with the JavaScript you know!

Assignment Goals

  • Strengthen your ability to work with objects, arrays, and various data types in real-world scenarios
  • Gain experience using modern ES6 features such as destructuring, template literals, and advanced function syntax for arrays
  • Practice designing and implementing data transformation, filtering, and analysis logic using loops, conditions, and functions
  • Learn to use various built-in and external Node dependencies (fs and papaparse)

Introduction & Installation

Accept the assignment on GitHub Classroom.

Imagine you are a data analyst for a mobile app developer, and you are tasked with analyzing reviews of various popular apps from users across multiple languages. You have been asked to perform this analysis in JavaScript, and fortunately you have just recently learned the basics of JavaScript, enough to parse, clean, and analyze this data!

This assignment will be written in Node.js, a server-side JavaScript runtime. Using a code editor and plain text files, you will run script files that perform the data analysis without needing a browser to execute JavaScript. Much as Java's compiler or Python's interpreter does for those languages, Node handles the compilation and interpretation of the code for you.

Before you start this homework, make sure you have already installed Node.js and a code editor like VSCode by following the instructions in the JS Development Guide.

Then, download the starter files from the top of this section (or from the card on this site's homepage). The starter bundle contains an AI Synthesis template, a package.json file, the prettier and eslint config files, and an src directory with skeleton code (marked with TODO items) and the CSV file for the dataset. Once you have the starter files, open a terminal (either the one integrated into your code editor or the one included with your OS), navigate to the starter code directory (the one containing src), and run the command:

npm install

This command is something you will run at the start of nearly every JavaScript project, in order to install the dependencies you need for the project. These dependencies are declared in the package.json file, and the command downloads all of them from npmjs.com. For this homework, we will be using the papaparse library to parse CSV files. After running this command, you should see a new directory called node_modules and a package-lock.json file. The directory houses all the dependencies, while the JSON file records the exact versions that were installed. We will discuss project structure and dependency management in a later lecture, so for now leave both of these alone!

To run this homework, you can enter the command:

node src/main.js

Please note: If you used AI for any part of this assignment, save all the chat logs and context! Your instructors will want to see this usage documented within the README for this assignment.

Instructions

Step 1: Parse the Data

Files: analysis.js

Your first task will be to use two dependencies, the built-in fs module and papaparse (which you installed earlier), to turn the CSV data into an object we can work with in JavaScript. Here are the specifications and tips for each of these dependencies:

fs
  • fs, which stands for File System, is a built-in module for Node.js
  • It allows your program to interact with the file system on your computer.
  • After importing the module, you can call various methods, defined in its documentation.
  • For our purposes, use fs.readFileSync(path[, options]), which is a simple synchronous option to read files. Here's an example of how you may import fs and use this method:
  • const fs = require('fs');
    const data = fs.readFileSync('./file.txt', 'utf8');
papaparse
  • papaparse is a simple CSV parser that parses CSV files into JSON/JavaScript objects with minimal code. You should read the opening page of documentation, as it contains all you need for this assignment.
  • It's important to note that papaparse parses everything into strings; this includes even numbers and booleans. While papaparse has options to convert types automatically, we recommend against using them so that you get practice with JavaScript's normal type conversion in the data cleaning part of the assignment.
  • Similar to fs above, you must import papaparse in order to use it. Here's an example:
  • const Papa = require('papaparse');
    const csv = Papa.parse(some_data);
  • Look at the documentation to see what other options you might need to specify given this dataset. Hint: there may be a few important options you will need to specify!

At the end of this section, you should have your data stored as an object in a variable.
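
To make the parsed structure concrete, here is a deliberately naive, hand-rolled sketch of what header-based CSV parsing produces. This is for illustration only; papaparse handles quoting, embedded commas, and other edge cases that this toy version does not, so use the library in your solution.

```javascript
// Toy illustration of header-based CSV parsing. Do NOT use this in your
// solution; papaparse handles quoting and edge cases that this does not.
function naiveParse(csvText) {
  const [headerLine, ...rows] = csvText.trim().split('\n');
  const headers = headerLine.split(',');
  return rows.map((row) => {
    const values = row.split(',');
    // Pair each header with its value. Note every value stays a string,
    // matching papaparse's default (no automatic type conversion).
    return Object.fromEntries(headers.map((header, i) => [header, values[i]]));
  });
}

const sample = 'review_id,rating\n1,4.5\n2,3.0';
console.log(naiveParse(sample)); // two row objects with string-valued fields
```

Notice that each row becomes an object keyed by the column headers; this is the shape your later cleaning code will receive.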

Step 2: Data Cleaning

Files: analysis.js

Now that you have your CSV data parsed, you will want to destructure and clean it in preparation for analysis. Data you receive from large datasets is not always clean; in fact, this dataset has many null values, including null review texts, ratings, countries, genders, and app versions.

As mentioned earlier, papaparse parsed all the data into strings. You will convert each column's values to the proper type. Additionally, you will restructure the user properties into their own nested object on each review.

Your data cleaning goals are as follows:

  • Filter out every record with null column values, except user_gender; a null gender value is allowed.
  • Merge all the user statistics (user_id, user_age, user_country, and user_gender) into a single object called user, removing the original top-level properties.
  • Convert review_id, user_id, num_helpful_votes, and user_age to integers
  • Convert rating to a float
  • Convert review_date to a Date object
  • Convert verified_purchase to a boolean

Here is an example of one of the cleaned, filtered records from the dataset:

{
  review_id: 99,
  app_name: 'Grammarly',
  app_category: 'Music & Audio',
  review_text: 'Stupido malattia donna magari già posare sbagliare qualità. Tempo vino morale sviluppo ora popolazione avvicinare.',
  review_language: 'fi',
  rating: 2.3,
  review_date: 2024-06-28T22:59:58.000Z,
  verified_purchase: false,
  device_type: 'Windows Phone',
  num_helpful_votes: 1090,
  app_version: '6.9.40-beta',
  user: {
    user_age: 44,
    user_country: 'Vietnam',
    user_gender: '',
    user_id: 9262579,
  }
}

At the end of this section, you should have an array of objects containing your filtered and cleaned data. Make sure your property names exactly match those from the original dataset and hold the correct data types.
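
As a rough sketch of the kinds of conversions and restructuring involved, one record might be cleaned as below. The field names follow the dataset columns above, but the helper name and exact pipeline are hypothetical; organize your own solution however the starter code's documentation directs.

```javascript
// Hypothetical helper; your starter code may organize this differently.
function cleanRecord(raw) {
  // Rest destructuring pulls the user_* columns out; `rest` keeps the others.
  const { user_id, user_age, user_country, user_gender, ...rest } = raw;
  return {
    ...rest,
    review_id: parseInt(rest.review_id, 10),
    num_helpful_votes: parseInt(rest.num_helpful_votes, 10),
    rating: parseFloat(rest.rating),
    review_date: new Date(rest.review_date),
    verified_purchase: rest.verified_purchase === 'true',
    user: {
      user_id: parseInt(user_id, 10),
      user_age: parseInt(user_age, 10),
      user_country,
      user_gender,
    },
  };
}
```

Remember that filtering out records with null required fields is a separate step; Array.prototype.filter pairs naturally with a cleaning map like this.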

Step 3: Sentiment Analysis

Files: analysis.js

With freshly cleaned data, we are now ready to start analyzing it! In this step, we will add a property to each record called sentiment that represents a general label for the rating each user gave an app. We will also destructure some parts of the reviews for analysis later on.

Write a function called labelSentiment() that takes a rating as an argument and returns a string label based on that rating:

  • positive if the rating is greater than 4.0
  • negative if the rating is below 2.0
  • neutral otherwise (ratings from 2.0 through 4.0)
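
One way the branching might look, treating every rating that is neither positive nor negative as neutral; check your starter code's documentation for the exact boundary behavior expected by the autograder:

```javascript
// Sketch of labelSentiment; verify the boundary cases against the
// function documentation in your starter code.
function labelSentiment(rating) {
  if (rating > 4.0) return 'positive';
  if (rating < 2.0) return 'negative';
  return 'neutral';
}
```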

Next, we will tally the sentiments by app, and then by language, into arrays of objects. Do these analyses in the functions sentimentAnalysisApp() and sentimentAnalysisLang(), respectively. Be sure to follow the object format for these statistics arrays provided in the function documentation.

For this section, you may find it helpful to destructure your data into other objects when organizing it for analysis and printing. Refer to the lecture slides for destructuring and other syntactical techniques you can use here!
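
The grouping itself is a classic accumulate-then-convert pattern. The sketch below is illustrative only, and its returned shape is hypothetical; match the object format given in the starter code's function documentation. It assumes each record already carries the sentiment property added earlier.

```javascript
// Illustrative grouping sketch; the returned object shape is hypothetical.
function countSentimentsBy(records, key) {
  const groups = {};
  for (const { [key]: group, sentiment } of records) {
    // Initialize the tally for this app/language the first time we see it.
    groups[group] ??= { positive: 0, neutral: 0, negative: 0 };
    groups[group][sentiment] += 1;
  }
  // Convert the lookup table into an array of objects.
  return Object.entries(groups).map(([name, tally]) => ({ [key]: name, ...tally }));
}
```

Calling this with 'app_name' or 'review_language' as the key would yield one tally object per app or per language.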

Step 4: Summary Statistics

Files: analysis.js

To wrap up our analysis, let us look at some basic summary statistics. Using the cleaned data and any objects you have created from the previous section, answer the following statistical analysis questions:

  • What is the most reviewed app in this dataset, and how many reviews does it have?
  • For the most reviewed app, what is the most commonly used device?
  • For the most reviewed app, what is the average star rating (out of 5.0)?

Create an object to store the answers to these questions and return them from the function summaryStatistics(). Use the object format provided in the homework documentation.
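
Tallying reviews per app and then taking the maximum is one straightforward approach to the first question. The helper below is a hedged sketch (its name and return shape are illustrative); summaryStatistics() itself should return the format from the homework documentation.

```javascript
// Illustrative only: tally reviews per app, then pick the largest tally.
function mostReviewedApp(records) {
  const counts = {};
  for (const { app_name } of records) {
    counts[app_name] = (counts[app_name] ?? 0) + 1;
  }
  // Reduce the [name, count] pairs down to the pair with the highest count.
  return Object.entries(counts).reduce((best, entry) =>
    entry[1] > best[1] ? entry : best
  );
}
```

The same tally-then-reduce pattern works for the most common device, and a similar reduce (summing ratings, then dividing by the count) yields the average star rating.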

Submission

README

Answer the provided reflection questions in the starter code's README file. In this reflection, you will also indicate whether or not you used AI and document that usage. Please don't forget this step, as it is important feedback on the homework and the content of the course!

Submitting Your Work

To submit, simply push your commits to the repository generated by GitHub Classroom. Make sure your latest commit before the deadline includes your completed analysis.js file and your README.md file. Before you submit, lint your code for style errors using the command npm run lint; more details on style can be found in the style guide. We will deduct one point for every style error remaining in the submitted files. You may also test your code against our provided Mocha test suite, which should match the autograder for this assignment, using npm test.