# Analysis for Continuous Improvement

Author Name: Sara Trochanowski

9-digit PID: 730489697

Continuous Improvement embraces the belief that there is _always room to make things better_. It is a mindset and process we value and practice in this course. In this assignment, you are able to practice continuous improvement and contribute to the design ideas of the course.

## Brainstorming Ideas

Reflect on your personal experiences and observations in COMP110 and **brainstorm modifications to the course that _create value_ beyond its current design**. When brainstorming, try not to be critical of the ideas you come up with regarding scale, stakeholders impacted, or for any other reasons. In the markdown cell below, brainstorm 3 to 5 ideas you think would create value for you.

Each brainstormed idea should state (a) the suggested change or addition, (b) the expected value created, and (c) which specific stakeholders would benefit. If helpful, expand on the following template: "The course should (state idea here) because it will (state value created here) for (insert stakeholders here)."

Example A: "The course should use only examples from psychology experiments because it will be more relevant for students who are psychology majors."

Example B: "The course should not have post-lesson questions because they are not useful for most students in the class."

### Part 1. Creative Ideation

1. The course should have work that is intended to be done with a partner because it will help address questions/concerns for all students who are confused.
2. The course should have a pretest at the beginning of the course because it will indicate progress for all students.
3. The course should offer additional practice questions because it will ensure optimal understanding for all students.
4. The course should provide additional optional assignments because it will help all students who choose to do the assignments feel that they have mastered the material.
5. The course should offer free computers because it will encourage lower-income students to join COMP 110 and learn to code.

## Connecting with Available Data

The data you have available for this analysis is limited to the anonymized course survey you and your peers filled out a few weeks ago. The data is found in the `survey.csv` file in this exercise directory. Each row represents an individual survey response. Each column has a description which can be found on the project write-up here: <https://22s.comp110.com/exercises/ex08.html>

Review the list of available data and identify which one of your ideas _does not_, or is _least likely to_, have relevant data to support the analysis of your idea to create value. In the box below, identify which of your ideas lacks data and suggest how we might be able to collect this data in the future. One aspect of _continuous improvement_ is trying to avoid "tunnel vision" where possible improvements are not considered because there is no data available to analyze it. Identifying new data sources can unlock improvements!

### Part 2. Identifying Missing Data

1. Idea without sufficient data to analyze: The course should offer free computers because it will encourage lower-income students to join COMP 110 and learn to code.

2. Suggestion for how to collect data to support this idea in the future: Survey more UNC students (outside of those already enrolled in COMP 110) to learn how many students are discouraged from taking the course due to a lack of sufficient technology.

## Choosing an Idea to Analyze

Consider those of your ideas which _do_ seem likely to have relevant data to analyze. If none of your ideas do, spend a few minutes and brainstorm another idea or two with the added connection of data available on hand and add those ideas to your brainstormed ideas list.

Select the one idea which you believe is _most valuable_ to analyze relative to the others and that has data to support its analysis. In the markdown cell for Part 3 below, identify the idea you are exploring and articulate why you believe it is most valuable (e.g. widest impact, biggest opportunity for improvement, simplest change for significant improvement, and so on).

### Part 3. Choosing Your Analysis

1. Idea to analyze with available data: The course should have work that is intended to be done with a partner because it will help address questions/concerns for all students who are confused. (Data: oh_effective)

2. This idea is more valuable than the others brainstormed because: according to the survey results, most students find their office hour visits very effective (most ratings are a 6 or 7 on a 1 to 7 scale). This suggests that students benefit from being able to discuss their questions/concerns with others. Rather than making a big change like permitting collaboration on exercises, I think that adding a component to the course that encourages students to collaborate in pairs or groups on a coding project (a group project) would help all students learn together and minimize confusion.


## Your Analysis

Before you begin analysis, a reminder that we do not expect the data to support everyone's ideas and you can complete this exercise for full credit even if the data does not clearly support your suggestion or even completely refutes it. What we are looking for is a logical attempt to explore the data using the techniques you have learned up until now in a way that _either_ supports, refutes, or does not have a clear result and then to reflect on your findings after the analysis.

Using the utility functions you created for the previous exercise, you will continue with your analysis in the following part. Before you begin, refer to the rubric on the technical expectations of this section in the exercise write-up.

In this section, you are expected to interleave code and markdown cells such that for each step of your analysis you are starting with an English description of what you are planning to do next in a markdown cell, followed by a Python cell that performs that step of the analysis.

### Part 4. Analysis

We begin by changing some settings in the notebook to automatically reload changes to imported files.

In [25]:
%reload_ext autoreload
%autoreload 2

We continue by importing the helper functions from `data_utils`.

In [26]:
from data_utils import read_csv_rows, columnar, head, select, count

Next, we use the read_csv_rows function to read the rows of a CSV file into a table, and then the columnar function to transform the row-oriented table into a column-oriented table.

In [27]:
SURVEY_DATA_CSV_FILE_PATH: str = "../../data/survey.csv"
data_rows: list[dict[str, str]] = read_csv_rows(SURVEY_DATA_CSV_FILE_PATH)
data_cols: dict[str, list[str]] = columnar(data_rows)

data_cols

{'row': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ..., '120', '121', ...
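The `read_csv_rows` and `columnar` helpers come from this exercise's own `data_utils` module, whose source is not shown here. As a rough illustration only, here is a minimal sketch of what such helpers might look like, assuming the CSV has a header row and every row has the same columns; it is not necessarily how `data_utils` implements them.

```python
import csv


def read_csv_rows(filename: str) -> list[dict[str, str]]:
    """Read a CSV file into a list of rows, each row a dict keyed by column header."""
    rows: list[dict[str, str]] = []
    with open(filename, "r", encoding="utf8", newline="") as file:
        for row in csv.DictReader(file):
            rows.append(row)
    return rows


def columnar(table: list[dict[str, str]]) -> dict[str, list[str]]:
    """Pivot a row-oriented table into a column-oriented one."""
    result: dict[str, list[str]] = {}
    if len(table) == 0:
        return result
    for column in table[0]:  # column names come from the first row's keys
        result[column] = [row[column] for row in table]
    return result


# Small made-up example instead of the real survey file.
rows = [{"row": "0", "oh_effective": ""}, {"row": "1", "oh_effective": "7"}]
print(columnar(rows))  # {'row': ['0', '1'], 'oh_effective': ['', '7']}
```

Pivoting once up front means each later step (`head`, `select`, `count`) can work on a whole column as a single list.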

Next, we use the head function to produce a new column-based table with only the first N rows of data.

In [28]:
data_cols_head: dict[str, list[str]] = head(data_cols, 10)
data_cols_head

{'row': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
 'year': ['22', '25', '25', '24', '25', '25', '25', '24', '25', '22'],
 'unc_status': ['Returning UNC Student',
  'Returning UNC Student',
  'Incoming First-year Student',
  'Returning UNC Student',
  'Incoming First-year Student',
  'Incoming First-year Student',
  'Incoming First-year Student',
  'Returning UNC Student',
  'Incoming First-year Student',
  'Returning UNC Student'],
 'comp_major': ['No',
  'No',
  'Yes - BA',
  'Yes - BS',
  'Yes - BA',
  'Yes - BS',
  'Yes - BA',
  'Yes - BA',
  'Yes - BS',
  'No'],
 'primary_major': ['Mathematics',
  'Mathematics',
  'Computer Science',
  'Computer Science',
  'Computer Science',
  'Computer Science',
  'Computer Science',
  'Neuroscience',
  'Computer Science',
  'Neuroscience'],
 'data_science': ['No',
  'Yes',
  'No',
  'Maybe',
  'No',
  'Maybe',
  'Yes',
  'No',
  'Yes',
  'No'],
 'prereqs': ['MATH 233, MATH 347, MATH 381',
  'MATH 130, MATH 231, STOR 155',
  'MATH 130,
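Similarly, a `head` helper can be sketched by slicing every column to its first `n` values. This is an assumed implementation for illustration, not the actual `data_utils` code, and the sample table is made up.

```python
def head(table: dict[str, list[str]], n: int) -> dict[str, list[str]]:
    """Return a new column-oriented table holding only the first n rows of each column."""
    result: dict[str, list[str]] = {}
    for column, values in table.items():
        result[column] = values[:n]  # slicing copies the list and is safe when n > len(values)
    return result


sample = {"row": ["0", "1", "2"], "year": ["22", "25", "25"]}
print(head(sample, 2))  # {'row': ['0', '1'], 'year': ['22', '25']}
```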

Next, we use the select function to produce a new column-based table with a subset of the original columns.

In [29]:
selected_data: dict[str, list[str]] = select(data_cols, ["oh_effective"])
selected_data

{'oh_effective': ['',
  '7',
  '4',
  '5',
  '7',
  '',
  '4',
  '7',
  '7',
  '7',
  '7',
  '5',
  '',
  '',
  '6',
  '6',
  '7',
  '4',
  '6',
  '',
  '7',
  '5',
  '4',
  '7',
  '7',
  '3',
  '5',
  '7',
  '6',
  '7',
  '',
  '5',
  '5',
  '',
  '4',
  '4',
  '',
  '7',
  '7',
  '7',
  '',
  '7',
  '',
  '7',
  '6',
  '6',
  '',
  '5',
  '6',
  '4',
  '5',
  '',
  '6',
  '4',
  '4',
  '7',
  '6',
  '',
  '',
  '7',
  '7',
  '',
  '7',
  '4',
  '',
  '6',
  '',
  '6',
  '',
  '6',
  '7',
  '5',
  '4',
  '3',
  '',
  '',
  '7',
  '7',
  '7',
  '7',
  '3',
  '',
  '',
  '7',
  '',
  '6',
  '4',
  '4',
  '',
  '',
  '6',
  '7',
  '7',
  '1',
  '4',
  '6',
  '',
  '7',
  '7',
  '4',
  '6',
  '5',
  '',
  '7',
  '7',
  '',
  '7',
  '7',
  '6',
  '',
  '',
  '7',
  '6',
  '4',
  '',
  '5',
  '',
  '4',
  '',
  '4',
  '7',
  '5',
  '',
  '',
  '7',
  '7',
  '7',
  '6',
  '6',
  '7',
  '7',
  '',
  '',
  '7',
  '',
  '',
  '',
  '7',
  '5',
  '',
  '4',
  '',
  '',
  '6',
  '6',
  '7',
  '7'
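A `select` helper only needs to copy the requested columns into a new dictionary. The sketch below is an assumption about how it could work rather than the `data_utils` source; the sample table is made up.

```python
def select(table: dict[str, list[str]], columns: list[str]) -> dict[str, list[str]]:
    """Return a new column-oriented table with only the named columns, in the order given."""
    result: dict[str, list[str]] = {}
    for column in columns:
        result[column] = table[column]  # raises KeyError if a column name is misspelled
    return result


sample = {"row": ["0", "1"], "year": ["22", "25"], "oh_effective": ["", "7"]}
print(select(sample, ["oh_effective"]))  # {'oh_effective': ['', '7']}
```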

Next, we use the count function to produce a dictionary that maps each distinct `oh_effective` response to the number of times it appears. The key `''` counts blank responses.

In [30]:
oh_effective: dict[str, int] = count(selected_data["oh_effective"])
print(f"oh_effective: {oh_effective}")

oh_effective: {'': 191, '7': 182, '4': 68, '5': 59, '6': 101, '3': 14, '1': 2, '2': 3}
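A `count` helper is a frequency table built in one pass over a column. This sketch is an illustration, not the `data_utils` source. Like the output above, it treats the blank string `''` as just another value, so blanks get their own key.

```python
def count(values: list[str]) -> dict[str, int]:
    """Map each distinct value to how many times it appears, in first-seen order."""
    counts: dict[str, int] = {}
    for value in values:
        if value in counts:
            counts[value] += 1
        else:
            counts[value] = 1
    return counts


print(count(["", "7", "4", "7", ""]))  # {'': 2, '7': 2, '4': 1}
```

The standard library's `collections.Counter` computes the same frequencies.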


I defined a helper function, `ar`, in `data_utils` that finds the mean of the `oh_effective` responses, to determine how effective office hour visits have been for students on average. The second argument, 620, is the total number of survey rows, including blank responses.

In [32]:
from data_utils import ar
ar(selected_data, 620)

4.02741935483871
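The value 4.03 is lower than the distribution of ratings above suggests. Working from the `count` output, the sum of all ratings is 2,497, and 2,497 / 620 matches the `ar` result exactly, so the 191 blank responses appear to have been counted in the denominator as if they were zeros. The sketch below recomputes the mean from that frequency table with blanks skipped. `mean_rating` is a hypothetical helper written for this illustration, not part of `data_utils`.

```python
# Frequency table copied from the count(...) output above ('' = blank response).
oh_counts: dict[str, int] = {"": 191, "7": 182, "4": 68, "5": 59, "6": 101, "3": 14, "1": 2, "2": 3}


def mean_rating(counts: dict[str, int], skip_blank: bool = True) -> float:
    """Weighted mean of numeric ratings from a value -> frequency table."""
    total: int = 0
    n: int = 0
    for value, freq in counts.items():
        if value == "":
            if not skip_blank:
                n += freq  # a blank counted in the denominator acts like a rating of 0
            continue
        total += int(value) * freq
        n += freq
    return total / n


print(round(mean_rating(oh_counts), 2))         # 5.82 (blanks skipped)
print(round(mean_rating(oh_counts, False), 2))  # 4.03 (blanks counted, as in ar)

# Share of students who answered and gave a 6 or 7.
answered: int = sum(freq for value, freq in oh_counts.items() if value != "")
top_two_share: float = (oh_counts["6"] + oh_counts["7"]) / answered
print(round(top_two_share, 2))  # 0.66
```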

To better visualize the data being analyzed, I created a table in which "O" stands for the `oh_effective` survey column. The first column names the data, and the second column gives its mean from the survey. If means for other columns (such as `oh_visits`) were computed, they could be added as additional rows.

In [45]:
from tabulate import tabulate

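# Column-oriented table: "O" stands for the oh_effective survey column.
average: dict[str, list[object]] = {"Data": ["O"], "Mean Results": [ar(selected_data, 620)]}

# headers="keys" labels the columns with the dict keys; "html" renders an HTML table.
tabulate(average, headers="keys", tablefmt="html")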

| Data | Mean Results |
|------|--------------|
| O    | 4.02742      |


## Conclusion

In the following markdown cell, write a reflective conclusion given the analysis you performed and identify recommendations.

If your analysis of the data supports your idea, state your recommendation for the change and summarize the data analysis results you found which support it. Additionally, describe any extensions or refinements to this idea which might be explored further. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change.

If your analysis of the data is inconclusive, summarize why your data analysis results were inconclusive in the support of your idea. Additionally, describe what experimental idea implementation or additional data collection might help build more confidence in assessing your idea. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by experimenting with your idea.

Finally, if your analysis of the data does not support it, summarize your data analysis results and why it refutes your idea. Discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change. If you disagree with the validity of the findings, describe why your idea still makes sense to implement and what alternative data would better support it. If you agree with the validity of the data analysis, describe what alternate ideas or extensions you would explore instead. 

### Part 5. Conclusion

My analysis of the data supports my idea that the course should have work that is intended to be done with a partner because it will help address questions/concerns for all students who are confused. In this data, a rating of 7 indicates that office hour visits were very effective, while a rating of 1 indicates that they were ineffective. Most students find office hour visits very effective (182 students rated them a 7 and 101 rated them a 6, so 283 of the 429 students who answered, about two-thirds, gave a 6 or 7), while very few consider them ineffective (2 students rated them a 1 and 3 rated them a 2). The `ar` helper reports a mean of 4.03, but that figure divides the sum of all ratings by all 620 survey rows, including the 191 blank responses; among the 429 students who actually answered, the mean is about 5.82. The majority of students find the collaborative help offered by office hours more beneficial than not, so it is likely that they would also find the collaborative aspect of a partner assignment beneficial.

To further explore this idea, it would be interesting to analyze how often students go to office hours. Comparing how effective students find office hours with how often they actually attend could show whether students are taking advantage of the beneficial options provided for them. If few students attend office hours, requiring partner projects could help ensure that students receive the benefits of collaborative work.
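One way to run the comparison described above is to average the `oh_effective` ratings separately for each level of office hour attendance. The sketch below assumes such an attendance column exists; the name `oh_visits` (mentioned earlier in this notebook) and all of the values are placeholders, so `mean_by_group` is a hypothetical helper run on made-up data, not a real result.

```python
def mean_by_group(table: dict[str, list[str]], group_column: str, value_column: str) -> dict[str, float]:
    """Average the numeric value_column within each distinct group_column value, skipping blanks."""
    sums: dict[str, int] = {}
    counts: dict[str, int] = {}
    for group, value in zip(table[group_column], table[value_column]):
        if group == "" or value == "":
            continue  # a blank in either column means the pair cannot be compared
        sums[group] = sums.get(group, 0) + int(value)
        counts[group] = counts.get(group, 0) + 1
    return {group: sums[group] / counts[group] for group in sums}


# Made-up rows; "oh_visits" and its values are placeholders, not real survey data.
toy = {
    "oh_visits": ["0", "1", "1", "3", "", "3"],
    "oh_effective": ["", "5", "7", "6", "7", "7"],
}
print(mean_by_group(toy, "oh_visits", "oh_effective"))  # {'1': 6.0, '3': 6.5}
```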

However, it is also important to consider whether adding this component would increase students' workload too much. I think that replacing one or two of the current exercises with partner projects would ensure that students still do enough individual work while also benefiting from some partner work where they can openly discuss their questions/concerns with others.

This idea is worth investigating further to help make appropriate adjustments to the COMP 110 exercises that better benefit students.