The Influence of Tech Work Environments on Mental Health

Authors: Pearl Hwang, Xue Qiu, Ronak Thakur, Tony Yao

Introduction

Does the work environment of tech companies influence how mental health impacts an employee's work? What kind of mental health resources do tech companies provide that may alleviate the mental health impacts on an employee's work?

These are just a sample of the questions that our team aimed to answer using a 2014 Survey on Mental Health in the Tech Workplace (courtesy of Open Sourcing Mental Illness). The following notebook includes our team's full exploratory data analysis on the 2014 survey data and is comprised of multiple regression models, statistical prediction tests, and a variety of graphs to best represent certain relations between variables.

The goal of this analysis is to show clearly what employers lose when they do not adequately acknowledge mental health in their workplace. We believe that seeing these advantages as clear, numerical values will lead more companies in the technology field to recognize the importance of accommodating mental health issues. Our ultimate hope is that one day, all technology workplaces will be a safe and inclusive place for everyone.

The following notebook is divided into five main sections:
      1. Data Overview
      2. File Initialization
      3. Data Tidying
      4. Data Analysis & Visualization
      5. Conclusion & Moving Forward

Data Overview

We will be using the dataset, "Mental Health In Tech Survey", from https://www.kaggle.com/osmi/mental-health-in-tech-survey. The attributes from the dataset are listed below:

Initialization of Files

Tidying & Cleaning the Data

We see that there are multiple NaN values in our dataset that refer to missing data. We decided to resolve missing data by getting rid of unimportant rows and columns that are not involved with our topic of "Influence of Tech Work Environments on Mental Health." Specifically:

  1. Dropped every row where "self_employed" was answered as "Yes" because our analysis focuses on the environment of the tech workplace rather than on those who are self-employed. Therefore, our dataframe consists of only rows where "self_employed" was answered "No". Because every remaining row has the same response for "self_employed", we dropped the "self_employed" column as well.
  2. Dropped every row where "Country" was not answered as "United States" because we wanted to focus our analysis on one country. We chose the United States because most of the responses in the original dataset came from there. Because every remaining row has the same response for "Country" (which is "United States"), we dropped the "Country" column as well.
  3. Dropped the "comments" column because this is not a significant attribute.
  4. Dropped the "phys_health_interview" column because this is not a significant attribute. We are not concerned about information related to the job interview.
  5. Dropped the "mental_health_interview" column because this is not a significant attribute. We are not concerned about information related to the job interview.
  6. Dropped the "state" column because this is not a significant attribute, since we are purely looking at work environments and mental health.
  7. Dropped the "Timestamp" column because this attribute has no significance to work environments and mental health; it is merely when the data was retrieved.
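The row filtering and column drops listed above could be sketched in pandas roughly as follows. The small `raw` frame is made-up illustration data rather than the survey itself, and `keep_only` and `UNUSED` are hypothetical names introduced here; only the column names come from the list above.

```python
import pandas as pd

# Made-up rows for illustration only; the real survey has many more columns.
raw = pd.DataFrame({
    "Timestamp": ["2014-08-27", "2014-08-27", "2014-08-28"],
    "Country": ["United States", "Canada", "United States"],
    "state": ["IL", None, "TX"],
    "benefits": ["Yes", "No", "Don't know"],
    "comments": [None, None, "n/a"],
})

def keep_only(df, column, value):
    """Keep rows where `column` equals `value`, then drop the now-constant column."""
    return df[df[column] == value].drop(columns=[column])

# Columns the list above describes as not relevant to the analysis.
UNUSED = ["comments", "phys_health_interview", "mental_health_interview",
          "state", "Timestamp"]

tidy = keep_only(raw, "Country", "United States")
# errors="ignore" skips names that are absent from this toy frame.
tidy = tidy.drop(columns=UNUSED, errors="ignore").reset_index(drop=True)
```

The same helper would apply to the "self_employed" filter, since that step also keeps one response and then drops the constant column.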

Next, we will be giving dummy values (0, 1, 2...) for our categorical responses for each of the attributes to more easily work with the data in terms of finding correlations.
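As a minimal sketch of this encoding step, a categorical column can be mapped to integer codes with a dictionary. The ordering in `BENEFIT_CODES` is an assumption made for illustration; the actual codes depend on how each attribute's responses are ordered.

```python
import pandas as pd

# Hypothetical code assignment for one attribute; not taken from the survey.
BENEFIT_CODES = {"No": 0, "Don't know": 1, "Yes": 2}

# Made-up responses for illustration.
answers = pd.Series(["Yes", "No", "Don't know", "Yes"], name="benefits")

# map() replaces each response with its integer code.
encoded = answers.map(BENEFIT_CODES)
```

Any response missing from the dictionary would become NaN, which makes unexpected answer values easy to spot.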

We will be creating a column in the table that tells us whether or not a person has a mental health condition. In the original dataset, the "work_interfere" question is worded such that those without a mental health condition would answer NA. This allows us to categorize those who answered NA as not having a mental health condition, and those who answered "Often", "Sometimes", "Rarely", or "Never" as having one.
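One way this indicator column could be derived is sketched below. The column name `has_condition` is hypothetical, and the rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up responses; NaN stands for an NA answer to "work_interfere".
df = pd.DataFrame(
    {"work_interfere": ["Often", np.nan, "Never", "Rarely", np.nan]}
)

# 1 = answered the question (treated as having a condition), 0 = answered NA.
df["has_condition"] = df["work_interfere"].notna().astype(int)
```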

Below is the resulting dataframe after cleaning and tidying the data.

Data Analysis and Visualization

From the heatmap above, the darker reds/oranges show a higher correlation between two categories, while lighter colors show a lower correlation. Looking at the numbers, values closer to -1 and 1 indicate stronger correlations as well.

From this heatmap, it appears that the attributes "benefits", "care_options", "wellness_program", and "seek_help" all have a relatively high correlation with each other. One possible explanation is that being aware of any one of these (the mental health benefits provided by employers, the options for mental health care provided by employers, the mental health wellness program, or the resources to learn more about mental health issues) tends to go together with being aware of the others.
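For reference, the pairwise correlations behind a heatmap of this kind can be computed from the encoded columns as in the sketch below. The values in `codes` are made up and do not reproduce our results; only the column names come from the dataset.

```python
import pandas as pd

# Made-up encoded responses for three of the attributes.
codes = pd.DataFrame({
    "benefits":         [0, 1, 1, 0, 1, 0],
    "care_options":     [0, 1, 1, 0, 0, 0],
    "wellness_program": [0, 1, 0, 0, 1, 0],
})

# DataFrame.corr() returns the Pearson correlation matrix by default.
corr = codes.corr()
```

A plotting library such as seaborn could then render `corr` as a colored grid, with each cell annotated by its coefficient.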

Now, we would like to focus on the portion of the dataset covering employees who do have a mental health condition, and on how different variables in the tech workplace environment may be influencing their mental health condition.

For the data analysis below, we'll be using a significance level of 0.10.

One question respondents were asked was "Does your employer provide mental health benefits?" and the answer choices were "Yes" and "No". This focus is on those with mental health conditions and the effect of mental health benefits on their work. We decided to split the data into 2 groups: those who had access to mental health benefits (categorized as "Yes") and those who didn't or didn't know (categorized as "No").

After forming these 2 groups, we analyzed their work interference, for which the possible answers included "never", "rarely", "sometimes", and "often". By assigning 0, 1, 2, and 3 to the respective answers, we were able to calculate the average work interference for those who receive benefits and those who don't.

The null hypothesis: 𝐻𝑜 = There is no relationship between mental health benefits and work interference on mental health.

The alternate hypothesis: 𝐻𝑎 = There is a relationship between mental health benefits and work interference on mental health.
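A test of this kind can be sketched with SciPy as shown below. This assumes an independent two-sample t-test with SciPy's default equal-variance setting, which may differ from the exact variant used in the notebook. The scores are made up, so the resulting p-value will not match the one reported for the survey data.

```python
from scipy import stats

# Made-up interference scores (0 = never ... 3 = often), not survey values.
with_benefits = [2, 1, 3, 2, 0, 2, 1]
without_benefits = [2, 3, 1, 2, 2, 3, 1]

ALPHA = 0.10  # significance level used throughout this analysis

# Compare the mean interference score of the two groups.
t_stat, p_value = stats.ttest_ind(with_benefits, without_benefits)

# Reject the null hypothesis only when the p-value falls below ALPHA.
reject_null = p_value < ALPHA
```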

From the t-test above, our p-value is 0.7514, which is greater than 0.10, so we fail to reject the null hypothesis. We do not find a statistically significant relationship between having access to mental health benefits and work interference in the tech industry.

Another question respondents were asked was "Do you know the options for mental health care your employer provides?" and the answer choices were "Yes" and "No". This focus is on those with mental health conditions and the effect of mental health care on their work. We decided to split the data into 2 groups: those who had access to mental health care (categorized as "Yes") and those who didn't or didn't know (categorized as "No").

After forming these 2 groups, we analyzed their work interference, for which the possible answers included "never", "rarely", "sometimes", and "often". By assigning 0, 1, 2, and 3 to the respective answers, we were able to calculate the average work interference for those who have mental health care options and those who don't.

The null hypothesis: 𝐻𝑜 = There is no relationship between having mental health care options and work interference on mental health.

The alternate hypothesis: 𝐻𝑎 = There is a relationship between having mental health care options and work interference on mental health.

From the t-test above, our p-value is 0.0860, which is less than 0.10, so we reject the null hypothesis. Having access to mental health care options has a statistically significant relationship with work interference in the tech industry at the 0.10 level.

Another question respondents were asked was "Has your employer ever discussed mental health as part of an employee wellness program?" and the answer choices were "Yes" and "No". This focus is on those with mental health conditions and the effect of mental health wellness programs on their work. We decided to split the data into 2 groups: those who had access to mental health wellness programs (categorized as "Yes") and those who didn't or didn't know (categorized as "No").

After forming these 2 groups, we analyzed their work interference, for which the possible answers included "never", "rarely", "sometimes", and "often". By assigning 0, 1, 2, and 3 to the respective answers, we were able to calculate the average work interference for those who had access to mental health wellness programs and those who didn't.

The null hypothesis: 𝐻𝑜 = There is no relationship between having mental health wellness programs and work interference on mental health.

The alternate hypothesis: 𝐻𝑎 = There is a relationship between having mental health wellness programs and work interference on mental health.

From the t-test above, our p-value is 0.0587, which is less than 0.10, so we reject the null hypothesis. Having access to mental health wellness programs has a statistically significant relationship with work interference in the tech industry at the 0.10 level.

Another question respondents were asked was "Does your employer provide resources to learn more about mental health issues and how to seek help?" and the answer choices were "Yes" and "No". This focus is on those with mental health conditions and the effect on their work of having resources to learn more about mental health and seek help. We decided to split the data into 2 groups: those who had access to these mental health resources (categorized as "Yes") and those who didn't or didn't know (categorized as "No").

After forming these 2 groups, we analyzed their work interference, for which the possible answers included "never", "rarely", "sometimes", and "often". By assigning 0, 1, 2, and 3 to the respective answers, we were able to calculate the average work interference for those who had access to resources to learn more about mental health and seek help and those who didn't.

The null hypothesis: 𝐻𝑜 = There is no relationship between having resources to learn more about mental health and seek help versus work interference on mental health.

The alternate hypothesis: 𝐻𝑎 = There is a relationship between having resources to learn more about mental health and seek help versus work interference on mental health.

From the t-test above, our p-value is 0.1290, which is greater than 0.10, so we fail to reject the null hypothesis. We do not find a statistically significant relationship between having access to resources to learn more about mental health and seek help and work interference in the tech industry.

Conclusion and Moving Forward

In conclusion, from the t-tests above, employees with a mental health condition at companies that offer wellness programs and mental health care options experience less interference from their condition during work.

Moving forward, we can do further analysis with this data. So far, we have only looked at some attributes; however, it is important to look at every attribute that may influence how much mental health interferes with work in tech work environments.

Below, we can see the logistic regression model and confusion matrix for predictors "wellness_program" and "care_options" with response "work_interfere":

Looking above, the logistic regression on "wellness_program" and "care_options" with response "work_interfere" produces a model that may not be optimal for predicting how much mental health interferes with one's work, because our confusion matrix shows a very low accuracy. This low accuracy could be due to not including other attributes that may affect how much mental health interferes with one's work.
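For readers less familiar with confusion matrices, accuracy is the share of predictions that fall on the diagonal. The sketch below uses hypothetical counts for the four "work_interfere" levels, not our model's actual output.

```python
import numpy as np

# Hypothetical counts (rows = actual level, columns = predicted level).
cm = np.array([
    [5, 3, 2, 0],
    [4, 6, 5, 1],
    [3, 7, 9, 4],
    [1, 2, 6, 2],
])

# Correct predictions sit on the diagonal; divide by the total count.
accuracy = np.trace(cm) / cm.sum()
```

With four response levels, a model that guesses uniformly at random would be right about a quarter of the time, which is a useful baseline when judging whether an accuracy is low.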

Moving forward, we would definitely take a closer look at more attributes and possibly more datasets that could make our logistic model better, with increased accuracy in predictions of how much mental health interferes with one's work depending on the tech work environment.