
Introduction

This Master's Thesis presents a text analysis and a search for hidden patterns in the anonymized data of the eBurnout application (www.eburnout.com). The application belongs to the European University research project "Applications based on IoT and Big Data in the hospital environment", and the data correspond to the psychiatric and emergency medical personnel of the Infanta Sofía (Alcobendas) and Son Llàtzer (Palma de Mallorca) hospitals in Spain. The study began on May 31, 2018 and ended on September 28, 2018, with the presentation of this Thesis.

The main objective of the study is to discover the information contained in the analyzed data. To this end, the analysis used R, Python and Tableau, supported by the scalable, high-performance infrastructure of Google Cloud Platform (GCP).

Finally, the eBurnout application has been adapted to comply with the EU General Data Protection Regulation (GDPR), applicable since May 25, 2018, and guarantees the following:

100% anonymous

The user's privacy is guaranteed when entering their data.

100% secure

Security is guaranteed when handling sensitive information.

Informed consent

All information is documented in the informed consent.




Participation in the study

Psychiatry
Emergency
Others




Exploratory text analysis

The entire exploration of the dataset, from the ETL processes to visualization, was carried out on Google Cloud Platform.

uemus5
uemus13
uemus12
hs8
uemus8
uemus15
hs10
uemus9
  • Users with wristband
  • Users without wristband
Review length

Review length (all users)

Length of the opinions expressed (y = frequency, x = total of opinions). The high frequency at 0 reflects the many users who did not comment.
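
As a rough illustration of how such a length distribution can be computed, the sketch below counts comments by length using only Python's standard library. It assumes that length is measured in characters and that a missing or blank comment counts as length 0. The function name and the sample comments are hypothetical and are not taken from the eBurnout data.

```python
from collections import Counter


def length_distribution(comments):
    """Map comment length (in characters) to the number of comments of that length."""
    # Assumption: a missing (None) or blank comment is treated as length 0.
    lengths = [len((c or "").strip()) for c in comments]
    return Counter(lengths)


# Hypothetical sample, invented for illustration only.
sample = ["Long shift today", "", None, "Tired", ""]
dist = length_distribution(sample)

for length, frequency in sorted(dist.items()):
    print(length, frequency)
```

Plotting `dist` as a bar chart would give a histogram of the same kind as the figure described above.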

Word count (1)

Word count (users with wristband)

Number of generic words used by users with a Fitbit wristband (both hospitals).

Word count (2)

Word count (users without wristband)

Number of generic words used by users without a Fitbit wristband (both hospitals).

Words per comment (1)

Words per comment (all users)

Number of words per comment for all users (both hospitals).

Words per comment (2)

Words per comment (users with wristband)

Number of words per comment for users with a Fitbit wristband (both hospitals).

Words per comment (3)

Words per comment (users without wristband)

Number of words per comment for users without a Fitbit wristband (both hospitals).
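
A minimal sketch of how words per comment could be compared between the two groups is shown below. It assumes words are separated by whitespace and that empty comments are skipped. The group names and the comments are hypothetical examples, not study data.

```python
from statistics import mean


def words_per_comment(comments):
    """Return the word count of each non-empty comment."""
    return [len(c.split()) for c in comments if c and c.strip()]


# Hypothetical comments, invented for illustration only.
groups = {
    "with_wristband": ["I slept badly this week", "Too many night shifts"],
    "without_wristband": ["Heavy workload", "No time to rest today", ""],
}

# Average number of words per comment in each group.
summary = {name: mean(words_per_comment(texts)) for name, texts in groups.items()}
print(summary)
```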

Word Cloud (1)

Word Cloud (all users)

Word cloud of the words most used by all users (both hospitals).

Word Cloud (2)

Word Cloud (users with wristband)

Word cloud of the words most used by users with a Fitbit wristband (both hospitals).

Word Cloud (3)

Word Cloud (users without wristband)

Word cloud of the words most used by users without a Fitbit wristband (both hospitals).
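
A word cloud is driven by word frequencies. The sketch below shows one way to obtain them with the standard library. It is an assumption-laden example rather than the pipeline used in the study: the stop-word list, the tokenization rule and the sample comments are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"the", "a", "and", "of", "to", "is", "in", "i", "my"}


def word_frequencies(comments, stopwords=STOPWORDS, top_n=10):
    """Return the top_n most frequent words, ignoring stop words."""
    counts = Counter()
    for comment in comments:
        # [^\W\d_]+ matches runs of Unicode letters, so accented words are kept whole.
        tokens = re.findall(r"[^\W\d_]+", comment.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(top_n)


# Hypothetical comments, invented for illustration only.
sample = [
    "The shift is long and the work is hard",
    "Long shift, little sleep",
    "Work and more work",
]
top = word_frequencies(sample, top_n=3)
print(top)
```

The resulting (word, count) pairs can be passed to any word cloud renderer, which typically scales each word by its count.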

Sentiment analysis

Sentiment analysis for the two hospitals was performed using NLTK and the Google Cloud Natural Language API.

Word sentiment (Son Llàtzer hospital) - NLTK

Sentiment of the words used in the Son Llàtzer hospital.

Word sentiment (Infanta Sofía hospital) - NLTK

Sentiment of the words used in the Infanta Sofía hospital.

Word sentiment (all) - Google Cloud Natural Language API

Score and magnitude of the word sentiment for both hospitals, combined and separately.
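
In the Google Cloud Natural Language API, the sentiment score ranges from -1.0 (negative) to 1.0 (positive), and the magnitude is a non-negative value that reflects the overall strength of emotion. The sketch below shows one possible way to turn such a pair into a coarse label. The function name and both cutoff values are assumptions chosen for illustration; they are not defined by the API or by the study.

```python
def label_sentiment(score, magnitude, score_cutoff=0.25, mixed_magnitude=2.0):
    """Turn a (score, magnitude) pair into a coarse sentiment label.

    score_cutoff and mixed_magnitude are illustrative assumptions.
    """
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # Score near zero: a large magnitude suggests mixed feelings,
    # a small magnitude suggests a neutral text.
    return "mixed" if magnitude >= mixed_magnitude else "neutral"


# Hypothetical (score, magnitude) pairs, invented for illustration only.
for pair in [(0.8, 3.0), (-0.6, 4.0), (0.1, 0.0), (0.0, 4.0)]:
    print(pair, label_sentiment(*pair))
```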




Analysis of entities, syntax and text classification

The analysis of entities, syntax and text classification was performed using the Google Cloud Natural Language API.

Text classification (both hospitals) - Google Cloud Natural Language API

Name and confidence of the text classification categories for both hospitals.

Text categorization (Naive Bayes & TensorFlow)

Prediction of whether a user has burnout, based on a written opinion, using Naive Bayes and TensorFlow.

Naive Bayes NLP: real vs predicted values

Predictions obtained by running the Naive Bayes NLP model (real vs predicted values).

Naive Bayes NLP: predicting whether a user has burnout from a text

Result of running the Naive Bayes NLP model to predict whether or not a user has burnout from a text (label_test).
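
To make the idea behind a Naive Bayes text classifier concrete, the sketch below implements a small multinomial Naive Bayes with Laplace smoothing using only the standard library. It is not the implementation used in the thesis; the class name, the whitespace tokenization, the labels and the training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen during training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, invented for illustration only.
texts = [
    "exhausted and overwhelmed by work",
    "tired exhausted no energy",
    "happy with my team",
    "calm day good rest",
]
labels = ["burnout", "burnout", "no_burnout", "no_burnout"]

model = NaiveBayesText().fit(texts, labels)
print(model.predict("exhausted today"))
print(model.predict("good rest and calm"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.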

TensorFlow NLP result

Result of running the TensorFlow NLP model to predict whether or not a user has burnout from a text.

Confusion matrix (TensorFlow NLP)

Confusion matrix with the total number of correct and incorrect burnout predictions (TensorFlow NLP).
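
For a binary burnout prediction, a confusion matrix reduces to four counts, from which accuracy, precision and recall follow directly. The sketch below shows that calculation with the standard library. The label encoding (1 = burnout, 0 = no burnout) and the example predictions are assumptions made for illustration, not results from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def metrics(counts):
    """Derive accuracy, precision and recall from the four counts."""
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels and predictions, invented for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
scores = metrics(counts)
print(counts)
print(scores)
```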




Other visualizations (hidden patterns)

Visualizations used to find hidden patterns and compare them with the text analysis, built with the Seaborn Python library, a Python port of R's GGPlot library, and Tableau.

  • Height, weight and working life (both hospitals)
  • Exercise vs burnout (both hospitals)
  • Exercise vs sleep efficiency (both hospitals)
  • Burnout in months (Son Llàtzer hospital)
  • Burnout in months (Infanta Sofía hospital)
  • Burnout vs sleep (Son Llàtzer hospital)
  • Burnout vs sleep (Infanta Sofía hospital)
  • Burnout vs heartbeat (Son Llàtzer hospital)
  • Burnout vs heartbeat (Infanta Sofía hospital)
  • Burnout vs temperature (Son Llàtzer hospital)
  • Burnout vs temperature (Infanta Sofía hospital)
  • Weight and height vs burnout (both hospitals)
  • Total Burnout (both hospitals)
  • Total Burnout (separate hospitals)
  • Burnout level (separate hospitals)
  • Burnout Madrid vs Mallorca
  • No Burnout Madrid vs Mallorca
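
Pairwise comparisons such as those listed above (for example, exercise vs sleep efficiency) can also be summarized numerically. The sketch below computes a Pearson correlation coefficient with the standard library. Using Pearson correlation here is an assumption made for illustration, since the source describes only visual comparisons; the variable names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical measurements, invented for illustration only.
exercise_minutes = [10, 20, 30, 40]
sleep_efficiency = [80, 82, 85, 90]

r = pearson(exercise_minutes, sleep_efficiency)
print(round(r, 3))
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or none. Correlation alone does not establish that one variable influences the other.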

Patterns detected and General Report

Detection of hidden patterns compared to text analysis and general report.

  • 1. Participation in the free-text field by medical personnel is higher at the Infanta Sofía hospital than at the Son Llàtzer hospital, and the former also shows more burnout.
  • 2. The number of negative opinions is lower at the Son Llàtzer hospital than at the Infanta Sofía hospital, and the weather was identified as a potentially decisive factor. The proportion of positive opinions is also higher at Son Llàtzer.
  • 3. Sleep efficiency and heartbeat do not influence burnout syndrome, but they do influence the text written by the medical staff. This can be checked against the Word Cloud.
  • 4. Users with a Fitbit wristband participated more in the study, in terms of writing text, than users without one.
  • 5. The variables weight and height do influence burnout syndrome, and they were also found to influence text participation.
  • 6. Both the percentage of participation in the free text and the level of burnout are lower in August than in the other months, in both hospitals (possibly due to the vacation period of the medical staff). The critical month of the entire study is July.
  • 7. The number of words per comment is very similar for users with and without a wristband. No pattern that explains this was found in the visualizations.
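
Month-by-month observations like the one in point 6 rest on a simple aggregation of participation and burnout per month. The sketch below shows one way to compute it with the standard library. The record layout (the keys `month`, `comment` and `burnout`) and the sample records are hypothetical and do not reflect the actual eBurnout data model.

```python
from collections import defaultdict


def monthly_summary(records):
    """Compute free-text participation and burnout rate per month.

    Each record is a dict with the hypothetical keys
    'month' (str), 'comment' (str or None) and 'burnout' (bool).
    """
    totals = defaultdict(lambda: {"n": 0, "commented": 0, "burnout": 0})
    for record in records:
        month = totals[record["month"]]
        month["n"] += 1
        # A record counts as participation only if the comment is non-blank.
        month["commented"] += bool((record.get("comment") or "").strip())
        month["burnout"] += bool(record["burnout"])
    return {
        name: {
            "participation": m["commented"] / m["n"],
            "burnout_rate": m["burnout"] / m["n"],
        }
        for name, m in totals.items()
    }


# Hypothetical records, invented for illustration only.
records = [
    {"month": "2018-07", "comment": "Very tired", "burnout": True},
    {"month": "2018-07", "comment": "Busy week", "burnout": False},
    {"month": "2018-08", "comment": "", "burnout": False},
    {"month": "2018-08", "comment": "On holiday soon", "burnout": False},
]

summary = monthly_summary(records)
print(summary)
```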

GitHub repository: eBurnout Text Analysis

Work authorship

Resources used

Frameworks

  • Ionic
  • AngularJS

Programming languages

  • TS, JS, HTML5 and CSS3
  • NoSQL
  • Python
  • Shell Script

Databases

  • Firebase
  • Google Cloud Storage

Version control

  • SourceTree
  • Trello

Data analysis

  • Google Cloud Platform

Visualisation

  • Seaborn
  • GGPlot
  • Tableau

Obtaining the data

Infanta Sofía hospital
Son Llàtzer hospital

Download application

The eBurnout application can be downloaded from the Google Play Store (Android) and the App Store (iOS).

App Store
Google Play Store