

Mapping loneliness through social intelligence analysis: a step towards creating global loneliness map
Hurmat Ali Shah and Mowafa Househ
Hamad Bin Khalifa University, College of Science and Engineering, Doha, Ad-Dawhah, Qatar
Correspondence to Mowafa Househ; Mhouseh{at}


Objectives Loneliness is a prevalent global public health concern with complex dynamics requiring further exploration. This study aims to enhance understanding of loneliness dynamics through building towards a global loneliness map using social intelligence analysis.

Settings and design This paper presents a proof of concept for the global loneliness map, using data collected in October 2022. Twitter posts containing keywords such as ‘lonely’, ‘loneliness’, ‘alone’, ‘solitude’ and ‘isolation’ were gathered, resulting in 841 796 tweets from the USA. City-specific data were extracted from these tweets to construct a loneliness map for the country. Sentiment analysis using the Valence Aware Dictionary and sEntiment Reasoner (VADER) tool was employed to differentiate metaphorical expressions from meaningful correlations between loneliness and socioeconomic and emotional factors.

Measures and results The sentiment analysis encompassed the USA dataset and city-wise subsets, identifying negative sentiment tweets. Psychosocial linguistic features of these negative tweets were analysed to reveal significant connections between loneliness, socioeconomic aspects and emotional themes. Word clouds depicted topic variations between positively and negatively toned tweets. A frequency list of correlated topics within broader socioeconomic and emotional categories was generated from negative sentiment tweets. Additionally, a comprehensive table displayed top correlated topics for each city.

Conclusions Leveraging social media data provides insights into the multifaceted nature of loneliness. Given its subjectivity, experiences of loneliness vary. This study serves as a proof of concept for an extensive global loneliness map, holding implications for global public health strategies and policy development. Understanding loneliness dynamics on a larger scale can facilitate targeted interventions and support.

  • public health informatics
  • machine learning
  • social media

Data availability statement

Data are available on reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:



  • Social media data are used to track mental health conditions and to gain insights into complex social and public health issues.


  • This paper uses social media data to understand the complex issue of loneliness, which has been explored in detail in the social sciences but for which a data-driven understanding is lacking in the literature. Using natural language processing tools, we analysed tweets to find socioeconomic and personal-emotional categories that frequently co-occur with mentions of loneliness.


  • This study gives insight into the dynamic nature of loneliness and how it varies across geographical areas. A better understanding of loneliness in a given area can help policymakers design specific interventions to counter it.


Loneliness is a global public health issue. It not only affects quality of life but also leads to other mental health issues, thus burdening public health services. Every year, 162 000 Americans die from loneliness and social isolation.1 Around the world, someone dies by suicide every 40 seconds, and loneliness has been shown to be a direct cause of suicide.2 Loneliness is associated with a high risk of multiple physical and mental health conditions, dementia and early mortality.3 4 Moreover, loneliness has been shown to increase the risk of death by 26%.4 Loneliness also imposes additional costs on healthcare infrastructure. For instance, in the USA, an additional US$6.7 billion is spent because of loneliness.5 Similarly, loneliness costs employers US$154 billion through absenteeism and lost productivity.5

Loneliness must be understood separately from the interlinked concept of social isolation. Loneliness is an individual’s subjective perception of a mismatch between their actual and desired social connections and relationships. Social isolation, on the other hand, is the objective lack of social connections, whether with immediate family or the wider community. The route to loneliness can vary from one person to another, and the relationship between loneliness, social isolation and their determinants is complex and often bidirectional. For some people, loneliness is a persistent state of mind, which can result from genetic influences or early adversity. For others, depression and social anxiety may lead to loneliness, while for some it may result from trauma and internalised stigma. These factors, as well as others such as old age, economic status and negative self-image, may contribute to loneliness.6 7 Transient loneliness can cause emotional distress, but it is commonplace and can be overcome. However, loneliness can become chronic and permanent in the absence of consistent and constant social connectedness, altering neurobiological and behavioural patterns and mechanisms.8

There are several intervention strategies for combating loneliness, intended to mitigate its long-term mental health effects. Technological interventions range from using social media for connectivity to videoconferencing and community-oriented interventions. The effectiveness of technology-based interventions against loneliness has been studied in the literature. Döring et al9 showed that communication-based technologies can reduce loneliness and isolation in older people. Choi and Lee10 reported that older people using social platforms changed their behaviour through multifaceted technology platforms that enable social participation, cognition, nutrition and physical activity.

The recent trend in technology is towards artificial intelligence (AI)-based conversational agents and chatbots. Through a survey of patients already using a chatbot, Xie and Pentina11 found that patients form an emotional attachment to a chatbot if they perceive its responses as offering emotional support. Boucher et al12 studied the role of chatbots in mental health interventions and discussed the potential challenges. Similarly, in a systematic review, Abd-Alrazaq et al13 found that patients are generally satisfied with the use of chatbots. Manis and Matis14 also pointed out the benefits of technology and chatbots, particularly in the context of long-term isolation. AI-based chatbots can thus help counter loneliness, provided they are complemented by other interventions.

As digital technology interventions have been shown to help reduce feelings of loneliness, there is a need to understand the prevalence of loneliness in order to devise such technology-based and community-oriented strategies. A loneliness map can provide this understanding. Health informatics has been applied to digital health and to the study of loneliness in various studies, and other studies use social media data to gain detailed insight into the problem of loneliness.9 Building on the tools of health informatics and on social media analysis of mental health, digital health and loneliness, a detailed global map of loneliness can serve as a guideline and as the foundation for intervention strategies. Loneliness places a large burden on global public health spending, causes a substantial accumulated loss of working days and reduces quality of life. What is most needed is an understanding of loneliness from the health informatics perspective. The map, part of which this paper develops, will be our first step towards loneliness informatics.

Through the global loneliness map, we aim to explore the relationship between loneliness and mental health issues. The map can be used to zoom in on a country where the relationship between loneliness and negative sentiment is stronger, for further analysis. We will also correlate linguistic features representing personal and social categories, such as relationships, sleep habits and emotional dysregulation, to show how these vary across countries. This can help in recognising and understanding how loneliness is associated with negative sentiment in different categories. The loneliness map will monitor the relationship between loneliness and mental health issues across the globe by analysing the collected data with machine learning (ML) and AI tools. The resulting surveillance data can be used to design policy programmes that build a community of support.

This paper presents a proof of concept for such a global loneliness map. Developing an exhaustive loneliness map backed by rigorous evidence is a time-intensive and resource-intensive project, and this paper presents the first step towards it. The remaining parts of the map, that is, using multiple data sources and analysing different regions and countries exhaustively, will be carried out stepwise. Rather than using multiple data sources, we first focus on Twitter because its data are diverse and, because users must express themselves in a limited number of characters, multiple insights can be gained from a limited dataset. We also start with the USA: we collected global data mentioning keywords associated with loneliness, because we wanted a snapshot of loneliness rather than an exhaustive analysis of one country, and found that the data returned by Twitter contained more tweets from the USA than from any other country. We then retrieved the US cities with more than 10 000 loneliness-related tweets each.

To develop the first part of the loneliness map, we applied a natural language processing tool for sentiment analysis to Twitter data, based on a psycholinguistic model of understanding mental health issues. The collected tweets were stored in a database, and sentiment analysis was then carried out using the Valence Aware Dictionary and sEntiment Reasoner (VADER)15 tool from the Natural Language Toolkit (NLTK). VADER is a lexicon and rule-based model for sentiment analysis. The lexicon-based approach means that the algorithm is built on a dictionary containing a detailed list of sentiment features. VADER complements this dictionary with heuristic grammatical rules, and together they determine the polarity of the sentiment. The output of the sentiment analysis tool gives an indication of loneliness in a particular dataset.
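To make the lexicon-plus-rules idea concrete, the sketch below scores a tweet in the way VADER does in outline: it sums word valences from a dictionary, adjusts them with two simplified heuristic rules (a booster word immediately before a word increases intensity; a negation among the three preceding tokens flips and dampens it) and normalises the sum into a compound score between −1 and +1. It is not the NLTK tool used in this study. The miniature lexicon and its values are hypothetical, and the constants (booster increment 0.293, negation scalar −0.74, normalisation constant 15) are assumed to mirror those of the reference VADER implementation. The ±0.05 labelling threshold is likewise an assumed convention. In practice, one would call `SentimentIntensityAnalyzer().polarity_scores(text)` from `nltk.sentiment.vader` after downloading the `vader_lexicon` resource.

```python
import math
import re

# Hypothetical miniature lexicon: word or emoticon -> valence on VADER's -4..+4 scale.
# The real VADER lexicon has thousands of human-rated entries; these values are illustrative.
TOY_LEXICON = {
    "lonely": -2.0,
    "alone": -1.0,
    "sad": -2.1,
    "hurt": -2.4,
    "love": 3.2,
    "happy": 2.7,
    ":(": -1.9,
    ":)": 2.0,
}
NEGATIONS = {"not", "never", "no", "don't", "isn't"}
BOOSTERS = {"very", "so", "extremely"}
BOOSTER_INCREMENT = 0.293   # assumed to mirror VADER's booster increment
NEGATION_SCALAR = -0.74     # assumed to mirror VADER's negation scalar
ALPHA = 15                  # assumed normalisation constant


def tokenize(text):
    """Lower-case the text and split it into emoticons and words (apostrophes kept)."""
    return re.findall(r"[:;]-?[()]|[a-z']+", text.lower())


def raw_score(tokens):
    """Sum lexicon valences, adjusted by simplified booster and negation rules."""
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = TOY_LEXICON.get(tok)
        if valence is None:
            continue
        # Rule 1: a booster word immediately before increases the magnitude.
        if i > 0 and tokens[i - 1] in BOOSTERS:
            valence += math.copysign(BOOSTER_INCREMENT, valence)
        # Rule 2: a negation among the three preceding tokens flips and dampens.
        if any(t in NEGATIONS for t in tokens[max(0, i - 3):i]):
            valence *= NEGATION_SCALAR
        total += valence
    return total


def compound(text):
    """Normalise the summed valence into the range [-1, 1]."""
    x = raw_score(tokenize(text))
    return max(-1.0, min(1.0, x / math.sqrt(x * x + ALPHA)))


def label(text, threshold=0.05):
    """Map the compound score to a sentiment label (threshold is an assumed convention)."""
    score = compound(text)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


print(label("I feel so lonely :("))       # negative
print(label("I am not lonely any more"))  # positive
print(label("Reading a book tonight"))    # neutral
```

The second example shows why rules matter for loneliness data: a purely lexicon-based count would score ‘not lonely’ as negative, whereas the negation rule reverses it.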

Literature review

Understanding loneliness theoretically, and its relation to mental health, has been the subject of several studies.10 16 17 On the health informatics side, there have also been studies of technology-based interventions to help people cope with loneliness.9 These studies show that loneliness is associated with an increased risk of mental health issues, and that interventions based either on technology or on community building are effective in reducing the negative effects of loneliness.

Technology is used to fill the gap created by a lack of access to healthcare professionals or services. Byrne et al18 carried out a scoping review of reviews on the effectiveness of communication technologies in reducing feelings of loneliness in older people and concluded that communication-based technologies do in fact reduce loneliness in this group. Similarly, Hards et al19 carried out a systematic review and meta-analysis of digital technology-based interventions to reduce loneliness in older adults. The review finally included 6 articles with 646 participants in total. It found no statistically significant difference in the effectiveness of digital interventions, but its authors attributed this to the small number of studies and the small sample sizes, which prevented the effectiveness from being validated.

In contrast, Döring et al9 established a relationship between communication-based technologies and reduced feelings of loneliness. In a cross-sectional study of 4315 adults aged over 50, they reported that rural older adults who used technology less frequently felt lonelier than urban older adults. Choi and Lee10 studied the effectiveness of social networking site use in reducing loneliness among older people. They found some evidence that using social networking sites was associated with reduced loneliness and reduced depression, but the studies reviewed lacked experimental designs.

The literature reviewed above provides a scientific foundation for the effectiveness of technology-based interventions against loneliness. However, there is a gap in the global understanding of loneliness and its prevalence. Surkalim et al20 studied the prevalence of loneliness in 113 countries to identify data availability, gaps and patterns in population-level loneliness. However, that study did not design a tool or an intervention based on its meta-analysis.

Twitter has been used to study other phenomena and public health concerns.21 22 Guntuku et al23 collected Twitter data to study loneliness. Twitter has also been used for other mental health topics; for example, Maghsoudi24 carried out a detailed study of tweets related to insomnia and their correlation with mental health. Similarly, Alhuzali et al25 analysed emotions in the UK and geolocated them across different cities to find the sentiment during the COVID-19 pandemic. For mental health problems, another study26 analysed Twitter data to detect the magnitude of depression. Given this literature overview, the scope and contributions of this paper are as follows:

  1. This study provides proof of concept for a loneliness map where the dynamics of loneliness can be understood through publicly available social media and other online data.

  2. How do the topics and themes associated with loneliness on Twitter relate to larger socioeconomic and personal-emotional categories?

  3. Does the aggregate expression of loneliness differ even within the same country, pointing to the dynamic nature of loneliness, and does it change with geographical or socioeconomic conditions?

Data processing and sentiment analysis

In this section, we will present how the data are collected, discuss the data sources as well as sentiment analysis carried out on the data.


The study does not use the identities of the people who generated the data; instead, it gives an aggregate, overall picture based on opinions expressed publicly. We use social intelligence analysis (SIA) to find the correlation of loneliness with mental health problems and other topics. SIA is a broad theme that incorporates multiple social media sources, such as Facebook, Reddit and Quora. SIA is important for gaining insight into users’ data and, in our case, for understanding the dynamics of loneliness. While SIA can be used for a variety of purposes, such as mining content to create stories or identifying trends, we have used it for sentiment analysis of the collected data on loneliness. As mentioned in the introduction, this is a proof of concept, or first step, towards a global loneliness map. Therefore, we used only Twitter for data collection and applied a sentiment analysis tool to data from the USA that mention keywords associated with loneliness.

We analysed publicly available data from users posting about loneliness. Twitter is a social media platform used for connectivity and opinion sharing, which allows users to post short messages of up to 280 characters. Twitter gives access to users’ data through its publicly available API for developers. The data we gathered were based on topic modelling through open-vocabulary topics. The relevant tweets about loneliness were gathered and stored in a database, and topics, that is, clusters of co-occurring words, were created. These topics were then analysed further through a dictionary-based approach. Our approach also relies on dictionary-based psycholinguistic features to create a loneliness map, as used by Pennebaker et al.27

For topic modelling, we used the words ‘lonely’, ‘loneliness’, ‘alone’, ‘isolated’ and ‘isolation’ to retrieve a list of tweets containing these keywords. In the theoretical literature, the words ‘loneliness’ and ‘lonely’ are used to describe the feeling under consideration in this paper. Guntuku et al23 collected Twitter data based on the keywords ‘lonely’ and ‘alone’. We went further and also included synonyms of, and words related to, loneliness when collecting our Twitter data.
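A minimal sketch of the keyword step, assuming tweets are available as plain strings: whole-word, case-insensitive matching on the five keywords listed above, so that a word such as ‘standalone’ is not picked up by ‘alone’ while a hashtag such as ‘#lonely’ is. The function names are illustrative, and the Twitter API's own query operators, which performed the actual collection, are not modelled here.

```python
import re

# The five collection keywords listed in the text; matched as whole words, case-insensitively.
KEYWORDS = ("lonely", "loneliness", "alone", "isolated", "isolation")
KEYWORD_RE = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def matched_keywords(text):
    """Return the sorted set of loneliness keywords found in a tweet."""
    return sorted({m.group(0).lower() for m in KEYWORD_RE.finditer(text)})


def filter_tweets(tweets):
    """Keep only tweets that mention at least one keyword."""
    return [t for t in tweets if KEYWORD_RE.search(t)]


sample = [
    "Feeling so alone tonight",
    "Loneliness is hard",
    "A standalone release",   # 'alone' inside another word: not matched
    "#lonely again",
    "Great day out",
]
print(filter_tweets(sample))
# ['Feeling so alone tonight', 'Loneliness is hard', '#lonely again']
```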

We did not search exhaustively for one specific country because we intended the collected data to serve as a proof of concept. Based on the data collected in this step, future work can focus exhaustively on particular cities or countries and collect more data about them. The collected data were analysed through sentiment analysis to find the topics most correlated with loneliness in different US cities. The next subsection explains why sentiment analysis of the collected data is needed.

Sentiment analysis

We collected a fixed set of tweets containing the loneliness keywords. If every such tweet genuinely expressed loneliness, no further filtering step would be needed, and the problem would reduce to determining the association or correlation between themes (which may represent loneliness) and the keywords depicting loneliness. For instance, we would need to find the relationship between ‘hurt’, ‘sick’, ‘tired’, ‘sleep’, etc and the expression of loneliness. This task is usually carried out by associating lexicon categories with tweets containing the words ‘lonely’ or ‘alone’.

The problem formulated in this paper is on a larger scale, so the limited set of representative tweets has to be interpreted in a novel way to give meaningful insight into loneliness. All the tweets in each dataset contain keywords representing loneliness. These data can be analysed to show the association between loneliness and other categories for different selected countries across the globe. This trend is important in its own right, as it gives a global picture of the determinants of loneliness and a tool for policy-makers to address loneliness in their specific country. However, ‘lonely’ or ‘alone’ can also be mentioned in a non-negative way. This gives us an opportunity to examine the relationship between mentioning keywords representing loneliness and negative emotions, which may ultimately be linked to psycholinguistic features of mental well-being.
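One simple way to operationalise this comparison, sketched here under the assumption that each tweet has already been tokenised and labelled by sentiment, is to compute how often a theme word (for example ‘tired’ or ‘hurt’) appears in negative tweets versus non-negative tweets. The function, its input format and the example rows are hypothetical illustrations, not the procedure or data reported in this paper.

```python
from collections import Counter


def theme_rates(labelled_tweets, themes):
    """Share of negative vs non-negative tweets that contain each theme word.

    labelled_tweets: iterable of (set_of_tokens, sentiment_label) pairs (assumed format).
    Returns {theme: (rate_in_negative, rate_in_non_negative)}.
    """
    totals = Counter()
    hits = {"negative": Counter(), "other": Counter()}
    for tokens, label in labelled_tweets:
        group = "negative" if label == "negative" else "other"
        totals[group] += 1
        for theme in themes:
            if theme in tokens:
                hits[group][theme] += 1
    return {
        theme: tuple(
            hits[g][theme] / totals[g] if totals[g] else 0.0
            for g in ("negative", "other")
        )
        for theme in themes
    }


data = [
    ({"alone", "tired"}, "negative"),
    ({"lonely", "hurt", "tired"}, "negative"),
    ({"alone", "sober"}, "positive"),
    ({"alone"}, "neutral"),
]
print(theme_rates(data, ["tired", "hurt", "sober"]))
# {'tired': (1.0, 0.0), 'hurt': (0.5, 0.0), 'sober': (0.0, 0.5)}
```

A theme that is much more frequent in negative tweets than in other loneliness tweets is a candidate for a meaningful association rather than a metaphorical use.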

To establish the correlation between loneliness and negative sentiment, we used VADER as implemented in Python’s NLTK. VADER is suited to microblog content such as that of Twitter. It combines a lexicon, that is, dictionary-based analysis, with a rule-based approach to characterise sentiment. Other lexicon-based sentiment analysers, such as Linguistic Inquiry and Word Count (LIWC),27 are only polarity based. VADER, in contrast, also assigns each lexicon entry a valence rated on a nine-point scale (from −4 to +4). Because of this sentiment score, VADER also indicates the extent to which a sentiment is negative or positive.
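As a worked illustration of how such a score is summarised, VADER (as we understand its reference implementation) sums the rule-adjusted valences x of the words in a text and normalises the sum with a constant α = 15. The labelling thresholds of ±0.05 are the convention recommended in VADER's documentation; this paper does not state which cut-offs were applied, so they are an assumption here.

```latex
\text{compound} = \frac{x}{\sqrt{x^{2} + \alpha}}, \qquad \alpha = 15

x = -2.5 \;\Rightarrow\; \text{compound} = \frac{-2.5}{\sqrt{6.25 + 15}} = \frac{-2.5}{\sqrt{21.25}} \approx -0.54

\text{label} =
\begin{cases}
\text{positive}, & \text{compound} \ge 0.05 \\
\text{negative}, & \text{compound} \le -0.05 \\
\text{neutral},  & \text{otherwise}
\end{cases}
```

Because the mapping is monotonic and saturates towards ±1, a more strongly negative sum always yields a more negative compound score, which is what allows the extent of negativity to be compared across tweets.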

These valences are adjusted by generalisable rules that represent the grammatical and syntactical conventions humans use to emphasise or modify sentiment intensity.

For our purposes, another important feature of VADER is its inclusion of sentiment-bearing non-verbal lexical items, such as emoticons, and verbal items, such as slang, acronyms and initialisms, which are prevalent in social media. Combining valence and polarity through both lexicon-based and rule-based approaches is valuable for fine-grained sentiment analysis. VADER overcomes the shortcomings of purely lexicon-based analysers such as LIWC by complementing its human-validated lexicon with heuristic rules. The shortcomings of the lexicon-based approach lie in coverage, general sentiment intensity and the effort required to acquire new sets of human-rated lexical features.

In this paper, through the global loneliness map, we aim to correlate the categories of loneliness with possible negative mental health outcomes. The map can be used to zoom in on a country where the relationship between loneliness and negative sentiment is stronger, for further analysis. We also correlate linguistic features representing personal and social categories, such as relationships, sleep habits and emotional dysregulation, to show how these vary across countries. This can help in recognising and understanding how loneliness is associated with negative sentiment in different categories and, subsequently, guide intervention strategies in those specific areas.


Data on the keywords associated with loneliness were collected during October 2022 through Twitter’s developer API. The purpose of this paper is not to estimate the number of people experiencing loneliness in a particular area or country; that kind of study would require collecting billions of tweets. Rather, the purpose of this study is to find the correlations of loneliness with socioeconomic, political and personal-psychological categories. For this purpose, we do not need to go deeper into a user’s timeline and monitor their activity; we are interested in the aggregate behaviour of users in relation to the expression of loneliness. As part of data cleaning, we deidentify the tweets before analysing them by removing users’ names and IDs. The data are publicly available, but we will not disclose the collected data without anonymising them.
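A minimal sketch of the deidentification step, assuming each tweet is held as a dictionary. The field names (`user_id`, `user_name`, `screen_name`, `profile_url`, `text`, `city`) are hypothetical and will differ from the actual Twitter API payload. Identifying fields are dropped, and @-mentions inside the text are masked with a placeholder so that other users are not identifiable either.

```python
import re

# Hypothetical field names; a real Twitter API payload is structured differently.
IDENTIFYING_FIELDS = frozenset({"user_id", "user_name", "screen_name", "profile_url"})
MENTION_RE = re.compile(r"@\w+")


def deidentify(record):
    """Return a copy of a tweet record without identifying fields or @-mentions."""
    clean = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    if "text" in clean:
        clean["text"] = MENTION_RE.sub("@user", clean["text"])
    return clean


tweet = {"user_id": 42, "user_name": "Jane", "text": "@bob I feel alone", "city": "Houston"}
print(deidentify(tweet))
# {'text': '@user I feel alone', 'city': 'Houston'}
```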

Globally, 4.1 million tweets were collected, of which 841 796 were from the USA. We analysed the five cities with more than 10 000 tweets each. To check whether the results for a smaller sample conform to those for these cities, we also analysed one city with fewer than 10 000 but more than 5000 tweets: Orlando, with 5535 tweets.
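The city selection can be sketched as a simple count-and-threshold step. This assumes each US tweet already carries a city label; how the city was derived (for example, from user-provided location) is not modelled. The toy threshold in the usage example stands in for the 10 000-tweet cut-off, the city counts are invented, and the `extra` parameter is a hypothetical way of adding a smaller comparison city such as Orlando.

```python
from collections import Counter


def select_cities(city_labels, min_tweets=10_000, extra=()):
    """Count tweets per city and keep cities above the threshold, plus any extras.

    city_labels: one city name per tweet (None when no city could be assigned).
    extra: hypothetical list of smaller comparison cities to keep regardless.
    """
    counts = Counter(c for c in city_labels if c)
    selected = {c: n for c, n in counts.items() if n > min_tweets}
    for city in extra:
        if city in counts:
            selected[city] = counts[city]
    # Largest cities first.
    return dict(sorted(selected.items(), key=lambda kv: kv[1], reverse=True))


# Toy data with a toy threshold of 3 instead of 10 000.
labels = ["Houston"] * 5 + ["Queens"] * 4 + ["Orlando"] * 2 + ["Boise"] + [None]
print(select_cities(labels, min_tweets=3, extra=["Orlando"]))
# {'Houston': 5, 'Queens': 4, 'Orlando': 2}
```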

Figure 1 presents our pipeline of analysis of data collected from Twitter.

Figure 1

Pipeline for processing Twitter data.

Tweets containing the keywords mentioned in the previous subsection were collected, and tweets from the USA were extracted to form a US subdataset, reflecting the majority composition of the dataset. Sentiment analysis was carried out after cleaning the data, which involved removing redundant characters, numbers, special characters, users’ profile IDs and markers such as ‘retweet’. Sentiment analysis is important to differentiate phrases and topics carrying meaningful information on loneliness from metaphorical and non-sequitur uses of the terms and topics associated with loneliness. Figure 2 shows the process of collecting data from Twitter and analysing the tweets.
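The cleaning steps listed above can be sketched with regular expressions. The patterns are our assumptions about what ‘redundant characters’ cover: a leading retweet marker, URLs, @-mentions, digits and other non-letter characters are removed, and the text is lower-cased.

```python
import re


def clean_tweet(text):
    """Apply the cleaning steps described above (patterns are assumptions)."""
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)  # leading retweet marker
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # @-mentions (user profile IDs)
    text = re.sub(r"[^A-Za-z'\s]", " ", text)   # digits and special characters
    return re.sub(r"\s+", " ", text).strip().lower()


print(clean_tweet("RT @jane_doe: Feeling so ALONE!! 24/7 https://t.co/abc #lonely"))
# feeling so alone lonely
```

One design caveat: stripping all non-letter characters also removes emoticons such as ‘:(’, which VADER can score, so in a pipeline like this it may be preferable to run sentiment scoring on lightly cleaned text and apply the aggressive cleaning only when building word lists.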

Figure 2

Strengthening the Reporting of Observational Studies in Epidemiology diagram for the Twitter data.

Table 1 gives the sentiment analysis for the different cities described above and for the overall USA dataset. Table 1 also points to an interesting outlier in the dataset: Houston accounts for almost all the neutral tweets. Some cities have a more balanced share of negative and other (ie, positive and neutral) tweets, while two clear outliers can be identified: in Houston only 21.2% of tweets are negative, whereas in Queens 80.6% are negative. The data were collected over 2 weeks and are not wide or deep enough to establish the causes of these outliers with certainty; this would require long-term data for each city. As mentioned, this study is a proof of concept for a wider loneliness map based on SIA, that is, on analysing various social media and web-based data with machine learning and AI tools. The neutral and positive tweets do not contribute to the analysis of loneliness carried out in this paper. In further studies, long-term data will be collected to obtain a balanced dataset for each city and to find the reasons for the proportion of each category of tweets.
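The percentages reported in table 1 can be derived from per-tweet labels as sketched below. The `(city, label)` input format is an assumption, and the example rows (cities ‘A’ and ‘B’) are invented for illustration; they are not this study's data.

```python
from collections import Counter, defaultdict

LABELS = ("negative", "neutral", "positive")


def sentiment_shares(rows):
    """Percentage of tweets per sentiment label for each city.

    rows: iterable of (city, label) pairs, one per tweet (assumed format).
    """
    by_city = defaultdict(Counter)
    for city, label in rows:
        by_city[city][label] += 1
    shares = {}
    for city, counts in by_city.items():
        total = sum(counts.values())
        shares[city] = {lab: round(100 * counts[lab] / total, 1) for lab in LABELS}
    return shares


rows = [("A", "negative")] * 3 + [("A", "positive")] + [("B", "neutral")] * 2
print(sentiment_shares(rows))
# {'A': {'negative': 75.0, 'neutral': 0.0, 'positive': 25.0}, 'B': {'negative': 0.0, 'neutral': 100.0, 'positive': 0.0}}
```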

Table 1

Sentiment analysis of tweets containing the keywords/topics of loneliness

The aim of the loneliness map and of this paper is to find the correlation between loneliness and mental health issues and other topics, which can range from personal expression to socioeconomic factors. Before analysing the loneliness tweets in detail, it was important to identify tweets that are metaphorical or non-sequitur. Neutral sentiment can also represent descriptive mentions of loneliness. The data also suggest that the sample size is sufficient to produce consistent results, as Orlando, with the smallest sample, shows results similar to those of the other cities.

Figure 3 presents word clouds of the tweets by sentiment, illustrating the words most highly associated with users tweeting keywords related to loneliness. Plotting word clouds for both positive and negative tweets is important to differentiate metaphorical use from the meaningful use intended by the study design. The figure shows that the words associated with positive mentions of loneliness are positive words such as ‘commitment’, ‘sobriety’, ‘sober’ and ‘months’ (number of months). The word clouds were generated after redundant tokens, such as ‘RT’ (retweet) and users’ IDs, were removed.
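Word clouds are rendered from word frequencies, so the underlying step can be sketched as counting words per sentiment group after removing stop words. The tiny stop-word list and the example tweets are hypothetical. The resulting frequency dictionaries could then be passed to a renderer such as the `wordcloud` package's `WordCloud.generate_from_frequencies`, which is not shown here.

```python
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a fuller list.
STOPWORDS = frozenset({"i", "a", "am", "and", "the", "to", "so", "of", "is", "my"})


def group_frequencies(tweets, top_n=5):
    """Top words per sentiment group.

    tweets: iterable of (cleaned_text, sentiment_label) pairs (assumed format).
    """
    counters = {}
    for text, label in tweets:
        words = [w for w in text.split() if w not in STOPWORDS]
        counters.setdefault(label, Counter()).update(words)
    return {label: c.most_common(top_n) for label, c in counters.items()}


tweets = [
    ("i am sober and alone", "positive"),
    ("sober months alone", "positive"),
    ("so tired and alone", "negative"),
    ("tired of covid", "negative"),
]
print(group_frequencies(tweets))
# {'positive': [('sober', 2), ('alone', 2), ('months', 1)], 'negative': [('tired', 2), ('alone', 1), ('covid', 1)]}
```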

Figure 3

Words more likely to be posted by Twitter users (A) when the sentiment of the tweet is positive, (B) when sentiment of the tweet is negative.

Table 2 presents the topics most highly correlated with negative mentions of loneliness. The tweets with negative sentiment were first tokenised and stemmed to obtain a concise list of words and topics associated with loneliness. The list was then analysed to identify meaningful words representing topics of interest, such as emotional, social and health identifiers. Words such as ‘oh’, ‘yeah’ and ‘ur’ were ignored in composing the list. Table 2 shows that, for the overall US dataset, intimate relationships followed by interpersonal relationships are the most highly correlated topics, and thus the issues most associated with loneliness. ‘COVID-19’ is the single most frequent word in the dataset. Because the search keywords included ‘isolated’ and ‘isolation’, and given the social and physical distancing required by COVID-19 prevention guidelines, the frequent occurrence of COVID-19 alongside negative sentiment about loneliness is expected. This suggests that isolation because of COVID-19 has had negative effects on people’s sentiments and thus on their overall mental health. We also found an association of drug and addiction words with loneliness. This is consistent with figure 3A, where the word ‘sober’, which is associated with recovery from addiction, appeared, although in a positive sense. Together, figure 3A and table 2 show an association of drug/alcohol addiction with loneliness, which can be investigated further with keywords associated with both loneliness and addiction.
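The tokenise, stem and count step behind table 2 can be sketched as follows. This is a simplified stand-in, not the pipeline used in the study: `crude_stem` is a toy suffix stripper in place of a proper stemmer such as NLTK's `PorterStemmer`, and the `CATEGORIES` mapping from stems to broader categories is hypothetical. It illustrates how counts for individual stems can be rolled up into categories such as interpersonal relationships or addiction. The example tweets are invented.

```python
import re
from collections import Counter

FILLERS = frozenset({"oh", "yeah", "ur"})  # ignored, as described in the text
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def crude_stem(word):
    """Toy stemmer: strip the first matching suffix if at least 3 letters remain."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word


# Hypothetical mapping from stems to broader categories (illustrative only).
CATEGORIES = {
    "relationship": "interpersonal relationships",
    "friend": "interpersonal relationships",
    "girlfriend": "intimate relationships",
    "boyfriend": "intimate relationships",
    "covid": "health",
    "drug": "addiction",
    "drink": "addiction",
}


def topic_frequencies(negative_texts):
    """Count stems in negative tweets, then roll them up into categories."""
    stems = Counter()
    for text in negative_texts:
        for tok in re.findall(r"[a-z0-9]+", text.lower()):
            if tok not in FILLERS:
                stems[crude_stem(tok)] += 1
    categories = Counter()
    for stem, n in stems.items():
        if stem in CATEGORIES:
            categories[CATEGORIES[stem]] += n
    return stems, categories


texts = [
    "Oh my friends are drinking again",
    "yeah no girlfriend, covid ruined relationships",
    "drugs and friends",
]
stems, categories = topic_frequencies(texts)
print(categories.most_common())
# [('interpersonal relationships', 3), ('addiction', 2), ('intimate relationships', 1), ('health', 1)]
```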

Table 2

Highly correlated topics with mentions of loneliness

Table 3 presents the city-wise association of topics and themes with loneliness, based on the analysis of tweets with negative sentiment. The sample, however limited, showed variation in the themes and topics associated with the negative consequences of loneliness. While these topics and their association with loneliness are not definitive, that is, they may change as more data per city become available, they nonetheless provide a proof of concept for the idea of mapping loneliness. Some of the correlations are intuitive and self-explanatory; for example, given the particular nature of a big city such as Queens, one would expect more self-oriented or self-focused expressions. The data analysed here offer only a glimpse, as they were collected over a limited period, but they indicate that the expression and dynamics of loneliness can change with geography, which in turn can depend on the urban infrastructure, healthcare system, socioeconomic issues and culture of a region. Similarly, figure 4 shows selected examples of city-wise associations of topics with loneliness. Each word cloud is different, and each contains some meaningful words: for example, the word ‘lgbtq’ appears for Houston, while words such as ‘love’ appear for Orlando. This again highlights the variance in the experience and expression of loneliness. Note that the word clouds are based on full words and phrases, while the list in table 2 is based on stemmed words.
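Table 3 can be thought of as a frequency count grouped by city. A minimal sketch, assuming each negative tweet is represented as a city label plus its list of stems; the `ignore` parameter (for example, to drop the search keywords themselves) is a hypothetical option, and the example rows are invented rather than taken from the study's data.

```python
from collections import Counter, defaultdict


def top_topics_by_city(rows, k=3, ignore=frozenset()):
    """Top-k stems per city from negative tweets.

    rows: iterable of (city, list_of_stems) pairs (assumed format).
    ignore: stems to exclude, e.g. the search keywords themselves (hypothetical option).
    """
    per_city = defaultdict(Counter)
    for city, stems in rows:
        per_city[city].update(s for s in stems if s not in ignore)
    return {city: [s for s, _ in c.most_common(k)] for city, c in per_city.items()}


rows = [
    ("Houston", ["lgbtq", "alone", "alone"]),
    ("Houston", ["alone", "work"]),
    ("Orlando", ["love", "love", "alone"]),
]
print(top_topics_by_city(rows, k=2, ignore={"alone"}))
# {'Houston': ['lgbtq', 'work'], 'Orlando': ['love']}
```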

Figure 4

Selected city-wise examples of word clouds of words/topics associated with negative sentiment of loneliness: (A) Houston, (B) Orlando, (C) Nashville.

Table 3

Top correlated topics with negative mention of loneliness across cities analysed

Discussion and limitations

The methodology developed in this paper shows the association of loneliness with language associated with mental health issues such as anger and depression. The tweets analysed show that psychosocial linguistic features, which can identify the dynamics of loneliness, can be found in self-expressions of loneliness.28–30 Further, we show that the topics and themes associated with loneliness can vary both by thematic area and by geographic region. Tweets containing keywords associated with loneliness also represent a self-focused discourse, which affirms previous literature on loneliness.31 32 Tables 2 and 3 also point to other results established in the literature on loneliness, including the association of loneliness with substance abuse, emotional dysregulation and relationship difficulties.33 A loneliness map developed through SIA, using machine learning and social media data analysis, can thus be a powerful tool for policy-makers.

As mentioned in the Literature review section, very few studies have examined loneliness through Twitter; this paper therefore offers a novel approach to studying and understanding loneliness. Twitter and other social media have, however, been used to study other mental health and public health concerns. Guntuku et al23 studied loneliness in detail in Pennsylvania, providing insight into how loneliness is experienced in that region. Our study goes further by carrying out a comparative analysis of loneliness across six cities in the USA, and it serves as a proof of concept for a detailed map of loneliness on a global scale. Melton et al34 used Twitter data to study the response to COVID-19 vaccination. They developed their own sentiment analysis model but did not provide a detailed analysis of users’ tweets; in particular, their study lacked the categorisation of socioeconomic, political and personal-psychological topics that this study provides. Similarly, a previous study of insomnia24 did not analyse the dynamics of insomnia or its correlation with different external and internal factors, whereas this study examines the dynamics of its subject, loneliness, in depth.

This study has some limitations. The first is that the dataset is small compared with the volume of tweets actually generated on loneliness-related keywords: such tweets can run into the millions even for a single city. However, the purpose of this study is not an exhaustive analysis but a proof of concept for a loneliness map. The second limitation is the automatic classification of tweets as negative or positive through sentiment analysis. Although this automated analysis is the basis of the paper, its results need to be validated by manually inspecting a sample of the tweets classified as negative. This would allow us to estimate the confidence of the analysis and quantify its error.
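The manual validation proposed above can be sketched as follows, assuming a simple random sample of tweets labelled negative is checked by hand. The share confirmed as negative estimates the precision of the automated label, and a Wilson score interval quantifies its uncertainty. The function names, the fixed seed and the sample sizes are hypothetical choices for illustration, not part of this study.

```python
import math
import random


def draw_validation_sample(tweet_ids, k, seed=0):
    """Draw a reproducible simple random sample of up to k tweet IDs
    from those the sentiment tool labelled negative (hypothetical helper)."""
    ids = list(tweet_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(k, len(ids)))


def precision_with_wilson_ci(n_confirmed, n_checked, z=1.96):
    """Estimate the precision of the 'negative' label from a hand-checked sample.

    n_confirmed: sampled tweets a human annotator agreed were negative.
    n_checked: total tweets checked by hand.
    z: normal quantile; 1.96 gives an approximate 95% Wilson score interval.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if n_checked <= 0:
        raise ValueError("n_checked must be positive")
    if not 0 <= n_confirmed <= n_checked:
        raise ValueError("n_confirmed must lie between 0 and n_checked")
    p = n_confirmed / n_checked
    denom = 1 + z**2 / n_checked
    centre = (p + z**2 / (2 * n_checked)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n_checked + z**2 / (4 * n_checked**2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Illustrative numbers only (not results of this study): if annotators confirm
# 80 of 100 sampled tweets, precision is 0.80 with a 95% interval of roughly
# 0.71 to 0.87.
print(precision_with_wilson_ci(80, 100))
```

The Wilson interval is used here rather than the simple normal approximation because it behaves better for small samples and for proportions near 0 or 1.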


This paper develops the proof of concept for the loneliness map project. In this proof of concept, we analysed different cities in the USA using data collected from Twitter to examine the correlation of loneliness with negative sentiment and other correlated topics. The loneliness map will be developed incrementally by considering multiple data sources and different regions and countries of the world. The project will integrate multiple data sources (such as social media content, surveys and news) and analyse them with ML and AI to create a map of loneliness across the globe and understand the impact of loneliness on mental health. The loneliness map is meant not only to show the prevalence of loneliness in different countries, regions and cities around the world but also to help understand how different sociocultural, political, economic and geographical dynamics affect loneliness and mental health. The loneliness map can guide interventions by policy-makers in healthcare, as the health map35 does. The division between urban and rural areas, economic zones and classes, and their relationship with loneliness, can also be observed on a loneliness map. The map can further trace historical data on loneliness in particular regions and examine the correlation between increasing loneliness and mental health. From a digital mental health perspective, data provided by the loneliness map could help investigate whether loneliness is a cause or a consequence of mental health problems.

In this paper, sentiment analysis of tweets containing keywords associated with loneliness was carried out for US cities. The results showed variation in the sentiment associated with loneliness across cities, as well as in the top topics correlated with mentions of loneliness. This can help policy-makers understand the particular nature of loneliness in these cities. These results are only indicative and require further, more exhaustive study. For clarity, the number of tweets containing keywords associated with loneliness can run into the millions for a particular city over a year. The objective of this paper, however, is not to study each city exhaustively but to determine, from the data collected, the sentiment associated with loneliness and thereby show that the dynamics of loneliness differ even within a single country. This offers a glimpse of the varying nature of loneliness and underlines that different strategies may be needed to counter the negative feelings associated with it.
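Comparing sentiment across cities amounts to labelling each tweet and comparing the share of negative tweets per city. The sketch below assumes VADER compound scores have already been computed (for example with `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from the `vaderSentiment` package) and applies the typical ±0.05 thresholds given in VADER's documentation. The paper does not report its exact thresholds, so these, the `(city, score)` input format and the function names are assumptions.

```python
from collections import defaultdict


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label using the
    conventional +/-0.05 cut-offs (an assumption; the paper does not state them)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def negative_share_by_city(scored_tweets):
    """scored_tweets: iterable of (city, compound_score) pairs.

    Returns {city: fraction of that city's tweets labelled negative}.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for city, compound in scored_tweets:
        totals[city] += 1
        if label_from_compound(compound) == "negative":
            negatives[city] += 1
    return {city: negatives[city] / totals[city] for city in totals}


# Invented compound scores, for illustration only.
scores = [("Houston", -0.6), ("Houston", 0.3),
          ("Orlando", -0.5), ("Orlando", -0.7), ("Orlando", 0.8)]
print(negative_share_by_city(scores))
```

Reporting a per-city share rather than a raw count keeps cities with very different tweet volumes comparable.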

In future, this work can be extended in many directions. We plan to extend the analysis of the same cities by collecting focused data and analysing it in more detail to find the socioeconomic and personal-emotional dynamics of loneliness in each city. We will also collect data about loneliness from different countries in different languages, translate the data and analyse it through sentiment analysis. For further data collection, we will use other social media platforms such as Facebook, Reddit and Quora to gather data and topics on loneliness. While Twitter data are diverse, the data on these other platforms are more detailed and personal, which can allow a more intimate analysis of a person’s experience of loneliness. The data from these different platforms can then be combined to give a more accurate understanding of loneliness, thus also improving the quality of the loneliness map.

Data availability statement

Data are available on reasonable request.

Ethics statements

Patient consent for publication

Ethics approval

No ethical approval was sought for this study as this study analyses publicly available online content.



  • Contributors HAS designed the study, carried out the analysis and wrote the paper. HAS is also the guarantor of the study. MH designed and supervised the study.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.