Social profiling is the process of constructing a social media user's profile using their social data. In general, profiling refers to the data science process of generating a person's profile with computerized algorithms and technology.[1] With the proliferation of increasingly popular social networks, there are various platforms for sharing this information, including but not limited to LinkedIn, Google+, Facebook and Twitter.[2]

Social profile and social data

A person's social data refers to the personal data that they generate either online or offline[3] (for more information, see social data revolution). A large amount of this data, including a person's language, location and interests, is shared through social media and social networks. Users join multiple social media platforms, and their profiles across these platforms can be linked using different methods[4] to obtain their interests, locations, content, and friend lists. Altogether, this information can be used to construct a person's social profile.
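The idea of linking one person's accounts across platforms can be illustrated with a minimal sketch. It assumes, purely for illustration, that accounts are matched on a normalized handle; the linking methods in the cited work are not described here and may differ considerably. The account fields (`platform`, `handle`, `interests`, `location`) and the function names are hypothetical.

```python
from collections import defaultdict


def normalize(handle: str) -> str:
    """Lowercase a handle and drop separators so 'Jane_Doe' matches 'jane.doe'."""
    return "".join(ch for ch in handle.lower() if ch.isalnum())


def link_profiles(accounts: list[dict]) -> dict[str, dict]:
    """Group accounts whose normalized handles match and merge their attributes.

    Matching on the handle alone is an assumption made for this sketch.
    """
    merged = defaultdict(
        lambda: {"platforms": [], "interests": set(), "locations": set()}
    )
    for acct in accounts:
        profile = merged[normalize(acct["handle"])]
        profile["platforms"].append(acct["platform"])
        profile["interests"].update(acct.get("interests", []))
        if acct.get("location"):
            profile["locations"].add(acct["location"])
    return dict(merged)


# Hypothetical accounts, not real data.
accounts = [
    {"platform": "Twitter", "handle": "Jane_Doe",
     "interests": ["cycling"], "location": "Leeds"},
    {"platform": "LinkedIn", "handle": "jane.doe",
     "interests": ["data science"]},
    {"platform": "Facebook", "handle": "bob99",
     "interests": ["football"], "location": "Cork"},
]
profiles = link_profiles(accounts)
print(profiles["janedoe"])
```

Here the Twitter and LinkedIn accounts collapse into a single social profile that combines interests and locations from both platforms, while the unrelated Facebook account stays separate.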

Satisfying users' information needs is becoming more challenging, because the explosive growth of online data generates too much "noise", which hampers the process of information collection. Social profiling is an emerging approach to this problem: it introduces the concept of personalized search, which takes into account user profiles generated from social network data. One review surveys and classifies research on inferring users' social profile attributes from social media data into individual and group profiling, and highlights the existing techniques along with the data sources used, their limitations, and open challenges.

The prominent approaches adopted include machine learning, ontology, and fuzzy logic. Most studies have used social media data from Twitter and Facebook to infer the social attributes of users. The literature showed that user social attributes, including age, gender, home location, wellness, emotion, opinion, relation, and influence, still need to be explored.[5]

Personalized meta-search engines

The ever-increasing volume of online content has reduced the effectiveness of centralized search engines,[6][7] which can no longer satisfy users' demand for information. A possible solution that would increase the coverage of search results is the meta-search engine,[6] an approach that collects information from numerous centralized search engines. A new problem thus emerges: too much data and too much noise are generated in the collection process.

To address this, a technique called the personalized meta-search engine was developed. It makes use of a user's profile (largely the social profile) to filter the search results. A user's profile can be a combination of a number of things, including but not limited to "a user's manual selected interests, user's search history", and personal social network data.[6]
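The filtering step described above can be sketched as follows. This is a minimal illustration, not the method of the cited work: it assumes a profile is just a set of interest keywords and that a result's relevance is the number of those keywords in its snippet. The `Result` shape, the function names and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One search hit returned by an underlying engine (hypothetical shape)."""
    engine: str
    url: str
    snippet: str


def profile_score(result: Result, interests: set[str]) -> int:
    """Count how many profile interest terms occur in the result snippet."""
    words = {w.strip(".,;:!?").lower() for w in result.snippet.split()}
    return len(words & interests)


def personalized_merge(result_lists: list[list[Result]],
                       interests: set[str]) -> list[Result]:
    """Merge hits from several engines, drop duplicate URLs and hits that
    match no interest, and rank the rest by interest overlap (highest first)."""
    seen: set[str] = set()
    merged: list[Result] = []
    for results in result_lists:
        for r in results:
            if r.url not in seen:
                seen.add(r.url)
                merged.append(r)
    scored = [(profile_score(r, interests), r) for r in merged]
    kept = [(s, r) for s, r in scored if s > 0]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep order
    return [r for _, r in kept]


# Hypothetical profile and engine outputs.
interests = {"cycling", "travel"}
engine_a = [
    Result("A", "http://example.com/1", "Cycling routes for travel lovers."),
    Result("A", "http://example.com/2", "Stock market news today"),
]
engine_b = [
    Result("B", "http://example.com/1", "Cycling routes for travel lovers."),
    Result("B", "http://example.com/3", "Budget travel tips"),
]
ranked = personalized_merge([engine_a, engine_b], interests)
print([r.url for r in ranked])
```

The merge step is what increases coverage, and the profile-based scoring is what removes the noise: the result that matches none of the user's interests is dropped, and the duplicate returned by both engines is kept only once.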

Social media profiling

According to Samuel D. Warren II and Louis Brandeis (1890), disclosure of private information and the misuse of it can hurt people's feelings and cause considerable damage in people's lives.[8] Social networks provide people access to intimate online interactions; therefore, information access control, information transactions, privacy issues, connections and relationships on social media have become important research fields and are subjects of concern to the public.

Ricard Fogues and co-authors state that "any privacy mechanism has at its base an access control", which dictates "how permissions are given, what elements can be private, how access rules are defined, and so on".[9] Current access control for social media accounts still tends to be very simplistic: there is very limited diversity in the categories of relationships on social network accounts. On most platforms, a user's relationships to others are categorized only as "friend" or "non-friend", so people may leak important information to "friends" inside their social circle who are not necessarily the users they consciously want to share it with.[9] The section below is concerned with social media profiling and what profiling information on social media accounts can achieve.
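The difference between a binary friend/non-friend rule and a finer-grained, relationship-based rule can be shown with a short sketch. The relationship categories and function names below are hypothetical and chosen only for illustration; they are not taken from the cited work or from any platform's actual access control.

```python
from enum import Enum


class Relation(Enum):
    """Illustrative relationship categories (an assumption of this sketch)."""
    FAMILY = "family"
    CLOSE_FRIEND = "close_friend"
    COLLEAGUE = "colleague"
    STRANGER = "stranger"


def binary_can_view(viewer_relation: Relation) -> bool:
    """Binary model: anyone who is not a stranger counts as a 'friend'."""
    return viewer_relation is not Relation.STRANGER


def fine_grained_can_view(viewer_relation: Relation,
                          allowed: set[Relation]) -> bool:
    """Finer model: each item lists the relationship categories allowed to see it."""
    return viewer_relation in allowed


# A post the owner only wants family and close friends to see.
post_audience = {Relation.FAMILY, Relation.CLOSE_FRIEND}

leaks_under_binary = binary_can_view(Relation.COLLEAGUE)
leaks_under_fine = fine_grained_can_view(Relation.COLLEAGUE, post_audience)
print(leaks_under_binary, leaks_under_fine)  # True False
```

Under the binary rule the colleague sees the post simply because they are a "friend", which is the kind of unintended disclosure described above; under the finer rule the same viewer is excluded.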

Privacy leaks

A lot of information is voluntarily shared on online social networks, such as photos and updates on life activities (a new job, hobbies, etc.). People often assume that their accounts on different platforms will not be linked as long as they do not grant permission for such links. However, according to Diane Gan, information gathered online enables "target subjects to be identified on other social networking sites such as Foursquare, Instagram, LinkedIn, Facebook and Google+, where more personal information was leaked".[10]

The majority of social networking platforms use an "opt-out approach" for their features. Users who wish to protect their privacy are responsible for checking and changing their own privacy settings, as a number of them are left at their default option.[10] Major social network platforms have developed geo-tagging functions, which are in popular use. This is concerning because 39% of users have experienced profile hacking; 78% of burglars have used major social media networks and Google Street View to select their victims; and 54% of burglars attempted to break into empty houses after people posted status updates and geo-locations.[11]


Facebook

The formation and maintenance of social media accounts and their relationships with other accounts are associated with various social outcomes.[12] As of 2015, customer relationship management was essential for many firms and was partially carried out through Facebook.[13] Before the emergence and prevalence of social media, customer identification was based primarily on information that a firm could acquire directly,[14] for example through a customer's purchasing process or the voluntary act of completing a survey or joining a loyalty program. However, the rise of social media has greatly reduced reliance on this approach of building a customer's profile or model from directly available data. Marketers now increasingly seek customer information through Facebook;[13] this may include a variety of information that users disclose to all users or to a subset of users on Facebook: name, gender, date of birth, e-mail address, sexual orientation, marital status, interests, hobbies, favorite sports team(s), favorite athlete(s), or favorite music, and more importantly, Facebook connections.[13]

However, because of the design of the privacy policy, acquiring true information on Facebook is no trivial task. Facebook users often either refuse to disclose true information (sometimes using pseudonyms) or set their information to be visible only to friends. Users who "LIKE" a page are also hard to identify. To profile and cluster users online, marketers and companies can and will access the following kinds of data: gender; the IP address and city of each user, through the Facebook Insight page; who "LIKED" a certain user; a list of all the pages that a person "LIKED" (transaction data); other people that a user follows (even beyond the first 500, which usually cannot be seen); and all publicly shared data.[13]


Twitter

First launched on the Internet in March 2006, Twitter is a platform on which users can connect and communicate with any other user in just 280 characters.[10] Like Facebook, Twitter is also a crucial channel through which users leak important information, often unconsciously, that others can then access and collect.

According to Rachel Nuwer, the information posted and publicly shared in a sample of 10.8 million tweets by more than 5,000 users is enough to reveal a user's income range.[15] Daniel Preoţiuc-Pietro, a postdoctoral researcher at the University of Pennsylvania, and his colleagues were able to categorize 90% of users into corresponding income groups. After being fed into a machine-learning model, their collected data generated reliable predictions of the characteristics of each income group.[15]

One mobile app displays live tweets on Google Maps by using the geo-location details attached to each tweet, and traces the user's movement in the real world.[10]

Profiling photos on social network

The advent and universality of social media networks have boosted the role of images and the dissemination of visual information.[16] Many types of visual information on social media transmit messages from the author, location information and other personal information. For example, a user may post a photo of themselves in which landmarks are visible, which can enable other users to determine where they are. A study by Cristina Segalin, Dong Seon Cheng and Marco Cristani found that profiling the photos in users' posts can reveal personal traits such as personality and mood.[16] The study introduces convolutional neural networks (CNNs) for this purpose. The approach builds on the main characteristics of computational aesthetics (CA), emphasizing "computational methods", "human aesthetic point of view", and "the need to focus on objective approaches",[16] as defined by Hoenig (2005). This tool can extract and identify content in photos.


Flickr

In a study called "A Rule-Based Flickr Tag Recommendation System", the authors suggest personalized tag recommendations,[17] largely based on user profiles and other web resources. Such recommendations have proven useful in many areas: "web content indexing", "multimedia data retrieval", and enterprise Web searches.[17]





As of 2011, marketers and retailers were increasing their market presence by creating their own pages on social media, on which they post information, ask people to like and share posts to enter contests, and much more. Studies in 2011 showed that the average person spent about 23 minutes per day on a social networking site.[18] Companies, from small to large, are therefore investing in gathering user behavior information, ratings, reviews, and more.[19]


Until 2006, online communication was not content-led in terms of the amount of time people spent online. Since then, content sharing and creation have become the primary online activity of general social media users, which has permanently changed online marketing.[20] In the book Advanced Social Media Marketing,[21] the author gives an example of how a New York wedding planner might identify an audience when marketing on Facebook. The targeting categories may include people who: (1) live in the United States; (2) live within 50 miles of New York; (3) are aged 21 or older; (4) are engaged women.[21] Whether the advertiser chooses to pay per click or per impression/view, "the cost of Facebook Marketplace ads and Sponsored Stories is set by your maximum bid and the competition for the same audiences".[21] Clicks usually cost $0.50–1.50 each.
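The wedding planner's targeting criteria and the quoted per-click price range can be turned into a small worked example. This is only a sketch: the `Profile` fields, the sample profiles and the click count are hypothetical, and real advertising platforms expose targeting through their own interfaces rather than through code like this.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical subset of profile attributes used for targeting."""
    country: str
    miles_from_new_york: float
    age: int
    gender: str
    relationship_status: str


def in_target_audience(p: Profile) -> bool:
    """Apply the four criteria: US resident, within 50 miles of New York,
    aged 21 or older, and an engaged woman."""
    return (
        p.country == "US"
        and p.miles_from_new_york <= 50
        and p.age >= 21
        and p.gender == "female"
        and p.relationship_status == "engaged"
    )


def click_cost_range(clicks: int, low: float = 0.50,
                     high: float = 1.50) -> tuple[float, float]:
    """Estimate total spend for a number of clicks at $0.50 to $1.50 each."""
    return clicks * low, clicks * high


# Hypothetical profiles.
profiles = [
    Profile("US", 12.0, 28, "female", "engaged"),
    Profile("US", 300.0, 31, "female", "engaged"),   # too far from New York
    Profile("US", 5.0, 19, "female", "engaged"),     # under 21
    Profile("US", 20.0, 35, "male", "married"),      # not an engaged woman
]
audience = [p for p in profiles if in_target_audience(p)]
print(len(audience))          # 1
print(click_cost_range(200))  # (100.0, 300.0)
```

At the quoted prices, 200 clicks from the targeted audience would cost somewhere between $100 and $300, depending on the bid and the competition for the same audience.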



Klout

Klout is a popular online tool that assesses a user's social influence through social profiling. It takes several social media platforms (such as Facebook and Twitter) and numerous other aspects into account to generate a score from 1 to 100 for each user. Social media contains plentiful personal information beyond the number of likes on a post or connections on LinkedIn, and Klout condenses it into a single score that indicates a person's influence.[22]

In a study called "How Much Klout do You Have...A Test of System Generated Cues on Source Credibility", Chad Edwards and co-authors found that Klout scores can influence people's perceived credibility.[23] As the Klout score becomes a popular way of combining a person's influence into a single number, it can be a convenient tool and a biased one at the same time. A study by David Westerman and colleagues of how social media follower counts influence people's judgments illustrates the possible bias that Klout may contain.[24] In that study, participants were asked to view six identical mock Twitter pages that differed in only one major independent variable: the number of followers. The results showed that pages with either too many or too few followers were judged less credible, despite their similar content. Klout scores may be subject to the same bias.[24]

Although Klout scores are sometimes used during the recruitment process, the practice remains controversial.


Kred

Kred not only assigns each user an influence score, but also allows each user to claim a Kred profile and Kred account. Through this platform, users can view how top influencers engage with their online community and how each of their own online actions affects their influence score.

Kred offers its audience several suggestions for increasing influence: (1) be generous with your audience, and feel comfortable sharing content from your friends and tweeting others; (2) join an online community; (3) create and share meaningful content; (4) track your progress online.

Follower Wonk

Follower Wonk is specifically targeted towards Twitter analytics. It helps users understand follower demographics and optimize their activities by finding which ones attract the most positive feedback from followers.

Keyhole Data Analytics

Keyhole is a hashtag tracking and analytics service that tracks Instagram, Twitter and Facebook hashtag data. It allows users to see which top influencers are using a certain hashtag, along with other demographic information about the hashtag. When a user enters a hashtag on its website, it automatically takes a random sample of users who have recently used the tag, allowing the user to analyze each hashtag they are interested in.

Online activist social profile

The prevalence of the Internet and social media has given online activists both a new platform for activism and its most popular tool. While online activism may stir up great controversy and become a trend, few people actually participate in or make sacrifices for the relevant events, which makes the profile of online activists an interesting topic of analysis. In a study by Harp and co-authors of online activists in China, Latin America and the United States, the majority of online activists in Latin America and China were male, with a median income of $10,000 or less, while the majority in the United States were female, with a median income of $30,000 - $69,999. Online activists in the United States tended to have postgraduate education, while activists in the other regions had lower education levels.[25]

A closer examination of the content they share online shows that the most shared information falls into five types:

  1. To fundraise: Of the three, China's activists post the most fundraising content.
  2. To post links: Latin American activists post the most links.
  3. To promote debate or discussion: Activists in both Latin America and China post more content promoting debate or discussion than American activists do.
  4. To post information such as announcements and news: American activists post more such content than the activists from other countries.
  5. To communicate with journalists: China's activists lead in this category.

Social credit score in China

See also: Social credit and Social Credit System

The Chinese government hopes to establish a "social-credit system" that aims to score the "financial creditworthiness of citizens", social behavior and even political behavior.[26] The system will combine big data and social profiling technologies. According to Celia Hatton of BBC News, everyone in China will be expected to enroll in a national database that includes fiscal information, political behavior, social behavior and daily life, including minor traffic violations, and that automatically calculates a single score evaluating a citizen's trustworthiness.[27]

Credibility scores, social influence scores and other comprehensive evaluations of people are not rare in other countries. However, China's "social-credit system" remains controversial because this single score can reflect every aspect of a person's life.[27] Indeed, "much about the social-credit system remains unclear".[26]

How would companies be limited by the credit score system in China?

Although the implementation of the social credit score remains controversial in China, the Chinese government aims to fully implement the system by 2018.[28] According to Jake Laband (the deputy director of the Beijing office of the US-China Business Council), low credit scores will "limit eligibility for financing, employment, and Party membership, as well restrict real estate transactions and travel." Social credit scores will be affected not only by legal criteria but also by social criteria, such as contract breaking. However, the huge amount of data that the system will analyze has raised serious privacy concerns for big companies.

References


  1. Kanojea, Sumitkumar; Mukhopadhyaya, Debajyoti; Girase, Sheetal (2016). "User Profiling for University Recommender System using Automatic Information Retrieval". Procedia Computer Science. 78: 5–12. doi:10.1016/j.procs.2016.02.002.
  2. Vu, Xuan Truong; Abel, Marie-Hélène; Morizet-Mahoudeaux, Pierre (October 1, 2015). "A user-centered and group-based approach for social data filtering and sharing" (PDF). Computers in Human Behavior. Computing for Human Learning, Behaviour and Collaboration in the Social and Mobile Networks Era. 51, Part B: 1012–1023. doi:10.1016/j.chb.2014.11.079.
  3. Fontinelle, Amy (February 6, 2017). "Social Data". Investopedia. Retrieved April 3, 2017.
  4. Kaushal, Rishabh; Ghosh, Vasundhara (March 26, 2020). 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom). IEEE. doi:10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00231. S2CID 214692247.
  5. Bilal, Muhammad; Gani, Abdullah; Lali, Muhammad Ikram Ullah; Marjani, Mohsen; Malik, Nadia (2019). "Social Profiling: A Review, Taxonomy, and Challenges". Cyberpsychology, Behavior, and Social Networking. 22 (7): 433–450. doi:10.1089/cyber.2018.0670. PMID 31074639. S2CID 149444514.
  6. Saoud, Zakaria; Kechid, Samir (April 1, 2016). "Integrating social profile to improve the source selection and the result merging process in distributed information retrieval". Information Sciences. 336: 115–128. doi:10.1016/j.ins.2015.12.012.
  7. Lawrence, Steve; Giles, C. Lee (July 8, 1999). "Accessibility of information on the web". Nature. 400 (6740): 107–9. Bibcode:1999Natur.400..107L. doi:10.1038/21987. ISSN 0028-0836. PMID 10428673. S2CID 4347646.
  8. D. Warren, Samuel; D. Brandeis, Louis (December 1890). "The Right to Privacy". Harvard Law Review. IV.
  9. Fogues, Ricard; Such, Jose M.; Espinosa, Agustin; Garcia-Fornes, Ana (May 4, 2015). "Open Challenges in Relationship-Based Privacy Mechanisms for Social Network Services" (PDF). International Journal of Human–Computer Interaction. 31 (5): 350–370. doi:10.1080/10447318.2014.1001300. hdl:10251/65888. ISSN 1044-7318. S2CID 16864348.
  10. Gan, Diane; Jenkins, Lily R. (March 23, 2015). "Social Networking Privacy—Who's Stalking You?" (PDF). Future Internet. 7 (1): 67–93. doi:10.3390/fi7010067.
  11. "Social Media And Crime". Retrieved April 23, 2017.
  12. Park, Namkee; Lee, Seungyoon; Kim, Jang Hyun (September 1, 2012). "Individuals' personal network characteristics and patterns of Facebook use: A social network approach". Computers in Human Behavior. 28 (5): 1700–1707. doi:10.1016/j.chb.2012.04.009.
  13. van Dam, Jan-Willem; van de Velden, Michel (February 1, 2015). "Online profiling and clustering of Facebook users". Decision Support Systems. 70: 60–72. doi:10.1016/j.dss.2014.12.001.
  14. Zhu, Feng; Zhang, Xiaoquan (Michael) (May 29, 2013). "Impact of Online Consumer Reviews on Sales: The Moderating Role of Product and Consumer Characteristics". Journal of Marketing. 74 (2): 133–148. doi:10.1509/jmkg.74.2.133.
  15. Nuwer, Rachel (November 17, 2015). "Money Talks—and Tweets". Scientific American. 313 (6): 17. Bibcode:2015SciAm.313f..17N. doi:10.1038/scientificamerican1215-17.
  16. Segalin, Cristina; Cheng, Dong Seon; Cristani, Marco (March 1, 2017). "Social profiling through image understanding: Personality inference using convolutional neural networks". Computer Vision and Image Understanding. Image and Video Understanding in Big Data. 156: 34–50. doi:10.1016/j.cviu.2016.10.013.
  17. Cagliero, Luca; Fiori, Alessandro; Grimaudo, Luigi (January 1, 2013). "A Rule-Based Flickr Tag Recommendation System". In Ramzan, Naeem; Zwol, Roelof van; Lee, Jong-Seok; Clüver, Kai; Hua, Xian-Sheng (eds.). Social Media Retrieval. Computer Communications and Networks. Springer London. pp. 169–189. doi:10.1007/978-1-4471-4555-4_8. ISBN 9781447145547.
  18. "Facebook Dominates, the Emergence of reddit and Hulu: Taking a Look at 4 Years of Distracting Websites at RescueTime". RescueTime Blog. October 3, 2011. Retrieved April 7, 2017.
  19. Institute of Electrical and Electronics Engineers; IEEE Communications Society (January 1, 2011). 2011 IEEE 5th International Conference on Internet Multimedia Systems Architecture and Application: [IMSAA 11]: December 12-13, 2011, Bangalore, India. IEEE. ISBN 9781457713286. OCLC 835764725.
  20. Evans, Dave (January 1, 2012). Social media marketing: an hour a day. Wiley. ISBN 9781118227671. OCLC 796208293.
  21. Funk, Tom (January 1, 2013). Advanced Social Media Marketing: How to Lead, Launch, and Manage a Successful Social Media Program. Apress. ISBN 9781430244080. OCLC 981044629.
  22. "What's In A Score? Altimeter Group Explains What Brands Really Need To Know About Influencers". March 26, 2012.
  23. Edwards, Chad; Spence, Patric R.; Gentile, Christina J.; Edwards, America; Edwards, Autumn (September 1, 2013). "How much Klout do you have … A test of system generated cues on source credibility". Computers in Human Behavior. 29 (5): A12–A16. doi:10.1016/j.chb.2012.12.034. S2CID 295841.
  24. Westerman, David; Spence, Patric R.; Van Der Heide, Brandon (January 1, 2012). "A social network as information: The effect of system generated reports of connectedness on credibility on Twitter". Computers in Human Behavior. 28 (1): 199–206. doi:10.1016/j.chb.2011.09.001.
  25. Descôteaux, Josée (September 1, 2009). "[Laura Archer. Her empty hands, her eyes, her instinct...and her passion]". Perspective Infirmière. 6 (5): 7–8. ISSN 1708-1890. PMID 20120298.
  26. "China invents the digital totalitarian state". The Economist. December 17, 2016. Retrieved April 14, 2017.
  27. Hatton, Celia (October 26, 2015). "China 'social credit': Beijing sets up huge system". BBC News. Retrieved April 14, 2017.
  28. Laband, Jake (February 3, 2017). "How Can Individuals, Companies be Limited by Bad Social Credit in China?". China Business Review.