Methodology of big data in studying orthodox communities
Abstract
The article discusses the emergence of "digital sociology" as a new scientific field, its main methodological principles, and their implementation in empirical research on the social network VKontakte. It examines how a researcher works with a data parser and the main principles of using this method.
Within the framework of the "big data" paradigm, we analyze the principles of building a study sample, including the selection of communities and their participants. Using the example of Orthodox VKontakte communities dedicated to the family, we show techniques for minimizing sampling error and selecting the most relevant audience: searching for users in selected communities; finding users who belong to several communities at once, which makes the sample more homogeneous; removing bots and users who have not set an avatar; and selecting users with an "open" date of birth.
The article identifies socio-demographic criteria for analyzing the audience of Orthodox communities (distribution by gender and age groups, geography of community members by country and city, marital status, number of children), as well as the main behavioral criterion – the engagement rate.
The engagement rate as a research tool makes it possible to take into account the behavioral activity of community members, including likes, republications, and comments over the entire lifetime of a community. This criterion allows the researcher to assess the degree of influence of communities on their members, based not on the size of VKontakte groups, which may differ many times over, but on the degree of participation of subscribers in the life of the community. The article shows that the engagement rate in Orthodox family communities is higher on average than in secular communities on similar subjects. This is primarily due to the religious orientation of Orthodox communities itself, which supports both the active engagement of existing subscribers in various communication activities and the involvement of new ones.
Information for citation: Pisarevskiy, V. G. (2020), “Methodology of big data in studying orthodox communities”, Research Result. Sociology and management, 6 (1), 16-28. DOI: 10.18413/2408-9338-2020-6-1-0-2
Keywords: Digital sociology, big data, Internet research sampling, Orthodox communities in VKontakte, engagement rate
Introduction. Internet research is developing at an increasingly rapid pace in modern social reality. This concerns not only online projections of traditional sociological methods, such as online surveys and online focus groups, but also the emergence of a fundamentally new research method based on the "big data" paradigm (Couldry, Fotopoulou, 2014).
The term "digital sociology" first appeared in the works of the University of Sydney researcher Deborah Lupton in 2012, and three years later her textbook on this new scientific field was published (Lupton, 2015).
Mark Carrigan, a researcher from the UK, emphasizes that "digital sociology can be viewed in the broadest sense as revealing the opportunities that digital tools (including social media communities) provide for rethinking the structure of sociological knowledge" (Carrigan, 2013).
Let's consider the main methodological principles of digital sociology. The first principle is formulated as follows: the process of transmitting information is equated with the process of influence. This principle is most fully manifested in social networks, where we can observe a "viral" type of content distribution through likes, republications, and comments.
The second methodological principle follows from the first: if the process of transmitting information in modern conditions is identical to the process of influence, then this influence cannot remain confined to the online environment; it inevitably manifests itself offline, that is, in ordinary social reality.
It is clear that even before digital sociology took shape as a separate scientific field, various Internet studies examined the social space being formed online. However, it is only within the framework of digital sociology, whose development coincided with the development of social networks, that it became possible to focus on digital media (primarily communities in social networks) in order to determine what impact these media have on real social relations and processes.
The implementation of the second methodological principle of digital sociology is described in the works of the famous American sociologist Manuel Castells. He describes the relationship between social reality on the Internet (online) and ordinary social reality (offline) as a phenomenon of "real virtuality" (Castells, 2004). Castells also refers to the first methodological principle of digital sociology, defining the capacity for change and reconfiguration as a "decisive feature in society". At the same time, the generation, processing and transmission of information become fundamental sources of power and influence.
Methods and methodology. Finally, the third methodological principle of digital sociology is that the novelty of the object under study is determined primarily by new digital technologies.
Methods of digital sociology are based on the big data paradigm, which involves analyzing data on millions and tens of millions of Internet users and examining their views, habits, and behavioral factors. For communities in social networks, this includes automated analysis of the user profiles completed during registration, use of the social network's source code to analyze users' value orientations, content analysis of the community based on user preferences, and analysis of users' social relationships within the community (Ruppert, Law, Savage, 2013a).
Big data sources are Internet documents, social networks, blogs, measurement devices, radio frequency identification, audio and video recording devices, including mobile devices and various wearable gadgets (for example, fitness bracelets) (Baym, 2013a).
The key properties of big data are usually referred to by the abbreviation "3V", which stands for the volume of data, the variety of data, and the high speed of data update (velocity) (Akhmedov, 2018).
Research based on big data, according to the British "digital sociologist" Rogers, is connected with the future of sociological research. On the one hand, empirical studies of big data help to develop new research approaches. On the other hand, these studies enable sociologists to see correlations between disparate facts, such as the cultural preferences of study participants and their political choices during voting, and then to link them within a single social space of the information society (Rogers, 2013).
Based on the methodological principles of digital sociology, which were described above, we will consider Orthodox communities in the VKontakte social network dedicated to the family.
Currently, the largest social network in Russia is VKontakte, with a monthly audience of 97 million people who generate more than one and a half billion messages per day (VKontakte, 2020a). Research in social networks based on the big data paradigm has been developing in the Western scientific community for a long time (Mayer-Schonberger, Cukier, 2013). At the same time, such research in Russian social networks, primarily in VKontakte, is not yet widespread.
In order to determine the socio-demographic profile of users of specific VKontakte communities, the functionality of the social network's "advertising account" is used. It can be used to determine the user's gender, age, country and city, higher education, marital status, and profession. In addition, the VKontakte advertising account makes it possible to estimate the total audience of the communities of interest to the researcher (a user who belongs to several communities at the same time is still counted as one person); up to 25 communities can be analyzed this way.
However, for more detailed studies, it is necessary to use specialized software – the so-called parsers, which can be used to solve a wider range of research tasks, including testing hypotheses for the target audience that interests us (Marres, 2012: 140).
The parser performs automated analysis of VKontakte communities, their audience, and the online social actions carried out by this audience (likes, republications, comments).
Today, one of the most powerful parsers of the VKontakte social network is the TargetHunter service. This article describes the principles of forming a sample of the audience of VKontakte communities using the tools provided by TargetHunter (more than 150 criteria are available for analysis) (VKontakte, 2020b).
We will consider the basic principles of sampling and working with the sample using the example of Orthodox family communities in the VKontakte social network. Note at once that it is better to narrow down the topic on which information is sought: for example, we do not analyze all Orthodox communities in VKontakte (although this task is technically quite feasible), but only those dedicated to the Orthodox family. Nor is this the limit of "narrowing": within the family theme, one can focus on dating, the young family, the birth and upbringing of children, communication with adult children and grandchildren, and so on. If the task is to get a general idea of the behavioral characteristics of the audience on a certain issue, then there is no point in "narrowing" the topic (Baym, 2013b).
The sample should include communities with different audience sizes: those with tens or hundreds of thousands of participants, and those with several thousand members. The optimal lower limit for community size is 1000 people: for the community administration, this is the first significant milestone in building an audience.
How do we select specific communities for the sample? In the TargetHunter parser, there is a "Search" tab where we select the "Communities" subsection and then "by keywords". Next, we enter the keywords "Orthodoxy" and "Orthodox", set the lower limit on the number of community members to 1000 people, and study the search result. We found 1027 different communities with the specified keywords in their names. Of course, there are many more Orthodox family communities, so we not only select communities that are called, for example, "Orthodox family", but also look for recommended groups on similar topics in the "links" section of these communities. It is considered that at least 10 groups must be selected for a relevant analysis.
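The keyword search with a lower size limit can be illustrated outside the parser as a simple filter. The following is a minimal sketch in Python, not the TargetHunter implementation: the `Community` record, the `select_communities` function, and the sample names and sizes are all hypothetical and serve only to show the logic of the selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Community:
    """Hypothetical record for one community (not a TargetHunter or VK type)."""
    name: str
    members: int


def select_communities(communities, keywords, min_members=1000):
    """Keep communities whose name contains any keyword and that meet the size floor."""
    lowered = [k.lower() for k in keywords]
    return [
        c for c in communities
        if c.members >= min_members
        and any(k in c.name.lower() for k in lowered)
    ]


# Invented candidates, for illustration only.
candidates = [
    Community("Orthodox family", 26_900),
    Community("Orthodoxy and children", 850),      # below the 1000-member floor
    Community("Family recipes", 120_000),          # no keyword in the name
    Community("Young Orthodox parents", 4_300),
]

selected = select_communities(candidates, ["Orthodoxy", "Orthodox"])
print([c.name for c in selected])
```

A name-based filter of this kind misses relevant groups whose names lack the keywords, which is why the recommended groups in the "links" section are reviewed as well.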
Research results and discussion. If we are creating a sample that contains hundreds or thousands of communities, then we need to take into account another criterion for the final selection: the homogeneity of the sample. To do this, we go to the "collection-participants" section and specify the condition that sample members must belong to at least two groups (in some cases, a larger number of groups in which a representative of the target audience must be present at the same time can be specified).
Our task is to gather as many different communities as possible, so the list includes both a group dedicated to the husband as the head of the family and the community "Orthodox family – the Foundation of Russia", in which the topics we are interested in are analyzed from an Orthodox position not only at the micro level but also at the macro level.
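The membership condition can be sketched as a count of groups per user identifier. This is a minimal Python illustration that assumes the member lists have already been exported as lists of numeric IDs; the function name and the toy data are hypothetical and do not reproduce the parser's internals.

```python
from collections import Counter


def members_in_at_least(groups, min_groups=2):
    """Return user IDs that appear in at least `min_groups` of the given groups."""
    counts = Counter()
    for member_ids in groups.values():
        # set() guards against the same ID being listed twice within one group
        counts.update(set(member_ids))
    return {uid for uid, n in counts.items() if n >= min_groups}


# Invented member lists, for illustration only.
groups = {
    "group_a": [1, 2, 3, 4],
    "group_b": [3, 4, 5],
    "group_c": [4, 6],
}

core_audience = members_in_at_least(groups, min_groups=2)
print(sorted(core_audience))
```

Raising `min_groups` makes the audience more homogeneous but smaller, so the threshold is a trade-off between relevance and sample size.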
After creating a list of communities that our target audience consists of, you need to get a list of all the members of these groups. In the social network VKontakte, each participant has a unique number that the network and parser programs use to identify a specific person. It is important here that, unlike traditional sociological studies, where respondents are depersonalized and only meet certain socio-demographic criteria for sampling, digital sociology takes into account each respondent with the entire set of unique characteristics inherent in them.
It should be noted that the size of our sample is not equal to the sum of the participants in all the selected communities, since the same person can be a member of several communities at the same time and should be counted as a single respondent. To get the result, in the "collection" tab of the parser, we select the value "participants" and enter links to the selected groups. The initial sample consisted of 354,969 people.
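The difference between the sum of memberships and the number of unique respondents can be shown with a set union over user identifiers. This is a minimal Python sketch with invented IDs; `unique_members` is a hypothetical helper, not a parser function.

```python
def unique_members(groups):
    """Union of all member IDs: each person is counted once, however many groups they joined."""
    unique = set()
    for member_ids in groups.values():
        unique.update(member_ids)
    return unique


# Invented member lists, for illustration only.
groups = {
    "group_a": [101, 102, 103],
    "group_b": [102, 103, 104],
    "group_c": [103, 105],
}

total_memberships = sum(len(ids) for ids in groups.values())
sample = unique_members(groups)

print(total_memberships, len(sample))
```

Here eight memberships correspond to only five respondents, because users 102 and 103 belong to more than one group.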
Next, we need to clear our selection of so-called "bots". Bots are VKontakte pages created with the help of special software that try to imitate the actions of live people, but are controlled by a robot, or they are hacked pages of live people, which are also further controlled by a robot. If you do not clear the selection from bots, the sampling error can be quite large and reach up to 10%.
Removing bots is the most resource-intensive operation. In the TargetHunter parser, it is performed using the method of Alexander Volkov (Volkov, 2016). According to this method, the parser checks all the communities that our sample members belong to and that have from 50 to 10,000 participants; each community found must contain at least 5 representatives of the sample. Next, communities without an original image (avatar) are identified, and this list is compared with the one obtained earlier. The resulting list of participants is subtracted from the initial sample, which leaves a sample cleared of bots. Applying the described method, we obtained a sample of 307,767 people.
The next step in creating a sample is to remove from our target audience the people whose pages have been deleted by the VKontakte social network itself and those who have not set an avatar. After this cleaning, the sample consisted of 307,754 people. Only 13 profiles were removed at this stage, which indicates that the previous stage of working with the sample was performed thoroughly.
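The effect of each cleaning stage can be checked with simple arithmetic on the sample sizes reported above. The short Python sketch below uses only those three figures; the `funnel` helper and the stage labels are introduced here for illustration.

```python
# Sample sizes reported in the text for the first three stages.
stages = [
    ("initial sample", 354_969),
    ("after bot removal", 307_767),
    ("after removing deleted pages and pages without an avatar", 307_754),
]


def funnel(stages):
    """For each stage after the first, return (label, profiles removed, percent of previous stage)."""
    rows = []
    for (_, previous), (label, count) in zip(stages, stages[1:]):
        removed = previous - count
        rows.append((label, removed, round(100 * removed / previous, 2)))
    return rows


report = funnel(stages)
for label, removed, percent in report:
    print(f"{label}: removed {removed} profiles ({percent}% of the previous stage)")
```

Bot removal excludes 47,202 profiles, about 13.3% of the initial sample, while the second cleaning stage removes only 13 profiles.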
Next, we need to select those who made their date of birth open. When registering on VKontakte, a user must specify a date of birth, but later it can be hidden, either completely or by leaving only the day and month and hiding the year. To work correctly with the sample, we need only those who have specified their full date of birth. To do this, in the "tools-filter profiles" section, we select the option "leave those whose age is hidden or not specified". This gives us those who did not specify their age; then, using the "tools-crossing bases" section, we subtract them from the sample obtained after filtering out bots. As a result, we have a sample of 105,665 people. This is our final audience, which we analyze below.
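The two-step logic described here, first collecting the profiles with a hidden age and then subtracting them from the sample, can be sketched as follows. This Python illustration assumes that a profile's birth date arrives as a string of the form "D.M.YYYY" when fully open and "D.M" when the year is hidden; this format, the function names and the toy profiles are assumptions made for the example.

```python
import re

# Assumed format of a fully open birth date: day.month.year, e.g. "14.3.1985".
FULL_DATE = re.compile(r"^\d{1,2}\.\d{1,2}\.\d{4}$")


def has_full_birth_date(bdate):
    """True only when day, month and year are all present."""
    return bool(bdate) and FULL_DATE.match(bdate) is not None


def hidden_age_ids(profiles):
    """IDs of users whose age is hidden or not specified (step one of the filter)."""
    return {uid for uid, bdate in profiles.items() if not has_full_birth_date(bdate)}


# Invented profiles: user ID -> birth date string (or None when nothing is shown).
profiles = {1: "14.3.1985", 2: "7.11", 3: None, 4: "1.1.1990", 5: ""}

hidden = hidden_age_ids(profiles)
final_sample = set(profiles) - hidden  # step two: subtract them from the cleaned sample

print(sorted(hidden), sorted(final_sample))
```

Only users 1 and 4 remain, since the others either hide the year or show no date at all.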
The table shows that the sample is dominated by the older age groups: 30-34, 35-44, and 45 and over. This is due not only to the topics of the groups we are considering, but also to the fact that the audience of the entire VKontakte social network has grown noticeably older over the past 5 years (Demis Group, 2020).
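A distribution by age group of this kind can be built from full birth dates by computing each user's age on a reference day and assigning it to a bin. The Python sketch below is illustrative: the bin boundaries below 27, the reference date and the birth dates are assumptions, and only the groups 27-29, 30-34, 35-44 and 45 and over are taken from the text.

```python
from collections import Counter
from datetime import date

# (lowest age, highest age, label); the "under 27" bin is an assumption.
AGE_BINS = [
    (0, 26, "under 27"),
    (27, 29, "27-29"),
    (30, 34, "30-34"),
    (35, 44, "35-44"),
    (45, 200, "45+"),
]


def age_on(birth, today):
    """Full years elapsed between `birth` and `today`."""
    years = today.year - birth.year
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1  # birthday has not yet occurred this year
    return years


def age_group(age):
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def age_distribution(birth_dates, today):
    return Counter(age_group(age_on(b, today)) for b in birth_dates)


# Invented birth dates and reference day, for illustration only.
reference_day = date(2020, 1, 1)
births = [
    date(1985, 3, 14),
    date(1990, 1, 1),
    date(1992, 6, 30),
    date(1970, 12, 31),
    date(2000, 5, 5),
]

distribution = age_distribution(births, reference_day)
print(dict(distribution))
```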
Let's look at the countries where our target audience is located.
As expected, Russia is in first place, since it accounts for most VKontakte users, followed by Ukraine, Belarus, and Kazakhstan in places 2 to 4. The United States and Germany have the largest Russian-speaking communities in the world, so their presence in the list is also not surprising. As for other countries, such as Israel and Italy, we can assume that other social networks, such as Instagram, are more relevant for our compatriots there.
Let's look at the cities where our target audience is present.
Among the cities presented, the leaders are Moscow and the "Northern capital", St. Petersburg. Next come cities with populations over a million, and not only Russian ones: the list includes Kiev and Minsk. The list also includes many large cities: Novosibirsk, Tolyatti, Izhevsk, Cheboksary, Yaroslavl, Belgorod, Arkhangelsk. This distribution of the audience can be explained by the fact that, as noted by a number of experts, Orthodoxy is becoming the religion of large cities and megacities (Russ, 2015).
In the "tools-profile filter" section, we can see how many children our target audience has. This criterion is interesting because if, for example, we are interested in the segment of Orthodox parents with many children within the family topic, we can form this segment and work with it in the future, including with traditional sociological methods such as a survey and an expert interview. It is worth noting that the majority of VKontakte users do not provide any information about their children.
Let us consider the marital status of the sample members.
Most of the sample members who indicated their marital status are married (more than 17%), while only 4% said they were unmarried. The other marital statuses that VKontakte offers are difficult to analyze, because it is not clear how the sample participants understand options such as "everything is complicated" or "civil marriage". At the same time, as already noted, most of the sample members who stated their status are married, which is not surprising for users of Orthodox family communities.
Most of our target audience members who provided information about children have 1-2 children. With each additional child, the number of representatives of the target audience with the corresponding number of children drops severalfold. On the other hand, we can hypothesize that parents with many children should be sought in other Orthodox communities, and not only in groups that have the words "many children" in their name. In order to solve this research problem correctly, it is necessary to select most of the Orthodox communities (thousands of groups with a total audience of several million people), form a sample of their participants according to the principles described above, and then look at the number of children in each individual segment of the audience.
We looked at the socio-demographic characteristics of the audience. Now let's analyze the main behavioral criterion of communities in the VKontakte network – the engagement rate (ER).
If we want to assess the impact of an Internet community, we can't focus solely on the size of its audience (i.e., the number of participants). In large communities, users are often nominally subscribers, but the priority of the community for them is extremely low, and therefore they practically do not participate in its activity. On the other hand, smaller communities tend to have a higher level of activity, which leads to a higher engagement rate. At the same time, as can be seen from the data presented in the table below, the engagement rate is extremely low in some small communities, which can be explained by the low communication activity of community members.
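One common way to express such an indicator, which we assume here purely for illustration, is to divide the total number of likes, republications and comments by the number of community members and express the result as a percentage. The formula actually used by the parser may differ; the function name and all counts below are hypothetical.

```python
def engagement_rate(likes, reposts, comments, members):
    """Assumed formula: all interactions divided by audience size, in percent."""
    if members <= 0:
        raise ValueError("members must be positive")
    return 100.0 * (likes + reposts + comments) / members


# Invented figures: a large community with passive subscribers
# and a small community with active ones.
large = engagement_rate(likes=30_000, reposts=4_000, comments=6_000, members=500_000)
small = engagement_rate(likes=1_500, reposts=300, comments=200, members=5_000)

print(f"large community: {large:.2f}%")
print(f"small community: {small:.2f}%")
```

Under this assumption the large community scores 8% and the small one 40%, even though the large community collects twenty times more interactions in absolute terms.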
As can be seen from the table, in Orthodox communities the engagement rate ranges from 4.72% to 89.14% (these values are highlighted in yellow in the table), and the average value is 34.47%. The highest engagement rate belongs to the community "Orthodox family!", which consists of almost 27 thousand people.
In order to assess how high the engagement rate is in Orthodox communities, it is necessary to compare this indicator with that of secular communities.
As can be seen from the table, in secular family communities the engagement rate ranges from 2.01% to 69.29% (these values are highlighted in yellow in the table), and the average value is 12.08%. The maximum value of this criterion (69.29%) belongs to the community "Love. Family. Children", which is small relative to communities with millions of members.
In general, the values obtained are much lower than in similar family-related Orthodox communities on VKontakte. This suggests that the religious Orthodox theme itself involves users in discussions and republications much more than the secular theme does.
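The size of this gap can be stated more precisely from the two average values reported above (34.47% and 12.08%). The following Python lines are a simple arithmetic check; the variable names are introduced here for illustration.

```python
# Average engagement rates reported in the text, in percent.
orthodox_mean_er = 34.47
secular_mean_er = 12.08

gap_points = round(orthodox_mean_er - secular_mean_er, 2)  # difference in percentage points
ratio = round(orthodox_mean_er / secular_mean_er, 2)       # how many times higher

print(f"gap: {gap_points} percentage points, ratio: {ratio}x")
```

The average engagement rate in Orthodox family communities is thus about 22.4 percentage points higher, or roughly 2.85 times the secular average.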
Conclusions. Let's present the main conclusions.
First, digital sociology is a relatively young branch of sociological science, but it is one of the most dynamically developing. Digital sociology is based on the "big data" paradigm, which involves working with huge amounts of unstructured information with its further "structuring" in accordance with the goals and objectives of empirical research.
As for the study of social networks within digital sociology, there are many research directions: social ties between key members of the target audience (identifying "opinion leaders"), automated content analysis of the tone and nature of statements on huge samples of millions and tens of millions of people, and, finally, analysis of various audience characteristics, as described in this article.
At the same time, approaches to data analysis also differ. If traditional sociological studies first put forward certain hypotheses that will have to be confirmed or refuted in the future, then digital studies search for correlations across all data before obtaining the desired information, which allows you to identify a number of significant relationships.
Secondly, for mass empirical research in the framework of the methodology of "digital sociology", it is necessary to use automated means of collecting and analyzing "big data" – parsers. The greatest opportunities are provided by the TargetHunter parser, where you can select and analyze the audience according to 150 different criteria. Until recently, this method was widely used in commercial research of audiences of various brands, and there were not so many scientific studies based on it. This is due to both the cost of using the parser (note that the cost of research based on parsing is minimal compared to full-fledged big data research) and the complexity of developing methodological approaches to solving certain empirical problems.
Thus, the quality of the sample is critically important: in order to get the most relevant audience, it is necessary to use a number of the sampling principles described in the article. It should be noted that this methodology can be used to study any topic in the social network VKontakte. The range of applications of the parsing method in sociological research is extremely wide: it includes predictive research (for example, this automated analysis makes it possible to outline the time frame of influenza epidemics in a particular region of the country), research on the political preferences of the population, attitudes to various socio-political and socio-economic initiatives, and much more.
Combining traditional research methods with digital methods is also promising, as the British "digital sociologist" Rogers wrote (Ruppert, Law, Savage, 2013b). For example, using the parsing method, you can identify opinion leaders within the target audience in order to conduct expert interviews or focus groups with them in the future.
Third, the article analyzed users of Orthodox family communities in VKontakte based on socio-demographic and behavioral criteria. As for the socio-demographic criteria, attention is drawn to the shift in the age of participants towards older age groups, namely 27-29, 30-34, and 35-44. Meanwhile, in the study of Orthodox communities in VKontakte that we conducted in 2016, the largest age groups were those from 22 to 30 years. On the other hand, this shift can be explained by the fact that the greatest interest in family communities in social networks is traditionally shown by middle-aged people, and for the full picture it would be correct to study the entire set of Orthodox communities in VKontakte (more than 10 thousand communities).
The most interesting results were obtained for the main behavioral criterion, the engagement rate. In Orthodox family communities, it is significantly higher than in secular communities on the same subject (in the former, the average engagement rate was almost 35%, while in the latter it was 12.08%). This gap cannot be explained by the size of communities, the communication strategies of different communities, or other similar factors. In our opinion, the main reason for greater engagement in Orthodox family communities is precisely the religious orientation of the content. This helps to maximize the involvement of existing community members in the community's activities (through likes, republications, and comments), as well as to attract new active members.
References
Akhmetov, S. (2018), "Big data: where to start", available at: https: //VK.ru/flood/37763-big-date-with-what-to-start (Accessed 1 January 2020).
Volkov, A. (2016), "How to clear the database of bots?", available at: https://vk.com/evo_marketing?w=page-41179708_51894717/ (Accessed 1 January 2020).
Brand Analytics Blog (2018), "Social networks in Russia: figures and trends, autumn 2018", available at: https://br-analytics.ru/blog/socseti-v-rossii-osen-2018/ (Accessed 2 January 2020).
Castells, M. (2020), "Our life is a hybrid of virtual and physical space", available at: http://ria.ru/interview/20120622/679289114.html (Accessed 02 January 2020).
Castells, M. (2004), The Internet Galaxy: Reflections on Internet, business and society, Translated by Matveev, A., U-Factoriya, Yekaterinburg, Russia.
Demis Group (2020), "Features of the VKontakte audience", available at: https://www.demis.ru/articles/celevaya-auditoria-VKontakte / (Accessed 3 January 2020).
VKontakte (2020), "Official VKontakte social network statistics", available at: https://vk.com/about (Accessed 3 January 2020).
Pisarevskiy, V. (2016), "Orthodox communities in the Internet resource space and their impact on social networks", Ph.D. Thesis, Institute of sociology of Russian academy of sciences, Moscow, Russia.
Russ, K. (2015), "Lent in Russia": statistics. Conversation with analysts of the research service "Wednesday", available at: http://www.pravoslavie.ru/jurnal/78328.htm (Accessed 04 January 2020)
VKontakte (2020), "Search service audience in social networks 'target hunter'", available at: https://vk.targethunter.ru (Accessed 4 January 2020)
Baym, B. K. (2013), "The data not seen: the uses and shortcomings of social media metrics", First Monday [Electronic], 10-7, vol. 18, available at: http://firstmonday.org/ojs/index.php/fm/article/view/4869/3750, (Accessed 5 January 2020)
Boyd, D. and Crawford, K. (2012), "Critical questions for big data", Information, communication and society, 15 (5), 662-679.
Boellstorff, T. (2013), "The creation of big data, in theory", First Monday [Electronic], 10-7, vol. 18, available at: http://firstmonday.org/ojs/index.php/fm/article/view/4869/3750 (Accessed 5 January 2020).
Carrigan, M. (2013), "What is digital sociology?", available at: HTTP://markcarrigannet/2013/01/12/что-это-цифровой-социология/ (Accessed 11 January 2020).
Castells, M. (2009), Communication power, Oxford University Press, Oxford, UK.
Kavanagh, A. (2007), Sociology in the Internet age, Open University Press, Berkshire, UK.
Crawford, K. (2013), "Hidden biases of big data", Harvard business review [Electronic], available at: https://hbr.org/2013/04/the-hidden-biases-in-big-data (Accessed 1 January 2020).
Couldry, N. and Fotopoulou, A. (2014), "Social Analytics: Digital phenomenology in the face of algorithmic power", available at: http://www.emeraldinsight.com/doi/full/10.1108/S1042-319220140000013002 (Accessed 9 January 2020).
Hand, M. (2014), Big data? Qualitative approach to digital research, Emerald Publishing.
Hand, M. (2014), Digitization and memory: a study of practices for adapting to visual and textual data in everyday life, available at: https://www.researchgate.net/publication/289398324_Digitization_and_Memory_Researching_Practices_of_Adaption_to_Visual_and_Textual_Data_in_Everyday_Life, (Accessed 9 January 2020).
Kitchin, R. (2014), The Data Revolution: big data, open data, data infrastructures and their consequences, Sage, London, UK.
Lupton, D. (2012), Digital sociology: An introduction, University Of Sydney, Sydney, Australia.
Lupton, D. (2020), "Toward a Manifesto for public understanding of big data", available at: https://www.researchgate.net/publication/282871735_Toward_a_manifesto_for_the_'public_understanding_of_big_data' (Accessed 13 January 2020).
Lupton, D. (2015), Digital sociology, Routledge, London, UK.
Mayer-Schonberger, V. and Cukier, K. (2013), Big data: a revolution that will change the way we live, work, and think, John Murray, London, UK.
Marres, N. (2012), "Redistribution of methods: on intervention in digital social research, widely conceived", Sociological Review, 60, 139-165.
Orton-Johnson, K. and Prior, N. (2013), Digital sociology – critical perspectives, Palgrave Macmillan, London, UK.
Rogers, R. (2013), Digital methods, MIT Press, Cambridge, MA, USA.
Ruppert, E., Law, J. and Savage, M. (2013), "Reassembling social science methods: the challenge of digital devices", available at: http://in the TCS.sagepub.com / content/start/2013/05/13/0263276413484941.multi-selection (Accessed 15 January 2020).