Keywords
Search engines, search engine optimization (SEO), paid search marketing (PSM), online survey, user studies, searcher attitudes, awareness, external influences
This article is included in the Data: Use and Reuse collection.
As can be seen from our responses to reviewer comments, we have made minor changes to the data note, mainly relating to formatting errors or language inaccuracies. Some of the comments need to be clarified with the reviewers. As soon as this is done, we will happily address the remaining comments in further revisions.
See the authors' detailed response to the review by Melius Weideman
See the authors' detailed response to the review by Lluís Codina
Representative surveys are suitable for gaining a better understanding of how users interact with search engines, how they understand them, and what opinions they have about them. However, such studies are quite rare and usually refer to individual subareas, such as frequency of use (Beisch & Schäfer, 2020) or trust in search engines (Edelman, 2020), while ignoring other areas, such as paid-search marketing (PSM) and search engine optimization (SEO).
SEO “is the practice of optimizing web pages in a way that improves their ranking in the organic search results” (Li et al., 2014). The SEO industry is one of the major stakeholder groups regarding search results of commercial search engines like Google (Röhle, 2010). Although the SEO industry generates billions in revenue (tbrc.info, 2021), little is known about whether search engine users are aware of SEO and what they think about it.
To close this gap, we conducted an online survey in 2020 with a sample representative of the German online population. Questions on SEO are the focus of the survey, as it was conducted as part of the SEO Effect project, funded by the German Research Foundation. The overall goal of the project is to describe and explain the role of SEO from the perspective of the participating stakeholder groups, one of them being the users. A total of 999 people participated in the online survey on a large screen (e.g., desktop PC), and 1,013 on a small screen (smartphone). The online survey included several search engine-related sections (Schultheiß et al., 2022). Some of the questions were self-developed and others were adopted from other studies. This data set contains the full data from the online survey.
We conducted a representative online survey with German internet users. The survey was carried out as part of the SEO Effect project in cooperation with the market research company Fittkau & Maaß Consulting (hereinafter abbreviated as F&M) between March and April 2020. F&M performed the following services, all in consultation with the project team:
• programming of the survey using FileMaker as a database (January 13 – February 27, 2020)
• conducting of the survey (March 2 – April 9, 2020)
• data analysis and reporting (April 2020)
The subjects were recruited through the online panel provider respondi, which cooperates with F&M. An online panel is a sample database with a large number of people (often one million or more). These people have agreed to be available as potential respondents in surveys, as long as they meet the selection criteria for the particular study (Callegaro et al., 2014). In the next section, the sample is discussed in detail.
We used a sample that is representative of the German online population according to the criteria applied by “Arbeitsgemeinschaft Onlineforschung” (working group online research; AGOF). For sampling, the characteristics age, gender, and state were used. The population includes German internet users from the age of 16 to 69 years. Based on two subsamples to be formed (see below), both of which had to meet the same requirements regarding representativeness, we intended a minimum sample size of N = 2,000 subjects (recommended by F&M) and achieved a sample size of N = 2,012 subjects.
From the total sample, two sub-samples of N = 999 subjects (large screen) and N = 1,013 subjects (small screen) were formed, which meet the same requirements regarding representativeness described above. Sample 1 completed the survey on a large screen (e.g., desktop PC, laptop, tablet; group “large screen”), sample 2 on a smartphone (group “small screen”).
To assign the subjects to one of the two groups, the panel provider read the user agent string to determine which device and browser each potential subject was using and assigned the participants accordingly. Both respondi and F&M checked that the assignment was correct: respondi checked the devices used by the subjects before forwarding them to the questionnaire, and F&M verified the devices via the user agent string as part of the plausibility check of the data. The subjects were invited to the survey by e-mail. Each participant received 0.75 euro for complete participation. Since we used a sample that is representative of the German online population, we do not assume biases regarding the composition of the sample. However, the online survey may also have attracted people who participated solely because of the compensation.
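The device-based group assignment can be illustrated with a minimal sketch. The function name `assign_screen_group` and the substring heuristic are assumptions made for illustration; the source does not describe the rules respondi or F&M actually applied to the user agent string.

```python
def assign_screen_group(user_agent: str) -> str:
    """Return "small screen" for smartphones, otherwise "large screen".

    Hypothetical heuristic, not the panel provider's actual logic.
    """
    ua = user_agent.lower()
    # Tablets belong to the "large screen" group in the survey design.
    # Android tablets typically omit the "Mobile" token; iPads announce "iPad".
    is_tablet = "ipad" in ua or ("android" in ua and "mobile" not in ua)
    is_phone = "iphone" in ua or ("android" in ua and "mobile" in ua)
    if is_phone and not is_tablet:
        return "small screen"
    # Desktop PCs, laptops, tablets, and anything unrecognized.
    return "large screen"
```

A production check would normally rely on a maintained user agent parser rather than substring tests, since user agent strings vary widely and can be spoofed.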
First, we developed a catalogue of questions. We derived questions for the survey from the objectives of the “SEO Effect” project, from findings of expert interviews (Schultheiß & Lewandowski, 2021d), and from literature research (in Scopus, we searched for surveys that included “search engine” and “information literacy” or synonyms). After preparing the questions, we sent them to the market research company (F&M). F&M made recommendations regarding the sequence and formulation of the questions as well as suggestions for new questions, which we included.
In several feedback rounds, we jointly created the final version of the questionnaire (see Table 1). In the introduction to the survey, we first welcomed the respondent and thanked him/her for participating. We also pointed out that the questionnaire is used exclusively for research purposes and that by participating, the respondent agrees to the attached privacy policy of F&M.
To give the subjects the opportunity to obtain background information on the survey and to be able to contact the project team, e.g., for feedback purposes, we provided a link to our website at the end of the survey.
The subjects completed 12 sections within the survey as shown in Table 1:
I. Screening
II. Usage behavior
III. Self-assessed search engine literacy
IV. Trust in search engines
V. Query match
VI. Knowledge of search result influences
VII. Knowledge of keyword-related advertisements (i.e., paid search marketing (PSM); Li et al., 2014)
VIII. Knowledge of SEO
IX. Ability to distinguish ads from organic results
X. Assessments and opinions regarding SEO
XI. Personalization
XII. User profile
The authors in collaboration with F&M have taken care to ensure that the questions are formulated in a way that is understandable for all respondents in the sample. Most of the questions are closed questions. They include rating-scale questions, single and multiple response questions, and questions with marking options for search engine results page (SERP) screenshots. In addition, the survey includes open-ended questions, e.g., “What do you think: Where does Google generate most of its revenue from?” Open-ended questions are particularly suitable for knowledge questions, since in contrast to closed questions, it is not possible to answer a question correctly by chance. A disadvantage of open-ended questions is the required subsequent coding of the answers (Krosnick & Presser, 2010).
The survey was conducted in German. The translated questionnaire is shown in Table 1. The names of the corresponding variables within the data set and the original questionnaire in German are included in our research data (Schultheiß et al., 2022).
We created eight SERP screenshots for the marking tasks A-D (each task in variants “large screen” and “small screen”). The screenshots are available as part of the research data (Schultheiß et al., 2021).
SERPs A and B were assigned to block I (simple), SERPs C and D to block II (difficult). Two blocks were created to address a variety of SERP elements and to differentiate between basic and complex SERPs. The structure of the two SERPs per block is identical in terms of the elements on the SERP.
Each participant received two tasks, one from block I and one from block II, as shown in Table 2. The SERP for each task was shown two times. First, all ads were to be marked and second, all organic results.
Block | Task | Query English (German) | Elements on SERP |
---|---|---|---|
block I (simple) | A | tax return help (steuererklärung hilfe) | |
block I (simple) | B | legal advice (rechtsberatung) | |
block II (difficult) | C | apple iphone | |
block II (difficult) | D | samsung galaxy | |
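The task structure (one task from block I, one from block II per participant) can be sketched as follows. Random drawing within each block is an assumption made for illustration; the source does not state how tasks were distributed among participants, and the names `BLOCKS` and `draw_tasks` are hypothetical.

```python
import random

# Tasks per block, as listed in the table above.
BLOCKS = {
    "block I (simple)": ["A", "B"],
    "block II (difficult)": ["C", "D"],
}


def draw_tasks(rng: random.Random) -> tuple:
    """Pick one task from each block for a single participant.

    Assumption: uniform random choice within each block.
    """
    return tuple(rng.choice(tasks) for tasks in BLOCKS.values())


# Each task's SERP is then shown twice: first to mark ads, then organic results.
MARKING_ROUNDS = ("ads", "organic results")
```

Passing a seeded `random.Random` instance keeps the draw reproducible, which is convenient when checking how evenly tasks are spread across a sample.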
The screenshots were created using the desktop version of the Chrome browser:
1. User agent: The browser extension User-Agent Switcher for Chrome version 1.1.0 was used to simulate the smartphone (group “small screen”) within the desktop browser (group “large screen”):
2. Window size and page zoom: To create screenshots with high resolution, we used the following settings:
3. Screenshot: The add-on GoFullPage version 7.1 was used to capture full-page SERP screenshots as PNG files. For each query, the first three SERPs were saved to be able to exchange results during later image processing.
4. Image processing: We used GIMP version 2.10.14 (GIMP development team, 2020) (RRID:SCR_003182) to reduce the SERPs to the elements we wanted to investigate (see Table 2). We also matched the small screen SERPs with the large screen SERPs in terms of results and their positions. Otherwise, different selection behavior in the survey might not have been due to the SERP layout (large vs. small screen), but to partially different results (positions):
a. Large screen:
b. Small screen:
i. The results of the small screen SERPs as well as their positions were aligned with the large screen SERPs. Consequently, the large and small screen SERPs for a query only differed in terms of layout, but not in terms of results and their positions.
ii. Due to the specifications of F&M, the final large screen SERPs were reduced to a width of 360 px.
Before the survey was conducted, pre-tests were carried out in February 2020 by the members and student assistants of the research group (N = 7) and by the panel provider. This enabled us to test whether problems arose, e.g., regarding comprehensibility, and to eliminate them beforehand.
In the pre-test, problems arose regarding the plausibility of the questionnaire, which were fixed before launching the survey. The panel provider checked the survey internally with colleagues to ensure that it was coherent and comprehensible. The duration of the survey was also checked. The maximum duration of 15 minutes as recommended by F&M was met in the pre-tests. Suggestions of the pre-test subjects were also incorporated. These concerned minor aspects, such as the visual highlighting of relevant parts of a question (e.g., “Are there any search results on this page that can be influenced by search engine optimization?”). After the pre-test, the soft launch started, in which the responses of those subjects who completed the survey first were carefully analyzed. Since the soft launch was successful, the survey could start as planned and the data of the soft launch subjects could also be included in the analysis.
Due to the design of the research, we consider the study to be of very low risk for participants. Accordingly, we did not obtain ethical approval. The market research company (F&M), which carried out the survey in cooperation with us, operates according to the principles of the UN Global Compact. This means that F&M operates in a way that fulfils fundamental values regarding human rights, labour, environment, and anti-corruption. Written consent to process their data was obtained from all participants. When registering with online panel provider respondi, participants agreed to the use of their data. For those participants who were minors (16 and 17 years old), parental consent was not required, since “the processing of the personal data of a child shall be lawful where the child is at least 16 years old” (see Article 8 EU GDPR). Data were analysed anonymously. We had no direct contact with the subjects.
Coding and grouping
Table 3 lists the open-ended questions and the coding specifications. The answers to the knowledge questions were only differentiated into “correct”, “partly correct”, and “incorrect”, since no specifications were made regarding the number of elements (e.g., SEO techniques; question no. 7.3) to be mentioned. The coding of the open-ended questions was done by one coder, which we considered adequate because the coding did not leave any significant room for interpretation.
Table 4 shows how the topics from professional activity, training, and studies have been grouped in terms of SEO affinity (low, average, high). To group the topics, we examined module handbooks of the studies for intersections with the SEO topic. In the case of training and professional activity, e.g., pedagogy, we examined corresponding studies, e.g., educational science.
Success rates for marking tasks
Table 5 shows the search results to be marked on the SERPs according to the task, device, and area (SEO or PSM).
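Based on the marked elements, a success rate was calculated for each participant per task (A-D), device (large, small), and area (SEO, PSM). This rate accounts for correctly marked (true positive) and incorrectly marked (false positive) results using the formula $\text{success rate} = \frac{TP - FP}{N}$, where $TP$ is the number of correctly marked results, $FP$ the number of incorrectly marked results, and $N$ the number of results to be marked.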
Two examples follow, the first for achieving a positive success rate for task A, large screen, SEO results. In this case, 10 organic results are to be marked, of which the subject marks 8 results (8 true). In addition, the subject incorrectly marks 2 ads (2 false). This results in a success rate of 0.6. Negative success rates are also possible, if a subject makes more incorrect than correct markings, exemplified by task B, small screen, PSM results. In this case, a total of 4 text ads are to be marked. If a subject identifies all 4 text ads (true), but additionally marks 6 organic results (false), the subject achieves a success rate of -0.5.
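Both examples can be reproduced with a short sketch. It assumes the rate is the number of correct markings minus the number of incorrect markings, divided by the number of results to be marked, which matches both worked examples; the function and parameter names are illustrative, and the authors' own calculation is documented in Appendix 1.

```python
def success_rate(true_marks: int, false_marks: int, to_be_marked: int) -> float:
    """Success rate for one participant, task, device, and area.

    true_marks:   results marked correctly (true positives)
    false_marks:  results marked incorrectly (false positives)
    to_be_marked: number of results that should have been marked
    """
    if to_be_marked <= 0:
        raise ValueError("to_be_marked must be positive")
    return (true_marks - false_marks) / to_be_marked


# Task A, large screen, SEO: 8 of 10 organic results marked, plus 2 ads.
example_seo = success_rate(true_marks=8, false_marks=2, to_be_marked=10)

# Task B, small screen, PSM: all 4 text ads marked, plus 6 organic results.
example_psm = success_rate(true_marks=4, false_marks=6, to_be_marked=4)
```

Under this definition the rate cannot exceed 1, which is reached only when every target result is marked and nothing else, while the lower bound depends on how many non-target results a SERP contains.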
For the calculation of the success rates and the corresponding variables of the data set, see Appendix 1: Calculation of success rates.
OSF: SEO-Effekt/Online survey. https://doi.org/10.17605/OSF.IO/PG82E (Schultheiß et al., 2022)
This project contains the following underlying data:
OSF: SEO-Effekt/Online survey. https://doi.org/10.17605/OSF.IO/PG82E (Schultheiß et al., 2022)
This project contains the following extended data:
- SERPs.zip (screenshots of SERPs for marking tasks)
- variables English (names and descriptions of all variables; English)
- variables German (names and descriptions of all variables; German)
- Working Paper_online survey.pdf (Working paper with information on background, methods, and results of the survey)
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: SEO, information retrieval
Is the rationale for creating the dataset(s) clearly described?
Yes
Are the protocols appropriate and is the work technically sound?
Yes
Are sufficient details of methods and materials provided to allow replication by others?
Yes
Are the datasets clearly presented in a useable and accessible format?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Search Engine Optimization, Digital News Media
Is the rationale for creating the dataset(s) clearly described?
Yes
Are the protocols appropriate and is the work technically sound?
Partly
Are sufficient details of methods and materials provided to allow replication by others?
Yes
Are the datasets clearly presented in a useable and accessible format?
Yes
Version | Invited Reviewer 1 | Invited Reviewer 2 |
---|---|---|
Version 2 (revision), 12 Sep 22 | read | |
Version 1, 31 Mar 22 | read | read |