The Dutch Techcentre for Life Sciences: Enabling data-intensive life science research in the Netherlands [version 1; peer review: 3 approved with reservations]

We describe a new national organisation in scientific research that provides life scientists with technologies and technological expertise in an era in which new projects are often data-intensive, multidisciplinary, and multi-site. The Dutch Techcentre for Life Sciences (DTL, www.dtls.nl) is run as a lean not-for-profit organisation of which research organisations (both academic and industrial) are paying members. The small staff of the organisation undertakes a variety of tasks that are necessary to perform or support modern academic research, but that are not easily undertaken in a purely academic setting. DTL also represents the Netherlands in ELIXIR, the pan-European life science research data infrastructure, and the office supports this task. The organisation is still being fine-tuned and this will probably continue over time, as it is crucial for this kind of organisation to adapt to a constantly changing environment. However, having been under way for several years, DTL has gathered experience that can benefit researchers in other fields or other countries setting up similar initiatives.

This article outlines the background, formation and, to an extent, the operation of the Dutch Techcentre for Life Sciences (DTL), with a focus on the data programme, DTL-Data. Given that this type of organisation is relevant to data researchers in other nations and to researchers in other fields, and set against a background of increasing data complexity and institutional networking in the wider community, this could provide a model for other initiatives.


Introduction
In this introduction we explain the origin of DTL, the change in Dutch funding of life science technology that led to the start of the DTL foundation, and how the efforts of the DTL Data programme fit into the parallel development of professional data stewardship and knowledge structuring initiatives in science overall.

Timeline
During the preparatory phase of ELIXIR (elixir-europe.org), in 2012, several high-profile bioinformatics and systems biology representatives started an initiative called DISC, the Data Integration and Stewardship Centre. They met several times to discuss the implementation of ELIXIR in the Netherlands. In parallel, the initiative to establish the Dutch Techcentre for Life Sciences was launched on 31 October 2012. The DTL organisation was started as a platform of leading universities, research institutes, university medical centres, science funders, government funding sectors ('topsectoren' in the Netherlands) and private companies from the health, nutrition, agrigenomics, industrial microbiology and information engineering sectors. We soon discovered that the goals of the two initiatives overlapped significantly, and it was decided to merge DISC into DTL as its Data programme. Since 1 January 2014, organisations have been signing up for formal membership of DTL.

Why was DTL started?
The initiative for DTL was driven by the growing data challenge as well as the changing funding landscape in the Netherlands. From 2003 to 2013, significant funding in the Netherlands went to institutes developing technical services and techniques for the life sciences. For example, the Netherlands Bioinformatics Centre (NBIC) operated between 2004 and 2014 as a nationwide initiative of bioinformatics experts in academia and industry. These institutes were expected to foster technology research, drive the exchange of methodology among labs and translate these into technical services that other scientists could use. Around 2012 it became clear that the development and use of these technologies would no longer receive similar direct funding in the future, and that research projects applying a technology would need to budget for it. Rather than letting the previous investment in the technology institutes go to waste and letting life scientists each replicate that experience at their own institutes, the technology institutes decided to develop a way of working in which they would continue to exchange expertise across technology disciplines, build up a collective and easily accessible research infrastructure (RI) and deliver the services required. This led to the formation of DTL.
Besides the historical perspective, there are also forward-looking reasons for starting the DTL organisation:
• Technology programmes working together in a single organisation enable what we call integrated life science research, which requires the use of multiple technologies in a single research project and the integration of newly generated and already available data.
• Members of DTL can collectively draw attention to the fact that fundamental developments in the technology fields require more attention from both the collaborating research organisations and the national funding agencies. Together we can look for solutions to tackle these challenges.
• Establishing a collective technology platform of the major research organisations in the Netherlands creates further opportunities for international partnerships, either for individual member organisations or for the collective.
The organisational structure of DTL comprises three main programmes, as described below.

Governance and organisational programmes
DTL is governed by a board that is advised by a scientific advisory committee, and operations are monitored by a board of representatives from the partner institutes. DTL has organised its actions in three areas, Data, Technologies and Learning, which run as individual but cross-connected programmes within the organisation.
DTL Learning, to start with the third area, manages an inventory of all training needs and offerings in life science technologies. It forms the bridge to the national Research School on Bioinformatics and Systems Biology (BioSB, biosb.nl) and other related research schools, and maintains contacts with all academic institutes that offer bioinformatics bachelor and master programs or postdoctoral training. DTL Learning also bundles expertise available in the DTL network, and organises both ad hoc and repeated training and courses on diverse subjects related to developments in the Data and Technologies programmes.
DTL Technologies bundles more than 100 research labs that support life scientists with different technologies (so-called technology hotels). These technology hotels cover a wide variety of experimental technologies (e.g. next-generation sequencing, proteomics, metabolomics, bioimaging) as well as bioinformatics and systems biology expertise. DTL Technologies facilitates contact between the technology hotels and external researchers as potential customers, e.g. through the organisation of funding calls that encourage new collaborative projects. In the DTL Technologies programme we will also work on harmonising and optimising access to the hotels, making it easier for life scientists to use the latest technological opportunities and to access multiple facilities in parallel.
DTL Data brings together experts on every aspect of data stewardship, tools and databases, and e-Infrastructure. DTL Data builds relations between the people involved in the other DTL programmes and partner organisations, and connects to international initiatives such as ELIXIR, the pan-European life science research data infrastructure. The setup of DTL Data has gone hand in hand with more generic developments in data and knowledge handling in the life sciences, which we address first.

Parallel developments: data stewardship and knowledge structuring
The rise and wide application of modern data-intensive technological approaches in the life sciences has put pressure on funders to support keeping acquired data available beyond the lifetime of a project. As a result, initially in the US and later in Europe, funding agencies have started demanding that data stewardship be an integral part of all scientific research projects. This is important because present-day research projects collect large amounts of data that intrinsically have more value than the first project will extract. Acquiring such data a second time is unnecessarily expensive, which makes data stewardship a good investment. Furthermore, good data stewardship is required to make the work reproducible. In addition, properly structuring the knowledge sources that represent the aggregated and possibly curated findings of previous research is equally important to fully enable integrative research.
DTL facilitates data stewardship and knowledge structuring in all associated projects by participating in the development and deployment of the FAIR initiative. The FAIR acronym stands for Findable, Accessible, Interoperable, and Reusable (datafairport.org). Making data and knowledge sources findable and accessible to both humans and computer systems requires a standardised description of metadata and study capture, as well as long-term storage and proper licensing. Interoperability and reusability require data and knowledge to be represented in such a way that they can easily be combined and used for further analytical processing 1.
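To make the link between metadata and the four FAIR aspects more concrete, the following is a minimal sketch, assuming a flat dictionary of hypothetical field names. The FAIR principles do not prescribe these fields, and this is not a DTL tool; real implementations rely on community metadata standards and controlled vocabularies. The function missing_fair_fields simply reports, per aspect, which of the assumed fields are absent or empty in a record.

```python
# Hypothetical sketch: check whether a dataset's metadata record carries
# fields that support each FAIR aspect. The field names below are
# illustrative assumptions, not a schema defined by FAIR or by DTL.

REQUIRED_FIELDS = {
    "Findable": ["identifier", "title", "keywords"],
    "Accessible": ["access_url", "access_protocol"],
    "Interoperable": ["format", "vocabulary"],
    "Reusable": ["license", "provenance"],
}


def missing_fair_fields(record: dict) -> dict:
    """Return, per FAIR aspect, the assumed fields that are absent or empty in record."""
    missing = {}
    for aspect, fields in REQUIRED_FIELDS.items():
        absent = [field for field in fields if not record.get(field)]
        if absent:
            missing[aspect] = absent
    return missing


# Hypothetical example record; every value is a placeholder.
example = {
    "identifier": "hypothetical-dataset-001",
    "title": "Example metabolomics study",
    "keywords": ["metabolomics"],
    "access_url": "https://example.org/data/001",
    "access_protocol": "HTTPS",
    "format": "CSV",
    "vocabulary": "",  # empty: no controlled vocabulary declared
    "license": "CC-BY-4.0",
    # no "provenance" entry at all
}

print(missing_fair_fields(example))
# → {'Interoperable': ['vocabulary'], 'Reusable': ['provenance']}
```

A plain checklist like this only tests whether metadata is present. Whether the metadata is actually machine-actionable, for instance through resolvable identifiers and shared ontologies, is the harder part that the FAIR initiative addresses.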
To support the practical implementation of good data stewardship, DTL and its Data programme aim to bring together all experts who can help life scientists with different aspects of their data management, and to show life scientists that doing everything in-house with local solutions is not efficient.
The remainder of this paper describes the organisational structure and approaches of the DTL Data programme in more detail.

Content of the DTL Data programme
DTL-associated scientists and engineers are responsible for data integration and stewardship in various initiatives across different life science sectors. They bring expertise and reusable tools and databases that have been developed in the Netherlands or elsewhere, and have access to a shared e-infrastructure.

Bioinformatics and medical informatics expertise
DTL brings together experts with very diverse professional expertise in life science data management. This expertise is classified along four independent dimensions:
• The life science sector: current activities are in health, agri/food, nutrition, and industrial biotechnology.
• Location: even though the Netherlands is a relatively small country, a local expert is sometimes preferred for advice or collaboration.
• Phase in the data lifecycle: we distinguish expertise in planning an experiment, collecting data, data processing, data analysis, data and knowledge integration, and modelling. There is also underlying expertise in biostatistics, systems biology, instrumentation, data security, computing infrastructure, and computer science approaches.
All expertise can be classified along these four dimensions. To make all of it available to life scientists everywhere, we are setting up a network of local expert centres at different sites. Such expert centres can function as help desks: places where information can be obtained about the expertise available locally as well as elsewhere. Representatives of the expert centres are in frequent contact with each other to learn about new developments and from each other's experience (both in techniques and in organisation). Over time, DTL will also extend its own help desk, which can guide people to the right expert centres.
A very important mission of DTL Data is to prevent projects from running into problems because of unconscious incompetence; we want to facilitate early interaction between life scientists with a specific plan and experts in all the technical fields that they need to engage, to avoid underestimating technological tasks or risks.

Tools & databases
Many of the experts collaborating in the DTL Data programme have (co-)developed reusable tools and databases, and there is ample experience in implementing these tools in different projects. A new project can often reuse such a tool through an existing shared deployment with dedicated user support. In other cases, a specialised installation of the software can be made, tailored to the project. DTL has a strong preference for reusing existing tools that have proven their value in earlier national or international projects. The advantages of such tools are that they have overcome their teething problems, that their continued development benefits multiple projects, and that reuse increases interoperability with other tools and existing data.

e-Infrastructure
In the past, many life science labs each took care of their own computing needs. More and more, however, the data to be processed is becoming too large for individual labs to handle. Furthermore, server system maintenance is not a core competence of a life scientist, and keeping a local cluster running should not be the task of a PhD candidate. Computing and data storage are becoming an infrastructure: equipment that nobody can do without, and that is inefficient to duplicate for every project. Many groups are therefore no longer willing to maintain the required infrastructure themselves, and instead jointly set up institutional services that employ specialised staff to maintain the computing equipment. Additional benefits of such centralisation are that peak demands are smoothed out and that individual projects can be started at relatively short notice. Centralisation also removes the need to synchronise equipment purchases with the start of new projects; without central facilities, this results in waste for short projects and in outdated computing resources for longer ones. DTL brings together experience from these centralisation efforts and ensures alignment with the national centres for computing. Together, these people work on harmonising the computer centres so that migrating computing work and federating resources become easier. When a new data-intensive life science project starts with new demands for computing or storage, the best location for that computing is determined collaboratively.
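The claim that centralisation smooths out peak demand can be illustrated with a small calculation. The sketch below uses made-up monthly demand figures for three hypothetical projects (in compute cores needed per month); they are not DTL data. If each project runs its own cluster, each cluster must be sized for that project's own peak, so the total capacity is the sum of the individual peaks. A shared facility only needs to cover the peak of the combined load, which can never exceed that sum and is usually much lower when peaks do not coincide.

```python
# Hypothetical illustration of pooled versus separate computing capacity.
# All numbers are invented: compute cores each project needs per month.

demand = {
    "project_a": [100, 20, 10, 10, 80, 10],
    "project_b": [10, 90, 20, 10, 10, 70],
    "project_c": [20, 10, 100, 30, 10, 10],
}


def capacity_separate(profiles: dict) -> int:
    """Capacity if every project runs its own cluster: the sum of individual peaks."""
    return sum(max(profile) for profile in profiles.values())


def capacity_pooled(profiles: dict) -> int:
    """Capacity for one shared facility: the peak of the summed monthly demand."""
    combined = [sum(month) for month in zip(*profiles.values())]
    return max(combined)


print(capacity_separate(demand))  # 290
print(capacity_pooled(demand))    # 130
```

In this invented example, the shared facility needs less than half the capacity of three separate clusters. The saving depends entirely on how far the projects' peaks overlap; if all peaks fell in the same month, the two figures would be equal.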
The e-Infrastructure that can be shared is not limited to the computer racks (Infrastructure as a Service, IaaS). We are also investigating the possibilities for sharing higher-level platforms (Platform as a Service, PaaS), for example the workflow system Galaxy 2, which the Netherlands Bioinformatics Centre supported in the past, and potentially other shared infrastructures for systems biology. We are also working together on a shared data publishing infrastructure based on experience from the Open PHACTS project 3.

Organisation of the DTL Data programme
Organisational structure and facilitation
The DTL Data programme is coordinated by a programme manager from the DTL Office. All projects are executed by DTL partners, outside the office. DTL Data is primarily organised by life science sector (Figure 1). We organise several kinds of meetings for different target groups, each of which we have identified as fulfilling an urgent need: project leader meetings, programmer meetings and so-called focus meetings. We also identify people with similar interests and facilitate interest groups and working groups with their own meetings. Each of these types of events is described in more detail below.

Project leader meetings
Within each of the life science sectors, DTL Data brings together project leaders who each coordinate the progress of a particular project.
For the healthcare sector, this continues a weekly project leader meeting that has been running since 2009, in which 10 project leaders meet for 60 minutes every week. These meetings are conducted as teleconferences in which the participants collaboratively edit the meeting notes. This style of focused reporting on what has been accomplished and what is planned builds trust between the project leaders and leads to many accidental discoveries of potential synergy between their projects. This results in cost savings for the projects without standing in the way of healthy competition. These meetings also provide a direct connection to TraIT, the IT project of the Dutch translational medicine project CTMM.
The other sectors (agrigenomics, nutrigenomics and industrial biotechnology) are now setting up similar meetings. The principal project leaders who will lead these meetings have been identified. These four principal project leaders will meet monthly to discuss progress and identify synergies between the sectors.

Programmer meetings
Many of the programmers involved in the bioinformatics projects in the different sectors of DTL Data are so-called embedded programmers, often the only bioinformatician in a biology or medical setting. Others work together in groups. In DTL Data, we bring programmers from both settings together every two months for lectures and workshops on topics ranging from programming techniques to biological applications. Sometimes we invite external speakers, but most topics are presented by members of the group. This way, they keep each other informed. At these meetings we also encourage interaction between programmers in smaller groups.

Focus meetings
During our work we regularly recognise similar problems or solutions arising in more than one context. For such topics we organise focus meetings. A focus meeting brings together a group of people who preferably have never met in that composition, to discuss a subject that crosses borders between technologies or between sectors. Focus meetings are organised not only by DTL Data, but also by the DTL Technologies and DTL Learning programmes. A focus meeting often consists of a few short lectures followed by a well-prepared discussion that engages the whole audience. After the meeting, the organisers write a white paper, which is published on the DTL website.

Interest and working groups
If a group of people, e.g. after a focus meeting, feels the need to exchange experience more often, they can form a so-called interest group within DTL. DTL supports these interest groups by providing meeting rooms, and tries to find a young researcher to champion the group and keep it going. This is modelled after the "Project and Area Liaisons" (PALs) of earlier EU and UK projects 4. PALs are rewarded for introducing new ways of working: they are given extra support for their work and direct influence on the development of the new working methods.
An interest group that has identified an issue they want to work on together can form a working group. A working group needs to be supported by a part-time project leader to take the practical work out of the hands of the principal investigators. Each working group must deliver a practical result (deliverable) after a limited time. DTL is looking for ways to support the working groups by providing resources for the project leaders.
Both interest groups and working groups can be supported with a good software development environment, mailing lists, a website and a wiki to exchange information.

Relations with other DTL programmes
The data programme interacts with many organisations, both internal to DTL (other programmes and partners) as well as external (for instance IMI projects and RIs under the EC ESFRI scheme).

Help desk, training and education
In the day-to-day operations of the Data programme, we frequently come across training needs: both training for data scientists to broaden their knowledge of newly developed technologies, and training for life scientists to make them aware of, and teach them how to use, the solutions being developed in DTL Data projects. This is expected to become even more important once local data desks have been set up at different institutions. These data desks will bring together experienced data scientists from different institutes, who will find that others have complementary expertise that they sometimes need to acquire themselves. Also, less experienced life scientists will face a low barrier to approaching their local data desk for advice, creating more demand for basic data awareness training. All of these training activities will be developed with the DTL Learning programme, which is very well connected to the people and organisations that can support this effort.

Data-related technology hotels
Many of the people involved in DTL Data offer their services to life scientists as a data hotel in the DTL Technologies programme. DTL Data works with DTL Technologies to define the needs of and requirements for these data-specific hotels. An overview of current DTL hotels is available at www.dtls.nl/expertise-services/hotels.

Relations with external programmes

ELIXIR
In parallel with the development of the DTL organisation, bioinformatics institutes and laboratories all over Europe have set up ELIXIR, the European research infrastructure for life science data and bioinformatics. ELIXIR consists of a hub, hosted at the European Bioinformatics Institute (EBI) in Hinxton, UK, and nodes in each of the member countries. In the Netherlands, DTL hosts the ELIXIR node (ELIXIR-NL). Association with ELIXIR gives us the opportunity to reach out to experts and tools all over Europe.
DTL and ELIXIR have developed the concept of so-called Bring Your Own Data (BYOD) parties as a platform to bring together data owners and data experts. Biological domain experts are also invited where relevant. The main goal of these meetings is to acquaint data owners with the possibilities of connecting and functionally interlinking their data with other datasets and knowledge resources by applying the FAIR principles. Researchers can propose a BYOD party, and DTL will assist with the logistics and invite data experts.

Other ESFRI programmes and national projects
Europe has many other Research Infrastructures in the life sciences, each with its own focus. In the Netherlands, too, several larger project organisations are active in life science research. All of these have their own research data and associated challenges. In the Netherlands we make sure that the people working with these data are co-developing and steering the DTL Data programme. This ensures that the methods and tools they use are compatible with the choices made in ELIXIR, and avoids unnecessary duplication of development efforts.

Conclusion
Life science research is becoming increasingly data-intensive and cross-disciplinary, at unprecedented scales. Individual research groups do not have the resources or the interest to stay in contact with all expert providers and to keep informed of the progress of other related projects at such scales. In the Netherlands we have developed a networked approach to address the challenges posed by modern data-intensive life science research. The establishment of DTL as a collective platform that brings together experts in various technological disciplines across life science domains, facilitated by a small core team, allows projects to run efficiently. Already in the preparatory period and in the first year of operations we have identified synergies between research projects running in parallel and found common interests from surprisingly

Author contributions
All authors have been involved in the setup of DTL Data and have contributed to the text of the manuscript.

Competing interests
No competing interests were disclosed.

Grant information
This work has been funded by the authors' home institutes.

Acknowledgements
Besides the authors, the following people have played instrumental roles in setting up DTL Data, and all share a DTL affiliation:
• Jan-Willem Boiten, CTMM-TraIT, Eindhoven

Open Peer Review
In common with the other reviewers, I think there are some issues with the presentation; addressing them would help readers new to DTL to understand the background, structure and operation of this entity. To encourage similar organisations in other nations, the historical background is relevant, but it could be split into a separate section, such as a text box or diagram, to aid the flow of the description of DTL operations as they are now.
Whilst DTL-Data is discussed in some detail, the other sections are only briefly mentioned, and it is challenging to gain a picture of DTL as a whole and of the competencies necessary to establish a similar organisation in other nations. Again, as noted by the other reviewers, an outlook comparing other national efforts, their relation to DTL, and what may have particularly aided the development of DTL in the Netherlands, from both a structural/funding and a cultural/operational perspective, would be useful. Similarly, examples of FAIR data principles, and information on some of the tools and databases mentioned but not named, could help researchers in other nations contextualise the assertions made. This may be of more value to those less familiar with the operations of DTL than the detailed description of the meetings, which would be better summarised in less detail.

Competing Interests:
No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 29 Dec 2015
Lars Eijssen, Maastricht University, 6229 ER Maastricht, The Netherlands
Thank you for your feedback. Herewith our responses to the points raised.
Response to "In common with...as they are now": Following the suggestion, we have added a separate 'timeline' box to describe the historical context of DTL. Furthermore, we have made it clearer from the opening of the paper that it is mainly about the Data programme, and have described the wider DTL context in relation to this focus.
Response to "Whilst DTL-Data...in other nations": While keeping the text on the other parts of DTL brief, and making clearer that this paper specifically addresses the DTL Data programme, we have added an extra short paragraph and a diagram (Figure 1 and the paragraph directly preceding it) on the organisation of DTL as a whole.
Response to "Again, as noted by...summarised in less detail": We rephrased the 'Why was DTL started' subsection to introduce the benefits of and reasons for setting up DTL more clearly, and extended the description of the projects that were running in the Netherlands before DTL, to better sketch the national operational and funding perspectives. We have substantially reduced the level of detail in our description of the meetings. For (stewardship and other) tools and databases, we had deliberately not included examples so as not to favour some over others, since it is the DTL partners, rather than DTL centrally, that offer and support tools; DTL does not enforce specific ones. We have now added a paragraph to the manuscript (also in response to a similar point made by the first reviewer) clarifying this aspect and describing the relationship between DTL and ELIXIR with respect to tool selection, referring to the latter's tool registry ('Tools & databases' subsection).
This article describes the structure and role of the Dutch Techcentre for Life Sciences (DTL), with a specific emphasis on the DTL Data programme. The DTL is an important initiative in the Netherlands to support effective and efficient data-intensive life science research throughout the Dutch research community by facilitating the connection of researchers with computational tools, expertise, and infrastructure.
As the first reviewer has identified, there are some structural issues in the presentation of this paper that could improve its focus and clarity. There are also some more minor issues, for instance a lack of background on certain referenced programs when first mentioned (ELIXIR is introduced in the second paragraph of the Introduction, but not defined until several paragraphs later; not all readers may be familiar with the program).
Elaborating on the first reviewer's point about e-infrastructures, it would be helpful for the authors to relate the model described in the paper, as developed for the DTL, to other possible models that exist, for instance the National Centers for Biomedical Computing in the US (see e.g., http://jamia.oxfordjournals.org/content/jaminfo/19/2/151) or bioinformatics core facilities in place at a more local level (see e.g., the discussion at http://bioinformatics.oxfordjournals.org/content/27/10/1345). Given the emphasis of the DTL Data programme on data management and computing infrastructure, cloud-based generic eResearch infrastructure supported at a national level (e.g. for social sciences research, or the Australian National eResearch Collaboration Tools and Resources (NECTAR), e.g., the Genomics Virtual Laboratory and other Australian research infrastructure programs) is also relevant. I believe that understanding how the DTL Data model is different or unique in the global context is important to support the authors' goal of providing insight to new efforts. Furthermore, it would be helpful to distinguish more clearly the role of the DTL from that of the national centres for computing with which the DTL is aligned (p. 4).
There are a few minor issues with wording choices that the authors may wish to revisit, e.g. the phrase "the path to professionalisation" (in what sense is the organisation being "professionalised"? This recurs in the phrase "professional data stewardship") and the phrase "unconscious incompetence", which perhaps sounds more severe than the authors intend.
Response to "Elaborating on ... DTL is aligned (p. 4).": An important distinction between DTL and institutes with similar functions in other countries is that DTL was not set up as an institute by a (national) funding organisation (like, e.g., the National Centers for Biomedical Computing in the US and NECTAR in Australia), but as a collaborative institute funded primarily by its partner organisations. Where such bottom-up efforts to set up a supporting organisation are seen, they are often localised to a single research institute and rarely started as a public-private partnership. We have also included this text in the manuscript, including the suggested references, in the 'Why was DTL started?' section.
Related to the last point, DTL acts more as an orchestrator between research projects, generic computing initiatives, and national computing centres. As suggested by the first reviewer, we added some international initiatives to this sentence, widening the scope. Furthermore, we changed the following sentence ("Together these people work…") to "DTL links to and between the people that work…" to make DTL's role as a connector clearer.
Response to "There are a few minor...the authors intend.": We removed both occurrences of professionalisation/professionalised, as they were superfluous. The phrase 'unconscious incompetence' was a deliberate choice, as it is an established term for not being aware that one is not doing something according to standards. We added a reference to a paper by Kruger et al. on this topic.
such an infrastructure, there are some bigger and smaller points that would need to be addressed to give this paper an "approved". The focus is slightly unclear: the abstract and introduction describe the DTL in its entirety, including its history, while more than three quarters of the text focuses on the services related to the last area of activity listed, "DTL Data". The manuscript would benefit hugely from a reorganisation that makes the main focus clear right at the beginning. It might also be worth considering putting the description of the background and origin (timeline, why was DTL started) at the end of the manuscript. Alternatively, the authors might want to consider leaving the history out of the manuscript completely: at present, the first point mentioned seems to be that funding became scarcer, rather than the benefits.
Change to: "More and more, however, the data to be processed becomes too large to handle."
○ Conclusion: "...and found common interests across researchers with a focus in surprisingly different disciplines. ... makes sure NECESSARY data-related expertise.."
○ ELIXIR should be listed by its name (without "ESFRI"), with the full name at first mention

Competing Interests:
No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.