Neurosharing: large-scale data sets (spike, LFP) recorded from the hippocampal-entorhinal system in behaving rats

Using silicon-based recording electrodes, we recorded neuronal activity of the dorsal hippocampus and dorsomedial entorhinal cortex from behaving rats. The entorhinal neurons were classified as principal neurons and interneurons based on monosynaptic interactions and wave-shapes. The hippocampal neurons were classified as principal neurons and interneurons based on monosynaptic interactions, wave-shapes and burstiness. The data set contains recordings from 7,736 neurons (6,100 classified as principal neurons, 1,132 as interneurons, and 504 cells that did not clearly fit into either category) obtained during 442 recording sessions from 11 rats (a total of 204.5 hours) while they were engaged in one of eight different behaviours/tasks. Both original and processed data (time stamp of spikes, spike waveforms, result of spike sorting and local field potential) are included, along with metadata of behavioural markers. Community-driven data sharing may offer cross-validation of findings, refinement of interpretations and facilitate discoveries.

We recorded activity of neurons in these brain regions while animals performed various tasks, such as linear track, open maze, T-maze with wheel running delay, plus maze and zigzag maze, as well as recordings during sleep in the home cage. Extensive technical descriptions of the data sets described in this document are available in several published papers 6,21-27 .
Several questions related to memory, navigation, spike time patterns, population coding, neuronal interactions, neuronal classification, replay, sleep homeostasis and oscillations have been studied based on this dataset 6,21-41 . However, this dataset may provide valuable information if subjected to yet further analyses. Improved spike sorting, neuron classification and more sophisticated analyses may extend and refine the initial conclusions and offer insights that were previously missed. For these reasons we provide both unprocessed (wide band) and processed versions of our data. In our experience, all methods have limitations and must undergo continuous revision. We believe that community-driven data sharing, cross-validation of data, unified data formats and large collaborative efforts will facilitate discovery and benefit future progress in neuroscience.

Material and methods
Animal surgery
All protocols were approved by the Institutional Animal Care and Use Committee of Rutgers University (protocol No. 90-042), and all experiments were performed at Rutgers University. Before surgery, one to four rats were housed in a single plastic home cage (L = 45 cm, W = 23.5 cm, H = 20 cm). Wood shavings were used as bedding and dry pellets were provided as food. The animals were housed in a temperature-controlled (68°F), but not specific-pathogen-free, environment under a 12:12-hour light:dark cycle (lights on from 7 AM to 7 PM). After surgery, the rats were housed individually, highly absorbent paper (Techboard, Shepherd Speciality Papers) was used as bedding, and the animals' health was assessed daily by the experimenters.
In two rats (f01_m and g01_m), two silicon probes were implanted (one in each hemisphere) and targeted the CA1 region. In three rats (gor01, pin01 and vvp01), two probes (32- and/or 64-site silicon probes) were implanted in the left dorsal hippocampus, targeted to CA1 and CA3 separately, and advanced over sessions and days through overlying neocortical and hippocampal tissue. The probe positions were as follows. Rat pin01: CA3: at a 35 degree angle to coronal plane, centered on 2.8 mm posterior and 2.6 mm lateral to bregma; CA1: 26.5 degree angle to vertical, at a 35 degree angle to coronal, centered on 4.6 mm posterior and 2.4 mm lateral to bregma. Rat vvp01: CA3: at a 26.5 degree angle to coronal plane, centered on 2.8 mm posterior and 2.6 mm lateral to bregma; CA1: 26.5 degree angle to vertical, parallel to sagittal plane, centered on 4.4 mm posterior and 2.3 mm lateral to bregma. Rat gor01: CA3: at a 26.5 degree angle to coronal plane, centered on 3.1 mm posterior and 3.0 mm lateral to bregma; CA1: 26.5 degree angle to vertical, at a 45 degree angle to coronal plane, centered on 4.9 mm posterior and 1.5 mm lateral to bregma. In four rats (ec013, ec014, ec016 and i01_m), 32- or 64-site silicon probe(s) were implanted in the right dorsal hippocampus and recorded from CA1, CA3 or dentate gyrus, and another 4-shank silicon probe was implanted in the right dorsocaudal medial entorhinal cortex. In one rat (ec012), one 4-shank silicon probe was implanted in the right dorsocaudal medial entorhinal cortex. In rats ec012, ec013, ec014 and ec016, the probe targeting the entorhinal cortex was positioned such that the different shanks recorded from different layers 21 (4.5 mm lateral from the midline; 0.1 mm anterior to the edge of the transverse sinus at a 20-25 degree angle in the sagittal plane with the tip pointing toward the anterior direction). In rat i01_m, the EC probe had 4 shanks and was positioned such that all shanks recorded from the same layer.
For the hippocampus probe in rats ec013, ec014 and ec016, the shanks were aligned parallel to the septo-temporal axis of the hippocampus (45 degrees parasagittal), positioned centrally at 3.5 mm posterior from bregma and 2.5 mm lateral from the midline.
For all silicon probes used, each shank had eight recording sites (160 µm² each site, 1-3 MΩ impedance), and intershank distance was 200 µm. Recording sites were staggered to provide a two-dimensional arrangement (20 µm vertical separation) 44,45 . The individual silicon probes were attached to respective microdrives and moved independently and slowly to the target. Two stainless steel screws inserted above the cerebellum were used as indifferent (reference) and ground electrodes during recordings. At the end of the physiological recordings during the behavioural tasks, a small anodal DC current (2-5 µA, 10 s) was applied to recording sites 1 or 2 days before rats were deeply anesthetized and euthanized by perfusion with 10% formalin solution. The positions of the electrodes were confirmed histologically and reported previously in detail 21,24 .
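As a rough illustration of the probe geometry just described, the following sketch enumerates nominal site coordinates from the stated figures (eight sites per shank, 20 µm vertical separation, 200 µm between shanks). The function name, the coordinate origin and the ordering of sites are assumptions made for illustration only. The lateral stagger of sites within a shank is not quantified in the text and is therefore ignored here.

```python
# Nominal site layout built only from the figures quoted in the text.
SITES_PER_SHANK = 8       # eight recording sites per shank
VERTICAL_PITCH_UM = 20    # vertical separation between neighbouring sites
INTERSHANK_UM = 200       # distance between adjacent shanks


def site_positions(n_shanks):
    """Return (shank, site, x_um, z_um) tuples for every site on a probe.

    Assumptions (not from the source): x is measured from shank 0, z from
    site 0 of each shank, and the within-shank lateral stagger is omitted.
    """
    positions = []
    for shank in range(n_shanks):
        for site in range(SITES_PER_SHANK):
            x_um = shank * INTERSHANK_UM
            z_um = site * VERTICAL_PITCH_UM
            positions.append((shank, site, x_um, z_um))
    return positions


layout = site_positions(4)   # a 4-shank probe
print(len(layout))           # 32
print(layout[-1])            # (3, 7, 600, 140)
```

Under these assumptions a 4-shank probe yields 32 sites and an 8-shank probe 64 sites, which is consistent with the 32- and 64-site probes mentioned above.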

Behavioural testing
After the animals recovered from surgery (1 to 2 weeks), physiological signals were recorded during eight different types of behaviours mostly during light cycles (see Table 1).
(1) On an elevated linear track (250 cm × 7 cm), the animals were required to run back and forth to obtain water reward at both ends 21 . In three animals (gor01, pin01, and vvp01), a similar elevated track was used (170 cm × 6.2 cm, with 22 cm × 22 cm end platforms) that was shortened to 79 or 125 cm in some trials 23,24 .
(2) In the open field task, the rats chased randomly dispersed drops of water or pieces of Froot Loops (25 mg; Kellogg's) on an elevated open platform 21 (180 cm × 180 cm, 120 cm × 120 cm or 100 cm × 200 cm).
(3) In the rewarded wheel-running task, a wheel (diameter = 29 cm) was attached to a rectangular-shape box (39 cm × 39 cm × 39 cm). The rat was required to run in the wheel continuously for 10 seconds, after which time a piece of Froot Loop was dropped in the box as reinforcement 21 .
(4) In the alternation task in the T-maze (100 cm × 120 cm) with wheel running delay, the animal was required to run on a wheel attached to the waiting area for 10 sec, after which time the animal had access to the central arm of the T-maze, at the end of which the animal chose to turn right or left. The animal was rewarded with water if the choice was opposite to the previous one 6 .
(5) In the elevated plus maze (100 cm × 100 cm), the rats were motivated to run to the ends of four corridors, where water was given every 30 s.
(6) In the zigzag maze (100 cm × 200 cm) with 11 corridors, the animals learned to run back and forth between two water wells; 100 µl of water was delivered at each well 21,22,25,46 .
(7) In the wheel-running in home cage, a wheel (diameter = 29 cm) was attached to a rectangular-shape box (39 cm × 39 cm × 39 cm) which was used as a home cage during the experiment. Rats had free access to the wheel, and ran on the wheel with no reinforcement.
(8) In the sleeping session, the rat slept in the home cage.
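The reward rule of the alternation task (4) can be written down in a few lines. The sketch below is only an illustration of that rule as stated, with a hypothetical function name and a hypothetical "L"/"R" encoding of turns. How the first trial of a session was treated is not specified here, so the sketch simply leaves it unscored.

```python
def score_alternation(choices):
    """Return one bool per trial after the first.

    A trial counts as rewarded when the turn differs from the turn made on
    the immediately preceding trial. `choices` is a sequence of "L" / "R"
    labels (a hypothetical encoding chosen for this example).
    """
    return [current != previous
            for previous, current in zip(choices, choices[1:])]


turns = ["L", "R", "R", "L", "R"]   # made-up sequence for illustration
print(score_alternation(turns))     # [True, False, True, True]
```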
For recording of behaviour (1) to (6), animals were water-scheduled for 23 hours prior to the experiment. Otherwise, both dry food and water were provided ad libitum. For tracking the position of the animals, two small light-emitting diodes, mounted above the headstage, were recorded by a digital video camera (SONY) at 30 Hz resolution.
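One common way to use two tracked diodes is to take their midpoint as the head position and the line joining them as an orientation estimate. The sketch below shows that idea only; it is not taken from the data set's own scripts. The function names, the coordinate units, the choice of which diode is treated as the reference, and the assumption that frame 0 coincides with time zero are all hypothetical.

```python
import math

VIDEO_RATE_HZ = 30.0  # camera frame rate stated in the text


def head_pose(led_a, led_b):
    """Return ((x, y) midpoint, angle in degrees) for two LED coordinates.

    The angle is that of the vector from led_a to led_b, measured
    counter-clockwise from the positive x axis. Which diode sits at the
    front of the headstage is an assumption left to the user.
    """
    (xa, ya), (xb, yb) = led_a, led_b
    centre = ((xa + xb) / 2.0, (ya + yb) / 2.0)
    angle_deg = math.degrees(math.atan2(yb - ya, xb - xa))
    return centre, angle_deg


def frame_time(frame_index):
    """Time in seconds of a video frame, assuming frame 0 is at t = 0."""
    return frame_index / VIDEO_RATE_HZ


print(head_pose((10.0, 20.0), (14.0, 20.0)))  # ((12.0, 20.0), 0.0)
print(frame_time(90))                          # 3.0
```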

Data collection and cell-type classification
Detailed information about the recording system and spike sorting has been previously described 21,24,42 . Briefly, signals were amplified (1,000×), bandpass-filtered (1 Hz-5 kHz) and acquired continuously at 20 kHz (DataMax system; RC Electronics) or 32,552 Hz (NeuraLynx, MT) at 16-bit resolution. After recording, the signals were down-sampled to 1,250 Hz (DataMax system) or 1,252 Hz (NeuraLynx system) for the local field potential (LFP) analysis. In electrophysiological recordings, positive polarity is from zero toward positive values. To maximize the detection of very slowly discharging ('silent') neurons 47 , clustering was performed on concatenated files of several behavioural and sleep sessions recorded at the same electrode position on the same recording day 22,25-27 . We made extensive use of publicly available analytical and display programs, which were developed in our laboratory (KlustaKwik 48 available at http://sourceforge.net/projects/klustakwik/, Neuroscope 49  Table 2- Table 4).
The tip of the probe either moved spontaneously relative to the brain or was moved by the experimenter between recording days to record from potentially different sets of neurons. However, we cannot exclude the possibility that some neurons recorded on different days were identical, because spikes recorded on each day were clustered separately, though in some instances neurons were recorded over multiple days. When we moved the electrodes, we waited for at least an hour before recording in order to stabilize the position of electrodes.
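The wide-band and LFP sampling rates quoted above are related by integer factors, which can be checked directly. The sketch below computes those factors and shows a deliberately naive block-average downsampler. The block average is a stand-in chosen for brevity and should not be read as the filter used to produce the LFP files; all function names are hypothetical.

```python
from fractions import Fraction

# (wide-band rate, LFP rate) in Hz for the two acquisition systems in the text.
SYSTEMS = {
    "DataMax": (20_000, 1_250),
    "NeuraLynx": (32_552, 1_252),
}


def decimation_factor(wideband_hz, lfp_hz):
    """Return the integer ratio of the two rates, or raise if not integer."""
    ratio = Fraction(wideband_hz, lfp_hz)
    if ratio.denominator != 1:
        raise ValueError("rates are not related by an integer factor")
    return ratio.numerator


def block_average(samples, factor):
    """Average non-overlapping blocks of `factor` samples.

    Illustrative only: a trailing partial block is dropped, and no proper
    anti-aliasing filter is applied.
    """
    n_blocks = len(samples) // factor
    return [sum(samples[i * factor:(i + 1) * factor]) / factor
            for i in range(n_blocks)]


for name, (wideband_hz, lfp_hz) in SYSTEMS.items():
    print(name, decimation_factor(wideband_hz, lfp_hz))
# DataMax 16
# NeuraLynx 26
```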

Data description
The data are available 50 at CRCNS.org (http://dx.doi.org/10.6080/K09G5JRZ). Details of the data collection, processing and storage of data into files are included with the data set, including scripts useful for processing the data 50 . Here, we briefly summarize the data description.
The number of cells recorded from each animal and brain region is shown in Table 2.
Most of the recorded cells were classified as principal neurons or interneurons. The numbers of cells classified as principal neurons and as interneurons are shown in Table 3 and Table 4.
The 8 types of behaviours (see Behavioural Testing section) were further subdivided into 14 behaviour subclasses based on minor differences (e.g. size of maze) and used as behaviour identifiers in the dataset ( Table 1).
The data were obtained during 442 recording sessions. During each session the animal performed one of the 14 behaviour subclasses. The number of recording sessions and behaviour subclasses used with each animal is shown in Table 5. The description of each behaviour subclass is given in Table 1.

Data file organization
The data files for each recording session are stored in separate compressed tar archive files (i.e. with extension "tar.gz"). These files are organized into top-level directories, each of which contains data for sessions recorded on the same day using the same animal and electrode placement combination. Data from all sessions recorded from the same animal on the same day were merged for spike sorting. All merged sessions are stored in the same top-level directory in the data set at CRCNS.org. Therefore, the cell identification numbers assigned by the spike sorting are common to all sessions within a top-level directory, and are not specific to individual sessions. Details of the file organization are provided in the document "CRCNS.org hc3 data description" which is included with the data set.
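Because each session is distributed as a compressed tar archive, its contents can be inspected before unpacking. The following is a minimal sketch using Python's standard `tarfile` module. The function name is hypothetical, and the names of the files inside a real archive are documented in "CRCNS.org hc3 data description" rather than assumed here.

```python
import tarfile


def list_session_files(archive_path):
    """Return the sorted names of regular files inside one tar.gz archive.

    `archive_path` is the path to a downloaded session archive; nothing is
    extracted to disk.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(member.name
                      for member in archive.getmembers()
                      if member.isfile())
```

Calling `list_session_files` on a downloaded archive returns the member names, which can then be matched against the file types described in the accompanying documentation.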

Metadata organization
The metadata describing the data are stored in four tables that are included with the data set. These tables are provided in CSV (comma-separated values) format, Excel format, and as tables in an SQLite database. SQLite (http://www.sqlite.org/) is a free, open-source SQL database engine available for all common operating systems. The tables are related to each other through a field named "topdir", which holds the name of the top-level directory described above and is common to all four tables. The fields in each of these tables are listed in Listing 1. As described in the file "CRCNS.org hc3 data description", the SQLite command interface can be used with these tables to generate summary statistics from the metadata and to locate data files that satisfy particular search criteria (for example, to find data for cells of a specific type from a particular brain region and experimental behaviour).

Author contributions
... rats gor01, pin01 and vvp01. EP collected data from rats f01_m, g01_m, i01_m and j01_m. KM carried out all spike sorting and classification of cell types in this dataset. JT prepared documentation for public data release (data sets hc-2 and hc-3) at CRCNS.org. AS prepared an earlier version of the documentation for data set hc-2 at CRCNS.org. KM, JT and GB wrote the paper. All authors were involved in the revision of the draft manuscript and have agreed to the final content.
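Returning to the metadata tables described under Metadata organization: the shared "topdir" field is what allows cells and sessions to be combined in a single query. The sketch below illustrates this with a small in-memory SQLite database. Only the "topdir" field name comes from the text. The table names, the other column names and every row are made-up placeholders, not entries from the data set, whose real schema is given in Listing 1.

```python
import sqlite3

# Toy database: hypothetical tables and placeholder rows, keyed by "topdir".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cell (topdir TEXT, cell_id INTEGER,
                       region TEXT, cell_type TEXT);
    CREATE TABLE session (topdir TEXT, session_name TEXT, behaviour TEXT);
    INSERT INTO cell VALUES
        ('dirA', 1, 'CA1', 'principal'),
        ('dirA', 2, 'CA1', 'interneuron'),
        ('dirB', 1, 'EC',  'principal');
    INSERT INTO session VALUES
        ('dirA', 'sessA1', 'linear'),
        ('dirA', 'sessA2', 'sleep'),
        ('dirB', 'sessB1', 'linear');
""")


def sessions_with_cells(connection, region, cell_type, behaviour):
    """Count matching cells per session by joining the tables on topdir."""
    query = """
        SELECT s.session_name, COUNT(c.cell_id)
        FROM session AS s
        JOIN cell AS c ON c.topdir = s.topdir
        WHERE c.region = ? AND c.cell_type = ? AND s.behaviour = ?
        GROUP BY s.session_name
        ORDER BY s.session_name
    """
    return connection.execute(query, (region, cell_type, behaviour)).fetchall()


print(sessions_with_cells(conn, "CA1", "principal", "linear"))
# [('sessA1', 1)]
```

Joining on the directory rather than on the session reflects the point made under Data file organization: cell identification numbers are shared by all sessions within a top-level directory.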

Competing interests
No competing interests were disclosed. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data availability
CRCNS: Multiple single unit recordings from different rat hippocampal and entorhinal regions while the animals were performing multiple behavioral tasks, http://dx.doi.org/10.6080/K09G5JRZ
Terms of data usage: Data on this site is made available only for scientific purposes. Redistribution of the data is not permitted. Any publications derived from the data must cite the data contributors and CRCNS.org as being the source of the data and the original paper(s) that generated the data. Unnecessary downloading of large data files is not permitted. (To minimize demands on the server, only data expected to be useful for your scientific purposes should be downloaded).
Privacy notice: Occasionally the researchers who contribute data wish to know who has downloaded their data. Upon request we will provide this information to the data contributors. So, if you download data, there is a possibility that your name and email address will be provided to the data contributor. We request that the data contributors only use the information for legitimate scientific purposes (such as determining the frequency of downloads, or contacting users to provide updated information about the data or to explore possible collaborations).

Current Referee Status:
This data set will be a valuable resource for investigators who wish to test hypotheses about hippocampal function and interaction with entorhinal cortex at the level of single unit and LFP physiology. I have not investigated the data base carefully to ensure its utility, as I assume the authors have done so. My comments are limited to a few questions about their article describing the data base.
'Animal surgery' - second paragraph: Do the authors mean "in the coronal plane?" It is not clear whether the tetrodes were angled medially or laterally in that plane, or whether the authors mean that the tetrodes were angled anteriorly or posteriorly to the coronal plane. Please clarify here and in other locations in the text how the tetrodes were angled.
'Animal surgery' - final paragraph: Does the data base contain histological figures? If not, are they easily identifiable and accessible from published reports? It would be very useful to ensure that the precise location of tetrodes could be made available to investigators.
Mizuseki and colleagues provide a description of 442 datasets (more than 200 hours) of hippocampal in vivo recordings. These datasets provide cell classification as well as the raw data, in case users wish to return to the high-sample traces and re-cluster the data on their own. The results of different analyses from the database have been published, although all analysis possibilities have not been exhausted. While this is an atypical review to write (as any suggestions on improvement to the database seem to be akin to looking a gift horse in the mouth), I am hoping the authors can discuss the ramifications of providing such an extensive database. While any database has limitations (e.g. sometimes determining the EEG units is a bit of an exercise with these data in the present format), it is perhaps more advantageous and constructive to discuss what the community can do to make the most use of it. As the authors state "Community-driven data sharing may offer cross-validation of findings, refinement of interpretations and facilitate discoveries" and the challenges are now directed to those who are interested in participating in future analyses. In hopes that this will set forth a new era of data-sharing, I hope that the authors can discuss their open database in a manner that parallels Giorgio Ascoli's discussion on sharing neural reconstruction files (Ascoli, 2006). It should be noted that I do not expect the authors to have comprehensive answers to each of my comments below, but it might be beneficial if they would provide some initial thoughts to seed further discussion.
Some points that the authors may wish to discuss include: One barrier to sharing data is "the fear of being scooped" (Ascoli, 2006). For example, scientific progress will be dramatically increased through parallel (and hopefully, collaborative) data analysis. Are the authors concerned about being "scooped"? What about a group of researchers unknowingly using the database to conduct an analysis that is also a student's PhD project? This hypothetical student may still be acquiring the expertise to keep pace with more seasoned researchers.
"An often unspoken resistance to the sharing… data is born out of concern for criticisms and mistakes" (Ascoli, 2006). This is an unprecedented event that Mizuseki and colleagues have set forth, by providing a comprehensive catalog of data in an unabashed manner. The first point I want to touch upon is the level of raw exposure in releasing data. I have never met a neuroscientist who believes that anyone else could've conducted their analyses better than themselves (save a modest few), but to find those who believe that the authors "didn't look for the right thing" could make up a small battalion. Mizuseki and colleagues invite critics to their doorstep. Perhaps this is more similar to posing nude as a model for an artist. By placing data online, it comes with judgment and the potential to be proved wrong. What do the authors believe the convention is if others follow in their footsteps? For example, if group A shares their database after publishing their results, and group B downloads and analyzes their data in a new light and finds the opposite results, what is the appropriate manner of handling the situation? The self-correction aspect of science is also accelerated when data is openly shared. It remains to be seen how situations like this should/will be handled.
There is also a chance for the data to be used in order to stifle or impede publication when the result is dubious. That is, should it be considered "fair-play" for a reviewer to use the same database and conduct a similar analysis with results that contradict a submitted manuscript's results? A tactic such as this only seems appropriate in "open-review formats".

"A final barrier to sharing digital reconstructions relates to the reluctance to lose or give a competitive edge" (Ascoli, 2006). The release of this immense database will surely be the stronghold of many new assistant professors who are still in the initial stages of setting up a physiology laboratory. Moreover, I can see these data being used in laboratories that are heavily analysis driven and limited in their own capacity for high-density in vivo recordings. This increase in the number of people analyzing data invites competitors for the authors as well as their neutral peers. Sheer logic dictates that the authors are not afraid of competition (otherwise, why share the data? Please note that I am a cynic and have been taught by many reviewers that "scientific altruism" is as abundant as snarks and unicorns). Why should researchers not be afraid of others using this database to compete? Is it believed that these data will be used for collaboration? What can be done to emphasize collaboration across laboratories when using the database? How is authorship handled when multiple groups use these data? If I spend my time analyzing these data, only to have it published under a "group project name" (similar to Sir William Timothy Gowers' Polymath project), do I put it on my CV? The paper explicitly states that large downloads are prohibited. Does this mean that I should not download all the data? Is it OK to use these data for a class project? If so, is it more appropriate for the professor to disseminate it to the students or should the students make their own CRCNS account? Finally, more of an afterthought but along similar lines: should there be a collaborative processing code library that should be developed and maintained in parallel with the use of these data (similar to GitHub or SourceForge)?
I absolutely do not expect the authors to have complete answers to these questions nor should they carry the sole responsibility of determining the general conventions of what constitutes use versus misuse, but I do think that it is worth hearing their general thoughts. As this is the largest and most comprehensive database of in vivo hippocampal and entorhinal physiology to become available to the general scientific community, there will be an immense ramification. Scientific replication/external validation is an immediate and positive application of this database. As I have cited Timothy Gowers above, I think it would be best to leave off with his opinion of open data-sharing and collaboration: "It feels as though this process is to normal research as driving is to pushing a car." (http://gowers.wordpress.com/2009/02/01/questions-of-procedure/). As a field we have the opportunity to compete or collaborate. I hope that these data facilitate cross-laboratory collaboration where two groups are reticent to share their own data. For those that are interested in embracing the collaborative spirit, the CRCNS website also has a "marketplace" section where ideas and potential collaborations can be discussed. Finally, I applaud the authors for this unprecedented act of scientific altruism. I hope this will be a platform that accelerates our understanding of the entorhinal-hippocampal circuitry through collaboration.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Competing Interests: No competing interests were disclosed.

Shuzo Sakata
Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK