Research Note

A structured process to create datasets with nutritional information

[version 1; peer review: 2 not approved]
PUBLISHED 25 Jan 2018

Abstract

There is a lack of datasets in Colombia that characterize the nutritional components of food items and related information. This study describes a structured process for developing datasets that capture the food preferences and purchases of a selected group of people and that classify products according to their sodium and sugar content. The outcome of this process comprises three datasets, each with a different focus: the first contains data on food preferences, the second contains the purchase history according to the invoices obtained, and the third contains characteristics of the food items, such as their brand, category, and sodium and sugar content levels, among others.

Keywords

dataset, sodium, sugar, purchase invoices, survey

Introduction

A large amount of data is now generated and shared over the internet. Researchers can utilize thousands of photos, hours of video footage, and consumer data to create datasets1. Some datasets are used in research with a specific goal in mind, whereas others are built to store information for future investigations. Some datasets are freely published, while others are for restricted use.

Several studies use data to analyse preferences in areas such as online shopping2, music3,4, movies5,6 or social relations7. However, a study about people’s preferences for food items in supermarkets in Colombia faces challenges due to the lack of freely available datasets on this topic. Additionally, various products that are present in some public datasets8 are not available in Colombia.

To address these gaps, this study describes the process of creating a dataset that contains information on the food preferences and purchases of a group of people living in Colombia. An important aspect of the dataset is that it records the sodium9 and sugar10 content of each food product and organizes the nutritional information available in the Colombian market.

Methods

According to the STROBE guidelines, we have taken the following into consideration.

The purpose of the study is to capture the preferences of users in self-service stores. The study was carried out in the cities of Popayán and San Juan de Pasto, Colombia, over two months, part of which was used for participant recruitment.

A group of students, professionals and independent workers, all ≥ 18 years of age, voluntarily accepted the invitation to participate in this research. Each provided a signed agreement to share their information on the condition that their identity would be protected and remain anonymous.

All data were analyzed and stored in text files available in the "Variables and Data sources" section. In that section, the structure and components are explained in more detail.

The study is exploratory and the aim is to obtain a dataset for future work. The general structure followed the principles outlined by Robert K. Yin11. Table 1 presents a summary of these elements.

Table 1. Summary of elements based on the Case Study Research by Robert K. Yin.

Phase | Summary
Plan | We want to answer the question: what are the items that people prefer when making purchases in self-service stores?
Design | The preferences in consumption of items in a supermarket are selected as the unit of analysis. Type: simple case study. Nature: exploratory.
Set Up | The structure proposed in Figure 2
Data Collection | Period of data collection: July-August, 2017
Analysis | The data are available according to the structure proposed in Table 3, Table 4, Table 5 and Figure 1, Figure 2
Release | Placing the data in the public domain by means of this article

Data collection

Figure 1 illustrates the process of data acquisition, carried out using two methods. The first method involved collecting preferences using a survey, and the second method involved the acquisition of purchase records with invoices. All purchases were made in self-service stores, focused particularly on food self-service stores. Data collection was implemented over a one-month period when participants were actively involved in the data collection process.

Figure 1. Schematic of data collection.

User preferences

User preferences were collected through a survey in which people chose the products they preferred. The survey was built with the Google Forms web tool, and its questions were organized into twelve sections. Participants were informed of the academic purpose of the survey, and the basic demographic data of each participant were registered. They identified their preferences out of the 708 food items presented in the survey. All items were classified into ten categories, created from observation of local self-service stores.

Table 2 shows the 10 categories the items were classified under. Classification of the items aimed to have participants interact in a more comfortable and conscious way with the questions, attempting to keep the process from becoming tedious.

Table 2. Names of categories used to classify products when listing user preference data.

Section | Name of the section
1 | Groceries
2 | Dairy products, sausages and chilled
3 | Meat
4 | Fruits and Vegetables
5 | Fish and shellfish
6 | Drinks
7 | Liquors
8 | Candies and snacks
9 | Bakery
10 | Frozen products

The survey was available online for one month. During the collection process, 215 people participated and shared their preferences and other demographic data.

User purchasing history

The purchase history refers to a list of products purchased by a person within a period of time in a self-service store. Sixty-five participants provided all of their purchase receipts, in particular those for food products, for four weeks. At the end of this period, the invoices of all 65 participants were collected. R-Studio (v1.0.143)12 was then used to transcribe the products of interest, taking into account the number of submitted receipts, non-food products, and the number of times each user purchased each item.

Data treatment

The second part of Figure 1 illustrates how the information collected from the surveys was processed to construct the datasets. The process involved manually removing irrelevant information such as repeated surveys, inconsistent data, and non-focused responses in the user preferences section. For the participants’ purchase receipts, some information was also manually removed, since some receipts contained purchases other than food products. The previously filtered information in both datasets was anonymized by assigning numerical codes to the users and the products to protect users’ identities and classify all the products. All food items were classified based on their sodium and sugar content (based on WHO and FDA recommendations)9,10. Figure 2 shows the final data structure after organizing the information13.

Figure 2. Schematic of data classification.

Survey_items represents the preferences of the user and Purchase_items represents the purchases themselves, along with the characteristics of each product.
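
The anonymization step described in the data treatment can be sketched as follows. This is only an illustration under stated assumptions, not the authors' script: the function name `assign_codes`, the sample participant names and products, and the starting value 2000000 (chosen only because the user codes shown in this article begin there) are hypothetical.

```python
def assign_codes(identifiers, start=2000000):
    """Map each distinct identifier to a sequential numeric code.

    Assumption: codes are handed out in order of first appearance,
    starting at `start`; the article does not state the actual scheme.
    """
    codes = {}
    for identifier in identifiers:
        if identifier not in codes:
            codes[identifier] = start + len(codes)
    return codes


# Hypothetical raw survey rows: (participant name, chosen product).
raw_rows = [
    ("Ana", "Rice, brand A"),
    ("Luis", "Salt, brand B"),
    ("Ana", "Flour, brand C"),
]

user_codes = assign_codes(name for name, _ in raw_rows)

# Replace every name with its code so the stored rows carry no identity.
anonymized = [(user_codes[name], product) for name, product in raw_rows]
print(anonymized)
```

The same helper could be reused with a different `start` value to code products, which keeps user and product codes in separate numeric ranges.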

Data structure

There are two columns in Table 3. The first column (“User Code”) shows the code assigned to each user, and the second column (“Item Code”) registers the products selected as each user’s favorites. Each user has one or more products registered in the table. In each item code, the first four digits represent the type of product, and the last three digits refer to the specific brand of that product.

Table 3. User preferences dataset.

User Code | Item Code
2000000 | 1000002
2000000 | 1002003
2000000 | 1003006
2000000 | 1003006
2000001 | 1014001
2000001 | 1019005
... | ...
2000213 | 1006015
2000214 | 1007000
2000214 | 1013001
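
As a small illustration of the item code layout described for Table 3, the sketch below splits a seven-digit item code into its first four digits and its last three digits. The function name `split_item_code` is an assumption made for this example and is not part of the published datasets.

```python
def split_item_code(item_code):
    """Split a seven-digit item code into (product_type, suffix).

    Follows the description of Table 3: the first four digits identify
    the type of product and the last three digits distinguish the
    specific item within that type.
    """
    text = str(item_code)
    if len(text) != 7 or not text.isdigit():
        raise ValueError(f"expected a seven-digit item code, got {item_code!r}")
    return int(text[:4]), int(text[4:])


print(split_item_code(1014001))  # (1014, 1)
```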

Table 4 has the same two columns as Table 3 plus a third column, the rating: the number of times a user purchased that product divided by the number of shopping invoices for that user over the four-week period.

Table 4. User purchasing history dataset.

User Code | Item Code | Rating
2000150 | 1052000 | 0.05
2000150 | 1013015 | 1
2000150 | 1056022 | 0.25
2000150 | 1056023 | 0.23
2000226 | 1034029 | 0.4
2000226 | 1002000 | 0.13
... | ... | ...
2000228 | 1056009 | 0.26
2000040 | 1000001 | 0.42
2000040 | 1059019 | 0.5
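
The rating described for Table 4 can be sketched as below. This is a minimal example under assumptions: the function name `purchase_ratings` and the four sample invoices are hypothetical, and every appearance of an item on an invoice is counted as one purchase.

```python
from collections import Counter


def purchase_ratings(invoices):
    """Return {item_code: rating} for one user.

    `invoices` is a list of invoices, each a list of item codes.
    rating = (times the item was purchased) / (number of invoices).
    """
    if not invoices:
        return {}
    counts = Counter(item for invoice in invoices for item in invoice)
    return {item: count / len(invoices) for item, count in counts.items()}


# Hypothetical user who handed in four invoices over the four weeks.
invoices = [
    [1052000, 1013015],
    [1013015],
    [1013015, 1056022],
    [1013015],
]
ratings = purchase_ratings(invoices)
print(ratings[1013015])  # 1.0
print(ratings[1056022])  # 0.25
```

Under this definition an item bought on every invoice gets a rating of 1, and an item bought once in four invoices gets 0.25.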

Table 5 has six columns, each representing a different product characteristic, with one row per product. The columns are the item code, the category to which the product belongs, the section to which the product belongs, the brand, the sugar level and the sodium level. The levels are based on the sugar content per 100 g and the sodium content per serving, and each is classified into four levels, where 1 is the lowest and 4 is the highest.

Table 5. Product characteristics.

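Item Code | Categ. | Sect. | Brand | Sug. level | Sodi. level
1058005 | 1 | 1100 | 2185 | 3 | 4
1052017 | 2 | 1204 | 2098 | 3 | 3
1039002 | 3 | 1301 | 2243 | 4 | 2
1045005 | 4 | 1401 | 2277 | 1 | 1
1018008 | 5 | 1504 | 2348 | 1 | 4
... | ... | ... | ... | ... | ...
1032002 | 6 | 1604 | 2410 | 1 | 3
1037002 | 7 | 1700 | 2420 | 1 | 2
1060012 | 8 | 1800 | 2435 | 1 | 1
1034023 | 9 | 1900 | 2467 | 2 | 1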

Ethical statement

Written informed consent was provided by all persons who volunteered to take part in the research. Our study received approval in data management ethics according to the policies of the Telematics Engineering Group of the University of Cauca within the Electronics and Telecommunication Engineering Faculty. The proposed procedure is covered under approval number 8.4.2-90.14/274 of 2017. The study also complies with article 15 of the 1991 Colombian constitution on the right to privacy, and with the concepts of the Colombian Constitutional Court in Judgment No. T-414/92 of 1992 on the definition of data, computer freedom, and personal information.

Results

The numbers in the second half of Figure 3 represent a scale for measuring sodium and sugar contents, based on the quantity of sodium and sugar that each product contained according to the nutritional table. To better understand the graph, we note again that there are four levels that represent the sodium content, and four levels that represent the sugar content, which generates 16 possible combinations that are color coded differently. For instance, the green circle with the number 11 indicates that the sodium and sugar contents are very low, whilst the red circle with the number 44 indicates a product with very high sodium and sugar content. The pie chart illustrates the percentage of products in each sodium and sugar classification. There is a higher percentage of products with high sodium and low sugar contents.

Figure 3. Levels of sodium and sugars in food and drink products.
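
The 16 combined classes and the percentages shown in the pie chart can be sketched as follows. This is an illustration only: the function names are hypothetical, and the digit order (sugar level first, sodium level second) is an assumption, since the article does not state which level comes first in codes such as 11 or 44.

```python
from collections import Counter


def combined_class(sugar_level, sodium_level):
    """Combine two levels (1-4 each) into a two-digit class such as 11 or 44.

    Assumption: the sugar level is the first digit and the sodium level
    the second; the article does not state the digit order.
    """
    for level in (sugar_level, sodium_level):
        if level not in (1, 2, 3, 4):
            raise ValueError(f"level must be 1-4, got {level!r}")
    return 10 * sugar_level + sodium_level


def class_shares(products):
    """Return the percentage of products that falls into each combined class."""
    counts = Counter(combined_class(sugar, sodium) for sugar, sodium in products)
    total = sum(counts.values())
    return {cls: 100 * n / total for cls, n in counts.items()}


# (sugar_level, sodium_level) pairs from the first four rows of Table 5.
sample = [(3, 4), (3, 3), (4, 2), (1, 1)]
print(class_shares(sample))  # {34: 25.0, 33: 25.0, 42: 25.0, 11: 25.0}
```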

USER_CODE | ITEM_CODE
2000000 | 1000002
2000000 | 1000004
2000000 | 1001002
2000000 | 1002000
2000000 | 1002003
2000000 | 1003000
2000000 | 1003003
2000000 | 1003006
2000000 | 1004002
2000000 | 1004006
2000000 | 1005000
2000000 | 1005001
2000000 | 1006002
2000000 | 1006008
2000000 | 1006022
2000000 | 1007000
2000000 | 1007009
2000000 | 1008000
2000000 | 1008003
2000000 | 1008006
2000000 | 1009000
2000000 | 1010001
2000000 | 1012001
2000000 | 1013000
2000000 | 1013002
2000000 | 1014001
2000000 | 1014011
2000000 | 1014012
2000000 | 1015001
2000000 | 1015008
2000000 | 1015011
2000000 | 1016005
2000000 | 1017000
2000000 | 1017001
2000000 | 1017004
2000000 | 1018001
2000000 | 1018002
2000000 | 1019000
2000000 | 1019002
2000000 | 1019005
2000000 | 1063000
2000000 | 1063003
2000000 | 1020005
2000000 | 1021000
2000000 | 1021001
2000000 | 1022001
2000000 | 1022004
2000000 | 1023000
2000000 | 1023001
2000000 | 1024000
2000000 | 1024001
2000000 | 1025002
2000000 | 1026000
2000000 | 1026001
2000000 | 1027000
2000000 | 1028000
2000000 | 1028002
2000000 | 1028007
2000000 | 1029000
2000000 | 1029005
2000000 | 1029008
2000000 | 1032000
2000000 | 1032001
2000000 | 1032002
2000000 | 1032004
2000000 | 1034000
2000000 | 1034005
2000000 | 1034006
2000000 | 1034008
2000000 | 1034012
2000000 | 1034014
2000000 | 1034025
2000000 | 1035007
2000000 | 1035009
2000000 | 1035016
2000000 | 1036004
2000000 | 1036010
2000000 | 1037003
2000000 | 1038000
2000000 | 1038001
2000000 | 1039000
2000000 | 1039005
2000000 | 1040004
2000000 | 1041001
2000000 | 1043000
2000000 | 1044000
2000000 | 1044001
2000000 | 1044004
2000000 | 1045000
2000000 | 1045001
2000000 | 1045003
2000000 | 1048004
2000000 | 1050005
2000000 | 1051006
2000000 | 1052001
2000000 | 1052006
2000000 | 1052007
2000000 | 1052011
2000000 | 1052015
2000000 | 1053000
2000000 | 1053003
2000000 | 1053007
2000000 | 1053010
2000000 | 1053013
2000000 | 1054000
2000000 | 1055007
2000000 | 1055014
2000000 | 1055016
2000000 | 1056013
2000000 | 1056015
2000000 | 1056021
2000000 | 1057000
2000000 | 1057002
2000000 | 1058005
2000000 | 1059003
2000000 | 1059007
2000000 | 1060000
2000000 | 1060011
2000000 | 1061001
2000001 | 1000000
2000001 | 1000001
2000001 | 1000003
2000001 | 1001004
2000001 | 1001005
2000001 | 1001011
2000001 | 1002000
2000001 | 1003000
2000001 | 1003003
2000001 | 1003004
2000001 | 1003006
2000001 | 1003007
2000001 | 1004000
2000001 | 1004001
2000001 | 1005003
2000001 | 1006001
2000001 | 1006000
2000001 | 1006007
2000001 | 1006006
2000001 | 1006015
2000001 | 1006014
2000001 | 1007000
2000001 | 1007012
2000001 | 1008000
2000001 | 1009000
2000001 | 1010002
2000001 | 1013000
2000001 | 1013001
2000001 | 1013002
2000001 | 1013004
2000001 | 1013006
2000001 | 1013008
2000001 | 1014000
2000001 | 1014010
2000001 | 1014011
2000001 | 1014012
2000001 | 1014015
2000001 | 1015005
2000001 | 1015001
2000001 | 1015003
2000001 | 1015006
2000001 | 1015002
2000001 | 1015008
2000001 | 1016000
2000001 | 1016005
2000001 | 1017000
2000001 | 1017014
2000001 | 1017001
2000001 | 1017013
2000001 | 1018001
2000001 | 1019000
2000001 | 1019003
2000001 | 1019005
2000001 | 1063001
2000001 | 1063003
2000001 | 1020000
2000001 | 1020006
2000001 | 1021000
2000001 | 1021001
2000001 | 1021002
2000001 | 1022006
2000001 | 1022000
2000001 | 1022001
2000001 | 1024000
2000001 | 1024001
2000001 | 1025002
2000001 | 1026000
2000001 | 1026001
2000001 | 1027000
2000001 | 1028002
2000001 | 1028007
2000001 | 1032001
2000001 | 1032002
2000001 | 1034000
2000001 | 1034001
2000001 | 1034006
2000001 | 1034007
2000001 | 1034014
2000001 | 1034015
2000001 | 1034025
This is a portion of the data; to view all the data, please download the file.
Dataset 1. User preferences.
This file contains two columns (User_Code, Item_Code): User_Code is the code assigned to each user, and Item_Code is the encoded product that the user prefers.
USER_CODE | ITEM_CODE | RATING
2000017 | 1053007 | 1,8993387508
2000017 | 1066007 | 1,8761762352
2000017 | 1002000 | 1,9960249501
2000017 | 1034036 | 1,9341850767
2000017 | 1041001 | 1,9212440718
2000017 | 1003006 | 1,9940229670
2000017 | 1032001 | 1,9379990911
2000017 | 1044009 | 1,9157085811
2000017 | 1053015 | 1,8993243211
2000017 | 1055010 | 1,8957327419
2000017 | 1053011 | 1,8993315359
2000017 | 1005000 | 1,9900666667
2000017 | 1004005 | 1,9920388843
2000017 | 1004001 | 1,9920468207
2000017 | 1034015 | 1,9342243584
2000017 | 1013027 | 1,9742978223
2000017 | 1000001 | 2,0000150000
2000017 | 1018011 | 1,9646320128
2000017 | 1015006 | 1,9704484506
2000017 | 1056022 | 1,8939160358
2000017 | 1056009 | 1,8939393509
2000008 | 1058005 | 1,8903577960
2000008 | 1040010 | 1,9230661244
2000008 | 1067001 | 1,8744199865
2000008 | 1059005 | 1,8885727641
2000008 | 1039009 | 1,9249188409
2000008 | 1052019 | 1,9011139533
2000008 | 1044010 | 1,9156981255
2000008 | 1044001 | 1,9157146401
2000008 | 1058001 | 1,8903649429
2000008 | 1050000 | 1,9047695238
2000008 | 1065018 | 1,8779100447
2000008 | 1038001 | 1,9267881245
2000150 | 1055020 | 1,8958408371
2000150 | 1040004 | 1,9232137569
2000150 | 1039000 | 1,9250721848
2000150 | 1052000 | 1,9012832700
2000150 | 1013015 | 1,9744525007
2000150 | 1000000 | 2,0001500000
2000150 | 1056022 | 1,8940419802
2000150 | 1056023 | 1,8940401866
2000150 | 1003006 | 1,9941555684
2000150 | 1058018 | 1,8904687822
2000150 | 1034015 | 1,9343529833
2000150 | 1040002 | 1,9232174554
2000150 | 1058016 | 1,8904723558
2000150 | 1056009 | 1,8940652968
2000150 | 1034029 | 1,9343267935
2000215 | 1041001 | 1,9214342734
2000215 | 1020008 | 1,9609797178
2000215 | 1058001 | 1,8905605949
2000215 | 1040008 | 1,9232688595
2000215 | 1018011 | 1,9648265097
2000215 | 1040004 | 1,9232762566
2000215 | 1052007 | 1,9013324056
2000215 | 1052021 | 1,9013071032
2000215 | 1022000 | 1,9571575342
2000215 | 1014000 | 1,9725986193
2000215 | 1020000 | 1,9609950980
2000215 | 1042004 | 1,9195847617
2000215 | 1024000 | 1,9533349609
2000215 | 1059007 | 1,8887646635
2000215 | 1066019 | 1,8763408532
2000215 | 1021000 | 1,9590744368
2000215 | 1059007 | 1,8887646635
2000215 | 1065017 | 1,8781061711
2000215 | 1019005 | 1,9629098974
2000215 | 1017014 | 1,9667526701
2000215 | 1020002 | 1,9609912530
2000215 | 1059000 | 1,8887771483
2000215 | 1055022 | 1,8958988533
2000215 | 1067000 | 1,8746157451
2000215 | 1038001 | 1,9269875463
2000215 | 1004002 | 1,9922420473
2000215 | 1019005 | 1,9629098974
2000215 | 1056016 | 1,8941142937
2000215 | 1056021 | 1,8941053256
2000215 | 1052001 | 1,9013432497
2000215 | 1019008 | 1,9629041185
2000215 | 1022001 | 1,9571556192
2000215 | 1019002 | 1,9629156763
2000215 | 1020001 | 1,9609931755
2000215 | 1053015 | 1,8995123526
2000215 | 1066021 | 1,8763373329
2000215 | 1066022 | 1,8763355728
2000215 | 1022011 | 1,9571364692
2000215 | 1028002 | 1,9457306503
2000215 | 1028010 | 1,9457155086
2000215 | 1032005 | 1,9381834390
2000215 | 1032001 | 1,9381909514
2000215 | 1022005 | 1,9571479592
2000215 | 1022012 | 1,9571345542
2000215 | 1022006 | 1,9571460442
2000215 | 1028019 | 1,9456984744
2000215 | 1028020 | 1,9456965818
2000215 | 1005008 | 1,9902478388
2000215 | 1006002 | 1,9882813354
2000215 | 1006046 | 1,9881943768
2000215 | 1005008 | 1,9902478388
2000215 | 1003000 | 1,9942323031
2000215 | 1066014 | 1,8763496539
2000215 | 1066013 | 1,8763514141
2000215 | 1004001 | 1,9922440316
2000215 | 1015000 | 1,9706551724
2000215 | 1001012 | 1,9981928289
2000215 | 1034028 | 1,9343915252
2000215 | 1034025 | 1,9343971374
2000215 | 1002000 | 1,9962225549
2000215 | 1007002 | 1,9863068792
2000216 | 1057000 | 1,8923519395
2000216 | 1038005 | 1,9269810839
2000216 | 1034001 | 1,9344430034
2000216 | 1038000 | 1,9269903661
2000216 | 1019002 | 1,9629166577
2000216 | 1022004 | 1,9571508526
2000216 | 1056004 | 1,8941367646
2000216 | 1056024 | 1,8941008916
2000216 | 1040006 | 1,9232735196
2000216 | 1055008 | 1,8959249598
2000216 | 1058004 | 1,8905561794
2000216 | 1059023 | 1,8887370718
2000216 | 1035009 | 1,9325590405
2000216 | 1066000 | 1,8763752345
2000216 | 1056007 | 1,8941313836
2000216 | 1000003 | 2,0002099994
2000216 | 1002000 | 1,9962235529
2000025 | 1034029 | 1,9342059072
2000025 | 1022000 | 1,9569716243
2000025 | 1005005 | 1,9900647260
2000025 | 1034011 | 1,9342395777
2000217 | 1014019 | 1,9725636305
2000217 | 1058008 | 1,8905499769
2000217 | 1060014 | 1,8869722475
2000217 | 1013001 | 1,9745459284
2000217 | 1015006 | 1,9706454937
2000217 | 1002000 | 1,9962245509
2000217 | 1009001 | 1,9823736547
2000217 | 1005001 | 1,9902636913
2000217 | 1039009 | 1,9251199941
2000218 | 1032002 | 1,9381919802
2000006 | 1034010 | 1,9342230733
2000006 | 1034001 | 1,9342399089
2000006 | 1001004 | 1,9980000080
2000006 | 1003003 | 1,9940179641
2000006 | 1010000 | 1,9802039604
2000006 | 1000003 | 2,0000000000
2000006 | 1034000 | 1,9342417795
2000006 | 1034014 | 1,9342155909
2000006 | 1034036 | 1,9341744388
2000219 | 1056025 | 1,8941019389
2000219 | 1052023 | 1,9013072908
2000219 | 1017013 | 1,9667585370
2000219 | 1016005 | 1,9687097997
2000219 | 1011007 | 1,9784422858
2000219 | 1015005 | 1,9706494057
2000219 | 1023008 | 1,9552329992
2000219 | 1012001 | 1,9764990351
2000219 | 1056002 | 1,8941431929
2000219 | 1066007 | 1,8763657274
2000219 | 1032001 | 1,9381948273
2000219 | 1017002 | 1,9667798097
2000220 | 1037001 | 1,9288505990
2000220 | 1034004 | 1,9344412594
2000220 | 1034000 | 1,9344487427
2000221 | 1013014 | 1,9745245377
2000221 | 1013009 | 1,9745342835
2000221 | 1066000 | 1,8763799250
2000221 | 1034036 | 1,9343823619
2000221 | 1034010 | 1,9344310016
2000221 | 1034002 | 1,9344459682
2000221 | 1036004 | 1,9307077965
2000222 | 1001018 | 1,9981878448
2000222 | 1022015 | 1,9571356585
2000222 | 1066000 | 1,8763808630
2000222 | 1017014 | 1,9667595530
2000222 | 1056002 | 1,8941460338
2000222 | 1000001 | 2,0002199998
2000223 | 1034029 | 1,9343973912
2000223 | 1034002 | 1,9344479024
2000223 | 1034001 | 1,9344497733
2000223 | 1034000 | 1,9344516441
2000223 | 1034012 | 1,9344291942
2000223 | 1034004 | 1,9344441608
2000201 | 1000000 | 2,0002010000
2000201 | 1036004 | 1,9306884916
2000201 | 1034000 | 1,9344303675
2000201 | 1034004 | 1,9344228842
2000201 | 1055026 | 1,8958783954
2000201 | 1056003 | 1,8941243538
2000201 | 1034010 | 1,9344116595
2000201 | 1039009 | 1,9251045949
2000201 | 1040004 | 1,9232627951
2000201 | 1056016 | 1,8941010363
2000201 | 1055016 | 1,8958963656
2000201 | 1058005 | 1,8905402148
2000201 | 1058016 | 1,8905205592
2000201 | 1016008 | 1,9686862702
2000201 | 1066025 | 1,8763171595
2000224 | 1034023 | 1,9344095828
This is a portion of the data; to view all the data, please download the file.
Dataset 2. User purchasing.
This file contains three columns (User_Code, Item_Code, Rating): User_Code is the code assigned to each user, Item_Code is the encoded product that the user purchased, and Rating is the value obtained from dividing the number of total product invoices by the number of times the user purchased a product.
ITEM_CODECATEGORYSECTIONCODE_BRANDSUGAR_LEVELSODIUM_LEVEL
100000001000200011
100000101000200111
100000201000200212
100000301000200312
100000401000200412
100000501000200521
100000701000200612
100000801000200712
100001001000200812
100001101000200911
100001301000201012
100001401000201111
100001501000201212
100001601000201311
100001701000201412
100100001001201531
100100101002201531
100100201001201631
100100301002201631
100100401001201731
100100501002201731
100100601001201831
100100701001201931
100100801002202011
100100901003202131
100101001003202231
100101301001202331
100101401001201031
100101601001201131
100101701004201731
100101801004201531
100101901004201031
100200001005202414
100200101005202514
100200201005202414
100200301005202414
100200401005202614
100200601005201114
100300001006202713
100300101006202811
100300201006202912
100300301006203011
100300401006203111
100300501006203211
100300601006203311
100300701006203411
100300801006203542
100301001006203611
100301101006203712
100301201006203811
100301301006203942
100301401006201111
100301501006201011
100400001007204012
100400101007204112
100400201007204212
100400301007204314
100400401007204412
100400501007204512
100400601007204611
100400701007204721
100400801007204814
100400901007204912
100401001007205012
100401101007205114
100401201007205211
100401301007205311
100401501007205412
100401601007205514
100500001008205612
100500101008205711
100500201008205812
100500301008205911
100500401008206012
100500501008206112
100500601008206212
100500701008206312
100500801008206411
100500901008206514
100501001008206613
100501101008206713
100600201011206844
100600301011206934
100600401011207034
100600501012206813
100600601012206913
100600701012207013
100600801013206823
100600901013206921
100601001013207024
100601101014206823
100601201014206923
100601301014207023
100601401015206931
100601501015207041
100601601016206824
100601701016206914
100601801016207023
100601901017206844
100602001017206921
100602101017207023
100602201018206834
100602301018206913
100602401018207033
100602501019206844
100602601019206913
100602701019207013
100602801020206844
100602901020206913
100603001020207014
100603101021206813
100603201021206934
100603301021207013
100603401022206823
100603501022207033
100603801021201034
100604001019207114
100604201011201034
100604301011202311
100604401023207012
100604501024206811
100604601024205612
100604701012201013
100700001025207211
100700101025207311
100700201025207411
100700301025207511
100700401025207611
100700601025207711
100700701025207811
100700801025207911
100700901025208011
100701001025208111
100701101025208211
100701301025208311
100701601025208411
100701801025208511
100701901025207811
100702001025208611
100702401026208711
100800001027208831
100800101027208931
100800201027209011
100800301027209111
100800401027209241
100800501027209311
100800601027209421
100800701027200331
100800901027208432
100900001028209533
100900101028209632
100900201028209732
100900301028209832
100900401028209933
100900501028208832
100900701028210011
100900801029208811
101000001030210133
101000101030210233
101000201030210333
101000301030210433
101000501030200933
101000601030210233
101100001031210231
101100101031210231
101100201031210333
101100301031210532
101100401031210232
101100501031210631
101100601031210713
101100701031210831
101200001032210911
101200101032211011
101200201032211111
101200301032202311
101200401032211211
101200701032211311
101200801032211414
101300001033211533
101300101033211534
101300201033211534
101300301033210633
101300401033210633
101300501033211544
101300601033209532
101300701033211544
101300801033210633
101300901033211643
101301001033211533
101301201033211743
101301301033211533
101301401033211833
101301501033211332
101301601033211233
101301701033211933
101301801033211533
101302001033212034
101302101033211031
101302401034212144
This is a portion of the data; to view all the data, please download the file.
Dataset 3. Product characteristics.
This file contains six columns (Item_Code, Category, Section, Code_Brand, Sugar_Level, Sodium_Level). Item_Code is the code assigned to each item and the other columns represent how they have been classified and coded according to their characteristics.
No | ITEM_CODE | PRODUCT
1 | 1000000 | Rice 1
2 | 1000001 | Rice 2
3 | 1000002 | Rice 3
4 | 1000003 | Rice 4
5 | 1000004 | Rice 5
6 | 1000005 | Rice 6
7 | 1000007 | Rice 7
8 | 1000008 | Rice 8
9 | 1000010 | Rice 9
10 | 1000011 | Rice 10
11 | 1000013 | Rice 11
12 | 1000014 | Rice 12
13 | 1000015 | Rice 13
14 | 1000016 | Rice 14
15 | 1000017 | Rice 15
16 | 1001000 | Sugar 1
17 | 1001001 | Sugar 2
18 | 1001002 | Sugar 3
19 | 1001003 | Sugar 4
20 | 1001004 | Sugar 5
21 | 1001005 | Sugar 6
22 | 1001006 | Sugar 7
23 | 1001007 | Sugar 8
24 | 1001008 | Sugar 9
25 | 1001009 | Sugar 10
26 | 1001010 | Sugar 11
27 | 1001013 | Sugar 12
28 | 1001014 | Sugar 13
29 | 1001016 | Sugar 14
30 | 1001017 | Sugar 15
31 | 1001018 | Sugar 16
32 | 1001019 | Sugar 17
33 | 1002000 | Salt 1
34 | 1002001 | Salt 2
35 | 1002002 | Salt 3
36 | 1002003 | Salt 4
37 | 1002004 | Salt 5
38 | 1002006 | Salt 6
39 | 1003000 | Flour 1
40 | 1003001 | Flour 2
41 | 1003002 | Flour 3
42 | 1003003 | Flour 4
43 | 1003004 | Flour 5
44 | 1003005 | Flour 6
45 | 1003006 | Flour 7
46 | 1003007 | Flour 8
47 | 1003008 | Flour 9
48 | 1003010 | Flour 10
49 | 1003011 | Flour 11
50 | 1003012 | Flour 12
51 | 1003013 | Flour 13
52 | 1003014 | Flour 14
53 | 1003015 | Flour 15
54 | 1004000 | Grain 1
55 | 1004001 | Grain 2
56 | 1004002 | Grain 3
57 | 1004003 | Grain 4
58 | 1004004 | Grain 5
59 | 1004005 | Grain 6
60 | 1004006 | Grain 7
61 | 1004007 | Grain 8
62 | 1004008 | Grain 9
63 | 1004009 | Grain 10
64 | 1004010 | Grain 11
65 | 1004011 | Grain 12
66 | 1004012 | Grain 13
67 | 1004013 | Grain 14
68 | 1004015 | Grain 15
69 | 1004016 | Grain 16
70 | 1005000 | Pasta 1
71 | 1005001 | Pasta 2
72 | 1005002 | Pasta 3
73 | 1005003 | Pasta 4
74 | 1005004 | Pasta 5
75 | 1005005 | Pasta 6
76 | 1005006 | Pasta 7
77 | 1005007 | Pasta 8
78 | 1005008 | Pasta 9
79 | 1005009 | Pasta 10
80 | 1005010 | Pasta 11
81 | 1005011 | Pasta 12
82 | 1006002 | Sauce 1
83 | 1006003 | Sauce 2
84 | 1006004 | Sauce 3
85 | 1006005 | Sauce 4
86 | 1006006 | Sauce 5
87 | 1006007 | Sauce 6
88 | 1006008 | Sauce 7
89 | 1006009 | Sauce 8
90 | 1006010 | Sauce 9
91 | 1006011 | Sauce 10
92 | 1006012 | Sauce 11
93 | 1006013 | Sauce 12
94 | 1006014 | Sauce 13
95 | 1006015 | Sauce 14
96 | 1006016 | Sauce 15
97 | 1006017 | Sauce 16
98 | 1006018 | Sauce 17
99 | 1006019 | Sauce 18
100 | 1006020 | Sauce 19
101 | 1006021 | Sauce 20
102 | 1006022 | Sauce 21
103 | 1006023 | Sauce 22
104 | 1006024 | Sauce 23
105 | 1006025 | Sauce 24
106 | 1006026 | Sauce 25
107 | 1006027 | Sauce 26
108 | 1006028 | Sauce 27
109 | 1006029 | Sauce 28
110 | 1006030 | Sauce 29
111 | 1006031 | Sauce 30
112 | 1006032 | Sauce 31
113 | 1006033 | Sauce 32
114 | 1006034 | Sauce 33
115 | 1006035 | Sauce 34
116 | 1006038 | Sauce 35
117 | 1006040 | Sauce 36
118 | 1006042 | Sauce 37
119 | 1006043 | Sauce 38
120 | 1006044 | Sauce 39
121 | 1006045 | Sauce 40
122 | 1006046 | Sauce 41
123 | 1006047 | Sauce 42
124 | 1007000 | Coffee 1
125 | 1007001 | Coffee 2
126 | 1007002 | Coffee 3
127 | 1007003 | Coffee 4
128 | 1007004 | Coffee 5
129 | 1007006 | Coffee 6
130 | 1007007 | Coffee 7
131 | 1007008 | Coffee 8
132 | 1007009 | Coffee 9
133 | 1007010 | Coffee 10
134 | 1007011 | Coffee 11
135 | 1007013 | Coffee 12
136 | 1007016 | Coffee 13
137 | 1007018 | Coffee 14
138 | 1007019 | Coffee 15
139 | 1007020 | Coffee 16
140 | 1007024 | Coffee 17
141 | 1008000 | Chocolate 1
142 | 1008001 | Chocolate 2
143 | 1008002 | Chocolate 3
144 | 1008003 | Chocolate 4
145 | 1008004 | Chocolate 5
146 | 1008005 | Chocolate 6
147 | 1008006 | Chocolate 7
148 | 1008007 | Chocolate 8
149 | 1008009 | Chocolate 9
150 | 1009000 | Chocolate powder drink 1
151 | 1009001 | Chocolate powder drink 2
152 | 1009002 | Chocolate powder drink 3
153 | 1009003 | Chocolate powder drink 4
154 | 1009004 | Chocolate powder drink 5
155 | 1009005 | Chocolate powder drink 6
156 | 1009007 | Chocolate powder drink 7
157 | 1009008 | Chocolate powder drink 8
158 | 1010000 | Jelly 1
159 | 1010001 | Jelly 2
160 | 1010002 | Jelly 3
161 | 1010003 | Jelly 4
162 | 1010005 | Jelly 5
163 | 1010006 | Jelly 6
164 | 1011000 | Instant Flavoured Powder 1
165 | 1011001 | Instant Flavoured Powder 2
166 | 1011002 | Instant Flavoured Powder 3
167 | 1011003 | Instant Flavoured Powder 4
168 | 1011004 | Instant Flavoured Powder 5
169 | 1011005 | Instant Flavoured Powder 6
170 | 1011006 | Instant Flavoured Powder 7
171 | 1011007 | Instant Flavoured Powder 8
172 | 1012000 | Oats 1
173 | 1012001 | Oats 2
174 | 1012002 | Oats 3
175 | 1012003 | Oats 4
176 | 1012004 | Oats 5
177 | 1012007 | Oats 6
178 | 1012008 | Oats 7
179 | 1013000 | Cereal 1
180 | 1013001 | Cereal 2
181 | 1013002 | Cereal 3
182 | 1013003 | Cereal 4
183 | 1013004 | Cereal 5
184 | 1013005 | Cereal 6
185 | 1013006 | Cereal 7
186 | 1013007 | Cereal 8
187 | 1013008 | Cereal 9
188 | 1013009 | Cereal 10
189 | 1013010 | Cereal 11
190 | 1013012 | Cereal 12
191 | 1013013 | Cereal 13
192 | 1013014 | Cereal 14
193 | 1013015 | Cereal 15
194 | 1013016 | Cereal 16
195 | 1013017 | Cereal 17
196 | 1013018 | Cereal 18
197 | 1013020 | Cereal 19
198 | 1013021 | Cereal 20
199 | 1013024 | Cereal 21
This is a portion of the data; to view all the data, please download the file.
Dataset 4. Products.
This file contains three columns (No, Item_Code, Product), where Item_Code represents the code assigned to each product and Product represents the product type without specifying the brand.
No | CODE_BRAND | BRAND
1 | 2000 | Roa
2 | 2001 | Florhuila
3 | 2002 | Carolina
4 | 2003 | Diana
5 | 2004 | Blanquita
6 | 2005 | Dona Pepa
7 | 2006 | Alejandra
8 | 2007 | Supremo
9 | 2008 | Fino Patia
10 | 2009 | D1
11 | 2010 | Olimpica
12 | 2011 | Exito
13 | 2012 | Sabroson
14 | 2013 | Medalla De Oro
15 | 2014 | Boluga
16 | 2015 | Incauca
17 | 2016 | Riopaila
18 | 2017 | Manuelita
19 | 2018 | Dona Pura
20 | 2019 | Providencia
21 | 2020 | Splenda
22 | 2021 | Colombia
23 | 2022 | Del Fonce
24 | 2023 | Ekono
25 | 2024 | Refisal
26 | 2025 | Natusal
27 | 2026 | Himalaya
28 | 2027 | Haz De Oros
29 | 2028 | Farallones
30 | 2029 | La Nieve
31 | 2030 | PAN
32 | 2031 | Dona Arepa
33 | 2032 | La Americana
34 | 2033 | Promasa
35 | 2034 | Maizena
36 | 2035 | La Vecina
37 | 2036 | La Otra Arepa
38 | 2037 | Nevada
39 | 2038 | Super Arepa
40 | 2039 | Flor Suprema
41 | 2040 | Frijol
42 | 2041 | Lenteja
43 | 2042 | Arveja
44 | 2043 | Garbanzo
45 | 2044 | Blanquillo
46 | 2045 | Maiz
47 | 2046 | Maiz pira
48 | 2047 | Soya
49 | 2048 | Quinoa
50 | 2049 | Linaza
51 | 2050 | Semillas de chia
52 | 2051 | Cebada
53 | 2052 | Arrocillo
54 | 2053 | Alpiste
55 | 2054 | Mani
56 | 2055 | Cuchuco
57 | 2056 | La Muneca
58 | 2057 | Doria
59 | 2058 | Comarrico
60 | 2059 | Monticello
61 | 2060 | Conzazoni
62 | 2061 | Zonia
63 | 2062 | De Cecco
64 | 2063 | San Remo
65 | 2064 | El Dorado
66 | 2065 | Maruchan
67 | 2066 | Bucatini
68 | 2067 | Santali
69 | 2068 | Fruco
70 | 2069 | San Jorge
71 | 2070 | La Constancia
72 | 2071 | Respin
73 | 2072 | Sello Rojo
74 | 2073 | Aguila Roja
75 | 2074 | La Palma
76 | 2075 | Bemoka
77 | 2076 | Franco
78 | 2077 | Rico
79 | 2078 | Nescafe
80 | 2079 | Colcafe
81 | 2080 | Juan Valdez
82 | 2081 | Lukafe
83 | 2082 | Morasurco
84 | 2083 | Buen Dia
85 | 2084 | Maxima
86 | 2085 | La Bastilla
87 | 2086 | Aroma
88 | 2087 | Instacrem
89 | 2088 | Corona
90 | 2089 | Sol
91 | 2090 | Tesalia
92 | 2091 | Luker
93 | 2092 | La Especial
94 | 2093 | Cruz
95 | 2094 | Chocolyne
96 | 2095 | Milo
97 | 2096 | Chocolisto
98 | 2097 | Nesquik
99 | 2098 | Colombina
100 | 2099 | Toddy
101 | 2100 | Chocoexpress
102 | 2101 | Levapan
103 | 2102 | Quala
104 | 2103 | Royal
105 | 2104 | JBO
106 | 2105 | Tang
107 | 2106 | Nestle
108 | 2107 | Clight
109 | 2108 | Hindu
110 | 2109 | Don Pancho
111 | 2110 | Quaker
112 | 2111 | Miller's
113 | 2112 | La Tinaja
114 | 2113 | Toning
115 | 2114 | Qikely
116 | 2115 | Kellogg's
117 | 2116 | Tosh
118 | 2117 | Muesli
119 | 2118 | Flips
120 | 2119 | Nutrikids
121 | 2120 | Zooreals
122 | 2121 | Vitarrico
123 | 2122 | Quinua
124 | 2123 | Van Camps
125 | 2124 | Isabel
126 | 2125 | Alamar
127 | 2126 | Sancho
128 | 2127 | Gustamar
129 | 2128 | Carolina
130 | 2129 | Soberana
131 | 2130 | Zenu
132 | 2131 | Alkosto
133 | 2132 | Buen Gusto
134 | 2133 | Calidad
135 | 2134 | Don Sancho
136 | 2135 | La Alemana
137 | 2136 | Tinapa
138 | 2137 | Sabor Del Mar
139 | 2138 | Knorr
140 | 2139 | Dona Gallina
141 | 2140 | Maggi
142 | 2141 | Ricostilla
143 | 2142 | Caldo Rico
144 | 2143 | El Rey
145 | 2144 | America
146 | 2145 | Santa Elena
147 | 2146 | Trisason
148 | 2147 | Del fogon - Trifogon
149 | 2148 | Guisamac
150 | 2149 | Calima
151 | 2150 | Don Gustico
152 | 2151 | La Sopera
153 | 2152 | Comino
154 | 2153 | Adobo
155 | 2154 | Oriental
156 | 2155 | Klim
157 | 2156 | Rodeo
158 | 2157 | Proleche
159 | 2158 | Ensure
160 | 2159 | Alpina
161 | 2160 | Pediasure
162 | 2161 | Colanta
163 | 2162 | Huevos
164 | 2163 | Huevos Codorniz
165 | 2164 | La Garza
166 | 2165 | Z
167 | 2166 | Girasoli
168 | 2167 | Purisimo
169 | 2168 | Oliosoya
170 | 2169 | Gourmet
171 | 2170 | Frescampo
172 | 2171 | Oleocali
173 | 2172 | Premier
174 | 2173 | Ricapalma
175 | 2174 | Nutri Canola
176 | 2175 | La Espanola
177 | 2176 | Vivi
178 | 2177 | Manteca
179 | 2178 | La Coruna
180 | 2179 | Clavos
181 | 2180 | Pasas
182 | 2181 | Canela
183 | 2182 | Laurel
184 | 2183 | Carve
185 | 2184 | Ricolada
186 | 2185 | Ramo
187 | 2186 | Bimbo
188 | 2187 | Marinela
189 | 2188 | Mama-ia
190 | 2189 | Colpan
191 | 2190 | Comapan
192 | 2191 | La Gitana
193 | 2192 | Guadalupe
194 | 2193 | Tia Rosa
195 | 2194 | Susanita
196 | 2195 | Pullman - Willian
197 | 2196 | Milenio
198 | 2197 | Super
199 | 2198 | Adams
This is a portion of the data; to view all the data, please download the file.
Dataset 5. Brands.
This file contains three columns (No, Code_Brand, Brand): Code_Brand represents the code assigned to each brand, and Brand represents the brand of each product.
No | CODE_CATEGORY | CATEGORY
1 | 0 | Groceries
2 | 1 | Bakery
3 | 2 | Candies and snacks
4 | 3 | Drinks
5 | 4 | Liquors
6 | 5 | Dairy products, sausages and chilled
7 | 6 | Meat
8 | 7 | Fish and shellfish
9 | 8 | Frozen products
Dataset 6. Categories.
This file contains three columns (No, Code_Category, Category): Code_Category represents the code assigned to each category, and Category represents the category assigned to the product.
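
To show how the coded files relate to one another, the sketch below decodes a few rows of Dataset 3 with lookup tables built from Datasets 4, 5 and 6. It is only an illustration: the values are copied from the previews shown above, while the dictionary names and the `describe` function are assumptions made for this example. The published files would need to be loaded and parsed first, and their exact file format is not stated here.

```python
# Lookup excerpts copied from the previews of Datasets 4, 5 and 6.
PRODUCTS = {1000000: "Rice 1", 1001000: "Sugar 1", 1002000: "Salt 1"}
BRANDS = {2000: "Roa", 2015: "Incauca", 2024: "Refisal"}
CATEGORIES = {0: "Groceries", 1: "Bakery", 2: "Candies and snacks"}

# Rows from the Dataset 3 preview:
# (item_code, category, section, code_brand, sugar_level, sodium_level)
CHARACTERISTICS = [
    (1000000, 0, 1000, 2000, 1, 1),
    (1001000, 0, 1001, 2015, 3, 1),
    (1002000, 0, 1005, 2024, 1, 4),
]


def describe(row):
    """Replace the coded fields of one Dataset 3 row with readable labels."""
    item_code, category, _section, brand, sugar, sodium = row
    return {
        "product": PRODUCTS.get(item_code, "unknown"),
        "category": CATEGORIES.get(category, "unknown"),
        "brand": BRANDS.get(brand, "unknown"),
        "sugar_level": sugar,
        "sodium_level": sodium,
    }


for row in CHARACTERISTICS:
    print(describe(row))
```

Codes missing from a lookup table are reported as "unknown" rather than raising an error, which is convenient when only a portion of each file is available.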
No | CODE_SECTION | SECTION
1 | 1000 | Rice
2 | 1001 | Sugar
3 | 1002 | Light sugar
4 | 1003 | Panela
5 | 1004 | Raw cane sugar
6 | 1005 | Salt
7 | 1006 | Flour
8 | 1007 | Grain
9 | 1008 | Pasta
10 | 1011 | Ketchup
11 | 1012 | Mayonnaise
12 | 1013 | Mustard
13 | 1014 | Pink sauce
14 | 1015 | Maramalade
15 | 1016 | Tartar
16 | 1017 | Bbq
17 | 1018 | Meat sauce
18 | 1019 | Black sauce
19 | 1020 | Pepper hot sauce
20 | 1021 | Soy sauce
21 | 1022 | Mostaneza
22 | 1023 | Mustard honey
23 | 1024 | Pasta Sauce
24 | 1025 | Coffee
25 | 1026 | Coffee cream
26 | 1027 | Chocolate
27 | 1028 | Chocolate powder drink
28 | 1029 | Cocoa
29 | 1030 | Jelly powder
30 | 1031 | Instant flavoured powder
31 | 1032 | Oat
32 | 1033 | Cereal
33 | 1034 | Granola or quinoa cereal
34 | 1035 | Canned Tuna
35 | 1036 | Canned Sardine
36 | 1037 | Canned sausages
37 | 1038 | Canned Grains and Vegetables
38 | 1039 | Broth
39 | 1040 | Soups And Creams
40 | 1041 | Color Condiment
41 | 1042 | Seasoning
42 | 1043 | Aromatic
43 | 1044 | Tea
44 | 1045 | Milk Powder
45 | 1046 | Egg
46 | 1047 | Oil
47 | 1048 | Fat
48 | 1049 | Vinegar
49 | 1050 | Champignon
50 | 1054 | Cloves, Raisins and Cinnamon
51 | 1055 | Laurel
52 | 1056 | Carve (vegetarian meat)
53 | 1057 | Soup bowl
54 | 1060 | Pancakes Mix
55 | 1100 | Pony (Soda with malta)
56 | 1101 | Cake
57 | 1102 | Tortillas
58 | 1103 | Toasts
59 | 1104 | Halved bread
60 | 1105 | Wholemeal bread
61 | 1107 | Bread for hot dogs
62 | 1200 | Gum
63 | 1201 | Chewing gums
64 | 1202 | Mint candy
65 | 1203 | Millows
66 | 1204 | Candies
67 | 1205 | Peppermint candy gum
68 | 1206 | Sweet Chocolate
69 | 1207 | Arequipe
70 | 1208 | Chips
71 | 1209 | Cookies
72 | 1210 | Nuts Package
73 | 1300 | Water
74 | 1301 | Soda
75 | 1302 | Bottle Juice
76 | 1303 | Isotonic Drinks
77 | 1304 | Energy Drinks
78 | 1305 | Bottled Iced Tea
79 | 1400 | Beer
80 | 1401 | Schnapps
81 | 1402 | Whiskey
82 | 1403 | Wine
83 | 1404 | Champagne
84 | 1405 | Tequila
85 | 1406 | Geneva
86 | 1407 | Vodka
87 | 1408 | Ron
88 | 1500 | Whole milk
89 | 1501 | Lactose-free milk
90 | 1502 | Fitness milk
91 | 1504 | Arepa
92 | 1505 | Sausage
93 | 1506 | Salami
94 | 1507 | Ham
95 | 1508 | Mortadella
96 | 1509 | Yogurt
97 | 1510 | Flavored Milk
98 | 1511 | Butter
99 | 1512 | Margarine
100 | 1513 | Prepared jelly
101 | 1514 | Yogurt with cereal
102 | 1515 | Koumiss
103 | 1516 | chorizo sausage
104 | 1517 | Sweet dairy product
105 | 1518 | Condensed milk
106 | 1519 | Milk cream
107 | 1520 | Chantilly cream
108 | 1522 | Mozzarella cheese
109 | 1523 | Whole cheese
110 | 1524 | Cream cheese
111 | 1525 | Chopped cheese
112 | 1526 | Parmesan cheese
113 | 1528 | Curd
114 | 1600 | Beef
115 | 1601 | Pork Meat
116 | 1602 | Mutton
117 | 1603 | Rabbit Meat
118 | 1604 | Chicken Meat
119 | 1605 | Turkey Meat
120 | 1700 | Fish
121 | 1701 | Crustacean
122 | 1702 | Marine mollusc
123 | 1800 | Ice Cream
124 | 1801 | Frozen chicken products
125 | 1802 | Frozen Empanada
126 | 1803 | Precooked potato and cassava
127 | 1804 | Bacon
128 | 1900 | Fruit
129 | 1901 | Vegetables
Dataset 7. Sections.
This file contains three columns (No, Code_Section, Section). Code_Section represents the code assigned to each section and Section represents the section in which a product can be found.
No | SUGAR_LEVEL | CODE_SUGAR_LEVEL
1 | Very low Sugar | 1
2 | Low Sugar | 2
3 | Moderate Sugar | 3
Dataset 8. Sugar.
This file contains three columns (No, Sugar_Level, Code_Sugar_Level). Sugar_Level classifies the products by sugar content and Code_Sugar_Level represents the code assigned to each level.
No | SODIUM_LEVEL | CODE_SODIUM_LEVEL
1 | Sodium-free | 1
2 | Very low Sodium | 2
3 | Moderate sodium | 3
Dataset 9. Sodium.
This file contains three columns (No, Sodium_Level, Code_Sodium_Level). Sodium_Level classifies the products by sodium content and Code_Sodium_Level represents the code assigned to each level.

Conclusions

This work was carried out to construct a valid dataset of food items available in Colombia that future academic studies can use for statistical analysis. Using the information on the nutritional labels of food items, we classified products by their sodium and sugar content, following WHO and FDA recommendations, to indicate whether each product falls above or below the recommended levels.

Data availability

Dataset 1: User preferences. This file contains two columns (User_Code, Item_Code). User_Code is the code assigned to each user, and Item_Code is the code of a product that the user prefers. DOI: 10.5256/f1000research.12979.d188373 (ref. 14).

Dataset 2: User purchasing. This file contains three columns (User_Code, Item_Code, Rating). User_Code is the code assigned to each user, Item_Code is the code of a product that the user purchased, and Rating is the value obtained by dividing the total number of product invoices by the number of times the user purchased the product. DOI: 10.5256/f1000research.12979.d188374 (ref. 15).
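
The Rating column described for Dataset 2 can be sketched as follows. This is a minimal illustration, not the authors' code: it reads the definition literally (total invoices divided by purchase count), assumes that "total invoices" means the invoices collected for that user and that a product counts once per invoice, and uses a hypothetical `purchase_ratings` function with made-up user and item codes.

```python
from collections import Counter, defaultdict


def purchase_ratings(invoices):
    """Return (user_code, item_code, rating) rows from a list of invoices.

    `invoices` is a list of (user_code, item_codes) pairs, one pair per invoice.
    Assumed reading of the text:
        rating = invoices collected for the user / invoices containing the item
    """
    total_invoices = Counter()               # user -> number of invoices
    times_purchased = defaultdict(Counter)   # user -> item -> number of invoices

    for user_code, item_codes in invoices:
        total_invoices[user_code] += 1
        for item_code in set(item_codes):    # count an item once per invoice
            times_purchased[user_code][item_code] += 1

    rows = []
    for user_code in sorted(times_purchased):
        for item_code, count in sorted(times_purchased[user_code].items()):
            rows.append((user_code, item_code, total_invoices[user_code] / count))
    return rows


# Hypothetical sample: user 1 has four invoices, user 2 has one.
sample_invoices = [
    (1, [101, 102]),
    (1, [101]),
    (1, [101, 103]),
    (1, [102]),
    (2, [101]),
]
ratings = purchase_ratings(sample_invoices)
for row in ratings:
    print(row)
```

Under this reading, a product that appears on every invoice of a user gets a Rating of 1, and products bought less often get larger values.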

Dataset 3: Product characteristics. This file contains six columns (Item_Code, Category, Section, Code_Brand, Sugar_Level, Sodium_Level). Item_Code is the code assigned to each item and the other columns represent how they have been classified and coded according to their characteristics. DOI: 10.5256/f1000research.12979.d188375 (ref. 16).

Dataset 4: Products. This file contains three columns (No, Item_Code, Product), where Item_Code represents the code assigned to each product and Product represents the product type without specifying the brand. DOI: 10.5256/f1000research.12979.d188376 (ref. 17).

Dataset 5: Brands. This file contains three columns (No, Code_Brand, Brand). Code_Brand represents the code assigned to each brand and Brand represents the brand of each product. DOI: 10.5256/f1000research.12979.d188377 (ref. 18).

Dataset 6: Categories. This file contains three columns (No, Code_Category, Category). Code_Category represents the code assigned to each category and Category represents the category assigned to the product. DOI: 10.5256/f1000research.12979.d188378 (ref. 19).

Dataset 7: Sections. This file contains three columns (No, Code_Section, Section). Code_Section represents the code assigned to each section and Section represents the section in which a product can be found. DOI: 10.5256/f1000research.12979.d188379 (ref. 20).

Dataset 8: Sugar. This file contains three columns (No, Sugar_Level, Code_Sugar_Level). Sugar_Level classifies the products by sugar content and Code_Sugar_Level represents the code assigned to each level. DOI: 10.5256/f1000research.12979.d188380 (ref. 21).

Dataset 9: Sodium. This file contains three columns (No, Sodium_Level, Code_Sodium_Level). Sodium_Level classifies the products by sodium content and Code_Sodium_Level represents the code assigned to each level. DOI: 10.5256/f1000research.12979.d188381 (ref. 22).
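
Because Dataset 3 stores only codes, reading it usually means joining it with the lookup tables in Datasets 5 to 9. The sketch below shows one way to do that with the Python standard library. It assumes the files are comma-separated with the column names listed above, which the article does not confirm. The lookup rows are taken from the tables in this article, while the item codes 9001 and 9002, their classifications, and the helper names `load_lookup` and `decode_products` are hypothetical.

```python
import csv
import io


def load_lookup(text, code_column, label_column):
    """Read a lookup table and return {code: label}."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row[code_column]): row[label_column] for row in reader}


def decode_products(characteristics_text, lookups):
    """Replace each coded column of the product table with its label."""
    decoded = []
    for row in csv.DictReader(io.StringIO(characteristics_text)):
        record = {"Item_Code": int(row["Item_Code"])}
        for column, table in lookups.items():
            code = int(row[column])
            # Keep unmatched codes visible instead of failing.
            record[column] = table.get(code, f"unknown ({code})")
        decoded.append(record)
    return decoded


# Small in-memory excerpts standing in for the downloaded files.
lookups = {
    "Category": load_lookup(
        "No,Code_Category,Category\n1,0,Groceries\n",
        "Code_Category", "Category"),
    "Section": load_lookup(
        "No,Code_Section,Section\n38,1039,Broth\n",
        "Code_Section", "Section"),
    "Code_Brand": load_lookup(
        "No,Code_Brand,Brand\n139,2138,Knorr\n141,2140,Maggi\n",
        "Code_Brand", "Brand"),
    "Sugar_Level": load_lookup(
        "No,Sugar_Level,Code_Sugar_Level\n1,Very low Sugar,1\n",
        "Code_Sugar_Level", "Sugar_Level"),
    "Sodium_Level": load_lookup(
        "No,Sodium_Level,Code_Sodium_Level\n3,Moderate sodium,3\n",
        "Code_Sodium_Level", "Sodium_Level"),
}

# Hypothetical product rows; the second uses a sodium code missing from the excerpt.
characteristics = (
    "Item_Code,Category,Section,Code_Brand,Sugar_Level,Sodium_Level\n"
    "9001,0,1039,2138,1,3\n"
    "9002,0,1039,2140,1,9\n"
)

decoded = decode_products(characteristics, lookups)
for record in decoded:
    print(record)
```

Returning a placeholder for unmatched codes, rather than raising an error, makes gaps between the product table and the lookup tables easy to spot during a first pass over the data.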

how to cite this article
Rodriguez-Montúfar F, Ordoñez-Buitron B, Duran D et al. A structured process to create datasets with nutritional information [version 1; peer review: 2 not approved]. F1000Research 2018, 7:110 (https://doi.org/10.12688/f1000research.12979.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 17 Apr 2018
Irina Kovalskys, International Life Sciences Institute (ILSI), Autonomous City of Buenos Aires, Buenos Aires, Argentina 
Not Approved
The authors had a good idea and implemented accordingly. Despite of that, scientific information is weak and reproducibility in not ensured with the information provided 

1. The goal of the study is not enough clear: Two phrases ...
HOW TO CITE THIS REPORT
Kovalskys I. Reviewer Report For: A structured process to create datasets with nutritional information [version 1; peer review: 2 not approved]. F1000Research 2018, 7:110 (https://doi.org/10.5256/f1000research.14074.r31839)
Reviewer Report 03 Apr 2018
Georgina Gómez, Departamento de Bioquímica, Escuela de Medicina, Universidad de Costa Rica, San José, Costa Rica 
Yadira Cortes Sanabria, Pontificia Universidad Javeriana, Bogotá, Colombia 
Not Approved
Even though the information obtained in this study could be used to further research, we consider that the simple attempt to provide a data base is not enough for publication. According to methodology, they should explain in detail about the ...
HOW TO CITE THIS REPORT
Gómez G and Cortes Sanabria Y. Reviewer Report For: A structured process to create datasets with nutritional information [version 1; peer review: 2 not approved]. F1000Research 2018, 7:110 (https://doi.org/10.5256/f1000research.14074.r31841)