Input file format
This section discusses how raw data should be organized so that
tunacell is able to read it. The user must convert the output of the
segmentation/tracking software to the accepted input format.
So far, full compatibility exists only for text-based input, which is described here.
The name of the main folder is taken as the label of the experiment, i.e. as a unique name that identifies the experiment.
The scaffold to be used in the main folder is:
    <experiment_label>/
        containers/
        descriptor.csv
        metadata.yml
If you executed the tunasimu script (see 10 minute tutorial), you can look in
the newly created directory tmptunacell in your home directory.
There should be a folder, simutest, storing data from the numerical simulations:

    $ cd simutest
    $ ls

Check that the structure matches the scaffold above.
There is a subfolder called
containers where raw data files are stored,
and two text files:
descriptor.csv describes the column organization of raw
data text files (see Raw data description),
metadata.yml stores metadata about the experiment
(see Metadata description).
Both files are needed for tunacell to run properly.
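As a quick sanity check before running any analysis, the scaffold can be verified with a few lines of Python. This is only a sketch using the standard library; the function name `missing_scaffold_items` is hypothetical and not part of tunacell.

```python
from pathlib import Path

# Items the scaffold described above is expected to contain.
REQUIRED_DIRS = ("containers",)
REQUIRED_FILES = ("descriptor.csv", "metadata.yml")


def missing_scaffold_items(experiment_dir):
    """Return the names of scaffold items missing from experiment_dir."""
    root = Path(experiment_dir)
    missing = [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    missing += [name for name in REQUIRED_FILES if not (root / name).is_file()]
    return missing
```

For the simulated dataset, calling `missing_scaffold_items(Path.home() / "tmptunacell" / "simutest")` should return an empty list when the structure is complete.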
Time-lapse data is stored in the
containers folder. If you ran the
10 minute tutorial you can check what you find in this folder:
    $ cd containers
    $ ls
You should see a bunch of
.txt files (exactly 100 such files if you stuck to
default values for the simulation).
Each file in this
containers folder recapitulates raw data of cells
observed in fields of view of your experiment, which have been reported
by your image analysis process.
Your experiment may consist of multiple fields of view (or even subsets thereof), and we call each of these files a container file. Within a given container file, cell identifiers are unique: there cannot be two different cells with the same identifier.
The container file is a tab-separated values file, and each column corresponds to a cell quantifier exported by the image analysis process. Each row represents one acquisition frame for a given cell. Rows are grouped by cell: if cell ‘1’ was imaged on 5 successive frames, there should be 5 successive rows in the container file reporting raw data about cell ‘1’.
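The grouping rule can be checked programmatically. The sketch below assumes the cell identifier column has already been extracted into a sequence; the function name is hypothetical and the check is not taken from tunacell itself.

```python
from itertools import groupby


def rows_are_grouped_by_cell(cell_ids):
    """Return True if every cell identifier occupies one contiguous run."""
    seen = set()
    for cell_id, _run in groupby(cell_ids):
        if cell_id in seen:
            # The identifier reappears after rows belonging to another cell.
            return False
        seen.add(cell_id)
    return True


print(rows_are_grouped_by_cell([1, 1, 1, 2, 2, 3]))  # True
print(rows_are_grouped_by_cell([1, 1, 2, 1]))        # False
```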
The column name and the data type of each column are reported in the
descriptor.csv file, a comma-separated values file in which each line
pairs a column name with its data type. Types are given as Numpy short names:
f8 means floating-point number coded on 8 bytes (this should be your default datatype for most quantifiers, except cell identifiers),
i4 means integer coded on 4 bytes,
u2 usually refers to the Irish band. For our purposes it also means unsigned integer coded on 2 bytes (this is the default for cell identifiers; it counts cells up to 65535, and can be upgraded to
u4, pushing the limit to 4294967295 cells; after that let me know if you still haven’t found what you’re looking for).
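The two limits quoted for cell identifiers follow directly from the number of bytes: an unsigned integer stored on n bytes can hold values up to 2^(8n) - 1. A short check (the helper name is hypothetical):

```python
def max_unsigned(n_bytes):
    """Largest value of an unsigned integer stored on n_bytes bytes."""
    return 2 ** (8 * n_bytes) - 1


print(max_unsigned(2))  # 65535
print(max_unsigned(4))  # 4294967295
```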
The following fields are mandatory:
cellID: the identifier of a given cell. In our example, cells are labeled numerically by integers, hence the type is
u2 (Numpy short name for an unsigned integer coded on 2 bytes);
parentID: the identifier of the parent of a given cell. This is mandatory for
tunacell to reconstruct lineages and colonies;
time: the time at which the acquisition was made. Its type should be
f8, i.e. a floating-point number coded on 8 bytes. The unit is left to the user’s choice (minutes, hours, or even the frame acquisition number, though this is discouraged since physical processes are independent of the acquisition period).
All other fields are left to the user’s discretion.
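To illustrate why cellID and parentID are enough to recover genealogy, the sketch below builds a parent-to-children map from (cellID, parentID) pairs read row by row. It is not tunacell's implementation; the function name is hypothetical, and using 0 as the parent of the first cell is an arbitrary choice made for this example.

```python
from collections import defaultdict


def children_by_parent(id_pairs):
    """Map each parent identifier to the sorted list of its children."""
    children = defaultdict(set)
    for cell_id, parent_id in id_pairs:
        # A cell appears on several rows (one per frame); the set removes repeats.
        children[parent_id].add(cell_id)
    return {parent: sorted(kids) for parent, kids in children.items()}


# Made-up pairs: cell 1 spans two frames, then divides into cells 2 and 3.
pairs = [(1, 0), (1, 0), (2, 1), (3, 1), (2, 1)]
print(children_by_parent(pairs))  # {0: [1], 1: [2, 3]}
```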
In the simutest experiment, one could inspect the descriptor.csv file:

    time,f8
    ou,f8
    ou_int,f8
    exp_ou_int,f8
    cellID,u2
    parentID,u2
In addition to the mandatory fields listed above, one can find the following fields:
ou, ou_int, exp_ou_int. These are explained in Numerical simulations in tunacell.
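The relationship between the descriptor and a container row can be sketched with the standard library alone. The code below is an illustration, not tunacell's reader: it assumes that container columns appear in the same order as descriptor lines and that container files have no header row. The function names and the row values are made up for the example.

```python
import csv
import io

# Map type codes to Python converters (a simplification: Python's int and
# float do not enforce the byte sizes that the Numpy codes describe).
CONVERTERS = {"f8": float, "i4": int, "u2": int, "u4": int}


def read_descriptor(text):
    """Parse descriptor lines of the form name,type into (name, type) pairs."""
    rows = (row for row in csv.reader(io.StringIO(text)) if row)
    return [(name.strip(), code.strip()) for name, code in rows]


def parse_container_row(line, descriptor):
    """Convert one tab-separated row into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(descriptor):
        raise ValueError("row does not match the descriptor")
    return {name: CONVERTERS[code](value)
            for (name, code), value in zip(descriptor, values)}


descriptor_text = (
    "time,f8\nou,f8\nou_int,f8\nexp_ou_int,f8\ncellID,u2\nparentID,u2\n"
)
descriptor = read_descriptor(descriptor_text)
# Made-up row, in the assumed column order of the descriptor.
row = parse_container_row("0.0\t0.5\t0.0\t1.0\t1\t0", descriptor)
print(row["time"], row["cellID"])  # 0.0 1
```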
Experiment metadata is stored in the
metadata.yml file, which is parsed using
the YAML syntax. The file can be split into documents (documents are
separated by ‘---’). Each document is organized as a list of parameters
(parsed as a dictionary). There must be at least one document whose
level entry is set to
experiment (or to its accepted synonym).
This document holds the top-level experimental metadata (date of experiment,
strain used, medium, etc.). A minimal example would be:
    level: experiment
    period: 3
which indicates that the acquisition time period is 3 minutes. A more complete metadata file could be:
    level: experiment
    period: 3
    strain: E. coli
    medium: M9 Glucose
    temperature: 37
    author: John
    date: 2018-01-20
When the experiment is designed such that metadata is heterogeneous, i.e. some fields of view get a different set of parameters, and one later needs to distinguish these fields of view, insert as many new documents as there are different types of fields of view. For example, assume our experiment is designed to compare the growth of two strains, and that fields of view 01 and 02 get one strain while field of view 03 gets the other. One way to do it is:
    level: experiment
    period: 3
    ---
    level:
        - container_01
        - container_02
    strain: E. coli MG1655
    ---
    level: container_03
    strain: E. coli BW25113
A parameter given in a lower-level document overrides the same experiment-level parameter, which means that the metadata above could be shortened to:
    level: experiment
    period: 3
    strain: E. coli MG1655
    ---
    level: container_03
    strain: E. coli BW25113
Here the strain is assumed to be
E. coli MG1655 for all container
files unless indicated otherwise, as is the case for
container_03, which gets the
E. coli BW25113 strain.
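The override rule can be written down as a small merge function. The sketch below works on documents that are already parsed into dictionaries, as a YAML parser would return them; it is meant to illustrate the rule, not to reproduce tunacell's code, and the function name is hypothetical.

```python
def resolve_metadata(documents, container_label):
    """Merge experiment-level parameters with container-specific overrides."""
    resolved = {}
    # First pass: experiment-level parameters act as defaults.
    for doc in documents:
        if doc.get("level") == "experiment":
            resolved.update({k: v for k, v in doc.items() if k != "level"})
    # Second pass: documents naming this container override the defaults.
    for doc in documents:
        level = doc.get("level")
        labels = level if isinstance(level, list) else [level]
        if container_label in labels:
            resolved.update({k: v for k, v in doc.items() if k != "level"})
    return resolved


documents = [
    {"level": "experiment", "period": 3, "strain": "E. coli MG1655"},
    {"level": "container_03", "strain": "E. coli BW25113"},
]
print(resolve_metadata(documents, "container_01")["strain"])  # E. coli MG1655
print(resolve_metadata(documents, "container_03")["strain"])  # E. coli BW25113
```

The `level` entry may be a single label or a list of labels, which is why it is normalized to a list before the membership test.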
Another option is to store metadata in a tabular file, such as comma-separated
values. The header should contain at least the level column.
The first row after header is usually reserved for the experiment level metadata,
and following rows may be populated for different fields of view. For example
the csv file corresponding to our latter example reads:
    level,period,strain
    experiment,3,E. coli MG1655
    container_03,,E. coli BW25113
Although more compact, this format can be harder to read or fill in as a text file.
When a container is not listed, its metadata is read from the experiment metadata. Missing values for a container row are filled with experiment-level values.
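These two fallback rules can be sketched with the standard csv module. The function names below are hypothetical, and the sketch does not claim to match tunacell's own parser; note that values stay as strings, since the csv module does no type conversion.

```python
import csv
import io


def read_metadata_table(text):
    """Return {level: parameters}, filling blanks from the experiment row."""
    rows = list(csv.DictReader(io.StringIO(text)))
    experiment = next(row for row in rows if row["level"] == "experiment")
    table = {}
    for row in rows:
        table[row["level"]] = {
            # Empty or absent cells fall back to the experiment-level value.
            key: (value if value not in ("", None) else experiment[key])
            for key, value in row.items() if key != "level"
        }
    return table


def metadata_for(table, container_label):
    """Unlisted containers fall back to the experiment-level metadata."""
    return table.get(container_label, table["experiment"])


text = (
    "level,period,strain\n"
    "experiment,3,E. coli MG1655\n"
    "container_03,,E. coli BW25113\n"
)
table = read_metadata_table(text)
print(metadata_for(table, "container_03"))
# {'period': '3', 'strain': 'E. coli BW25113'}
print(metadata_for(table, "container_01"))
# {'period': '3', 'strain': 'E. coli MG1655'}
```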
If you’d like to start analysing your dataset, your first task is to organize data in the structure presented here. Once that is done, you can try to adapt the commands from the 10 minute tutorial to your dataset. When you want more control over your analysis, have a look at Setting up your analysis, which explains how to set up the analysis, in particular how to define the statistical ensemble and how to create subgroups for statistical analysis. Then you can refer to Plotting samples to customize your qualitative exploration of data, and dive into Statistics of the dynamics to start the quantitative analysis.