Data import
This document describes the data import from a developer's perspective. It gives insight into the structure of the source code files and into the background workflow when the data_import Django management command is executed for a specific Django app.
Configuration of DataImportApp
The DataImportApp is the Django-app-specific implementation of the data_import custom management command. It specifies how to process the structured data file that is passed as an argument to the data_import command.
The class constant MAPPING_EXCEL_DB maps each column header of the structured input file to an attribute of the specified Django ORM class. The following listing shows an example:
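```python
# Class constants of an app-specific DataImportApp (excerpt).
DJANGO_MODEL = "collectedDatasets"
DJANGO_APP = "Datasets"
MAPPING_EXCEL_DB = {
    # column header: (ORM attribute name, related ORM class or None)
    "name": ("name", None),
    "applicationArea": ("applicationArea", ApplicationArea),
    "classification": ("classification", Classification),
    "focus": ("focus", Focus),
    "provider": ("provider", None),
}
```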
MAPPING_EXCEL_DB maps the columns name, applicationArea, classification, focus and provider to the attributes name, applicationArea, classification, focus and provider of the Django model class specified in DJANGO_MODEL, which lies inside the Django app DJANGO_APP.
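To illustrate how the mapping is read, the following sketch applies a MAPPING_EXCEL_DB-style dictionary to a single row. It assumes that each value is a tuple of (ORM attribute name, related ORM class or None), as in the listing above. The helper row_to_attributes is hypothetical and not part of the code base, and plain strings stand in for the related model classes so that the example runs without Django.

```python
from typing import Any, Dict, List, Optional, Tuple

# Column header -> (ORM attribute name, related ORM class or None).
# Strings replace the real model classes so the sketch runs without Django.
MAPPING_EXCEL_DB: Dict[str, Tuple[str, Optional[str]]] = {
    "name": ("name", None),
    "applicationArea": ("applicationArea", "ApplicationArea"),
    "classification": ("classification", "Classification"),
    "focus": ("focus", "Focus"),
    "provider": ("provider", None),
}


def row_to_attributes(
    header: List[str],
    row: List[Any],
    mapping: Dict[str, Tuple[str, Optional[str]]],
) -> Dict[str, Any]:
    """Hypothetical helper: turn one row into {ORM attribute name: cell value}."""
    attributes: Dict[str, Any] = {}
    for column, value in zip(header, row):
        if column not in mapping:
            continue  # columns without a mapping are ignored in this sketch
        attribute_name, _related_class = mapping[column]
        attributes[attribute_name] = value
    return attributes


header = ["name", "applicationArea", "provider"]
row = ["Example dataset", "Energy", "Example provider"]
print(row_to_attributes(header, row, MAPPING_EXCEL_DB))
# {'name': 'Example dataset', 'applicationArea': 'Energy', 'provider': 'Example provider'}
```

The related class is unpacked but unused here; in the real import it would be needed to instantiate objects for ManyToManyField attributes.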
Note
In this example the column names and the attribute names happen to be identical, but that is not always the case. If the column names in your structured data file differ, you can update the column names (the keys of MAPPING_EXCEL_DB) accordingly. If you want to change the names of the Django ORM attributes, please check the dev-guide.
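When the column headers of a file differ from the attribute names, only the keys of the mapping have to change, while the attribute names in the tuples stay the same. The sketch below uses made-up column headers such as "Dataset name" and a hypothetical helper missing_columns that reports mapped columns absent from a file's header. Whether the data_import command performs such a check itself is not stated here; this is only an illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical file whose column headers differ from the ORM attribute names:
# only the keys (column headers) change, the attribute names stay as they are.
MAPPING_EXCEL_DB: Dict[str, Tuple[str, Optional[str]]] = {
    "Dataset name": ("name", None),
    "Application area": ("applicationArea", "ApplicationArea"),
    "Classification": ("classification", "Classification"),
    "Focus": ("focus", "Focus"),
    "Provider": ("provider", None),
}


def missing_columns(
    header: List[str], mapping: Dict[str, Tuple[str, Optional[str]]]
) -> List[str]:
    """Return the mapped column headers that do not occur in the file header."""
    present = set(header)
    return [column for column in mapping if column not in present]


header = ["Dataset name", "Application area", "Classification", "Focus"]
print(missing_columns(header, MAPPING_EXCEL_DB))
# ['Provider']
```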
Each DataImportApp has a getOrCreate method, which contains the logic for importing the structured data of the data file into the Django ORM model. It receives the data as a list of rows and the header as a list of header-name strings.
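The following sketch shows the shape of these two inputs. It reads CSV text with Python's standard csv module purely as a stand-in for the actual file reader (given the name MAPPING_EXCEL_DB, the real command presumably reads Excel files), and the function split_header_and_rows is hypothetical.

```python
import csv
import io
from typing import List, Tuple


def split_header_and_rows(text: str) -> Tuple[List[str], List[List[str]]]:
    """Hypothetical helper: return (header, data) in the shape getOrCreate receives."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # drop blank lines
    if not rows:
        return [], []
    return rows[0], rows[1:]


text = "name,provider\nDataset A,Provider A\nDataset B,Provider B\n"
header, data = split_header_and_rows(text)
print(header)  # ['name', 'provider']
print(data)    # [['Dataset A', 'Provider A'], ['Dataset B', 'Provider B']]
```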
These two data structures are then processed, and objects of the corresponding ORM model are instantiated. The process can be described as follows:
1. Process the row data and bring it into dictionary form, where each key of the dictionary corresponds to the attribute of the ORM model the value should be imported into. If a field is a ManyToManyField, objects of the corresponding ORM class are instantiated.
2. Instantiate the ORM class from the dictionary. This creates an object of a type such as Tools or Dataset, but it won't be saved in the database until save() is called on the object.
3. Search the database for an item with the given name.
4. Compare the fields of the newly created object with those of the object already present in the database. If they are not equal, save the new object with the id of the old object and remove the old object from the database. Furthermore, update the ManyToManyField relations to match those of the new object. Serialize the old object into JSON and save it in the History model.
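The steps above can be simulated without Django to make the control flow explicit. In this sketch a dataclass stands in for the ORM model, a dict keyed by id stands in for the database table, and a list of JSON strings stands in for the History model; the names Item, import_row, database and history are hypothetical. ManyToManyField handling is reduced to a plain list of related names, and step 1 is assumed to have already produced the attribute dictionary.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Item:
    """Hypothetical stand-in for an ORM model such as Dataset."""
    name: str
    provider: Optional[str] = None
    focus: List[str] = field(default_factory=list)  # stand-in for a ManyToManyField
    id: Optional[int] = None


database: Dict[int, Item] = {}  # stand-in for the model's database table
history: List[str] = []         # stand-in for the History model


def import_row(attributes: Dict[str, Any]) -> Item:
    """Import one row whose attribute dictionary was built in step 1."""
    # Step 2: instantiate the object; it is not stored in `database` yet.
    new_item = Item(**attributes)
    # Step 3: look for an existing item with the same name.
    old_item = next((i for i in database.values() if i.name == new_item.name), None)
    if old_item is None:
        new_item.id = max(database, default=0) + 1
        database[new_item.id] = new_item  # corresponds to calling save()
        return new_item
    # Step 4: compare all fields except the id.
    old_fields = {k: v for k, v in asdict(old_item).items() if k != "id"}
    new_fields = {k: v for k, v in asdict(new_item).items() if k != "id"}
    if old_fields == new_fields:
        return old_item  # nothing changed, keep the stored object
    # Store the new object under the old id, replacing the old object;
    # its `focus` list (the "relations") comes from the new object.
    new_item.id = old_item.id
    database[old_item.id] = new_item
    # Serialize the old object to JSON and keep it in the history.
    history.append(json.dumps(asdict(old_item)))
    return new_item


import_row({"name": "Dataset A", "provider": "Provider A", "focus": ["Energy"]})
import_row({"name": "Dataset A", "provider": "Provider B", "focus": ["Energy"]})
print(database[1].provider)  # Provider B
print(history[0])
# {"name": "Dataset A", "provider": "Provider A", "focus": ["Energy"], "id": 1}
```

Comparing by field values and skipping unchanged rows means re-importing the same file leaves both the table and the history untouched.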
Update of an item
When using the bulk import with a structured data file, the items in the database are automatically updated to the state of the data in the data file. The old state of each item is saved as a stringified JSON object in the History model that each app provides. This makes it easy to roll back to the old state if the data import did not work as expected.
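Because the old state is stored as a stringified JSON object, a rollback essentially means parsing that string and writing the values back. The sketch below is independent of Django: a dictionary stands in for the current database record, and the function restore_from_history as well as the field names are hypothetical. Any additional fields the History model may have are not covered.

```python
import json
from typing import Any, Dict


def restore_from_history(current: Dict[str, Any], history_entry: str) -> Dict[str, Any]:
    """Return a copy of the current record with the stored old values written back."""
    old_state = json.loads(history_entry)  # parse the stringified JSON object
    restored = dict(current)               # leave the input unchanged
    restored.update(old_state)
    return restored


current = {"id": 1, "name": "Dataset A", "provider": "Provider B"}
entry = json.dumps({"id": 1, "name": "Dataset A", "provider": "Provider A"})
print(restore_from_history(current, entry))
# {'id': 1, 'name': 'Dataset A', 'provider': 'Provider A'}
```

In a Django project, the equivalent step would set these values on the model instance and call save() on it.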
The following section describes the process of the data_import command and how the update works in general.