Database history

From ECHSA Congenital Database
Jump to navigationJump to search

This document describes the technical database history

The database until 2003

The database was once using Lotus Notes, then it was moved to MySQL on the central server, while using Microsoft-based engine for local data collection.

The database change in 2003

The central database switched to the PostgreSQL RDBMS, which is the most advanced open source database engine.

The database structure

The old (up to year 2004) database structure diagram

This diagram shows the database structure. It uses standard database structure notation, except for that there are no dashed lines here.

See the article: database structure.

The history

Up to summer 2003 the central database has run MySQL and it's structure was the same as the structure of the echdd6's. The problems with the referential integrity pushed us to more advanced DBMS, and our choice was PostgreSQL]]. We also re-designed the structure, and made it more modern and flexible.

The data, when being imported, needed to be converted from old to new structure. There were a special piece of software that did it. The web interface avoided this problem, as it operated directly on the proper database structure.

The main ideas

Formerly the diagnoses, procedures, GPRF and NCA were stored in separate tables and separate interconnection tables. In new database structure they were generalized to be factors.

The idea of factors is that the given factor (in the database) is, or is not, connected with given object. For instance, patient P1 was or was not given the D1 diagnosis. In other words, for a given set of objects (e.g. patients) and a set of factors (e.g. diagnoses) there is a defined relation over the cartesian product of these two sets.

Diagnoses, procedures, complications, GPRF and NCA became the factor classes. There is also a new class, called "aristo", which is intended to contain Comprehensive Score (Aristotle Score) data. As the classes are represented by the rows of the factor_class table, new classes can be introduced in future, without any need to redesign the database's structure.

The dictionary tables

...have a "dict_" prefix, so they can be easily distinguished from the main tables. This is the listing of the database's tables:

ecdb=# \d
List of relations
Schema |          Name          | Type 
--------+------------------------+-------
public | dict_casecategory      | table
public | dict_continent         | table
public | dict_country           | table
public | dict_gender            | table
public | dict_is_reop           | table
public | dict_status            | table
public | factor                 | table
public | factor_class           | table
public | factor_subtype         | table
public | factor_type            | table
public | hosp_submitted         | table
public | hospital               | table
public | operation              | table
public | operation_cause_result | table
public | operation_factor       | table
public | patient                | table
public | patient_factor         | table
public | score                  | table
public | surgeon                | table
public | v_big                  | view 
public | v_oper_pat_diag        | view 
public | year                   | table
(22 rows)

You can see several tables and two "views". One of them is the emulation of the "BIG" table, which is described in next section.

The "BIG" table

outdated information:

Once upon a time, the computers were too slow to perform joins of many tables on the fly in acceptable amount of time. The programmers created a special join of several tables from the ECDB database and called it the "BIG" table. It was used, and it's still being used as the source of data for Gold standards reports. Before, it was created from MySQL tables, which don't exist now, and the "BIG" table is emulated in PostgreSQL, and then copied to MySQL, from where the (some of the) online reports are served.

The creation of the BIG table is a complex query, which involves many joins, conditional expressions and subqueries, and it's intended to produce exactly the same output as the former BIG-creating query. Unfortunately, due to its complexity, the output is only 99% the same, which means that there are always small differences between reports generated directly from PostgreSQL and reports generated from BIG.

2004 update: This table doesn't exist anymore. All reports are now driven by PostgreSQL alone.

The database change in 2005

The database undewent one more redesign. The admission was introduced as a new table in the database. More on this subject in User's manual of the EACTS Database Software.