Database refactoring
From Dokeos
As part of an on-going effort to refactor the whole of Dokeos, the database will be refactored as well. Changes can include the structure and use of the database, but as a first step we want to make several changes to the database fields. There is currently much discussion about the possibility of moving to an optimised single-database system.
In this page we will try to present and summarise the main areas of discussion. The decisions section contains the topics that we have reached an agreement on. Everyone is always free to take up a discussion on the Dokeos forum.
The database documentation on the Dokeos website might be a great source of inspiration for this refactoring.
An overview of all the new / removed database tables and fields can be found in database changes Dokeos 1.6
Subtopic 1: database, table, field names
We don't want every tool script to create its own database layer, constructing tables, naming fields etc. For one thing, this presents problems when e.g. a table or field name changes. For another thing, this is very errorprone, everyone has to constantly take into account multiple and single database configurations, etc.
Solve a problem once, and once only. Make this solution reusable.
The problem gets solved once, in the database library, codefile inc/lib/database.lib.php .
Solution for the database and table names
This problem is already solved: tool programmers include the database.lib.php code library and get access to lots of functions to get the correct database and table names. If the names change, we only have to change them in one place.
Solution for the field names
There is no solution yet, but we can create one: create constants in the database library; e.g. USER_ID_FIELD, COURSE_ID_FIELD, COURSE_FAKE_CODE_FIELD, CREATION_DATE...
and use these constants in queries:
$sql_query = "SELECT `".USER_ID_FIELD."` FROM ... ";
and in accessing the results
echo $result[USER_ID_FIELD];
Subtopic 2: Remove obsolete tables and fields
There has been little talk on this matter, but I think everyone agrees that some tables should be removed (I'm particularly aiming at the pma_* tables which were used at the time PhpMyAdmin was a key tool to the Dokeos admin interface). As for unused tables, there are some unused fields. The general idea 'Less is More' will be used, and there will be no unused fields left.
What if some users are using one of these fields for other purposes in their specific implementation? They will lose their information stored in these fields! We might think about an option in the installation/upgrade script that allows the platform administrator to do the cleanup or not (and of course stating which tables / fields are going to be deleted).
view forum thread: remove unused fields view forum thread: remove unused tables
Subtopic 3: Renaming of fields (and tables)
Many database table and field names are still the same as the older Claroline software. These names are often not real english but franglais (meaning a sort of frenchified english). Part of us are dutch-speaking (maybe most of us) and another important part is french-speaking. English is used everywhere in the Dokeos code and it is the developpers language (so use english variable names also). There is no reason why there should be some english-named fields and some french-named fields. All fields will thus be renamed to english, self-explanatory names.
- all database fields should be in English
- Each table should have a comment to describe what it's used for
- Naming convention for the table and field names: we always use under_score, not camelCase.
Subtopic 4: Resizing of fields
As many more companies are interested in Dokeos nowadays, there is a increasing amount of migrations. In order to realise those migrations, some text or ID fields could be enlarged to ease the job of migrators (there is often a need to enlarge some fields here and there and then we loose the ability to upgrade the database structure properly).
- For consistency database fields from different tables that contain the same entity will all be given the same size / length. We agree to enlarging the fields to get the same length
- consistency in the field types. For instance: COURS.link.category int(4) <-> COURS.link_categories.id int(6), cours_user.user_id int(11) < > user.user_id meidumint(8), faculte.id int(11) < > course.faculte varchar(12)
- never reduce fields in size
- we might consider switching from varchar to char (for performance). I attended a presentation of MySQL-officials and they also used char (where Dokeos would use varchar).
Subtopic 5: unique course and user key
As the unique identifier for the courses, some parts of the code use the code e.g. PSY101, and other parts use the course id (an integer). We should standardize on one of these two.
Same goes for the users: some parts of the code use the username (janprog), other parts use the id (5105). For the moment, the forum cannot reference the id, but this will be modified by the forum cleanup done by Thomas Berton.
Subtopic 6: multiple and/or single database modes
The current version of Dokeos has two possibilities for database design.
- First of all you have the "multi database design" where you have a main database (a list of the users, a list of the courses, ...) and besides this dokeos_main database each individual course has its own database where the course material is stored (the agenda of that course, the announcements of that course, the links, ...)
- Secondly you have a "single database design" which only uses one database. The course material is stored inside the dokeos_main database (each course having a separate table I think).
There is now a discussion going on if we should shift away from this dual database design and have only one logic database design. I guess everybody knows that the single database design is not very elegant so refactoring this would indeed be very useful. The hard discussion is now whether we should abandon the "multi database design"
The proposal
A "logical" database structure, by only having one table for each kind of data. Only one "course_description" table, which would contain the descriptions of all courses, and for each description the short ID of the course...
The reasons
It is all about performance: is there a large difference in efficiency between a proposed optimal single database mode and the current multiple database mode? In the multidatabase design each course has its own database which contains about 50 tables. MySQL creates a folder for a database and uses three files for a table (.frm, .MYD and .MYI). So if you have 1000 courses you have 1000 MySQL data folders each containing 150 files. As the information in the course database is really small you only get very small files. In the proposed design we would have only one database with about 70 tables (20 of the dokeos_main and 50 for the courses). This would result in having one folder and 210 files on the filesystem (independant of the number of courses you have). The files will get a lot bigger though.
More technical
- according to Olivier B and Yannick Warnier, MySQL keeps a pointer allocated in memory for every open table. If I understood well, 512 of such pointers might eat up to 2GB RAM. Concerning the pointers the number of open tables is limited by the mysql variable 'table_cache' (with default value of 64, depending on)
- using a multiple database design you have a lot of different connects to make (from dokeos_main to the course database and back again), where in the proposed design you would not have this.
- the selects will go over much more entries in the proposed design (as all of the announcements of all courses are stored in one announcements table).
- real life experience at Ghent University show no MySQL performance problems. Some data:
- courses: 2300 active courses and 2000 sleeping courses (of the previous academic year, databases are not deleted)
- MySQL performance: 30 logins/minute, an average of 3 mysql-connects/sec and 35 queries/sec.
- server: Dell dual processor (Xeon 2.4GHz) system with 4GB memory, approx. 500 GB disk space in a raid5 config, running on a redhat linux 2.4 kernel.
Final thoughts
- Dokeos has this dual database design for quite a while now.
- the number of tables for a course will drop significantly when the forum redesign is implemented (from 19 to 3 or 4 tables)
- Benchmarking is needed
- database corruption: When database corruption occurs in the case of the single database campus you might loose all the information of your campus. When it occurs in the multi-database campus you might loose only the course (if the affected database is not the dokeos_main database)
- quid tracking (which is a really huge database)
- if the number of files is a problem then we might consider a solution for the documents also. When you create a course you also created a folder for that course.
- Something what we cannot forget is that this is a major change and it is a lot of work to change everything. Is the difference in performance large enough (if any) to justify the huge refactoring effort, because this would probably mean changing every SQL query in the code?
Subtopic 7: Support for other databases
Currently Dokeos only supports MySQL. Can we generalise this to use also e.g. PostgreSQL or use ODBC? Doing this will be a lot of work, but this will alkso lead to a much cleaner separation of the business and database layer.
Some code to work with the ADOdb Database Abstraction Library has been posted on the forum.
Subtopic 8: Foreign key integrety refactoring
As many of you might know, the integrity in the Dokeos database is not something we can be proud of. Users are referenced by their username somewhere while they still have a unique ID, some courses are referenced by their course_code somewhere and by their course_id somewhere else. This has lead to misunderstandings or duplication of code in the database library. It also makes plugin facilities harder to develop, as the database structure is not quite as clear as it should be. Anyway, there is much to say about this, but only one thing to do: add foreign key integrity. Many of the fields to resize or replace are already mentionned in the database documentation. The fields will be discussed here also as this has a MAJOR impact on the Dokeos code.
Subtopic 9: unified table structure
The idea is that common fields are moved into a separate table. These common fields are
- visibility (visible or invisible)
- insert_date (when was the item inserted)
- inser_user_id (who has inserted the item)
- lastedit_date (when was it edited)
- lastedit_user_id(who has edited the item)
- lastedit_type (what has changed: for instance: made invisible)
- to_group_id (which groups are allowed to see the item)
- to_user_id (which users are allow to see the item)
- start_visibility_window (start of the time window)
- end_visibility_window (end of the time window)
- metadata_ref (metadata)
The Group based agenda is the first development that is developed in this way. For the moment the table lasttooledit was used but this table should be renamed to tool_item_properties or something like this.
Decisions
- No great changes are taken without first trying to reach a concensus, e.g. about the multiple/single database system
- Naming convention for the table and field names: we always use under_score, not camelCase. We will use singular where possible e.g. a user table holds entities of type user, a course table holds entities of type course, and so on.
- For consistency database fields from different tables that contain the same entity will all be given the same size / length. We agree to enlarging the fields to get the same length, and never reduce fields in size.
- we try to limit the size of the table/database names to 18 characters to make the names portable to other SQL servers
See also
- Database refactoring
- Database fields
- Database schema (Dokeos 1.5)
- Fulltext search on courses databases

