Content:
Information technology is largely based on exact sciences; so, most of the terms have single interpretation. But still, there is no generally accepted definition of a database. And that means it's time to turn to the glossary. In our case, to Oracle. This is one of the largest manufacturers of server hardware; since 1977 they have been developing Oracle Database, the object-relational DBMS. For the last 10 years Oracle has been leading in this area. So, they know their business.
"A database is an organized collection of structured information, or data, typically stored electronically in a computer system. A database is usually controlled by a database management system (DBMS).
Data within the most common types of databases in operation today is typically modeled in rows and columns in a series of tables to make processing and data querying efficient. The data can then be easily accessed, managed, modified, updated, controlled, and organized." Oracle.com
The most obvious analogy to relational databases, I think, is a library section, where book racks are database tables, book shelves are columns, and books are individual records. All this has logic, structure, and is interconnected. All literary works are divided by periods, countries, genres, writers and other parameters. At the same time, in the next section, the book storage can be organized differently (another database).
Storing information in databases is also logical, structured and solves specific problems.
Now remember some bookcases in a cafe or at your grandparents. There are few books, some authors stand by a certain logic; but in general it is chaotical. Anyone can take a book from the shelf, and after thumbing through it, put it back randomly. You can lend a book to someone and completely forget about this. These aspects are the main difference between a database and the usual data storage on an HDD.
- Where do you have Freud's works?
- In one of the boxes on the floor, somewhere next to "Fifty Shades of Grey" and "Harry Potter".
So, a database is a large amount of ordered information you can work with according to certain rules.
Now let's consider another important concept, a database management system (DBMS).
I need your book, leather bastard.
Let's continue the analogy. We agreed that databases are similar to book sections. Then DBMS are librarians, without whom a library cannot exist. Databases in a vacuum don't make sense. If no one ever visits the library, then you get an abandoned book storeroom.
That is why databases also need a database management system.
DBMS features |
Librarian duties |
Giving users access to the database |
Opening the library in the morning |
Installing database protection |
Closing the library in the evening, keepeing it silent during the day |
Filling and editing the database |
Making orders for new literature, books disposing |
Data sorting |
Processing of incoming literature, books disposing, putting things in order |
Extracting information from the database, logging |
Lending books and filling out library cards |
When a large application works with data, it needs instructions, logic, and APIs. Yes, you can store data right in the memory, like on your PC, but if the volume increases drastically, this becomes inefficient.
"A database typically requires a comprehensive database software program known as a database management system (DBMS). A DBMS serves as an interface between the database and its end users or programs, allowing users to retrieve, update, and manage how the information is organized and optimized.
A DBMS also facilitates oversight and control of databases, enabling a variety of administrative operations such as performance monitoring, tuning, and backup and recovery." Oracle.com
In summary, a DBMS is a program that is needed to create, manage, and otherwise interact with a database.
Major vendors and their products.
There are at least fifty database types, and much more specific solutions. The topic is so extensive that you can devote a couple of lives to it :)
And I will only mention the two most common DBMS classes:
Relational, SQL — Structured Query Language.
All data is interconnected and distributed in pre-created tables (book rack). All tables have columns (book shelves) and rows (books). In general, everything is ordered like in a real library. That is why such databases are often called tabular.
Pros: integrity, safety (compliance with ACID requirements) and structured data, a rich SQL background gives great functionality and does not depend on a specific DBMS.
Cons: complexity, poor sparse data optimization, problems with multiple execution of the same queries (N+1).
Examples: SQLite, MySQL, PostgreSQL, Microsoft SQL Server.
As you may have guessed from the name, there are no tables, rows and columns. In NoSQL, the data is optimized more subtly, for a specific task, and has a flexible untyped schema. Need to add random data to a document or field? Feel free. This approach allows you to work more efficiently with sparse data. Non-relational databases can be divided into two types: network and hierarchical.
Pros: high data processing speed, work with unstructured data (any kind), decentralized systems, easier and cheaper scalability than with SQL, advantages in sharing and replication.
Cons: binding to a specific DBMS, limited languages and APIs, unexpected errors in the database development (since a database schema is not required), migrating between NoSQL databases can be a quest with obstacles, you will need proprietary tools to work with the database.
Examples: MongoDB, CouchDB, GemFire, Cassandra.
Prefix "No" in the name of NoSQL, as well as the fundamental differences with SQL, might make you think that system architects are in the agonies of choosing. But in fact, these types of databases solve different tasks.
Choose SQL if you work with structured data.
Choose NoSQL if your data is unstructured.
To make a DB and DBMS work, you need resources and infrastructure like premises, server(s), administrators, software and other equipment. Some non-professionals combine all this into one concept, a database.
Don't do like this ¯\_(ツ)_/¯
When we talk about the complex of all components, including infrastructure, we should refer to "database system". In a narrower sense, a database system is a database, DBMS, and related applications.
Let us summarize.
DB is a book section in a library.
DBMS are librarians.
A database system is the whole library.
As a company grows, so does the number of employees and the amount of data. In a classical data storage scheme the file stays where it is "put" untill you retrieve it, but databases additionally solve such tasks as structuring, processing, analysis, simultaneous access and data security. Many processes are automated, and users do not even know how everything works.
In general, nowadays almost everything in IT world is based on database systems:
Websites, online stores, blogs, social networks and other resources. (By the way, this article is stored in the database, from where it is loaded for reading.)
Any services, websites and applications where you need to sign in, since this data must be kept safe and not mixed with each other.
Operating systems.
Let's imagine a situation:
A team of architects began working simultaneously on a 5 GB building project. They need to make changes in the project, and these changes must be displayed in real time for every team member without causing collisions (consolidated building model).
Here we need a database system with a client-server DBMS. Developers (and other pros) have a resource management method that illustrates well what would happen without a database system. It is called COW (Copy-on-Write) — when reading data, the original is used (a copy is not needed), but when changes are made, a new instance (snapshot) is created that does not affect the source data:
Simultaneous editing according to the COW principle.
In practice, the workflow will be as follows:
An architect opens the project file from the local disk; now their colleagues can view the original, but not change it.
The architect edits the project; colleagues do not see changes in real time, but only the original file version.
One of the colleagues also started making changes to the project; the API creates a new file instance that is not linked to the original. All changes will be saved in a new file.
The architect saves their changes to the original file and closes it.
Now, if the colleagues reopen the project, they will see the latest changes.
The one who opens the file first will work with the original.
In some programming scenarios the COW method can optimize resources, but in the example above this logic is inconvenient because the work slows down. You can get confused with file versions; it is difficult to synchronize changes, which will probably lead to fatal inconsistencies (collisions) in the building design, etc.
A more functional alternative to a PC and project on a local disk is a client-server database system. Let's consider BIM system as an example.
In the center we have a Server that connects users, BIM system and all data, creates backups, provides and/or restricts access to client applications, and so on.
All team members can work simultaneously without interfering with each other. There is only one project file in the current version (excluding backups). A centralized BIM model will prevent collisions. In general, here we see only advantages.
Information security is a headache for millions of professionals, engineers and system architects around the world. Cybercrimes are increasing every year, and the main problem lies in the trilemma of security, functionality and availability (the so-called Anderson paradox):
The more secure a database is, the less functional and accessible it is to the end user.
The more functional and accessible a database is, the less secure it is.
So, database security is a big trade-off. But you need it to avoid unpleasant situations like Facebook (Meta) and other companies, when personal data of thousands and millions of users leak into the network. And, as you know, everything that gets into the network remains there forever.
Well, security is a complex measure and is organized at several levels:
Physical level security. This includes secure locations of database server, as well as reliable hardware. A lot of articles are written about this, but in short, organize an ACS-protected room, choose equipment from trusted brands, and provide redundancy. Your second option is cloud, but in this case, forget about complete privacy and data security.
Restriction of rights and access control. To ensure maximum security, you should provide the database access only to those who really need it. And the rights must be restricted to the minimum set necessary to complete the tasks. Remember Anderson's paradox? This is how it looks in practice.
Monitoring. Even the most secure systems can be subject to attacks and other malicious activities. Pre-configured monitoring systems can automatically send alerts about abnormal events and suspicious actions with data.
Encryption. In the cybercrime era, sniffering is one of the most popular way of data theft. So, all stored and transmitted data must be protected by encryption. It is very important to follow the rules for Encryption Key Management. If you lose your keys, you lose your data.
Database software security. Some people use pirated and/or outdated software. Security patches are released not only to create headaches for your IT team.
Node security. The entire chain is judged by the weakest link. That is why attackers are looking for such links. If you have many devices connected to the database, such as a web server or an application server, then each of them should be continuously tested for security and vulnerabilities.
Backup security. Often, all security efforts go for the main system, but don't forget about backups. Attacks on them can cause no less harm and damage to the enterprise.
A database is created and managed by special software, DBMS. Together, they form a database system, which affects most applications and business processes performance.
When designing database systems, errors can become a big problem, especially in case of relational databases. Unfortunately, avoiding them is difficult, given that companies use database systems for many years.
Most likely you will choose either relational or non-relational DBMS. In some cases, large enterprises use hybrid systems to optimize performance.
You choose database hardware based on your task. There are many aspects, but special attention should be paid to disk subsystem, connectivity, RAM and, of course, CPU.
And don't forget about database security; this will save you and your data a world of problems.