Databases store, organize, and process information in a way that makes it easy for us to go back and find what we’re looking for. We encounter data sets, both simple and complex, all the time, whether in the form of library card catalogs, financial records, or even contact directories. But what are databases in the context of a website?
In this quick guide to modern database technology, you’ll get an understanding of how databases work, common terms to know, a look at SQL vs. NoSQL, and how to determine which database is best for your web application.
A quick overview of modern database technology
Spreadsheets process numbers and databases process information—specifically, structured information. Databases can be designed to do just about anything with information, such as:
- Track, organize, and edit data
- Collect data and produce reports, or
- Be the foundation for information-rich, dynamic websites
The most common database technology today is the relational database. Relational databases store data in a normalized way—that means the data is split up into different tables to avoid redundancy. While relational databases have been around for quite a while now, they offer a versatile tool for both data storage and data management. Both user-facing applications with high demands on performance and reporting software can be backed by a relational database.
However, there are cases when relational databases might not be your first choice. One case is when the rigid structure of the data in a relational database does not fit the data that you want to store. For example, you might want to store a JSON document without a specified schema. It could, for example, be a configuration file or some form of user-generated data. This is when NoSQL databases are useful. The data is usually queried using an API or SDK instead of SQL, hence the name NoSQL. These databases usually also provide very fast access to the stored data. Instead of the database engine having to parse SQL and join together the data you specified in your query, NoSQL databases are tuned to instantly fetch the requested data through their API.
Increasing complexity: Single-file vs. multi-file databases
Take the contact directory, for example: It’s got items of information like names, addresses, and phone numbers, all organized in the same format—in this particular case, we’ll say it’s all collected on a single spreadsheet. In database terms, the spreadsheet with the information is the table, each person is a record, and their name, address, and phone number are all fields. The last name—how the directory is organized, alphabetically—is the key field, which sorts the records. Because the last name was chosen as the key field, that’s how the directory is sorted; sorting by phone number or house number wouldn’t generate what we were looking for and we’d never find anything out of the thousands of entries.
That’s one cornerstone of database technology: smart sorting.
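To make these terms concrete, here is a minimal Python sketch of the directory. The names, addresses, and phone numbers are made up for illustration, and the helper find_by_last_name is a hypothetical function, not part of any database product. It shows why sorting by the key field matters: a fast binary search only works on records that are sorted by the field you search on.

```python
import bisect

# The list is the table, each dict is a record, and each dict key is a field.
# All values below are made-up sample data.
directory = [
    {"last_name": "Nguyen", "address": "12 Oak St", "phone": "555-0142"},
    {"last_name": "Alvarez", "address": "98 Pine Ave", "phone": "555-0117"},
    {"last_name": "Smith", "address": "4 Elm Rd", "phone": "555-0199"},
]

# Sort the records by the key field (last name).
sorted_directory = sorted(directory, key=lambda record: record["last_name"])


def find_by_last_name(records, last_name):
    """Binary search; valid only because records are sorted by last_name."""
    keys = [record["last_name"] for record in records]
    position = bisect.bisect_left(keys, last_name)
    if position < len(keys) and keys[position] == last_name:
        return records[position]
    return None


print(find_by_last_name(sorted_directory, "Smith"))
```

Searching the same list by phone number would require checking every record, which is the problem the key field solves.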
But a directory is just a flat, single-file database. When would you need more complex databases with multiple tables that can interact with one another? Let’s say you want a shipment status update on an order you placed with an e-commerce site. That website has multi-file databases set up for orders, dates, payments, shipment tracking, inventory, suppliers, and customers. By linking these tables together, if a query is made about an order’s status, the database can generate a report with data from the tables:
“[Customer]’s order of [item] purchased with a [payment method] on [purchase date] is being shipped to [address] via [method], due to arrive [tracking date].”
This type of structure usually means that a relational database is the preferred choice. Querying a relational database to get the orders for a customer could look like this:
SELECT * FROM customers INNER JOIN orders ON orders.customer_id = customers.id WHERE customers.id = 1;
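If you’d like to see a join like this run end to end, here is a small sketch using Python’s built-in sqlite3 module with an in-memory database. The table layouts and sample rows are assumptions made for illustration; a real e-commerce schema would have many more columns. Note that in SQL the JOIN clause comes before the WHERE clause.

```python
import sqlite3

# In-memory database with two hypothetical tables and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        status TEXT NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (id, customer_id, status) VALUES
        (10, 1, 'SENT'), (11, 1, 'PENDING'), (12, 2, 'SENT');
""")

# Fetch every order that belongs to customer 1.
rows = conn.execute(
    """
    SELECT customers.name, orders.id, orders.status
    FROM customers
    INNER JOIN orders ON orders.customer_id = customers.id
    WHERE customers.id = ?
    ORDER BY orders.id
    """,
    (1,),
).fetchall()

print(rows)
```

The customer id is passed as a parameter (the ? placeholder) rather than pasted into the query string, which is the usual way to avoid SQL injection.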
Relational databases and database management systems: Building powerful data-driven websites
The foundation for modern database technology began in the 1970s with the first “relational data model.” Its emphasis was on careful organization. Today, relational databases remain important to how websites are built; any website that displays data from a database has to have:
- Server-side scripting
- HTML and CSS
- SQL, a database language
- A database management system (DBMS)
For example, if we go back to the e-commerce site above, a page on that site might display a customer’s orders by using the example SQL query. The page would be rendered to the user with HTML and CSS while the server-side script handles the connection to the database, runs the query, and returns the data to be displayed to the user. The database management system is the engine that runs the database and enables the server-side code to communicate with the database.
Here’s a quick look at how both SQL and DBMSs affect database technology:
Relational databases consist of two or more tables with connected information, each with columns and rows. These connected tables are called database objects; to create and manage them, you need a relational database management system (RDBMS). RDBMSs allow relational database developers to create and maintain a database program, including tools to:
- Query data
- Edit data
- Design the entire database structure
- Produce reports
- Validate data points and check for inconsistencies
They often include a built-in programming language, such as SQL, to automate some of these functions.
Querying for data in a relational database is, as mentioned, most commonly done using a variant of SQL. Using a query language such as SQL, you can fetch data based on the value of a specific column, join related data onto the result, perform advanced calculations, format the data in the way you prefer, and more.
Editing data can also be done using SQL. By executing a query, you can create rows, update rows, and delete rows based on criteria that you define. For example, if you want to update the shipping status of all orders that were placed on a specific date, you could use the below query:
UPDATE orders SET status = 'SENT' WHERE date = '2023-01-23';
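Here is a hedged sketch of that update running against a tiny in-memory SQLite database from Python. The orders table and its three rows are invented for the example, and the variable names are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, date TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, date, status) VALUES (?, ?, ?)",
    [
        (1, "2023-01-23", "PENDING"),
        (2, "2023-01-23", "PENDING"),
        (3, "2023-01-24", "PENDING"),
    ],
)

# Mark every order placed on the given date as sent.
cursor = conn.execute(
    "UPDATE orders SET status = ? WHERE date = ?",
    ("SENT", "2023-01-23"),
)
updated_count = cursor.rowcount  # number of rows the UPDATE changed

statuses = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
print(updated_count, statuses)
```

Only the rows matching the WHERE clause change; the order placed on a different date keeps its original status.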
Designing the structure of the database is also done using SQL. The structure of a relational database is made up of tables, columns, indices, and constraints. A table stores data of a certain type, for instance, orders in the e-commerce example above. Columns are the properties for the entities in the table. A table that holds orders might have columns like date, status, and customer_id. Indices optimize the way you can query the data. For example, if orders are usually queried for by the customer_id, you most likely want to define an index for the customer_id column. Constraints are rules for the data stored in the columns. For example, some columns might not be allowed to be left empty. For the orders example, customer_id again might not be allowed to be empty so that we make sure all orders are connected to a customer. Constraints are helpful to make sure no invalid data is allowed to be stored in the database.
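The pieces described above (a table, its columns, an index, and a constraint) can be sketched as follows. This uses SQLite through Python’s sqlite3 module; the index name idx_orders_customer_id and the sample values are assumptions for illustration, and other database systems use slightly different column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table and columns: one row per order.
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        date TEXT NOT NULL,
        status TEXT NOT NULL,
        customer_id INTEGER NOT NULL  -- constraint: may not be left empty
    );
    -- Index: speeds up lookups of orders by customer.
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);
""")

conn.execute(
    "INSERT INTO orders (date, status, customer_id) VALUES ('2023-01-23', 'PENDING', 1)"
)

# An order without a customer violates the NOT NULL constraint.
try:
    conn.execute(
        "INSERT INTO orders (date, status, customer_id) VALUES ('2023-01-24', 'PENDING', NULL)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, order_count)
```

The database itself refuses the invalid row, so the rule holds no matter which application writes to the table.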
SQL: The language of database access
Structured Query Language (SQL) is a standardized programming language for accessing and manipulating databases. In an RDBMS like MySQL, Sybase, Oracle, or IBM Db2, you use SQL to write statements that manage data and streamline data processing. SQL is like a database’s own version of a server-side script and is responsible for:
- Executing queries, which are “questions” asked of the database
- Retrieving data
- Editing data: inserting, updating, or deleting records
- Creating views
- Setting permissions
- Creating new databases
SQL is a standard programming language but has a number of variations—including some databases’ own proprietary SQL extensions.
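As one example from the list above, a view is a saved query that you can read from like a table. The sketch below assumes a hypothetical orders table and a view named sent_orders, and runs on SQLite via Python. Setting permissions is left out because SQLite has no user accounts; server databases typically handle that with GRANT statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO orders (id, customer_id, status) VALUES
        (1, 1, 'SENT'), (2, 1, 'PENDING'), (3, 2, 'SENT');

    -- A view stores the query, not a copy of the data.
    CREATE VIEW sent_orders AS
        SELECT id, customer_id FROM orders WHERE status = 'SENT';
""")

sent = conn.execute("SELECT id FROM sent_orders ORDER BY id").fetchall()

# Because the view is re-evaluated on each read, new data shows up automatically.
conn.execute("UPDATE orders SET status = 'SENT' WHERE id = 2")
sent_after_update = conn.execute("SELECT id FROM sent_orders ORDER BY id").fetchall()
print(sent, sent_after_update)
```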
Note: When it comes to creating new databases from the ground up, planning ahead is key. In the same way you need to plan ahead for the future of your site when choosing a software stack, how your database is structured from day one will have major implications for the health of your site down the road. Questions to consider include:
- What information will you have?
- How should it be stored?
- What data will your site need to retrieve regularly, and how?
When you have decided on the information that you will store in your database, it is time to define a data model or schema. This means you translate the different kinds of information into tables and columns and connect them to each other. The goal for the schema when you are working with a relational database is to make sure no information is stored twice in the database. Instead, one piece of information should be stored in its own table and referenced by other tables. One example of this is the orders table that references the customers table using the customer_id key.
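A short sketch of that idea: the customer’s address lives only in the customers table, and each order points to it through customer_id. The table layouts and sample data are assumptions for illustration. In SQLite, foreign key checks have to be switched on per connection with PRAGMA foreign_keys, which other database systems do not require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable reference checks
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, address TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
    INSERT INTO customers (id, address) VALUES (1, '12 Oak St');
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 1);
""")

# The address is stored once, so one update is seen by every order.
conn.execute("UPDATE customers SET address = ? WHERE id = ?", ("7 New Rd", 1))
shipping = conn.execute("""
    SELECT orders.id, customers.address
    FROM orders
    INNER JOIN customers ON customers.id = orders.customer_id
    ORDER BY orders.id
""").fetchall()

# An order that references a customer that does not exist is refused.
try:
    conn.execute("INSERT INTO orders (id, customer_id) VALUES (12, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(shipping, orphan_rejected)
```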
NoSQL databases: Non-relational and distributed data
Relational databases are great at organizing and retrieving structured data, but what happens when your data is inconsistent, incomplete, or massive? In these cases, you need a more flexible database solution. As the kinds and amounts of data that we gather have exploded, the NoSQL database has evolved to solve the challenges of Big Data. These databases are non-relational and distributed. They deviate from the traditional relational model, addressing the issue that most modern data harvested from the web is not structured information. NoSQL lends flexibility, scalability, and variety—major advantages from a business standpoint, when you consider that growing data is a direct result of a growing business.
How does a NoSQL database work? Instead of organizing data into tables, many NoSQL databases are document-oriented (others are built around key-value pairs, wide columns, or graphs). This way, non-structured data (such as articles, photos, social media data, videos, or an entire blog post) can be stored in a single document that can be easily found but isn’t necessarily categorized into a bunch of pre-set fields.
While NoSQL databases provide a lot of flexibility when developing modern applications, they might not always be your first choice. Relational databases with their strictly defined schemas enable easy querying of the data in almost any way you can think of. You can query any column in the database, join any related data onto the result, and use the result set in your application or export it as a spreadsheet. NoSQL databases are usually limited to querying by a predefined primary key that retrieves the entire document stored using the key. Some NoSQL databases do enable you to define multiple keys that you can query and sort by, but they still do not reach the level of query flexibility that a relational database offers. The upside of this technical design is that performance is highly optimized and can reach single-digit millisecond response times if you define your keys and queries according to best practices. Read more about the difference between SQL and NoSQL databases.
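To illustrate the access pattern, here is a toy document store written in plain Python. It is only a stand-in to show the idea of fetching a whole document by its primary key; the DocumentStore class, its put and get methods, and the sample keys are all hypothetical and do not match the API of any real NoSQL product.

```python
import json


class DocumentStore:
    """Toy in-process stand-in for a document database (not a real NoSQL API)."""

    def __init__(self):
        self._documents = {}

    def put(self, key, document):
        # Documents are stored whole, serialized as JSON, with no fixed schema.
        self._documents[key] = json.dumps(document)

    def get(self, key):
        # Lookup is by primary key only and returns the entire document.
        raw = self._documents.get(key)
        return None if raw is None else json.loads(raw)


store = DocumentStore()
# Two documents with completely different shapes can live side by side.
store.put("user:1", {"name": "Ada", "theme": "dark"})
store.put("post:42", {"title": "Hello", "tags": ["intro", "news"], "body": "..."})

print(store.get("post:42"))
```

Notice what is missing: there is no way to ask for “all posts tagged news” without reading every document, which is the query flexibility you trade away.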
An in-memory database is one that uses memory for data storage instead of on a disk or an SSD. The benefits of using this type of database include:
- Speed: In-memory databases don’t have to rely on disks or SSDs to retrieve data, so they are much faster.
- Cost-effective for the right workload: Because they respond so quickly, in-memory databases can serve a high volume of requests from fewer servers. Keep in mind, though, that memory typically costs more per gigabyte than disk or SSD storage.
Some disadvantages of using in-memory databases include:
- No one-size-fits-all: If you’re switching to in-memory computing, you can’t always simply swap out whatever you were using before; it will take some time and the help of an expert.
- Not a permanent data storage solution: You cannot expect the data to be persisted over long periods of time. A solution to this is to periodically persist the contents of the in-memory database to a more permanent place, such as a disk.
The use cases for these types of databases are those that require microsecond response times and see large spikes in traffic, such as real-time applications and caching layers.
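The trade-off between speed and permanence can be sketched with a tiny cache that keeps everything in memory and writes a snapshot to disk on request. The InMemoryCache class, the snapshot file name, and the sample session data are assumptions for illustration; production in-memory databases implement persistence in their own ways.

```python
import json
import os
import tempfile


class InMemoryCache:
    """Toy cache: reads and writes hit a dict; snapshot() persists it to disk."""

    def __init__(self, snapshot_path):
        self.snapshot_path = snapshot_path
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def snapshot(self):
        # In a real system this would run periodically, not by hand.
        with open(self.snapshot_path, "w", encoding="utf-8") as handle:
            json.dump(self.data, handle)

    def restore(self):
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as handle:
                self.data = json.load(handle)


snapshot_path = os.path.join(tempfile.mkdtemp(), "cache_snapshot.json")
cache = InMemoryCache(snapshot_path)
cache.set("session:abc", {"user_id": 1})
cache.snapshot()
cache.set("session:xyz", {"user_id": 2})  # written after the last snapshot

# Simulate a restart: a new instance starts empty and loads the snapshot.
restarted = InMemoryCache(snapshot_path)
restarted.restore()
print(restarted.get("session:abc"), restarted.get("session:xyz"))
```

Anything written after the last snapshot is lost on restart, which is why in-memory stores suit caches and sessions better than records you cannot afford to lose.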
Common database terms to know
Here are a few major database characteristics that are helpful to know when weighing one type of database against another, such as how databases grow, protect against failure, and duplicate data for speed, safety, and accessibility.
Scalability refers to the ability to scale out or scale up your database so you can hold more data without sacrificing performance. Some questions to think about when deciding how you scale include:
- How much do you expect your data to grow (and how soon)?
- Do you need a highly scalable database that will be reliable even as the amount of data you’re processing grows exponentially?
- Will one server be enough or do you anticipate needing to add additional ones?
- Do you need horizontal scaling or vertical scaling?
When considering scalability, make sure to research the possibilities that you have with your hosting provider. If you are using one of the largest cloud providers, you should be able to set up automatic scaling of your database from a single cheap instance for your development workloads to multiple large instances for your production workloads.
Related to horizontal scaling, sharding is a technique for storing massive databases across multiple servers. It achieves this by splitting the rows of a table across separate database instances, called shards. For instance, a database of customer names might store customers with last names starting with letters A through M on one shard, while N through Z are stored on another. Sharding can help minimize response times for queries while also allowing data to be stored across a large number of cost-effective servers.
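The A through M and N through Z split can be sketched in a few lines. Here each shard is just a Python dict standing in for a separate server, and the shard names and the functions shard_for, save_customer, and find_customer are hypothetical. Real systems often hash the key instead of using alphabetical ranges so that data spreads more evenly.

```python
def shard_for(last_name):
    """Pick a shard from the first letter of the last name."""
    first_letter = last_name[:1].upper()
    return "shard_a_m" if "A" <= first_letter <= "M" else "shard_n_z"


# Each dict stands in for a database running on its own server.
shards = {"shard_a_m": {}, "shard_n_z": {}}


def save_customer(last_name, record):
    shards[shard_for(last_name)][last_name] = record


def find_customer(last_name):
    # Only one shard has to be searched, never the whole data set.
    return shards[shard_for(last_name)].get(last_name)


save_customer("Alvarez", {"phone": "555-0117"})
save_customer("Nguyen", {"phone": "555-0142"})
print(find_customer("Nguyen"), sorted(shards["shard_a_m"]))
```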
Replication is the process of frequent copying of data from one database onto other databases on other servers. Using replication, you could, for example, have one database instance that is used for writing data and one instance that is used for reading data. This can improve the response times and capacity of your database significantly if your site needs real-time access to update and synchronize data.
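Here is a toy model of that read and write split, using two Python dicts in place of two database servers. The ReplicatedStore class and its method names are invented for the example; real databases ship changes to replicas continuously rather than through a manual call.

```python
class ReplicatedStore:
    """Toy model: writes go to the primary, reads are served by the replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, key, value):
        self.primary[key] = value

    def read(self, key):
        return self.replica.get(key)

    def replicate(self):
        # Copy the primary's current state onto the replica.
        self.replica = dict(self.primary)


store = ReplicatedStore()
store.write("order:10", "SENT")
before_sync = store.read("order:10")  # replica has not caught up yet
store.replicate()
after_sync = store.read("order:10")
print(before_sync, after_sync)
```

The gap between the write and the next copy is called replication lag, and it is the reason a replica can briefly return older data than the primary.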
Latency refers to the time it takes for data to complete a “round trip” between the database server and the application server. When an app queries its database for data, this is how long it takes the server to return that data. The lower the latency, the better, but low latency often comes at a cost to other features, like consistency and availability.
A simple way to ensure that the latency is as low as possible between your application and your database is to make sure your database server is as close to your application server as possible. Another way is to investigate how your application queries data in your database and optimize the database schema accordingly by using indices, for example.
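You can check whether a query benefits from an index by asking the database for its query plan. The sketch below does this in SQLite with EXPLAIN QUERY PLAN; the table, the index name idx_orders_customer_id, and the helper query_plan are assumptions for illustration, and the exact wording of the plan differs between SQLite versions and other database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)


def query_plan(connection):
    """Return SQLite's description of how it would run the lookup."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 1"
    ).fetchall()
    # The last column of each row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_after = query_plan(conn)   # with the index: a search using it

print(plan_before)
print(plan_after)
```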
When writing to a database, it’s important that changes to the data don’t violate the rules of the database. Consistency ensures that transactions don’t produce errors that can make the entire database invalid. A fully consistent system means that as soon as you successfully write a record to a database, you’ll also be able to request it. This is especially important for things like financial transactions. Consistency comes at a cost to speed and availability, however. That’s why many NoSQL databases opt for an eventually consistent model that allows for faster reading and writing.
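For a financial example, here is a sketch of a money transfer wrapped in a transaction, using SQLite from Python. The accounts table, the CHECK rule that balances cannot go negative, and the transfer function are assumptions for illustration. If any step breaks a rule, the whole transaction is rolled back and the data stays valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(connection, source, target, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # allowed: account 1 has enough money
failed = transfer(conn, 1, 2, 500)  # refused: would make the balance negative
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(ok, failed, balances)
```

In the refused transfer, the credit to account 2 had already been applied when the debit failed, yet the rollback undoes it, so no money appears out of nowhere.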
Availability refers to whether the system is able to quickly respond to a request, even when failures occur. For example, you could set up a secondary database instance that is periodically synced with your primary instance. This second instance could then take over as the primary instance should the original go down. The downside of this is that databases that are spread across multiple servers can result in out-of-date or incorrect data being displayed, especially in an eventually consistent system. Depending on your business needs, however, slightly out-of-date data may be preferable to delays that prevent the whole system from functioning.
Failures are inevitable, but there are plenty of contingency plans you can put into place to ensure that data is still available and your app doesn’t crash. Having “no single point of failure” ensures that an app can keep functioning without interruption, usually through replication or redundancy. Databases do this differently, with varying degrees of cost and footprint.
To help you evaluate which database to choose based on these characteristics, make sure to look for database freelancers on Upwork. There, you can find database professionals who have worked with many different database technologies.
New trends in database technologies in 2021
In some industries, the primary databases used have barely changed in the last decade. However, with the ever-increasing amounts of data we store, new ways to store and retrieve that data are needed. In this section, we will go through some of the newest trends we’ve seen.
Traditionally, database servers were run in data centers owned by the company using the database. In recent years, the public cloud providers’ solutions for hosting databases have become the standard. When hosting a database in the cloud, you do not need to worry about the infrastructure backing the database and you can utilize many different tools to scale, configure, and manage it.
Businesses use cloud databases to host information that multiple team members need to access, particularly in a hybrid workforce, and to expand their storage quickly instead of taking the time to buy new servers.
Database automation refers to different processes and tools used to make administrative tasks of a database more efficient. Instead of manually creating backups, scaling infrastructure, and updating the schema, there are many different tools to automate these types of tasks. Most cloud hosting providers include automation tools in their offerings.
When it comes to your business, an example of database automation is continually organizing your data when anything new is added. Another form of automation is running automatic backups to ensure your collection of data doesn’t go missing in case there is a server failure.
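As a small illustration of an automated backup, Python’s sqlite3 module has a backup method that copies a live database into another one. In this sketch both databases are in memory so the example is self-contained, and the run_backup function is a hypothetical wrapper; in practice the target would be a file and the function would be triggered by a scheduler, or you would rely on the backup tools your cloud provider offers.

```python
import sqlite3

# The live database (sample data is made up).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
source.execute("INSERT INTO orders (status) VALUES ('SENT')")
source.commit()


def run_backup(source_conn, target_conn):
    """Copy the full contents of source_conn into target_conn."""
    source_conn.backup(target_conn)


# In practice this would be a file on separate storage, not another in-memory database.
backup_conn = sqlite3.connect(":memory:")
run_backup(source, backup_conn)

backed_up = backup_conn.execute("SELECT id, status FROM orders").fetchall()
print(backed_up)
```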
Scaling database data for businesses
As your business scales, so will the data that you manage within your business. In the past, this might mean that you would simply add a new table to the huge relational database that holds all of your business data. Today, there might be better solutions depending on the type of data that you want to store. For example, you might need a NoSQL database for unstructured business data that you need to access in a highly performant way. This new database can then live in parallel with other specialized databases and the data can be accessed by different applications as they need it.
In practice, this could mean rethinking how you scale your data storage. For instance, your business could scale up (vertically) by adding more storage, memory, or processing power to an existing server. You can also choose to scale horizontally, meaning you add more servers and spread the data and workload across them.
Increase in database security
Information security has never been as relevant as it is today. The many leaks that we have seen from huge companies have impacted millions of people and increased the awareness of security concerns for the average person.
During the last few years, many new tools for database security have emerged. For example, many businesses rely on encryption provided by large cloud providers to help protect their data. There are also more traditional ways you can approach protecting your data, like limiting how many people have access to your data.
Now that you’ve got an overview of database technology, who do you need to help you build and maintain your database systems? Relational and non-relational database management systems can get extremely complicated and definitely require upkeep.
While anyone can manage a single-file database on their own through Microsoft Azure, you’ll want to hire a capable database architect to manage your RDBMS or NoSQL database management system. Explore database administrators on Upwork today.