Data Warehouse Thinking and Practice (1) – Why We Should Learn Data Warehouse

I used to briefly understand big data-related technologies such as Hadoop and Spark, but I always felt that there was nothing missing. I didn’t know where to start. How to apply it to practice. Until a while ago, I accidentally discovered “The Data Warehouse Toolkit,” The book “3rd Edition” is very enjoyable after reading. Many problems that are lingering in my mind are solved, and some inspirations have been solved to solve the problems encountered in the current business.

In this context, I intend to combine my own experience, and continue to write a series of articles on the data warehouse, recording my practice and thinking in the data warehouse. As the beginning of a series of articles, this article mainly talks about my understanding of the data warehouse from a macro level. If there are any flaws, I welcome you to enlighten me. The following is the main content of this article:

  1. What is a data warehouse?

  2. The relationship between data warehouse and big data

  3. Transactional system and data warehouse system

  4. The process of building a data warehouse

  5. Why do we say that we should learn a little about data warehousing?

What is a data warehouse?

Data warehousing is nothing new. It has been around for decades, dating back to the 1960s, and then by the data warehouse father Bill Inmon and the 1980s proposed a widely accepted data warehouse definition:

A data warehouse is a collection of data that supports subject-oriented (Non-Volatile), integrated (Integrate), and Time Variant to support management decisions.

The core of this sentence is that the purpose of the data warehouse is to support management decisions.

Without the support of the data warehouse, usually the management decisions are more dependent on the manager’s feelings or experience, and there is a great possibility that the decision-making mistakes will cause significant losses.

With the data warehouse, we can make more reasonable analysis through data, combined with historical records, multi-dimensional analysis, help us make more accurate judgments, thus helping companies seize opportunities, reduce losses, and maximize revenue. Recently, the concept of fire data-driven is also inseparable. By fully exploiting the value of data, data can help us make decisions.

The relationship between data warehouse and big data

There was no big data concept when the data warehouse was born. So in essence, data warehouses are not necessarily related to big data, and the existence of data warehouses does not depend on big data.

However, with the development of the Internet, people are generating more and more data. TB PG-level data is becoming more and more common. Traditional technologies cannot handle such a large amount of data. Therefore, data warehouses also need to introduce big data technology as a data warehouse. A loop to solve the large number of storage and analysis problems, so that the data warehouse is still efficient under the large amount of data.

Personally, I think data warehouses are more of a set of ideas and architectural methods, regardless of the amount of data, and big data technology is to serve this set of ideas and architecture.

Transactional system and data warehouse system

The transactional system is also the OLTP application system. It is the system we contact the most. The system is designed to meet fast transaction processing, such as shopping, buying tickets, and killing. Each request processed by the system generally involves only one or more resources, and can complete the request processing and return the result in a very short time.

However, when it comes to complex aggregate statistical queries, such systems are more difficult to deal with. For example, to query certain types of users to purchase the most products in the past three months, because the large amount of data needs to be queried at the same time, the OLTP system is not good at processing. This type of demand.

But the data warehouse is different. The data in the data warehouse is organized by theme. The storage is optimized. The data of each topic is serviced for the analysis of this topic. It is born to carry on complex aggregation. And statistics, so the data warehouse system can respond to a variety of complex statistical queries in a shorter time.

The so-called theme here can be sales, and can be user behavior. Through the analysis of sales, we can know when those products have the highest sales to those people. By analyzing the behavior of users, we can know which people are most commonly used for those functions, the highest viscosity, and the like.

The process of building a data warehouse

The construction of the data warehouse has some best processes and practices. The correct process can make the data warehouse construction more efficient and reduce the cost. Building a data warehouse system mainly involves the following steps.

Collect business requirements (Business Requirements)

As we said earlier, the purpose of data warehousing is to support management decisions, so the first thing we need to consider when building a data warehouse is to collect business needs, understand the business processes of the enterprise, and grasp the core requirements. To provide guidance for the establishment of the data warehouse, in order to provide better decision data support for the business, and maximize the value of the data warehouse.

Data modeling

Once you understand the business in depth, you can start building a data model because a company may have multiple product lines, and a product typically involves multiple business processes. Therefore, to establish a data model, you first need to select a business process, and then select appropriate data and reasonable latitude for this process to model. The rationality of modeling directly affects the effect of data analysis and the scalability of the system in the future.

Because the analysis needs are always diverse and unpredictable, only by establishing a suitable data model can we more easily face the ever-changing analysis needs in the future. The importance of modeling can be seen.

Data processing and processing

After the data modeling is completed, the data can be processed and processed. In the data warehouse, we call it the ETL (Extraction, Transformation, and Loading) process. The main task is to extract data from various business systems or log systems, and do the necessary The conversion ensures that the data is correct and then loaded into the data warehouse system for later query and analysis.

This process varies greatly depending on the business and the amount of data. For small systems, there may be several SQL statements, a simple script can be done, and then for large systems (large amounts of data), we may need to use a big data framework such as Hadoop, Spark to do well.

Data query and analysis

After the previous process is completed, the data query can be analyzed. This is the final stage and the value generation stage.

Data analysis can be done in the query language provided by the data warehouse system, or you can write Business Intelligence applications based on these query languages ​​to provide visual reports to support management decisions.

Why do we say that we should learn a little about data warehousing?

I learned that the data warehouse is because of the problem of statistical analysis of data in the business, so I tried to find a better solution. After finding out the clues, I finally found some clues, and then I learned about the data warehouse and started to learn. Convinced by his thoughts, can not extricate themselves.

By modeling and reorganizing data, data warehouses make data easier to be statistically analyzed and analyzed, providing a new idea and concept for statistical analysis of data. Even if we don’t use the data warehouse to do the more general purpose mentioned above, the data warehouse can provide us with a different idea, which can be used in the system design process to make the system simpler and more efficient.

Therefore, learning a data warehouse does not necessarily require building a data warehouse system, and may also provide different ideas and inspirations for the design of business systems. So maybe you can also take a look and learn a lesson~

This article is all about this, in general terms, the next article will write about data warehouse data modeling related things.


This article first appeared in the “code writing poetry” WeChat public number: