Archive for September, 2009

Things to Consider When Buying Data Warehouse Tools

Wednesday, September 30th, 2009

Summary: Data warehousing tools are software applications that support the different stages of data warehousing, chiefly the ETL (Extract, Transform, Load) process. They extract data from operational systems, transform it, and load it into the data warehouse, where managers and other users draw on it for business decisions. Because ETL involves complex activities, choosing the right data warehouse tools matters.
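The three ETL stages can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the CSV text stands in for an operational system, an in-memory SQLite database stands in for the warehouse, and the `orders` table and its columns are hypothetical names, not part of any particular tool.

```python
import csv
import io
import sqlite3

# Hypothetical export from an operational system (stand-in for a real source).
SOURCE_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,carol,not_a_number
"""

def extract(text):
    """Extract: read raw rows from the operational source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy customer names, convert amounts, skip rows that fail."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real tool would log or quarantine the bad row
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean

def load(conn, rows):
    """Load: write the cleaned rows into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

The third source row is dropped during the transform step because its amount cannot be parsed, which is the kind of cleansing decision a commercial tool would let you configure.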

Companies that plan to build a data warehouse can buy data warehouse tools from third parties or develop their own, usually with in-house programmers. When the data transformation requirements are challenging, however, third-party tools are generally the more advantageous choice. When buying data warehouse management tools, keep a few things in mind:

- Functional capability: The tool should handle both the transformation and the cleansing parts of a data warehousing project. Consider a tool only if it is strong at both tasks.

- Ability to read directly from data source: A data warehouse gets its data from varied sources, so a tool that can read directly from each source makes processing faster and more efficient.

- Metadata support: A warehouse management tool must be able to handle metadata, because the metadata of a data warehouse is used to map the source data to its destination.
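To make the metadata point concrete, here is a small Python sketch of a metadata-driven mapping. The field names (`CUST_NM`, `ORD_AMT`, `ORD_DT`), the destination columns, and the `apply_mapping` helper are all hypothetical; real tools keep this kind of mapping in their own metadata repositories.

```python
# Hypothetical mapping metadata:
# source field -> (destination column, cleansing function).
MAPPING = {
    "CUST_NM": ("customer_name", lambda v: v.strip().title()),
    "ORD_AMT": ("order_amount", float),
    "ORD_DT": ("order_date", lambda v: v.strip()),
}

def apply_mapping(source_row, mapping):
    """Build a destination row by following the metadata, not hard-coded logic."""
    return {
        dest: cleanse(source_row[src])
        for src, (dest, cleanse) in mapping.items()
    }

source_row = {"CUST_NM": "  jane DOE ", "ORD_AMT": "42.50", "ORD_DT": "2009-09-30 "}
dest_row = apply_mapping(source_row, MAPPING)
print(dest_row)
```

Because the mapping is data rather than code, a new source field can be supported by adding an entry to `MAPPING` without touching `apply_mapping`.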

Many vendors sell data warehouse management tools. To make your search easier, here is some information about two popular ETL tools:

- IBM WebSphere DataStage is an ETL tool formerly known as Ardent DataStage and Ascential DataStage. It is part of the IBM WebSphere Information Integration suite and the IBM Information Server, and its visual interface makes it easy to use. The tool is available in several versions, including the Server Edition and the Enterprise Edition.

- Business Objects is a French company known for its enterprise software products. Its Data Integrator, a data integration and ETL tool, is a popular product that was previously known as Acta. The tool includes the Data Integrator Job Server and the Data Integrator Designer.

Online Analytical Processing For Data Warehousing

Friday, September 18th, 2009

Summary: Data warehouses have come to play an important role in organizations in recent years. They can be used by sophisticated enterprise intelligence systems that process the queries needed to discover trends and analyze critical factors in the marketplace. These systems are known as online analytical processing (OLAP) systems. In an OLAP system, the data in the warehouse is organized differently than in traditional transaction processing databases.

OLAP systems are designed to handle the queries an organization needs to discover trends and critical factors. Such queries typically require large amounts of data. OLAP data is organized into multidimensional cubes; an OLAP structure created from operational data is called an OLAP cube. The cube is created from a star schema of tables. In this type of schema, the fact table is placed at the center and linked to numerous dimension tables. The fact table contains the core facts that make up the query. Dimension tables indicate how the aggregations of relational data can be analyzed.
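A star schema like the one described above can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_store`, `dim_product`, `dim_time`), columns, and sample rows are assumptions made for illustration only; a production warehouse would have its own design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- The fact table sits at the center and references every dimension.
CREATE TABLE fact_sales (
    store_id   INTEGER REFERENCES dim_store(store_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    units_sold INTEGER,
    revenue    REAL
);
INSERT INTO dim_store   VALUES (1, 'Austin'), (2, 'Boston');
INSERT INTO dim_product VALUES (1, 'Hammer'), (2, 'Drill');
INSERT INTO dim_time    VALUES (1, 2009, 8), (2, 2009, 9);
INSERT INTO fact_sales  VALUES
    (1, 1, 1, 10, 100.0),
    (1, 2, 2,  2, 180.0),
    (2, 1, 2,  5,  50.0);
""")

# Aggregate a fact (revenue) along one dimension (store city).
revenue_by_city = conn.execute("""
    SELECT s.city, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
print(revenue_by_city)
```

Swapping the join to `dim_product` or `dim_time` gives the same revenue figures analyzed along a different dimension, which is the role the dimension tables play.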

The multidimensional cube structure gives better performance for OLAP queries than a structure in which data is organized in relational tables. The basic unit of a multidimensional cube is called a measure. Measures are the units of data that are being analyzed. Take the example of a corporation that operates hardware stores. Suppose it wants to analyze revenue and discounts for the different products it sells. In this case, the measures would be the number of units sold, the revenue and the sum of any discounts. These measures are organized along dimensions. A three-dimensional cube in this example would have time, store and product as its three dimensions.
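The hardware-store example can be sketched in Python as a dictionary keyed by the three dimensions, with the three measures stored in each cell. The sample records and the `build_cube` helper are hypothetical, and a real OLAP engine stores cubes far more efficiently than a plain dictionary.

```python
from collections import defaultdict

# Hypothetical sales records: (time, store, product, units, revenue, discount).
SALES = [
    ("2009-09", "Austin", "Hammer", 10, 100.0, 5.0),
    ("2009-09", "Austin", "Drill", 2, 180.0, 20.0),
    ("2009-09", "Boston", "Hammer", 5, 50.0, 0.0),
    ("2009-09", "Austin", "Hammer", 3, 30.0, 1.5),
]

def build_cube(records):
    """Aggregate the three measures into one cell per (time, store, product)."""
    cube = defaultdict(lambda: {"units": 0, "revenue": 0.0, "discount": 0.0})
    for time, store, product, units, revenue, discount in records:
        cell = cube[(time, store, product)]
        cell["units"] += units
        cell["revenue"] += revenue
        cell["discount"] += discount
    return dict(cube)

cube = build_cube(SALES)
cell = cube[("2009-09", "Austin", "Hammer")]
print(cell)
```

Each key is one intersection of the time, store and product dimensions, and the two Austin hammer records are combined into a single cell.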

Further, each dimension is divided into units called members and the members of a dimension are typically organized into a hierarchy. Similar members are then grouped together as a level of the hierarchy. For example, the top hierarchy level of a time dimension can be years, with months at the next level, then weeks, days and finally hours at the bottom level of the hierarchy. At each intersection of the three dimensions, the values for the measures that match those three dimension values are recorded.
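The hierarchy of a time dimension can be illustrated with a short Python sketch that rolls day-level revenue up to the month and year levels. The sample figures, the `LEVELS` table, and the `roll_up` helper are assumptions for illustration, and only three of the levels mentioned above are modeled.

```python
from collections import defaultdict
from datetime import date

# Hypothetical day-level revenue, the lowest level kept in this sketch.
DAILY_REVENUE = {
    date(2009, 8, 31): 40.0,
    date(2009, 9, 1): 100.0,
    date(2009, 9, 18): 60.0,
    date(2010, 1, 2): 25.0,
}

# Each level maps a day-level member to its ancestor at that level.
LEVELS = {
    "year": lambda d: (d.year,),
    "month": lambda d: (d.year, d.month),
    "day": lambda d: (d.year, d.month, d.day),
}

def roll_up(daily, level):
    """Sum day-level revenue into the members of the requested level."""
    totals = defaultdict(float)
    for day, revenue in daily.items():
        totals[LEVELS[level](day)] += revenue
    return dict(totals)

by_month = roll_up(DAILY_REVENUE, "month")
by_year = roll_up(DAILY_REVENUE, "year")
print(by_month)
print(by_year)
```

Moving from `"day"` to `"month"` to `"year"` groups similar members into progressively coarser levels, so the same measure can be read at whichever level the analysis calls for.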

The specific dimensions and measures chosen for the cubes in an OLAP system depend on the kinds of analysis the business needs. An OLAP system operates on OLAP data in data warehouses. The main reason for using OLAP in data warehousing is speed: OLAP systems give business analysts and managers throughout an enterprise rapid access to large amounts of performance data from different viewpoints.