Session 7-10. Modern IT Architecture

Document Sample
Session 7-10. Modern IT Architecture Powered By Docstoc
					PART Ⅱ Introduction to the
 Foundation of Information
      Technologies


        Session 8-9
Data Resource Management

       HUANG Lihua, Fudan University
                 Content
• Database Concepts & Technology
   – Experiment: ACCESS
• Database Trends
• Data Trends of Application
   – Data Warehouse
   – OLAP
   – DATA Mining
• Creating Database Environment
              HUANG Lihua, Fudan University
                    DATA
• Streams of raw facts representing events
  such as business transactions, simple
  observations of the state of the world




               HUANG Lihua, Fudan University
               FILE ORGANIZATION
• A computer system organizes data in a hierarchy
  that begins with bits, and proceeds to bytes, fields,
  records, files, and database.
                                  File
               Record                                  Record

       Field             Field                Field                Filed

Byte           Byte                Byte                     Byte

        Bit                Bit                        Bit              Bit
                      HUANG Lihua, Fudan University
           FILE ORGANIZATION
• BIT: Binary Digit (0,1;Y,N;On, Off)
• BYTE: Combination of BITS which
  represent a CHARACTER
• FIELD: A logical grouping of characters into a
  word, a group of words, or a complete number.
• RECORD: Collection of FIELDS which reflect a
  TRANSACTION
• FILE: A Collection of Similar RECORDS
• DATABASE: An Organization’s Electronic
  Library of FILES
               HUANG Lihua, Fudan University
      FILE ORGANIZATION                         For Example,
   • Filed: Student’s name;
   • Record:
         Number    Name       Course           Date    Grade
  A      9525012   Zhang        MIS          1998.9     A
record              Yan
     •File:
         Number    Name        Course          Date    Grade
         9525012 Zhang           MIS          1998.9     A
                   Yan
         9525018 Jeff Yu         MIS          1998.9     A

         9525027   He Hui        MIS          1998.9     B
            …        …            …              …      …
                      HUANG Lihua, Fudan University
       FILE ORGANIZATION : Another way of
       thinking about database components——

• ENTITY: Person, Place, Thing, Event about
  Which Data Must be Kept ( a record
  describes an entity )
• ATTRIBUTE: Description of a Particular
  ENTITY (corresponds to fields)
• KEY FIELD: Field Used to Retrieve,
  Update, Sort RECORD
                        *

                HUANG Lihua, Fudan University
                 FILE ORGANIZATION
                                                                           Record
Key Field     Key Field                     Attribute

    FDU NO.    HKU NO.       NAME           SEX       TEL(O)    TEL(H)

    98HM001     93835     Xie Mingqiang      M                  58702331

    98HM002     93840        Yu Bing         F                  65110968

    98hm003     93841       Wang Pei         F       58711001   63568504
                                                      -23306

    98HM004     93842       Ge Ruijin        M       56938860   56873143

    98HM005     93843      Wang Xintao       M       58611828   65352394

    98HM006     93844       Fu Qiang         F       58666060   58836304
                                                       -6007


                 File
                          HUANG Lihua, Fudan University
            KEY FIELD
• Field in Each Record
• Uniquely Identifies THIS Record
• For RETRIEVAL
           UPDATING
               SORTING
                    *
             HUANG Lihua, Fudan University
  Accessing Records from Computer
 Files:Sequential vs. Direct or random
           file organization
• SEQUENTIAL: Data records must be
  retrieved in the same physical sequence in
  which they are stored. (Magnetic tape )
• DIRECT: Data can be accessed without
  regard to physical sequence. (Disk)
                       *
                                              Sequential file
        Direct file
                                              organization
        organization

                       HUANG Lihua, Fudan University
       Traditional File Processing & File
                 Organization
registration       Class programs                Class file




accounting         Accounts programs             Class file




athletics           Sports programs              Class file


                 HUANG Lihua, Fudan University
Traditional File Processing & File
          Organization




         HUANG Lihua, Fudan University
        Problems Arising from the File Organization

• Data Redundancy: The same piece of information could be
  duplicated in several files.
• Data Inconsistency:
• Data Isolation: Data files are likely to be organized
  differently, stored in different formats, and often physically
  inaccessible to other applications.
• data integrity problem: It is difficult to place data integrity
  constraints across multiple data files.
• Application and Data Independence: In the file
  environment, the applications and their associated data
  files are dependent on each other.
• Poor security: is difficult to enforce in the file environment.
• Lack of data sharing & availability                               Flat
                       HUANG Lihua, Fudan University
                                                                    File
Problems Arising from the File Organization
•   Data Redundancy;
•   Data Inconsistency;
•   Data Isolation, data integrity problem;
•   Application and Data Independence;
•   Security, data sharing problem.




          These problems led to the
          development of DATABASE
                    HUANG Lihua, Fudan University
             DATABASE
• A Database is an organized logical grouping
  of related files.
• In a Database, data are stored & managed in
  a convenient form, and integrated and
  related so that one set of software programs
  provides access to all the data.



               HUANG Lihua, Fudan University
              DATABASE
• Collection of centralized data
• Controls redundant data
• Data stored so as to appear to users in one
  location
• Services multiple application



                HUANG Lihua, Fudan University
    DATABASE MANAGEMENT SYSTEM
         (数据库管理系统DBMS)
•    Software to create & maintain DATA
     enables business applications to extract
     data independent of specific computer
     programs.




                 HUANG Lihua, Fudan University
registrar                            Class programs                               Class file


accounting                          Accounts programs                             Class file


athletics                            Sports programs                              Class file

     Computer based files of this type cause problems such as redundancy, inconsistency, and data isolation.



 registrar                      Class programs
                                                                                       Database
                                                                                       Class file
accounting                      Accounts programs                   DBMS              Accounts file
                                                                                       Sports file

                                Sports programs                                 DBMS provides access
 athletics                                                                      to all data in the
                                      HUANG Lihua, Fudan University             database
Database Environment




     HUANG Lihua, Fudan University
  COMPONENTS OF DBMS:
• DATA DEFINITION LANGUAGE:
  – Defines Data Elements in Database
• DATA MANIPULATION LANGUAGE:
  – Manipulates Data for Applications
  – For Example: For extracting data from database, e.g.
    SQL
• DATA DICTIONARY:
  – Formal Definitions of all Variables in Database;
    Controls Variety of Database Contents
                                *

                   HUANG Lihua, Fudan University
Sample data dictionary report




         HUANG Lihua, Fudan University
Fundamental Database Structures




          HUANG Lihua, Fudan University
        HIERARCHICAL DATABASE

ROOT                               Employer


FIRST     Compensation          Job                 Benefits
CHILD                       Assignments

2nd
        Ratings   Salary              Pension       Insurance   Health
CHILD




                    HUANG Lihua, Fudan University
  Type of RELATIONS

ONE-TO-ONE:          STUDENT                   ID


                                 CLASS
ONE-TO-MANY:
                 STUDENT        STUDENT        STUDENT
                    A              B              C
                        COURSE          COURSE
MANY-TO-MANY:
                          1                2


                  STUDENT STUDENT              STUDENT
              HUANG Lihua,A
                          Fudan University B      C
  NETWORK DATA MODEL
• Variation of Hierarchical Model
• Useful for many-to-many relationships

          COURSE              COURSE
            1                   2


     STUDENT     STUDENT               STUDENT
        A           B                     C
               HUANG Lihua, Fudan University
 Disadvantages of Hierarchical and
         Network DBMS

• Outdated
• Less flexible compared to RDBMS
• Lack support for ad-hoc and English
  language-like queries



               HUANG Lihua, Fudan University
       RELATIONAL DATA MODEL

•   DATA IN TABLE FORMAT
•   RELATION: TABLE
•   Tuple(元组): ROW (record 记录) IN TABLE
•   Field: COLUMN (attribute 属性) IN TABLE
               HOURS * RATE         TOTAL
     ABLE      $ 40.50 $ 10.35 $ 419.18
    BAXTER $ 38.00 $ 8.75 $ 332.50
     CHEN      $ 42.70 $ 9.25 $ 394.98
    DENVER $ 35.90 $ 9.50 $ 341.05
                HUANG Lihua, Fudan University
           Example DB: Fortune 500
                 Companies
• company
   Comp. name sales      assets     netincome        empls    indcode     yr

     allied    9115000   13271000   -279000          143800   37          85

     boeing    9035000   7593000    292000           95700    37          82

     ...



                                                indcode       indname

                                                42            pharmaceuticals

• industry codes                                44            computers

                         HUANG Lihua, Fudan University
                                             ...
The Relational Data Model




       HUANG Lihua, Fudan University
Current DBMS: Relational Database
• DBMS Vendor
  –   MS: Access, SQL Server
  –   Oracle
  –   Sybase
  –   DB2
  –   Informix
  –   MySQL


                 HUANG Lihua, Fudan University
 The Relational Database Model
 The relational model is based on a simple concept
  of tables in order to capitalize on characteristics of
  rows and columns of data, which is consistent
  with real-world business situations.

 One of the greatest advantages of the relational
  model is its conceptual simplicity and the ability
  to link records in a way that is not predefined.


                   HUANG Lihua, Fudan University
     The Relational Abstraction
• Information is in tables
    – Also called (base) relations
• Columns define attributes(属性、字段、数据项)
    – Also called fields or domains
• Rows define records
    – Also called tuples(元组)
• Cells contain values
    – All cells in column have information of same type
       • e.g., integer, floating point, text, date

                       HUANG Lihua, Fudan University
          Operations on Tables
• Add new rows (or sometimes columns)
     – Modify existing rows
•   Choose a subset of columns
•   Choose a subset of rows
•   Combine rows (e.g., sum values in a column)
•   Combine columns
•   Combine two tables (join)
•   No operations to combine individual cells
     – Unlike spreadsheet
                   HUANG Lihua, Fudan University
       Three Basic Operations in a
          Relational Database
• Select:
   – Creates subset of rows that meet specific criteria
• Join:
   – Combines relational tables to provide users with
     information
• Project:
   – Enables users to create new tables containing only
     relevant information


                    HUANG Lihua, Fudan University
The three basic operations of a
      relational DBMS




         HUANG Lihua, Fudan University
   Operating on Databases: SQL
• Every abstraction needs an interface through
  which users invoke abstract operations
    – graphical interface
    – language
• Structured Query Language
    – Select …(content)…. From… (table)..Where
      (condition)
• We'll focus only on queries
    – Query = question
    – Extract some data from one or more tables to answer a
      particular question Lihua, Fudan University
                    HUANG
 Physical vs. Logical Data View
• Minimizes these problems by providing two
  “views” of the database data:
  – The physical view deals with the actual, physical
    arrangement and location of data in the direct access
    storage devices (DASD).
  – The logical view, or user’s view, represents data in a
    format that is meaningful to a user and to the software
    programs that process that data.
     • Entity-relationship diagram (ER图): Methodology for
       documenting databases illustrating relationships between database
       entities
     • Normalization (范式): Process of creating small stable data
       structures from complex groups of data
                      HUANG Lihua, Fudan University
Entity-relationship diagram




        HUANG Lihua, Fudan University
  Experiment: Microsoft Access
• Features:
    – Create/Modify databases
    – Specify/Run queries
    – Design/Print reports
    – Design graphical user interfaces around databases
       • Forms for entering, viewing data
• Assignment: P136 APP. Exer 3
              P.169 App. Exer 1

                     HUANG Lihua, Fudan University
                 Content
• Database Concepts & Technology
   – Experiment: ACCESS
• Database Trends
• Data Trends of Application
   – Data Warehouse
   – OLAP
   – DATA Mining
• Creating Database Environment
              HUANG Lihua, Fudan University
       2. Database Trends(1)
• The evolution of Database System
• Data
  – Simple data => Multimedia data, Knowledge
• Model
  – Relational model => OO model
                  Object relational model


                HUANG Lihua, Fudan University
         Databases Trends (2)
• Application
  – OLTP => OLAP
• Data organization
  – Database => Data warehouse, Data Marts
• Query language
  – SQL => Deductive


                HUANG Lihua, Fudan University
  Emerging Database Models
The most common database models are:
    Multimedia database
    Deductive databases
    Object-oriented databases
    Multimedia and hypermedia
    databases
   Multidimensional Database
           HUANG Lihua, Fudan University
Object-Oriented Database Model
• Object-oriented (OO) databases store both data
  and procedures acting on the data, as objects.
• Encapsulation Capability
   – The OO database can be particularly helpful in
     multimedia environments, such as in manufacturing
     sites using CAD/CAM.
   – OO databases can be particularly useful in supporting
     temporal and spatial(时空) dimensions.
• Terminology in the OO model includes:
   – objects, attributes, classes, methods, and messages.

                    HUANG Lihua, Fudan University
   Hypermedia Database Model
 The hypermedia database model stores
  chunks of information in the form of
  nodes connected by links established by
  the user.
 The nodes can contain text, graphics,
  sound, full-motion video, or executable
  computer programs.
 Users can branch to related information in
  any kind of relationship.
               HUANG Lihua, Fudan University
A hypermedia database




     HUANG Lihua, Fudan University
    Multidimensional Database
• A variation of the relational model
• Use multidimensional structures to organize data and
  express the relationship between data.
• A dimension of the data : a side of a cube.
      • ①多维数组
         – (北京,1999年,彩电,10000)
         – (地理位置,年份,产品类型,销售额)
      • ②维的层次
         – 例如:年、季度、月份、日期
         –   国家、地区、省、城市
      • ③维内元素的类
         – 例如: 按产品的价格分成高、中、低档。
         –       按原材料的成本价格分类
                HUANG Lihua, Fudan University
Multidimensional data model




        HUANG Lihua, Fudan University
         Specialized Databases
• There are many specialized databases, depending
  on the type or format of data stored.
   – A geographical information database contains
     locational data for overlaying on maps or images.
   – A knowledge database stores decision rules used to
     evaluate situations and help users make decisions like
     an expert.
   – A multimedia database stores data on many media—
     sounds, video, images, graphic animation, and text.

                    HUANG Lihua, Fudan University
                 Content
• Database Concepts & Technology
   – Experiment: ACCESS
• Database Trends
• Data Trends of Application
   – Data Warehouse
   – OLAP
   – DATA Mining
• Creating Database Environment
              HUANG Lihua, Fudan University
  3. Data Trends of Application

• Data Warehouse
• OLAP
• DATA Mining




             HUANG Lihua, Fudan University
  From Database to Data
         Warehousing
• 随着信息技术在企业的广泛应用,企业积累了
大量数据
• 企业所面对的问题不是简单地处理数据,而是
如何使用数据
 ----从操作处理(Operational Processing) 到分
析处理(Analytical Processing)


           HUANG Lihua, Fudan University
Operational vs. Decision Support
            Systems
• Operational Processing in Operational Systems
   – Support day to day transactions
   – Contain current, “up to date” data
   – Examples: customer orders, inventory levels, bank
     account balances
• Analytical Processing in Decision Support
  Systems
   – Support strategic decision making
   – Contain historical, “summarized” data
   – Examples: performance summary, customer
     profitability, market segmentation

                   HUANG Lihua, Fudan University
     Why data warehouses?
• Decision Support Data
    – Are found in many different databases
       • within the company
       • outside the company
    – In practical terms, locating and integrating all this
      information in real time is very difficult
• Solution:
    – Create separate repositories of data for decision
      support
    – => data warehouses

                   HUANG Lihua, Fudan University
          Data Warehousing
• Stores current and historical data
  – Consolidates data for management analysis and
    decision making
  – Supports reporting and query tools
• “数据仓库之父” W. H. Inmon给的定义:
  数据仓库就是一个用以更好地支持企业或
  组织的决策分析处理的,面向主题的,集
  成的,不可更新的,随时间不断变化的数
  据集合。
                HUANG Lihua, Fudan University
      Characteristics of Data Warehouses
1)   Organization. Data are organized by detailed subjects.
2)   Consistency. Data in different operational databases may be
     encoded differently. In the warehouse they will be coded in a
     consistent manner.
3)   Time variant. The data are kept for 5 to 10 years so they can be
     used for trends, forecasting, and comparisons over time.
4)   Non-volatile. Once entered into the warehouse, data are not
     updated.
5)   Relational. The data warehouse uses a relational structure.
6)   Client/server. The data warehouse uses the client/server to provide
     the end user an easy access to its data.

                       HUANG Lihua, Fudan University
     Data Warehouse Suitability
Data warehousing is most appropriate for
 organizations in which some of the following
 apply.
 Large amounts of data need to be accessed by end-users.
 The operational data are stored in different systems.
 An information-based approach to management is in use.
 There is a large, diverse customer base.
 The same data are represented differently in different
  systems.
 Data are stored in highly technical formats that are difficult to
  decipher.
 Extensive end-user computing is performed.

                    HUANG Lihua, Fudan University
 Comparison with DB Systems
• DB Systems                       • DWS Systems
  – 数据库:操作型数据                           – 数据仓库:分析型数
    (Operational Data),增、                 据(Analytical Data),
    删、改操作频繁                               极少有更新操作
  – 数据库核心:功能强大,                         – 数据仓库管理系统:
    面向OLTP (Online                        因极少有更新操作,
    Transaction Processing) 应用            故功能简单
  – 数据库工具:以查询工                          – 数据仓库工具:以分
    具为主                                   析工具为主


                   HUANG Lihua, Fudan University
                                            OLAP
                                            在线分析处理

                                             DSS
                                            决策支持系统
  Data
Warehouse                             Data Mining
                                             数据挖掘




Database                                    OLTP
                                            在线业务处理

            HUANG Lihua, Fudan University
        数据仓库的层次结构
 高度                                        分产线的
综合数据                                      月销售数据




综合数据                                      分产线的
                                          周销售数据




 原始数据                                    汇总的销售数据



 档案数据                                      销售数据

         HUANG Lihua, Fudan University
                             数据仓库的实现
           Operational
                                            Data Warehouse
           Databases
                                                     主题1
                                                             SS
                                              主题数据

Internal
data
source
                           取      数
                                                         S
                                                  主题2             Infomart
                                                                    Infomart
                           出      据
                                                         S
                                           主题数据

External
                           数      转
data
source
                           据      换
                                                  主题3
                                                         S        Datamart
                                                                   Datamart

                                                         S
                                           主题数据



                         Infomart(信息超市): 这是一个应用功能(或从应用功能产生的输出)可多次运行
                                          数据仓库,它是对应业务问题的答案。
                         Datamart(数据超市): 这是数据仓库的一个数据子集,它对应最终用户的信息需
                                    HUANG Lihua, Fudan University
                                           求。它比之数据仓库中的数据要更加归纳、汇总一些。
                    Data Marts
Data Marts are an alternative used by many other firms is
  creation of a lower cost, scaled-down version of a data
  warehouse. They refer to small warehouses that focus on
  specific aspects of a company, such as for a strategic
  business unit (SBU) or a department.
               Two major types of Data Marts:
  1) Replicated (dependent) Data Marts. In such cases one
  can replicate functional subsets of the data warehouse in
  smaller databases.
  2) Stand-Alone Data Marts. A company can have one or
  more independent data marts without having a data
  warehouse.        HUANG Lihua, Fudan University
              数据仓库的展现
Operational
Databases
                                                  主题1
                                       主题数据
                                                        SS

              取   数                                            Infomart

              出   据
                                           主题2
                                                    S                     查询
                                                    S
                                   主题数据
              数   转                                                       报表

              据   换
                                           主题3
                                                    S        Datamart

                                                    S
                                   主题数据

                                                                        在线
                                                                        分析


                  Metadata: 关于数据的数据
                  HUANG Lihua, Fudan University
            数据仓库的数据、信息流
         运行
OLTP     数据
                              数据仓库
  Operational
数据库 定义
  Databases                          主题1
                                 主
                                 题          信息
                                 表                            新数据集
                数据确认                        信息
                                                       数据取用
                数据转换                                   处理工具
                                 汇          信息                 再分析
                                                       在线分析
                                 总
                数据整合             表
                                                       数据挖掘

                数据汇总                                           新展示

                        数       据        市       场

                       HUANG Lihua, Fudan University
           提取、转换、加载(ETL)
          (Extract, Transform, Load)
• Extract
   – Consolidate data from
     several sources
• Transform                                      • 数据清理过程
   – Filter out unwanted data,                   • 解决冲突
     correct incorrect data,
     convert to new data                         • 使用效率的考
     elements, aggregate into                      虑
     new data subsets
• Load
   – Load into data warehourseFudan University
                     HUANG Lihua,
     Example of DWS Application
1、贸易中心需求
贸易中心市内卷烟销售采用三级批的方式:

卷烟收购       贸易中心          25个有限公司                 零售商   消费者
价                                                    集市价
格        收购价     调拨价                           批发价
                                                     网络零售价

销   1.   销售分析
售   2.   资源投放分析
分   3.   价格分析
析   4.   有限公司效益预测
    5.   网络建设信息分析
               HUANG Lihua, Fudan University
2、技术方案:
   SAS+DWA+IntrNet的解决方案

2.1 数据仓库逻辑配置方案
“市内销售分析”主题(Subject_insale)中包含两个层次
(1) 详细数据(DETAIL TABLE):
(2) 数据集合(DATA GROUP):
1) 根据需求分析需建立五个数据集合:
2) 销售量分析集合(01_QuantyData),
3) 货源投放分析集合(02_DistributeData),
4) 价格分析集合(03_PriceData),
5) 效益分析集合(04_BenefitData),
6) 网络建设分析集合(05_SalenetData)

           HUANG Lihua, Fudan University
数据仓库逻辑结构图
                  DW_tobacco_SH

                        Tobacco_DW

                               Global_subject
                                                      plan

                               prod
                                                      sale

                               store
                                                             Subject_insale
                             Subject_outsale
                                                                 Detail Logical Table
                                  Detail Logical Table
                                                                      (Detail Table 1)
                                       (Detail Table 1)
                                                                        ….
                                       …                              (Detail Table n)
                                       (Detail Table n)
                                                                   01_QuantyData
                                  01_MarketData
                                                                   02_DistributeDat
                                                                   a
                                  02_DistributeData
                                                                   03_PriceData
                                  03_PriceData
                                                                   04_BenefitData
                                  04_SubsaleData
                                                                   05_SalenetData
                        ODD_information_pub

                        ODD_insale
            HUANG Lihua, Fudan University
                        ODD_outsale
  SAS/IntrNet配置方案:
  SAS/IntrNet应用发布示意图
                       WEBPGM
                       (需开发部分)               DWA info
 Browser

                      SAS appsrver

 webserver               broker

通用标准B/W平台          SAS/IntrNet               SAS/DWA


             HUANG Lihua, Fudan University
              3、功能分析
销量分析 货源投放 价格分析 效益分析 网建客户信息分析 系统维护

 销量      比较        销量       有限公司1              有限公司2
 对比      同比
日期1 1999年04月01日
日期2 2000年02月28日
统计周期 月

销售性质 所有
牌号 301101 对照表
(产地/代码)
  有限公司   所有
  销售区域                                                 时间

 确定       重选

               HUANG Lihua, Fudan University

 比较复选框
 有限公司
 销售性质
 关注品牌



 牌号对照表

品牌产地        品牌号                          具体牌号
总计        总计                             总计
沪产烟       熊猫                             翻盖中华
代码输入:
外地烟       中华                             中华软壳
外烟        红双喜                            中华礼包
           。
          。。                              。
                                         。。

         HUANG Lihua, Fudan University
货源投放
1    控制牌号投放量分析(系数相关性)

    销量分析    货源投放         价格分析   效益分析          网建客户相关信息分析                     系统维护

    投放量分析    (分析类别)
                                                                              销量
起始日期       1999 年 09 月
截止
截止日期     : 2000 年 02 月                                                       集市价格

牌号选择
                                                                              库存
    确定        重选


                                                                              系数
     相关性分析

                             99/09   99/10   99/11   99/12   00/01   00/02
                           截止




                          HUANG Lihua, Fudan University
2   投放进度分析
     销量分析        货源投放           价格分析     效益分析          网建客户相关信息分析                         系统维护

    投放进度分析(分析类别)

    起始日期     2000 年 01 月 01
    日
                                                                                   销量
    截止日期
    截止       2000 年 02 月 01 日
                                                                集市价格
    统计周期     周

     牌号
                    对照表
     (代码)                                                                           投放

     有限公司        所有

                                                   有限公司库存
        确定       重选
                                  01/1   01/2   01/3   01/4   02/1   02/2   02/3   02/4
                                 截止




                                 HUANG Lihua, Fudan University
3.3 市内销售价格分析
1 价格分析
 销量分析     货源投放        价格分析    效益分析         网建客户相关信息分析                      系统维护

 价格分析       (分析类别)
                                                                          集市均价
起始日期    1999 年 09 月
01 日
截止日期
截止
       2000 年 02 月

统计周期    月

牌号                                                                   集市价
(代码)
            对照表

                                                                  网络批发价

  确定        重选
                          99/09   99/10   99/11   99/12   00/01   00/02
                         截止




                        HUANG Lihua, Fudan University
2 价量分析

销量分析 货源投放 价格分析 效益分析 网建客户相关信息分析 系统维护

价量分析      (分析类别)

                                                     集市价格
起始日期   1999 年 09 月
截止
截止日期   2000 年 02 月


有限公司      所有                                   条件选定的价格
牌号
            对照表
(代码)

销售性质
统计周期      周                                                 市内销量

 确定        重选         99/09 99/10 99/11 99/12 00/01 00/02
                     截止




                     HUANG Lihua, Fudan University
复旦大学数据管理与数据分析应用
                                   数据展现


主题数据分析层
                                    ROLAP


主题数据层
                                   主题数据库


共享数据层
                                   共享数据库



业务数据层
          人事数据库 学工数据库                 ●●●
                                             教务数据库 科研数据库



                                             教务子系统 科研子系统
                                      ●●●
          人事子系统 学工子系统
             HUANG Lihua, Fudan University
          OLAP, DM, KDD
• ON-LINE ANALYTICAL PROCESSING (OLAP
  在线分析处理): Tools for multi- dimensional
  data analysis
• DATAMINING(DM, 数据挖掘): Tools for
  finding hidden patterns, relationships, for
  predicting trends
• Knowledge Discovery in Databases (KDD知识发
  现):Tools for extracting useful knowledge
  from volumes of data.
               HUANG Lihua, Fudan University
             Multidimensionality
• Modern data and information may have several dimensions.
   – e.g. Management may be interested in examining sales figures in a
     certain city by product, by time period, by salesperson, and by
     store.
• It is important to provide the user with a technology that
  allows him or her to add, replace, or change dimensions
  quickly and easily in a table and/or graphical presentation.
• The technology of slicing(切片), dicing(切块),
  Drilling (钻取)and similar manipulations is called
  Multidimensionality.


                       HUANG Lihua, Fudan University
         Multidimensionality and OLAP


Sale                                                             Product

                                                            P1

                                                            P2

                                                            P3

                                                            P4

                                                            P5
  Americas
             Europe
                      Asi
 Area                 a
                                                    time



                            HUANG Lihua, Fudan University
  Multidimensionality= Flexible Analysis




 Time series analysis                          Product comparison




Area Comparsion                                   Special Analysis
                        HUANG Lihua, Fudan University
             Data Mining
• Exciting new set of tools for using data
  warehouses
• Combination of AI and statistical
  analysis to discover information that is
  “hidden” in the data
• OLAP versus Data Mining


              HUANG Lihua, Fudan University
       Data Mining Analysis
             Method
• associations
   – e.g. linking purchase of pizza with beer
• sequences
   – e.g. tying events together: marriage and purchase
     of furniture
• classifications
   – e.g. recognizing patterns such as the attributes of
     customers that are most likely to quit
• Forecasting
   – e.g. predicting buying habits of customers based
     on past patterns
                    HUANG Lihua, Fudan University
          Applications of Data Mining
   Data Mining is currently being used in the following areas;

 Retailing & Sales                       Insurance
 Banking                                 Policework
 Manufacturing & Production              Government & Defense
                                          Airlines
 Brokerage & Securities
                                          Health care
  trading
                                          Broadcasting
 Computer hardware &
  software                                Marketing


                       HUANG Lihua, Fudan University
 Knowledge Discovery in Databases
             (KDD)
• KDD is the process of extracting useful knowledge from
  volumes of data.
• It is the subject of extensive research.
• KDD’s objective is to identify valid, novel, potentially
  useful, and ultimately understandable patterns in data.
• KDD is useful because it is supported by three
  technologies that are now sufficiently mature:
   – Massive data collection
   – Powerful multiprocessor computers
   – Data mining algorithms


                      HUANG Lihua, Fudan University
                                       Evolution of KDD

                      Stages in the Evolution of Knowledge Discovery
Evolutionary Stage         Business Question            Enabling Technologies        Characteristics
Data Collection            What was my total            Computer, tapes, disks.      Retrospective, static
(1960s)                    revenue in the last five                                  data delivery
                           years?
Data Access (1980s)        What were unit sales in      Relational databases         Retrospective,
                           New England last March?      (RDBMS), structured query    dynamic data delivery
                                                        language (SQL)               at record level
Data Warehousing &         Drill down to Boston?        Online analytic processing   Retrospective,
Decision Support                                        (OLAP), multidimensional     dynamic data delivery
(early 1990s)                                           databases, data warehouses   at multiple levels
Intelligent Data           What’s likely to happen to   Advanced algorithms,         Prospective,
Mining (late 1990s)        Boston unit sales next       multiprocessor computers,    proactive information
                           month? Why?                  massive databases            delivery
Source: Courtesy of Accrue Software.



                                         HUANG Lihua, Fudan University
                 Content
• Database Concepts & Technology
   – Experiment: ACCESS
• Database Trends
• Data Trends of Application
   – Data Warehouse
   – OLAP
   – DATA Mining
• Creating Database Environment
              HUANG Lihua, Fudan University
Creating a Database Environment




          HUANG Lihua, Fudan University
Key organizational elements in the
      database environment
   Data                                        Database
Administration                              Technology and
                                             management
                  Database
                 Management
                   Systems
Data Planning
And Modeling                                Users
Methodology

                  HUANG Lihua, Fudan University
          Data Administration
• Data Administration
  – Develop information policy
  – Define information requirements
  – Plan for data
  – Oversee logical database design and database
    dictionary development
  – Monitor use of information
• Database administrator & database analyst

                HUANG Lihua, Fudan University
Key organizational elements in the
      database environment
   Data                                        Database
Administration                              Technology and
                                             management
                  Database
                 Management
                   Systems
Data Planning
And Modeling                                Users
Methodology

                  HUANG Lihua, Fudan University
            Management Requirements for
                 Database Systems
• Data Planning and Modeling Methodology
   – Enterprise-wide planning for data
   – Identify key entities, attributes, and relationships that constitute the
     organization’s data
• Data Planning process
   – Data planning
       • Develop a model of business processes
   – Requirements Specification
       • Define information needs of end users in a business process
   – Conceptual design
       • Expresses all information requirements in the form of high-level model (ERM)
   – Logical design
       • Translates the conceptual model into the data model of a DBMS
   – Physical Design
       • Determines the data storage structure and access method
                              HUANG Lihua, Fudan University
                Procurement Process
           When purchasing department wrote a purchase order, it sent a
Purchasing copy to accounts payables.

              The vendor sent an invoice to accounts payables when they
  Vendor      delivered goods.


Receiving     When material control received the goods, it sent a copy of
              the receiving document.

              Thus, the account payables received three documents from various
Account       senders:Purchasing order, invoice and receiving document. It was
Payables      up to account payables to match the purchasing against the receiving
              document and the invoice. If they matched, the department issued
              payment. Otherwise, an accounts payable clerk would investigate the
              discrepancy, hold up payment, generate documents and all in all gum
              up the works. HUANG Lihua, Fudan University
Entity-relationship diagram




        HUANG Lihua, Fudan University
Main data table in Procurement
           Process




         HUANG Lihua, Fudan University
Key organizational elements in the
      database environment
   Data                                        Database
Administration                              Technology and
                                             management
                  Database
                 Management
                   Systems
Data Planning
And Modeling                                Users
Methodology

                  HUANG Lihua, Fudan University
   Management Requirements for
        Database Systems

• Database Technology, Management, and
  Users
  – Databases require DBMS software and staff
  – Database design group defines and organizes
    structure and content of database
  – Database administration: establish physical
    database, logical relations, access rules
                HUANG Lihua, Fudan University
                             Wrap-up
• Database Management Systems
    – Concept, importance
    – Be able to use simplest ACCESS
• Major types of database
    – Structure model:
        • Hierchical, Network, Rational, Multidimensional, OO
    – Data type:
        • Operational database ~analytical database ~Deductive Database
        • Simple database ~ Multimedia Database ~ Hypermedia Database
    – Store type
        • Centralized Database ~ Distributed Database
• Database development and access
• Data warehouse
• OLAP, Data mining

                           HUANG Lihua, Fudan University
        Assignment for Session 8-9
•   Individual Review for session 8
    –   Reading Materials: Textbook: chapter 5
    –   P. 168: Review Quiz: write down in your book.
•   Individual Assignment :
    –   P.136: Application exercises 3
    –   P. 169: Application exercises 1
    –   submit to Vcampus in two weeks


                  HUANG Lihua, Fudan University
     Preparation for session 10
• Individual Preparation for session 10
  – Reading Materials: Textbook: chapter 6
  – P. 206: Review Quiz: write down in your book.




                HUANG Lihua, Fudan University

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:6
posted:8/16/2012
language:
pages:99