Embed
Email

chap-7

Document Sample

Shared by: panniuniu
Categories
Tags
Stats
views:
0
posted:
12/11/2011
language:
pages:
38
Chapter-7: Case Study

www.AhsanAbdullah.com









1

(c) 2008 Dr. Ahsan Abdullah

Background

 Data being recorded for decades by several

organization, mostly never digitized and never

used for decision making. Under-utilization.



 Data is horizontally wide i.e. 100+ attributes and

vertically deep i.e. tens of thousands of rows.



 Huge potential for long-term and short-term

decision making.



 Decision making not data driven, but based on

“expert” judgment, sometimes with tragic results.

2

(c) 2008 Dr. Ahsan Abdullah

Area under study

 Punjab is the bread-basket of Pakistan, and

administratively divided into eight divisions.



 Multan division is the cotton hub of Punjab,

which consists of district Multan.



 District Multan has three tehsils, which are

furhter divided into central points or Markaz.



 This study is about 10 tehsils of Multan

using data for years 2001 and 2002.

3

(c) 2008 Dr. Ahsan Abdullah

Area under study: Map







Key Markaz

1 Bosan

2 Qadirpurran

3 Multan

4 Makhdum Rashid

5 Mumtazabad

6 Shujabad

7 Hafizwala

8 Jalalpur Pirwala

9 Qasba Marral

4

(c) 2008 Dr. Ahsan Abdullah

Major Players

 A pest is an insect that eats the crop. There

are two types of cotton pests:

 Sucking pests (White Fly, Jassid, Thrips etc.)

 Boll Worms (spotted boll-worm, pink boll-worm)





 A predator is an insect that eats the pest.

 Such as Lady bug Beetle, Spiders, Ants etc.





 Cotton is also effected by virus

 Cotton Leaf Curl Virus (CLCV)

5

(c) 2008 Dr. Ahsan Abdullah

Economic Threshold Level (ETL_A)

 The pest population beyond which it is cost

effective to use pesticide.



 Pesticide is a poison which is used to kill

pests.



 Note that eradicating pests is NOT feasible,

controlling pest population is feasible.





6

(c) 2008 Dr. Ahsan Abdullah

Pests & Predators









Pests Predators

Source: United States Department of Agriculture (USDA) 7

(c) 2008 Dr. Ahsan Abdullah

ETL_A: Graph

Economic Injury



ETL_A

Pest Population









Time









8

(c) 2008 Dr. Ahsan Abdullah

The need

 Extensive losses to cotton crop due to pest attacks

in 1983 resulted in the establishment of Directorate

of Pest Warning in 1984.



 Since 1984 scouts from the Directorate have been

sampling fields and recording data and advising

farmers.



 During 2003-04 season, Boll Worm attack on the

cotton crop resulted in a loss of nearly 0.5 M bales.



 Weather not the only factor, but points to a multitude

of factors, requiring efficient and effective data

analysis, for better decision making.

(c) 2008 Dr. Ahsan Abdullah

9

The need: IT in Agriculture

 The volume of pest scouting data accumulated todate is

enormous both horizontally and vertically.



 A typical pest scouting sheet consists of 35 variables or

attributes.



 Metrological data consists of 50+ variables.



 Coarse estimate of pest scouting data recorded for the cotton

crop alone stands at 5+ million records, and growing.



 Tasking the human brain alone, for synthesis of information

from this data is not only impractical but dangerous too.



 Need a Data Warehouse, OLAP tools and Data Mining to

analyze the data. 10

(c) 2008 Dr. Ahsan Abdullah

Agro-Informatics

 “I.T. sector is at the heart of the economic

revival of Pakistan” President of Pakistan,

Launching of VU, Mar. 23, 2003.



 Agriculture is the backbone of our economy,

upto 74% of the population is dependent on it.



 IT is an enabler, and has the potential to benefit

everyone when applied in Agriculture.



 IT + Agriculture: A win-win scenario.

11

(c) 2008 Dr. Ahsan Abdullah

Agro-Informatics: Relevant URL









To know more about Agro-Informatics, visit



www.agroict.org









12

(c) 2008 Dr. Ahsan Abdullah

How to go about?

 Discussed several DWH implementation

methodologies in lectures 32-35.



 Will adopt a pilot project approach, because:

 A full-blown DWH requires extensive investment.

 Show users the value of DSS.

 Establish blue print for full-blown system.

 Identify problem areas.

 Reveal true data demographics.

 Pilot projects are supposed to work with limited

data. 13

(c) 2008 Dr. Ahsan Abdullah

The 12-step Approach of Shaku Atre









14

(c) 2008 Dr. Ahsan Abdullah

Step-1: Determine User’s needs

 A data warehouse or a data mart?

 User not educated, think DWH is a big Excel sheet.

 Some confuse DSS with Decision Making Systems.

 DWH at Federal level.

 Data marts at district level.





 What are the business objectives?

 Good governance via E-Governance.

 Timely and data driven decision making.

 Fore warning of farmers.

 Timely planning for purchase/stocking of pesticides.



15

(c) 2008 Dr. Ahsan Abdullah

Step-1: Determine Users needs

 Cost/benefit analysis, project estimation & risk

assessment

 Cost of a prototype system fraction of the loss in one

year.

 Required data either not made available or expensive.

 Required data in very raw format.

 People hungry for ideas looking for a solution on a plate.



 Where do I get the data from?

 Data available at Directorate of pest warning.

 Raw form

 Metrological department asking too much for data

collected using public funds..

 Weather forecasts downloaded from web site of

16

www.dawn.com (c) 2008 Dr. Ahsan Abdullah

Steps-2&3: Determine DBMS Sever & Hardware Platform



 Hardware and Software along with trained

people was already available on Campus.



 The pilot DWH was implemented using NCR

Teradata Data Warehousing solution for NT

running on a dual Intel 950 Mhz Xeon

processors server and 1GB of RAM.



 Total internal Hard Disk capacity of the server

amounted to 36 GB while external RAID

control supports 8 additional SCSIs of 18 GB

each. 17

(c) 2008 Dr. Ahsan Abdullah

Step-4: Information & Data Modeling









Dimensional Modeling

18

(c) 2008 Dr. Ahsan Abdullah

Step-4: Information & Data Modeling









Simplified ERD

Other field inputs such as irrigation, fertilizer etc. not included as data not

19

available

(c) 2008 Dr. Ahsan Abdullah

Step-4: Information & Data Modeling

KEY

WFN: White Fly Nymph

WFA: White Fly Adult

W: White

B: Brown

S: Small Larvae

L: Large Larvae









(optional)









Logical & Physical Design 20

(c) 2008 Dr. Ahsan Abdullah

Step-5: Construct Metadata Repository

 What means what was important.



 Who access what was not important as single

user environment.



 Meetings held similar to requirements

definition, and questions transformed into

SQL queries.



 Meta data issues still cropped up, more of

business rules.

21

(c) 2008 Dr. Ahsan Abdullah

Step-5: Surprise case

0.3

Sucking pests Ball Worm Complex

0.25

0.2

Correlation







SBW: Spotted Ball Worm

0.15 ABW: Army Ball Worm

PBW: Pink Ball Worm

0.1

0.05

0









PBW_RF

SBW









PBW_Bolls

ABW_White_Eggs





ABW_Brown_Eggs

WhiteFly_Nymph





WhiteFly_Adult





Thrip





Mite









ABW_Larvae_Small

Jassid









ABW_Larvae_Large

-0.05

-0.1

-0.15

-0.2





If pest population is low, predator population will also be low, because there will

22

be less “food” for(c)predators to live on i.e. pests.

2008 Dr. Ahsan Abdullah

Step-6: Data Acquisition & Cleansing









Hand filled pest scouting sheet









23



Typed pest scouting sheet

(c) 2008 Dr. Ahsan Abdullah

Step-6: Issues

 The pest scouting sheets are larger than A4 size (8.5” x

11”), hence the right end was cropped when scanned on a

flat-bed A4 size scanner.



 The right part of the scouting sheet is also the most

troublesome, because of pesticide names for a single

record typed on multiple lines i.e. for multiple farmers.



 As a first step, OCR (Optical Character Reader) based

image to text transformation of the pest scouting sheets

was attempted. But it did not work even for relatively clean

sheets with very high scanning resolutions.



 Subsequently DEO’s (Data Entry Operators) were

employed to digitize the scouting sheets by typing.

24

(c) 2008 Dr. Ahsan Abdullah

Step-6: Why the issues?

 Major issues of data cleansing had arisen

due to data processing and handling at four

levels by different groups of people

1. Hand recordings by the scouts at the field

level.

2. Typing hand recordings into data sheets at the

DPWQCP office.

3. Photocopying of the typed sheets by DPWQCP

personnel.

4. Data entry or digitization by hired data entry

operators.

25

(c) 2008 Dr. Ahsan Abdullah

Step-7: Transform, Transport & Populate









26

(c) 2008 Dr. Ahsan Abdullah

Motivation For Transformation

 Trivial queries give wrong results.

 Static and dynamic attributes

 Static attributes recorded repeatedly.









27

(c) 2008 Dr. Ahsan Abdullah

Step-7: Resolving the issue

 Solution: Individualization of cultivated fields.

 Technique similar to BSN used to fix names.

 Unique ID assigned to farmers.

 BSN used again, and unique ID assigned to fields.



 Results:









 Limitation: Field individualization not perfect. Some cases

of farmers with same geography, sowing date, same

variety and same area. Such cases were dropped.

28

(c) 2008 Dr. Ahsan Abdullah

Step-8: Middleware Connectivity

 Since the source data is maintained in a

non digital format, hence connectivity with

the data warehouse was irrelevant.



 Once digitized, it was rather straightforward

to load data into the warehouse.



 Furthermore, in the foreseeable future, it

was not anticipated that the scouting

sheets were going to be maintained in a

digitized form. (c) 2008 Dr. Ahsan Abdullah

29

Step-9-11: Prototyping, Querying & Reporting

 Implemented the prototype with user involvement.



 Applications developed

 10. A data mining tool was also developed based on an indigenous

technique that used crossing minimization paradigm for

unsupervised clustering.



 11. A low-cost OLAP tool was indigenously developed; actually it

was a Multi dimensional OLAP or MOLAP.



 Use querying & reporting tools

 The following SQL query was used for validation:



SELECT Date_of_Visit, AVG(Predators),

…………………………AVG(Dose1+Dose2+Dose3+Dose4)

FROM Scouting_Data

WHERE Date_of_Visit 0

GROUP BY Date_of_Visit; 30

(c) 2008 Dr. Ahsan Abdullah

Step-12: Deployment & System Management





Since a pilot project, therefore, the traditional

deployment methodologies and system

management techniques were not followed to

the word, and are not discussed here.









31

(c) 2008 Dr. Ahsan Abdullah

Agri-DSS usage: Data Validation

 Quality and validity of the underlying data is the key to

meaningful and authentic analysis.



 After ensuring a satisfactory level of data quality (based on

cost-benefit trade-off) extremely important to scientifically

validate the data that the DWH will constitute.



 Some very natural checks were employed for this purpose.

Relationship between the pesticide spraying and predator

(beneficial insects) population is a fact well understood by

agriculturists.



 Predator population decreases as pesticide spray increases

and then continually decreases till the end of season.



32

(c) 2008 Dr. Ahsan Abdullah

0

2

4

6

8

10

07/06/01

07/09/01

07/12/01

07/15/01

Predator







07/18/01

07/21/01

07/24/01

Spray









07/27/01

07/30/01

08/02/01

08/05/01

08/08/01

08/11/01









(c) 2008 Dr. Ahsan Abdullah

08/14/01

08/17/01

08/20/01

08/23/01

08/26/01

08/29/01

09/01/01

09/04/01

09/07/01

33

Agri-DSS usage: Data Validation Graph

Agri-DSS usage: Spray Dates

 Working Behaviors at Field Level: Spray dates

 As expected, apparently spray dates and spray

frequency for 2001 and 2002 do not displayed any

correlation; as it is dependent on pest populations,

availability of pesticides etc.



 For deep analysis, moving average of sprays for five

days, and a moving correlation of sprays for five days

were calculated and plotted.



 For the sake of uniformity, the moving average of

spray was normalized using the maximum spray

frequency. (c) 2008 Dr. Ahsan Abdullah

34

Agri-DSS usage: Spray Dates Graph

0.90 Moving Avg

Correlation

Relative values





0.70



0.50



0.30



0.10

8_2







8_7









9_1







9_6

7_23





7_28









8_12







8_17





8_22







8_27









9_11







9_16







9_21





9_26







10_1

-0.10



-0.30

Spray dates (mm_dd) for 2001 & 2002

-0.50





No relationship should have existed for the two years. But note the

surprising finding that most sprays occurring on and around 12th Aug. in

BOTH years with high correlation, appearing as a spike.



Also note the dip in sprays around 11th Sep.! Sowing at predetermined time

makes sense, as it is under the control of the farmer, but that is not true for

spraying. Pests don’t follow calendars; therefore, whenever, ETL_A is

crossed pesticides are sprayed. 35

(c) 2008 Dr. Ahsan Abdullah

Agri-DSS usage: Explaining Findings

14th Aug. is the independence day of Pakistan and a national holiday.



In Pakistan, people are in a habit of sandwiching gazetted holidays with

casual leaves; consequently businesses are closed for a longer period,

including that of pesticide suppliers.



14th Aug. occurred on Tue and Wed in 2001 and 2002, respectively, thus

making it ideal to stretch the weekend.



During Aug/Sep. humidity is also high, with correspondingly high chances

of pest infestations.



Therefore, apparently the farmers decided not to take any chances, and

started spraying around 11th Aug.; evidently even when it was not required.



The weather forecast for 13 Aug. 2001 and 2002 was showers and cloudy,

respectively. Therefore, most likely the pesticide sprayed was washed-off.



Decline in sprays around 9/11 could not be explained.

36

(c) 2008 Dr. Ahsan Abdullah

Agri-DSS usage: Sowing Dates

The results of querying the sowing date based on the day of the week are

shown below:



2001: Sowing week_day 2002: Sow ing w eek_day



500 431 429 500

405 398 409 387

400 367 357

400

303 278

300 300 223

179 174

200 200 124

100 100

0 0









Fri

Fri









Tue









Thu

Mon

Tue









Thu

Mon









Wed









Sun

Sat

Wed









Sun

Sat









Observe least number of sowings done on Thursdays, in each year. This finding

was later confirmed by extension personnel.



Multan is famous for its shrines. Thursdays are usually related with religious

festivals and activities, a mix of devotion and recreation, and usually held at

shrines, hence a tendency of doing less work on Thursdays. Similar behavior was

observed for spraying too.

37

(c) 2008 Dr. Ahsan Abdullah

Conclusions & Lessons

 ETL is a big issue. There are no digitized operational databases, data entry of

these sheets is very expensive, slow and prone to errors.



 Each farmer is repeatedly visited by agriculture extension people. This results in

repetition of information, about land, sowing date, variety etc. Hence, farmer and

land individualization are critical. Such an individualization task is hard to

implement for multiple reasons.



 There is a skewness in the scouting data. Public extension personnel (scouts)

are more likely to visit educated or progressive farmers, as it makes their job of

data collection easy. Furthermore, large land owners and influential farmers are

also more frequently visited by the scouts. Thus the data does not give a true

statistical picture of the farmer demographics.



 All pests are not present all the time (such as ball worms), most of the time

second spray is not done (or not recorded), and hence tables are sparse.

Therefore, it is better to split tables so as to decrease header size and table

space.



 Unlike traditional data warehouse where the end users are decision makers,

here the decision-making goes all the way “down” to the extension level. This

presents a challenge to the analytical operations’ designer, as the findings must

be fairly simple to understand and communicate.

38

(c) 2008 Dr. Ahsan Abdullah



Related docs
Other docs by panniuniu
organization_of_slp_working_files_3-23-10
Views: 1  |  Downloads: 0
Lesson 2 2011 key
Views: 0  |  Downloads: 0
Site Survey
Views: 2  |  Downloads: 0
alt energy project SP11
Views: 1  |  Downloads: 0
Effie Biography
Views: 0  |  Downloads: 0
Download-Organization-application-letter
Views: 0  |  Downloads: 0
TWIN_Nomination_form_2010
Views: 0  |  Downloads: 0
Engineering Change Order Master Log
Views: 2  |  Downloads: 0
360654.f1
Views: 0  |  Downloads: 0
By registering with docstoc.com you agree to our
privacy policy

You are almost ready to download!

You are almost ready to download!