
29/11/10 - Database Research Task

posted Nov 28, 2010, 2:53 PM by Unknown user   [ updated Nov 29, 2010, 3:33 AM ]
Each of the following items relates to (a) DATA WAREHOUSES and (b) DATA MINING.
1. Describe what each is.
A data warehouse is a collection of data designed to support management decision-making.
Data mining involves the use of software that looks for hidden patterns in a group of data.
2. Outline the benefits that can be gained by organisations that keep these.
  • A common data model for all data of interest, regardless of the source of the data. This makes it easier to report on and analyse information than it would be if multiple data models were used to retrieve information such as sales invoices and order receipts.
  • Inconsistencies can be identified and resolved prior to loading data into the data warehouse. This greatly simplifies reporting and analysis.
  • Information in the data warehouse is under the control of data warehouse users so that even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.
  • As they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
  • Data warehouses facilitate decision support system applications such as trend reports (e.g. the items with the most sales in a particular area within the last two years) and reports that show actual performance versus goals.
  • Data warehouses can work in conjunction with, and hence enhance the value of, operational business applications.
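The trend report mentioned in the list above (the items with the most sales in a particular area within the last two years) can be sketched in a few lines of Python. This is only a rough illustration: the `SALES` records, the area names and the `top_items` function are all made up for the example, and a real data warehouse would normally be queried with SQL or a reporting tool instead.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical warehouse rows: (sale_date, area, item, quantity).
SALES = [
    (date(2009, 3, 14), "Sydney", "laptop", 4),
    (date(2010, 6, 2), "Sydney", "printer", 2),
    (date(2010, 9, 21), "Sydney", "laptop", 3),
    (date(2010, 10, 5), "Melbourne", "laptop", 9),
    (date(2007, 1, 8), "Sydney", "printer", 50),  # older than two years
]

def top_items(sales, area, today, years=2, limit=3):
    """Return the best-selling items in one area over roughly the last `years` years."""
    # Approximate a year as 365 days; good enough for a sketch.
    cutoff = today - timedelta(days=365 * years)
    totals = Counter()
    for sale_date, sale_area, item, quantity in sales:
        if sale_area == area and sale_date >= cutoff:
            totals[item] += quantity
    return totals.most_common(limit)

report = top_items(SALES, "Sydney", today=date(2010, 11, 29))
print(report)
```

The old 2007 sale is ignored because it falls outside the two-year window, and the Melbourne sale is ignored because it is in a different area.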
With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.
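As a very simplified sketch of the retailer example, the code below looks through point-of-sale records and finds the category each customer buys most often, which could then be used to choose a promotion. The `PURCHASES` data, the customer IDs and the function name are invented for illustration; real data mining software uses far more sophisticated pattern-finding techniques.

```python
from collections import Counter, defaultdict

# Hypothetical point-of-sale records: (customer_id, product_category).
PURCHASES = [
    ("c1", "garden"),
    ("c1", "garden"),
    ("c1", "kitchen"),
    ("c2", "toys"),
    ("c2", "toys"),
    ("c2", "toys"),
    ("c2", "garden"),
]

def favourite_categories(purchases):
    """Map each customer to the category they have bought most often."""
    history = defaultdict(Counter)
    for customer, category in purchases:
        history[customer][category] += 1
    return {
        customer: counts.most_common(1)[0][0]
        for customer, counts in history.items()
    }

# Each customer could then be sent a promotion for their favourite category.
for customer, category in favourite_categories(PURCHASES).items():
    print(f"Send {customer} a {category} promotion")
```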
3. Explain with examples how each could potentially be abused.
Data warehouses are essentially centralised places of data storage, which brings security and privacy issues. Access has to be carefully controlled so that only appropriate people can reach specific data. Otherwise, it opens the door to misuse and abuse of information.
Data mining raises issues of privacy and ownership of data. Separate chunks of data will probably not reveal enough about a person to identify preferences, weaknesses and habits. However, if these chunks of data were to be combined (e.g. membership in companies/clubs, online purchasing history), data on what we buy, how we pay for it, and how much we earn can be found and linked to provide a holistic view of a person.
After trends have been identified, it is debatable whether the new information belongs to the individuals or to the organisations.
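The way separate chunks of data can be linked into one profile can be sketched as follows. The three data sets, the email address used as the shared key and the `build_profiles` function are all hypothetical; the point is only that each source says little on its own but the merged record says a lot.

```python
# Hypothetical data sets held by different organisations,
# all keyed by the same email address.
CLUB_MEMBERS = {"pat@example.com": {"club": "golf"}}
ONLINE_ORDERS = {
    "pat@example.com": {"last_purchase": "golf clubs", "payment": "credit card"}
}
SALARY_SURVEY = {"pat@example.com": {"income_band": "80-100k"}}

def build_profiles(*sources):
    """Merge records that share a key into one combined profile per person."""
    profiles = {}
    for source in sources:
        for key, fields in source.items():
            profiles.setdefault(key, {}).update(fields)
    return profiles

profiles = build_profiles(CLUB_MEMBERS, ONLINE_ORDERS, SALARY_SURVEY)
print(profiles["pat@example.com"])
```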
4. Using the internet, identify two companies that do each (for a total of four companies) and justify (with examples) the use of such methods for each company.
WikiLeaks - Data mining is used by WikiLeaks in an attempt to promote transparency. However, in practice, this involves the release of otherwise confidential and sensitive documents.
Google - Google provides multiple online services to the worldwide community (e.g. Google Chrome, Gmail, Picasa). All these services are combined under a common user profile and account through Google Accounts.
PayPal - PayPal essentially serves as the middleman between buyers, sellers, and payment facilities. Data supplied by all three entities is stored by PayPal. The use of data warehouses is important in keeping all data centralised.

Microsoft - Data mining allows multiple factors to be considered, including distance, traffic flow and signals, direction changes, and the probability of accidents, so that the fastest route can be picked.
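A rough sketch of how such factors might be combined to pick a route is shown below. It is not Microsoft's actual method: the two routes, every number and the delay weights are invented for illustration. In practice, values like the average speed and accident probability are the kind of thing that would come out of mining past traffic data.

```python
# Hypothetical candidate routes; every number here is invented for illustration.
ROUTES = {
    "motorway": {
        "distance_km": 18.0, "avg_speed_kmh": 60.0,
        "signals": 2, "turns": 3, "accident_risk": 0.10,
    },
    "city streets": {
        "distance_km": 12.0, "avg_speed_kmh": 30.0,
        "signals": 14, "turns": 9, "accident_risk": 0.02,
    },
}

# Assumed delay weights, in minutes.
SIGNAL_DELAY_MIN = 0.5     # average wait per traffic signal
TURN_DELAY_MIN = 0.2       # average slowdown per direction change
ACCIDENT_DELAY_MIN = 20.0  # delay if an accident blocks the route

def expected_minutes(route):
    """Estimate travel time from distance, traffic flow, signals, turns and accident risk."""
    driving = route["distance_km"] / route["avg_speed_kmh"] * 60
    return (
        driving
        + route["signals"] * SIGNAL_DELAY_MIN
        + route["turns"] * TURN_DELAY_MIN
        + route["accident_risk"] * ACCIDENT_DELAY_MIN
    )

def fastest_route(routes):
    """Return the name of the route with the lowest expected travel time."""
    return min(routes, key=lambda name: expected_minutes(routes[name]))

print(fastest_route(ROUTES))
```

With these made-up numbers the longer motorway route wins, because the shorter city route loses more time to slow traffic, signals and turns.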