Online Foreign Currency Comparison Tool

Student: Jamie Law
Student ID: 7146174
Supervisor: Sean Bechhofer
Final Year Project Report
Computer Science BSc
University of Manchester, May 5, 2010

1 Abstract

The goal of this project is to produce an online comparison tool similar to those available on MoneySupermarket.com[a]. The tool lets a user quickly determine which company offers the best rate on a particular foreign currency[b]. The project succeeded in extracting foreign currency rates from several online providers and producing a sortable table of results for the user to compare.

[a] MoneySupermarket.com is a website that compares the prices of many companies across a wide range of products, from insurance (car, travel, home, etc.) to mobile phone contracts, to find the best deal.
[b] This saves users the trouble of manually finding and searching 10+ different providers for the best rate.

2 Acknowledgements

On entering third year I had to choose a topic for my third year project. My supervisor’s early involvement pointed me in the right direction and led me to a suitable project to undertake. Over the course of the year he has been a great help in making sure I stuck to the task at hand and in preventing me from straying in the wrong direction. I’m grateful for his help and support in my final year and for his insight into how best to tackle the challenging aspects I’ve faced with my project.

Contents

1 Abstract
2 Acknowledgements
3 Introduction
3.1 Background
3.2 Project Proposal
3.3 Use Case
3.4 Existing Work
3.5 Tackling the Problem
3.6 Chapter Overview
4 Design
4.1 System Architecture
5 Implementation
5.1 Database
5.2 Update Scripts
5.3 Index Page
5.4 Sorting Routines
6 Results
6.1 Discussion of Results
6.2 Summary of Results
7 Testing and Evaluation
7.1 Testing Approach
7.2 Updating data with fresh data
7.3 Checking the correct values are being extracted
7.4 Estimating the accuracy of the data
7.5 Summary of Testing
8 Conclusions
8.1 Achievement of Objectives
8.2 Changes to Original Plan and Expectations
8.3 Future Work
8.4 Conclusion and Personal Comments

3 Introduction

3.1 Background

Online comparison tools are appearing on the Internet almost daily. More and more people use them to check several providers quickly for the best rates available on a particular product, saving both time and money. The owners of these websites profit from the commissions earned by referring visitors to the respective providers’ websites.
In the foreign currency market, however, the nature of the market means there is no affiliate incentive scheme, so there is only one online comparison tool able to compare foreign currency providers.

3.2 Project Proposal

The proposal was to create an online comparison tool that produces a sortable table, allowing a user to determine which company offers the best exchange rate for a particular foreign currency. The tool would be updated automatically at set time intervals with fresh data reflecting the latest exchange rates available from each company. The user would then be able to click through from the table to the provider’s site in order to purchase the currency at the rate displayed.

A user should be able to see whether a particular company offers the requested foreign currency. The user should also be able to sort the table alphabetically and numerically, for example to check the exchange rates of a preferred company[c].

3.3 Use Case

As an example of the type of person who would use this online comparison tool, consider someone who is travelling abroad next week and hasn’t got time to go into town to buy foreign currency from a travel agent.

At work or at home, the user searches the Internet for ’foreign currency exchange rates’. The first result[d] is my site, which the user visits and sorts by AUD[e]; this orders the providers numerically from the best rate available. The user then clicks through to the online company to purchase their foreign currency.

The alternative is for the user to take time out of their schedule to visit the town centre to buy currency, possibly visiting several shops to find out who has the best rate and by how much, or else to visit the known online retailers one by one and compare their exchange rates manually.
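The sort the user performs in this use case can be sketched in a few lines. The example below is written in Python rather than the PHP the tool was built with, and it assumes rates are quoted as units of foreign currency per pound sterling, so a higher number is a better deal for the buyer. The ProviderRow structure and the provider names are hypothetical. Providers that do not offer the chosen currency are moved to the bottom rather than dropped, so the user can still see who offers what.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderRow:
    """One row of the comparison table (hypothetical structure)."""
    provider: str
    # Units of foreign currency per 1 GBP; a currency the provider
    # does not offer is simply absent from the dict.
    rates: dict[str, float] = field(default_factory=dict)


def sort_by_rate(rows: list[ProviderRow], currency: str) -> list[ProviderRow]:
    """Numeric sort: best (highest) rate first, non-offering providers last."""
    offered = [r for r in rows if currency in r.rates]
    missing = [r for r in rows if currency not in r.rates]
    # sorted() is stable, so providers with equal rates keep their table order.
    return sorted(offered, key=lambda r: r.rates[currency], reverse=True) + missing


def sort_by_provider(rows: list[ProviderRow]) -> list[ProviderRow]:
    """Alphabetical sort on provider name, ignoring case."""
    return sorted(rows, key=lambda r: r.provider.lower())


if __name__ == "__main__":
    table = [
        ProviderRow("Provider B", {"AUD": 1.71, "USD": 1.49}),
        ProviderRow("provider a", {"AUD": 1.74}),
        ProviderRow("Provider C", {"USD": 1.51}),
    ]
    for row in sort_by_rate(table, "AUD"):
        print(row.provider, row.rates.get("AUD", "n/a"))
```

Run as a script, this lists provider a (1.74), then Provider B (1.71), then Provider C, which does not offer AUD.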
[c] It is well known that users tend to stick to established, trusted brands when making purchases over the Internet.
[d] This is just an illustrative use case, as there are thousands of possible keywords the user could type. For a highly competitive keyword such as ’foreign currency exchange rates’, the chance of my newly established site appearing first is slim to none; this is merely a theoretical example.
[e] Australian Dollars.

This comparison tool removes that additional time and effort by presenting the user with all the data needed to make a decision in one place. The demand for a one-stop shop is evident from the success of comparison sites such as MoneySupermarket.com, which in the UK alone attracts millions of visitors a year to its comparison tools.

3.4 Existing Work

This project is based on the Simple HTML DOM Parser[f], a PHP library. Scraping data from websites isn’t new, but extracting dynamic data is a problem that more and more sites are having to tackle. With the evolution of algorithms such as Genetify[g], more robust solutions for extracting data from websites are needed. The Simple HTML DOM Parser hasn’t been updated in several years, so once websites make use of better dynamic algorithms to boost sales, the parser will become ineffective for data of this nature. This is because Genetify learns over time from the behaviour of a website’s visitors: the characteristics that identify a piece of data on a provider’s page could be changed by the algorithm to better serve that site’s visitors, leaving a scraper with no way of directly accessing that piece of data once it changes. This applies, of course, only if we locate data by its semantics (the surrounding markup) as opposed to by the data values themselves.
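The Simple HTML DOM Parser is a PHP library, so the following is only a language-neutral illustration of the identifier-based lookup described above, written with Python’s standard html.parser module. The markup and the id "aud-rate" are hypothetical. It shows the fragility in question: if the provider renames the identifier, the lookup does not fail loudly, it simply finds nothing (or, worse, finds a different element that now carries the old identifier).

```python
from html.parser import HTMLParser
from typing import Optional

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IdTextExtractor(HTMLParser):
    """Collects the text inside the first element whose id equals target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # > 0 while the parser is inside the target element
        self.found = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            if tag not in VOID_TAGS:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_by_id(html: str, element_id: str) -> Optional[str]:
    """Return the stripped text of the element with the given id, or None if absent."""
    parser = IdTextExtractor(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip() if parser.found else None


if __name__ == "__main__":
    page = '<table><tr><td>AUD</td><td id="aud-rate">1.7432</td></tr></table>'
    print(extract_by_id(page, "aud-rate"))
    # The provider renames the id: the same lookup now silently returns None.
    print(extract_by_id(page.replace("aud-rate", "rate-aud"), "aud-rate"))
```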
[f] PHP Simple HTML DOM Parser is a PHP library that lets you extract and manipulate data from websites.
[g] Genetify is similar in nature to Google Website Optimizer, except that you can apply weighting to the split A/B and multivariate tests.

3.5 Tackling the Problem

There are a handful of different methods for scraping data from websites. As I had previously used the Simple HTML DOM Parser, I was comfortable using that library to fulfil the requirements I had set for the project. Alternatives included converting the websites to XML, giving the data some structure and format so that it could be queried with XQuery, or using a lexical analyser to convert the websites into a series of tokens that would then be processed and analysed.

3.6 Chapter Overview

Design: This chapter explains the design process behind the comparison tool and shows how it automatically updates the data to keep it fresh.
Implementation: This chapter explains how the tool was implemented, paying special attention to the problems encountered at the different stages.
Testing: This chapter explains the testing methodology used and gives some examples of the data being verified manually.
Evaluation: This chapter critically evaluates the project and reflects on both its successes and failures.
Conclusion: This chapter gives a personal overview of the project and how it went, along with a look at what the future holds for the tool.

4 Design

This chapter explains the high-level design aspects of the project and the reasons they were chosen. The exact implementation of these features is left to later chapters.

4.1 System Architecture

The system is split into:
• Database.
• Update Scripts.
• Index Page.
• Sorting Routines.

The database holds the data from the various providers and stores it for easy and efficient manipulation.
It contains three tables: currencies, currency values and sites. Each is updated by the update scripts, which overwrite the data when and if it changes. I chose the MySQL database management system as it is both free and extensively documented online. I’m also comfortable with it, having used it for several years to store all kinds of data.

The update scripts contain the code that extracts the data from the external websites and stores it in the database for later manipulation. They are triggered by cron jobs on the local server every 6 hours. When a cron job runs an update script, the script checks the respective company website to see whether the data has changed and, if necessary, updates the database with the fresh data.

The index page holds the front-end code that displays the table and pulls everything together. It makes several calls to the database to fetch the data and put it into a table for viewing.

The sorting routines allow the table to be ordered alphabetically and numerically.

The development process used during my project was an incremental one, using the Unified Process (UP)[h]. The most important aspect of following this process was identifying critical tasks early on. In this project, that meant making sure the target sites could be scraped and would allow it.

I implemented the Model-View-Controller (MVC) design pattern, which isolates the application logic (in this case, the extraction of data into the database) from the UI, permitting independent development and testing of each, because neither is directly reliant on the other. The table can be thought of as the View, the sorting mechanism on the table as the Controller, and the database and data extraction (i.e. the back end) as the Model.
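As a rough illustration of how the three tables and the update scripts’ "overwrite only when it changes" behaviour fit together, here is a sketch using Python’s built-in sqlite3 module in place of MySQL, so that it runs without a database server. The column names, the sample site and the store_rate helper are assumptions for illustration, not the project’s actual schema or code.

```python
import sqlite3

# Hypothetical schema mirroring the three tables described above.
SCHEMA = """
CREATE TABLE sites (
    site_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    url     TEXT NOT NULL
);
CREATE TABLE currencies (
    code TEXT PRIMARY KEY,      -- e.g. 'AUD'
    name TEXT NOT NULL          -- e.g. 'Australian Dollars'
);
CREATE TABLE currency_values (
    site_id       INTEGER NOT NULL REFERENCES sites(site_id),
    currency_code TEXT    NOT NULL REFERENCES currencies(code),
    rate          REAL    NOT NULL,
    PRIMARY KEY (site_id, currency_code)
);
"""


def store_rate(conn: sqlite3.Connection, site_id: int, code: str, rate: float) -> bool:
    """Write a scraped rate only if it is new or has changed; return True if a write happened."""
    row = conn.execute(
        "SELECT rate FROM currency_values WHERE site_id = ? AND currency_code = ?",
        (site_id, code),
    ).fetchone()
    if row is not None and row[0] == rate:
        return False  # unchanged: leave the stored value alone
    conn.execute(
        "INSERT OR REPLACE INTO currency_values (site_id, currency_code, rate) "
        "VALUES (?, ?, ?)",
        (site_id, code, rate),
    )
    conn.commit()
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO sites VALUES (1, 'Provider A', 'https://provider-a.example')")
    conn.execute("INSERT INTO currencies VALUES ('AUD', 'Australian Dollars')")
    print(store_rate(conn, 1, "AUD", 1.7432))  # first value for this site/currency
    print(store_rate(conn, 1, "AUD", 1.7432))  # same value again, so no write
    print(store_rate(conn, 1, "AUD", 1.7390))  # rate changed, so it is overwritten
```

In the real system the 6-hourly trigger comes from cron; a crontab entry whose schedule field is `0 */6 * * *` would fire at 00:00, 06:00, 12:00 and 18:00 (the command it runs is not given in this report). Comparing floats with == is reasonable here only because an unchanged page yields the same text, and therefore the same parsed value, on each run.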
5 Implementation

5.1 Database

The database contains three separate tables, for the types of currency, the currency values and the sites to be scraped. The update scripts query these tables both to update them and to use their data to scrape the relevant sites.

5.2 Update Scripts

There is a specific script for each site, containing both the site URL and the types of currency available from that site. Each script contains the code needed to access the relevant elements on its website in order to extract the data and, if necessary, update the database.

[h] The Unified Software Development Process is a popular iterative and incremental software development process framework that is use-case driven and architecture-centric.

5.3 Index Page

The index page is a PHP file that creates an HTML table and populates its rows and columns with data from the database.

5.4 Sorting Routines

The sorting routines allow the user to order the columns both numerically and alphabetically.

6 Results

6.1 Discussion of Results

This project succeeded in creating a comparison tool that the user can use to establish which company offers the best exchange rate on a particular foreign currency.

6.2 Summary of Results

In this chapter we have looked at the tool and given an overview of the functionality the table offers the user. In the next chapter I will describe how I went about testing the tool.

7 Testing and Evaluation

7.1 Testing Approach

The testing approach I used combined white-box and black-box testing of the system. The white-box testing involved running tests with knowledge of the system’s internal structure, e.g. the database. This allowed me to check, for example, whether the data was being stored correctly in the database and whether it was being updated with fresh data. The black-box testing involved testing the system externally, without any knowledge of how its internal structure worked.
These tests involved checking that each cell in the table held the correct value and that it matched the value displayed on the company website.

7.2 Updating data with fresh data

To test whether fresh data was replacing old data as it should, I first had to find some old data. Once I had, I manually ran the update script for that specific site to see whether the old data was replaced and the table updated accordingly. I’m pleased to say it was, so this part of the system is working correctly.

7.3 Checking the correct values are being extracted

During the demo and in testing, I noticed that the identifiers I was using to extract data sometimes changed, meaning that I was effectively pulling the wrong values. To combat this, I need to introduce more robust integrity checks to ensure that the extracted data is reasonable and in line with the data for the same currency from other sites.

For the majority of the data, the correct values were being extracted, and this remained the case over a 3-month period. However, a more robust solution for when something does change is something I will be implementing over the summer.

7.4 Estimating the accuracy of the data

To estimate the accuracy of the data, I manually checked each company website to see whether the value was accurate at the time of checking. As this takes around an hour and the values could be updated in that time, I opened all the company websites in tabs at once to ensure I captured all the values at the same moment. Of the 48 values in the table, 41 were accurate. By accurate I mean the value was the same as the one displayed on the company website and was therefore up to date. That is 41/48 ≈ 85.4%, or approximately 85 percent of values accurate, which I’m reasonably happy with.
Ideally I’d like that figure to be 95% or higher, so I will monitor the results of this test daily for a week to see whether there are any regular occurrences, such as a particular site that updates more often than others. If so, I will increase the frequency at which I scrape that specific site to keep its data as fresh as possible.

7.5 Summary of Testing

In this chapter we have looked at how I tested the tool to determine whether the data was accurate. These tests included:
• Updating data with fresh data.
• Checking the correct values are being extracted.
• Estimating the accuracy of the data by determining what percentage is up to date.

8 Conclusions

8.1 Achievement of Objectives

When I set out to tackle this project my objectives were to:
• Create a table of foreign currencies from several different providers.
• Allow the table to be sorted both alphabetically and numerically.
• Maintain fresh data so the exchange rates are accurate or very close.
• Simplify the task of searching for the best exchange rate online.

I successfully imported all the necessary data from several different online providers into a database, allowing users to manipulate it according to the foreign currency they want. The data can be sorted both alphabetically and numerically, by type of currency and by provider. The data is refreshed in the database on a 6-hour rotation, so that the providers’ servers are not hit frequently enough to trigger a ban while the data is still updated regularly throughout the day.

I feel confident in saying that the comparison tool simplifies the task of searching for the best exchange rate. A user can now visit the site and compare up to 10 different providers automatically by sorting the table, a great saving in time compared with manually searching each individual provider for the correct currency.
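Sections 7.4 and 8.1 combine a fixed 6-hour rotation with the idea of scraping a frequently updating site more often. One way to express that is a per-site refresh interval that defaults to 6 hours. The sketch below assumes, hypothetically, that a single cron job runs more often (say hourly) and asks which sites are due; the site names and the overrides mapping are illustrative, not part of the project.

```python
from datetime import datetime, timedelta
from typing import Optional

# The report's fixed rotation: every site is re-scraped every 6 hours.
DEFAULT_INTERVAL = timedelta(hours=6)


def sites_due(
    last_scraped: dict[str, Optional[datetime]],
    now: datetime,
    overrides: Optional[dict[str, timedelta]] = None,
) -> list[str]:
    """Return, in name order, the sites whose refresh interval has elapsed.

    A site that has never been scraped (None) is always due. `overrides`
    lets a site that updates more often be given a shorter interval.
    """
    overrides = overrides or {}
    due = []
    for site, last in sorted(last_scraped.items()):
        interval = overrides.get(site, DEFAULT_INTERVAL)
        if last is None or now - last >= interval:
            due.append(site)
    return due


if __name__ == "__main__":
    now = datetime(2010, 5, 5, 12, 0)
    last = {
        "provider-a": datetime(2010, 5, 5, 6, 0),   # 6 hours ago
        "provider-b": datetime(2010, 5, 5, 9, 0),   # 3 hours ago
        "provider-c": None,                          # never scraped
    }
    print(sites_due(last, now))
    print(sites_due(last, now, {"provider-b": timedelta(hours=2)}))
```

With the default interval only provider-a and provider-c are due; giving provider-b a 2-hour override makes it due as well. Keeping the longer default for every other site preserves the report’s aim of not hitting providers’ servers often enough to trigger a ban.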
8.2 Changes to Original Plan and Expectations

One important lesson this project has taught me is time management. It is easy to fall into the trap of underestimating the time needed to complete parts of the project, which led to my continually having to update my Gantt chart to reflect the latest project timeframe. A common software engineering rule of thumb is that whatever timeframe you estimate for something, you should triple it. Only in hindsight, and with the experience gained on this project, have I realised this.

In an ideal world I would have liked the tool to look more polished on the UI front. Whilst the tool performs as expected and fulfils the objectives, it would have been nice to make it visually appealing. This is something I plan to do over the summer and is discussed in the ‘Future Work’ subsection.

8.3 Future Work

The tool as it stands is functional, but the UI and general aesthetics need improvement. This will be done as the number of visitors to the site increases and once the Genetify algorithm has been running long enough to satisfy me that there are clear changes to make based on the results of the weighted elements. I plan to run a series of experiments to determine things such as which colour is most effective and which naming conventions users prefer for currencies, e.g. AUD or Australian Dollars.

Google’s continual improvements to its web application tools pave the way for automatically generated graphs giving a historical view of foreign currency rates. This would allow a user to see how a particular currency’s value has changed over a set period of time. Ideally, I’d like to reach the stage where a user can select a currency and then choose a period of time to display on the graph. Having searched for this myself before purchasing currency online, I know this would be a useful feature.
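Section 7.3 proposes checking that each extracted value is in line with the values other sites report for the same currency. A minimal version of that check compares each provider’s rate with the median of the other providers’ rates and flags anything further away than a tolerance. This is a sketch of one possible approach, not the project’s method: the 5% default tolerance is an arbitrary assumption, and the provider names and rates are made up.

```python
from statistics import median


def flag_outliers(rates: dict[str, float], tolerance: float = 0.05) -> list[str]:
    """Return providers whose rate is more than `tolerance` (a fraction) away
    from the median of the other providers' rates for the same currency."""
    flagged = []
    for provider, rate in sorted(rates.items()):
        others = [r for p, r in rates.items() if p != provider]
        if not others:
            continue  # a lone provider has nothing to be compared against
        reference = median(others)
        # Multiplying instead of dividing avoids a ZeroDivisionError on bad data.
        if abs(rate - reference) > tolerance * reference:
            flagged.append(provider)
    return flagged


if __name__ == "__main__":
    aud_rates = {
        "provider-a": 1.74,
        "provider-b": 1.71,
        "provider-c": 1.72,
        "provider-d": 0.93,  # e.g. a value pulled from the wrong element
    }
    print(flag_outliers(aud_rates))
```

Using the median of the other providers means a single wrong value, such as a cell scraped from the wrong element after an identifier change, cannot drag the reference point towards itself. With only two providers there is no majority to appeal to, so a check like this needs at least three sources for a currency before its flags mean much.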
One of the more important aspects is integrating more robust validation of data into the back end in order to spot anomalies in the extracted data. Online stores are continually adapting and becoming more advanced and algorithmic in how they display data on their sites. It can no longer be taken for granted that data fields will stay the same for long. Technology and advances in online algorithms such as Genetify mean that more sophisticated methods have to be used.

It would also be useful to track whether particular sites update at regular times. If they do, I could time the update scripts to ensure maximum freshness of data and possibly reduce the number of times I scrape the sites each day.

8.4 Conclusion and Personal Comments

This project has resulted in the creation of an online comparison tool capable of comparing the exchange rates from several different online providers for a number of different currencies.

Having always wanted to understand how comparison sites such as MoneySupermarket.com work and make their money, I feel a certain sense of accomplishment in knowing that I could replicate many of their tools with the knowledge gained from this project. It will also allow me to contribute my knowledge to the existing MoneySavingExpert.com[i] comparison tools and hopefully to design and create some new ones for them.

[i] MoneySavingExpert.com is a website dedicated to advising consumers on how to find the best deals, and it has the consumer’s interest at heart, as its motto ”standing up for the little guy” suggests.

I expect Google’s big push into the use of microformats to change how data scraping is done and to simplify it as the Internet evolves towards more standardised markup.
I hope that in the coming years future students will be able to use this report as a basis for tackling the problems of data scraping, and the various ways it can be achieved, when no standardised markup is in use.

I hope to continue this project over the summer and develop a clean, polished UI for it. I then plan to release it as a widget through the WordPress repositories and to develop more comparison tools for the average consumer.

Whilst this project has been lengthy, it has been of great benefit in exploring the issues of the semantic web, and it has sparked an ambition to convert my own personal sites to microformats where possible to allow better indexing.

References

MoneySupermarket.com: Alexa Site Information. Alexa provides traffic statistics for websites. http://www.alexa.com/siteinfo/moneysupermarket.com
SimpleHTMLDOM PHP library: an HTML DOM parser written in PHP5+ that lets you manipulate HTML in a very easy way. http://simplehtmldom.sourceforge.net/
Genetify: an algorithm that applies weighting to A/B and multivariate tests. http://wiki.github.com/gregdingle/genetify/
Travel Money: find the best online deal for your holiday cash. http://travelmoney.moneysavingexpert.com/
"Online Foreign Currency Comparison Tool"