Database Systems Research: Where it is (or should be) Headed?
(aka looking for a “perfect” candidate)

Laks V.S. Lakshmanan
Dept. of Computer Science
Univ. of British Columbia
December 6, 2001.

Disclaimers and Stage Setting
• not meant to be comprehensive
• necessarily biased
• database intended in a very broad sense
  – e.g., relational databases, OO, object-relational, …
  – legacy systems (hierarchical/network DBs)
  – file system, spreadsheets, network directories
  – text, media, maps
  – time series, biological sequences
  – data on the web, XML
• data management research – more apt term

DB Research Paradigms
• three major streams:
  – database theory (connections to math. logic, finite model theory, …)
  – principles (data modeling, design, query languages, query optimization, …)
  – systems (database tuning, benchmarking, …)
• all three have their place in general
• but there are limitations

DB Research: A data-driven perspective
[Diagram of the kinds of data driving DB research. Box labels: data mining, OLAP; relational (alphanumeric, rigid structure); OO; semi-structured data & XML; unstructured data; spatial; temporal; mobility; multi-media (raster, video, audio). Annotations: data on the web; business data, scientific, biological data; text/doc domination; surprising e.g.: AcEDB]

DB Research: A process-driven perspective
• classical: e.g., transactions, triggers, integrity checking
• modern:
  – richer transaction models
  – active databases
  – workflow
  – data warehousing
  – data integration
• Note: last two have a substantial data modeling, query answering, algorithmic component.

Some Database Theory
• what are queries?
• First (bad) answer: any computable IN → OUT function.
• Okay, efficiently computable ones: why is this still bad?
• What about the following “queries”?
  – Find the 10th tuple in relation emp.
  – Find the employees with an odd salary.
  – Find the employees the internal representation of whose name is odd!

More on queries
• What went wrong: representation dependence.
• Queries are computable functions that commute (i.e.
they are generic):

      DB ------Q-----> Ans
       |                |
      Rep              Rep
       |                |
       v                v
    Rep(DB) ---Q---> Rep(Ans)

  That is, Q(Rep(DB)) = Rep(Q(DB)).

Interesting Questions
• what are meaningful queries for a given data model/application class?
• how do you design declarative query languages and algebras?
• build novel indices for new data types?
• design optimal strategies for clustering data
• deal with size: data compression, approximation, summarization, etc.
• resource conscious designs
• scalable algorithms for analysis queries (incl. data mining)

IQ (contd.)
• liberating data mining from present-day mindset
• answering queries using views and view maintenance
• semi-structured data management
• mixing paradigms: e.g., database style querying and information retrieval or media retrieval
• foundational questions in new domains: e.g., what does it mean to query sequences?

Profile of a perfect candidate
• some obvious desirables:
• is a hardcore system builder, architect of extensions
• has vision in traditional or new domains (e.g., web, biology, mobility, …)
  – vision just as important as technical skills
• raises difficult questions and provides surprisingly elegant and/or efficient solutions
• complements the DB group’s strengths
• has unbounded energy and enthusiasm!!!!
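
Aside on the genericity condition from “More on queries”: the following is a minimal Python sketch, not part of the original slides, of what representation dependence can look like in practice. Everything in it is an assumption made for illustration: the relation `EMP`, the two hypothetical representations in `REPS` (each a name-to-code table plus a storage order), and the helper names. Salaries are treated as fixed values that a representation does not change. The check only compares the listed representations, so it can expose a dependence but cannot prove that a query is generic.

```python
# Abstract relation emp: a set of (name, salary) facts (made-up data).
EMP = {("ann", 40000), ("bob", 60001), ("cy", 75000)}

# Two hypothetical representations: a name -> internal code table,
# plus a flag saying whether rows are stored in reverse order.
REPS = [
    ({"ann": 1, "bob": 2, "cy": 3}, False),
    ({"ann": 4, "bob": 6, "cy": 5}, True),
]

def represent(db, codes, reverse=False):
    """Store the abstract tuples as (code, salary) rows in some physical order."""
    rows = sorted((codes[name], salary) for name, salary in db)
    return rows[::-1] if reverse else rows

def decode(rows, codes):
    """Map stored rows back to abstract (name, salary) facts."""
    names = {code: name for name, code in codes.items()}
    return {(names[code], salary) for code, salary in rows}

# Three candidate "queries", each written against the stored rows.
def high_earners(rows):
    return [row for row in rows if row[1] > 50000]

def first_tuple(rows):
    # Analogue of "find the 10th tuple": depends on storage order.
    return rows[:1]

def odd_name_code(rows):
    # Depends on the internal code chosen for each name.
    return [row for row in rows if row[0] % 2 == 1]

def agrees_across_representations(query, db, representations):
    """True if the decoded answer is the same under every listed representation."""
    answers = [
        decode(query(represent(db, codes, reverse)), codes)
        for codes, reverse in representations
    ]
    return all(answer == answers[0] for answer in answers)

if __name__ == "__main__":
    for query in (high_earners, first_tuple, odd_name_code):
        print(query.__name__, agrees_across_representations(query, EMP, REPS))
```

Under these two representations only `high_earners` returns the same decoded answer; the other two change their answers when the storage order or the name codes change, which is the failure the slide calls representation dependence.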