Database Queries – Logic and Complexity
Moshe Y. Vardi, Rice University

Mathematical logic emerged during the early part of the 20th century, out of a foundational investigation of mathematics, as the basic language of mathematics. In 1970 Codd proposed the relational database model, based on mathematical logic: logical structures offer a way to model data, while logical formulas offer a way to express database queries. This proposal gave rise to a multi-billion-dollar relational database industry as well as a rich theory of logical query languages. This talk will offer an overview of how mathematical logic came to provide foundations for one of today's most important technologies, and show how the theory of logical queries offers deep insights into the computational complexity of evaluating relational queries.

Moshe Y. Vardi is the George Distinguished Service Professor in Computational Engineering and Director of the Ken Kennedy Institute for Information Technology at Rice University. He is the co-recipient of three IBM Outstanding Innovation Awards, the ACM SIGACT Gödel Prize, the ACM Kanellakis Award, the ACM SIGMOD Codd Award, the Blaise Pascal Medal, and the IEEE Computer Society Goode Award. He is the author or co-author of over 400 papers, as well as two books: Reasoning about Knowledge and Finite Model Theory and Its Applications. He is a Fellow of the Association for Computing Machinery, the American Association for Artificial Intelligence, the American Association for the Advancement of Science, and the Institute of Electrical and Electronics Engineers. He is a member of the US National Academy of Engineering, the American Academy of Arts and Sciences, the European Academy of Sciences, and Academia Europaea. He holds honorary doctorates from Saarland University in Germany and the University of Orléans in France. He is the Editor-in-Chief of the Communications of the ACM.
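To make the correspondence above concrete (logical structures model the data, logical formulas express the queries), here is a minimal illustrative sketch in Python; it is not taken from the talk, and all relation names and data are invented. Relations are finite sets of tuples, and a conjunctive first-order formula is evaluated directly over them.

```python
# Illustrative sketch only: a logical structure (two relations over a finite
# domain) and a query written as the first-order formula
#   grandmother(x) := exists y, z . Female(x) & Parent(x, y) & Parent(y, z)

Parent = {("ann", "bob"), ("bob", "carl"), ("ann", "dora")}
Female = {("ann",), ("dora",)}

# The set comprehension mirrors the formula: the nested `for` clauses play the
# role of the existential quantifiers, the `if` clauses the role of the atoms.
grandmothers = {
    x
    for (x,) in Female
    for (x2, y) in Parent if x2 == x
    for (y2, z) in Parent if y2 == y
}

print(grandmothers)  # {'ann'}: ann -> bob -> carl
```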

Scientific Data Management: Not your everyday transaction
Anastasia Ailamaki, EPFL Lausanne

Today's scientific processes heavily depend on fast and accurate analysis of experimental data. Scientists are routinely overwhelmed by the effort needed to manage the volumes of data produced either by observing phenomena or by sophisticated simulations. As database systems have proven inefficient, inadequate, or insufficient to meet the needs of scientific applications, the scientific community typically uses special-purpose legacy software. When compared to a general-purpose data management system, however, application-specific systems require more resources to maintain, and in order to achieve acceptable performance they often sacrifice data independence and hinder the reuse of knowledge. With the exponential growth of dataset sizes, data management technology is no longer a luxury; it is the only viable solution for scientific applications. I will discuss some of the work from teams around the world and the requirements of their applications, as well as how these translate into challenges for the data management community. As an example I will describe a challenging application on brain simulation data and its needs; I will then present how we were able to simulate a meaningful percentage of the human brain as well as access arbitrary brain regions fast, independently of increasing data size or density. Finally I will present some of the data management challenges that lie ahead in domain sciences.

Anastasia Ailamaki is a Professor of Computer Science at the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland. Her research interests are in database systems and applications, and in particular (a) in strengthening the interaction between the database software and emerging hardware and I/O devices, and (b) in automating database management to support computationally demanding and data-intensive scientific applications. She has received a Finmeccanica endowed chair from the Computer Science Department at Carnegie Mellon (2007), a European Young Investigator Award from the European Science Foundation (2007), an Alfred P. Sloan Research Fellowship (2005), seven best-paper awards at top conferences (2001-2011), and an NSF CAREER award (2002). She earned her Ph.D. in Computer Science from the University of Wisconsin-Madison in 2000. She is a member of IEEE and ACM, and has also been a CRA-W mentor.

Open Data
François Bancilhon

Open Data consists in making PSI (public sector information) available to the general public and to private and public organizations for access and reuse. More and more open data is becoming available in most democratic countries following the launch of the data.gov initiative in the US in 2009. The availability of this new information brings a number of opportunities and raises a number of challenges. The opportunities are the new applications that companies and organisations can build using this data and the new understanding given to the people who access it. The challenges are the following: most of this data is usually in a poor format (poorly structured XLS tables or, in some cases, even PDF), it is often of poor quality, and it is fragmented into thousands or millions of files with duplicate and/or complementary information. To use these fragmented, poorly structured and poor-quality files, several approaches can be used, not necessarily mutually exclusive. One is to move the intelligence from the data into the application and to develop search-based applications which directly manage the data as is. Another one is to bring some order into the data using a semantic web approach: converting the data to RDF, identifying entities and linking them from one data set to the other. And a final one is to structure the data by aligning data sets on common attributes and structures, to get closer to a uniform database schema.

François is currently CEO of Data Publica, a key actor in the Open Data space in France, and CEO of the Mobile Services Initiative for INRIA. He has co-founded and/or managed several software startups in France and in the US (Data Publica, Mandriva, Arioso, Xyleme, Ucopia, O2 Technology). Before becoming an entrepreneur, François was a researcher and a university professor, in France and the US, specializing in database technology. François holds an engineering degree from the École des Mines de Paris, a PhD from the University of Michigan and a Doctorate from the University of Paris XI.
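A hedged, purely hypothetical sketch of the second and third approaches described above (all dataset and field names are invented): flat open-data records are turned into RDF-style triples, and two data sets are aligned through a normalized common attribute.

```python
# Hypothetical example: two open-data files describing the same city under
# different field names and formatting, converted to RDF-style triples that
# share one normalized subject identifier.

population = [{"city": "Lyon ", "pop_2011": "474946"}]
budgets = [{"commune": "LYON", "budget_keur": "612000"}]

def normalize(name):
    # Crude entity normalization: trim and lowercase the city name.
    return name.strip().lower()

def to_triples(record, key_field, dataset):
    subject = f"city:{normalize(record[key_field])}"
    return [(subject, f"{dataset}:{field}", value)
            for field, value in record.items() if field != key_field]

triples = []
for rec in population:
    triples += to_triples(rec, "city", "pop")
for rec in budgets:
    triples += to_triples(rec, "commune", "budget")

# The two data sets are now aligned through the shared subject 'city:lyon'.
print(sorted(triples))
# [('city:lyon', 'budget:budget_keur', '612000'),
#  ('city:lyon', 'pop:pop_2011', '474946')]
```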

Web archiving
Julien Masanès

The Web represents the largest source of open information ever produced in history. Larger than the printed sphere by several orders of magnitude, it also exhibits specific characteristics compared to traditional media, such as its collaborative editing, in which a large fraction of humanity participates, even if only marginally, its complex dynamics, and the paradoxical nature of the traces it conveys, both ubiquitous and fragile at the same time. These unique features have also led the Web to become a major source for modern information, analysis and study, and made the capacity to preserve its memory an important issue for the future. But these features also require laying new methodological and practical foundations in the well-established field of cultural artefact preservation. This presentation will outline the salient properties of the Web viewed from the somewhat different angle of its preservation and offer some insight into how its memory can be built to serve science in the future.

Julien Masanès is Director of the Internet Memory Foundation, a non-profit organization for web preservation and digital cultural access. Before this he directed the Web Archiving Project at the Bibliothèque Nationale de France, starting in 2000. He also actively participated in the creation of the International Internet Preservation Consortium (IIPC), which he coordinated during its first two years. He contributes to various national and international initiatives and provides advice to the European Commission as an expert in the domain of digital preservation and web archiving. He has also launched and presently chairs the International Web Archiving Workshop (IWAW) series, the main international rendezvous in this field. Julien Masanès studied Philosophy and Cognitive Science, gaining his MS in Philosophy from the Sorbonne in 1992 and his MS in Cognitive Science from the École des Hautes Études en Sciences Sociales (EHESS) in 1994. In 2000 he gained an MS in librarianship at the École Nationale Supérieure des Sciences de l'Information et des Bibliothèques (ENSSIB).

Static Analysis and Verification
Victor Vianu, U.C. San Diego

Correctness and good performance are essential desiderata for database systems and the many applications relying on databases. Indeed, bugs and performance problems are commonly encountered in such systems and can range from annoying to catastrophic. Static analysis and verification provide tools for automatic reasoning about queries and applications in order to guarantee desirable behavior. Unfortunately, such reasoning, carried out by programs that take as input other programs, quickly runs against fundamental limitations of computing. In the cases when it is feasible, it often requires a sophisticated mix of techniques from logic and automata theory. This talk will discuss some of the challenges and intrinsic limitations of static analysis and verification and identify situations where it can be very effective.

Victor Vianu is a Professor of Computer Science at the University of California, San Diego. He received his PhD in Computer Science from the University of Southern California in 1983. He has spent sabbaticals at INRIA, the École Normale Supérieure (Cachan and Ulm) and Télécom Paris. Vianu's interests include database theory, computational logic, and Web data. His most recent research focuses on static analysis of XML-based systems, and on the specification and verification of data-driven Web services and workflows. Vianu's publications include over 100 research articles and a graduate textbook on database theory. He received the PODS Alberto Mendelzon Test-of-Time Award in 2010 and has given numerous invited talks, including keynotes at PODS, ICDT, STACS, the Annual Meeting of the Association for Symbolic Logic, and the Federated Logic Conference. Vianu has served as General Chair of SIGMOD and PODS, and Program Chair of the PODS and ICDT conferences. He is currently Editor-in-Chief of the Journal of the ACM and an Area Editor of ACM Transactions on Computational Logic. He was elected Fellow of the ACM in 2006.
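One classic instance of automatic reasoning about queries, offered here as an illustration rather than as the content of the talk, is deciding containment of Boolean conjunctive queries, which reduces to finding a homomorphism between the queries (the Chandra-Merlin criterion). A minimal sketch:

```python
# Containment of Boolean conjunctive queries via homomorphism search.
# Q1 is contained in Q2 (every database satisfying Q1 satisfies Q2) iff
# there is a homomorphism from Q2's atoms into Q1's atoms, where Q1's
# variables are frozen as constants (its "canonical database").
from itertools import product

# An atom is (relation_name, (term, ...)); terms starting with '?' are variables.
Q1 = {("Edge", ("?x", "?y")), ("Edge", ("?y", "?z"))}   # "there is a path of length 2"
Q2 = {("Edge", ("?u", "?u"))}                            # "there is a self-loop"

def is_var(t):
    return t.startswith("?")

def contained_in(q1, q2):
    """True iff q1 is contained in q2 (brute-force homomorphism search)."""
    q2_vars = sorted({t for _, args in q2 for t in args if is_var(t)})
    q1_terms = sorted({t for _, args in q1 for t in args})
    for image in product(q1_terms, repeat=len(q2_vars)):
        h = dict(zip(q2_vars, image))
        mapped = {(r, tuple(h.get(t, t) for t in args)) for r, args in q2}
        if mapped <= q1:
            return True
    return False

print(contained_in(Q2, Q1))  # True: a self-loop always yields a length-2 path
print(contained_in(Q1, Q2))  # False: a length-2 path need not contain a loop
```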

Data crowdsourcing
Tova Milo

Crowd-based data sourcing is a new and powerful data procurement paradigm that engages Web users to collectively contribute data, analyze information and share opinions. Crowd-based data sourcing democratizes data collection, cutting companies' and researchers' reliance on stagnant, overused datasets, and bears great potential for revolutionizing our information world. Yet, triumph has so far been limited to only a handful of successful projects such as Wikipedia or IMDb. This comes notably from the difficulty of managing huge volumes of data and users of questionable quality and reliability. Every single initiative has had to battle, almost from scratch, the same non-trivial challenges. The ad hoc solutions, even when successful, are application-specific and rarely sharable. In this talk we consider the development of solid scientific foundations for Web-scale data sourcing. We believe that such a principled approach is essential to obtain knowledge of superior quality, to carry out tasks more effectively and automatically, to be able to reuse solutions, and thereby to accelerate the pace of practical adoption of this new technology that is revolutionizing our lives. We will consider the logical, algorithmic, and methodological foundations for the management of large-scale crowd-sourced data as well as the development of applications over such information.

Tova Milo received her Ph.D. degree in Computer Science from the Hebrew University, Jerusalem, in 1992. After graduating she worked at the INRIA research institute in Paris and at the University of Toronto, and returned to Israel in 1995, joining the School of Computer Science at Tel Aviv University, where she is now a full Professor and department head. Her research focuses on advanced database applications such as data integration, XML and semi-structured information, Web-based applications and business processes, studying both theoretical and practical aspects. Tova has served as Program Chair of several international conferences, including PODS, ICDT, VLDB, XSym, and WebDB. She is a member of the VLDB Endowment and the ICDT executive board and is an editor of TODS, the VLDB Journal and the Logical Methods in Computer Science journal. She has received grants from the Israel Science Foundation, the US-Israel Binational Science Foundation, the Israeli and French Ministries of Science and the European Union. She is a recipient of the 2010 ACM PODS Alberto O. Mendelzon Test-of-Time Award and of the prestigious EU ERC Advanced Investigators grant.

Extracting Data from the Web
Georg Gottlob, Oxford University

This talk deals with the problem of semi-automatically and fully automatically extracting data from the Web. Data on the Web are usually presented to meet the eye and are not structured. To use these data in business data processing applications, they need to be extracted and structured. In the first part of this seminar, the need for web data extraction is illustrated using examples from the business intelligence area. In the second part, a theory of web data extraction based on monadic second-order logic and monadic datalog is presented, and some complexity results are discussed. The third part of this talk briefly illustrates the Lixto tool for semi-automatic data extraction. This datalog-based tool has been used for a variety of commercial applications. Finally, in the fourth part of the talk we discuss the problem of fully automated data extraction from domain-specific web pages and present first results of the DIADEM project, which is funded by an ERC Advanced Grant at Oxford University.

Georg Gottlob is a Professor of Informatics at Oxford University, a Fellow of St John's College, Oxford, and an Adjunct Professor at TU Wien. His interests include data extraction, database theory, graph decomposition techniques, AI, knowledge representation, logic and complexity. Gottlob has received the Wittgenstein Award from the Austrian National Science Fund, is an ACM Fellow, an ECCAI Fellow, a Fellow of the Royal Society, and a member of the Austrian Academy of Sciences, the German National Academy of Sciences, and the Academia Europaea. He chaired the Program Committees of IJCAI 2003 and ACM PODS 2000, was Editor-in-Chief of the journal Artificial Intelligence Communications, and is currently a member of the editorial boards of several journals, such as CACM and JCSS. He is the main founder of Lixto (www.lixto.com), a company that provides tools and services for web data extraction. Gottlob was recently awarded an ERC Advanced Investigator's Grant for the project "DIADEM: Domain-centric Intelligent Automated Data Extraction Methodology" (see also http://diadem.cs.ox.ac.uk/). More information on Georg Gottlob can be found on his Web page: http://www.cs.ox.ac.uk/people/georg.gottlob/
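As a rough, invented illustration of the theory's core idea, that a wrapper defines unary predicates over the nodes of the HTML parse tree (in the spirit of monadic datalog over trees), here is a small sketch; it is not Lixto and not its actual rule language.

```python
# Illustrative only: select the nodes of a tiny HTML document that satisfy a
# unary "rule", here  price(N) <- td(N), class(N, "price"),
#                                 N is a descendant of a table with class "prices".
from html.parser import HTMLParser

HTML = """
<table class="prices">
  <tr><td class="item">Book</td><td class="price">12.50</td></tr>
  <tr><td class="item">Pen</td><td class="price">1.80</td></tr>
</table>
"""

class Node:
    def __init__(self, tag, attrs, parent):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.text, self.children = "", []

class TreeBuilder(HTMLParser):
    """Builds a simple parse tree out of well-formed HTML."""
    def __init__(self):
        super().__init__()
        self.root = Node("root", [], None)
        self.cur = self.root
    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.cur)
        self.cur.children.append(node)
        self.cur = node
    def handle_endtag(self, tag):
        if self.cur.parent is not None:
            self.cur = self.cur.parent
    def handle_data(self, data):
        self.cur.text += data.strip()

def nodes(n):
    # Document-order traversal of the tree.
    yield n
    for child in n.children:
        yield from nodes(child)

def is_price(n):
    # The unary predicate defined by the rule above.
    if n.tag != "td" or n.attrs.get("class") != "price":
        return False
    p = n.parent
    while p is not None:
        if p.tag == "table" and p.attrs.get("class") == "prices":
            return True
        p = p.parent
    return False

builder = TreeBuilder()
builder.feed(HTML)
print([n.text for n in nodes(builder.root) if is_price(n)])  # ['12.50', '1.80']
```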

Knowledge Harvesting from the Web
Gerhard Weikum

The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources have enabled the automatic construction of very large knowledge bases. Recent endeavors of this kind include academic research projects such as DBpedia, KnowItAll, ReadTheWeb, and YAGO-NAGA, as well as industrial ones such as Freebase and True Knowledge. These projects provide automatically constructed knowledge bases of facts about named entities, their semantic classes, and their mutual relationships. Such world knowledge in turn enables cognitive applications and knowledge-centric services like disambiguating natural-language text, deep question answering, and semantic search for entities and relations in Web and enterprise data. This talk discusses recent advances, research opportunities, and open challenges along this avenue of knowledge harvesting and its applications.

Gerhard Weikum is a Scientific Director at the Max Planck Institute for Informatics in Saarbruecken, Germany, where he is leading the department on databases and information systems. He is also an Adjunct Professor at Saarland University, and a principal investigator of the DFG Cluster of Excellence on Multimodal Computing and Interaction. Earlier he held positions at Saarland University in Saarbruecken, Germany, at ETH Zurich, Switzerland, and at MCC in Austin, Texas, and he was a visiting senior researcher at Microsoft Research in Redmond, Washington. He graduated from the University of Darmstadt, Germany. Gerhard Weikum's research spans transactional and distributed systems, self-tuning database systems, DB&IR integration, and the automatic construction of knowledge bases from Web and text sources. He co-authored a comprehensive textbook on transactional systems, received the VLDB 10-Year Award for his work on automatic DB tuning, and is one of the creators of the YAGO knowledge base. Gerhard Weikum is an ACM Fellow, a Fellow of the German Computer Society, and a member of the German Academy of Science and Engineering. He has served on various editorial boards, including Communications of the ACM, and as program committee chair of conferences such as ACM SIGMOD, Data Engineering, and CIDR. From 2003 through 2009 he was president of the VLDB Endowment. He received the ACM SIGMOD Contributions Award in 2011.
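As a toy, hedged illustration of the kind of content such knowledge bases hold (facts about named entities, their semantic classes, and their relationships), here is a sketch using subject-predicate-object triples and a one-pattern matcher; it does not reflect the internals of any of the systems named above.

```python
# Toy knowledge base: entity facts, class membership and a class hierarchy,
# all encoded uniformly as (subject, predicate, object) triples.
KB = {
    ("Max_Planck", "type", "Physicist"),
    ("Physicist", "subclassOf", "Scientist"),
    ("Max_Planck", "bornIn", "Kiel"),
    ("Kiel", "locatedIn", "Germany"),
}

def match(pattern, kb):
    """Bindings of the '?'-variables in a single triple pattern against kb."""
    results = []
    for triple in kb:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                binding[p] = t
            elif p != t:
                binding = None
                break
        if binding is not None:
            results.append(binding)
    return results

print(match(("?person", "bornIn", "?place"), KB))
# [{'?person': 'Max_Planck', '?place': 'Kiel'}]
```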

Reasoning on Web Data Semantics
Marie-Christine Rousset, Grenoble University, Institut Universitaire de France

Providing efficient and high-level services for integrating, querying and managing Web data raises many difficult challenges, because data are becoming ubiquitous, multi-form, multi-source and multi-scale. Data semantics is probably one of the keys for attacking those challenges in a principled way. A lot of effort has been devoted in the Semantic Web community to describing the semantics of information through ontologies. In this talk, we will show that description logics provide a good model for specifying ontologies over Web data (described in RDF), but that restrictions are necessary in order to obtain scalable algorithms for checking data consistency and answering conjunctive queries. We will show that the DL-Lite family has good properties for combining ontological reasoning and data management at large scale, and is therefore a good candidate for being a Semantic Web data model.

Marie-Christine Rousset is a Professor of Computer Science at the University of Grenoble. She is an alumna of the École normale supérieure (Fontenay-aux-Roses), from which she graduated in Mathematics (1980). She obtained a PhD (1983) and a Thèse d'État (1988) in Computer Science from Université Paris-Sud. Her areas of research are Knowledge Representation, Information Integration, Pattern Mining and the Semantic Web. She has published over 90 refereed international journal articles and conference papers, and participated in several cooperative industry-university projects. She received a best paper award from AAAI in 1996, and was named an ECCAI Fellow in 2005. She has served on many program committees of international conferences and workshops and on the editorial boards of several journals. She was a junior member of the IUF (Institut Universitaire de France) from 1997 to 2001, and was nominated in 2011 as a senior member of the IUF to develop a five-year research project on Artificial Intelligence and the Web.
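A key reason the DL-Lite family scales is that query answering can be reduced to rewriting the query with the ontology and then evaluating the rewriting directly over the data. The following is a hedged toy sketch of that idea, not an actual DL-Lite reasoner; all class, property, and individual names are invented, and the class hierarchy is assumed acyclic.

```python
# Toy first-order-rewriting sketch: answer an atomic class query over
# RDF-style triples by rewriting it with subclass and property-domain axioms.
DATA = {
    ("alice", "rdf:type", "Professor"),
    ("bob", "rdf:type", "Student"),
    ("carol", "teaches", "db_course"),
}

SUBCLASS = {("Professor", "Person"), ("Student", "Person")}  # Professor is a Person, ...
DOMAIN = {("teaches", "Professor")}                          # whoever teaches is a Professor

def rewrite(cls):
    """All atomic ways an individual can be inferred to belong to `cls`
    (assumes the subclass hierarchy is acyclic)."""
    queries = {("rdf:type", cls)}
    for sub, sup in SUBCLASS:
        if sup == cls:
            queries |= rewrite(sub)
    for prop, dom in DOMAIN:
        if dom == cls:
            queries.add((prop, None))   # "x has some `prop`-successor"
    return queries

def answer(cls):
    results = set()
    for pred, obj in rewrite(cls):
        for s, p, o in DATA:
            if p == pred and (obj is None or o == obj):
                results.add(s)
    return results

print(sorted(answer("Person")))  # ['alice', 'bob', 'carol']
```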

Social networks
Pierre Senellart, Télécom ParisTech

Social networking services on the Web are a tremendously popular way to connect with friends, publish content, and share information. We will talk about some of the research challenges they present:
1) How to crawl, index, and query social networks?
2) How to explain the particular small-world characteristics of social networking graphs?
3) How to use social connections to improve the quality of Web search or recommendations?

Dr. Pierre Senellart is an Associate Professor in the DBWeb team at Télécom ParisTech, the leading French engineering school specializing in information technology. He is an alumnus of the École normale supérieure and obtained his M.Sc. (2003) and his Ph.D. (2007) in computer science from Université Paris-Sud, studying under the supervision of Serge Abiteboul. Pierre Senellart has published articles in internationally renowned conferences and journals (PODS, AAAI, VLDB Journal, Journal of the ACM, etc.). He has been a member of the program committees of, and participated in the organization of, various international conferences and workshops (including WWW, CIKM, ICDE, VLDB, SIGMOD, ICDT). He is also the Information Director of the Journal of the ACM. His research interests focus on theoretical aspects of database management systems and the World Wide Web, and more specifically on the intentional indexing of the deep Web, probabilistic XML databases, and graph mining. He also has an interest in natural language processing, and has been collaborating with SYSTRAN, the leading machine translation company.
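A toy, hedged illustration of the small-world property mentioned in question 2 of the abstract above (the friendship graph is invented, not real data): even though each user has only a couple of friends, average shortest-path distances stay small.

```python
# Average hop distance between all pairs of users in a tiny friendship graph,
# computed by breadth-first search from every node.
from collections import deque

FRIENDS = {
    "ann": {"bob", "carl"},
    "bob": {"ann", "dora"},
    "carl": {"ann", "dora"},
    "dora": {"bob", "carl", "eve"},
    "eve": {"dora"},
}

def distances_from(source):
    """Hop distance from `source` to every reachable user (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in FRIENDS[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

pairs = [(u, v) for u in FRIENDS for v in FRIENDS if u < v]
average = sum(distances_from(u)[v] for u, v in pairs) / len(pairs)
print(f"average distance: {average:.2f}")  # 1.60 for this 5-user graph
```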