commercial aggregator products/vendors: EBSCO, Gale, ProQuest
data: structured, unstructured
governance: lineage [active versus lazy lineage], provenance, validity – backward and forward tracing queries, downstream and upstream applications
collection: information retrieval, web scraping
preparation: quality: [profiling, cleaning, auditing]
storage: database, cloud
information: abstract, bibliometrics, classification [faceted], catalogue [description | transcribed, identifier/locator | composed], index (derived or assigned), metadata [attributes/elements/tags/markers], subject, topic directory
language: [controlled vocabulary [subject headings, thesauri], machine language [HTML, XML], natural language]
information resource: audio, bibliographic record/metadata, database, digital object, document, image, map, spreadsheet, video, webpage
information retrieval/searching: Boolean, clustering, keyword, KWIC, KWOC, limiting, phrase, proximity, relevance ranking, semantic searching, stop words, stemming, truncation.
web mining: [crawling (name resolution – client socket), machine learning, data mining]
terms related to retrieval: perseverance, inaccuracy, exhaustivity, specificity, co-extensive entry, recall, relevance, precision, pre-coordinate, post-coordinate
internet: website
browser: [Internet Explorer, Google Chrome, Firefox]
search engine: [Bing, DuckDuckGo, Excite, Google, HotBot, Inktomi, Lycos, Northern Light, Yahoo (AltaVista), …]
webpage: [hyperlink, hypertext, link, memes, URL, Hypertext Transfer Protocol (HTTP)]
programming: algorithm, object library, code, regular expressions, metacharacters
software applications: Cloudera, Excel, FAME, Hadoop, MapReduce, SAS (statistics), Tableau (visualisation)
standards: Anglo-American Cataloguing Rules (AACR2), Dewey Decimal Classification (DDC), Dublin Core, Functional Requirements for Bibliographic Records (FRBR), Resource Description Framework (RDF), International Federation of Library Associations and Institutions (IFLA), Library of Congress Subject Headings (LCSH), MAchine-Readable Cataloguing (MARC), Sears List of Subject Headings, Subject Authority Cooperative Program (SACO), Universal Decimal Classification (UDC), Virtual International Authority File (VIAF)
statistics: descriptive, correlation, linear and logistic regression, ANOVA, chi-square, cluster, factor analysis
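Of the retrieval terms listed above, recall and precision have standard set-based definitions: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. A minimal sketch follows; the function name `precision_recall` and the document IDs are hypothetical, chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

Two of the four retrieved documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4). Broadening a search typically raises recall at the cost of precision.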
ANOVA (Analysis of variance)
API (application programming interface)
CGI (common gateway interface)
ER (Entity Relationship)
FAME (Forecasting Analysis and Modeling Environment)
IP address (Internet Protocol)
UML (Unified Modeling Language)
VIAF (Virtual International Authority File)
Terms waiting to be listed
Bayesian inference problem
Cornell SMART system
denial of service (DoS) attack
Domain Name System (DNS)
latent semantic indexing
linked data architecture
MIME (Multipurpose Internet Mail Extensions)
ontology (a concrete syntactic structure that models the semantics of a domain, i.e. a conceptual framework, in machine-readable form; Jacob 2003, p. 19)
shingling (a way of detecting near-duplicate web pages)
sockets (blocking and nonblocking)
statistical pattern recognition
syntactic and statistical analysis
zlib (compression library)
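The shingling entry above can be made concrete with a small sketch. One common formulation, assumed here, breaks each page into overlapping runs of k words (shingles) and compares the resulting sets with Jaccard similarity; the function names, the choice k=3, and the sample sentences are illustrative, not taken from the notes.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word k-grams) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two hypothetical pages that differ by a single word.
page_a = "the quick brown fox jumps over the lazy dog"
page_b = "the quick brown fox leaps over the lazy dog"

similarity = jaccard(shingles(page_a), shingles(page_b))
print(round(similarity, 2))  # 0.4
```

Each page yields seven shingles, four of which are shared, so the similarity is 4/10. A crawler would flag pairs above some chosen threshold as near duplicates; large-scale systems usually compare compact sketches of the shingle sets rather than the full sets.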
History: Who and When 
The Talmud, with its heavy use of annotations and nested
commentary, and the Ramayana and Mahabharata, with their branching, nonlinear
discourse, are ancient examples of hypertext.
Ted Nelson coined the term hypertext in 1965.
Tim Berners-Lee built ENQUIRE, a hypertext precursor to the World Wide Web, at CERN in 1980; he proposed the Web itself in 1989.
Memex: photo-electrical-mechanical storage and computing device, proposed by Vannevar Bush in 1945.
Jerry Yang and David Filo, Ph.D. students at Stanford University, created the Yahoo! directory (www.yahoo.com/) in 1994.
Larry Page and Sergey Brin, Ph.D. students at Stanford University, created BackRub (an early version of Google) in 1996.
- Data Science
- High Performance Computing
- Cloud Computing
- Python Programming
UNIX -> command line interface
Convolution: basically pattern matching across spatial locations, but the patterns (filters) are not designed a priori; they are learned from the data and task.
Pooling: accumulates local statistics of filter responses from the convolution layer, which leads to local spatial invariance for the learned patterns.
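The two operations can be sketched in plain Python. This is a toy illustration under stated assumptions: the filter is written by hand (in a trained network its weights would be learned), the pooled statistic is the maximum (one of several possible choices), and the names `conv2d` and `max_pool` are hypothetical helpers, not a library API.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products.

    Strictly this is cross-correlation, which is what deep-learning
    libraries conventionally call convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete windows at the edges are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Toy 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written vertical-edge filter (an assumption; normally learned).
kernel = [[-1, 1], [-1, 1]]

features = conv2d(image, kernel)  # 4x4 map of filter responses
pooled = max_pool(features)       # 2x2 map of local maxima
print(features)  # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
print(pooled)    # [[2, 0], [2, 0]]
```

The filter responds only where the edge is, and pooling keeps that response even if the edge shifts by one pixel within a pooling window, which is the local spatial invariance described above.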
- Soumen Chakrabarti, Mining the Web: Discovering Knowledge from Hypertext Data
[*] T. Nelson. A file structure for the complex, the changing, and the indeterminate. In Proceedings of the ACM National Conference, pages 84–100, 1965.
- Data The Advent of the Algorithm
- Library Linked Data in the Cloud: OCLC’s Experiments with New Models of Resource Description. Synthesis Lectures on the Semantic Web: Theory and Technology, April 2015, 154 pages (doi:10.2200/S00620ED1V01Y201412WBE012)
- Brunner, R & Kim, E 2016, ‘Teaching Data Science’, Procedia Computer Science, vol. 80, no. 1, pp. 1947-1956.
- Hider, P & Harvey, R 2008, Organising Knowledge in a Global Society: Principles and Practice in Libraries and Information Centres, Elsevier Science, Burlington.