Data and Information Subject Headings and Terms


commercial aggregator products/vendors: EBSCO, Gale, ProQuest

data: structured, unstructured
governance: lineage [active versus lazy lineage], provenance, validity – backward and forward tracing queries, downstream and upstream applications
collection: information retrieval, web scraping
preparation: quality: [profiling, cleaning, auditing]
storage: database, cloud
presentation: visualisation

information: abstract, bibliometrics, classification [faceted], catalogue [description | transcribed, identifier/locator | composed], index (derived or assigned), metadata [attributes/elements/tags/markers], subject, topic directory.
language: [controlled vocabulary [subject headings, thesauri], machine language [HTML, XML], natural language].

information resource: audio, bibliographic record/metadata, database, digital object, document, image, map, spreadsheet, video, webpage

information retrieval/searching: Boolean, clustering, keyword, KWIC, KWOC, limiting, phrase, proximity, relevance ranking, semantic searching, stop words, stemming, truncation.
web mining: [crawling (name resolution – client socket), machine learning, data mining]. Terms related to retrieval: perseverance, inaccuracy, exhaustivity, specificity, co-extensive entry, recall, relevance, precision, pre-coordinate, post-coordinate.
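
Two of the retrieval terms above, precision and recall, have simple arithmetic definitions: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The following is a minimal Python sketch only; the function name and the document identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the search
    relevant:  document ids judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
p, r = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d3", "d7", "d8", "d9"],
)
print(p, r)  # 0.75 0.5
```

A search that returns everything has perfect recall but poor precision, which is why the two measures are usually reported together.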

internet: website: browser: [Internet Explorer, Google Chrome, Firefox]; search engine: [Bing, DuckDuckGo, Excite, Google, HotBot, Inktomi, Lycos, Northern Light, Yahoo (AltaVista)…]; webpage: [hyperlink – hypertext – link, memes, URL, Hypertext Transfer Protocol (HTTP)]

programming: algorithm, object library, code, regular expressions, metacharacters

programming languages: [C, C++, Fortran, JavaScript, Python, R, Ruby]

software applications: Cloudera, Excel, FAME, Hadoop, MapReduce, SAS (statistics), Tableau (visualisation)

standards: Anglo-American Cataloguing Rules (AACR2), Dewey Decimal Classification (DDC), Dublin Core, Functional Requirements for Bibliographic Records (FRBR), Resource Description Framework (RDF), International Federation of Library Associations and Institutions (IFLA), Library of Congress Subject Headings (LCSH), MAchine Readable Cataloguing (MARC), Sears List of Subject Headings, Subject Authority Cooperative Program (SACO), Universal Decimal Classification (UDC), Virtual International Authority File (VIAF)

statistics: descriptive, correlation, linear and logistic regression, ANOVA, chi-square, cluster, factor analysis


ANOVA (Analysis of variance)
API (application programming interface)
CGI (common gateway interface)
ER (Entity Relationship)
FAME (Forecasting Analysis and Modeling Environment)
IP address (Internet Protocol)
UML (Unified Modeling Language)
VIAF (Virtual International Authority File)

Terms waiting to be listed
Bayesian inference problem
Berkeley DB
canonical hostname
Cornell SMART system
denial of service (DoS) attack
document repository
Domain Name System (DNS)
domain specialists
latent semantic indexing
linked data architecture
MD5 algorithm
MIME (Multipurpose Internet Mail Extensions)
ontology (a concrete syntactic structure that models the semantics of a domain, that is, a conceptual framework, in a machine-readable form; Jacob 2003, p. 19)
port 80
relational databases
Rocchio’s method
semantic knowledge
shingling (a way of detecting near duplicate websites)
sockets (blocking and nonblocking)
statistical pattern recognition
syntactic and statistical analysis
TF-IDF vectors
topic distillation
vector-space model
Verity’s Search97
vertical portals
zlib (compression library)
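
Shingling, listed above as a way of detecting near-duplicate websites, can be sketched roughly as follows: break each page into overlapping runs of k words (shingles) and compare the two sets. This is a simplified illustration, assuming word-level shingles and Jaccard similarity as the comparison; the function names and sample sentences are invented here, and large-scale systems typically hash and sample the shingles rather than compare full sets.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two made-up pages that differ by a single word.
page_a = "the quick brown fox jumps over the lazy dog"
page_b = "the quick brown fox leaps over the lazy dog"

similarity = jaccard(shingles(page_a), shingles(page_b))
print(similarity)  # 0.4
```

One changed word alters three of the seven 3-word shingles on each page, so the score drops to 0.4; a smaller k is more forgiving of small edits, a larger k is stricter.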

History: Who and When [1]

The Talmud, with its heavy use of annotations and nested
commentary, and the Ramayana and Mahabharata, with their branching, nonlinear
discourse, are ancient examples of hypertext [1].

Ted Nelson coined the term hypertext in 1965 [1].

Tim Berners-Lee builds ENQUIRE, a hypertext system and precursor to the World Wide Web, at CERN in 1980; he proposes the Web itself in 1989.

Memex: photo-electrical-mechanical storage and computing device, proposed by Vannevar Bush in 1945 [1].

Jerry Yang and David Filo, Ph.D. students at Stanford University, created the Yahoo! directory (1994).

Larry Page and Sergey Brin, Ph.D. students at Stanford University, created BackRub (early Google) in 1996.




Brunner & Kim (2016) [4]

  • Data Science;
  • Informatics;
  • Visualization;
  • Probability;
  • Statistics;
  • High Performance Computing;
  • Cloud Computing;
  • Databases;
  • Python Programming


UNIX -> command line interface


Convolution:
  • Basically pattern matching across spatial locations.
  • The patterns (filters) are not designed a priori, but learned from the data and task.
Pooling:
  • Accumulates local statistics of filter responses from the convolution layer.
  • Leads to local spatial invariance for the learned patterns.
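
The two operations above can be sketched in plain Python. This is a toy illustration under stated assumptions: the filter is hand-picked rather than learned, the image is a made-up 5x5 grid, and the function names are hypothetical. As in most neural-network libraries, the "convolution" here slides the filter without flipping it.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the largest response in each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Toy image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# The same edge moved one pixel to the left.
shifted = [[0, 1, 1, 1, 1] for _ in range(5)]

# Hand-picked vertical-edge filter; in a trained network these
# weights would be learned from the data and task.
kernel = [[-1, 1],
          [-1, 1]]

responses = conv2d(image, kernel)
pooled = max_pool(responses)
print(responses[0])  # [0, 2, 0, 0]
print(pooled)        # [[2, 0], [2, 0]]

# Pooling gives the same result for the slightly shifted edge.
print(max_pool(conv2d(shifted, kernel)) == pooled)  # True
```

The filter responds only where the edge is, and the pooled output is identical for the original and the shifted image, which is the local spatial invariance mentioned above.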


  1. Soumen Chakrabarti, Mining the Web: Discovering Knowledge from Hypertext Data
    [*] T. Nelson. A file structure for the complex, the changing, and the indeterminate. In Proceedings of the ACM National Conference, pages 84–100, 1965.
  2. Data The Advent of the Algorithm
  3. Library Linked Data in the Cloud: OCLC’s Experiments with New Models of Resource Description. Synthesis Lectures on the Semantic Web: Theory and Technology, April 2015, 154 pages (doi:10.2200/S00620ED1V01Y201412WBE012)

  4. Brunner, R & Kim, E 2016, ‘Teaching Data Science’, Procedia Computer Science, vol. 80, no. 1, pp. 1947-1956.
  5. Hider, P & Harvey, R 2008, Organising Knowledge in a Global Society: Principles and Practice in Libraries and Information Centres, Elsevier Science, Burlington.