commercial aggregator products/vendors: EBSCO, Gale, ProQuest
data: structured, unstructured
governance: lineage [active versus lazy lineage], provenance, validity – backward and forward tracing queries, downstream and upstream applications
collection: information retrieval, web scraping
preparation: quality: [profiling, cleaning, auditing]
storage: database, cloud
processing
presentation: visualisation
security
information: abstract, bibliometrics, classification [faceted], catalogue [description | transcribed, identifier/locator | composed], index (derived or assigned), metadata [attributes/elements/tags/markers], subject, topic directory
language: [controlled vocabulary [subject headings, thesauri], machine language [HTML, XML], natural language]
information resource: audio, bibliographic record/metadata, database, digital object, document, image, map, spreadsheet, video, webpage
information retrieval/searching: Boolean, clustering, keyword, KWIC, KWOC, limiting, phrase, proximity, relevance ranking, semantic searching, stop words, stemming, truncation.
web mining: [crawling (name resolution – client socket), machine learning, data mining]
terms related to retrieval: perseverance, inaccuracy, exhaustivity, specificity, co-extensive entry, recall, relevance, precision, pre-coordinate, post-coordinate
internet: website; browser: [Internet Explorer, Google Chrome, Firefox]; search engine: [Bing, DuckDuckGo, Excite, Google, HotBot, Inktomi, Lycos, Northern Light, Yahoo (AltaVista)…]; webpage: [hyperlink – hypertext – link, memes, URL, Hypertext Transfer Protocol (HTTP)]
programming: algorithm, object library, code, regular expressions, metacharacters
programming languages: [C, C++, Fortran, JavaScript, Python, R, Ruby]
software applications: Cloudera, Excel, FAME, Hadoop, MapReduce, SAS (statistics), Tableau (visualisation)
standards: Anglo-American Cataloguing Rules (AACR2), Dewey Decimal Classification (DDC), Dublin Core, Functional Requirements for Bibliographic Records (FRBR), Resource Description Framework (RDF), International Federation of Library Associations and Institutions (IFLA), Library of Congress Subject Headings (LCSH), MAchine-Readable Cataloguing (MARC), Sears List of Subject Headings, Subject Authority Cooperative Program (SACO), Universal Decimal Classification (UDC), Virtual International Authority File (VIAF)
statistics: descriptive, correlation, linear and logistic regression, ANOVA, chi-square, cluster, factor analysis
ACRONYMS
ANOVA (Analysis of variance)
API (application programming interface)
CGI (common gateway interface)
ER (Entity Relationship)
FAME (Forecasting Analysis and Modeling Environment)
IP address (Internet Protocol)
UML (Unified Modeling Language)
VIAF (Virtual International Authority File)
Terms waiting to be listed
Bayesian inference problem
Berkeley DB
canonical hostname
Cornell SMART system
denial of service (DoS) attack
document repository
Domain Name System (DNS)
domain specialists
facet
foci
informatics
latent semantic indexing
linked data architecture
MD5 algorithm
MIME (Multipurpose Internet Mail Extensions)
node
ontology (a concrete syntactic structure that models the semantics of a domain, i.e. its conceptual framework, in a machine-readable form; Jacob 2003, p. 19)
port 80
relational databases
Rocchio’s method
scalability
schema
semantic knowledge
shingling (a way of detecting near-duplicate web pages)
sockets (blocking and nonblocking)
statistical pattern recognition
syntactic and statistical analysis
tag-tree
TF-IDF vectors
topic distillation
vector-space model
Verity’s Search97
vertical portals
zlib (compression library)
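The shingling entry in the list above can be illustrated with a small sketch. It assumes the common word-level formulation: each document becomes a set of overlapping k-word windows ("shingles"), and two documents are compared by the Jaccard overlap of those sets. The function names, the choice k=3, and the sample sentences are illustrative assumptions, not taken from any of the sources listed here; large-scale systems usually hash and sample the shingles rather than compare full sets.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def resemblance(doc1, doc2, k=3):
    """Similarity of two documents measured on their shingle sets."""
    return jaccard(shingles(doc1, k), shingles(doc2, k))


page_a = "the quick brown fox jumps over the lazy dog"
page_b = "the quick brown fox jumps over the sleepy dog"
print(round(resemblance(page_a, page_b), 3))  # 0.556 (5 shared shingles out of 9)
```

A score near 1 suggests near-duplicates; the threshold for calling two pages duplicates is a tuning choice.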
History: Who and When [1]
The Talmud, with its heavy use of annotations and nested
commentary, and the Ramayana and Mahabharata, with their branching, nonlinear
discourse, are ancient examples of hypertext [1].
Ted Nelson coined the term hypertext in 1965 [1].
Tim Berners-Lee began work on hypertext at CERN in 1980 (the ENQUIRE system) and proposed the World Wide Web in 1989.
Memex: photo-electrical-mechanical storage and computing device, proposed by Vannevar Bush in 1945 [1].
Jerry Yang and David Filo, Ph.D. students at Stanford University, created the Yahoo! directory (www.yahoo.com/) in 1994.
Larry Page and Sergey Brin, Ph.D. students at Stanford University, created BackRub (an early version of Google) in 1996.
Xanadu
Brunner (2016)
- Data Science;
- Informatics;
- Visualization;
- Probability;
- Statistics;
- High Performance Computing;
- Cloud Computing;
- Databases;
- Python Programming
UNIX -> command line interface
Convolution:
- Essentially pattern matching across spatial locations.
- The patterns (filters) are not designed a priori; they are learned from the data and the task.
Pooling:
- Accumulates local statistics of the filter responses from the convolution layer.
- Leads to local spatial invariance for the learned patterns.
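The convolution and pooling notes above can be made concrete with a minimal pure-Python sketch. It makes several assumptions for illustration: the filter is a fixed, hand-picked vertical-edge detector (in a real network the filter values would be learned), the "convolution" is the unflipped sliding-window product that neural network layers typically use, and pooling takes the maximum over non-overlapping 2x2 windows. The function names and the toy image are hypothetical.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and sum the elementwise products
    at each position (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a
    complete window are dropped."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# The same edge shifted one column to the left.
shifted = [[0, 1, 1, 1, 1] for _ in range(5)]
# Hand-picked filter that responds to dark-to-bright vertical edges.
edge_filter = [[-1, 1], [-1, 1]]

print(conv2d(image, edge_filter)[0])              # [0, 2, 0, 0]
print(conv2d(shifted, edge_filter)[0])            # [2, 0, 0, 0]
print(max_pool2d(conv2d(image, edge_filter)))     # [[2, 0], [2, 0]]
print(max_pool2d(conv2d(shifted, edge_filter)))   # [[2, 0], [2, 0]]
```

The filter response moves when the edge moves, but the pooled output is identical for both images, which is the local spatial invariance described in the notes.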
References
- Soumen Chakrabarti, Mining the Web: Discovering Knowledge from Hypertext Data
[*] T. Nelson. A file structure for the complex, the changing, and the indeterminate. In Proceedings of the ACM National Conference, pages 84–100, 1965.
- Data The Advent of the Algorithm
- Library Linked Data in the Cloud: OCLC’s Experiments with New Models of Resource Description. Synthesis Lectures on the Semantic Web: Theory and Technology, April 2015, 154 pages (doi:10.2200/S00620ED1V01Y201412WBE012)
- Brunner, R & Kim, E 2016, ‘Teaching Data Science’, Procedia Computer Science, vol. 80, no. 1, pp. 1947-1956.
- Hider, P & Harvey, R 2008, Organising Knowledge in a Global Society: Principles and Practice in Libraries and Information Centres, Elsevier Science, Burlington.