niso_1.htm
National Academies Press NISO Presentation
Experimental Navigation: Discovery, Exploration, Distillation

Michael Jensen (mjensen@nas.edu)
Director of Publishing Technologies, the National Academies Press (www.nap.edu)
Director of Web Communications, the National Academies (www.nationalacademies.org)


niso_2.htm
Presentational Plan

  • Overview of the NAP
  • The balancing act of our existing Open Access model
  • Description/Demonstration of NAP's Discovery, Exploration, and Distillation Tools
  • Discussion of "Discovery" vs. "Search"
  • Experimental possibilities
  • The Secret to Discovery Success
niso_3.htm
National Academies Press

  • Publisher for:
    • National Academy of Sciences
    • National Academy of Engineering
    • Institute of Medicine
    • National Research Council
  • Publishes ~ 200 reports/year advising the US and world on issues of science, engineering, technology, medicine, and health.
niso_4.htm
The Open Access Publications of
The National Academies Press today

  • > 3,600 reports fully, freely browsable online (> 550,000 pages available, each printable)
  • > 18,000,000 visitors/year; ~ 20% from developing countries
  • ~ 160,000,000 page views/year (95 million Openbook pages, 65 million other)
  • NAP has been digitizing publications for free online dissemination since 1994 (GIF page images, page-based HTML, PDFs, TEI XML)
niso_5.htm
Overall Missions of
The National Academies Press

Dual, Competing Missions:

a) Dissemination:
generate the most impact by getting reports into the most hands and minds

b) Cost Recovery/Self-Sustainability:
NAP is required to be self-sustaining through sales of content
niso_6.htm
Confronting the Dissemination Mission

  • Balancing act of sales and openness
  • All publications FREE for online browsing/reading by anyone
  • All publications have rich Discovery and Exploration tools
  • About half of our publications are free in PDF; the other half are for sale in PDF. All publications are free in PDF for researchers in developing countries
niso_7.htm
Recent Sample Publications
niso_8.htm
Challenges to NAP Report Discoverability
  • Book-length documents make single-term searching wacky
  • Vast diversity of content (science, technology, medicine)
  • Terms-of-art
  • Very general (How People Learn) to very specific (Iodotrifluoromethane: A Toxicity Review)

    However:
  • Consistent, coherent, controlled environment
  • In-house talent
  • Institutional willingness to experiment
niso_9.htm
Basic Open Access
  • Table of Contents, for every report, with links to the opening page of each chapter
  • Openbook Page, every report page displayed with navigation and search
  • Search inside book, related titles, etc.
niso_10.htm
Unique Knowledge Discovery and Exploration Tools
  • Based on both explicit and implicit metadata, as well as fulltext; lexical work all based on ASCII
  • Active Skim View of every chapter, to ease online browsing
  • Discovery Engine, an integrated search results set with intrinsic further-exploration tools ("find more like this" [FMLT], etc.)
  • Web Search Builder, a means of using the key terms of any chapter to build targeted searches of Google, etc., as well as the National Academies Press.
  • Reference Finder, a Web form into which one can drop a rough draft or an article to "find more like" it.
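
As a rough illustration of the Web Search Builder idea above (key terms of a chapter turned into a targeted Web search), here is a minimal sketch. It is not NAP's implementation: the stopword list, the frequency-based term ranking, and the function names `key_terms` and `build_search_url` are all assumptions made for this example.

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "for", "on", "that", "with", "as", "by"}

def key_terms(text, n=3):
    """Return the n most frequent non-stopword terms in plain ASCII text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]

def build_search_url(terms, base="https://www.google.com/search"):
    """Build a search URL whose query is the space-joined key terms."""
    return base + "?" + urlencode({"q": " ".join(terms)})

chapter = ("Toxicity of iodotrifluoromethane. Toxicity studies of "
           "iodotrifluoromethane exposure show exposure limits; toxicity matters.")
terms = key_terms(chapter)
print(terms)
print(build_search_url(terms))
```

Raw term frequency is the simplest possible stand-in for "key terms"; a real system would presumably weight terms against the whole collection.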
niso_11.htm
Discovery vs. Search
  • Search is about precision: finding what you ask for; Discovery is about finding what you wanted, but didn't know to ask for.
  • Successful search is precise; successful discovery is always approximate
  • Goal is to enable serendipitous discovery -- things close to what we want, that expand our understanding
  • Discovery may need expansion or refinement; search rarely does.
  • Discovery requires "find more like this" that is content- and context-aware
  • How does the user find what she wants, in a diverse, interface-driven, multi-container, multi-author, distributed environment?
  • What kind of "knowledge exploration, discovery, and distillation" capabilities are really desired by users?
  • ... and which users?
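
One common way to approximate "find more like this" is to compare term-frequency vectors with cosine similarity. The sketch below assumes that approach; it is content-aware only (no context), it is not a description of NAP's Discovery Engine, and the names `term_vector`, `cosine`, and `find_more_like` are hypothetical.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Bag-of-words term counts for a piece of plain text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def find_more_like(query, docs, k=2):
    """Return up to k titles from docs (title -> text) most similar to query."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(text)), title) for title, text in docs.items()]
    # Highest score first; ties broken by title for deterministic output.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [title for score, title in scored[:k] if score > 0]

docs = {
    "A": "learning and memory in children",
    "B": "toxicity of industrial chemicals",
    "C": "how children learn science",
}
print(find_more_like("children learning", docs))
```

The results are ranked rather than matched exactly, which fits the point that successful discovery is always approximate.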
niso_12.htm
Discovery "Standards"?
  • Difficult to say that Discovery/Exploration results are "right" -- but we will be experimenting with this
  • Lexical discovery, linguistic navigation, thematic clustering, etc. may be appropriate for researchers, but "Search" is still needed for most users
  • Further experiments are of course needed, e.g. clustering based on common key elements; lexical filtering on a similar basis
  • Significance weights (of terms, of types, of contexts) will need to be user-driven more and more, as more material goes online
  • "Implicit metadata" may be assistive; "explicit metadata" may be determinative. How much engagement from the author/recommender/publisher/provider can be expected?
  • "suggested tags" vs. "recommended key terms" vs. "algorithmic context" vs. "user-driven connectedness over time"
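
To make "user-driven significance weights" concrete, here is a minimal sketch in which each document field (a stand-in for "types" and "contexts") has a default weight that a user can override. The field names, default values, and the `score` function are assumptions for illustration only, not an existing NAP feature.

```python
# Hypothetical default weights per field; a user may override any of them.
DEFAULT_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def score(doc, term, weights=None):
    """Sum weighted occurrences of term across the fields of doc (field -> text)."""
    w = {**DEFAULT_WEIGHTS, **(weights or {})}
    term = term.lower()
    return sum(
        w.get(field, 0.0) * text.lower().split().count(term)
        for field, text in doc.items()
    )

doc = {"title": "How People Learn", "body": "people learn when people practice"}
print(score(doc, "people"))                  # default weights
print(score(doc, "people", {"body": 4.0}))   # user raises the body weight
```

The same term in the same document scores differently depending on which contexts the user chooses to emphasize.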
niso_13.htm
Discoverability Challenges
  • Big and little documents
  • Fulltext and abstract-only documents
  • General and specialized documents
  • Interconnected and orphaned documents
  • Standards-based and nonstandard documents
  • Expository text and cookbook text
  • Curated and hoovered
  • Open and gated
  • HTML, PDF, database, etc.
niso_14.htm
What is the Secret to Discovery Success?
    to fail,
       and fail,
        and fail again ...
    but less,
      and less,
        and less.

    (after Piet Hein)