Recommendations
General
The JSTOR search engine matches search terms through both natural language and fielded searches. Regardless of the type of catalogued item, JSTOR matches text and images using the words that users provide. For your collection’s metadata, this implies:
- Generally more text is better
- A unique title for each item will increase discoverability
- A unique description of each item will increase discoverability
- Information about the item itself will increase the number of ways your item is useful to users for different tasks in their research.
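The uniqueness recommendations above can be checked before publishing. The following is a minimal sketch, assuming your metadata is available as a list of records with hypothetical `id`, `title`, and `description` keys; these names are illustrative and are not the JSTOR Forum export format.

```python
from collections import defaultdict

def find_duplicates(records, field):
    """Group record ids by a normalized field value; return groups with more than one id."""
    groups = defaultdict(list)
    for record in records:
        # Normalize case and whitespace so trivially different values still count as duplicates.
        value = " ".join(str(record.get(field, "")).lower().split())
        groups[value].append(record["id"])
    return {value: ids for value, ids in groups.items() if len(ids) > 1}

# Hypothetical sample records, for illustration only.
records = [
    {"id": 1, "title": "Yearbook", "description": "Class photos, 1984"},
    {"id": 2, "title": "yearbook ", "description": "Class photos, 1985"},
    {"id": 3, "title": "University of Michigan Yearbook, 1986", "description": "Class photos, 1986"},
]

duplicate_titles = find_duplicates(records, "title")
print(duplicate_titles)
```

Records that lack the field entirely are grouped under an empty string, so two or more items with a missing title are also reported.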
JSTOR faces a global audience. When selecting the title of your collection, or items, consider that people with different worldviews will discover your collection.
For example, a collection titled The 1980s is very broad. A title this broad may imply that the content covers the important events, decisions, and actions of that period, even when your collection is actually much more focused. Consider how someone from South Africa would view The 1980s: they may expect to see something about the anti-Apartheid movement.
Another example is the phrase Civil War. An American audience would most likely read this as the American Civil War (1861-1865); others around the world could assume it refers to the Russian Civil War or to one of the civil wars on the African continent. Label for the global audience. If the collection you originally titled The 1980s is in fact a selection of school yearbooks from that decade, consider a more specific title such as University of Michigan Yearbooks from the 1980s, which is much more descriptive and does not erase other major events and trends of this period.
It is also important to be cognizant of the harmful biases that may be depicted in your items, and to provide context in the description of your collection as well as in individual items. As a library resource, we understand that we must provide access to information that is offensive or considered harmful in a modern context. If an item depicts racial bias, gender bias, bias against a specific sexual orientation, ethnic bias, nationalistic bias, religious bias, bias against people with disabilities, or violence, we kindly ask that you explain the inclusion of the item in your collection in the item’s description field so that users have a reference point for finding more information. Ideally, the supplementary information that you provide will enable users to think critically about what is depicted in your items.
While these are general best practices for your metadata, they also have positive effects on search. Your items will carry additional contextual information, so they will match more keywords and will therefore match and rank in ways more helpful to users on JSTOR.
Item dates
A common search tactic used by users on JSTOR is to filter results by their publication date. This enables users to focus their search to items created within a specific range of years. While a publication date is not required to publish to JSTOR, a publication date is required for those items to match searches that include publication date range criteria.
JSTOR search users can also sort their results by publication date in ascending or descending order. Items that do not include a publication date are sorted to the bottom when either ascending or descending order is chosen by a user. This functionality is enabled by the Precise Date field on your items.
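The filtering and sorting behavior described above can be pictured with a short sketch. It assumes each item is a dictionary with a hypothetical `year` key standing in for the Precise Date field; it illustrates the described behavior and is not JSTOR's implementation.

```python
def filter_by_year_range(items, start_year, end_year):
    """Keep items whose year falls in the inclusive range; undated items never match."""
    return [
        item for item in items
        if item.get("year") is not None and start_year <= item["year"] <= end_year
    ]

def sort_by_year(items, descending=False):
    """Sort dated items by year and place undated items last in either direction."""
    dated = [item for item in items if item.get("year") is not None]
    undated = [item for item in items if item.get("year") is None]
    return sorted(dated, key=lambda item: item["year"], reverse=descending) + undated

# Hypothetical sample items, for illustration only.
items = [
    {"title": "A", "year": 1985},
    {"title": "B", "year": None},
    {"title": "C", "year": 1962},
]

print([item["title"] for item in sort_by_year(items)])
print([item["title"] for item in sort_by_year(items, descending=True)])
print([item["title"] for item in filter_by_year_range(items, 1980, 1989)])
```

Item "B" has no date, so it drops out of the date-range filter and lands at the end of both sort orders.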
Some items do not have a precise publication date. For example, an Ancient Greek statue may only be known to have been created within some range of decades two thousand years ago. JSTOR search publication date criteria do not yet allow for querying against a range of years on an item; however, JSTOR Forum does provide a means to catalogue your items with a date range using temporal_coverage (as described here). We recommend cataloguing your items with a temporal_coverage in JSTOR Forum if applicable.
See also Artstor Specific Fields and JSTOR Target Field List – JSTOR Forum Support.
For images and other non-textual item types
JSTOR search takes keywords provided by the user and matches them in the item metadata. In the case of non-text content, the metadata is primarily what makes your item match keywords provided by users. You can find information below about what metadata fields are used in different search experiences on JSTOR.
Default and Available Search Fields
search index field name | Description | Basic/Advanced Search Fields | Community Collections Pages Search Fields | Available to Search Explicitly Anywhere |
ti | title | ✅ | ✅ | ✅ |
tb | book title | ✅ | ✅ | ✅ |
tbsub | book sub title | ✅ | ❌ | ✅ |
ab | abstract | ✅ | ✅ | ✅ |
au | author | ✅ | ✅ | ✅ |
ra | review author | ❌ | ✅ | ✅ |
ocr | ocr data or text of digitized content | ✅ | ✅ | ✅ |
collection_titles_tokenized | collection title | ❌ | ✅ | ✅ |
collection_hierarchies_tokenized | collection hierarchy | ❌ | ✅ | ✅ |
holding_institution_tokenized | holding institution | ❌ | ✅ | ✅ |
project_title_tokenized | project title | ❌ | ✅ | ✅ |
browse_categories_tokenized | browse categories | ❌ | ✅ | ✅ |
compilation_titles | compilation title | ❌ | ✅ | ✅ |
cc_custom_fields | searchable field for custom fields | ❌ | ✅ | ✅ |
cc_image_view_description | community collection image view description | ❌ | ✅ | ✅ |
cc_locations | community collection locations | ❌ | ✅ | ✅ |
cc_origin_description | origin | ❌ | ✅ | ✅ |
cc_physical_attributes | physical attributes | ❌ | ✅ | ✅ |
cc_repository | repository | ❌ | ✅ | ✅ |
cc_work_type | | ❌ | ✅ | ✅ |
primary_agents | | ❌ | ✅ | ✅ |
secondary_agents | | ❌ | ✅ | ✅ |
cc_portal_title_tokenized | portal title for searching | ❌ | ✅ | ✅ |
cc_container_title_tokenized | container title for searching | ❌ | ✅ | ✅ |
accession number | Not in the ccda, as far as we can tell | ❌ | ❌ | ❌ |
doi | | ❌ | ❌ | ✅ |
eisbn | | ❌ | ❌ | ✅ |
eissn | | ❌ | ❌ | ✅ |
isbn | | ❌ | ❌ | ✅ |
issn | | ❌ | ❌ | ✅ |
local | Not used by any record - legacy field created for books | ❌ | ❌ | ❌ |
ps_subject | | ❌ | ❌ | ✅ |
ps_desc | | ❌ | ❌ | ✅ |
la | language | ✅ | ❌ | ✅ |
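To show how the table affects matching, here is a simplified sketch. The field names are taken from the table, but the Community Collections set lists only a subset of its fields, and the `matches` helper is a hypothetical case-insensitive substring check; the real search engine tokenizes and ranks results in ways not described here.

```python
# Fields searched by default in Basic/Advanced Search, per the table.
BASIC_ADVANCED_FIELDS = {"ti", "tb", "tbsub", "ab", "au", "ocr", "la"}

# A subset of the fields searched on Community Collections pages, per the table.
COMMUNITY_COLLECTIONS_FIELDS = {
    "ti", "tb", "ab", "au", "ra", "ocr", "cc_repository", "cc_locations",
}

def matches(item, keyword, fields):
    """Return True if the keyword appears in any of the given metadata fields."""
    needle = keyword.lower()
    return any(needle in str(item.get(field, "")).lower() for field in fields)

# Hypothetical item metadata keyed by search index field name.
item = {"ti": "Harbor at dusk", "cc_repository": "Example Museum"}

print(matches(item, "harbor", BASIC_ADVANCED_FIELDS))
print(matches(item, "museum", BASIC_ADVANCED_FIELDS))
print(matches(item, "museum", COMMUNITY_COLLECTIONS_FIELDS))
```

In this sketch the word "museum" appears only in `cc_repository`, so the item matches on a Community Collections page but not in Basic/Advanced Search. This is why key terms are worth placing in the title or abstract as well.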
While you can enable OCR for your project in JSTOR Forum, please ensure that the OCR data produced is accurate. If you run OCR on any item type other than documents, you may end up with inaccurate data, which can cause your items to match random, irrelevant, or possibly harmful or inappropriate queries. To avoid this, process only documents with OCR and verify the OCR output.
JSTOR recommends applying a Resource Type from our controlled list of values (available as a list in Forum). Each Resource Type then rolls up to one of five Content Types (Books, Serials, Documents, Images and Audiovisual). Books, Serials and Documents are for textual content and labeled in the search index as 'text'. If a JSTOR Resource Type is not selected, items will publish to JSTOR by default as "Images". The five Content Types are used as facets for filtering search results.
The Resource Type used on image content should be one that rolls up to the Content Type “Images” and not one of the textual Content Types because text and images are differentiated in the search index and treated differently in search results and facets. If a JSTOR Resource Type is not selected, items will publish to JSTOR by default as Content Type "Images", so a Resource Type is not absolutely necessary on image content items.
For text
For items that are images depicting text documents, we recommend that you enable OCR for your project. Any metadata included with your item will be included in the search index as well. Search queries will match both the metadata and OCR data.
If your text item is a PDF, note that at this time we neither extract its text nor run OCR on it. See below for information about future improvements.
JSTOR recommends applying a Resource Type from our controlled list of values (available as a list in Forum). Each Resource Type then rolls up to one of five Content Types (Books, Serials, Documents, Images and Audiovisual). Books, Serials and Documents are for textual content and labeled in the search index as 'text'. If a JSTOR Resource Type is not selected, items will publish to JSTOR by default as "Images". The five Content Types are used as facets for filtering search results.
The Resource Type used on textual content should be one that rolls up to one of the text Content Types (“Books”, “Serials”, or “Documents”) and not “Images” because text and images are differentiated in the search index and treated differently in search results and facets.
If a JSTOR Resource Type is not selected, items will publish to JSTOR by default as Content Type "Images", so selecting a textual Resource Type is especially important for text content. If text content is mislabeled as Content Type “Images”, the items will be displayed as images in search results and will not be included in faceted filtering for textual content.
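The roll-up from Resource Type to Content Type, including the default, can be sketched as follows. The five Content Types and the "Images" default come from the text above; the Resource Type names in the mapping are hypothetical placeholders, since the real controlled list is available in Forum.

```python
# Hypothetical Resource Type names; the real controlled list is available in JSTOR Forum.
RESOURCE_TYPE_TO_CONTENT_TYPE = {
    "Manuscript": "Documents",
    "Journal issue": "Serials",
    "Monograph": "Books",
    "Photograph": "Images",
    "Film": "Audiovisual",
}

# Content Types labeled as 'text' in the search index.
TEXT_CONTENT_TYPES = {"Books", "Serials", "Documents"}

def content_type(resource_type):
    """Roll a Resource Type up to its Content Type, defaulting to Images when none is selected."""
    if resource_type is None:
        return "Images"
    return RESOURCE_TYPE_TO_CONTENT_TYPE[resource_type]

def is_text(resource_type):
    """Return True if the item would be treated as textual content."""
    return content_type(resource_type) in TEXT_CONTENT_TYPES

print(content_type(None))
print(is_text("Manuscript"))
print(is_text(None))
```

A scanned letter with no Resource Type falls through to "Images" and is therefore not treated as text, which is the mislabeling the paragraph above warns about.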
Future Work
We are continuing to refine how searching works in JSTOR, and will update this page as developments are made. Please contact us for more information.