Data Sources
Last updated: 12 March 2026
Scholise retrieves bibliographic metadata from established, openly-accessible academic databases. Every source shown in the application has a verified identifier — we never fabricate or hallucinate citations.
1. OpenAlex
OpenAlex is a free, open catalogue of the global research system, maintained by OurResearch. It indexes over 250 million scholarly works across all disciplines.
From OpenAlex, Scholise retrieves:
- Title, authors, publication date, and journal/venue name
- DOI and open-access status
- Citation count and source type (journal article, book chapter, proceedings, etc.)
- Institutional affiliations of authors
- Abstract snippets (inverted index format, reconstructed for display)
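As a rough illustration of the last item: OpenAlex publishes abstracts in an `abstract_inverted_index` field that maps each word to the positions where it occurs. The sketch below shows one way such an index could be turned back into readable text. The function name and sample data are hypothetical, and this is not necessarily how Scholise implements the step.

```python
def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from a word -> positions mapping (illustrative only)."""
    # Flatten the index into (position, word) pairs.
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    # Sorting by position restores the original word order.
    positioned.sort()
    return " ".join(word for _, word in positioned)


# Hypothetical sample in the inverted-index shape.
example = {"Open": [0], "data": [1, 4], "helps": [2], "share": [3]}
print(reconstruct_abstract(example))  # Open data helps share data
```

Punctuation stays attached to whichever word carries it in the index, so the reconstructed text is suitable for display but is not guaranteed to be a character-exact copy of the publisher's abstract.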
2. Semantic Scholar
Semantic Scholar is a free AI-powered academic search engine from the Allen Institute for AI. It indexes over 200 million papers across science and medicine.
From Semantic Scholar, Scholise retrieves:
- Title, authors, publication year, and abstract
- Citation count and DOI (when available)
- TLDR — short AI-generated summaries displayed as "AI Summary" on result cards
Results from Semantic Scholar are merged with OpenAlex and Crossref by DOI. Sources that originate from Semantic Scholar display an "S2" badge for transparency.
3. Crossref
Crossref is a not-for-profit organisation that maintains the DOI registration system used by most academic publishers. Its API provides authoritative metadata for over 150 million records.
From Crossref, Scholise retrieves:
- Title, authors, publication date, and publisher name
- DOI, ISSN, and container (journal) title
- Citation count and document type
- Author affiliations when available
4. Unpaywall
Unpaywall is a database of open-access versions of scholarly articles. For every paper in search results that has a DOI, Scholise checks Unpaywall for a free legal full-text PDF link.
From Unpaywall, Scholise retrieves:
- Open-access status (is_oa)
- Direct PDF URL when a free legal full-text version exists
When a free PDF is available, we display a green "Free PDF" button on the result card. Results are cached for 30 days to reduce API load. Unpaywall does not require an API key; we identify ourselves with a contact email as requested by their API terms.
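The lookup and caching behaviour described above can be sketched as follows. The endpoint shape (`/v2/<doi>?email=...`) and the `is_oa`, `best_oa_location` and `url_for_pdf` fields follow Unpaywall's public API. The helper names and the cache check are assumptions made for illustration, not Scholise's actual code, and the sketch performs no network requests.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import quote, urlencode

CACHE_TTL = timedelta(days=30)  # mirrors the 30-day cache described above


def unpaywall_url(doi: str, contact_email: str) -> str:
    """Build a lookup URL that identifies the caller by contact email."""
    query = urlencode({"email": contact_email})
    return f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?{query}"


def is_cache_fresh(fetched_at: datetime, now: datetime) -> bool:
    """Return True while a cached result is younger than the TTL."""
    return now - fetched_at < CACHE_TTL


def free_pdf_url(record: dict) -> Optional[str]:
    """Extract a direct PDF link from an Unpaywall-style record, if any."""
    if not record.get("is_oa"):
        return None
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf")


now = datetime(2026, 3, 12, tzinfo=timezone.utc)
print(unpaywall_url("10.1234/abc.5", "team@example.org"))
print(is_cache_fresh(now - timedelta(days=10), now))
```

A result card would show the "Free PDF" button only when `free_pdf_url` returns a link.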
5. AI-enhanced search (Claude)
In addition to the bibliographic databases above, Scholise uses Anthropic's Claude AI with web search capabilities to discover academic sources across the open web. This includes results from:
- Google Scholar and PubMed
- ResearchGate, SSRN, and arXiv
- University institutional repositories
- Publisher websites and conference proceedings
How it works: Your research question is sent to the AI provider, which performs web searches targeting academic and scholarly pages. The AI extracts structured bibliographic data (title, authors, DOI, year, venue, abstract) from the pages it finds. These results are then normalised into the same format as OpenAlex and Crossref data and merged with de-duplication.
Pro users also benefit from AI query expansion — Claude generates optimised academic search terms with discipline-specific terminology and synonyms, which improves the quality of results from OpenAlex and Crossref.
Important: AI-discovered sources are extracted from web pages and may occasionally contain incomplete or slightly inaccurate metadata. Always verify AI-found sources using their DOI link or publisher page before citing them in academic work.
6. Draft Check
The Draft Check feature analyses your pasted text sentence by sentence, labelling each as Supported, Needs Citation, Over-claiming, Low Confidence, or Opinion. It uses Claude to classify your sentences and match them against your project's saved sources.
Data used: Your pasted draft text and the titles of sources saved in your project are sent to the AI provider. The AI returns sentence-level labels and suggested source IDs. When displaying suggested sources, we also fetch citation count, reference count, and free PDF links from our cached paper metadata (Semantic Scholar) and Unpaywall data — the same sources used for search results.
7. De-duplication
Because a single paper is often returned by several providers (OpenAlex, Crossref, Semantic Scholar, and Claude AI search), Scholise de-duplicates results by matching on DOI. When the same work is found by more than one provider, we merge the records and keep the most complete metadata available across them. The provenance badge in the app indicates the origin:
- OpenAlex — metadata came exclusively from OpenAlex.
- Crossref — metadata came exclusively from Crossref.
- S2 — metadata came from Semantic Scholar (TLDR summaries and citation counts may be included).
- Claude — metadata was found by AI web search only.
- Both — metadata was found in multiple sources and merged.
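A minimal sketch of DOI-based merging and badge assignment, under stated assumptions: records are plain dictionaries with a `provider` key holding the badge label, DOIs are compared after lower-casing and stripping common resolver prefixes, and "most complete" is simplified to "first non-empty value per field". All names here are hypothetical and the real merge rules may differ.

```python
from typing import Optional

DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/", "doi:",
)


def normalise_doi(raw: Optional[str]) -> Optional[str]:
    """Lower-case a DOI and strip resolver prefixes so variants compare equal."""
    if not raw:
        return None
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def is_empty(value) -> bool:
    return value is None or value == "" or value == []


def merge_by_doi(records: list[dict]) -> list[dict]:
    """Merge records sharing a DOI; records without a DOI pass through unmerged."""
    merged: dict[str, dict] = {}
    without_doi: list[dict] = []
    for record in records:
        provider = record["provider"]
        doi = normalise_doi(record.get("doi"))
        if doi is None:
            without_doi.append({**record, "providers": [provider], "badge": provider})
            continue
        target = merged.setdefault(doi, {"doi": doi, "providers": []})
        if provider not in target["providers"]:
            target["providers"].append(provider)
        for field, value in record.items():
            if field in ("doi", "provider"):
                continue
            # Simplification: fill a field only if no earlier record supplied it.
            if is_empty(target.get(field)) and not is_empty(value):
                target[field] = value
    for target in merged.values():
        providers = target["providers"]
        target["badge"] = providers[0] if len(providers) == 1 else "Both"
    return list(merged.values()) + without_doi
```

For example, an OpenAlex record with DOI `https://doi.org/10.1000/XYZ` and a Crossref record with DOI `10.1000/xyz` collapse into one entry badged "Both", while a DOI-less record found only by AI search keeps its "Claude" badge.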
8. DOI requirement
Scholise strongly prioritises sources with a Digital Object Identifier (DOI). DOIs provide a permanent, resolvable link to the publisher's landing page. Sources without a DOI may still appear if they have sufficient metadata, but DOI-backed sources are ranked higher because their provenance is verifiable.
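To illustrate the idea, the sketch below builds a resolvable doi.org link and orders results so that DOI-backed sources come first. Treating the DOI as a strict first-tier sort key, and the `relevance` field itself, are assumptions made for this example. The actual ranking may weigh DOIs differently.

```python
from urllib.parse import quote


def doi_link(doi: str) -> str:
    """Build a resolvable link; characters unsafe in URLs are percent-encoded."""
    return "https://doi.org/" + quote(doi, safe="/")


def rank_sources(sources: list[dict]) -> list[dict]:
    """Order DOI-backed sources first, then by descending relevance."""
    return sorted(
        sources,
        key=lambda s: (not s.get("doi"), -s.get("relevance", 0.0)),
    )


# Hypothetical results: the DOI-less item ranks last despite higher relevance.
ranked = rank_sources([
    {"title": "Working paper", "doi": None, "relevance": 0.9},
    {"title": "Journal article", "doi": "10.1000/xyz", "relevance": 0.7},
])
print([s["title"] for s in ranked])
print(doi_link("10.1000/xyz"))
```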
9. Open-access indicators
Where available, Scholise displays an open-access badge for sources that can be read without a subscription. This information comes from OpenAlex's open-access dataset and may include links to free full-text versions hosted on publisher sites, institutional repositories, or preprint servers.
For papers with a DOI, we also check Unpaywall for a free legal PDF. When available, a green "Free PDF" button appears on the result card, linking directly to the full-text document.
10. Peer-reviewed focus
Scholise uses a heuristic scoring system to prioritise sources that are likely peer-reviewed. Signals include:
- Document type (journal articles, conference proceedings, book chapters score higher)
- Publisher type (university presses and known academic publishers score higher)
- Presence of an ISSN (indicates a serial publication, often peer-reviewed)
- Institutional affiliations of authors
- Citation count (well-cited work is more likely to have undergone review)
This is a heuristic, not a guarantee. The scoring helps surface quality sources, but it cannot certify that any individual work has been peer-reviewed. If peer-review status is critical for your use case, verify it directly with the publisher or journal.
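A heuristic of this kind could look roughly like the sketch below. The signal names follow the list above, but every weight, the field names, and the logarithmic cap on citations are invented for illustration and do not reflect Scholise's actual scoring.

```python
import math

# Document types treated as more likely peer-reviewed (labels are assumptions
# modelled on common Crossref type identifiers).
PEER_REVIEWED_TYPES = {"journal-article", "proceedings-article", "book-chapter"}


def peer_review_score(source: dict) -> float:
    """Sum hypothetical weights for each signal present on a source."""
    score = 0.0
    if source.get("type") in PEER_REVIEWED_TYPES:
        score += 3.0
    if source.get("academic_publisher"):  # university press or known publisher
        score += 2.0
    if source.get("issn"):
        score += 1.5
    if source.get("affiliations"):
        score += 1.0
    # Citations contribute on a log scale, capped so they cannot dominate.
    citations = max(source.get("citation_count") or 0, 0)
    score += min(math.log10(1 + citations), 2.0)
    return score


article = {
    "type": "journal-article",
    "academic_publisher": True,
    "issn": "1234-5678",
    "affiliations": ["Example University"],
    "citation_count": 99,
}
preprint = {"type": "posted-content", "citation_count": 3}
print(peer_review_score(article) > peer_review_score(preprint))  # True
```

The score only orders results. A high value does not certify that a particular work was peer-reviewed.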
11. Source verification
Every source in Scholise links to verifiable external records:
- DOI resolution — clicking a DOI link resolves to the publisher's landing page via doi.org.
- Publisher information — Crossref provides publisher and member details for each registered DOI.
- OpenAlex work page — each source has a corresponding OpenAlex record with full bibliographic detail, citation graphs, and related works.
We encourage users to follow these links to verify any source before relying on it in their work.
12. Limitations
- Metadata quality depends on what publishers register with Crossref and what OpenAlex indexes. Some fields (abstracts, affiliations) may be incomplete.
- Very recent publications may take days or weeks to appear in Crossref or OpenAlex.
- Non-English sources are indexed but may have less complete metadata.
- Grey literature, theses, and non-DOI preprints have limited coverage.
- AI-discovered sources are extracted from web pages and may occasionally have incomplete metadata (e.g. missing abstracts or approximate citation counts).
13. Questions
If you notice incorrect metadata or have questions about our data sources, contact us at support@scholise.com.