Migrating from ref/zotagent
This page answers three things:
- what
ref/zotagentalready implements - how far the current Rust
zotruntime matches it - how the remaining gaps should be filled
Summary first
| Type | Conclusion |
|---|---|
| Covered | DOI / URL / file import, citation-key entry, single-PDF extraction, library semantic index/search |
| Partial substitute | local attachment indexing, full-text retrieval, index status inspection |
| Explicitly missing | s2, add --s2-paper-id, search-in, metadata, read, expand, zotagent-style status, zotagent-style sync, one-shot basic-field item creation |
| Extra capability in this repo | notes, tags, collections, saved searches, feeds, annotation creation, workspaces, Scite, duplicate merge, profile config, publication-status sync |
What zotagent implements
From ref/zotagent/README.md and ref/zotagent/src/cli.ts, the current CLI focuses on four groups:
add- import by DOI
- create an item from basic fields
- search via
s2and import bypaperId
sync/status- scan Zotero attachments
- extract and index PDF / EPUB / HTML / TXT
- report local index status and paths
search- FTS5 full-text keyword search by default
- vector search with
--semantic - document-scoped search with
search-in - bibliography-field search with
metadata
read- block-window reads with
read - cleaned full text with
fulltext - hit expansion with
expand - all addressable by
itemKeyorcitationKey
- block-window reads with
Current zot comparison
zotagent feature | Current zot | Status | Notes |
|---|---|---|---|
| DOI / URL import | item add-doi / item add-url | Covered | Also supports --attach-mode |
| File import | item add-file / item create --pdf | Covered | Can also enrich from --doi |
| citation-key entry | library citekey | Covered | Resolve citekey first, then continue with item workflows |
| Single-PDF full text / outline / annotation reads | item pdf / item fulltext / item outline | Covered | Focused on PDF attachments |
| Library-level vector search | library semantic-index / library semantic-search | Covered | Supports bm25 / semantic / hybrid |
| Curated workspace indexing and Q&A | workspace index / workspace query | Covered | This is a surface zotagent does not have |
| One-shot basic-field item creation | none | Missing | Current flow is DOI / URL / PDF / file first, then update |
s2 search | none | Missing | No Semantic Scholar search command exists today |
add --s2-paper-id | none | Missing | No direct paperId import exists today |
Attachment-wide indexing sync | no equivalent command | Partial | library semantic-index --fulltext covers metadata + PDF only |
Index status | doctor + library semantic-status | Partial | Not a single equivalent command; missing zotagent-style path and error reporting |
| Default keyword full-text search | no equivalent command | Partial | library semantic-search --mode bm25 requires the semantic index first and is not the same FTS5 syntax |
Document-scoped search-in | none | Missing | Current fallback is extract text first, then locate in the agent |
Bibliography-field metadata search | none | Missing | library search has filters, but not field-scoped metadata search |
Block-window read | none | Missing | No block model or block-window read API exists |
Context expand | none | Missing | No block-based expansion surface exists |
citationKey-driven fulltext / expand | none | Missing | Today you must resolve citekey before switching to item key |
| Multi-attachment merged logical document | none | Missing | Current reads are mostly centered on the first PDF attachment |
The biggest semantic mismatch
1. sync has the same name but a different meaning
zotagent sync: attachment extraction + indexingzot sync update-status: publication-status checks for preprints
This should not be hidden behind an alias. In this repo, sync already has a stable meaning.
2. The existing retrieval base is RagIndex, not the zotagent document-manifest surface
This repo already has:
library semantic-index --fulltextlibrary semantic-search --mode bm25|semantic|hybridworkspace index/workspace query
But that stack is mainly built for:
- metadata chunks
- PDF chunks
- RAG and semantic retrieval
It is still not:
- a document-scoped block read surface
search-inexpand- bibliography metadata search
- a unified extractor for PDF / EPUB / HTML / TXT attachments
3. The Rust CLI already has many Zotero-native workflows beyond zotagent
This repo already adds:
- notes / tags / collections read-write flows
- saved search
- feeds
- annotation creation
- duplicates / merge
- workspace
- Scite
- profile config
- publication-status sync
So the right completion path is not to clone zotagent command names. It is to fit the missing capability into the existing library / item / workspace structure.
Completion plan
Phase 1: extend the indexing foundation
Goal:
- turn the current PDF-leaning RAG base into a stable attachment text index
Main work:
- add a unified extractor abstraction in
zot-localfor PDF / EPUB / HTML / TXT - persist manifests, extraction status, and error catalog state
- let multi-attachment items form one logical document
- add a read-only status API that reports index paths, attachment counts, error counts, and last update time
Constraints:
- do not change the meaning of
sync update-status - keep persistent index files out of temp directories
Verification:
- fixture coverage for PDF / EPUB / HTML / TXT
- re-runs skip unchanged files cleanly
- status output is stable and testable
Phase 2: fill the read and retrieval surfaces
Goal:
- give agents an honest full-text read / expansion / document-search entrypoint
Main work:
- add a document-scoped search command equivalent to
search-in - add a block-window read command equivalent to
read - add a block-context expansion command equivalent to
expand - add field-scoped metadata search
- add
citationKeyselectors to text-reading commands
Naming rule:
- keep the
library/itemsplit - do not restore flat top-level aliases
Verification:
- integration coverage for hit blocks, radius expansion, and citation-key routing
- multi-attachment items must keep stable block numbering
Phase 3: fill the import and Semantic Scholar gaps
Goal:
- close the two largest holes inside the current
zotagent addstory
Main work:
- add a Semantic Scholar search surface
- add direct import by
paperId - add one-shot item creation from basic metadata fields
Constraints:
- continue writing through the Zotero Web API only
- keep the current JSON envelope
Verification:
- DOI, paperId, and manual metadata import flows each need regression coverage
- missing credentials must fail clearly instead of half-succeeding
Phase 4: close the loop in skills, docs, and regression fixtures
Goal:
- make sure agents and docs only promise real capability
Main work:
- update
skills/zot-skills/SKILL.md - add ref\zotagent migration prompts and evals
- sync the bilingual docs
Verification:
- docs site build succeeds
- skill regression prompts cover the
search-in,status, ands2gaps
Current agent boundary
Until those phases land, the correct behavior is:
- for
search-in/expand: state that the CLI has no native equivalent, then fall back toitem fulltext/item pdf - for
status: usedoctor+library semantic-status - for
s2/paperIdimport: say plainly that it is not implemented yet - for zotagent-style
sync: explain that onlylibrary semantic-index --fulltextorworkspace indexexists as a partial substitute
The rule is simple: do not document planned capability as if it already ships.