arXiv PDF Extractor
Commercial use OK
380+ models
No watermark
No sign-up required
Drop in an arXiv preprint, journal paper, or thesis chapter and the AI converts it into clean LaTeX-flavored text. Math equations stay as equations, multi-column layouts are unwound, and citations are preserved. Powered by Meta Nougat-base.
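How to use arXiv PDF Extractor
1
Enter your input
Type text, upload a file, or describe what you want. No account required.
2
Click Generate
Our AI processes your request in seconds using the best open-source models.
3
Download and share
Download, copy, or share the result. Free for personal and commercial use.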
Use this tool via the API
Automate this tool from your own code. OpenAI-compatible REST endpoint, bearer-token authentication, no extra SDK required. Token costs match the web interface.
curl -X POST https://api.free.ai/v1/chat/ \
-H "Authorization: Bearer sk-free-..." \
-H "Content-Type: application/json" \
-d '{"model": "qwen7b", "messages": [{"role": "user", "content": "Use the arXiv PDF Extractor tool on: ..."}]}'
arXiv PDF Extractor — FAQ
Drop in an arXiv preprint and the AI converts the entire paper into clean LaTeX-flavored text. Equations come back as proper LaTeX, multi-column layouts unwound, references intact. Built on Meta Nougat, trained specifically on millions of arXiv pages.
Nougat was trained largely on arXiv preprints, so it is strongest on the IEEE / ACM / NeurIPS / ICML / arXiv layout family. Many general-purpose PDF extractors struggle with multi-column math; this model was designed for it.
Download the PDF from arXiv (e.g. arxiv.org/pdf/2401.12345), upload it here, get back a single .txt file with the full paper as LaTeX-flavored text. No arXiv API key needed; we just need the PDF.
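The download step can be scripted. The following is a minimal sketch using only the Python standard library; the helper names `arxiv_pdf_url` and `download_pdf` and the User-Agent string are illustrative assumptions, and the ID check only covers new-style arXiv identifiers such as the 2401.12345 example above.

```python
import re
import urllib.request

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, optional version suffix.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def arxiv_pdf_url(arxiv_id: str) -> str:
    """Build the PDF URL for a new-style arXiv identifier."""
    if not ARXIV_ID.match(arxiv_id):
        raise ValueError(f"not a new-style arXiv identifier: {arxiv_id!r}")
    return f"https://arxiv.org/pdf/{arxiv_id}"


def download_pdf(arxiv_id: str, dest: str) -> str:
    """Fetch the PDF and write it to dest (performs a network request)."""
    request = urllib.request.Request(
        arxiv_pdf_url(arxiv_id),
        headers={"User-Agent": "pdf-fetch-example/0.1"},  # hypothetical agent string
    )
    with urllib.request.urlopen(request, timeout=60) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest
```

Calling `download_pdf("2401.12345", "paper.pdf")` would save a file ready to upload here, assuming that identifier exists.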
Yes — that's the headline feature. Inline math is `$...$`, displayed math `$$...$$`. Even raster-rendered equations in older papers come through correctly because the model treats each page as an image.
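Because the delimiters are plain `$...$` and `$$...$$`, equations can be pulled out of the output with a couple of regular expressions. This is a rough sketch, not part of the tool: the function name `extract_math` is an assumption, and escaped dollar signs (`\$`) in running text are not handled.

```python
import re

DISPLAY = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)  # $$...$$, may span lines
INLINE = re.compile(r"\$(.+?)\$")                  # $...$, single line only


def extract_math(text: str) -> tuple[list[str], list[str]]:
    """Return (inline, display) equation bodies found in the extracted text."""
    display = [body.strip() for body in DISPLAY.findall(text)]
    # Remove display blocks first so their delimiters are not read as inline math.
    remainder = DISPLAY.sub(" ", text)
    inline = [body.strip() for body in INLINE.findall(remainder)]
    return inline, display


sample = "Energy $E = mc^2$ and\n$$\\int_0^1 x\\,dx = \\frac{1}{2}$$ done."
inline, display = extract_math(sample)
print(inline)
print(display)
```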
Auto-handled. Two-column IEEE-style is the most common arXiv layout and Nougat unwinds it into proper reading order without a config flag.
Yes — inline `[12]` / `[Smith2020]` markers stay where they belong, and the full reference list at the end is extracted intact for downstream BibTeX / Zotero use.
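For downstream reference tooling, the inline markers can be collected with a small pattern. This sketch assumes markers look exactly like the two forms named above (`[12]` and `[Smith2020]`); grouped markers such as `[3, 4]` are not handled, and `count_citations` is a hypothetical helper name.

```python
import re
from collections import Counter

# Numeric keys like [12] or author-year keys like [Smith2020] / [Smith2020a].
CITATION = re.compile(r"\[(\d+|[A-Z][A-Za-z\-]*\d{4}[a-z]?)\]")


def count_citations(text: str) -> Counter:
    """Count how often each citation key appears in the extracted text."""
    return Counter(CITATION.findall(text))


sample = "Transformers [12] extend earlier work [Smith2020]; see [12] again."
print(count_citations(sample))
```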
~8-15 sec/page. A 12-page conference paper takes ~2-3 min. NeurIPS-style 30+ page papers with appendices: 8-12 min. Submit and walk away.
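As a quick back-of-the-envelope check, the per-page range above can be turned into an estimate for a given page count. The function name is illustrative, and actual times will depend on load and page complexity.

```python
SECONDS_PER_PAGE = (8, 15)  # per-page range quoted above


def estimate_seconds(pages: int) -> tuple[int, int]:
    """Return the (low, high) processing-time estimate in seconds."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    low, high = SECONDS_PER_PAGE
    return pages * low, pages * high


low, high = estimate_seconds(12)
print(f"{low / 60:.1f}-{high / 60:.1f} min")  # prints "1.6-3.0 min"
```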
300 tokens/page, floor 600. Most arXiv conference papers (8-15 pages) are 2,400-4,500 tokens. Daily free pool covers ~1-2 papers/day for signed-in users; paid plans get unlimited.
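The pricing rule above (300 tokens per page with a 600-token floor) can be written out as a one-line function. This is a sketch of the stated rule only; `token_cost` is a hypothetical name, and any rounding or surcharges the service may apply are not modeled.

```python
TOKENS_PER_PAGE = 300
MINIMUM_TOKENS = 600


def token_cost(pages: int) -> int:
    """Token cost for a document: 300 per page, never less than 600."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    return max(MINIMUM_TOKENS, pages * TOKENS_PER_PAGE)


for pages in (1, 8, 15):
    print(pages, token_cost(pages))
```

For 1, 8, and 15 pages this gives 600, 2,400, and 4,500 tokens, matching the range quoted for typical conference papers.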
Feed it to ChatGPT / Claude for "explain this paper", build personal RAG over your saved papers, semantic-search your reading list, copy equations into your own LaTeX project, or read the paper as plain text on your phone.
Yes — Nougat OCRs internally. arXiv has been LaTeX-rendered for 25+ years so most preprints are clean digital. Older scanned papers work but math fidelity drops slightly; rescan at 300+ DPI for best results.
PDFs are deleted right after extraction. LaTeX output is kept for 24h (anonymous) or 7 days (paid share link) and is never used for training. arXiv PDFs are publicly available anyway (their licenses vary by paper), but we don't store them either way.
Yes. POST a multipart `file` field to /v1/document/academic-pdf/. The JSON response contains `text_url`, `pages`, `preview`, `tokens`, and `share_url`. Bearer auth (sk-free-…) gives 10K free tokens/month. See /api/ for a curl example.
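A minimal sketch of that upload using only the Python standard library is shown here. It assumes the document endpoint lives on the same host as the chat endpoint (api.free.ai) and that the call returns the JSON response synchronously; neither is confirmed on this page. The helper names and the timeout are illustrative.

```python
import json
import os
import urllib.request
import uuid

API_BASE = "https://api.free.ai"  # assumption: same host as the chat endpoint


def build_upload_request(pdf_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the PDF in a `file` field."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{API_BASE}/v1/document/academic-pdf/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract(pdf_path: str, api_key: str) -> dict:
    """Upload a PDF and return the parsed JSON response (network request)."""
    with open(pdf_path, "rb") as fh:
        request = build_upload_request(fh.read(), os.path.basename(pdf_path), api_key)
    with urllib.request.urlopen(request, timeout=900) as resp:  # generous, arbitrary timeout
        return json.load(resp)
```

With a real key, `extract("paper.pdf", api_key)["text_url"]` would point at the extracted text, going by the field names listed above.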