Technology

DeepMatch™ — contextual plagiarism detection

Two independent models. A multi-source index of 60B+ pages. A report that explains every flag. No black box, no inflated numbers — just honest detection.

Your text
The industrial revolution fundamentally reshaped the relationship between labor and capital, producing tensions that erupted across the next century.
MATCH
0.94
Matched source · britannica.com
"The Industrial Revolution fundamentally reshaped the relationship between labor and capital, spawning conflicts that continued through the 20th century."
Verbatim · Paraphrased
DeepMatch™ engine
1.8s · 1,000 words
99.2%
Plagiarism recall
Verbatim & near-verbatim, internal benchmark
98.5%
Plagiarism precision
Flagged sentences that are true matches
94.0%
AI-text F1 score
GPT-4o, Claude, Gemini, Llama 3 mix
1.8s
Median scan time
Per 1,000 words, including AI pass
How DeepMatch works

Three ideas you will not find in free checkers

01 · Contextual analysis

We read meaning, not words

Most checkers compare strings. We turn every sentence into a semantic vector — a 768-dimensional fingerprint that captures meaning, syntax, and topic. Two sentences with no shared words can still be a match if they mean the same thing.

The word "bank" appears in:
A · "…sat on the river bank watching…" · context: nature
B · "…deposit at the central bank…" · context: finance
→ Same word, different meaning. We compare meaning, not strings.
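Here is a minimal sketch of how two fingerprints can be compared. It assumes cosine similarity as the measure, because the page does not name one. The four-dimensional vectors are made-up stand-ins for real 768-dimensional encoder output. The point is that the two sentences share no words, yet their vectors point in almost the same direction.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an empty fingerprint as "no similarity"
    return dot / (norm_u * norm_v)


def shared_words(a: str, b: str) -> set[str]:
    """Distinct lowercase words that appear in both sentences."""
    return set(a.lower().split()) & set(b.lower().split())


# Hypothetical 4-dim vectors standing in for 768-dim sentence fingerprints.
s1, v1 = "Prices rose sharply", [0.81, 0.12, 0.55, 0.05]
s2, v2 = "Costs climbed steeply", [0.78, 0.15, 0.60, 0.02]

print(shared_words(s1, s2))                  # set()
print(round(cosine_similarity(v1, v2), 2))   # high, despite no shared words
```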
02 · Fuzzy matching

Reworded text still gets caught

Swapping a few synonyms used to be enough to fool plagiarism checkers. Our matcher computes similarity at the level of meaning, then verifies with a second-pass alignment. Paraphrased passages and patchwork rewrites surface in the same report.

ORIGINAL
The committee unanimously approved the proposed amendment.
↓ reworded
DRAFT
The board, by full agreement, passed the suggested change.
String match
22% — would miss this
Fuzzy match
91% — flagged
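To see why word-level comparison misses rewording, here is a toy lexical measure: the Jaccard overlap of distinct words between the original and the reworded draft. This is not the string-match method behind the 22% figure above. It only illustrates how little vocabulary a reworded draft shares with its source.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens with punctuation dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


original = "The committee unanimously approved the proposed amendment."
draft = "The board, by full agreement, passed the suggested change."

overlap = jaccard(original, draft)
print(f"word overlap: {overlap:.0%}")  # word overlap: 8%
```

Only "the" survives the rewording, so a purely lexical comparison scores this pair near zero even though the meaning is unchanged.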
03 · Conditional scoring

Every match is weighted in context

A flat percentage hides the truth. Our originality score weights each match by type, length, citation status, and how common the phrasing is. A cited quotation barely moves the needle. A verbatim paragraph without a citation moves it a lot.

S1 · raw 0.62 · common phrase
S2 · raw 0.91 · verbatim, no citation
S3 · raw 0.78 · paraphrased
S4 · raw 0.55 · cited correctly
→ Each match is weighted in context. A cited phrase ≠ stolen idea.
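Here is a minimal sketch of context weighting. The multipliers are hypothetical, since DeepMatch's actual weights are not given here. S4 is assumed to be a cited verbatim quotation, and match length is left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical multipliers for illustration; not DeepMatch's published weights.
TYPE_WEIGHT = {"verbatim": 1.0, "paraphrased": 0.8, "common_phrase": 0.2}
CITED_WEIGHT = 0.1  # a correctly cited match barely moves the score


@dataclass
class Match:
    sentence_id: str
    raw: float   # raw similarity in [0, 1]
    kind: str    # "verbatim", "paraphrased", or "common_phrase"
    cited: bool


def weighted(m: Match) -> float:
    """Scale the raw similarity by match type and citation status."""
    weight = TYPE_WEIGHT[m.kind]
    if m.cited:
        weight *= CITED_WEIGHT
    return m.raw * weight


matches = [
    Match("S1", 0.62, "common_phrase", cited=False),
    Match("S2", 0.91, "verbatim", cited=False),
    Match("S3", 0.78, "paraphrased", cited=False),
    Match("S4", 0.55, "verbatim", cited=True),  # assumed: a cited quotation
]
for m in matches:
    print(m.sentence_id, round(weighted(m), 3))
```

With these weights, the uncited verbatim sentence (S2) dominates. The cited quotation (S4) ends up below even the common phrase (S1).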
The pipeline

From paste to report, step by step

STEP 01

Text preparation

Your text is normalized — case, punctuation, whitespace, and Unicode are cleaned. Sentences are segmented with a tokenizer trained on academic writing, so a citation inside a sentence does not split it.
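Here is a rough sketch of this step. It uses Unicode NFKC normalization and a regex splitter with a hand-written abbreviation list as a stand-in for the trained academic tokenizer described above. `ABBREVIATIONS`, `normalize`, `split_sentences`, and `match_key` are illustrative names, not DeepMatch's code. Case and punctuation are folded only in the comparison key, after splitting, because the splitter relies on capital letters.

```python
import re
import unicodedata

# Hypothetical abbreviation list; a trained tokenizer would learn these cases.
ABBREVIATIONS = ("et al.", "e.g.", "i.e.", "cf.", "vs.", "pp.", "fig.")


def normalize(text: str) -> str:
    """Apply NFKC Unicode normalization, straighten curly quotes, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.translate({0x201C: '"', 0x201D: '"', 0x2018: "'", 0x2019: "'"})
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Split after . ! ? when a new sentence follows, skipping known abbreviations."""
    sentences, start = [], 0
    for m in re.finditer(r'[.!?](?=\s+[A-Z"(])', text):
        candidate = text[start:m.end()]
        if candidate.lower().endswith(ABBREVIATIONS):
            continue  # e.g. "Smith et al. (2020)" stays in one sentence
        sentences.append(candidate.strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def match_key(sentence: str) -> str:
    """Case-folded, punctuation-free form used only for comparison."""
    no_punct = re.sub(r"[^\w\s]", "", sentence.casefold())
    return re.sub(r"\s+", " ", no_punct).strip()


raw = "Wages\u00a0fell  sharply (see Smith et al. (2020) for details). Strikes followed."
sentences = split_sentences(normalize(raw))
print(sentences)
print([match_key(s) for s in sentences])
```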

STEP 02

Semantic fingerprinting

A transformer-based encoder turns every sentence into a 768-dim vector. Fingerprints capture meaning, syntax, and topic, which is what makes paraphrase detection possible.

STEP 03

Source matching

Vectors are compared against our index of web pages, academic publications, and prior submissions. Candidates are ranked by similarity, then verified with a second-pass exact / near-exact alignment.
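Here is a sketch of the two-stage idea. It rests on a few assumptions: cosine similarity is used for ranking, toy three-dimensional vectors replace 768-dimensional fingerprints, and Python's `difflib.SequenceMatcher` serves as the second-pass alignment. The source IDs, vectors, and the 0.6 threshold are hypothetical.

```python
import difflib
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def rank_candidates(query_vec, index, top_k=2):
    """Stage 1: return the top_k (source_id, score) pairs by cosine similarity."""
    scored = [(sid, cosine(query_vec, vec)) for sid, (vec, _text) in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def verify(query_text, source_text, min_ratio=0.6):
    """Stage 2: character-level alignment; returns (is_match, ratio)."""
    ratio = difflib.SequenceMatcher(None, query_text.lower(), source_text.lower()).ratio()
    return ratio >= min_ratio, ratio


# Hypothetical index: source_id -> (fingerprint, text).
index = {
    "src-a": ([0.90, 0.10, 0.40], "The Industrial Revolution fundamentally reshaped "
                                  "the relationship between labor and capital."),
    "src-b": ([0.10, 0.90, 0.20], "Steam engines were first used to pump water out of mines."),
    "src-c": ([0.70, 0.30, 0.50], "Factories changed how people worked."),
}
query_text = "The industrial revolution fundamentally reshaped the relationship between labor and capital."
query_vec = [0.88, 0.12, 0.42]

for sid, score in rank_candidates(query_vec, index):
    ok, ratio = verify(query_text, index[sid][1])
    print(sid, round(score, 3), "verified" if ok else "rejected", round(ratio, 2))
```

In this toy data, the topically similar but textually different "src-c" makes the shortlist in stage 1 and is then rejected by the alignment pass.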

STEP 04

AI scoring

In parallel, every sentence is scored by an independent AI-detection model that measures perplexity and burstiness and looks for stylometric signatures of major generators.
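For readers unfamiliar with the first two signals, here is a sketch of common textbook definitions. Perplexity is the exponential of the average negative log-probability that a language model assigns to each token. Burstiness is taken here as the coefficient of variation of per-sentence perplexity, which is one proxy among several. The page does not say which definitions DeepMatch uses, and the token log-probabilities would come from a scoring model that is not shown.

```python
import math
import statistics


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-probability (natural log) per token."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def burstiness(sentence_perplexities: list[float]) -> float:
    """Coefficient of variation of per-sentence perplexity (one common proxy)."""
    mean = statistics.mean(sentence_perplexities)
    return statistics.pstdev(sentence_perplexities) / mean


# Hypothetical log-probabilities: every token assigned probability 0.5.
print(perplexity([math.log(0.5)] * 4))        # about 2.0
print(burstiness([12.0, 12.5, 11.8]))         # low: uniform sentences
print(burstiness([8.0, 35.0, 14.0]))          # higher: varied sentences
```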

STEP 05

Conditional scoring

Matches are deduplicated and weighted in context. Citation status, match type, and phrase commonness all factor in before the final originality score is computed.
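Deduplication can be as simple as keeping the strongest match per sentence, for example when the same sentence matches several pages. Here is a minimal sketch with hypothetical `Match` records and placeholder URLs. The page does not describe DeepMatch's actual deduplication rule.

```python
from typing import NamedTuple


class Match(NamedTuple):
    sentence_id: int
    source_url: str
    score: float


def deduplicate(matches: list[Match]) -> list[Match]:
    """Keep only the highest-scoring match for each sentence, ordered by sentence."""
    best: dict[int, Match] = {}
    for m in matches:
        current = best.get(m.sentence_id)
        if current is None or m.score > current.score:
            best[m.sentence_id] = m
    return sorted(best.values(), key=lambda m: m.sentence_id)


# Placeholder URLs; sentence 1 matches two different pages.
matches = [
    Match(1, "https://example.com/original", 0.82),
    Match(1, "https://example.org/mirror", 0.91),
    Match(2, "https://example.net/other", 0.64),
]
for m in deduplicate(matches):
    print(m.sentence_id, m.source_url, m.score)
```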

STEP 06

Report assembly

You get a sentence-level, color-graded report. Click any flagged sentence to see the source, the match confidence, and what to do about it.

What we scan

Six source layers, one verdict

Free checkers usually scan the open web and stop. We add five more layers, because most plagiarism does not live where it is easy to find.

60B+ pages

Open web

Billions of indexed pages refreshed continuously. Blogs, news, encyclopedias, course pages, forums.

110M+ articles

Academic publications

Journal articles, conference papers, theses, and dissertations from major open-access and licensed repositories.

15M+ titles

Books & reference

Public-domain texts (including Project Gutenberg) and licensed reference titles, where partial-match plagiarism often hides.

Opt-in

Prior submissions

A private institutional store of previously submitted work — only used when an institution has opted in.

14 languages

Cross-language

Aligned corpora across 14 languages. Suspect passages are mapped to a common space to catch translated plagiarism.

40+ years

News archives

Major news archives going back decades. Useful for current-events essays and journalism programs.

AI detection

An independent model, trained on what generators actually produce

The AI detector is a separate model from the plagiarism pipeline. It scores each sentence on perplexity, burstiness, and stylometric signatures we have observed in GPT-4o, Claude, Gemini, Llama, and a long tail of fine-tuned variants.

We retrain regularly as new models appear, and we report a confidence score, not a yes/no — because a yes/no on a single sentence is almost always wrong.

sentence_7.json
{
  "sentence_id": 7,
  "perplexity": 18.4,
  "burstiness": 0.32,
  "style_signature": "gpt-4o-likely",
  "ai_confidence": 0.91
}
→ Sentence-level output: flagged with reasons, not just a number.
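Here is a sketch of how a record like this could be turned into a flag plus human-readable reasons. The thresholds `PERPLEXITY_LOW`, `BURSTINESS_LOW`, and `FLAG_AT` are hypothetical values chosen for illustration, not DeepMatch's.

```python
import json

# Hypothetical thresholds for illustration only.
PERPLEXITY_LOW = 25.0
BURSTINESS_LOW = 0.4
FLAG_AT = 0.8


def explain(record: dict) -> list[str]:
    """Turn raw sentence signals into human-readable reasons."""
    reasons = []
    if record["perplexity"] < PERPLEXITY_LOW:
        reasons.append(f"low perplexity ({record['perplexity']})")
    if record["burstiness"] < BURSTINESS_LOW:
        reasons.append(f"low burstiness ({record['burstiness']})")
    if record["style_signature"].endswith("-likely"):
        reasons.append(f"style signature: {record['style_signature']}")
    return reasons


record = json.loads("""{"sentence_id": 7, "perplexity": 18.4, "burstiness": 0.32,
                        "style_signature": "gpt-4o-likely", "ai_confidence": 0.91}""")
flagged = record["ai_confidence"] >= FLAG_AT
print(flagged, explain(record))
```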
ColorGrade™ feedback

Every sentence, color-coded by match type

A flat percentage is useless if you can't see where the matches are. Our report colors every flagged sentence by type (verbatim, paraphrased, missing citation, or common phrase), so you can fix the actual problems instead of guessing.

  • Click any highlight to see the matched source.
  • Filter the report by severity, type, or source.
  • Export as PDF or DOCX, or share a private link with a reviewer.
Originality report
history_essay_v3.docx
76%
original

The Industrial Revolution began in Britain in the late eighteenth century.

New machinery and the steam engine transformed the production of goods and the organization of labor. The shift from cottage industry to mechanized factories was the catalyst for broader social upheaval.

Workers migrated en masse from rural villages to urban centers, where they encountered conditions both promising and perilous.

Verbatim · 11%
Paraphrased · 6%
Missing citation · 5%
Common phrase · 2%
Original · 76%
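If the legend shares are computed by word count (the page does not say), the breakdown can be sketched as follows. The per-span word counts are invented so that they reproduce the sample legend.

```python
from collections import Counter


def breakdown(labeled: list[tuple[str, int]]) -> dict[str, int]:
    """Whole-number percentage of words carrying each label."""
    totals = Counter()
    for label, words in labeled:
        totals[label] += words
    grand_total = sum(totals.values())
    return {label: round(100 * n / grand_total) for label, n in totals.items()}


# Hypothetical (label, word count) pairs, one per highlighted span.
spans = [
    ("original", 40), ("verbatim", 11), ("original", 36),
    ("paraphrased", 6), ("missing_citation", 5), ("common_phrase", 2),
]
print(breakdown(spans))
```

Rounding each share independently can make the total drift to 99% or 101% on real documents. A production report would need a largest-remainder step or a similar fix.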
Privacy & security

Your text is not training data

Not training data

We never use your submissions to train generative AI models. Detection models are trained on licensed and public corpora only.

Free scans are ephemeral

Free-tier text is processed in memory and discarded the moment your result returns. Nothing is written to long-term storage.

Account scans are yours

Account submissions are stored privately under your account so you can revisit reports. Delete any report in one click.

Encrypted everywhere

TLS in transit, AES-256 at rest, role-based access controls, audit logs. Sensitive endpoints are rate-limited and monitored.
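Rate limiting is commonly implemented with a token bucket. The page does not say which scheme DeepMatch uses, so treat this as a generic sketch. The injectable `clock` parameter exists only so the example can run without sleeping.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)   # start full
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Refill according to elapsed time, then spend one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(3)])   # [True, True, False]
fake_now[0] = 1.0                           # one second later: one token refilled
print(bucket.allow())                       # True
```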

60B+
web pages indexed
14
languages supported
99.2%
verbatim recall
1.8s
per 1k words

See DeepMatch™ on your own text

Free, no signup. Sentence-level results in under two seconds for a typical 1,000-word text.

Try the checker