Technology

DeepMatch — contextual plagiarism detection

Two independent models. A multi-source index of 60B+ pages. A report that explains every flag. No black box, no inflated numbers — just honest detection.

Your text

The industrial revolution fundamentally reshaped the relationship between labor and capital, producing tensions that erupted across the next century.

the_industrial_revolution · reshaped_labor_capital · tensions_erupted

MATCH · 0.94

Matched source · britannica.com

"The Industrial Revolution fundamentally reshaped the relationship between labor and capital, spawning conflicts that continued through the 20th century."

Verbatim · Paraphrased

DeepMatch engine

1.8s · 1,000 words

How DeepMatch works

Three ideas you will not find in free checkers

01 · Contextual analysis

We read meaning, not words

Most checkers compare strings. We turn every sentence into a semantic vector — a 768-dimensional fingerprint that captures meaning, syntax, and topic. Two sentences with no shared words can still be a match if they mean the same thing.

The word "bank" appears in:

A · "…sat on the river bank watching…" · context: nature

B · "…deposit at the central bank…" · context: finance

→ Same word, different meaning. We compare meaning, not strings.
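As a rough illustration of comparing meaning rather than strings, the sketch below scores sentence vectors by cosine similarity. The function name `cosine_similarity` and the 4-dimensional toy vectors are assumptions made up for this example; a real encoder would produce 768 dimensions, and the page does not state which similarity measure DeepMatch uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for illustration only.
river_bank = [0.9, 0.1, 0.0, 0.2]    # "…sat on the river bank watching…"
shoreline = [0.8, 0.2, 0.1, 0.3]     # a sentence about a shoreline, no shared words
central_bank = [0.1, 0.9, 0.4, 0.0]  # "…deposit at the central bank…"

sim_same_sense = cosine_similarity(river_bank, shoreline)
sim_diff_sense = cosine_similarity(river_bank, central_bank)
print(f"same sense: {sim_same_sense:.2f}, different sense: {sim_diff_sense:.2f}")
```

With these toy values, the two nature sentences score far higher than the two sentences that merely share the word "bank".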

02 · Fuzzy matching

Reworded text still gets caught

Swapping a few synonyms used to be enough to fool plagiarism checkers. Our matcher computes similarity at the level of meaning, then verifies with a second-pass alignment. Paraphrased passages and patchwork rewrites surface in the same report.

ORIGINAL

The committee unanimously approved the proposed amendment.

↓ reworded

DRAFT

The board by full agreement passed the suggested change.

String match · 22% (would miss this)

Fuzzy match · 91% (flagged)
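To see why a purely lexical comparison struggles here, the sketch below computes word-set (Jaccard) overlap between the ORIGINAL and DRAFT sentences. Jaccard overlap is an assumed stand-in for "string match"; the page does not say how its 22% figure is computed, so the number produced here will differ.

```python
import re

def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Shared words divided by all distinct words across both texts."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

original = "The committee unanimously approved the proposed amendment."
draft = "The board by full agreement passed the suggested change."

lexical_overlap = jaccard(original, draft)
print(f"lexical overlap: {lexical_overlap:.1%}")
```

The only shared token is "the", so a word-level comparison sees almost nothing in common even though the two sentences say the same thing.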

03 · Conditional scoring

Every match is weighted in context

A flat percentage hides the truth. Our originality score weights each match by type, length, citation status, and how common the phrasing is. A cited quotation barely moves the needle. A verbatim paragraph without a citation moves it a lot.

S1 · raw 0.62 · common phrase

S2 · raw 0.91 · verbatim, no citation

S3 · raw 0.78 · paraphrased

S4 · raw 0.55 · cited correctly

→ Each match is weighted in context. A cited phrase ≠ stolen idea.
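A minimal sketch of context weighting, using the four raw scores above. The weights in `CONTEXT_WEIGHTS` are hypothetical values chosen only to show the idea; the actual weighting scheme is not published on this page.

```python
from dataclasses import dataclass

# Hypothetical context weights, not the product's real values.
CONTEXT_WEIGHTS = {
    "common phrase": 0.2,
    "verbatim, no citation": 1.0,
    "paraphrased": 0.7,
    "cited correctly": 0.1,
}

@dataclass
class Match:
    sentence: str
    raw: float
    context: str

    @property
    def weighted(self):
        """Raw similarity scaled by how much this context should count."""
        return self.raw * CONTEXT_WEIGHTS[self.context]

matches = [
    Match("S1", 0.62, "common phrase"),
    Match("S2", 0.91, "verbatim, no citation"),
    Match("S3", 0.78, "paraphrased"),
    Match("S4", 0.55, "cited correctly"),
]

# Rank matches by weighted contribution, highest first.
ranked = sorted(matches, key=lambda m: m.weighted, reverse=True)
for m in ranked:
    print(f"{m.sentence}: raw {m.raw:.2f} -> weighted {m.weighted:.3f}")
```

Under these assumed weights, the uncited verbatim match dominates, while the correctly cited match contributes very little despite a raw score above 0.5.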

The pipeline

From paste to report, step by step

STEP 01

Text preparation

Your text is normalized — case, punctuation, whitespace, and Unicode are cleaned. Sentences are segmented with a tokenizer trained on academic writing, so a citation inside a sentence does not split it.
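The normalization part of this step might look roughly like the sketch below, which uses only the Python standard library. The function name `normalize` and the exact order of operations are assumptions; sentence segmentation with a trained tokenizer is not shown.

```python
import re
import unicodedata

def normalize(text):
    """Clean Unicode, case, punctuation, and whitespace for comparison."""
    # Fold compatibility characters (ligatures, non-breaking spaces) to plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Case-insensitive comparison form.
    text = text.casefold()
    # Drop punctuation: anything that is neither a word character nor whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()

print(normalize("The  Industrial\u00a0Revolution, \ufb01nally!"))
```

Normalizing before fingerprinting means trivial edits such as a swapped ligature or doubled spaces do not change what gets compared.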

STEP 02

Semantic fingerprinting

A transformer-based encoder turns every sentence into a 768-dim vector. Fingerprints capture meaning, syntax, and topic, which is what makes paraphrase detection possible.

STEP 03

Source matching

Vectors are compared against our index of web pages, academic publications, and prior submissions. Candidates are ranked by similarity, then verified with a second-pass exact / near-exact alignment.
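The two-pass idea can be sketched as follows: rank candidates by vector similarity, then check the top candidate with a character-level alignment. The toy 3-dimensional vectors, the helper names, and the use of `difflib.SequenceMatcher` for the second pass are all assumptions for illustration; the page does not describe the real index or alignment method.

```python
import math
from difflib import SequenceMatcher

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_candidates(query_vec, index, k=1):
    """First pass: rank indexed sources by vector similarity."""
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    return sorted(scored, reverse=True)[:k]

def alignment_ratio(a, b):
    """Second pass: character-level alignment ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

query_text = ("The industrial revolution fundamentally reshaped the relationship "
              "between labor and capital, producing tensions that erupted across "
              "the next century.")
source = ("The Industrial Revolution fundamentally reshaped the relationship "
          "between labor and capital, spawning conflicts that continued through "
          "the 20th century.")
unrelated = "Photosynthesis converts light energy into chemical energy."

# Hypothetical toy index of (text, vector) pairs.
index = [(source, [0.9, 0.1, 0.2]), (unrelated, [0.0, 1.0, 0.1])]
query_vec = [0.8, 0.2, 0.3]

best_score, best_text = top_candidates(query_vec, index, k=1)[0]
ratio = alignment_ratio(query_text, best_text)
print(f"vector similarity {best_score:.2f}, alignment ratio {ratio:.2f}")
```

A high vector score with a high alignment ratio suggests near-verbatim reuse, while a high vector score with a low ratio points toward paraphrase.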

STEP 04

AI scoring

In parallel, every sentence is scored by an independent AI-detection model that reads perplexity, burstiness, and stylometric signatures from major generators.

STEP 05

Conditional scoring

Matches are deduplicated and weighted in context. Citation status, match type, and phrase commonness all factor in before the final originality score is computed.
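One simple way to deduplicate is to keep only the strongest match per sentence when several sources hit the same text. The sketch below assumes that rule; the dictionary keys and the `example.org` source are hypothetical, and the page does not specify how deduplication is actually done.

```python
def dedupe_matches(matches):
    """Keep only the highest-scoring match for each sentence id."""
    best = {}
    for m in matches:
        sid = m["sentence_id"]
        if sid not in best or m["score"] > best[sid]["score"]:
            best[sid] = m
    return [best[sid] for sid in sorted(best)]

# Hypothetical candidate matches; sentence 1 was found in two sources.
candidates = [
    {"sentence_id": 1, "source": "britannica.com", "score": 0.94},
    {"sentence_id": 1, "source": "example.org", "score": 0.81},
    {"sentence_id": 2, "source": "example.org", "score": 0.66},
]

unique = dedupe_matches(candidates)
for m in unique:
    print(m["sentence_id"], m["source"], m["score"])
```

Deduplicating before weighting keeps a passage that was copied onto many sites from being counted once per site.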

STEP 06

Report assembly

You get a sentence-level, color-graded report. Click any flagged sentence to see the source, the match confidence, and what to do about it.

What we scan

Six source layers, one verdict

Free checkers usually scan the open web and stop. We add five more layers, because most plagiarism does not live where it is easy to find.

60B+ pages

Open web

Billions of indexed pages refreshed continuously. Blogs, news, encyclopedias, course pages, forums.

110M+ articles

Academic publications

Journal articles, conference papers, theses, and dissertations from major open-access and licensed repositories.

15M+ titles

Books & reference

Public-domain texts, Project Gutenberg, and licensed reference titles. Where partial-match plagiarism often hides.

Opt-in

Prior submissions

A private institutional store of previously submitted work — only used when an institution has opted in.

14 languages

Cross-language

Aligned corpora across 14 languages. Suspect passages are mapped to a common space to catch translated plagiarism.

40+ years

News archives

Major news archives going back decades. Useful for current-events essays and journalism programs.

AI detection

An independent model, trained on what generators actually produce

The AI detector is a separate model from the plagiarism pipeline. It scores each sentence on perplexity, burstiness, and stylometric signatures we have observed in GPT-4o, Claude, Gemini, Llama, and a long tail of fine-tuned variants.

We retrain regularly as new models appear, and we report a confidence score, not a yes/no — because a yes/no on a single sentence is almost always wrong.

sentence_7.json

// sentence-level output
sentence_id: 7
perplexity: 18.4
burstiness: 0.32
ai_confidence: 0.91
// → flagged with reasons, not just a number
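For readers unfamiliar with the two signals in the output above, here is a minimal sketch of how they are commonly computed. Perplexity follows its standard definition over per-token probabilities (which would come from a language model, not shown here). Burstiness has no single agreed definition, so the coefficient of variation of sentence lengths is used as an assumption; the detector's actual formulas are not given on this page.

```python
import math
import statistics

def perplexity(token_probs):
    """exp of the mean negative log-probability of the tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def burstiness(sentence_lengths):
    """Coefficient of variation of sentence lengths (std / mean)."""
    if len(sentence_lengths) < 2:
        return 0.0
    mean = statistics.mean(sentence_lengths)
    return statistics.pstdev(sentence_lengths) / mean if mean else 0.0

# Hypothetical inputs for illustration.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
print(burstiness([5, 15]))
```

Lower perplexity and lower burstiness mean more predictable, more uniform text, which is why such signals are used as evidence rather than as a verdict on their own.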

Privacy & security

Your text is not training data

Not training data

We never use your submissions to train generative AI models. Detection models are trained on licensed and public corpora only.

Free scans are ephemeral

Free-tier text is processed in memory and discarded the moment your result returns. Nothing is written to long-term storage.

Account scans are yours

Account submissions are stored privately under your account so you can revisit reports. Delete any report in one click.

Encrypted everywhere

TLS in transit, AES-256 at rest, role-based access controls, audit logs. Sensitive endpoints are rate-limited and monitored.

Read the full Privacy Policy

60B+ · web pages indexed

14 · languages supported

99.2% · verbatim recall

1.8s · per 1k words

Figures reflect Plagiarism Checker Plus internal benchmarks on our own evaluation sets and may vary with document type and language.

See DeepMatch on your own text

Free, no signup, sentence-level results in under two seconds per 1,000 words.

Try the checker