Every week, thousands of PhD students paste their literature notes, methodology drafts, and unpublished findings into ChatGPT. It feels harmless. The interface is clean, the summaries are good, and the alternative — reading everything yourself and remembering it — is exhausting.
But there is something most researchers do not think about until it is too late.
What actually happens when you paste text into a cloud AI
When you paste content into ChatGPT, Claude, or Gemini, that text leaves your machine. It travels to a server you do not control, where it is processed by a company whose primary business interest is not protecting your intellectual property.
OpenAI's data policy states that, by default, conversations may be used to train future models unless you manually opt out, and the opt-out applies only to conversations from that point on. Content you pasted last month, before you discovered the setting, may already have been logged.
Google's NotebookLM sends everything you upload to Google's infrastructure. If you are an EU-based researcher, this creates genuine GDPR exposure — personal data about research participants, for example, should not leave your jurisdiction without appropriate safeguards.
This is not a hypothetical risk. It is an operational one.
What the NIH and NSF actually say
The US National Institutes of Health is explicit: peer reviewers are prohibited from uploading any content from grant applications to non-approved generative AI tools. Doing so violates NIH peer review confidentiality requirements.
The National Science Foundation goes further, stating that any information uploaded to generative AI tools not behind NSF's own firewall is considered to be entering the public domain, and that this poses significant risks to researchers' control over their ideas.
For most researchers, this means: if you are reviewing someone else's grant, or writing your own, and you paste any of that content into ChatGPT or NotebookLM, you may be violating the policies of the funding body your entire career depends on.
This is not widely known. Most PhD students find out from a supervisor, after the fact.
The subtler risk: your own unpublished work
Grant review policies are the clearest case, but the softer risk is more common: pasting your own unpublished ideas, methodology, or data into a cloud AI.
Consider a researcher six months from submitting a paper. They paste their draft methodology into ChatGPT to get feedback on the framing. The feedback is useful. The paper is better. But the methodology — not yet published, not yet protected by any formal claim of precedence — has now been processed by an external server.
In most cases, nothing bad happens. But in competitive fields, where scooping is a real concern, the risk is not zero. And the researcher had no reason to take it.
Why researchers keep doing it anyway
Cloud AI tools are significantly better at reasoning and synthesis than local alternatives have historically been. A researcher using ChatGPT to summarise a dense paper gets a genuinely useful output. A researcher using a small local model got, until recently, something considerably worse.
That calculation is changing. Models like Llama 3, Mistral, and Gemma 4 — running locally on a standard research laptop — now produce outputs that are genuinely useful for the tasks researchers actually need: summarising a paper, extracting methodology, cross-referencing claims across sources, answering specific questions about a document.
They are not GPT-4. But for the specific task of working with your own research library (sources you have already curated, papers you have already selected), a local model grounded in your actual documents produces reliable, cited answers. Because retrieval is restricted to your library, every passage the tool cites comes from a document you actually hold, so a citation to a source that is not in your library is easy to spot and check.
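The grounding idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular local tool works: the `library` dictionary, its file names, and the `retrieve` function are all hypothetical, and plain word overlap stands in for the embedding search a real system would more likely use. The point is only that every passage handed to the model carries the name of a document already on your machine.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, library, top_k=2):
    """Return up to top_k (source, passage) pairs ranked by word overlap.

    Only entries of `library` can ever be returned, so any citation built
    from the result points at a document that exists locally.
    """
    q = tokenize(question)
    scored = []
    for source, passage in library.items():
        overlap = len(q & tokenize(passage))
        if overlap > 0:
            scored.append((overlap, source, passage))
    # Highest overlap first; ties broken alphabetically by source name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(source, passage) for _, source, passage in scored[:top_k]]

# Hypothetical library: file names and contents are made up for illustration.
library = {
    "smith_2021.txt": "We used semi-structured interviews with thirty participants.",
    "lee_2019.txt": "The survey measured attitudes toward remote work.",
    "garcia_2023.txt": "Interviews were coded using thematic analysis.",
}

hits = retrieve("Which papers used interviews?", library)
for source, passage in hits:
    print(f"[{source}] {passage}")
```

In a fuller setup, the retrieved passages and their source names would be placed in the prompt of a locally running model. The model can still misread a passage, which is why keeping the source name attached to each passage matters: you can open the file and check.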
The practical decision
If you are working with:
- Publicly available papers you did not write
- Content with no confidentiality constraints
- General background reading that no institutional policy covers
Cloud AI is probably fine for that. The convenience is real.
If you are working with:
- Your own unpublished writing, methodology, or data
- Grant applications or reviews
- Content involving human participants or sensitive data
- Anything covered by an NDA or institutional policy
The calculation is different. The convenience of a cloud tool is not worth the risk — especially now that local alternatives have become genuinely usable.
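The two lists above amount to a simple rule: if any confidentiality flag applies, stay local. A minimal Python sketch of that rule follows. The `Material` class, its field names, and `recommended_tool` are hypothetical names chosen for illustration; they mirror the bullets above and are not a substitute for reading your funder's or institution's actual policy.

```python
from dataclasses import dataclass, astuple

@dataclass
class Material:
    """Flags describing what you are about to paste (hypothetical fields)."""
    unpublished_own_work: bool = False
    grant_application_or_review: bool = False
    human_participants_or_sensitive_data: bool = False
    under_nda_or_institutional_policy: bool = False

def recommended_tool(material):
    """Return 'local' if any confidentiality flag is set, otherwise 'cloud'."""
    return "local" if any(astuple(material)) else "cloud"

# A published paper you did not write: no flags set.
print(recommended_tool(Material()))  # prints "cloud"

# Content from a grant review: one flag is enough.
print(recommended_tool(Material(grant_application_or_review=True)))  # prints "local"
```

The rule is deliberately one-sided: a single flag is enough to rule out the cloud tool, and no combination of conveniences overrides it.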
The honest version of this advice: know what you are pasting and where it is going. Most researchers do not think about this until someone asks them to. Now you have.