Overview
In plain English
AIDRAN uses source-specific ingestion tasks for public discourse, public article discovery, official releases, developer ecosystem signals, regulatory records, and enrichment watchlists. We do not access private messages, locked accounts, or paywalled article bodies.
AIDRAN’s ingestion layer is made of source-specific adapters. The active source set includes arXiv, Bluesky, Hacker News, Google News, Reddit, Twitter/X, YouTube, Exa, Websets, OpenAlex, and Hugging Face. The corpus also recognizes optional expansion lanes for official web sources, GitHub, package registries, Stack Exchange, regulatory and filing sources, Product Hunt, Mastodon, and GDELT. Each adapter runs as an ingestion worker that writes source configuration rows and public record rows while preserving the actual publisher or platform attribution. Analysis, signal detection, story generation, and the web app read from that corpus through the Delivery API.
The system is built around public evidence. It does not read private messages, locked accounts, authentication-only pages, or paywalled article bodies. When a public source exposes author handles, bylines, timestamps, links, or engagement metrics, those fields may be stored so AIDRAN can attribute and weigh the record.
Display Buckets
In plain English
The website may show article records as News. That is a display bucket, not a hidden source list.
Source kind and public display label are not always the same thing. The database keeps the upstream source kind on each record, while the web app groups article-category records under a reader-facing News label where that is clearer than naming a discovery or enrichment provider.
Google News is a scheduled ingestion source in the article category. arXiv, Exa, Websets, and OpenAlex are article-category sources as well, and so are official web, GitHub, package registries, regulatory sources, Product Hunt, and GDELT. Hugging Face keeps a distinct Hugging Face label because readers need to distinguish model, paper, and dataset watchlist records from generic article discovery. External articles surfaced during story enrichment may also appear in the same News bucket, with the actual publisher domain shown when available. News is therefore a presentation bucket for public article records and web citations, not a separate private feed.
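The split between stored source kind and reader-facing label can be sketched as a small lookup. This is a minimal illustration, not AIDRAN's code: the identifier strings (`google_news`, `package_registry`, and so on) and the function name `display_label` are invented here, and the labels used for discourse sources are not described on this page, so the sketch returns `None` for them.

```python
from typing import Optional

# Hypothetical source-kind identifiers, invented for this sketch.
ARTICLE_KINDS = frozenset({
    "google_news", "arxiv", "exa", "websets", "openalex",
    "official_web", "github", "package_registry", "regulatory",
    "product_hunt", "gdelt",
})


def display_label(source_kind: str) -> Optional[str]:
    """Map a stored source kind to the label shown to readers."""
    if source_kind == "hugging_face":
        # Hugging Face keeps its own label instead of joining News.
        return "Hugging Face"
    if source_kind in ARTICLE_KINDS:
        return "News"
    # Discourse sources are outside the News bucket; their labels are
    # not specified in this document.
    return None
```

The stored record keeps `source_kind` unchanged, so the upstream source is never lost when the label is coarser.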
Cadence And Status
In plain English
Ingestion is run by source-specific Cloudflare workflow tasks. Cadence and enabled status are operational settings, not public promises.
Each scheduled ingestion source has its own Cloudflare workflow task and deployment trigger. Cadence, limits, and whether a credentialed source is enabled can change as upstream APIs, quotas, and reliability change. Enrichment article records are created by story-enrichment work or curated imports, not by a public scrape schedule. Delivery exposes source rows with enabled status and recent record volume; public stories and citations are generated from the records and citation links actually present in the corpus.
Reddit
In plain English
Public subreddit discussions provide structured community-level AI discourse.
Reddit records come from public AI-related subreddit listings. The ingestion worker iterates a maintained subreddit set, skips removed or deleted text, and stores public post fields such as title, text when present, URL, author handle, subreddit, score, comment count, flair, and permalink.
- Record category: Discourse
- Record type: Public posts from AI-related subreddits
- Context stored: Subreddit, link metadata, and public engagement fields
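As a rough sketch of the field mapping described above, the function below converts one post object from a public Reddit listing into a record. The input keys (`selftext`, `num_comments`, `link_flair_text`, `permalink`) are the ones Reddit's listing JSON uses; the output keys and the function name are assumptions made for illustration. It also assumes that "skips removed or deleted text" means the placeholder body is dropped while the post itself is kept.

```python
# Placeholder bodies Reddit shows for removed or deleted posts.
REMOVED_MARKERS = {"[removed]", "[deleted]"}


def reddit_post_to_record(post: dict) -> dict:
    """Map one listing child (`child["data"]`) to a flat record."""
    text = (post.get("selftext") or "").strip()
    if text in REMOVED_MARKERS:
        text = ""  # assumption: drop the placeholder, keep the post
    return {
        "title": post.get("title", ""),
        "text": text or None,
        "url": post.get("url"),
        "author": post.get("author"),
        "subreddit": post.get("subreddit"),
        "score": post.get("score", 0),
        "comment_count": post.get("num_comments", 0),
        "flair": post.get("link_flair_text"),
        # Listing permalinks are site-relative paths.
        "permalink": "https://www.reddit.com" + post.get("permalink", ""),
    }
```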
Bluesky
In plain English
Public Bluesky search results capture AT Protocol posts about AI.
Bluesky records come from public AT Protocol search results. The adapter searches for AI-related language, paginates through public posts, and stores text, author handle or DID, URL, language, and public reply, repost, and like counts.
- Record category: Discourse
- Record type: Public posts
- Context stored: Handles, language, and public engagement fields
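Cursor pagination and URL construction for Bluesky might look like the sketch below. It assumes a caller-supplied `fetch_page` function that wraps the public `app.bsky.feed.searchPosts` endpoint and returns its JSON (a `posts` list plus an optional `cursor`); `paginate`, `bluesky_post_url`, and the `max_pages` bound are hypothetical names and settings, not AIDRAN's.

```python
from typing import Callable, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[str]], dict],
             max_pages: int = 5) -> Iterator[dict]:
    """Yield posts page by page until the cursor runs out."""
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        yield from page.get("posts", [])
        cursor = page.get("cursor")
        if not cursor:
            break


def bluesky_post_url(at_uri: str, handle: Optional[str] = None) -> str:
    """Turn at://<did>/app.bsky.feed.post/<rkey> into a bsky.app link."""
    did, _collection, rkey = at_uri[len("at://"):].split("/")
    # Prefer the handle for readability; the DID also resolves.
    return f"https://bsky.app/profile/{handle or did}/post/{rkey}"
```

Bounding the page count keeps a single run predictable even when a search term is unusually busy.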
Hacker News
In plain English
Hacker News provides public technical-community discussion through its read-only API.
Hacker News records come from the public Firebase API. The current worker reads the public top-stories feed, fetches item details, filters dead or deleted items, and stores title, text when present, URL, author, score, descendant count, and item metadata.
- Record category: Discourse
- Record type: Public story items
- Context stored: Points, descendant counts, item type, and public links
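The read path described above (top-stories feed, then item details, then filtering) can be sketched against the public Firebase endpoints. The endpoint paths and item fields (`by`, `descendants`, `dead`, `deleted`) are those of the public Hacker News API; the record keys, the `limit` default, and the function names are assumptions for illustration.

```python
import json
from urllib.request import urlopen

HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(path: str):
    """Fetch one resource, e.g. 'topstories' or 'item/123'."""
    with urlopen(f"{HN_API}/{path}.json", timeout=10) as resp:
        return json.load(resp)


def hn_item_to_record(item):
    """Return a record, or None for missing, dead, or deleted items."""
    if not item or item.get("dead") or item.get("deleted"):
        return None
    return {
        "id": item["id"],
        "type": item.get("type"),
        "title": item.get("title", ""),
        "text": item.get("text"),
        "url": item.get("url"),
        "author": item.get("by"),
        "score": item.get("score", 0),
        "descendants": item.get("descendants", 0),
        "discussion_url": f"https://news.ycombinator.com/item?id={item['id']}",
    }


def ingest_top_stories(limit: int = 30, fetch=fetch_json) -> list:
    """Read the top-stories feed and keep only live items."""
    ids = fetch("topstories")[:limit]
    records = (hn_item_to_record(fetch(f"item/{i}")) for i in ids)
    return [r for r in records if r is not None]
```

Passing `fetch` as a parameter lets the same logic run against canned data in tests without touching the network.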
Google News
In plain English
Google News is a public RSS discovery source for AI-related articles.
Google News records come from public RSS search results. AIDRAN stores article titles, descriptions from the feed, publisher names when present, publication times, and the stable Google News redirect URL. It does not bypass publisher paywalls or claim to store the full article body from the linked site.
- Record category: Article
- Record type: Public RSS article entries
- Context stored: Publisher, description, publication time, and source URL
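A minimal parser for the feed fields listed above might look like this. It relies only on standard RSS 2.0 elements (`title`, `link`, `pubDate`, `description`, `source`); the record keys and the function name are assumptions. Nothing here follows the link or fetches the publisher's page, in line with the statement that article bodies are not stored.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_google_news_rss(xml_text: str) -> list:
    """Extract feed-level metadata from each RSS <item>."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        records.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            # <source> carries the publisher name when the feed has one.
            "publisher": item.findtext("source"),
            "published_at": parsedate_to_datetime(pub).isoformat() if pub else None,
            # Stored as-is: the Google News redirect URL from the feed.
            "url": item.findtext("link"),
        })
    return records
```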
YouTube
In plain English
YouTube records capture public video metadata and engagement statistics.
YouTube records come from the YouTube Data API. The worker searches recent public videos about AI and stores each video's title, description, channel information, publication time, and thumbnail URL, along with public view, like, and comment counts and any tags the API provides.
- Record category: Discourse
- Record type: Public video metadata
- Context stored: Channel, description, thumbnail, available tags, and public metrics
arXiv
In plain English
arXiv provides public research preprints in AI-relevant categories.
arXiv records come from the public Atom API. AIDRAN searches AI-relevant categories, including cs.AI, cs.LG, cs.CL, and stat.ML, and stores paper title, abstract, authors, categories, publication time, and canonical arXiv URL.
- Record category: Article
- Record type: Public preprint metadata and abstracts
- Context stored: Authors, categories, abstract, and canonical URL
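A category search against the arXiv API can be expressed as a single query URL. The sketch below uses the documented `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters; the page size and the newest-first ordering are assumptions, since this page does not state how AIDRAN sorts or batches results.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
CATEGORIES = ("cs.AI", "cs.LG", "cs.CL", "stat.ML")


def arxiv_query_url(categories=CATEGORIES, start: int = 0,
                    max_results: int = 50) -> str:
    """Build an Atom API URL that ORs the given categories together."""
    search = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": search,
        "start": start,
        "max_results": max_results,  # hypothetical page size
        "sortBy": "submittedDate",   # assumption: newest first
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"
```

The response is an Atom feed in which each `entry` carries the title, abstract (`summary`), authors, categories, and the canonical `id` URL.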
OpenAlex
In plain English
OpenAlex provides public scholarly works metadata for AI-relevant research records and enrichment.
OpenAlex Works records come from the public OpenAlex API. AIDRAN stores public scholarly metadata such as titles, abstracts or inverted abstracts when available, authorship and venue metadata, DOI or OpenAlex identifiers, publication dates, concepts, and canonical source URLs.
- Record category: Article
- Record type: Public scholarly works metadata
- Context stored: Work identifiers, authorship, venue, concepts, abstract metadata, and source URLs
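OpenAlex returns abstracts as an inverted index (`abstract_inverted_index`), which maps each word to the positions where it occurs. If plain text is wanted, it can be rebuilt as in the sketch below; whether AIDRAN stores the inverted form, the rebuilt text, or both is not stated here, and the function name is illustrative.

```python
from typing import Optional


def abstract_from_inverted_index(index: Optional[dict]) -> Optional[str]:
    """Rebuild abstract text from a {word: [positions]} mapping."""
    if not index:
        return None
    by_position = {}
    for word, positions in index.items():
        for pos in positions:
            by_position[pos] = word
    # Join the words in positional order.
    return " ".join(by_position[pos] for pos in sorted(by_position))
```

For example, `{"AI": [0, 2], "beats": [1]}` rebuilds to `"AI beats AI"`.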
Twitter/X
In plain English
Twitter/X captures public recent-search posts when API access is configured.
Twitter/X records come from the API v2 recent-search endpoint when a bearer token is configured. The adapter searches public English-language AI posts, excludes retweets and replies in its query, and stores text, author id, public URL, publication time, language, public metrics, and expanded URLs.
- Record category: Discourse
- Record type: Public recent-search posts
- Context stored: Public metrics, language, linked URLs, and tweet URL
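The query shape described above maps onto standard recent-search operators: `lang:en`, `-is:retweet`, and `-is:reply`. The sketch below builds the request URL and headers, and returns `None` when no bearer token is configured. The search terms, field selection, and function name are assumptions for illustration, not AIDRAN's actual query.

```python
from typing import Optional, Sequence, Tuple
from urllib.parse import urlencode

RECENT_SEARCH = "https://api.twitter.com/2/tweets/search/recent"


def recent_search_request(bearer_token: str, terms: Sequence[str],
                          max_results: int = 100) -> Optional[Tuple[str, dict]]:
    """Return (url, headers), or None when the source is unconfigured."""
    if not bearer_token:
        return None  # source stays inactive without credentials
    query = "(" + " OR ".join(terms) + ") lang:en -is:retweet -is:reply"
    params = {
        "query": query,
        "max_results": max_results,  # the API accepts 10 to 100
        "tweet.fields": "author_id,created_at,lang,public_metrics,entities",
    }
    headers = {"Authorization": f"Bearer {bearer_token}"}
    return f"{RECENT_SEARCH}?{urlencode(params)}", headers
```

Expanded URLs arrive under `entities.urls` in each returned post when `entities` is requested.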
Exa and Websets
In plain English
Public web articles can supplement story context. The site displays them as News with publisher or domain attribution when available.
Story enrichment can use public web article results from Exa and curated Webset article imports when a story needs outside article context. Live Exa results may be stored as article records, and curated Webset entries can be imported as article records. Some cited web sources appear only as external citation links attached to a story rather than as scheduled ingestion rows.
- Record category: Article
- Record type: Public web article results and curated public article entries
- Context stored: Publisher or domain, title, snippet or excerpt, URL, publication date when available, and provider metadata
Hugging Face
In plain English
Hugging Face records track public AI model, paper, and dataset pages without treating them as generic News.
Hugging Face records come from public watchlist targets on huggingface.co, such as model, paper, organization, and dataset pages relevant to AI discourse. AIDRAN stores public page metadata, titles, URLs, timestamps when available, and provider metadata needed for attribution.
- Record category: Article
- Record type: Public AI repository and research watchlist items
- Context stored: Title, URL, public page metadata, and provider metadata
Official Web
In plain English
Official web sources track public release notes, changelogs, model cards, docs, and policy pages from configured publishers.
Official web records come from configured public feeds, sitemaps, and pages for AI labs, companies, standards bodies, and product teams. AIDRAN stores public titles, summaries, canonical URLs, publisher names or domains, timestamps when available, and source metadata needed to distinguish the configured official source from the linked publisher.
- Record category: Article
- Record type: Public release, documentation, policy, model-card, and changelog pages
- Context stored: Publisher, configured source, URL, summary, and publication or update time when available
GitHub
In plain English
GitHub records track public release metadata from configured repositories.
GitHub records come from configured public repositories. The worker currently focuses on release metadata, storing release names, tags, URLs, authors when present, publication times, and bounded release-note text from the public API.
- Record category: Article
- Record type: Public repository release metadata
- Context stored: Repository, release tag, author, URL, and release-note excerpt
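A release object from the GitHub REST API (`GET /repos/{owner}/{repo}/releases`) could be reduced to the fields above as follows. The input keys (`tag_name`, `html_url`, `published_at`, `body`, `author.login`) are GitHub's; the 2,000-character bound and the output keys are hypothetical, since this page says only that release-note text is bounded.

```python
MAX_NOTES_CHARS = 2000  # hypothetical bound; the real limit is not documented


def github_release_to_record(repo: str, release: dict,
                             max_chars: int = MAX_NOTES_CHARS) -> dict:
    """Keep release metadata plus a truncated release-note excerpt."""
    body = release.get("body") or ""
    author = release.get("author") or {}
    return {
        "repository": repo,
        # Releases may be unnamed; fall back to the tag.
        "name": release.get("name") or release.get("tag_name"),
        "tag": release.get("tag_name"),
        "url": release.get("html_url"),
        "author": author.get("login"),
        "published_at": release.get("published_at"),
        "notes_excerpt": body[:max_chars],
    }
```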
Package Registries
In plain English
Package registry records track public npm and PyPI package release metadata from configured watchlists.
Package registry records come from configured npm and PyPI packages relevant to AI infrastructure. AIDRAN stores public package names, versions, descriptions, release times when available, registry URLs, and provider metadata. It does not install packages or inspect private registry content.
- Record category: Article
- Record type: Public package and version metadata
- Context stored: Registry, package, version, URL, description, and publication time when available
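Both registries expose package metadata as public JSON: PyPI at `https://pypi.org/pypi/<name>/json` and npm at `https://registry.npmjs.org/<name>`. The sketch below maps each response into one shared record shape without downloading or installing anything. The record keys and function names are assumptions for illustration.

```python
def pypi_record(doc: dict) -> dict:
    """Map a PyPI JSON API response to a release record."""
    info = doc["info"]
    uploads = [f["upload_time_iso_8601"] for f in doc.get("urls", [])
               if f.get("upload_time_iso_8601")]
    return {
        "registry": "pypi",
        "package": info["name"],
        "version": info["version"],
        "description": info.get("summary"),
        "url": f"https://pypi.org/project/{info['name']}/{info['version']}/",
        # Earliest file upload stands in for the release time.
        "published_at": min(uploads) if uploads else None,
    }


def npm_record(doc: dict) -> dict:
    """Map an npm registry document to a record for its latest version."""
    version = doc["dist-tags"]["latest"]
    return {
        "registry": "npm",
        "package": doc["name"],
        "version": version,
        "description": doc.get("description"),
        "url": f"https://www.npmjs.com/package/{doc['name']}/v/{version}",
        "published_at": doc.get("time", {}).get(version),
    }
```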
Stack Exchange
In plain English
Stack Exchange records capture public technical Q&A about AI and developer tooling.
Stack Exchange records come from public search results on configured Stack Exchange sites. AIDRAN stores question titles, bounded public excerpts or body text when returned by the API, canonical question URLs, author display names, tags, scores, answer counts, and accepted-answer status.
- Record category: Discourse
- Record type: Public Q&A questions
- Context stored: Site, tags, scores, answer counts, public author display name, and canonical URL
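A question object from the Stack Exchange API could be mapped as in the sketch below. Two details of that API matter here: titles arrive HTML-entity-encoded, and `accepted_answer_id` is present only when an answer has been accepted. The record keys and function name are assumptions for illustration.

```python
import html


def stackexchange_question_to_record(site: str, question: dict) -> dict:
    """Map one API question object to a flat record."""
    owner = question.get("owner") or {}
    return {
        "site": site,
        # The API returns titles with HTML entities, e.g. &#39; for '.
        "title": html.unescape(question.get("title", "")),
        "url": question.get("link"),
        "author": owner.get("display_name"),
        "tags": question.get("tags", []),
        "score": question.get("score", 0),
        "answer_count": question.get("answer_count", 0),
        "has_accepted_answer": "accepted_answer_id" in question,
    }
```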
Regulatory and Filings
In plain English
Regulatory records track public filings, notices, standards, and official RSS sources.
Regulatory records come from configured public sources such as the Federal Register, SEC EDGAR company filings, and official RSS feeds. AIDRAN stores titles, summaries, agency or company names, forms or docket identifiers when present, canonical URLs, and publication times.
- Record category: Article
- Record type: Public filings, notices, standards, and official records
- Context stored: Agency or company, document identifiers, URL, summary, and publication time
Product Hunt
In plain English
Product Hunt is an opt-in launch metadata source and remains restricted until API/commercial-use review approves activation.
Product Hunt records, when explicitly enabled, come from the Product Hunt GraphQL API and describe public product launches. The task is disabled by default and requires configured credentials plus operator approval. Raw Product Hunt records are not served to customer API keys until API and commercial-use review approves that access.
- Record category: Article
- Record type: Public launch metadata
- Context stored: Product name, tagline, launch URL, maker names, topics, rankings, votes, and publication time
Mastodon
In plain English
Mastodon records capture public Fediverse posts from configured accounts, instances, or hashtags.
Mastodon records come from public instance API responses for configured accounts and hashtags. AIDRAN stores post text, canonical URLs, account identifiers, publication time, language when present, and public reply, favorite, and reblog counts.
- Record category: Discourse
- Record type: Public Fediverse posts
- Context stored: Instance, account or hashtag, URL, language, and public engagement fields
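Mastodon's API returns post content as HTML, so storing "post text" implies some conversion. The sketch below uses a deliberately crude tag strip, which is an assumption and not necessarily how AIDRAN handles it. The status fields (`content`, `url`, `account.acct`, `replies_count`, `favourites_count`, `reblogs_count`) are those of the Mastodon Status entity; the record keys are illustrative.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def mastodon_status_to_record(instance: str, status: dict) -> dict:
    """Map one public Status entity to a flat record."""
    # Crude HTML-to-text: drop tags, then decode entities.
    text = html.unescape(TAG_RE.sub("", status.get("content", ""))).strip()
    account = status.get("account") or {}
    return {
        "instance": instance,
        "text": text,
        "url": status.get("url") or status.get("uri"),
        "account": account.get("acct"),
        "published_at": status.get("created_at"),
        "language": status.get("language"),
        "replies": status.get("replies_count", 0),
        "favourites": status.get("favourites_count", 0),
        "reblogs": status.get("reblogs_count", 0),
    }
```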
GDELT
In plain English
GDELT records broaden public article discovery while preserving publisher or domain attribution.
GDELT records come from public GDELT DOC article-list results for configured AI-related queries. AIDRAN stores article titles, URLs, publisher or domain attribution, language, source country when available, and GDELT metadata. The linked publisher remains the attributed source for the article.
- Record category: Article
- Record type: Public article discovery metadata
- Context stored: Publisher or domain, URL, language, source country, and discovery-query metadata
What We Don't Collect
In plain English
No private messages, locked accounts, or paywalled article bodies. Public attribution fields may be stored when the source provides them.
- Private messages, DMs, or non-public account content
- Content behind authentication walls that the public cannot access
- Paywalled article bodies or paywall bypasses
- Reader behavior profiles or user-submitted private material
- Content from locked or private accounts
- Private identity enrichment beyond public handles, bylines, and source attribution fields
Content Removal
In plain English
If your public post or article metadata appears in AIDRAN and you want it removed, send us the original URL or source identifier.
If you are the author or rights holder for content that appears in AIDRAN and would like it removed, please contact us at privacy@aidran.ai with a link to the original content or enough source information for us to identify the record. We will process removal requests within 30 days.