Here's the full pipeline, from the moment you upload a file to the chart on your screen.
1. Data is uploaded and parsed in your browser
When you drop a CSV file, JavaScript running in your browser reads every row and column. It detects which columns are numbers and which are text (categorical), and identifies Likert scales, ordinal ranges, and multi-select columns automatically. Nothing has left your computer yet.
Tech: Client-side FileReader API + custom CSV parser. Types are inferred by attempting Number() conversion and scanning for known ordinal patterns (e.g. "Strongly Agree...Strongly Disagree").
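The inference step can be sketched as a small pure function. This is a minimal illustration, not the tool's actual code; the function name and the exact ordinal word list are assumptions.

```javascript
// Known ordinal pattern to scan for (one of presumably several).
const LIKERT_AGREEMENT = [
  "strongly disagree", "disagree", "neutral", "agree", "strongly agree",
];

function inferColumnType(values) {
  const nonEmpty = values.filter((v) => v !== "" && v != null);
  // Numeric if every non-empty cell survives Number() conversion.
  if (nonEmpty.length > 0 && nonEmpty.every((v) => !Number.isNaN(Number(v)))) {
    return "number";
  }
  // Ordinal if every distinct value matches the known Likert pattern.
  const distinct = [...new Set(nonEmpty.map((v) => String(v).trim().toLowerCase()))];
  if (distinct.length > 1 && distinct.every((v) => LIKERT_AGREEMENT.includes(v))) {
    return "ordinal";
  }
  return "categorical";
}
```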
2. The raw file and parsed data are stored
Your original CSV goes to R2 (Cloudflare's object storage) as a permanent backup. The parsed JSON goes to KV (Cloudflare's key-value store) for fast retrieval. A project row is created in D1 (Cloudflare's SQL database) to track your progress, settings, and which dataset you're working with.
Tech: R2 key = dataviz/{username}/{datasetId}.csv. KV key = dataviz-data:{username}:{datasetId}. D1 table = projects (one row per student, keyed by username).
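A sketch of the storage step as it might look inside the Worker, assuming R2, KV, and D1 bindings named BUCKET, KV, and DB (the binding names, column names, and function names here are assumptions; the key formats are the ones listed above).

```javascript
function storageKeys(username, datasetId) {
  return {
    r2Key: `dataviz/${username}/${datasetId}.csv`,   // raw CSV backup in R2
    kvKey: `dataviz-data:${username}:${datasetId}`,  // parsed JSON in KV
  };
}

async function persistDataset(env, username, datasetId, rawCsv, parsedRows) {
  const { r2Key, kvKey } = storageKeys(username, datasetId);
  await env.BUCKET.put(r2Key, rawCsv);                 // permanent backup
  await env.KV.put(kvKey, JSON.stringify(parsedRows)); // fast retrieval copy
  // D1: one project row per student, keyed by username.
  await env.DB.prepare(
    "UPDATE projects SET dataset_id = ? WHERE username = ?"
  ).bind(datasetId, username).run();
  return { r2Key, kvKey };
}
```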
3. Pattern scanning runs in your browser
When you click "Scan for Patterns," a rule engine analyzes every variable without any server calls. It computes skewness (mean vs. median), detects outliers (values 3+ standard deviations out), identifies tight clustering, checks for uniform distributions, and looks for correlated pairs. Notable findings appear as colored dots on your variable cards.
Tech: All computation is client-side JavaScript. Insights are stored in state.sweepInsights and rendered as badges on variable cards. No data leaves the browser during scanning.
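Two of the scan rules (skew and outliers) can be sketched like this; the real engine checks more patterns, and the function name and skew threshold are illustrative assumptions.

```javascript
function scanNumericColumn(name, values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const sorted = [...values].sort((a, b) => a - b);
  const median = n % 2
    ? sorted[(n - 1) / 2]
    : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  const sd = Math.sqrt(values.reduce((a, v) => a + (v - mean) ** 2, 0) / n);

  const insights = [];
  // Skew: mean pulled well away from the median relative to the spread.
  if (sd > 0 && Math.abs(mean - median) / sd > 0.3) {
    insights.push({ column: name, type: "skew", detail: mean > median ? "right" : "left" });
  }
  // Outliers: values 3+ standard deviations from the mean.
  const outliers = values.filter((v) => sd > 0 && Math.abs(v - mean) / sd >= 3);
  if (outliers.length) {
    insights.push({ column: name, type: "outliers", detail: outliers });
  }
  return insights;
}
```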
4. AI interprets the patterns (optional)
If you click "Get AI Guidance," a summary of the pattern scan results (not your raw data) is sent to a Cloudflare Worker, which forwards it to a language model. The AI reads the statistical findings and writes a plain-English interpretation: what's interesting, what to investigate, and which chart types might reveal the story.
Tech: The LLM (Gemma 3 12B) runs locally on a GPU via Ollama, accessed through a Cloudflare Tunnel. It receives only the aggregated findings, never your individual rows. Rate-limited per session.
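The privacy boundary here is easy to show in code: the payload is built from the aggregated findings only, so raw rows can never leak into the request. The route path, payload shape, and function names below are assumptions for illustration.

```javascript
function buildGuidancePayload(sweepInsights, columnTypes) {
  return {
    // Destructuring keeps only the aggregate fields; anything else
    // attached to an insight (e.g. raw values) is dropped here.
    insights: sweepInsights.map(({ column, type, detail }) => ({ column, type, detail })),
    columns: columnTypes, // e.g. { age: "number", mood: "ordinal" }
  };
}

async function requestGuidance(sweepInsights, columnTypes) {
  const res = await fetch("/api/ai-guidance", { // hypothetical Worker route
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGuidancePayload(sweepInsights, columnTypes)),
  });
  return res.json(); // plain-English interpretation from the model
}
```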
5. Data cleaning: text becomes numbers
In the Clean Data step, you recode ordered categorical responses ("Strongly Agree" down to "Strongly Disagree") to numeric values (5 down to 1). The tool auto-detects common Likert and ordinal patterns and suggests mappings. You accept or customize them. All recoded values are stored and applied to every downstream chart.
Tech: Recode mappings are stored in state.recodeMappings as key-value objects per column. Applied at chart-generation time by the Worker, so original data is never modified.
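Applying the mappings without touching the originals might look like this (the mapping shape mirrors the per-column key-value objects described above; the function name is an assumption).

```javascript
const recodeMappings = {
  satisfaction: {
    "Strongly Agree": 5, "Agree": 4, "Neutral": 3,
    "Disagree": 2, "Strongly Disagree": 1,
  },
};

function applyRecodes(rows, mappings) {
  return rows.map((row) => {
    const out = { ...row }; // shallow copy: stored rows stay untouched
    for (const [col, map] of Object.entries(mappings)) {
      if (col in out && out[col] in map) out[col] = map[out[col]];
    }
    return out;
  });
}
```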
6. Univariate review: every variable at a glance
The Analyze step computes descriptive statistics for every selected column entirely in your browser: count, mean, median, standard deviation, min, max for numbers; frequency counts, mode, and unique values for categories. A mini chart is drawn for each variable. You confirm which variables to carry forward.
Tech: All computation is client-side JavaScript. Mini charts are independent ECharts instances, created and destroyed as you navigate. Confirming your selection unlocks the Build a Visual tab.
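The descriptives listed above reduce to two small pure functions, sketched here (function names are assumptions).

```javascript
function describeNumeric(values) {
  const n = values.length;
  const sorted = [...values].sort((a, b) => a - b);
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const median = n % 2
    ? sorted[(n - 1) / 2]
    : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  const sd = Math.sqrt(values.reduce((a, v) => a + (v - mean) ** 2, 0) / n);
  return { count: n, mean, median, sd, min: sorted[0], max: sorted[n - 1] };
}

function describeCategorical(values) {
  const freq = {};
  for (const v of values) freq[v] = (freq[v] || 0) + 1;
  const mode = Object.keys(freq).reduce((a, b) => (freq[b] > freq[a] ? b : a));
  return { count: values.length, freq, mode, unique: Object.keys(freq).length };
}
```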
7. Chart generation: your variables, computed on the server
When you pick a chart type and variables, the Worker loads your parsed data from KV and runs the appropriate aggregation: grouping, averaging, cross-tabulation, box-plot quartiles, hierarchical nesting, or flow counting, depending on the chart type. It returns an ECharts "option" object (a JSON description of the chart) plus a plain-English summary.
Tech: No AI is involved in chart generation. It's deterministic computation in the Worker. The option JSON tells ECharts exactly what to draw: axes, series, labels, colors. The chart is saved to D1 with its analysis_level (bivariate or multivariate).
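One of those aggregations (group and average, for a bar chart) can be sketched end-to-end; the real Worker covers many chart types, and the function name and rounding are assumptions.

```javascript
function groupAverageOption(rows, groupCol, valueCol) {
  const sums = new Map();
  for (const row of rows) {
    const key = row[groupCol];
    const s = sums.get(key) || { total: 0, n: 0 };
    s.total += Number(row[valueCol]);
    s.n += 1;
    sums.set(key, s);
  }
  const categories = [...sums.keys()];
  const averages = categories.map((k) => {
    const { total, n } = sums.get(k);
    return Math.round((total / n) * 100) / 100;
  });
  // Deterministic JSON description: ECharts draws exactly this.
  return {
    xAxis: { type: "category", data: categories },
    yAxis: { type: "value" },
    series: [{ type: "bar", data: averages }],
  };
}
```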
8. The chart renders in your browser
ECharts (an open-source charting library) reads the option JSON and draws an interactive chart on an HTML canvas. You can hover for tooltips, zoom, and pan. Theme colors, fonts, titles, and toggles from Step 3 are applied as a layer on top.
Tech: ECharts v5, canvas renderer. Themes are applied client-side via applyThemeToOption() which deep-clones the option and injects your color palette, font family, and axis/legend visibility.
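A minimal sketch of the theming layer, assuming a simple theme object (the theme shape is an assumption; only the deep-clone-then-inject approach comes from the description above).

```javascript
function applyThemeToOption(option, theme) {
  const out = structuredClone(option); // never mutate the saved option
  out.color = theme.palette;                       // color palette
  out.textStyle = { fontFamily: theme.fontFamily };// font family
  if (theme.title) out.title = { text: theme.title };
  if (out.legend || theme.showLegend !== undefined) {
    out.legend = { ...(out.legend || {}), show: theme.showLegend !== false };
  }
  return out;
}
```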
9. Thematic analysis: AI + human coding
For open-ended text columns, you can run thematic analysis. The AI reads your responses and generates a codebook of themes (e.g., "Safety concerns," "Community pride"). You review, edit, and approve the themes, then manually tag each response. The tool builds a frequency chart of themes and, when finalized, creates a new categorical variable you can use in charts.
Tech: Codebook generation sends response text to the LLM via the Worker. The coding step is entirely client-side. The new variable is injected into state.parsedData and persists with the project.
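The finalization step (tags become a new categorical column, plus the frequency count behind the theme chart) can be sketched as follows; function names and the "Uncoded" fallback are assumptions.

```javascript
function addThemeVariable(rows, tags, newColumn) {
  // tags[i] is the approved theme for rows[i] (or null if untagged).
  return rows.map((row, i) => ({ ...row, [newColumn]: tags[i] ?? "Uncoded" }));
}

function themeFrequencies(rows, column) {
  const freq = {};
  for (const row of rows) freq[row[column]] = (freq[row[column]] || 0) + 1;
  return freq;
}
```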
10. Everything auto-saves
Project settings (theme, colors, titles, step progress, curated variables, recode mappings) auto-save to D1 after 3 seconds of inactivity. Charts are saved immediately when created. When you log in again, the Worker loads everything back and your browser reconstructs the full UI state, including which analysis stages are unlocked.
Tech: Debounced save (3s) via scheduleSave(). On load, the Worker returns the full project in one response. A project snapshot is automatically saved on each load as a safety net.
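The debounce itself is a few lines: every edit restarts the 3-second window, so a burst of edits collapses into one write. This sketch makes the timer injectable for testing; the real scheduleSave() presumably uses setTimeout directly.

```javascript
function makeScheduleSave(saveFn, delayMs = 3000, schedule = setTimeout, cancel = clearTimeout) {
  let timer = null;
  return function scheduleSave(projectState) {
    if (timer !== null) cancel(timer); // restart the inactivity window
    timer = schedule(() => {
      timer = null;
      saveFn(projectState); // e.g. POST the settings to the Worker, then D1
    }, delayMs);
  };
}
```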
What about cost? The language model (Gemma 3) is open-source and runs on hardware the school already owns. Cloudflare's free and paid tiers handle the storage and serverless compute. There are no per-question API fees.