We Asked 4 AIs to Build an AI Skincare Routine. Here's What Happened.
Your AI Skincare Routine Might Be Wrong. Here's Why.
Ask your skincare enthusiast friends, or the friend who finally wants to start a proper routine, and chances are they have already asked a generative AI to help. I have done it. You type your skin type into a chatbox, describe your concerns, and seconds later you have a ten-step routine complete with product names, ingredient explanations, and instructions delivered with the confidence of a beauty advisor. The AI skincare routine is becoming one of the most searched and shared formats in beauty content, and the appeal makes sense. It feels personalized. It feels scientific. It feels like finally getting the answer you have been looking for.
But is it?
We wanted to find out. Not by guessing and not by reading about what AI might recommend, but by running a controlled experiment across four major AI tools and analyzing the outputs through the lens of K-beauty ingredient expertise. What came back was more nuanced than a simple yes or no, and it says a lot about where AI sits in the skincare conversation right now.
First, Let's Talk About What 'AI in Skincare' Means
The phrase gets used to cover a lot of ground, and most of the conversation collapses two very different things into one.
On one end: institutional AI. The kind built and deployed with clinical rigour. L'Oréal's SkinConsult AI, developed after acquiring ModiFace, was trained on thousands of clinical images and validated with dermatologists. Its deep learning algorithms can assess wrinkles, pigmentation and texture across diverse skin tones from a single selfie. Proven Skincare's Skin Genome Project evaluated over 20,000 ingredients and millions of peer-reviewed studies to build its recommendation engine. Shiseido's Optune goes further still, pulling real-time environmental data — UV index, pollution levels, ambient humidity — to adjust your routine dynamically based on what is happening outside your window that day. These systems are fed objective, measurable data about the skin itself. They learn and refine over time. They were built by dermatologists, chemists, and data scientists working in parallel.
On the other end: the chatbox. A general-purpose language model given only what you choose to tell it about yourself, producing a routine in seconds. No imaging. No sensor measurements. No clinical validation. No feedback loop beyond your next message.
A well-crafted prompt still produces a well-crafted guess. Whatever the interface, the limiting factor is always data quality, and no amount of prompt engineering changes that.
This is not a reason to dismiss the chatbox. It is a reason to understand what you are working with when you use one. And with AI skincare routines becoming a mainstream behaviour rather than a niche experiment, that understanding matters more than it did a few years ago.
A 2025 peer-reviewed paper published in Cureus on AI in customized skincare noted that while AI-driven tools show genuine promise in personalization, a core limitation remains: most consumer-facing systems still rely on subjective self-assessments, which reduces their reliability compared to objective diagnostic inputs. That gap is exactly what this experiment was designed to explore — not in theory, but in practice.
Why We Ran This Experiment
At K-Beauty Real Gems, the premise we work from is simple: not every product that is popular is the right product for every skin. Korean skincare has an extraordinary depth of formulation knowledge behind it — ingredient-led, skin-type specific, built around understanding what is actually happening in the skin rather than what is trending. That is the curation standard we hold ourselves to.
AI has been creeping into pretty much every corner of daily life for a while now, and skincare is no exception. What caught our attention was something more specific: people turning to generative AI to build their own skincare routines and taking those outputs as gospel. We wanted to put that to the test, not to dismiss AI and not to write a hit piece. We were genuinely curious: how does an AI skincare routine hold up when you apply the same level of ingredient scrutiny we apply to every product we recommend?
More specifically, we wanted to know whether different AI tools demonstrate meaningful differences in K-beauty ingredient knowledge, formulation logic, and the ability to hold competing skin concerns in tension — because that is where the real complexity lives.
We ran the same prompt, word for word, across four of the most widely used AI tools: ChatGPT, Gemini, Perplexity, and Claude. First output only, no regenerations. Here is what we found.
Meet Maya: Our Test Profile
Maya is 27 and lives in a city that does not give her skin any easy breaks. Think of any dense urban environment where the humidity is medium-high, the pollution is real, and the UV index is not something you can ignore. She works in an office, and anyone who has spent a full day in one knows what that does to skin — recycled air, low humidity, the kind of environment that quietly pulls moisture out without you noticing until your cheeks feel tight by 3pm.
Her skin is combination-oily. The T-zone produces enough oil by midday that she notices it, but her cheeks pull tight and dry, especially in the afternoon. Not an unusual combination, but one that makes product selection genuinely complicated — what keeps her T-zone in check tends to strip her cheeks, and what her cheeks need can feel heavy everywhere else.
Her biggest concern is hormonal acne. It shows up along her chin and jawline, roughly cyclically, and when it goes, it does not go cleanly. It leaves marks — post-inflammatory hyperpigmentation, the kind of dark spots that take months to fade and feel like a second round of the original problem. Fading those marks is her second priority, close behind preventing the next breakout.
The third layer is sensitivity. Her skin is reactive, partly by nature and partly as the result of a previous skincare phase that involved a bit too much enthusiasm with exfoliants and not enough patience. We have all been there. She paid for it with a compromised barrier, and her skin has a long memory. She is also fragrance-sensitive, and has learned that high concentrations of drying alcohols are a fast track to looking worse than when she started.
She knows the K-beauty basics. She has heard of actives but is not confident about when to use them, how to layer them, or what to avoid pairing. She is ready to build a real routine if someone can explain the reasoning, not just the steps.
Maya is probably your friend sitting in your DMs asking for a routine recommendation right now. She is definitely in our audience and she is the profile we handed to four AIs.
The Prompt We Used
The prompt below was submitted identically to all four AIs. No variations, no follow-up messages, no regenerations. Same input, four different outputs, assessed against the same standard.
I am building a K-beauty skincare routine for someone with the following profile. Please read carefully before recommending anything.
Skin type: Combination-oily. Oily T-zone, occasionally dry and tight on the cheeks, particularly in air-conditioned environments.
Climate and environment: Humid, high-pollution urban environment. High daily UV exposure.
Concerns, in order of priority:
- Hormonal acne. Primarily chin and jawline breakouts.
- Post-inflammatory hyperpigmentation (PIH) left by previous breakouts.
- Redness and sensitivity. Skin is reactive and has a compromised barrier history from previous over-exfoliation.
Sensitivities: Reacts poorly to heavy fragrance and high concentrations of drying alcohols.
Routine parameters: Full AM and PM routine. 4 to 6 steps each. The person is intermediate-level — familiar with K-beauty layering basics, knows terms like essence, serum, and actives, but not yet confident combining strong actives.
What I need from you:
- Recommend a complete AM and PM routine, step by step, using real K-beauty products.
- For each product, name the brand and product, and specify the key active ingredients and why they suit this profile.
- Explain the function of each step in the context of this specific skin profile, not generically.
- Flag any ingredient pairings or combinations to avoid in this routine, and explain the conflict.
- Flag any ingredient pairings that work particularly well together in this routine, and explain the synergy.
- Do not recommend anything that would conflict with a reactive, sensitivity-prone skin barrier.
- If you recommend exfoliating actives, specify the type, strength, and frequency, and explain why that is appropriate for this profile.
Please structure your response clearly with AM and PM sections, and include a brief summary at the end of the key ingredient dos and don'ts for this profile. Focus exclusively on K-beauty brands.
The Results
Same prompt, four very different answers. The routines shared a basic structural logic — cleanse, treat, protect — but the moment we looked at ingredient choices, formulation reasoning, and what each AI actually understood about this specific skin profile, the gaps became hard to ignore. Here is where it got interesting.
ChatGPT: Competent, Safe, and Slightly Undershooting on PIH
ChatGPT on ingredient synergies
"Niacinamide + Propolis (Glow Serum) → Oil control + anti-inflammatory + PIH fading (perfect trio for your concerns)"
ChatGPT produced a well-structured routine with a sensible barrier-first philosophy. The product picks — Beauty of Joseon Glow Serum, COSRX Snail Mucin, Heimish All Clean Balm — are legitimate K-beauty choices with solid formulation rationale behind them. SPF was correctly included as the final AM step, and keeping actives separated across AM and PM is a reasonable call for a reactive skin type.
The more substantive gap is in how ChatGPT handled PIH, Maya's second-ranked concern. It identified PIH fading as a priority, named it confidently in the synergies section, then addressed it with 2% niacinamide via the Beauty of Joseon Glow Serum. Functional, but on the lower end for meaningful brightening work on post-inflammatory hyperpigmentation. Tranexamic acid — which targets melanin synthesis from a more upstream angle — did not appear anywhere in the routine.
Perplexity: The Most Ambitious Routine, and the Most Risky
Perplexity on the exfoliation step
"Use Isntree Chestnut AHA 8% Clear Essence 2-3x weekly (key actives: 8% lactic AHA from chestnut, hyaluronic acid; mild chemical exfoliant). Gently clears chin/jaw acne debris and fades PIH without barrier disruption — lactic's larger molecule suits sensitivity, low frequency prevents over-exfoliation rebound."
Perplexity's live web search creates an appearance of rigour by citing sources throughout. But those sources are beauty blogs and K-beauty retailers, not clinical literature or fact-checked references. The citations add authority to the format while doing relatively little for the substance.
To its credit, Perplexity was the first to recommend tranexamic acid — the Haruharu Wonder Centella 4% TXA Dark Spot Go Away Serum — which is the right call for Maya's PIH concern, and one most people would not think to ask for. TXA interrupts the plasmin pathway that triggers melanin synthesis upstream, rather than managing pigment after the fact. For hormonal and inflammatory PIH, this is the right recommendation.
The problem is what appears alongside it. An 8% lactic AHA in the PM routine for a skin type explicitly flagged as reactive with a compromised barrier history. The justification — that lactic acid's larger molecular size makes it gentler — holds partially true at low concentrations. At 8%, on a barrier already damaged by prior over-exfoliation, it is a real risk. The most potentially harmful recommendation across all four outputs, and the one delivered with the most confidence.
There was also no AM moisturiser in Perplexity's routine — a big miss for any barrier-compromised profile, particularly when managing a polluted, high-UV environment every day.
Gemini: The Most K-Beauty Literate Output
Gemini on the Mugwort + Centella pairing
"Mugwort + Centella: These two are the Redness Rescue duo. Using the Mugwort Essence followed by the Centella Ampoule provides a dual-layer anti-inflammatory effect that calms reactive skin faster than either ingredient alone."
Gemini's routine was the cleanest editorially and the one that demonstrated the strongest grasp of K-beauty ingredient philosophy. The Mugwort Essence and Centella Ampoule pairing, named and explained as a traditional duo rather than two separately matched actives, is the kind of call that comes from genuinely understanding how these ingredients have been used together in Korean skincare. None of the other AIs made this connection.
The Axis-Y Dark Spot Correcting Glow Serum at 5% niacinamide — with an explicit note that this concentration sits at the effective sweet spot for brightening without irritation — was the most formulation-aware pick across all four routines. SPF correctly included in AM.
The gap: no tranexamic acid, despite PIH being Maya's second-ranked concern. Niacinamide alone may not move the needle fast enough on post-inflammatory pigmentation when the breakouts are still ongoing. Gemini identified the concern but reached only partway toward solving it.
Claude: The Most Accurate — and the Only One That Knew Its Limits
Claude on the limits of topical skincare
"With hormonal acne specifically, topical routines manage the symptom but do not eliminate the trigger. If the chin/jawline breakouts are cyclical and persistent, that is worth a conversation with a dermatologist alongside the topical approach — not instead of it."
Claude's output was the most detailed and the most mechanistically precise, with the detail accurate rather than decorative. Where other AIs described what ingredients do, Claude explained how.
The Anua Niacinamide 10% + TXA 4% Serum stood out as the best-designed product choice across all four outputs, combining both brightening actives into a single step, which is exactly the right call for an intermediate user who should not be stacking separate treatment serums independently.
Claude was also the only AI to proactively flag retinoids: not recommending them, but naming them explicitly as something to avoid adding independently until the barrier is stable, and explaining why they conflict with BHA use in the short term. That kind of forward-thinking serves an intermediate user in a way that a simple routine list does not.
And then, at the very end of the output, a line no other AI included: an acknowledgement that hormonal acne has a ceiling on what topical skincare can achieve, and that a dermatologist conversation is worth having alongside any routine. Small, but significant. It is the only moment across all four outputs where any AI acknowledged it was working with incomplete information.
Four AIs, One Sunscreen and a Popularity Problem
Every single AI recommended sunscreen in the AM routine without prompting. SPF is non-negotiable for everyone, and especially for Maya's profile, where daily UV exposure is directly worsening her PIH.
Three of the four — ChatGPT, Perplexity, and Claude — independently landed on the same product: the Beauty of Joseon Relief Sun: Rice + Probiotics SPF50+ PA++++. Only Gemini diverged, recommending the Skin1004 Hyalu-Cica Water-Fit Sun Serum instead.
The Beauty of Joseon Relief Sun is genuinely a good sunscreen for this profile. But three independent AI tools converging on a single product is worth pausing on, because it reveals something about how these systems actually work.
Language models are trained on text from the internet. The Beauty of Joseon Relief Sun is one of the most written-about K-beauty sunscreens in English, with a high volume of blog posts, Reddit threads, YouTube reviews, and 'best of' roundups. When an AI is asked for a K-beauty sunscreen for sensitive, acne-prone skin, it tends to surface the product most frequently associated with that query in its training data. Popularity, not formulation fit.
An AI recommends what the internet has already decided is popular. Whether that product is actually the best match for your specific skin is a separate question entirely.
This applies far beyond sunscreen. Products with high English-language content volume will consistently outperform better-fit alternatives in AI outputs regardless of actual formulation quality. Newer launches, less commercially visible brands, and products with a smaller English-language footprint get systematically underrepresented, no matter how strong their formulations are. For our audience, this is one of the most practically useful things to take away from this experiment.
Where All Four Hit the Same Wall
The differences between outputs are interesting. The limitation they all share is more important.
Every routine was built on the same input: what Maya chose to tell the AI about herself. No skin imaging, no sebum measurement, no pH reading, no way to assess whether her barrier is currently in acute distress or partial recovery.
More critically: no way to know whether what Maya describes as hormonal acne is actually hormonal acne. Fungal acne, perioral dermatitis, and milia all present similarly and are routinely misidentified by non-specialists. These conditions require fundamentally different treatment approaches — fungal acne often worsens with fatty acids in recommended oils. Perioral dermatitis can flare with heavy, occlusive moisturizers. A routine built on self-reported symptoms cannot catch any of this, and recommends products with full confidence regardless.
This is the fundamental difference between a consumer-prompt AI skincare routine and the institutional systems being built inside dermatology and beauty R&D. Clinical-grade tools have access to objective data about the skin itself. The chatbox has access to your description of it. Only one of the four AIs acknowledged this gap at all.
A very detailed prompt is still just a description of what someone thinks is happening on their skin. The AI works with that description, and only that description, every time.
Key Takeaways
AI tool quality varies significantly. Claude and Gemini showed much stronger ingredient logic than ChatGPT and Perplexity for this profile, a gap that matters when the output is driving product decisions.
Formulation accuracy varied significantly. Perplexity's highly confident recommendation of 8% AHA for reactive, barrier-compromised skin poses a genuine risk if used uncritically, showing that confidence does not equal accuracy.
All four AIs correctly recommended AM sunscreen without being asked, the strongest consensus in the experiment. However, the convergence on a single product likely stems from content volume, not independent formula evaluation.
AI recommendations skew toward popular, high-volume, English-language products due to internet bias. The most discussed product isn't necessarily the best for your skin, a distinction AI cannot make.
Only one of the four AIs gave genuinely responsible advice: it acknowledged the limits of what it could know and suggested a dermatologist for persistent hormonal acne. This honesty was rare in the experiment.
So, Is AI a Competent Skincare Advisor?
Of the four AI skincare routines we tested, Claude demonstrated the strongest formulation logic and the most accurate ingredient science for Maya's profile. Gemini showed the deepest K-beauty cultural intelligence. ChatGPT produced a competent, well-structured baseline that undershot on PIH. Perplexity surfaced the most ambitious ingredients but paired them in ways that could cause real problems for a reactive, barrier-compromised skin type.
None of them could see Maya's skin. None of them could distinguish hormonal acne from fungal acne from perioral dermatitis. None of them knew whether her barrier was in acute distress or partial recovery on the day she asked. Only one acknowledged any of this.
Competent, in places. Confident, consistently. Accountable, never.
And that last part is where it gets interesting, because AI skincare routines are already a mainstream behaviour, and millions of people are treating these outputs as personalized advice every day. That makes these tools something they have never officially claimed to be: a purchasing influence at scale. An AI that consistently surfaces the same high-volume brands, regardless of whether they are actually the best fit, is quietly influencing what gets discovered, what gets bought, and what gets left behind. With no disclosure. No accountability. No follow-up on whether any of it worked.
The institutional AI being built within dermatology and beauty R&D is moving somewhere more interesting — towards objective skin data, clinical validation, and adaptive feedback loops. As those tools become more consumer-accessible, the distance between 'AI trained on the internet' and 'AI trained on your actual skin' will become increasingly difficult to ignore.
For now, the most useful thing you can do is know which kind of AI you are using. The chatbox is a genuinely good tool for learning vocabulary, understanding ingredient mechanisms, and building a framework before you shop. Take that knowledge into a conversation with someone who has access to more than your description of your own skin — a dermatologist, a beauty advisor who actually knows their formulations, a curation source you trust.
The skincare world has always rewarded the people who ask better questions. AI can help you get there. Just not all the way.
The KRG Routine For Maya
Every product chosen with one brief: Maya's skin, not anyone else's. Barrier first, treatment second, nothing that fights with anything else.

