Building a Satirical News Generator with TypeScript, Web Scraping, and OpenAI GPT

Ever wanted to create your own parody news generator like The Onion? In this tutorial, we'll build a TypeScript application that scrapes news articles from the web and rewrites them in a satirical tone using OpenAI's GPT-4o-mini API.

The full code for this project is available here.


What We'll Cover

  1. Scraping a News Article: Use Puppeteer alternatives like Hyperbrowser to extract content from our news article.
  2. Transforming the Article: Use the GPT-4o-mini API to rewrite the article's tone.
  3. Command-Line Integration: Allow users to pass URLs directly through the command line using the commander library.
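
Put together, the finished CLI wires these three steps into a small pipeline. Here is a simplified sketch of the flow; the real functions are built in the steps below:

// Rough flow of the finished tool (details in Steps 2-4):
const article = await scrapeArticle(url);          // scrape page -> { title, body, author }
const onionified = await onionifyArticle(article); // rewrite it satirically with GPT-4o-mini
console.log(onionified.title, "\n", onionified.body); // print the parody to the terminal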

Step 1: Setting Up the Project

First, initialize a new Node.js project:

mkdir onionify-news && cd onionify-news
yarn init -y
yarn add --dev typescript @types/node @types/marked-terminal
yarn add @hyperbrowser/sdk dotenv marked marked-terminal ora zod commander openai

You might notice some extra packages (ora, marked, marked-terminal, zod, and commander) in the package.json. Most of them are purely for making the output look prettier, and the project would work fine without them.
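
The code below reads both API keys from environment variables (HYPERBROWSER_API_KEY and OPENAI_API_KEY), loaded with dotenv. Create a .env file in the project root with your own keys; the placeholder values shown here are just illustrative:

# .env (loaded by dotenv at startup)
HYPERBROWSER_API_KEY=your-hyperbrowser-api-key
OPENAI_API_KEY=your-openai-api-key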

Set up TypeScript by creating a tsconfig.json:

{ "compilerOptions": { "target": "ES2022", "module": "node16", "allowJs": true, "checkJs": false, "outDir": "./build", "emitDecoratorMetadata": true, "experimentalDecorators": true, "sourceMap": true, "esModuleInterop": true, "skipLibCheck": true, "moduleResolution": "node16" } }

Step 2: Scraping a News Article

We’ll use Hyperbrowser to scrape the page into markdown, then ask GPT-4o-mini to extract the title, body, and author from that markdown, validating the result against a Zod schema.

Here are the scrapeArticle and extractArticleFeaturesFromMarkdown functions:

import { z } from "zod";
import ora from "ora";
import hyperbrowser from "@hyperbrowser/sdk";
import { zodResponseFormat } from "openai/helpers/zod";

// Shared helpers used by both functions below (the `openai` client itself is created in Step 3).
const spinner = ora();
const MAX_CHECKS = 30; // how many times we poll the scrape job before giving up (pick a value that suits you)
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

const hb_client = new hyperbrowser.default({
  apiKey: process.env.HYPERBROWSER_API_KEY as string,
});

const ArticleSchema = z.object({
  title: z.string(),
  body: z.string(),
  author: z.string().optional(),
});

type Article = z.infer<typeof ArticleSchema>;

async function extractArticleFeaturesFromMarkdown(
  text: string
): Promise<Article> {
  spinner.text = "Getting article information from markdown";
  spinner.start();
  try {
    const prompt = `From the provided markdown string, extract the features required by the response format. Stick as close as possible to the provided schema.\n${text}`;
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content:
            "You are a data entry operator whose job is to extract certain features from a piece of text.",
        },
        { role: "user", content: prompt },
      ],
      // Ask the model to reply in a shape that matches ArticleSchema.
      response_format: zodResponseFormat(ArticleSchema, "article"),
      temperature: 0.7,
      max_tokens: 2000,
    });

    const parsedArticleSchema = ArticleSchema.safeParse(
      JSON.parse(response.choices[0].message.content || "")
    );
    if (parsedArticleSchema.success) {
      spinner.succeed("Got article information from markdown");
      spinner.stop();
      return {
        title: parsedArticleSchema.data.title,
        body: parsedArticleSchema.data.body,
        author: parsedArticleSchema.data.author,
      };
    } else {
      throw new Error(
        `OpenAI produced a response that doesn't match the expected output schema.\nGot ${response.choices[0].message.content}.\n\nZod error: ${parsedArticleSchema.error}`
      );
    }
  } catch (error) {
    spinner.fail("Could not get article info from markdown");
    spinner.stop();
    console.error("Error extracting article features:", error);
    throw new Error("Failed to extract article features from markdown.");
  }
}

async function scrapeArticle(url: string): Promise<Article> {
  spinner.text = "Getting markdown for article";
  spinner.start();
  try {
    const jobInfo = await hb_client.startScrapeJob({
      url,
      useProxy: false,
      solveCaptchas: false,
    });

    // Poll the scrape job until it completes, fails, or we hit MAX_CHECKS.
    let checkCount = 0;
    while (checkCount < MAX_CHECKS) {
      const scrapeRes = await hb_client.getScrapeJob(jobInfo.jobId);
      if (scrapeRes.status === "completed") {
        if (scrapeRes.data) {
          spinner.succeed("Succeeded in getting markdown from article");
          spinner.stop();
          return extractArticleFeaturesFromMarkdown(scrapeRes.data?.markdown);
        } else {
          throw new Error(
            "Got undefined when extracting markdown from article. Please check."
          );
        }
      } else if (scrapeRes.status === "failed") {
        throw scrapeRes.error;
      }
      checkCount++;
      await sleep(1000);
    }
    throw new Error(
      "Exceeded maximum checks for getting markdown for article."
    );
  } catch (err) {
    spinner.fail("Failed in getting markdown from article");
    spinner.stop();
    console.log("Could not get article");
    console.error(err);
    throw err;
  }
}

This function returns the article’s title, body, and author (if available).

Step 3: Onionifying the Article Using OpenAI

We'll use OpenAI's API to rewrite the article in a more oniony tone. If you haven't already added it in Step 1, install the OpenAI Node.js SDK:

yarn add openai

Here’s the onionifyArticle function:

import { OpenAI } from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY, // This is the default and can be omitted
});

async function onionifyArticle(article: Article): Promise<Article> {
  spinner.text = "Onionifying article";
  spinner.start();
  try {
    const prompt = `
Rewrite the following article as if it were written for a satirical news website like The Onion. Use humor, irony, and exaggeration to transform the content while trying to stick closely to the original intent of the article:

Title: ${article.title}
Body: ${article.body}
Author: ${article.author}

Make sure the headline is absurd or humorous, and add funny commentary in the body.
`;
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content:
            "You are a humorous and satirical writer writing for the online newspaper `The Onion`.",
        },
        { role: "user", content: prompt },
      ],
      temperature: 0.7,
      max_tokens: 2000,
    });

    const satiricalResponse = response.choices[0].message?.content || "";

    // Split the response into a title and body (adjust parsing as needed)
    const lines = satiricalResponse.split("\n");
    const satiricalTitle = lines[0].replace("Title:", "").trim();
    const satiricalBody = lines.slice(1).join("\n").trim();

    spinner.succeed("Successfully onionified article");
    spinner.stop();
    return {
      title: satiricalTitle,
      body: satiricalBody,
      author: `Parodied version of ${article.author || "Unknown"}`,
    };
  } catch (error) {
    spinner.fail("Could not onionify article");
    spinner.stop();
    console.error("Error generating satirical content:", error);
    throw new Error("Failed to onionify article.");
  }
}
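
The string-splitting at the end of onionifyArticle is a simple heuristic that assumes the first line of the model's reply is the headline. Since we already use zodResponseFormat for extraction in Step 2, an alternative is to request structured output here as well. This is only a sketch that reuses the ArticleSchema, Article type, and openai client defined earlier:

// Alternative sketch: ask for structured output instead of splitting lines.
async function onionifyArticleStructured(article: Article): Promise<Article> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "You are a humorous and satirical writer writing for the online newspaper `The Onion`.",
      },
      {
        role: "user",
        content: `Rewrite this article satirically, keeping its original intent.\nTitle: ${article.title}\nBody: ${article.body}`,
      },
    ],
    // Reuse the same schema so the reply comes back as { title, body, author? }.
    response_format: zodResponseFormat(ArticleSchema, "satirical_article"),
    temperature: 0.7,
    max_tokens: 2000,
  });

  // The reply should be JSON matching the schema; parse and validate it with Zod.
  const parsed = ArticleSchema.parse(
    JSON.parse(response.choices[0].message.content || "{}")
  );
  return {
    ...parsed,
    author: `Parodied version of ${article.author || "Unknown"}`,
  };
}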

Step 4: Adding Command-Line Integration

We’ll use the commander package to allow users to pass a URL via the CLI. Here’s the entry point for our script:

import "dotenv/config"; // load the API keys from .env (dotenv was installed in Step 1)
import { Command } from "commander";
import { marked } from "marked";
// Optionally hook up marked-terminal here for prettier rendering of the markdown output.

if (!process.env.OPENAI_API_KEY) {
  console.error("Missing Open AI API Key. Exiting");
  process.exit(1);
}
if (!process.env.HYPERBROWSER_API_KEY) {
  console.error("Missing HyperBrowser API Key. Exiting");
  process.exit(1);
}

const program = new Command();

program
  .version("1.0.0")
  .description("Scrape a news article and onionify it")
  .argument("<url>", "The URL of the news article to scrape")
  .action(async (url: string) => {
    try {
      console.log("Scraping the article...");
      const article = await scrapeArticle(url);

      console.log("\nOriginal Article:");
      console.log("Title:", article.title);

      console.log("Onionifying the article...");
      const onionifiedArticle = await onionifyArticle(article);

      console.log("\n--- Onionified Article ---");
      console.log(marked(onionifiedArticle.title));
      console.log(marked(onionifiedArticle.body));
    } catch (error) {
      // @ts-ignore
      console.error("Error:", error.message);
    }
  });

program.parse(process.argv);

Step 5: Running the Script

Compile the script:

yarn build

Run the compiled JavaScript file, passing the article’s URL:

yarn start "https://example.com/news-article"

Example Output

[asciicast recording of the script scraping and onionifying an article]
