How does NPM match package names with GitHub repos?

main
5b3d22afix repo

How does it match NPM package names with their GitHub repo? What happens if the repo is private?

Avatar of anthropic/claude-haiku-4.5
Claude Haiku 4.5
commented

Perfect! Now I have a comprehensive understanding. Let me create a summary based on the source code analysis:

##Summary

Based on my analysis of the opensrc repository source code, here's how it matches NPM package names with their GitHub repos and what happens with private repos:

###How NPM Package Names are Matched to GitHub Repos

Process Flow:

  1. Parse the npm package spec (src/lib/registries/npm.ts):

    • Extracts the package name and optional version (e.g., zod@3.22.0)
  2. Query the NPM Registry (fetchNpmPackageInfo):

    const url = `${NPM_REGISTRY}/${encodeURIComponent(packageName)}`;
    
    • Makes a request to https://registry.npmjs.org/<package-name>
  3. Extract Repository URL (extractRepoUrl):

    • Reads the repository field from the npm package metadata
    • Normalizes various git URL formats:
      • git+https://https://
      • git://https://
      • git+ssh://git@https://
      • Removes .git suffix
      • Expands github: shorthand to full GitHub URLs
  4. Resolve the Version:

    • If no version specified, uses the latest version from npm
    • If version specified, verifies it exists in the registry
    • Determines git tag (tries v1.2.3, then 1.2.3, then plain version)
  5. Clone the Repository (cloneAtTag):

    • Uses git clone --depth 1 --branch <tag> for efficient shallow cloning
    • Removes the .git directory after cloning to save space

###What Happens with Private Repos

GitHub API Checks (src/lib/repo.ts):

When resolving a GitHub repository directly, the code makes an API call to verify the repo exists:

const apiUrl = `https://api.github.com/repos/${owner}/${repo}`;
const response = await fetch(apiUrl, {
  headers: {
    Accept: "application/vnd.github.v3+json",
    "User-Agent": "opensrc-cli",
  },
});

if (!response.ok) {
  if (response.status === 404) {
    throw new Error(
      `Repository "${owner}/${repo}" not found on GitHub. ` +
      `Make sure it exists and is public.`
    );
  }
  if (response.status === 403) {
    throw new Error(
      `GitHub API rate limit exceeded. Try again later or authenticate.`
    );
  }
}

Private Repo Behavior:

  1. For NPM packages with private repos: If an NPM package's repository field points to a private GitHub repo, the tool will:

    • Successfully fetch the repo URL from npm metadata (private repos are still referenced in npm package metadata)
    • Fail during git clone because GitHub requires authentication to clone private repos
    • Return an error: Failed to clone repository: <git error message>
  2. For directly specified private repos (e.g., opensrc owner/private-repo):

    • The GitHub API will return a 404 error (private repos appear as "not found" to unauthenticated requests)
    • The tool shows: Repository "owner/private-repo" not found on GitHub. Make sure it exists and is public.
  3. No built-in authentication: The code doesn't implement GitHub authentication. Private repos would require:

    • Git credentials configured locally (SSH keys or git credentials)
    • GitHub PAT (Personal Access Token) environment variable setup

In summary: opensrc only works with public repositories. Private repos fail with clear error messages, either at the API resolution stage (404 for direct specs) or at the git clone stage (authentication failure for npm packages).

Avatar of sgalanb
Santiago Galán
commented

Give me the fetchNpmPackageInfo, extractRepoUrl and the function used to verify if a version it exists in the registry and get the git tag.

Avatar of anthropic/claude-haiku-4.5
Claude Haiku 4.5
commented

Perfect! Here are the three functions you requested:

##1. fetchNpmPackageInfo

/**
 * Fetch package metadata from npm registry
 */
export async function fetchNpmPackageInfo(
  packageName: string,
): Promise<RegistryResponse> {
  const url = `${NPM_REGISTRY}/${encodeURIComponent(packageName).replace(\"%40\", \"@\")}``;

  const response = await fetch(url, {
    headers: {
      Accept: "application/json",
    },
  });

  if (!response.ok) {
    if (response.status === 404) {
      throw new Error(`Package \"${packageName}\" not found on npm`);
    }
    throw new Error(
      `Failed to fetch package info: ${response.status} ${response.statusText}`,
    );
  }

  return response.json() as Promise<RegistryResponse>;
}

##2. extractRepoUrl

/**
 * Extract repository URL from npm package metadata
 */
export function extractRepoUrl(
  info: RegistryResponse,
  version?: string,
): { url: string; directory?: string } | null {
  // Try to get repo info from specific version first, then fall back to top-level
  const versionInfo = version ? info.versions[version] : null;
  const repo = versionInfo?.repository || info.repository;

  if (!repo?.url) {
    return null;
  }

  let url = repo.url;

  // Normalize git URLs
  // git+https://github.com/user/repo.git -> https://github.com/user/repo
  // git://github.com/user/repo.git -> https://github.com/user/repo
  // git+ssh://git@github.com/user/repo.git -> https://github.com/user/repo
  url = url
    .replace(/^git\+/, "")
    .replace(/^git:\/\//, "https://")
    .replace(/^git\+ssh:\/\/git@/, "https://")
    .replace(/^ssh:\/\/git@/, "https://")
    .replace(/\.git$/, "");

  // Handle GitHub shorthand
  if (url.startsWith("github:")) {
    url = `https://github.com/${url.slice(7)}`;
  }

  return {
    url,
    directory: repo.directory,
  };
}

##3. Functions for Version Verification and Git Tag Determination

There are actually two separate functions that handle this:

###getLatestVersion

/**
 * Get the latest version from registry response
 */
export function getLatestVersion(info: RegistryResponse): string {
  return info["dist-tags"].latest;
}

###Version Existence Check + Git Tag (in resolveNpmPackage)

/**
 * Resolve an npm package to its repository information
 */
export async function resolveNpmPackage(
  packageName: string,
  version?: string,
): Promise<ResolvedPackage> {
  const info = await fetchNpmPackageInfo(packageName);

  // If no version specified, use latest
  const resolvedVersion = version || getLatestVersion(info);

  // Verify the version exists
  if (!info.versions[resolvedVersion]) {
    const availableVersions = Object.keys(info.versions).slice(-5).join(\", \");
    throw new Error(
      `Version \"${resolvedVersion}\" not found for \"${packageName}\". ` +
        `Recent versions: ${availableVersions}`,
    );
  }

  const repo = extractRepoUrl(info, resolvedVersion);

  if (!repo) {
    throw new Error(
      `No repository URL found for \"${packageName}@${resolvedVersion}\". ` +
        `This package may not have its source published.`,
    );
  }

  // Determine git tag - try common patterns
  // Most packages use v1.2.3, some use 1.2.3
  const gitTag = `v${resolvedVersion}`;

  return {
    registry: "npm",
    name: packageName,
    version: resolvedVersion,
    repoUrl: repo.url,
    repoDirectory: repo.directory,
    gitTag,
  };
}

##Key Points:

  1. Version verification happens with: if (!info.versions[resolvedVersion]) - checks if the version exists in the versions object from the npm registry response
  2. Git tag determination is simple: const gitTag = v${resolvedVersion}`` - prepends "v" to the resolved version
  3. The actual git clone later tries multiple fallback tags (see the cloneAtTag function which tries v${version}, version, and then falls back to the default branch if none work)

END OF POST

How does NPM match package names with GitHub repos? — vercel-labs/opensrc