Indigo Data Inc. (Headquarters: Chuo-ku, Tokyo; Representative Director: Yoichi Okura) has released "PigData Scraping AI," a new AI service for corporations specializing in data collection.

The service uses a novel scraping method in which AI analyzes a website's structure and collects data while understanding the intent (context) of the information to be acquired.

In recent years, demand for web information has grown rapidly for market research, price monitoring, competitor analysis, and new business development. However, traditional scraping methods require a separate design and implementation for each site. Collection can stop whenever a site's UI changes, and the methods impose a heavy operational burden.

"PigData Scraping AI" is a next-generation data collection AI that solves these challenges.


Features of PigData Scraping AI (Differences from Conventional Methods)

Conventional methods require building a dedicated program with site-specific acquisition rules for each site. PigData Scraping AI instead extracts the necessary information by having the AI read the page structure.

This makes it easier to collect data flexibly while reducing the burden of handling each site individually. This mechanism gives PigData Scraping AI the following three features:

Flexibility to handle any website (reduced man-hours, improved speed)
Traditionally, specifications had to be organized and implementation done individually for each site, which delayed launch. With PigData Scraping AI, the AI analyzes the site structure, making it easier to proceed with verification even on heavily designed or complex sites. Acquisition items (columns) can also be specified in natural language. As a result, you can quickly make an initial judgment on whether data can be collected from a target site, speeding up the initial response.

Supports scraping of over 1,000 sites (cost and budget reduction)
Traditionally, design and adjustment work accumulated as the number of target sites grew, tending to inflate costs. PigData Scraping AI streamlines verification and setup across large numbers of sites, making it easier to plan large-scale surveys and monitoring and to move from PoC (Proof of Concept) to full operation.

Overwhelming stability against changes to target sites (resilient to operational and UI changes)
Traditionally, every UI or structural change required modifications, creating a high operational burden. Because the AI analyzes and interprets the page as it attempts acquisition, PigData Scraping AI aims for operations that are less affected by such changes, helping reduce rework and operational load.


Recommended Use Cases

【For Marketing/Sales Planning Professionals】

Want to continuously monitor price differences across competitors and sales channels, but cannot keep up with checking multiple websites.

Example: Continuously check how the prices, benefits, and terms of use of our services and products are listed on official sites, distributor sites, and portal sites, and use this information for sales strategy and customer support. However, prices typically appear on the pricing page, benefits on the campaign page, and conditions on the FAQ or disclaimer pages, and the wording differs from site to site. A simple check is not enough: the information must be organized internally in a comparable format and tracked for changes since the previous check.

Want to collect information regularly for proposals and planning, but internal operations cannot keep up with updates.

Example: Regularly collect information on exhibitions and seminars, listing themes, target industries, organizers, and so on, as material for planning. However, there are many target sites and the information is spread across multiple pages, so re-collecting it each time is time-consuming. The data must be updated monthly to keep tracking which themes are attracting interest.

→ PigData Scraping AI can collect information scattered across multiple sites and pages into a consistent set of items, making it easier to track changes even on frequently updated sites.

【For New Business/Business Development Professionals】

Want to gather the information needed to compare overseas markets and decide on market entry, but cannot apply consistent criteria across countries or organize the results into a format usable in business plans.

Example: Overseas expansion or new market entry requires collecting information such as regulations, implementation timelines, scope, and major players from news outlets, government agencies, and research institutions, and compiling it into comparison tables or approval documents. However, formats and languages vary, and re-reading and reorganizing everything from scratch each time takes time. This information also needs to be reconfirmed at key milestones and compared against the previous version, but one-off organization with generative AI makes it hard to guarantee that the data can be re-acquired and compared in the same format.

Want to run a PoC but cannot "get started" because of up-front requirements definition and specification review.

Example: All that is needed is supporting data for an approval or board meeting, but the collection design is so heavy that the first steps are delayed. Without being able to "just try collecting it," discussions drift back into abstract theory.

→ PigData Scraping AI makes it easier to assess early on whether information from multiple sites and languages can be aligned to the same criteria for comparison. You can quickly produce the minimum data needed for a business decision first, and if candidate sites are added midway, they can be checked and incorporated under the same conditions, making it easier to refresh the data at key points in a PoC or business decision.

【For Researchers/Research Staff】

Research design is complete, but data creation (collection/formatting) is stuck.

Example: Want to standardize "common items" across job postings, real estate, stores, events, and so on, but site structures differ, so manual preprocessing (formatting and matching) grows and deadlines get tight.

Data cleansing (inconsistent notation, missing values) is so burdensome that analysis cannot begin.

Example: The same meaning is written in different ways (e.g., location notation, walking time, price range), so time goes into creating normalization rules and making corrections, and the data never reaches a state where it can be aggregated.

→ PigData Scraping AI makes it easier to align necessary information into "data in the same format," even for sites with different structures or text-heavy pages.

【For Operations/Product Managers (for ongoing operations)】

Want to offer "external data sources" as an additional product feature, but costs rise or staff run short as data collection expands to meet customer requests.

Example: Want to continuously import information from external sites to supplement search, recommendations, price comparison, and inventory/listing status. The feature may start with a limited number of target sites, but in practice customer requests such as "show this media outlet too" or "add this competitor" tend to grow. If each addition requires individual implementation, development and operations cannot keep up, and the feature does not scale.

Every site change triggers modifications, and accumulating maintenance costs make the feature "unprofitable."

Example: Each time data acquisition stops due to a UI or structural change, the cause must be investigated, fixed, and retested, diverting product development resources to maintenance. The ROI of data utilization cannot be justified, making it hard to continue.

→ Even when building data sources that assume a large number of sites, PigData Scraping AI supports a design oriented toward continuous collection that is less affected by site changes, leading to operations in which maintenance costs do not pile up.

Services Offered

"PigData Scraping AI"

AI for corporations that can quickly and automatically collect desired web data.

URL: https://pig-data.jp/ai-scraping/

Inquiries: 03-3551-7556


FACT BOX

  • Source: PR TIMES
  • Category: Product Release