The client needed a web scraping service to collect
detailed flight information, including prices, times, and bonus
points (miles), from an airline platform.
CHALLENGE
The platform to be scraped relied on complex routing
mechanisms with cookie control and additional bot protection. Overcoming these
restrictions required both standard emulation and real-person emulation. Since
the infrastructure had to serve a load of up to 50,000 requests per day, it
also needed a more resource-intensive solution.
SOLUTION
To retrieve flight ticket data from the airline
platform to the customer's specifications, we developed an API Gateway
endpoint that triggers a Lambda function, which runs all of the scraping
processes. After scraping, the Lambda function stores session data, account
data, input/output requests, and execution logs in DynamoDB tables. The
collected ticket data is then returned to the customer.
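
The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names `scrape_flights` and `build_handler`, the record fields, and the response shape are all hypothetical, and the scraping step is a stub. The table is passed in as any object with a `put_item(Item=...)` method, which is the shape of a boto3 DynamoDB `Table` resource.

```python
import json
import time
import uuid


def scrape_flights(params):
    """Hypothetical stand-in for the real scraping step (browser session, proxies)."""
    # Assumption: the real scraper returns a list of ticket dicts for the query.
    return [{"price": None, "departure_time": None, "miles": None, "query": params}]


def build_handler(table, scraper=scrape_flights):
    """Return a Lambda-style handler that logs every request to `table`.

    `table` only needs a put_item(Item=...) method, so a boto3 DynamoDB
    Table resource or an in-memory fake both work.
    """
    def handler(event, context):
        # API Gateway proxy integration delivers the request body as a JSON string.
        params = json.loads(event.get("body") or "{}")
        request_id = str(uuid.uuid4())
        started = time.time()
        try:
            tickets = scraper(params)
            status, body = 200, {"request_id": request_id, "tickets": tickets}
        except Exception as exc:
            # Report scraper failures to the caller instead of crashing the function.
            status, body = 502, {"request_id": request_id, "error": str(exc)}
        # Persist input, output, and timing so each execution can be audited later.
        table.put_item(Item={
            "request_id": request_id,
            "input": json.dumps(params),
            "output": json.dumps(body),
            "status": status,
            "duration_ms": int((time.time() - started) * 1000),
        })
        return {
            "statusCode": status,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(body),
        }
    return handler
```

On AWS, the handler could be created with something like `build_handler(boto3.resource("dynamodb").Table("scrape_logs"))`, where the table name is again an assumption. Injecting the table keeps the handler testable without AWS credentials.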
RESULTS & ADVANTAGES
As a result, our customer received a reliable
web scraping tool that lets them quickly and efficiently retrieve the
information they need, filtered by the parameters they choose.
TECHNOLOGIES
MailSlurp, SmartProxy, Undetected-chromedriver. The project is stored on and executed from
the AWS platform. We use DynamoDB tables to store the data.