Insights from Petabytes
Peng-Chi Fang discusses his summer internship with CCC Intelligent Solutions and his project investigating a new platform that would help process massive amounts of data.
To understand the volume of data Peng-Chi Fang (MSIT '23) was dealing with this summer, you first have to understand what a petabyte is.
There are bytes, megabytes, gigabytes, and terabytes. Further up the scale is the petabyte, equal to 1,000 terabytes: roughly the contents of 20 million filled, 5-foot-tall filing cabinets, or 500 billion standard pages of text.
That’s the amount of data Fang, a student in Northwestern Engineering's Master of Science in Information Technology (MSIT) program, was focused on during his internship at CCC Intelligent Solutions (CCCIS).
CCCIS specializes in using technology and data to provide insights to auto insurance companies and collision repair centers. The company's goal is to give an insurance company the information it needs to determine who is at fault in an accident and how much it should pay to fix the car.
“CCCIS stands out as a software solutions company with a technology-driven culture and a substantial data team of skilled data engineers,” Fang said. “The chance to learn from experienced professionals, immerse myself in the company culture, and explore the advanced data infrastructure appealed to me.”
Fang’s main project this summer involved investigating whether integrating Amazon Redshift, a cloud data warehouse service from Amazon Web Services (AWS) that lets companies query massive amounts of data with a single command, would make sense for CCCIS. That’s where the petabytes come in.
Redshift can handle up to 16 petabytes of data, compared with a maximum size of 128 terabytes for Amazon Aurora, a relational database service.
That ability to process data at scale is vital to a business such as CCCIS, a software-as-a-service (SaaS) company whose mission is to give auto insurers what they need to address claims and, ultimately, get drivers back on the road in their vehicles as quickly as possible.
Fang used a variety of data engineering tools and cloud solutions to build streaming and batch data pipelines during his internship, work he said MSIT had prepared him to take on.
“The program's comprehensive course offerings attracted me initially,” he said. “It features not only technology-focused courses but also valuable business and management classes.”
The MSIT offering that contributed most to his success at CCCIS was the opportunity to design an independent study course and work directly with a faculty member, Fang said. He chose the topic "Designing Data-Intensive Applications," which he said significantly enhanced his understanding of the design principles underlying a variety of data tools.
The internship itself gave Fang equally valuable lessons he has now taken back into the MSIT classroom.
“The most significant lesson I gained is understanding the comprehensive evaluation process of new solutions conducted by the team,” he said. “This includes assessing performance, feasibility, and cost aspects, providing the key factors considered from a company’s perspective when adopting new solutions.”
Fang said he plans on taking all the lessons learned through the MSIT program, including his internship, into a post-graduation career as a data engineer or data solution architect.
The MSIT program has been all that he’s hoped for in boosting his career potential.
“These experiences have broadened my perspective,” he said. “They have shown me the technical skills and interpersonal capabilities required for my future career growth.”