[파이썬] 데이터 필터링과 쿼리

Data filtering and querying are important tasks in data analysis and manipulation. In Python, there are several powerful libraries that provide efficient tools for performing these operations. In this blog post, we will explore some commonly used techniques and libraries for data filtering and querying in Python.

Filtering Data with Pandas

Pandas is a popular data manipulation library in Python that provides easy-to-use functions for filtering data. Let’s say we have a dataset df with columns such as ‘name’, ‘age’, and ‘salary’. Here’s an example of how we can filter the data based on specific criteria:

import pandas as pd

# Filter data based on age
filtered_data = df[df['age'] > 30]

# Filter data based on salary
filtered_data = df[df['salary'] >= 50000]

In the first example, we filter the data to only include rows where the age is greater than 30. Similarly, in the second example, we filter the data to only include rows where the salary is greater than or equal to 50000.

Pandas also provides various other functions such as isin() for filtering based on multiple values and str.contains() for filtering based on string patterns.

Querying Data with SQLalchemy

SQLalchemy is a powerful SQL toolkit and Object-Relational Mapping (ORM) library that allows us to query data from relational databases in Python. Let’s say we have a SQLite database mydb.db with a table called ‘employees’. Here’s an example of how we can query data using SQLalchemy:

from sqlalchemy import create_engine
import pandas as pd

# Create engine and connect to database
engine = create_engine('sqlite:///mydb.db')
connection = engine.connect()

# Query data from the database
query = "SELECT * FROM employees WHERE salary >= 50000"
result = connection.execute(query)

# Fetch the results into a pandas DataFrame
df = pd.DataFrame(result.fetchall(), columns=result.keys())

# Close the database connection
connection.close()

In this example, we create an engine using create_engine() and connect to the SQLite database. We then execute a SQL query using connection.execute() and fetch the results into a pandas DataFrame for further analysis. Finally, we close the database connection.

SQLalchemy supports various other database systems and provides tools for advanced querying such as joins, aggregations, and subqueries.

Conclusion

Data filtering and querying are essential tasks in data analysis and manipulation. Python provides powerful libraries such as Pandas and SQLalchemy that facilitate these operations. In this blog post, we explored some techniques for filtering data with Pandas and querying data with SQLalchemy. These tools can greatly simplify the process of working with data and enable us to extract valuable insights efficiently.

If you want to dive deeper into data filtering and querying, I recommend checking out the documentations of Pandas and SQLalchemy. Happy filtering and querying!