Data Management

Powerful pandas-based data manipulation and analysis: column and row operations, filtering, transformation, and built-in statistical tools.

Data Management Features

Comprehensive data manipulation capabilities for scientific workflows.

DataFrame Operations

Comprehensive pandas-based data manipulation

  • Select specific columns
  • Filter rows by conditions
  • Sort data by multiple criteria
  • Drop duplicate entries
  • Merge multiple DataFrames
  • Handle missing data
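
For orientation, here is a minimal pandas sketch of two of the operations listed above (multi-key sorting and missing-data handling). The DataFrame and its columns are hypothetical, not part of any shipped node:

    import pandas as pd

    # Hypothetical measurement table with one missing reading
    df = pd.DataFrame({
        "sample": ["a", "b", "b", "c"],
        "run": [2, 1, 2, 1],
        "value": [1.2, None, 3.4, 0.9],
    })

    # Sort by multiple criteria: sample ascending, then run descending
    df = df.sort_values(["sample", "run"], ascending=[True, False])

    # Handle missing data: fill the gap with the column mean (dropna() also works)
    df["value"] = df["value"].fillna(df["value"].mean())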

Data Filtering

Advanced filtering and data cleaning capabilities

  • Conditional row filtering
  • Multiple condition combinations
  • Regular expression matching
  • Numerical range filtering
  • Text pattern matching
  • Null value handling
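
A short pandas sketch of several of the filtering styles listed above; the column names and patterns are invented for illustration:

    import pandas as pd

    df = pd.DataFrame({
        "label": ["ctrl_01", "test_02", "test_03", None],
        "signal": [0.4, 1.8, 2.6, 1.1],
    })

    # Regular-expression / text pattern matching; na=False treats nulls as no match
    tests = df[df["label"].str.contains(r"^test_\d+", na=False)]

    # Numerical range filtering combined with null handling
    in_range = df[df["signal"].between(0.5, 2.0) & df["label"].notna()]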

Data Transformation

Data type conversion and structural modifications

  • Data type conversion
  • Column renaming
  • Value mapping and replacement
  • Data normalization
  • Feature scaling
  • Categorical encoding
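
These transformations map onto standard pandas calls; a compact sketch with made-up columns:

    import pandas as pd

    df = pd.DataFrame({"grp": ["low", "high", "low"],
                       "site": ["A", "B", "A"],
                       "reading": ["1", "5", "3"]})

    # Data type conversion and column renaming
    df["reading"] = df["reading"].astype(float)
    df = df.rename(columns={"grp": "group"})

    # Value mapping / replacement
    df["group"] = df["group"].map({"low": 0, "high": 1})

    # Min-max feature scaling
    r = df["reading"]
    df["reading_scaled"] = (r - r.min()) / (r.max() - r.min())

    # One-hot categorical encoding
    df = pd.get_dummies(df, columns=["site"])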

Statistical Analysis

Built-in statistical functions and calculations

  • Descriptive statistics
  • Correlation analysis
  • Distribution analysis
  • Outlier detection
  • Data sampling
  • Summary statistics
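
A brief pandas sketch of these analyses on a hypothetical two-column table; the IQR rule shown is one common outlier heuristic, not necessarily the one this tool uses internally:

    import pandas as pd

    df = pd.DataFrame({"x": [1.0, 1.2, 0.9, 8.5, 1.1],
                       "y": [2.1, 2.4, 1.8, 2.0, 2.2]})

    # Descriptive / summary statistics and correlation analysis
    print(df.describe())
    print(df.corr())

    # Simple IQR-based outlier detection on one column
    q1, q3 = df["x"].quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = df[(df["x"] < q1 - 1.5 * iqr) | (df["x"] > q3 + 1.5 * iqr)]

    # Random sampling for quick inspection
    subset = df.sample(n=3, random_state=0)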

Data Processing Nodes

Essential nodes for data manipulation and transformation.

Select Columns

Choose specific columns from DataFrame

Extract relevant columns for analysis

select_columns

Input Ports

  • data (type: data): Input DataFrame

Output Ports

  • selected_data (type: data): DataFrame containing only the selected columns

Properties

Property    Type    Default  Description
columns     string  -        Column names (comma-separated)
keep_order  bool    true     Maintain column order
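
The node's internals are not documented here, but in plain pandas it plausibly reduces to the sketch below. The parsing of the comma-separated columns property, and the reading of keep_order as "use the order given in the property, else the DataFrame's own order", are both assumptions:

    import pandas as pd

    def select_columns(df: pd.DataFrame, columns: str,
                       keep_order: bool = True) -> pd.DataFrame:
        # Assumed: the property arrives as a comma-separated string
        wanted = [c.strip() for c in columns.split(",")]
        if keep_order:
            return df[wanted]  # order as listed in the property
        # Assumed fallback: the DataFrame's own column order
        return df[[c for c in df.columns if c in wanted]]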

Filter Rows

Filter DataFrame rows by conditions

Remove unwanted data or focus on specific subsets

filter_rows

Input Ports

  • data (type: data): Input DataFrame

Output Ports

  • filtered_data (type: data): Filtered DataFrame

Properties

Property        Type    Default  Description
condition       string  -        Filter condition
case_sensitive  bool    false    Case sensitivity
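
Likewise a plausible pandas reading rather than the node's actual implementation: the condition property resembles the expression strings that DataFrame.query accepts. Honoring the case_sensitive flag would require normalizing string case on both sides of the comparison, which this sketch omits:

    import pandas as pd

    def filter_rows(df: pd.DataFrame, condition: str) -> pd.DataFrame:
        # e.g. condition = "signal > 0.5 and label == 'test'"
        return df.query(condition)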

Merge DataFrames

Combine multiple DataFrames

Combine data from different sources

merge_dataframes

Input Ports

  • left_data (type: data): Left DataFrame
  • right_data (type: data): Right DataFrame

Output Ports

  • merged_data (type: data): Merged DataFrame

Properties

Property  Type    Default  Description
how       string  inner    Merge type (inner, outer, left, right)
on        string  -        Column to merge on
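
The how and on properties mirror the like-named arguments of pandas.merge, so a minimal equivalent (with invented tables) looks like this:

    import pandas as pd

    left = pd.DataFrame({"id": [1, 2, 3], "conc": [0.1, 0.4, 0.9]})
    right = pd.DataFrame({"id": [2, 3, 4], "temp": [20.5, 21.0, 19.8]})

    # how and on correspond directly to the node's properties
    merged = pd.merge(left, right, how="inner", on="id")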

Drop Duplicates

Remove duplicate rows from DataFrame

Clean datasets with duplicate entries

drop_duplicates

Input Ports

  • data (type: data): Input DataFrame

Output Ports

  • clean_data (type: data): DataFrame without duplicates

Properties

Property  Type    Default  Description
subset    string  -        Columns to check for duplicates
keep      string  first    Which duplicate to keep
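
Both properties map directly onto arguments of pandas' drop_duplicates; a small sketch with invented data:

    import pandas as pd

    df = pd.DataFrame({"sample": ["a", "a", "b"],
                       "run": [1, 1, 2],
                       "value": [3, 3, 5]})

    # subset and keep correspond to the node's properties;
    # keep="first" retains the first occurrence of each duplicate group
    clean = df.drop_duplicates(subset=["sample", "run"], keep="first")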

Common Data Workflows

Reusable patterns for data processing and analysis.

Data Cleaning Pipeline

Complete workflow for cleaning experimental data

Difficulty: Beginner

Workflow Steps:

  1. Load raw data file
  2. Remove duplicate entries
  3. Filter out invalid measurements
  4. Handle missing values
  5. Normalize data ranges
  6. Save cleaned dataset

Required Nodes:

CSV Reader, Drop Duplicates, Filter Rows, Select Columns, Save DataFrame
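
Outside the node editor, the same six steps can be sketched in plain pandas; the file names, column names, and validity rule below are all hypothetical:

    import pandas as pd

    df = pd.read_csv("raw_measurements.csv")                # 1. load raw data
    df = df.drop_duplicates()                               # 2. remove duplicates
    df = df[df["status"] == "ok"]                           # 3. drop invalid rows
    df["value"] = df["value"].fillna(df["value"].median())  # 4. handle missing values
    v = df["value"]
    df["value_norm"] = (v - v.min()) / (v.max() - v.min())  # 5. normalize ranges
    df.to_csv("cleaned_measurements.csv", index=False)      # 6. save cleaned dataset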

Multi-Source Data Integration

Combine data from multiple experimental sources

Difficulty: Intermediate

Workflow Steps:

  1. Load data from different files
  2. Standardize column names
  3. Merge on common identifiers
  4. Handle conflicting data
  5. Validate merged dataset
  6. Export integrated data

Required Nodes:

Multiple File Readers, Select Columns, Merge DataFrames, Filter Rows, Save DataFrame
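
A plain-pandas sketch of the same flow, with invented file and column names; the suffix-then-fillna pattern is one simple way to resolve conflicting columns, not the only one:

    import pandas as pd

    # 1-2. load each source and standardize column names
    a = pd.read_csv("instrument_a.csv").rename(columns={"sample_id": "id"})
    b = pd.read_csv("instrument_b.csv").rename(columns={"SampleID": "id"})

    # 3. merge on the common identifier; suffixes mark conflicting columns
    merged = pd.merge(a, b, on="id", how="outer", suffixes=("_a", "_b"))

    # 4. resolve a conflict: prefer source A, fall back to source B
    if {"temp_a", "temp_b"} <= set(merged.columns):
        merged["temp"] = merged["temp_a"].fillna(merged["temp_b"])

    # 5-6. validate and export
    assert merged["id"].notna().all(), "rows without an identifier after merge"
    merged.to_csv("integrated.csv", index=False)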

Quality Control Analysis

Filter and analyze data quality metrics

Difficulty: Intermediate

Workflow Steps:

  1. Import measurement data
  2. Apply quality thresholds
  3. Calculate quality metrics
  4. Flag outlier data points
  5. Generate quality report
  6. Export filtered dataset

Required Nodes:

Data Reader, Filter Rows, Calculate Statistics, Generate Report, Save Results
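
Sketched in pandas with hypothetical columns: snr stands in for whatever quality metric your instrument reports, and the z-score cutoff of 3 is an illustrative threshold:

    import pandas as pd

    df = pd.read_csv("measurements.csv")                  # 1. import data

    df = df[df["snr"] >= 3.0]                             # 2. quality threshold

    mean, std = df["value"].mean(), df["value"].std()     # 3. quality metrics
    df["zscore"] = (df["value"] - mean) / std

    df["outlier"] = df["zscore"].abs() > 3                # 4. flag outliers

    report = df[["value", "zscore", "outlier"]].describe()  # 5. quality report
    report.to_csv("quality_report.csv")

    df[~df["outlier"]].to_csv("filtered.csv", index=False)  # 6. export filtered data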

Best Practices

Tips for efficient and reliable data management workflows.

Data Validation

Always validate your data after major operations

  • Check data types after loading
  • Verify column names and formats
  • Validate data ranges and constraints
  • Monitor for unexpected missing values
  • Use data preview nodes for inspection
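
A few of these checks condensed into a reusable helper; the id/value columns and the 0-100 range are placeholders for whatever constraints your own data carries:

    import pandas as pd

    def validate(df: pd.DataFrame) -> None:
        print(df.dtypes)       # check data types after loading
        print(df.isna().sum()) # monitor for unexpected missing values
        assert df["id"].is_unique, "duplicate identifiers"
        assert df["value"].between(0, 100).all(), "value outside expected range"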

Memory Management

Efficient handling of large datasets

  • Filter data early in the workflow
  • Use sampling for large datasets during development
  • Remove unnecessary columns
  • Monitor memory usage in the interface
  • Use streaming for very large files
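
For example, chunked reading lets you filter early and keep only the needed columns while streaming a large file; the names and threshold are illustrative:

    import pandas as pd

    # Stream a very large file in chunks, filtering as we go
    pieces = []
    for chunk in pd.read_csv("huge.csv", usecols=["id", "value"],
                             chunksize=100_000):
        pieces.append(chunk[chunk["value"] > 0])
    df = pd.concat(pieces, ignore_index=True)

    # During development, work on a small sample instead of the full dataset
    dev = df.sample(frac=0.01, random_state=0)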

Error Handling

Robust workflows handle data issues gracefully

  • Add data validation steps
  • Handle missing data appropriately
  • Use conditional logic for data issues
  • Log data processing steps
  • Implement fallback options
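
A minimal sketch of logging plus a fallback when an input file is missing; the file name and fallback columns are placeholders:

    import logging
    import pandas as pd

    logging.basicConfig(level=logging.INFO)

    try:
        df = pd.read_csv("input.csv")
    except FileNotFoundError:
        logging.warning("input.csv missing; falling back to an empty dataset")
        df = pd.DataFrame(columns=["id", "value"])

    # Handle missing data explicitly rather than letting it propagate
    n_missing = int(df["value"].isna().sum())
    if n_missing:
        logging.info("dropping %d rows with missing values", n_missing)
        df = df.dropna(subset=["value"])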

Documentation

Document your data processing steps

  • Add comments to complex workflows
  • Save intermediate results for debugging
  • Document data transformations
  • Include data source information
  • Version control your workflows

Ready to Work with Files?

Learn about file I/O operations to import and export data in multiple formats.